[v5] eal: pick IOVA as PA if IOMMU is not available

Message ID: c48d88560da0bfaab44e154f8b994f4b63d44306.1564408347.git.anatoly.burakov@intel.com
State: Accepted, archived
Delegated to: Thomas Monjalon

Checks

Context                          Check    Description
ci/Intel-compilation             success  Compilation OK
ci/iol-Compile-Testing           success  Compile Testing PASS
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/checkpatch                    success  coding style OK

Commit Message

Anatoly Burakov July 29, 2019, 1:52 p.m. UTC
When an IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
---

Notes:
    v5:
    - Clarify docs on FreeBSD
    - Move IOMMU detection code out of VFIO sources
    
    v4:
    - Fix indentation in release notes' known issues
    
    v3:
    - Add documentation changes
    - Fix a typo pointed out by checkpatch
    
    v2:
    - Decouple IOMMU from VFIO
    - Add a check for physical addresses availability

 .../prog_guide/env_abstraction_layer.rst      | 27 ++++++----
 doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++
 doc/guides/rel_notes/release_19_08.rst        | 16 ++++++
 lib/librte_eal/linux/eal/eal.c                | 50 ++++++++++++++++++-
 4 files changed, 107 insertions(+), 12 deletions(-)
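
Illustration (not part of the patch): the selection logic described in the
commit message and in the documentation changes can be sketched as a small
standalone C fragment. This is a minimal sketch under stated assumptions, not
the EAL implementation. The names iova_mode, IOVA_DC, IOVA_PA, IOVA_VA,
dir_has_entries, pick_default_iova and resolve_iova are hypothetical stand-ins
introduced here; the KNI workaround is left out. Unlike the patch, which
counts directory entries, the sketch compares entry names against "." and
".." explicitly.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the RTE_IOVA_DC/PA/VA values used by EAL. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Return true if `path` is a readable directory holding at least one entry
 * other than "." and "..". A missing or unreadable directory counts as empty.
 */
static bool
dir_has_entries(const char *path)
{
	DIR *dir;
	struct dirent *d;
	bool found = false;

	assert(path != NULL);
	dir = opendir(path);
	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") != 0 &&
				strcmp(d->d_name, "..") != 0) {
			found = true;
			break;
		}
	}
	closedir(dir);

	return found;
}

/* Default applied only when the buses expressed no preference. */
static enum iova_mode
pick_default_iova(bool phys_addrs, bool iommu_present)
{
	if (!phys_addrs || iommu_present)
		return IOVA_VA;
	return IOVA_PA;
}

/*
 * Both documented steps: pick a mode, then reject IOVA as PA when physical
 * addresses are unavailable. Returns 0 and stores the mode in *out, or -1.
 */
static int
resolve_iova(enum iova_mode bus_pref, bool phys_addrs, bool iommu_present,
		enum iova_mode *out)
{
	enum iova_mode mode = bus_pref;

	if (mode == IOVA_DC)
		mode = pick_default_iova(phys_addrs, iommu_present);
	if (mode == IOVA_PA && !phys_addrs)
		return -1;

	*out = mode;
	return 0;
}
```

A caller would pass dir_has_entries("/sys/kernel/iommu_groups") as the
iommu_present argument. Keeping the directory probe separate from the decision
function is a choice made for this sketch so that the decision can be
exercised without touching sysfs.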
  

Comments

David Marchand July 30, 2019, 7:21 a.m. UTC | #1
On Mon, Jul 29, 2019 at 5:03 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> Tested-by: Jerin Jacob <jerinj@marvell.com>
> Reviewed-by: Jerin Jacob <jerinj@marvell.com>
> ---
>
> Notes:
>     v5:
>     - Clarify docs on FreeBSD
>     - Move IOMMU detection code out of VFIO sources
>
>     v4:
>     - Fix indentation in release notes' known issues
>
>     v3:
>     - Add documentation changes
>     - Fix a typo pointed out by checkpatch
>
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
>
>  .../prog_guide/env_abstraction_layer.rst      | 27 ++++++----
>  doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++
>  doc/guides/rel_notes/release_19_08.rst        | 16 ++++++
>  lib/librte_eal/linux/eal/eal.c                | 50 ++++++++++++++++++-
>  4 files changed, 107 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 1487ea550..94f30fd5d 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -425,7 +425,8 @@ IOVA Mode Detection
>  IOVA Mode is selected by considering what the current usable Devices on the
>  system require and/or support.
>
> -Below is the 2-step heuristic for this choice.
> +On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
> +detected based on a 2-step heuristic detailed below.
>
>  For the first step, EAL asks each bus its requirement in terms of IOVA mode
>  and decides on a preferred IOVA mode.
> @@ -438,20 +439,26 @@ and decides on a preferred IOVA mode.
>    RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>    check on Physical Addresses availability),
>
> +If the buses have expressed no preference on which IOVA mode to pick, then a
> +default is selected using the following logic:
> +
> +- if physical addresses are not available, RTE_IOVA_VA mode is used
> +- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
> +- otherwise, RTE_IOVA_PA mode is used
> +
> +In the case when the buses had disagreed on their preferred IOVA mode, part of
> +the buses won't work because of this decision.
> +
>  The second step checks if the preferred mode complies with the Physical
>  Addresses availability since those are only available to root user in recent
> -kernels.
> -
> -- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> -  Addresses, then EAL init fails early, since later probing of the devices
> -  would fail anyway,
> -- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
> -  In the case when the buses had disagreed on the IOVA Mode at the first step,
> -  part of the buses won't work because of this decision.
> +kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
> +Physical Addresses, then EAL init fails early, since later probing of the
> +devices would fail anyway.
>
>  .. note::
>
> -    The RTE_IOVA_VA mode is selected as the default for the following reasons:
> +    The RTE_IOVA_VA mode is preferred as the default in most cases for the
> +    following reasons:
>
>      - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>        physical address availability.
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 276327c15..0b50c8306 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -861,3 +861,29 @@ AVX-512 support disabled
>
>  **Driver/Module**:
>      ALL.
> +
> +
> +Unsuitable IOVA mode may be picked as the default
> +----------------------------------------------------------------
> +**Description**
> +   Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +   attempt to pick a reasonable default based on a number of factors, but there
> +   may be cases where the default may be unsuitable (for example, hotplugging
> +   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
> +   initialization).
> +
> +**Implication**
> +   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
> +   mode being automatically picked by EAL.
> +
> +**Resolution/Workaround**:
> +   It is possible to force EAL to pick a particular IOVA mode by using the
> +   `--iova-mode` command-line parameter. If conflicting requirements are present
> +   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
> +   there is no workaround.
> +
> +**Affected Environment/Platform**:
> +   Linux.
> +
> +**Driver/Module**:
> +   ALL.
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index c9bd3ce18..b399ca536 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -56,6 +56,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **EAL will now pick IOVA as VA mode as the default in most cases.**
> +
> +  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
> +  behavior has now been changed to handle IOVA mode detection in a more complex
> +  manner, and will default to IOVA as VA in most cases.
> +
>  * **Added MCS lock.**
>
>    MCS lock provides scalability by spinning on a CPU/thread local variable
> @@ -436,6 +442,16 @@ Known Issues
>     =========================================================
>
>
> +* **Unsuitable IOVA mode may be picked as the default**
> +
> +  Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +  attempt to pick a reasonable default based on a number of factors, but
> +  there may be cases where the default may be unsuitable.
> +
> +  It is recommended to use the `--iova-mode` command-line parameter if the
> +  default is not suitable.
> +
> +
>  Tested Platforms
>  ----------------
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..6ed602c90 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -66,6 +66,8 @@
>
>  #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
>
> +#define KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
> +
>  /* Allow the application to print its usage message too if set */
>  static rte_usage_hook_t        rte_application_usage_hook = NULL;
>
> @@ -951,6 +953,33 @@ static void rte_eal_init_alert(const char *msg)
>         RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
>
> +/*
> + * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
> + * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
> + * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
> + * checking if the path is empty will tell us if IOMMU is enabled.
> + */
> +static bool
> +is_iommu_enabled(void)
> +{
> +       DIR *dir = opendir(KERNEL_IOMMU_GROUPS_PATH);
> +       struct dirent *d;
> +       int n = 0;
> +
> +       /* if directory doesn't exist, assume IOMMU is not enabled */
> +       if (dir == NULL)
> +               return false;
> +
> +       while ((d = readdir(dir)) != NULL) {
> +               /* skip dot and dot-dot */
> +               if (++n > 2)
> +                       break;
> +       }
> +       closedir(dir);
> +
> +       return n > 2;
> +}
> +
>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
> @@ -1061,8 +1090,25 @@ rte_eal_init(int argc, char **argv)
>                 enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>                 if (iova_mode == RTE_IOVA_DC) {
> -                       iova_mode = RTE_IOVA_VA;
> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
> +
> +                       if (!phys_addrs) {
> +                               /* if we have no access to physical addresses,
> +                                * pick IOVA as VA mode.
> +                                */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +                       } else if (is_iommu_enabled()) {
> +                               /* we have an IOMMU, pick IOVA as VA mode */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
> +                       } else {
> +                               /* physical addresses available, and no IOMMU
> +                                * found, so pick IOVA as PA.
> +                                */
> +                               iova_mode = RTE_IOVA_PA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
> +                       }
>                 }
>  #ifdef RTE_LIBRTE_KNI
>                 /* Workaround for KNI which requires physical address to work */
> --
> 2.17.1

Reviewed-by: David Marchand <david.marchand@redhat.com>
  
Thomas Monjalon July 30, 2019, 8:10 a.m. UTC | #2
30/07/2019 09:21, David Marchand:
> On Mon, Jul 29, 2019 at 5:03 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
> >
> > When IOMMU is not available, /sys/kernel/iommu_groups will not be
> > populated. This is happening since at least 3.6 when VFIO support
> > was added. If the directory is empty, EAL should not pick IOVA as
> > VA as the default IOVA mode.
> >
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > Tested-by: Jerin Jacob <jerinj@marvell.com>
> > Reviewed-by: Jerin Jacob <jerinj@marvell.com>
> 
> Reviewed-by: David Marchand <david.marchand@redhat.com>

Applied with a few whitespace fixes and minor changes, thanks.
  

Patch

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 1487ea550..94f30fd5d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -425,7 +425,8 @@  IOVA Mode Detection
 IOVA Mode is selected by considering what the current usable Devices on the
 system require and/or support.
 
-Below is the 2-step heuristic for this choice.
+On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
+detected based on a 2-step heuristic detailed below.
 
 For the first step, EAL asks each bus its requirement in terms of IOVA mode
 and decides on a preferred IOVA mode.
@@ -438,20 +439,26 @@  and decides on a preferred IOVA mode.
   RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
   check on Physical Addresses availability),
 
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+In the case when the buses had disagreed on their preferred IOVA mode, part of
+the buses won't work because of this decision.
+
 The second step checks if the preferred mode complies with the Physical
 Addresses availability since those are only available to root user in recent
-kernels.
-
-- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
-  Addresses, then EAL init fails early, since later probing of the devices
-  would fail anyway,
-- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
-  In the case when the buses had disagreed on the IOVA Mode at the first step,
-  part of the buses won't work because of this decision.
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
 
 .. note::
 
-    The RTE_IOVA_VA mode is selected as the default for the following reasons:
+    The RTE_IOVA_VA mode is preferred as the default in most cases for the
+    following reasons:
 
     - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
       physical address availability.
diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 276327c15..0b50c8306 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -861,3 +861,29 @@  AVX-512 support disabled
 
 **Driver/Module**:
     ALL.
+
+
+Unsuitable IOVA mode may be picked as the default
+----------------------------------------------------------------
+**Description**
+   Not all kernel drivers and not all devices support all IOVA modes. EAL will
+   attempt to pick a reasonable default based on a number of factors, but there
+   may be cases where the default may be unsuitable (for example, hotplugging
+   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
+   initialization).
+
+**Implication**
+   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
+   mode being automatically picked by EAL.
+
+**Resolution/Workaround**:
+   It is possible to force EAL to pick a particular IOVA mode by using the
+   `--iova-mode` command-line parameter. If conflicting requirements are present
+   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
+   there is no workaround.
+
+**Affected Environment/Platform**:
+   Linux.
+
+**Driver/Module**:
+   ALL.
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index c9bd3ce18..b399ca536 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -56,6 +56,12 @@  New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **EAL will now pick IOVA as VA mode as the default in most cases.**
+
+  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
+  behavior has now been changed to handle IOVA mode detection in a more complex
+  manner, and will default to IOVA as VA in most cases.
+
 * **Added MCS lock.**
 
   MCS lock provides scalability by spinning on a CPU/thread local variable
@@ -436,6 +442,16 @@  Known Issues
    =========================================================
 
 
+* **Unsuitable IOVA mode may be picked as the default**
+
+  Not all kernel drivers and not all devices support all IOVA modes. EAL will
+  attempt to pick a reasonable default based on a number of factors, but
+  there may be cases where the default may be unsuitable.
+
+  It is recommended to use the `--iova-mode` command-line parameter if the
+  default is not suitable.
+
+
 Tested Platforms
 ----------------
 
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..6ed602c90 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -66,6 +66,8 @@ 
 
 #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
 
+#define KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 /* Allow the application to print its usage message too if set */
 static rte_usage_hook_t	rte_application_usage_hook = NULL;
 
@@ -951,6 +953,33 @@  static void rte_eal_init_alert(const char *msg)
 	RTE_LOG(ERR, EAL, "%s\n", msg);
 }
 
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+static bool
+is_iommu_enabled(void)
+{
+	DIR *dir = opendir(KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return false;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -1061,8 +1090,25 @@  rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
+
+			if (!phys_addrs) {
+				/* if we have no access to physical addresses,
+				 * pick IOVA as VA mode.
+				 */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+			} else if (is_iommu_enabled()) {
+				/* we have an IOMMU, pick IOVA as VA mode */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
+			} else {
+				/* physical addresses available, and no IOMMU
+				 * found, so pick IOVA as PA.
+				 */
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */