> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, July 25, 2019 1:06 PM
> To: dev@dpdk.org
> Cc: Mcnamara, John <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; Stojaczyk, Dariusz
> <dariusz.stojaczyk@intel.com>; thomas@monjalon.net;
> david.marchand@redhat.com; jerinj@marvell.com
> Subject: [PATCH v3] eal: pick IOVA as PA if IOMMU is not available
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This has been the case since at least Linux 3.6, when VFIO
> support was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Thanks!
@@ -425,6 +425,9 @@ IOVA Mode Detection
IOVA Mode is selected by considering what the current usable Devices on the
system require and/or support.
+On FreeBSD, RTE_IOVA_VA mode is not supported, so RTE_IOVA_PA is always used.
+On Linux, the IOVA mode is detected based on a heuristic.
+
Below is the 2-step heuristic for this choice.
For the first step, EAL asks each bus its requirement in terms of IOVA mode
@@ -438,20 +441,26 @@ and decides on a preferred IOVA mode.
RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
check on Physical Addresses availability),
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+If the buses disagreed on their preferred IOVA mode, some of the buses will not
+work because of this decision.
+
The second step checks if the preferred mode complies with the Physical
Addresses availability since those are only available to root user in recent
-kernels.
-
-- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
- Addresses, then EAL init fails early, since later probing of the devices
- would fail anyway,
-- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
- In the case when the buses had disagreed on the IOVA Mode at the first step,
- part of the buses won't work because of this decision.
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
.. note::
- The RTE_IOVA_VA mode is selected as the default for the following reasons:
+ The RTE_IOVA_VA mode is preferred as the default in most cases for the
+ following reasons:
- All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
physical address availability.
@@ -861,3 +861,29 @@ AVX-512 support disabled
**Driver/Module**:
ALL.
+
+
+Unsuitable IOVA mode may be picked as the default
+----------------------------------------------------------------
+**Description**:
+ Not all kernel drivers and not all devices support all IOVA modes. EAL will
+ attempt to pick a reasonable default based on a number of factors, but there
+ may be cases where the default is unsuitable (for example, hotplugging
+ devices using the ``igb_uio`` driver after IOVA as VA mode was picked on EAL
+ initialization).
+
+**Implication**:
+ Some devices (hotplugged or otherwise) may not work due to an incompatible
+ IOVA mode being automatically picked by EAL.
+
+**Resolution/Workaround**:
+ It is possible to force EAL to pick a particular IOVA mode by using the
+ ``--iova-mode`` command-line parameter. If conflicting requirements are
+ present (such as one device requiring IOVA as PA and another requiring IOVA
+ as VA mode), there is no workaround.
+
+**Affected Environment/Platform**:
+ Linux.
+
+**Driver/Module**:
+ ALL.
@@ -56,6 +56,12 @@ New Features
Also, make sure to start the actual text at the margin.
=========================================================
+* **EAL will now pick IOVA as VA mode as the default in most cases.**
+
+ Previously, the preferred default IOVA mode was IOVA as PA. IOVA mode
+ detection now also considers physical address availability and IOMMU
+ presence, and defaults to IOVA as VA in most cases.
+
* **Added MCS lock.**
MCS lock provides scalability by spinning on a CPU/thread local variable
@@ -436,6 +442,16 @@ Known Issues
=========================================================
+ * **Unsuitable IOVA mode may be picked as the default**
+
+ Not all kernel drivers and not all devices support all IOVA modes. EAL will
+ attempt to pick a reasonable default based on a number of factors, but
+ there may be cases where the default is unsuitable.
+
+ It is recommended to use the ``--iova-mode`` command-line parameter if the
+ default is not suitable.
+
+
Tested Platforms
----------------
@@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
if (iova_mode == RTE_IOVA_DC) {
- iova_mode = RTE_IOVA_VA;
- RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+ RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
+
+ if (!phys_addrs) {
+ /* if we have no access to physical addresses,
+ * pick IOVA as VA mode.
+ */
+ iova_mode = RTE_IOVA_VA;
+ RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+ } else if (vfio_iommu_enabled()) {
+ /* we have an IOMMU, pick IOVA as VA mode */
+ iova_mode = RTE_IOVA_VA;
+ RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
+ } else {
+ /* physical addresses available, and no IOMMU
+ * found, so pick IOVA as PA.
+ */
+ iova_mode = RTE_IOVA_PA;
+ RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
+ }
}
#ifdef RTE_LIBRTE_KNI
/* Workaround for KNI which requires physical address to work */
@@ -2,6 +2,7 @@
* Copyright(c) 2010-2018 Intel Corporation
*/
+#include <dirent.h>
#include <inttypes.h>
#include <string.h>
#include <fcntl.h>
@@ -19,6 +20,8 @@
#include "eal_vfio.h"
#include "eal_private.h"
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
#ifdef VFIO_PRESENT
#define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
@@ -2147,3 +2150,30 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
}
#endif /* VFIO_PRESENT */
+
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+ DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+ struct dirent *d;
+ int n = 0;
+
+ /* if directory doesn't exist, assume IOMMU is not enabled */
+ if (dir == NULL)
+ return 0;
+
+ while ((d = readdir(dir)) != NULL) {
+ /* "." and ".." are always present; a third entry means a group exists */
+ if (++n > 2)
+ break;
+ }
+ closedir(dir);
+
+ return n > 2;
+}
@@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
int vfio_mp_sync_setup(void);
+int vfio_iommu_enabled(void);
+
#define EAL_VFIO_MP "eal_vfio_mp_sync"
#define SOCKET_REQ_CONTAINER 0x100
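
For anyone who wants to try the default-selection logic outside of EAL, the
three-way choice described in the documentation hunk (no bus preference) can be
sketched as a standalone snippet. This is only an illustration, not the code
from the patch: the `sketch_*` names and the enum are hypothetical stand-ins,
and the directory check compares entry names explicitly instead of counting
entries as `vfio_iommu_enabled()` does.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the EAL enum; not the real rte_iova_mode. */
enum sketch_iova_mode { SKETCH_IOVA_PA, SKETCH_IOVA_VA };

/*
 * Return 1 if `path` is a directory holding at least one entry other than
 * "." and "..", and 0 otherwise (including when it cannot be opened).
 */
static int
sketch_dir_has_entries(const char *path)
{
	DIR *dir;
	struct dirent *d;
	int found = 0;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
		return 0;

	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") == 0 ||
				strcmp(d->d_name, "..") == 0)
			continue;
		found = 1;
		break;
	}
	closedir(dir);

	return found;
}

/* Default choice when no bus has expressed an IOVA mode preference. */
static enum sketch_iova_mode
sketch_default_iova_mode(int phys_addrs_available, int iommu_present)
{
	if (!phys_addrs_available)
		return SKETCH_IOVA_VA; /* PA cannot be used at all */
	if (iommu_present)
		return SKETCH_IOVA_VA; /* IOMMU groups are populated */
	return SKETCH_IOVA_PA; /* physical addresses, no IOMMU */
}
```

A caller would pass the result of
`sketch_dir_has_entries("/sys/kernel/iommu_groups")` as `iommu_present`; how
physical address availability is determined is left out of the sketch.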