eal: pick IOVA as PA if IOMMU is not available

Message ID 5d8f83fb7dd574d83a044c6a01e2613798f256c3.1563986790.git.anatoly.burakov@intel.com
State Superseded, archived
Series eal: pick IOVA as PA if IOMMU is not available

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-Compile-Testing success Compile Testing PASS
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Performance-Testing success Performance Testing PASS

Commit Message

Anatoly Burakov July 24, 2019, 4:46 p.m. UTC
When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

We also assume that VFIO equals IOMMU, so if VFIO support is not
compiled in, we always assume IOMMU support is not available.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linux/eal/eal.c      | 11 ++++++--
 lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
 lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
 3 files changed, 50 insertions(+), 2 deletions(-)
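The heuristic in the commit message (treat an empty or missing /sys/kernel/iommu_groups as "no IOMMU") can be sketched as a standalone helper. The name `dir_has_entries` is hypothetical and not part of EAL. Unlike the patch's `vfio_iommu_enabled()`, which counts entries and relies on "." and ".." accounting for two of them, this sketch skips those two entries by name.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `path` is a directory containing at
 * least one entry other than "." and "..".
 */
static bool dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	/* a missing directory is treated the same as an empty one */
	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		/* skip "." and ".." by name rather than by position */
		if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
			continue;
		found = true;
		break;
	}
	closedir(dir);
	return found;
}
```

On a host with the IOMMU enabled, `dir_has_entries("/sys/kernel/iommu_groups")` would be expected to return true; with the IOMMU disabled it would return false.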
  

Comments

David Marchand July 25, 2019, 8:05 a.m. UTC | #1
On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> We also assume that VFIO equals IOMMU, so if VFIO support is not
> compiled, we always assume IOMMU support is not available.

Not sure I agree with this statement.
What about kernel drivers that EAL does not know about?


>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/linux/eal/eal.c      | 11 ++++++--
>  lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
>  lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
>  3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..584f97a96 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
>                 enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>                 if (iova_mode == RTE_IOVA_DC) {
> -                       iova_mode = RTE_IOVA_VA;
> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +                       /* if we have an IOMMU, pick IOVA as VA mode */
> +                       if (vfio_iommu_enabled()) {
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
> +                       } else {
> +                               iova_mode = RTE_IOVA_PA;
> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
> +                               RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
> +                       }

Here, since the buses don't care, we can check for physical address
availability.
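Checking physical address availability can be sketched with /proc/self/pagemap: each 64-bit entry has a "page present" flag in bit 63 and the page frame number (PFN) in bits 0-54. Since Linux 4.0 only processes with CAP_SYS_ADMIN can read PFNs (unprivileged opens fail with EPERM on 4.0 and 4.1, and from 4.2 the PFN field reads as zero), so a zero PFN for a resident page means physical addresses are not usable. The helper name `phys_addrs_available` is hypothetical; this is an independent sketch, not EAL's own code.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* layout of a /proc/self/pagemap entry */
#define PAGEMAP_PAGE_PRESENT (1ULL << 63)
#define PAGEMAP_PFN_MASK     ((1ULL << 55) - 1)

/*
 * Hypothetical helper: returns true if this process can translate one of
 * its own resident virtual addresses into a non-zero PFN.
 */
static bool phys_addrs_available(void)
{
	volatile uint64_t probe = 0;
	uint64_t entry = 0;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	probe = 1; /* write to the variable so its page is resident */
	if (page_size <= 0)
		return false;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return false; /* e.g. EPERM for unprivileged users on 4.0/4.1 */

	/* one 8-byte entry per virtual page */
	offset = (off_t)((uintptr_t)&probe / (uintptr_t)page_size) *
		 (off_t)sizeof(entry);
	n = pread(fd, &entry, sizeof(entry), offset);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return false;

	/* a page that is not present has no PFN to report */
	if ((entry & PAGEMAP_PAGE_PRESENT) == 0)
		return false;

	/* since Linux 4.2 the PFN reads as zero without CAP_SYS_ADMIN */
	return (entry & PAGEMAP_PFN_MASK) != 0;
}
```

The result depends on the privileges of the running process, which is exactly why it has to be checked at runtime rather than assumed.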
  
Anatoly Burakov July 25, 2019, 9:31 a.m. UTC | #2
On 25-Jul-19 9:05 AM, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
>>
>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>> populated. This is happening since at least 3.6 when VFIO support
>> was added. If the directory is empty, EAL should not pick IOVA as
>> VA as the default IOVA mode.
>>
>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>> compiled, we always assume IOMMU support is not available.
> 
> Not sure I agree with this statement.
> What about unknown (from eal pov) kernel drivers?

Are there any cases where we can use IOVA as VA mode without having VFIO 
compiled?

> 
> 
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   lib/librte_eal/linux/eal/eal.c      | 11 ++++++--
>>   lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
>>   lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
>>   3 files changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
>> index 34db78753..584f97a96 100644
>> --- a/lib/librte_eal/linux/eal/eal.c
>> +++ b/lib/librte_eal/linux/eal/eal.c
>> @@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
>>                  enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>>
>>                  if (iova_mode == RTE_IOVA_DC) {
>> -                       iova_mode = RTE_IOVA_VA;
>> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
>> +                       /* if we have an IOMMU, pick IOVA as VA mode */
>> +                       if (vfio_iommu_enabled()) {
>> +                               iova_mode = RTE_IOVA_VA;
>> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
>> +                       } else {
>> +                               iova_mode = RTE_IOVA_PA;
>> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
>> +                               RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
>> +                       }
> 
> Here, since the buses don't care, we can check for physical address
> availability.
> 

Good point, if PA are not available, we can't use IOVA as PA mode.
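The points raised so far suggest a default-mode decision with three inputs: the buses' request, IOMMU availability (kept independent of whether VFIO is compiled in), and physical address availability. Below is a minimal sketch using a hypothetical enum in place of `enum rte_iova_mode`. When neither an IOMMU nor physical addresses are available, it keeps the pre-patch default of VA, since PA mode cannot work at all in that case; that fallback is an assumption, not something settled in this thread.

```c
#include <stdbool.h>

/* hypothetical stand-ins for the rte_iova_mode values used in the patch */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

static enum iova_mode
select_default_iova_mode(enum iova_mode bus_request, bool iommu_available,
			 bool phys_addrs_available)
{
	/* a bus that asked for a specific mode wins */
	if (bus_request != IOVA_DC)
		return bus_request;

	/* with an IOMMU, virtual addresses can be used as IOVA */
	if (iommu_available)
		return IOVA_VA;

	/* without an IOMMU, prefer PA, but only if PAs can be obtained */
	if (phys_addrs_available)
		return IOVA_PA;

	/* PA mode is unusable here, so keep the pre-patch default (assumption) */
	return IOVA_VA;
}
```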
  
David Marchand July 25, 2019, 9:35 a.m. UTC | #3
On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 25-Jul-19 9:05 AM, David Marchand wrote:
> > On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> > <anatoly.burakov@intel.com> wrote:
> >>
> >> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> >> populated. This is happening since at least 3.6 when VFIO support
> >> was added. If the directory is empty, EAL should not pick IOVA as
> >> VA as the default IOVA mode.
> >>
> >> We also assume that VFIO equals IOMMU, so if VFIO support is not
> >> compiled, we always assume IOMMU support is not available.
> >
> > Not sure I agree with this statement.
> > What about unknown (from eal pov) kernel drivers?
>
> Are there any cases where we can use IOVA as VA mode without having VFIO
> compiled?

It could happen if a PMD relies on a kernel driver that EAL does not know about.
As far as I know this is not the case today, but I'd prefer we don't mix VFIO and IOMMU.
  
Anatoly Burakov July 25, 2019, 9:38 a.m. UTC | #4
On 25-Jul-19 10:35 AM, David Marchand wrote:
> On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>>
>> On 25-Jul-19 9:05 AM, David Marchand wrote:
>>> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
>>> <anatoly.burakov@intel.com> wrote:
>>>>
>>>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>>>> populated. This is happening since at least 3.6 when VFIO support
>>>> was added. If the directory is empty, EAL should not pick IOVA as
>>>> VA as the default IOVA mode.
>>>>
>>>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>>>> compiled, we always assume IOMMU support is not available.
>>>
>>> Not sure I agree with this statement.
>>> What about unknown (from eal pov) kernel drivers?
>>
>> Are there any cases where we can use IOVA as VA mode without having VFIO
>> compiled?
> 
> If a pmd relies on a kernel driver we don't know in EAL.
> This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
> 

OK, I can drop that.
  
Anatoly Burakov July 25, 2019, 9:40 a.m. UTC | #5
On 25-Jul-19 10:38 AM, Burakov, Anatoly wrote:
> On 25-Jul-19 10:35 AM, David Marchand wrote:
>> On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
>> <anatoly.burakov@intel.com> wrote:
>>>
>>> On 25-Jul-19 9:05 AM, David Marchand wrote:
>>>> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
>>>> <anatoly.burakov@intel.com> wrote:
>>>>>
>>>>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>>>>> populated. This is happening since at least 3.6 when VFIO support
>>>>> was added. If the directory is empty, EAL should not pick IOVA as
>>>>> VA as the default IOVA mode.
>>>>>
>>>>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>>>>> compiled, we always assume IOMMU support is not available.
>>>>
>>>> Not sure I agree with this statement.
>>>> What about unknown (from eal pov) kernel drivers?
>>>
>>> Are there any cases where we can use IOVA as VA mode without having VFIO
>>> compiled?
>>
>> If a pmd relies on a kernel driver we don't know in EAL.
>> This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
>>
> 
> OK, i can drop that.
> 

By the way, would the kernel report IOMMU groups in that case? That is, would
/sys/kernel/iommu_groups be populated?
  
Thomas Monjalon July 25, 2019, 6:58 p.m. UTC | #6
On Thu, Jul 25, 2019 at 10:05, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
> >
> > When IOMMU is not available, /sys/kernel/iommu_groups will not be
> > populated. This is happening since at least 3.6 when VFIO support
> > was added. If the directory is empty, EAL should not pick IOVA as
> > VA as the default IOVA mode.
> >
> > We also assume that VFIO equals IOMMU, so if VFIO support is not
> > compiled, we always assume IOMMU support is not available.
> 
> Not sure I agree with this statement.
> What about unknown (from eal pov) kernel drivers?

Exactly, this is the case for the Mellanox drivers.
  

Patch

diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..584f97a96 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1061,8 +1061,15 @@  rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			/* if we have an IOMMU, pick IOVA as VA mode */
+			if (vfio_iommu_enabled()) {
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
+			} else {
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
+				RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..6d5ca7903 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -2,6 +2,7 @@ 
  * Copyright(c) 2010-2018 Intel Corporation
  */
 
+#include <dirent.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
@@ -23,6 +24,8 @@ 
 
 #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
 
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 /* hot plug/unplug of VFIO groups may cause all DMA maps to be dropped. we can
  * recreate the mappings for DPDK segments, but we cannot do so for memory that
  * was registered by the user themselves, so we need to store the user mappings
@@ -2026,6 +2029,33 @@  rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
 	return container_dma_unmap(vfio_cfg, vaddr, iova, len);
 }
 
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* count entries; "." and ".." are always present */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
+
 #else
 
 int
@@ -2146,4 +2176,13 @@  rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 	return -1;
 }
 
+/*
+ * VFIO not compiled, so IOMMU unsupported.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	return 0;
+}
+
 #endif /* VFIO_PRESENT */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
index cb2d35fb1..58c7a7309 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.h
+++ b/lib/librte_eal/linux/eal/eal_vfio.h
@@ -133,6 +133,8 @@  vfio_has_supported_extensions(int vfio_container_fd);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_iommu_enabled(void);
+
 #define EAL_VFIO_MP "eal_vfio_mp_sync"
 
 #define SOCKET_REQ_CONTAINER 0x100