eal: pick IOVA as PA if IOMMU is not available
Commit Message
When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.
We also assume that VFIO equals IOMMU, so if VFIO support is not
compiled in, we always assume IOMMU support is not available.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linux/eal/eal.c | 11 ++++++--
lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
lib/librte_eal/linux/eal/eal_vfio.h | 2 ++
3 files changed, 50 insertions(+), 2 deletions(-)
Comments
On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> We also assume that VFIO equals IOMMU, so if VFIO support is not
> compiled, we always assume IOMMU support is not available.
Not sure I agree with this statement.
What about unknown (from eal pov) kernel drivers?
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/linux/eal/eal.c | 11 ++++++--
> lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
> lib/librte_eal/linux/eal/eal_vfio.h | 2 ++
> 3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..584f97a96 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
>  		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>  		if (iova_mode == RTE_IOVA_DC) {
> -			iova_mode = RTE_IOVA_VA;
> -			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +			/* if we have an IOMMU, pick IOVA as VA mode */
> +			if (vfio_iommu_enabled()) {
> +				iova_mode = RTE_IOVA_VA;
> +				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
> +			} else {
> +				iova_mode = RTE_IOVA_PA;
> +				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
> +				RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
> +			}
Here, since the buses don't care, we can check for physical address
availability.
On 25-Jul-19 9:05 AM, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
>>
>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>> populated. This is happening since at least 3.6 when VFIO support
>> was added. If the directory is empty, EAL should not pick IOVA as
>> VA as the default IOVA mode.
>>
>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>> compiled, we always assume IOMMU support is not available.
>
> Not sure I agree with this statement.
> What about unknown (from eal pov) kernel drivers?
Are there any cases where we can use IOVA as VA mode without having VFIO
compiled?
>
>
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> lib/librte_eal/linux/eal/eal.c | 11 ++++++--
>> lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
>> lib/librte_eal/linux/eal/eal_vfio.h | 2 ++
>> 3 files changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
>> index 34db78753..584f97a96 100644
>> --- a/lib/librte_eal/linux/eal/eal.c
>> +++ b/lib/librte_eal/linux/eal/eal.c
>> @@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
>>  		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>>
>>  		if (iova_mode == RTE_IOVA_DC) {
>> -			iova_mode = RTE_IOVA_VA;
>> -			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
>> +			/* if we have an IOMMU, pick IOVA as VA mode */
>> +			if (vfio_iommu_enabled()) {
>> +				iova_mode = RTE_IOVA_VA;
>> +				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
>> +			} else {
>> +				iova_mode = RTE_IOVA_PA;
>> +				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
>> +				RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
>> +			}
>
> Here, since the buses don't care, we can check for physical address
> availability.
>
Good point: if physical addresses are not available, we can't use IOVA
as PA mode.
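To make the ordering discussed here concrete, below is a minimal standalone sketch of how the default could be picked once physical-address availability is taken into account. It is not the patch and not DPDK code: the enum, the function name `pick_default_iova_mode`, and both boolean inputs are hypothetical stand-ins, and falling back to VA when physical addresses are unavailable is an assumption drawn from the remark above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for DPDK's enum rte_iova_mode. */
enum iova_mode {
	IOVA_DC, /* "don't care": buses did not request a mode */
	IOVA_PA, /* IOVA as physical address */
	IOVA_VA  /* IOVA as virtual address */
};

/*
 * Hypothetical helper: choose the default IOVA mode.
 * - requested: what the buses asked for (IOVA_DC if nothing specific)
 * - iommu_present: result of some IOMMU detection check (assumed input)
 * - phys_addrs_available: whether physical addresses can be obtained
 *   (assumed input)
 */
static enum iova_mode
pick_default_iova_mode(enum iova_mode requested, bool iommu_present,
		bool phys_addrs_available)
{
	/* a bus expressed a preference, so there is nothing to decide */
	if (requested != IOVA_DC)
		return requested;

	/* without physical addresses, IOVA as PA cannot work at all */
	if (!phys_addrs_available)
		return IOVA_VA;

	/* both modes are possible: prefer VA only when an IOMMU is present */
	return iommu_present ? IOVA_VA : IOVA_PA;
}
```

With this ordering, IOVA as PA is only ever selected when physical addresses are actually obtainable, which is the case the review comment is concerned about.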
On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 25-Jul-19 9:05 AM, David Marchand wrote:
> > On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> > <anatoly.burakov@intel.com> wrote:
> >>
> >> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> >> populated. This is happening since at least 3.6 when VFIO support
> >> was added. If the directory is empty, EAL should not pick IOVA as
> >> VA as the default IOVA mode.
> >>
> >> We also assume that VFIO equals IOMMU, so if VFIO support is not
> >> compiled, we always assume IOMMU support is not available.
> >
> > Not sure I agree with this statement.
> > What about unknown (from eal pov) kernel drivers?
>
> Are there any cases where we can use IOVA as VA mode without having VFIO
> compiled?
If a pmd relies on a kernel driver we don't know in EAL.
This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
On 25-Jul-19 10:35 AM, David Marchand wrote:
> On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>>
>> On 25-Jul-19 9:05 AM, David Marchand wrote:
>>> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
>>> <anatoly.burakov@intel.com> wrote:
>>>>
>>>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>>>> populated. This is happening since at least 3.6 when VFIO support
>>>> was added. If the directory is empty, EAL should not pick IOVA as
>>>> VA as the default IOVA mode.
>>>>
>>>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>>>> compiled, we always assume IOMMU support is not available.
>>>
>>> Not sure I agree with this statement.
>>> What about unknown (from eal pov) kernel drivers?
>>
>> Are there any cases where we can use IOVA as VA mode without having VFIO
>> compiled?
>
> If a pmd relies on a kernel driver we don't know in EAL.
> This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
>
OK, I can drop that.
On 25-Jul-19 10:38 AM, Burakov, Anatoly wrote:
> On 25-Jul-19 10:35 AM, David Marchand wrote:
>> On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
>> <anatoly.burakov@intel.com> wrote:
>>>
>>> On 25-Jul-19 9:05 AM, David Marchand wrote:
>>>> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
>>>> <anatoly.burakov@intel.com> wrote:
>>>>>
>>>>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>>>>> populated. This is happening since at least 3.6 when VFIO support
>>>>> was added. If the directory is empty, EAL should not pick IOVA as
>>>>> VA as the default IOVA mode.
>>>>>
>>>>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>>>>> compiled, we always assume IOMMU support is not available.
>>>>
>>>> Not sure I agree with this statement.
>>>> What about unknown (from eal pov) kernel drivers?
>>>
>>> Are there any cases where we can use IOVA as VA mode without having VFIO
>>> compiled?
>>
>> If a pmd relies on a kernel driver we don't know in EAL.
>> This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
>>
>
> OK, i can drop that.
>
By the way, would kernel report IOMMU groups in that case? As in, would
/sys/kernel/iommu_groups be populated?
On Thu, Jul 25, 2019, at 10:05, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
> >
> > When IOMMU is not available, /sys/kernel/iommu_groups will not be
> > populated. This is happening since at least 3.6 when VFIO support
> > was added. If the directory is empty, EAL should not pick IOVA as
> > VA as the default IOVA mode.
> >
> > We also assume that VFIO equals IOMMU, so if VFIO support is not
> > compiled, we always assume IOMMU support is not available.
>
> Not sure I agree with this statement.
> What about unknown (from eal pov) kernel drivers?
Exactly, this is the case of Mellanox drivers.
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..584f97a96 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			/* if we have an IOMMU, pick IOVA as VA mode */
+			if (vfio_iommu_enabled()) {
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
+			} else {
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
+				RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2018 Intel Corporation
  */
 
+#include <dirent.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
@@ -23,6 +24,8 @@
 
 #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
 
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 /* hot plug/unplug of VFIO groups may cause all DMA maps to be dropped. we can
  * recreate the mappings for DPDK segments, but we cannot do so for memory that
  * was registered by the user themselves, so we need to store the user mappings
@@ -2026,6 +2029,33 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
 	return container_dma_unmap(vfio_cfg, vaddr, iova, len);
 }
 
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
+
 #else
 
 int
@@ -2146,4 +2176,13 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 	return -1;
 }
 
+/*
+ * VFIO not compiled, so IOMMU unsupported.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	return 0;
+}
+
 #endif /* VFIO_PRESENT */
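diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
--- a/lib/librte_eal/linux/eal/eal_vfio.h
+++ b/lib/librte_eal/linux/eal/eal_vfio.h
@@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_iommu_enabled(void);
+
 #define EAL_VFIO_MP "eal_vfio_mp_sync"
 
 #define SOCKET_REQ_CONTAINER 0x100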
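
For comparison with the entry-counting loop in vfio_iommu_enabled(), here is a minimal standalone sketch of the same "is the directory populated" check that skips "." and ".." by name rather than assuming they account for the first two entries. The helper name `dir_has_entries` is hypothetical and not part of the patch; the sketch only assumes POSIX opendir()/readdir()/closedir().

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if 'path' is a readable directory that
 * contains at least one entry other than "." and "..".
 */
static bool
dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	/* a missing or unreadable directory is treated as empty */
	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		/* skip dot and dot-dot explicitly, by name */
		if (strcmp(d->d_name, ".") == 0 ||
				strcmp(d->d_name, "..") == 0)
			continue;
		found = true;
		break;
	}
	closedir(dir);

	return found;
}
```

Called as dir_has_entries("/sys/kernel/iommu_groups"), this would answer the same question as the patch's check without depending on how many dot entries readdir() reports.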