[dpdk-dev,v9,3/9] linuxapp/eal_pci: get iommu class

Message ID 20170920112356.17629-4-santosh.shukla@caviumnetworks.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Santosh Shukla Sept. 20, 2017, 11:23 a.m. UTC
  Get iommu class of PCI device on the bus and returns preferred iova
mapping mode for that bus.

Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
Flag used when driver needs to operate in iova=va mode.

Algorithm for iova scheme selection for PCI bus:
0. If no device bound then return with RTE_IOVA_DC mapping mode,
else goto 1).
1. Look for device attached to vfio kdrv and has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to UIO class of driver.
3. Check for vfio-noiommu mode enabled.

If 2) & 3) is false and 1) is true then select
mapping scheme as RTE_IOVA_VA. Otherwise use default
mapping scheme (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v7 --> v8:
- Replaced 0/1 with false/true boolean value (Suggested by Anatoly)

v6 --> v7:
- squashed v6 series patch no [01/12] & [05/12]..
    i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron comment).

 lib/librte_eal/common/include/rte_pci.h |  2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 89 ++++++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.c  | 19 +++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h  |  4 ++
 4 files changed, 113 insertions(+), 1 deletion(-)
  

Comments

Thomas Monjalon Oct. 6, 2017, 12:17 a.m. UTC | #1
20/09/2017 13:23, Santosh Shukla:
> +/** Device driver supports iova as va */
> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040

This flag name is surprizing and the comment does not help.
For the comment:
	"Device driver supports I/O virtual addressing" ?
For the flag:
	RTE_PCI_DRV_IOVA ?

[...]
>  /*
> - * Get iommu class of pci devices on the bus.

This line has been added in previous patch.
Please fix it earlier.

[...]
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)

The name of this function does not suggest that it scans
every devices.

> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}

This is the reason of exporting the match function?
(note: match() is bus driver function, so it should not be exported)
Just because you get every devices without driver filtering?
There should be a better solution.
Please try to compare drv with dev->driver.
  
Santosh Shukla Oct. 6, 2017, 3:22 a.m. UTC | #2
On Friday 06 October 2017 05:47 AM, Thomas Monjalon wrote:
> 20/09/2017 13:23, Santosh Shukla:
>> +/** Device driver supports iova as va */
>> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
> This flag name is surprizing and the comment does not help.
> For the comment:
> 	"Device driver supports I/O virtual addressing" ?
> For the flag:
> 	RTE_PCI_DRV_IOVA ?

Read [1].

V9 series went through evolution as a result of thorough review process.
That name kept like above is - "Not for FUN", its for reason and its purpose
to be explicit by saying that "driver need iova as va" mode. So as comment
aligned on top says so.

Aron suggested to remove [1] and squash into this patch and that I did.

Your proposition is incorrect, Should says IOVA_AS_VA explicitly!.
Request to follow work history, sorry I agains can't find you comment 
logical.

[1] http://dpdk.org/dev/patchwork/patch/27000/

> [...]
>>  /*
>> - * Get iommu class of pci devices on the bus.
> This line has been added in previous patch.
> Please fix it earlier.

What to fix? Be more explicit, can;t understand your comment.

> [...]
>> +/*
>> + * Any one of the device has iova as va
>> + */
>> +static inline int
>> +pci_device_has_iova_va(void)
> The name of this function does not suggest that it scans
> every devices.

Its not scanning, It search for kdrv match. You misunderstood.
disagree.

>> +{
>> +	struct rte_pci_device *dev = NULL;
>> +	struct rte_pci_driver *drv = NULL;
>> +
>> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
>> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +				if (dev->kdrv == RTE_KDRV_VFIO &&
>> +				    rte_pci_match(drv, dev))
>> +					return 1;
>> +			}
> This is the reason of exporting the match function?
> (note: match() is bus driver function, so it should not be exported)
> Just because you get every devices without driver filtering?

I disagree, It is a bus function abstraction code w.r.t iommu class of device, 
in case you missed reading source code and Implementation is correct.
That needs exporting rte_pci_match(). Or else
write code and show your code snippet as illustration, I doubt that you really
understood this whole topic and its design theme.

Thanks.

> There should be a better solution.
> Please try to compare drv with dev->driver.
>
  
Thomas Monjalon Oct. 6, 2017, 7:56 a.m. UTC | #3
06/10/2017 05:22, santosh:
> 
> On Friday 06 October 2017 05:47 AM, Thomas Monjalon wrote:
> > 20/09/2017 13:23, Santosh Shukla:
> >> +/** Device driver supports iova as va */
> >> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
> > This flag name is surprizing and the comment does not help.
> > For the comment:
> > 	"Device driver supports I/O virtual addressing" ?
> > For the flag:
> > 	RTE_PCI_DRV_IOVA ?
> 
> Read [1].
> 
> V9 series went through evolution as a result of thorough review process.
> That name kept like above is - "Not for FUN", its for reason and its purpose
> to be explicit by saying that "driver need iova as va" mode. So as comment
> aligned on top says so.
> 
> Aron suggested to remove [1] and squash into this patch and that I did.
> 
> Your proposition is incorrect, Should says IOVA_AS_VA explicitly!.
> Request to follow work history, sorry I agains can't find you comment 
> logical.

Yes my proposal is not good.

> [1] http://dpdk.org/dev/patchwork/patch/27000/
> 
> > [...]
> >>  /*
> >> - * Get iommu class of pci devices on the bus.
> > This line has been added in previous patch.
> > Please fix it earlier.
> 
> What to fix? Be more explicit, can;t understand your comment.

You make this change:
- * Get iommu class of pci devices on the bus.
+ * Get iommu class of PCI devices on the bus.

It is better to write squash this uppercase change in
previous commit where you introduce this comment.

> > [...]
> >> +/*
> >> + * Any one of the device has iova as va
> >> + */
> >> +static inline int
> >> +pci_device_has_iova_va(void)
> > The name of this function does not suggest that it scans
> > every devices.
> 
> Its not scanning, It search for kdrv match. You misunderstood.
> disagree.

Yes my wording is not understandable.
By "scanning", I mean interating on lists.

About the function name, it could be:
	pci_one_device_has_iova_va
It better shows that the function check every devices.

> >> +{
> >> +	struct rte_pci_device *dev = NULL;
> >> +	struct rte_pci_driver *drv = NULL;
> >> +
> >> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> >> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> >> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> >> +				    rte_pci_match(drv, dev))
> >> +					return 1;
> >> +			}
> > This is the reason of exporting the match function?
> > (note: match() is bus driver function, so it should not be exported)
> > Just because you get every devices without driver filtering?
> 
> I disagree, It is a bus function abstraction code w.r.t iommu class of device, 
> in case you missed reading source code and Implementation is correct.
> That needs exporting rte_pci_match(). Or else
> write code and show your code snippet as illustration, I doubt that you really
> understood this whole topic and its design theme.

OK, let's imagine I don't understand the whole topic.

> > There should be a better solution.
> > Please try to compare drv with dev->driver.

You could have answered that dev->driver is filled on probing
and you are doing the check before probing.

I don't want to continue this discussion.
We will rework which functions are exported when moving the PCI driver
out of EAL.
  

Patch

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 0e36de093..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@  struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
 
 /**
  * A structure describing a PCI mapping.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 26f2be822..2971f1d4f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@ 
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,11 +489,97 @@  rte_pci_scan(void)
 }
 
 /*
- * Get iommu class of pci devices on the bus.
+ * Is pci device bound to any kdrv
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Any one of the device bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Any one of the device has iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
  */
 enum rte_iova_mode
 rte_pci_get_iommu_class(void)
 {
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == true ?
+					true : false;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+	}
+
 	return RTE_IOVA_PA;
 }
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@  vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt __rte_unused;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@  struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@  int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300