[dpdk-dev,v2] eal: sPAPR IOMMU support in pci probing for vfio-pci in ppc64le

Message ID 89825b7a9e0758dea19a01eb347c0753bf2c4134.1488480000.git.gowrishankar.m@linux.vnet.ibm.com (mailing list archive)
State Superseded, archived
Checks

Context               Check    Description
ci/checkpatch         warning  coding style issues
ci/Intel-compilation  success  Compilation OK

Commit Message

Gowrishankar March 3, 2017, 3:45 a.m. UTC
  From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>

The changes below add PCI probing support for vfio-pci devices on POWER8, using the sPAPR IOMMU.

Changes:
v2 - kernel version checked and doc updated

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
---
 doc/guides/rel_notes/release_17_05.rst |  4 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 90 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  6 +++
 3 files changed, 100 insertions(+)
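The sPAPR path in this patch removes the default 32-bit DMA window, creates a new window of rte_eal_get_physmem_size() bytes, and then maps every hugepage segment with IOVA equal to its physical address. With a 1:1 mapping, a window that starts at IOVA 0 has to reach the highest physical address in use, which can be much larger than the total hugepage size; the powernv TCE code is also understood to require a power-of-two window size (an assumption about the kernel side, worth verifying). A minimal sketch of that sizing rule, using a hypothetical `struct seg` in place of `struct rte_memseg` and assuming the new window starts at 0 (the kernel reports the real start in `create.start_addr`):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_memseg. */
struct seg {
	uint64_t phys_addr;
	uint64_t len;
};

/* Round v (v > 0) up to the next power of two. */
static uint64_t align64pow2(uint64_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/*
 * Size a DMA window that starts at IOVA 0 and must contain every
 * segment mapped with IOVA == physical address. Returns 0 if there
 * are no segments.
 */
static uint64_t spapr_window_size(const struct seg *segs, size_t n)
{
	uint64_t end = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t seg_end = segs[i].phys_addr + segs[i].len;

		if (seg_end > end)
			end = seg_end;
	}
	return end ? align64pow2(end) : 0;
}
```

For two 16 MB segments at 0x10000000 and 0x200000000, the total size is only 32 MB, but the window has to be 0x400000000 bytes to cover the second segment. DPDK's `rte_common.h` already provides `rte_align64pow2()` for the rounding step.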
  

Comments

Anatoly Burakov March 3, 2017, 9:08 a.m. UTC | #1
Hi Muthukrishnan,

> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> 
> The changes below add PCI probing support for vfio-pci devices on POWER8, using the sPAPR IOMMU.
> 
> Changes:
> v2 - kernel version checked and doc updated
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> ---
>  doc/guides/rel_notes/release_17_05.rst |  4 ++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 90 ++++++++++++++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.h |  6 +++
>  3 files changed, 100 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> index e25ea9f..4b90036 100644
> --- a/doc/guides/rel_notes/release_17_05.rst
> +++ b/doc/guides/rel_notes/release_17_05.rst
> @@ -42,6 +42,10 @@ New Features
>       =========================================================
> 
> 
> +* **Added powerpc support in pci probing for vfio-pci devices.**
> +
> +  sPAPR IOMMU based pci probing enabled for vfio-pci devices.
> +
>  Resolved Issues
>  ---------------
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 702f7a2..9377a66 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -50,12 +50,15 @@
>  static struct vfio_config vfio_cfg;
> 
>  static int vfio_type1_dma_map(int);
> +static int vfio_spapr_dma_map(int);
>  static int vfio_noiommu_dma_map(int);
> 
>  /* IOMMU types we support */
>  static const struct vfio_iommu_type iommu_types[] = {
>  	/* x86 IOMMU, otherwise known as type 1 */
>  	{ RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
> +	/* ppc64 IOMMU, otherwise known as spapr */
> +	{ RTE_VFIO_SPAPR, "sPAPR", &vfio_spapr_dma_map},
>  	/* IOMMU-less mode */
>  	{ RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
>  };
> @@ -540,6 +543,93 @@ int vfio_setup_device(const char *sysfs_base, const char *dev_addr,
>  }
> 
>  static int
> +vfio_spapr_dma_map(int vfio_container_fd)
> +{
> +	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> +	int i, ret;
> +
> +	struct vfio_iommu_spapr_register_memory reg = {
> +		.argsz = sizeof(reg),
> +		.flags = 0
> +	};
> +	struct vfio_iommu_spapr_tce_info info = {
> +		.argsz = sizeof(info),
> +	};
> +	struct vfio_iommu_spapr_tce_create create = {
> +		.argsz = sizeof(create),
> +	};
> +	struct vfio_iommu_spapr_tce_remove remove = {
> +		.argsz = sizeof(remove),
> +	};
> +
> +	/* query spapr iommu info */
> +	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO,

Please correct me if I'm wrong here, but wouldn't all of these SPAPR-specific defines
and structures be unavailable before 4.2? If so, the kernel check should cover
all the definitions and structs as well. Maybe it's better to just not compile
SPAPR support on older kernels, rather than duplicating all the VFIO code.

Any opinions?
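The "just don't compile it" option could be keyed on the uapi header rather than on the kernel version, so that a distro header with backported sPAPR v2 support still gets it. A minimal sketch, assuming the table layout shown in the patch; the callback bodies are hypothetical stubs, and in the real `eal_vfio.c` the `vfio_spapr_dma_map()` definition itself would need the same guard because it uses the sPAPR structs:

```c
#include <linux/vfio.h>

/* Hypothetical stubs standing in for the per-type DMA map callbacks. */
static int type1_dma_map(int fd) { (void)fd; return 0; }
#ifdef VFIO_SPAPR_TCE_v2_IOMMU
static int spapr_dma_map(int fd) { (void)fd; return 0; }
#endif
static int noiommu_dma_map(int fd) { (void)fd; return 0; }

struct iommu_type {
	int type_id;
	const char *name;
	int (*dma_map)(int);
};

static const struct iommu_type iommu_types[] = {
	{ VFIO_TYPE1_IOMMU, "Type 1", &type1_dma_map },
#ifdef VFIO_SPAPR_TCE_v2_IOMMU
	/* compiled only when the installed header knows the sPAPR v2 IOMMU */
	{ VFIO_SPAPR_TCE_v2_IOMMU, "sPAPR", &spapr_dma_map },
#endif
	/* 8 is the kernel ABI value of VFIO_NOIOMMU_IOMMU (Linux 4.5+) */
	{ 8, "No-IOMMU", &noiommu_dma_map },
};
```

The trade-off is that a binary built against old headers never offers sPAPR, even when it later runs on a kernel that supports it.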
  
Gowrishankar March 3, 2017, 12:31 p.m. UTC | #2
Hi Anatoly,


On Friday 03 March 2017 02:38 PM, Burakov, Anatoly wrote:
>
> Please correct me if I'm wrong here, but wouldn't all of these SPAPR-specific defines
> and structures be unavailable before 4.2? If so, the kernel check should cover
> all the definitions and structs as well. Maybe it's better to just not compile
> SPAPR support on older kernels, rather than duplicating all the VFIO code.
>
> Any opinions?
>
>
Thanks for this check.

As far as I can trace it in mainline Linux, it was merged in 4.2.
For older kernels, though, it depends on the distro: some distros may have
backported it, e.g. linux-3.10.0-514.6.2.el7 in RHEL 7.2 supports it.
So, IMO, we can only find out whether it is supported at run time
(at least without some autoconf-style mechanism in DPDK).
Any thoughts?

Regards,
Gowrishankar
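Run-time detection is possible through VFIO itself: the container's `VFIO_CHECK_EXTENSION` ioctl takes an IOMMU type ID and reports whether the running kernel supports it, independent of which headers were used at build time. A minimal sketch, assuming the container is opened at a caller-supplied path (normally `/dev/vfio/vfio`); `spapr_v2_supported()` and `SKETCH_SPAPR_V2_TYPE` are hypothetical names:

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Kernel ABI value of VFIO_SPAPR_TCE_v2_IOMMU; hard-coded so this
 * builds even against headers that predate Linux 4.2. */
#define SKETCH_SPAPR_V2_TYPE 7

/*
 * Returns 1 if the running kernel's VFIO supports the sPAPR v2 IOMMU,
 * 0 if it does not (ioctl errors are treated as "not supported"), and
 * -1 if the container cannot be opened or reports an unexpected API
 * version.
 */
static int spapr_v2_supported(const char *container_path)
{
	int fd, ret;

	fd = open(container_path, O_RDWR);
	if (fd < 0)
		return -1;

	if (ioctl(fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
		close(fd);
		return -1;
	}

	/* > 0: supported; 0: type unknown to this kernel */
	ret = ioctl(fd, VFIO_CHECK_EXTENSION, SKETCH_SPAPR_V2_TYPE);
	close(fd);
	return ret > 0 ? 1 : 0;
}
```

This only answers the run-time question; the code that uses the sPAPR ioctls still needs their definitions at compile time.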
  
Anatoly Burakov March 3, 2017, 12:55 p.m. UTC | #3
Hi Gowrishankar,

> Hi Anatoly,
> 
> 
> On Friday 03 March 2017 02:38 PM, Burakov, Anatoly wrote:
> >
> > Please correct me if I'm wrong here, but wouldn't all of these
> > SPAPR-specific defines and structures be unavailable before 4.2? If so,
> > the kernel check should cover all the definitions and structs
> > as well. Maybe it's better to just not compile SPAPR support on older
> > kernels, rather than duplicating all the VFIO code.
> >
> > Any opinions?
> >
> >
> Thanks for this check.
> 
> As far as I can trace it in mainline Linux, it was merged in 4.2.
> For older kernels, though, it depends on the distro: some distros may have
> backported it, e.g. linux-3.10.0-514.6.2.el7 in RHEL 7.2 supports it.
> So, IMO, we can only find out whether it is supported at run time
> (at least without some autoconf-style mechanism in DPDK).
> Any thoughts?
> 
> Regards,
> Gowrishankar

I guess the best way to go would be something like:

#ifndef SPAPR_IOMMU_TYPE
#define RTE_SPAPR_IOMMU 6
#define SPAPR_MAP_FOO bar
struct spapr_foo {};
struct spapr_bar {};
#else
#define RTE_SPAPR_IOMMU SPAPR_IOMMU_TYPE
#endif

Even though it's a bit messy, this way we won't depend on the kernel version or on whether a distro has backported SPAPR support. Does that sound reasonable?

Thanks,
Anatoly
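A concrete version of the fallback-definition approach suggested above might look like the sketch below. It tests `VFIO_SPAPR_TCE_v2_IOMMU` from the installed `<linux/vfio.h>` instead of `LINUX_VERSION_CODE`, so backported headers are used as-is. The struct layouts and ioctl numbers are written from memory of the upstream uapi header (Linux 4.2+) and should be checked against the real header before use:

```c
#include <stdint.h>
#include <linux/vfio.h>

#ifndef VFIO_SPAPR_TCE_v2_IOMMU
#define RTE_VFIO_SPAPR 7	/* kernel ABI value of VFIO_SPAPR_TCE_v2_IOMMU */
#define VFIO_IOMMU_SPAPR_REGISTER_MEMORY _IO(VFIO_TYPE, VFIO_BASE + 17)
#define VFIO_IOMMU_SPAPR_TCE_CREATE _IO(VFIO_TYPE, VFIO_BASE + 19)
#define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20)

struct vfio_iommu_spapr_register_memory {
	uint32_t argsz;
	uint32_t flags;
	uint64_t vaddr;
	uint64_t size;
};

struct vfio_iommu_spapr_tce_create {
	uint32_t argsz;
	uint32_t flags;
	uint32_t page_shift;	/* in: IOMMU page size as a shift */
	uint32_t __resv1;
	uint64_t window_size;	/* in */
	uint32_t levels;	/* in: number of TCE table levels */
	uint32_t __resv2;
	uint64_t start_addr;	/* out: bus address of the new window */
};

struct vfio_iommu_spapr_tce_remove {
	uint32_t argsz;
	uint32_t flags;
	uint64_t start_addr;	/* in */
};

/* Headers lacking even the v1 sPAPR IOMMU also lack GET_INFO. */
#ifndef VFIO_SPAPR_TCE_IOMMU
#define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
struct vfio_iommu_spapr_tce_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t dma32_window_start;
	uint32_t dma32_window_size;
};
#endif
#else
#define RTE_VFIO_SPAPR VFIO_SPAPR_TCE_v2_IOMMU
#endif
```

Because every sPAPR request carries `argsz`, the shorter v1 `vfio_iommu_spapr_tce_info` is expected to remain acceptable to newer kernels that extended the struct.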
  

Patch

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..4b90036 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -42,6 +42,10 @@  New Features
      =========================================================
 
 
+* **Added powerpc support in pci probing for vfio-pci devices.**
+
+  sPAPR IOMMU based pci probing enabled for vfio-pci devices.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 702f7a2..9377a66 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -50,12 +50,15 @@ 
 static struct vfio_config vfio_cfg;
 
 static int vfio_type1_dma_map(int);
+static int vfio_spapr_dma_map(int);
 static int vfio_noiommu_dma_map(int);
 
 /* IOMMU types we support */
 static const struct vfio_iommu_type iommu_types[] = {
 	/* x86 IOMMU, otherwise known as type 1 */
 	{ RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
+	/* ppc64 IOMMU, otherwise known as spapr */
+	{ RTE_VFIO_SPAPR, "sPAPR", &vfio_spapr_dma_map},
 	/* IOMMU-less mode */
 	{ RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
 };
@@ -540,6 +543,93 @@  int vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 }
 
 static int
+vfio_spapr_dma_map(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	struct vfio_iommu_spapr_register_memory reg = {
+		.argsz = sizeof(reg),
+		.flags = 0
+	};
+	struct vfio_iommu_spapr_tce_info info = {
+		.argsz = sizeof(info),
+	};
+	struct vfio_iommu_spapr_tce_create create = {
+		.argsz = sizeof(create),
+	};
+	struct vfio_iommu_spapr_tce_remove remove = {
+		.argsz = sizeof(remove),
+	};
+
+	/* query spapr iommu info */
+	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot get iommu info, "
+				"error %i (%s)\n", errno, strerror(errno));
+		return -1;
+	}
+
+	/* remove default DMA of 32 bit window */
+	remove.start_addr = info.dma32_window_start;
+	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot remove default DMA window, "
+				"error %i (%s)\n", errno, strerror(errno));
+		return -1;
+	}
+
+	/* calculate window size based on number of hugepages configured */
+	create.window_size = rte_eal_get_physmem_size();
+	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
+	create.levels = 2;
+
+	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot create new DMA window, "
+				"error %i (%s)\n", errno, strerror(errno));
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		reg.vaddr = (uintptr_t) ms[i].addr;
+		reg.size = ms[i].len;
+		ret = ioctl(vfio_container_fd,
+			VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot register vaddr for IOMMU, "
+				"error %i (%s)\n", errno, strerror(errno));
+			return -1;
+		}
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
+				 VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+				"error %i (%s)\n", errno, strerror(errno));
+			return -1;
+		}
+
+	}
+
+	return 0;
+}
+
+static int
 vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 {
 	/* No-IOMMU mode does not need DMA mapping */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 29f7f3e..1dafe57 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -54,6 +54,12 @@ 
 
 #define RTE_VFIO_TYPE1 VFIO_TYPE1_IOMMU
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0)
+#define RTE_VFIO_SPAPR 7
+#else
+#define RTE_VFIO_SPAPR VFIO_SPAPR_TCE_v2_IOMMU
+#endif
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 5, 0)
 #define RTE_VFIO_NOIOMMU 8
 #else