[dpdk-dev,v4] vfio: fix sPAPR IOMMU DMA window size

Message ID 1502191002-13988-1-git-send-email-jpf@zurich.ibm.com (mailing list archive)
State Accepted, archived

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Jonas Pfefferle1 Aug. 8, 2017, 11:16 a.m. UTC
The DMA window size needs to be big enough to span the physical
addresses of all memory segments. We do not need multiple levels of
IOMMU tables, as a single level already spans ~70TB of physical memory
with 16MB hugepages.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
---
v2:
* round up to the next power of 2 without a loop.

v3:
* Replace roundup_next_pow2 with rte_align64pow2

v4:
* do not assume ordering of physical addresses of memsegs

 lib/librte_eal/linuxapp/eal/eal_vfio.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
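
The difference between the old and the new window size calculation can be sketched in isolation. This is a minimal illustration, not the code from the patch: `struct seg`, `total_len()`, `window_size()`, `align64pow2()` and `page_shift()` are hypothetical stand-ins (the patch itself uses `struct rte_memseg`, `rte_eal_get_physmem_size()` and `rte_align64pow2()`), and the addresses in the usage note are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_memseg fields the patch reads. */
struct seg {
	uint64_t phys_addr;
	uint64_t len;
};

/* Round up to the next power of two; a power of two is returned as-is. */
static uint64_t
align64pow2(uint64_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/* Old approach: the sum of segment lengths, ignoring where they live. */
static uint64_t
total_len(const struct seg *s, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += s[i].len;
	return sum;
}

/*
 * New approach: the window runs from 0 to the highest segment end,
 * whatever order the segments are listed in, rounded up to a power of two.
 */
static uint64_t
window_size(const struct seg *s, size_t n)
{
	uint64_t end = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t seg_end = s[i].phys_addr + s[i].len;

		if (seg_end > end)
			end = seg_end;
	}
	return align64pow2(end);
}

/* IOMMU page shift derived from a power-of-two hugepage size. */
static unsigned int
page_shift(uint64_t hugepage_sz)
{
	return (unsigned int)__builtin_ctzll(hugepage_sz);
}
```

For two 16MB segments at the made-up physical addresses 0x2000000000 and 0x1000000, `total_len()` gives 32MB, which does not reach the first segment at all, while `window_size()` gives 0x4000000000 (256GB). With a page shift of 24, and assuming one table entry per IOMMU page, that window needs 0x4000000000 >> 24 = 16384 entries.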
  

Comments

Anatoly Burakov Aug. 8, 2017, 12:06 p.m. UTC | #1
> From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> Sent: Tuesday, August 8, 2017 12:17 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>
> Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle <jpf@zurich.ibm.com>
> Subject: [PATCH v4] vfio: fix sPAPR IOMMU DMA window size
> 
> DMA window size needs to be big enough to span all memory segment's
> physical addresses. We do not need multiple levels of IOMMU tables as we
> already span ~70TB of physical memory with 16MB hugepages.
> 
> Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
> ---
> v2:
> * roundup to next power 2 function without loop.
> 
> v3:
> * Replace roundup_next_pow2 with rte_align64pow2
> 
> v4:
> * do not assume ordering of physical addresses of memsegs
> 
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e..7d5d61d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -759,10 +759,19 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  		return -1;
>  	}
> 
> -	/* calculate window size based on number of hugepages configured */
> -	create.window_size = rte_eal_get_physmem_size();
> +	/* create DMA window from 0 to max(phys_addr + len) */
> +	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> +		if (ms[i].addr == NULL)
> +			break;
> +
> +		create.window_size = RTE_MAX(create.window_size,
> +				ms[i].phys_addr + ms[i].len);
> +	}
> +
> +	/* sPAPR requires window size to be a power of 2 */
> +	create.window_size = rte_align64pow2(create.window_size);
>  	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> -	create.levels = 2;
> +	create.levels = 1;
> 
>  	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
>  	if (ret) {
> @@ -771,6 +780,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  		return -1;
>  	}
> 
> +	if (create.start_addr != 0) {
> +		RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> +		return -1;
> +	}
> +
>  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>  		struct vfio_iommu_type1_dma_map dma_map;
> --
> 2.7.4

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Thanks,
Anatoly
  
Alexey Kardashevskiy Aug. 10, 2017, 6:03 a.m. UTC | #2
On 08/08/17 21:16, Jonas Pfefferle wrote:
> DMA window size needs to be big enough to span all memory segment's
> physical addresses. We do not need multiple levels of IOMMU tables
> as we already span ~70TB of physical memory with 16MB hugepages.
> 
> Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
> v2:
> * roundup to next power 2 function without loop.
> 
> v3:
> * Replace roundup_next_pow2 with rte_align64pow2
> 
> v4:
> * do not assume ordering of physical addresses of memsegs
> 
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e..7d5d61d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -759,10 +759,19 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  		return -1;
>  	}
>  
> -	/* calculate window size based on number of hugepages configured */
> -	create.window_size = rte_eal_get_physmem_size();
> +	/* create DMA window from 0 to max(phys_addr + len) */
> +	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> +		if (ms[i].addr == NULL)
> +			break;
> +
> +		create.window_size = RTE_MAX(create.window_size,
> +				ms[i].phys_addr + ms[i].len);
> +	}
> +
> +	/* sPAPR requires window size to be a power of 2 */
> +	create.window_size = rte_align64pow2(create.window_size);
>  	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> -	create.levels = 2;
> +	create.levels = 1;
>  
>  	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
>  	if (ret) {
> @@ -771,6 +780,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  		return -1;
>  	}
>  
> +	if (create.start_addr != 0) {
> +		RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> +		return -1;
> +	}
> +
>  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>  		struct vfio_iommu_type1_dma_map dma_map;
>
  
Jonas Pfefferle1 Oct. 10, 2017, 8:02 a.m. UTC | #3
Hi Thomas,

Can you please apply this patch?

Thanks,
Jonas

  
Thomas Monjalon Oct. 10, 2017, 1:33 p.m. UTC | #4
08/08/2017 14:06, Burakov, Anatoly:
> > From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> > Sent: Tuesday, August 8, 2017 12:17 PM
> > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle <jpf@zurich.ibm.com>
> > Subject: [PATCH v4] vfio: fix sPAPR IOMMU DMA window size
> > 
> > DMA window size needs to be big enough to span all memory segment's
> > physical addresses. We do not need multiple levels of IOMMU tables as we
> > already span ~70TB of physical memory with 16MB hugepages.
> > 
> > Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
> 
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e..7d5d61d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -759,10 +759,19 @@  vfio_spapr_dma_map(int vfio_container_fd)
 		return -1;
 	}
 
-	/* calculate window size based on number of hugepages configured */
-	create.window_size = rte_eal_get_physmem_size();
+	/* create DMA window from 0 to max(phys_addr + len) */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		if (ms[i].addr == NULL)
+			break;
+
+		create.window_size = RTE_MAX(create.window_size,
+				ms[i].phys_addr + ms[i].len);
+	}
+
+	/* sPAPR requires window size to be a power of 2 */
+	create.window_size = rte_align64pow2(create.window_size);
 	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
-	create.levels = 2;
+	create.levels = 1;
 
 	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
 	if (ret) {
@@ -771,6 +780,11 @@  vfio_spapr_dma_map(int vfio_container_fd)
 		return -1;
 	}
 
+	if (create.start_addr != 0) {
+		RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
+		return -1;
+	}
+
 	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
 	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
 		struct vfio_iommu_type1_dma_map dma_map;
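
The new `start_addr` check follows from the 1:1 PA to IOVA mapping: each segment is mapped at an IOVA equal to its physical address, so the window is sized on the assumption that it begins at 0. A minimal sketch of that coverage condition is shown here. It is not part of the patch, and `struct seg` and `seg_in_window()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_memseg fields used for DMA mapping. */
struct seg {
	uint64_t phys_addr;
	uint64_t len;
};

/*
 * True if [phys_addr, phys_addr + len) lies inside [start, start + size).
 * Written with subtractions so that no intermediate sum can overflow.
 */
static bool
seg_in_window(const struct seg *s, uint64_t start, uint64_t size)
{
	return s->phys_addr >= start &&
	       s->len <= size &&
	       s->phys_addr - start <= size - s->len;
}
```

With a window of size max(phys_addr + len) rounded up to a power of two, every segment passes this test when `start` is 0. If the kernel returned a non-zero start address, a segment located below it would fail the test, which is the situation the patch treats as an error.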