[dpdk-dev] [PATCH v7 2/5] vfio: add multi container support

Wang, Xiao W xiao.w.wang at intel.com
Mon Apr 16 14:44:13 CEST 2018


Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Monday, April 16, 2018 6:03 PM
> To: Wang, Xiao W <xiao.w.wang at intel.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>
> Cc: dev at dpdk.org; maxime.coquelin at redhat.com; Wang, Zhihong
> <zhihong.wang at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>; Tan, Jianfeng
> <jianfeng.tan at intel.com>; Liang, Cunming <cunming.liang at intel.com>; Daly,
> Dan <dan.daly at intel.com>; thomas at monjalon.net; Chen, Junjie J
> <junjie.j.chen at intel.com>
> Subject: Re: [PATCH v7 2/5] vfio: add multi container support
> 
> On 15-Apr-18 4:33 PM, Xiao Wang wrote:
> > This patch adds APIs to support container create/destroy and device
> > bind/unbind with a container. It also provides API for IOMMU programing
> > on a specified container.
> >
> > A driver could use "rte_vfio_create_container" helper to create a
> 
> ^^ wrong API name in commit message :)

Thanks for the catch. Will fix it.

> 
> > new container from eal, use "rte_vfio_bind_group" to bind a device
> > to the newly created container. During rte_vfio_setup_device the
> > container bound with the device will be used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen <junjie.j.chen at intel.com>
> > Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> > Reviewed-by: Ferruh Yigit <ferruh.yigit at intel.com>
> > ---
> >   lib/librte_eal/bsdapp/eal/eal.c          |  52 +++++
> >   lib/librte_eal/common/include/rte_vfio.h | 119 ++++++++++++
> >   lib/librte_eal/linuxapp/eal/eal_vfio.c   | 316
> +++++++++++++++++++++++++++++++
> >   lib/librte_eal/rte_eal_version.map       |   6 +
> >   4 files changed, 493 insertions(+)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> > index 727adc5d2..c5106d0d6 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal.c
> > @@ -769,6 +769,14 @@ int rte_vfio_noiommu_is_enabled(void);
> >   int rte_vfio_clear_group(int vfio_group_fd);
> >   int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
> >   int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
> > +int rte_vfio_container_create(void);
> > +int rte_vfio_container_destroy(int container_fd);
> > +int rte_vfio_bind_group(int container_fd, int iommu_group_no);
> > +int rte_vfio_unbind_group(int container_fd, int iommu_group_no);
> 
> Maybe have these under "container" too? e.g.
> rte_vfio_container_group_bind/unbind? Seems like it would be more
> consistent that way - anything to do with custom containers would be
> under rte_vfio_container_* namespace.

Agree.

> 
> > +int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
> > +		uint64_t iova, uint64_t len);
> > +int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
> > +		uint64_t iova, uint64_t len);
> >
> >   int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
> >   		      __rte_unused const char *dev_addr,
> > @@ -818,3 +826,47 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr,
> uint64_t __rte_unused iova,
> >   {
> >   	return -1;
> >   }
> > +
> 
> <...>
> 
> > diff --git a/lib/librte_eal/common/include/rte_vfio.h
> b/lib/librte_eal/common/include/rte_vfio.h
> > index d26ab01cb..0c1509b29 100644
> > --- a/lib/librte_eal/common/include/rte_vfio.h
> > +++ b/lib/librte_eal/common/include/rte_vfio.h
> > @@ -168,6 +168,125 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova,
> uint64_t len);
> >   int __rte_experimental
> >   rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Create a new container for device binding.
> 
> I would add a note that any newly allocated DPDK memory will not be
> mapped into these containers by default.

Will add it.

> 
> > + *
> > + * @return
> > + *   the container fd if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_container_create(void);
> > +
> 
> <...>
> 
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_unbind_group(int container_fd, int iommu_group_no);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Perform dma mapping for devices in a conainer.
> 
> Here and in other places: "dma" should be DMA, and typo: "conainer" :)
> 
> I think you should also add a note to the original API (not this one,
> but the old one) that DMA maps done via that API will only apply to
> default container and will not apply to any of the containers created
> via container_create(). IOW, documentation should make it clear that if
> you use this functionality, you're on your own and you have to manage
> your own DMA mappings for any containers you create.

OK, will add note to clearly describe it.

> 
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param vaddr
> > + *   Starting virtual address of memory to be mapped.
> > + *
> 
> <...>
> 
> > +
> > +int __rte_experimental
> > +rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t
> iova,
> > +		uint64_t len)
> > +{
> > +	struct user_mem_map *new_map;
> > +	struct vfio_config *vfio_cfg;
> > +	struct user_mem_maps *user_mem_maps;
> > +	int ret = 0;
> > +
> > +	if (len == 0) {
> > +		rte_errno = EINVAL;
> > +		return -1;
> > +	}
> > +
> > +	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > +	if (vfio_cfg == NULL) {
> > +		RTE_LOG(ERR, EAL, "Invalid container fd\n");
> > +		return -1;
> > +	}
> > +
> > +	user_mem_maps = &vfio_cfg->mem_maps;
> > +	rte_spinlock_recursive_lock(&user_mem_maps->lock);
> > +	if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) {
> > +		RTE_LOG(ERR, EAL, "No more space for user mem maps\n");
> > +		rte_errno = ENOMEM;
> > +		ret = -1;
> > +		goto out;
> > +	}
> > +	/* map the entry */
> > +	if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
> > +		/* technically, this will fail if there are currently no devices
> > +		 * plugged in, even if a device were added later, this mapping
> > +		 * might have succeeded. however, since we cannot verify if
> this
> > +		 * is a valid mapping without having a device attached,
> consider
> > +		 * this to be unsupported, because we can't just store any old
> > +		 * mapping and pollute list of active mappings willy-nilly.
> > +		 */
> > +		RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n");
> > +		ret = -1;
> > +		goto out;
> > +	}
> > +	/* create new user mem map entry */
> > +	new_map = &user_mem_maps->maps[user_mem_maps->n_maps++];
> > +	new_map->addr = vaddr;
> > +	new_map->iova = iova;
> > +	new_map->len = len;
> > +
> > +	compact_user_maps(user_mem_maps);
> > +out:
> > +	rte_spinlock_recursive_unlock(&user_mem_maps->lock);
> > +	return ret;
> 
> Please correct me if i'm wrong, but it looks like you've just duplicated
> the code for rte_vfio_dma_map() here and made a few small changes. It
> would be better if you moved most of this into a static function (e.g.
> static int container_dma_map(vfio_cfg, vaddr, iova, len)) and called it
> with either default vfio_cfg from rte_vfio_dma_map, or found vfio_cfg
> from rte_vfio_container_dma_map. Same applies to function below.

Agree, will do it in v8.

BRs,
Xiao

> 
> > +}
> > +
> > +int __rte_experimental
> > +rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t
> iova,
> > +		uint64_t len)
> > +{
> > +	struct user_mem_map *map, *new_map = NULL;
> > +	struct vfio_config *vfio_cfg;
> > +	struct user_mem_maps *user_mem_maps;
> > +	int ret = 0;
> > +
> > +	if (len == 0) {
> > +		rte_errno = EINVAL;
> > +		return -1;
> > +	}
> > +
> 
> <...>
> 
> --
> Thanks,
> Anatoly


More information about the dev mailing list