[dpdk-dev] [PATCH v2 3/3] net/ifcvf: add ifcvf driver

Wang, Xiao W xiao.w.wang at intel.com
Thu Mar 22 18:23:44 CET 2018


Hi Ferruh,

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Thursday, March 22, 2018 4:51 PM
> To: Wang, Xiao W <xiao.w.wang at intel.com>; maxime.coquelin at redhat.com;
> yliu at fridaylinux.org
> Cc: dev at dpdk.org; Wang, Zhihong <zhihong.wang at intel.com>; Bie, Tiwei
> <tiwei.bie at intel.com>; Chen, Junjie J <junjie.j.chen at intel.com>; Xu, Rosen
> <rosen.xu at intel.com>; Daly, Dan <dan.daly at intel.com>; Liang, Cunming
> <cunming.liang at intel.com>; Burakov, Anatoly <anatoly.burakov at intel.com>;
> gaetan.rivet at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v2 3/3] net/ifcvf: add ifcvf driver
> 
> On 3/21/2018 1:21 PM, Xiao Wang wrote:
> > ifcvf driver uses vdev as a control domain to manage ifc VFs that belong
> > to it. It registers vDPA device ops to vhost lib to enable these VFs to be
> > used as vhost data path accelerator.
> >
> > Live migration feature is supported by ifc VF and this driver enables
> > it based on vhost lib.
> >
> > Because vDPA driver needs to set up MSI-X vector to interrupt the guest,
> > only vfio-pci is supported currently.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> > Signed-off-by: Rosen Xu <rosen.xu at intel.com>
> > ---
> > v2:
> > - Rebase on Zhihong's vDPA v3 patch set.
> > ---
> >  config/common_base                      |    6 +
> >  config/common_linuxapp                  |    1 +
> >  drivers/net/Makefile                    |    1 +
> >  drivers/net/ifcvf/Makefile              |   40 +
> >  drivers/net/ifcvf/base/ifcvf.c          |  329 ++++++++
> >  drivers/net/ifcvf/base/ifcvf.h          |  156 ++++
> >  drivers/net/ifcvf/base/ifcvf_osdep.h    |   52 ++
> >  drivers/net/ifcvf/ifcvf_ethdev.c        | 1240 +++++++++++++++++++++++++++++++
> >  drivers/net/ifcvf/rte_ifcvf_version.map |    4 +
> >  mk/rte.app.mk                           |    1 +
> 
> Need .ini file to represent driver features.
> Also it is good to add driver documentation and a note into release note to
> announce new driver.

Will do.

> 
> >  10 files changed, 1830 insertions(+)
> >  create mode 100644 drivers/net/ifcvf/Makefile
> >  create mode 100644 drivers/net/ifcvf/base/ifcvf.c
> >  create mode 100644 drivers/net/ifcvf/base/ifcvf.h
> >  create mode 100644 drivers/net/ifcvf/base/ifcvf_osdep.h
> >  create mode 100644 drivers/net/ifcvf/ifcvf_ethdev.c
> >  create mode 100644 drivers/net/ifcvf/rte_ifcvf_version.map
> >
> > diff --git a/config/common_base b/config/common_base
> > index ad03cf433..06fce1ebf 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -791,6 +791,12 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> >  #
> >  CONFIG_RTE_LIBRTE_PMD_VHOST=n
> >
> > +#
> > +# Compile IFCVF driver
> > +# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
> > +#
> > +CONFIG_RTE_LIBRTE_IFCVF=n
> > +
> >  #
> >  # Compile the test application
> >  #
> > diff --git a/config/common_linuxapp b/config/common_linuxapp
> > index ff98f2355..358d00468 100644
> > --- a/config/common_linuxapp
> > +++ b/config/common_linuxapp
> > @@ -15,6 +15,7 @@ CONFIG_RTE_LIBRTE_PMD_KNI=y
> >  CONFIG_RTE_LIBRTE_VHOST=y
> >  CONFIG_RTE_LIBRTE_VHOST_NUMA=y
> >  CONFIG_RTE_LIBRTE_PMD_VHOST=y
> > +CONFIG_RTE_LIBRTE_IFCVF=y
> 
> Current syntax for PMD config options:
> Virtual ones: CONFIG_RTE_LIBRTE_PMD_XXX
> Physical ones: CONFIG_RTE_LIBRTE_XXX_PMD
> 
> Virtual / Physical difference most probably not done intentionally but that is
> what it is right now.
> 
> Is "PMD" not added intentionally to the config option?

A vDPA driver does not run in polling mode, so I didn't put "PMD" in the option name. Do you think CONFIG_RTE_LIBRTE_VDPA_IFCVF would be better?

> 
> And what is the config time dependency of the driver, I assume VHOST is one of
> them but are there more?

The VHOST dependency is expressed in drivers/net/Makefile. CONFIG_RTE_EAL_VFIO is another one; I will add it.

> 
> >  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
> >  CONFIG_RTE_LIBRTE_PMD_TAP=y
> >  CONFIG_RTE_LIBRTE_AVP_PMD=y
> > diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> > index e1127326b..496acf2d2 100644
> > --- a/drivers/net/Makefile
> > +++ b/drivers/net/Makefile
> > @@ -53,6 +53,7 @@ endif # $(CONFIG_RTE_LIBRTE_SCHED)
> >
> >  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> >  DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> > +DIRS-$(CONFIG_RTE_LIBRTE_IFCVF) += ifcvf
> 
> Since this is mainly vpda driver, does it make sense to put it under
> drivers/net/virtio/vpda/ifcvf
> 
> When there are more vpda driver they can go into drivers/net/virtio/vpda/*

vDPA is for vhost offloading/acceleration. The device can be quite different from virtio;
it only needs to be virtio ring compatible, and the usage model is quite different from the virtio PMD.
I think the vDPA driver should not go into the drivers/net/virtio dir.

> 
> Combining with below not registering ethdev comment, virtual driver can register
> itself as vpda_ifcvf:
> RTE_PMD_REGISTER_VDEV(vpda_ifcvf, ifcvf_drv);

Yes, only a very limited set of ethdev APIs can be implemented for this ethdev, so I'll try to remove the ethdev registration.

> 
> >  endif # $(CONFIG_RTE_LIBRTE_VHOST)
> >
> >  ifeq ($(CONFIG_RTE_LIBRTE_MRVL_PMD),y)
> > diff --git a/drivers/net/ifcvf/Makefile b/drivers/net/ifcvf/Makefile
> > new file mode 100644
> > index 000000000..f3670cdf2
> > --- /dev/null
> > +++ b/drivers/net/ifcvf/Makefile
> > @@ -0,0 +1,40 @@
> > +# SPDX-License-Identifier: BSD-3-Clause
> > +# Copyright(c) 2018 Intel Corporation
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +#
> > +# library name
> > +#
> > +LIB = librte_ifcvf.a
> > +
> > +LDLIBS += -lpthread
> > +LDLIBS += -lrte_eal -lrte_mempool -lrte_pci
> > +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_vhost
> > +LDLIBS += -lrte_bus_vdev -lrte_bus_pci
> > +
> > +CFLAGS += -O3
> > +CFLAGS += $(WERROR_FLAGS)
> > +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
> > +CFLAGS += -I$(RTE_SDK)/drivers/bus/pci/linux
> > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> > +
> > +#
> > +# Add extra flags for base driver source files to disable warnings in them
> > +#
> > +BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir $(wildcard $(SRCDIR)/base/*.c))))
> > +$(foreach obj, $(BASE_DRIVER_OBJS), $(eval CFLAGS_$(obj)+=$(CFLAGS_BASE_DRIVER)))
> 
> It seems no CFLAGS_BASE_DRIVER defined yet, above lines can be removed for
> now.

Will remove it.

> 
> > +
> > +VPATH += $(SRCDIR)/base
> > +
> > +EXPORT_MAP := rte_ifcvf_version.map
> > +
> > +LIBABIVER := 1
> > +
> > +#
> > +# all source are stored in SRCS-y
> > +#
> > +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += ifcvf_ethdev.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += ifcvf.c
> 
> Is it intentionally used "RTE_LIBRTE_PMD_VHOST" because of dependency or
> typo?

Sorry for the typo.

> 
> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
> <...>
> 
> > +static int
> > +eth_dev_ifcvf_create(struct rte_vdev_device *dev,
> > +		struct rte_pci_addr *pci_addr, int devices)
> > +{
> > +	const char *name = rte_vdev_device_name(dev);
> > +	struct rte_eth_dev *eth_dev = NULL;
> > +	struct ether_addr *eth_addr = NULL;
> > +	struct ifcvf_internal *internal = NULL;
> > +	struct internal_list *list = NULL;
> > +	struct rte_eth_dev_data *data = NULL;
> > +	struct rte_pci_addr pf_addr = *pci_addr;
> > +	int i;
> > +
> > +	list = rte_zmalloc_socket(name, sizeof(*list), 0,
> > +			dev->device.numa_node);
> > +	if (list == NULL)
> > +		goto error;
> > +
> > +	/* reserve an ethdev entry */
> > +	eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internal));
> 
> Is this eth_dev used at all? It looks like it is only used for its private data,
> if so can it be possible to use something like:
> 
> struct ifdev {
> 	void *private;
> 	struct rte_device *dev;
> }
> 
> allocate memory for private and add this struct to the list, this may save
> ethdev overhead.
> 
> But I can see dev_start() and dev_stop() are used, not sure if they are the
> reason of the ethdev.

Registering an ethdev allows the use of dev_start/dev_stop, but this overhead doesn't seem to bring much benefit.
Your suggestion looks good.
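
A minimal sketch of the suggested approach, assuming a plain linked list keyed by device name. All names here (fake_device, ifdev_create, ifdev_find, ifdev_destroy) are hypothetical, and libc calloc/free stand in for the rte_zmalloc family, so this is not the actual driver code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_device; only the name is modeled. */
struct fake_device {
	const char *name;
};

/* One list node per vDPA engine: private data plus a back-pointer. */
struct ifdev {
	void *private_data;       /* driver-private state, owned by the node */
	struct fake_device *dev;  /* owning device, not owned by the node */
	struct ifdev *next;
};

static struct ifdev *ifdev_list;  /* head of a singly linked list */

/* Allocate a node with zeroed private data of priv_size bytes. */
struct ifdev *
ifdev_create(struct fake_device *dev, size_t priv_size)
{
	struct ifdev *node = calloc(1, sizeof(*node));

	if (node == NULL)
		return NULL;
	node->private_data = calloc(1, priv_size);
	if (node->private_data == NULL) {
		free(node);
		return NULL;
	}
	node->dev = dev;
	node->next = ifdev_list;
	ifdev_list = node;
	return node;
}

/* Look a node up by device name; returns NULL if absent. */
struct ifdev *
ifdev_find(const char *name)
{
	struct ifdev *node;

	for (node = ifdev_list; node != NULL; node = node->next)
		if (strcmp(node->dev->name, name) == 0)
			return node;
	return NULL;
}

/* Unlink a node and free it together with its private data. */
void
ifdev_destroy(struct ifdev *target)
{
	struct ifdev **pp;

	for (pp = &ifdev_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			free(target->private_data);
			free(target);
			return;
		}
	}
}
```

With this shape, probe would call ifdev_create() and remove would call ifdev_find() followed by ifdev_destroy(), without reserving an ethdev entry. A real driver would also need locking around the list.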

> 
> > +	if (eth_dev == NULL)
> > +		goto error;
> > +
> > +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0,
> > +			dev->device.numa_node);
> > +	if (eth_addr == NULL)
> > +		goto error;
> > +
> > +	*eth_addr = base_eth_addr;
> > +	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
> > +
> > +	internal = eth_dev->data->dev_private;
> > +	internal->dev_name = strdup(name);
> 
> Need to free this later and on error paths

The error path has it:
        if (internal && internal->dev_name)
                free(internal->dev_name);

> 
> > +	if (internal->dev_name == NULL)
> > +		goto error;
> > +
> > +	internal->eng_addr.pci_addr = *pci_addr;
> > +	for (i = 0; i < devices; i++) {
> > +		pf_addr.domain = pci_addr->domain;
> > +		pf_addr.bus = pci_addr->bus;
> > +		pf_addr.devid = pci_addr->devid + (i + 1) / 8;
> > +		pf_addr.function = pci_addr->function + (i + 1) % 8;
> > +		internal->vf_info[i].pdev.addr = pf_addr;
> > +		rte_spinlock_init(&internal->vf_info[i].lock);
> > +	}
> > +	internal->max_devices = devices;
> 
> is it max_devices or number of devices?

It's a field describing how many devices are contained in this vDPA engine. The value is min(user argument, IFCVF MAX VFs).
Renaming it to dev_num looks better.
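
For reference, the VF address arithmetic in the quoted loop can be sketched as a small helper. This assumes the VFs occupy consecutive function slots after the engine address; struct pci_addr and pci_addr_offset are hypothetical stand-ins, not the rte_pci API. As far as I can tell, adding (i + 1) % 8 directly to the base function number only stays within 0..7 when the base function is 0, so the sketch carries overflow into the device number instead:

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

#define PCI_FUNCS_PER_DEV 8

/*
 * Return the address that lies `offset` functions after `base`,
 * carrying overflow of the function number into the device number.
 * No check is made against the 5-bit PCI device number limit.
 */
struct pci_addr
pci_addr_offset(const struct pci_addr *base, unsigned int offset)
{
	struct pci_addr out = *base;
	unsigned int linear = base->devid * PCI_FUNCS_PER_DEV +
			base->function + offset;

	out.devid = (uint8_t)(linear / PCI_FUNCS_PER_DEV);
	out.function = (uint8_t)(linear % PCI_FUNCS_PER_DEV);
	return out;
}
```

The i-th VF would then be pci_addr_offset(pci_addr, i + 1). Computing every address from the unmodified base also avoids accumulating offsets across loop iterations.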

> 
> <...>
> 
> > +/*
> > + * If this vdev is created by user, then ifcvf will be taken by
> 
> created by user?

I mean that when the app creates this vdev, we can assume the app wants ifcvf to be used as a vDPA device.
Ifcvf has virtio's vendor ID and device ID, but it has its own specific subsystem vendor ID and device ID.
So the virtio PMD can take ifcvf first; then the app stops the virtio port and creates an ifcvf vdev to drive ifcvf.

> 
> > + * this vdev.
> > + */
> > +static int
> > +ifcvf_take_over(struct rte_pci_addr *pci_addr, int num)
> > +{
> > +	uint16_t port_id;
> > +	int i, ret;
> > +	char devname[RTE_DEV_NAME_MAX_LEN];
> > +	struct rte_pci_addr vf_addr = *pci_addr;
> > +
> > +	for (i = 0; i < num; i++) {
> > +		vf_addr.function += i % 8;
> > +		vf_addr.devid += i / 8;
> > +		rte_pci_device_name(&vf_addr, devname, RTE_DEV_NAME_MAX_LEN);
> > +		ret = rte_eth_dev_get_port_by_name(devname, &port_id);
> 
> Who probed this device at first place?

If no whitelist is specified, the virtio PMD will probe it first.

> 
> > +		if (ret == 0) {
> > +			rte_eth_dev_close(port_id);
> > +			if (rte_eth_dev_detach(port_id, devname) < 0)
> 
> This will call the driver's remove() and will also remove the device from the
> device list, is it OK?

Alternatively, we can just call rte_eth_dev_release_port to keep the device in the device list.
That would be better.

> 
> > +				return -1;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +rte_ifcvf_probe(struct rte_vdev_device *dev)
> > +{
> > +	struct rte_kvargs *kvlist = NULL;
> > +	int ret = 0;
> > +	struct rte_pci_addr pci_addr;
> > +	int devices;
> 
> devices can't be negative, and according open_int() it is uint16_t, it is
> possible to pick an unsigned storage type for it.

Will use unsigned type.
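
A minimal sketch of parsing the "devices" value into an unsigned 16-bit type with range checking. The helper name parse_u16 is hypothetical and this is not the open_int() from the patch:

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal string into a uint16_t.
 * Returns 0 on success, or -1 if the input is NULL, does not start
 * with a digit (empty, signed, or padded input), has trailing
 * characters, or exceeds UINT16_MAX.
 */
int
parse_u16(const char *value, uint16_t *out)
{
	char *end = NULL;
	unsigned long n;

	if (value == NULL || !isdigit((unsigned char)*value))
		return -1;

	errno = 0;
	n = strtoul(value, &end, 10);
	if (errno != 0 || *end != '\0' || n > UINT16_MAX)
		return -1;

	*out = (uint16_t)n;
	return 0;
}
```

Rejecting a leading sign up front matters because strtoul() accepts "-1" and returns a wrapped value rather than failing.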

> 
> <...>
> 
> > +static int
> > +rte_ifcvf_remove(struct rte_vdev_device *dev)
> > +{
> > +	const char *name;
> > +	struct rte_eth_dev *eth_dev = NULL;
> > +
> > +	name = rte_vdev_device_name(dev);
> > +	RTE_LOG(INFO, PMD, "Un-Initializing ifcvf for %s\n", name);
> > +
> > +	/* find an ethdev entry */
> > +	eth_dev = rte_eth_dev_allocated(name);
> > +	if (eth_dev == NULL)
> > +		return -ENODEV;
> > +
> > +	eth_dev_close(eth_dev);
> > +	rte_free(eth_dev->data);
> > +	rte_eth_dev_release_port(eth_dev);
> 
> This does memset(ethdev->data, ..), so should be called before rte_free(data)

Agreed, will change it.
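
The ordering issue can be illustrated with a toy model. The struct and function names below are hypothetical and only mimic a release routine that scrubs the data area, as described in the comment above; they are not the ethdev implementation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-port data area. */
struct port_data {
	int port_id;
	char name[32];
};

/* Hypothetical port entry pointing at its data area. */
struct port {
	struct port_data *data;
	int in_use;
};

/* Mimics a release routine that zeroes the data area it is given. */
void
port_release(struct port *p)
{
	memset(p->data, 0, sizeof(*p->data));
	p->in_use = 0;
}

/* Tear down in the safe order: release first, free afterwards. */
void
port_teardown(struct port *p)
{
	struct port_data *data = p->data;

	port_release(p);  /* writes to *data, so it must still be allocated */
	p->data = NULL;
	free(data);
}
```

Swapping the two steps would make port_release() write to freed memory, which is the problem pointed out in the review.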

> 
> > +
> > +	return 0;
> > +}
> > +
> > +static struct rte_vdev_driver ifcvf_drv = {
> > +	.probe = rte_ifcvf_probe,
> > +	.remove = rte_ifcvf_remove,
> > +};
> > +
> > +RTE_PMD_REGISTER_VDEV(net_ifcvf, ifcvf_drv);
> > +RTE_PMD_REGISTER_ALIAS(net_ifcvf, eth_ifcvf);
> 
> Alias for backport support, not needed for new drivers.

OK, will remove it.

> 
> > +RTE_PMD_REGISTER_PARAM_STRING(net_ifcvf,
> > +	"bdf=<bdf> "
> > +	"devices=<int>");
> 
> Above says:
>   #define ETH_IFCVF_DEVICES_ARG	"int"
> 
> Is argument "int" or "devices"? Using macro here helps preventing errors.

It's "devices"; I will fix it by using the macro.
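
A minimal sketch of building the parameter string from the argument-name macros through string literal concatenation. ETH_IFCVF_BDF_ARG and the helper ifcvf_param_string() are assumed names for illustration; only ETH_IFCVF_DEVICES_ARG is mentioned in the review:

```c
#include <stddef.h>

/* Argument names defined once and reused everywhere. */
#define ETH_IFCVF_BDF_ARG      "bdf"      /* assumed name */
#define ETH_IFCVF_DEVICES_ARG  "devices"

/* Adjacent string literals are concatenated at compile time. */
#define IFCVF_PARAM_STRING \
	ETH_IFCVF_BDF_ARG "=<bdf> " \
	ETH_IFCVF_DEVICES_ARG "=<int>"

/* NULL-terminated list of accepted argument names. */
const char *const ifcvf_valid_args[] = {
	ETH_IFCVF_BDF_ARG,
	ETH_IFCVF_DEVICES_ARG,
	NULL
};

/* Return the help string advertised for the driver. */
const char *
ifcvf_param_string(void)
{
	return IFCVF_PARAM_STRING;
}
```

Because the help string and the accepted-argument list come from the same macros, they cannot drift apart.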

> 
> > diff --git a/drivers/net/ifcvf/rte_ifcvf_version.map b/drivers/net/ifcvf/rte_ifcvf_version.map
> > new file mode 100644
> > index 000000000..33d237913
> > --- /dev/null
> > +++ b/drivers/net/ifcvf/rte_ifcvf_version.map
> > @@ -0,0 +1,4 @@
> > +EXPERIMENTAL {
> 
> Please put release version here.

OK, will put "DPDK_18.05"

Thanks for the comments,
-Xiao

> 
> <...>

