[dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library (librte_eal/librte_ether link bonding support changes)

Neil Horman nhorman at tuxdriver.com
Mon Jun 16 13:07:58 CEST 2014


On Mon, Jun 16, 2014 at 08:59:25AM +0000, Doherty, Declan wrote:
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Friday, June 13, 2014 8:38 PM
> > To: Doherty, Declan
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library
> > (librte_eal/librte_ether link bonding support changes)
> > 
> > On Fri, Jun 13, 2014 at 06:34:04PM +0000, Doherty, Declan wrote:
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > Sent: Friday, June 13, 2014 5:08 PM
> > > > To: Doherty, Declan
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library
> > > > (librte_eal/librte_ether link bonding support changes)
> > > >
> > > > On Fri, Jun 13, 2014 at 03:41:59PM +0100, Declan Doherty wrote:
> > > > > Updating functionality in EAL to support adding link bonding
> > > > > devices via –vdev option. Link bonding devices will be
> > > > > initialized after all physical devices have been probed and
> > > > > initialized.
> > > > >
> > > > > Signed-off-by: Declan Doherty <declan.doherty at intel.com>
> > > > > ---
> > > > >  lib/librte_eal/common/eal_common_dev.c      | 66
> > > > +++++++++++++++++++++++++++--
> > > > >  lib/librte_eal/common/eal_common_pci.c      |  6 +++
> > > > >  lib/librte_eal/common/include/eal_private.h |  7 +++
> > > > >  lib/librte_eal/common/include/rte_dev.h     |  1 +
> > > > >  lib/librte_ether/rte_ethdev.c               | 34 +++++++++++++--
> > > > >  lib/librte_ether/rte_ethdev.h               |  7 ++-
> > > > >  lib/librte_pmd_pcap/rte_eth_pcap.c          | 22 +++++-----
> > > > >  lib/librte_pmd_ring/rte_eth_ring.c          | 32 +++++++-------
> > > > >  lib/librte_pmd_ring/rte_eth_ring.h          |  3 +-
> > > > >  lib/librte_pmd_xenvirt/rte_eth_xenvirt.c    |  2 +-
> > > > >  10 files changed, 144 insertions(+), 36 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_eal/common/eal_common_dev.c
> > > > b/lib/librte_eal/common/eal_common_dev.c
> > > > > index eae5656..b50c908 100644
> > > > > --- a/lib/librte_eal/common/eal_common_dev.c
> > > > > +++ b/lib/librte_eal/common/eal_common_dev.c
> > > > > @@ -75,14 +75,28 @@ rte_eal_dev_init(void)
> > > > >
> > > > >  	/* call the init function for each virtual device */
> > > > >  	TAILQ_FOREACH(devargs, &devargs_list, next) {
> > > > > +		uint8_t bdev = 0;
> > > > >
> > > > >  		if (devargs->type != RTE_DEVTYPE_VIRTUAL)
> > > > >  			continue;
> > > > >
> > > > >  		TAILQ_FOREACH(driver, &dev_driver_list, next) {
> > > > > -			if (driver->type != PMD_VDEV)
> > > > > +			/* RTE_DEVTYPE_VIRTUAL can only be a virtual or bonded
> > > > device*/
> > > > > +			if (driver->type != PMD_VDEV && driver->type !=
> > > > PMD_BDEV)
> > > > >  				continue;
> > > > >
> > > > > +			/*
> > > > > +			 * Bonded devices are not initialize here, we do it later in
> > > > > +			 * rte_eal_bonded_dev_init() after all physical devices
> > > > have been
> > > > > +			 * probed and initialized
> > > > > +			 */
> > > > > +			if (driver->type == PMD_BDEV &&
> > > > > +					!strncmp(driver->name, devargs-
> > > > >virtual.drv_name,
> > > > > +							strlen(driver->name))) {
> > > > > +				bdev = 1;
> > > > > +				break;
> > > > > +			}
> > > > > +
> > > > I really don't think you need to add a new device type for bonded devs.  Its got
> > > > no specific hardware that it drives, and you configure it with a --vdev
> > command,
> > > > so treat it as one here.  I understand that you need to pass additional
> > > > information about slaves to a bonded device, which is fine, but you can do that
> > > > with kvargs pretty easily, at which point its just another vdev. The only other
> > > > requirement is that you initilize the bonded vdev after the slave vdevs have
> > > > been created, which you can do by any of several methods (a priority field to
> > > > indicate that bonded drivers should be initilized last/later, a deferral return
> > > > code from the init routine, or by dead reckoning via the careful construction
> > of
> > > > the application command line (placed the bonded --vdev option last on the
> > > > command line argument list at run time).
> > > >
> > > It was my initial intent to do as you have describe above, but the physical devices
> > > cause a real issue here, physical devices don't call through to
> > rte_eth_dev_allocate until
> > > during rte_eal_pci_probe call, so it's not possible to initialize the bonded device
> > from
> > > within rte_eal_dev_init as the physical devices have not been fully initialized at
> > this
> > > point, as a port_id has not been allocated and can't be added as bonding slaves. I
> > don't
> > > see away around this without changing the EAL API, which I've tried to avoid with
> > this solution.
> > >
> > > Ordering isn't an issue, and can easily be solved if the above problem didn't
> > exist, and although
> > > a new device type isn't technically required, I think it's a cleaner solution than
> > doing
> > > string comparisons.
> > >
> > 
> > 
> > Ugh, you're right. That stinks.  We still shouldn't just drop the init code in
> > with the pci probe code though, especially if dpdk starts supporting more
> > stacked network devices. A notification system for interface creation that you
> > could monitor for pci_device addresses (B:D:F tupples) would be nice, but might
> > be overkill since dpdk doesn't really support device freeing yet.  what might be
> > useful is a generic ordering mechanism for rte_eal_dev_init.  If you added a
> > priority field to the rte_driver structure, you could create an enumeration for
> > the field to define when the init routine for that driver was called something
> > like:
> > typdef enum {
> > 	PMD_INIT_FIRST = 0,
> > 	PMD_INIT_POST_PROBE = 1,
> > }
> > 
> > Then you could make multiple calls to rte_eal_dev_init from rte_eal_init at the
> > appropriate places in the initalization process, each time passing one of the
> > above values.  rte_eal_dev_init would only initalize those drivers whos priority
> > matched the passed in priority.  That would be helpful for interfaces like
> > vlans, vxlans, tunnels, etc.
> > 
> I'm not sure I understand how you intend this to work, are you proposing that 
> rte_eal_dev_init is call multiple times by the user pre and post rte_eal_pci_probe
Mostly, I'm proposing that rte_eal_pci_probe be called by rte_eal_init, and that
we then call rte_eal_dev_init multiple times from within rte_eal_dev_init,
passing a parameter to indicate the priority level of device that should be
initalized within that call.

> this doesn't seem like an idea solution either. I'm not 100% clear why 
> rte_eal_pci_probe is currently called by  the application code and not initiated
>  from within rte_eal_dev_init, if this was the case we would be able to figure out 
> a dependency tree for initialization of devices, based on the parsed input 
> arguments without having extra user input to or multiple calls of rte_eal_dev_init.
> 
I agree, I think you should move it into its proper place within rte_eal_init,
though I think multiple calls to rte_eal_dev_init is perfectly acceptible in
this scenario.  You certainly could parse the command line to build a dependency
tree if you wish, though I don't think thats crruently overly complex. The current
dependency tree is static at (physical devices, virtual devices, probe, stacked
devices).  If you need something more complex later, you can re-write it freely
as it will be an internal library mechanism with no external visibility.

> Either way I don't think that any of these solution are trivial and they would 
> need to be thoroughly tested, which I don't think will be possible in the time
> frame for 1.7. If my current implementation calling rte_eal_bonded_dev_init
> from within rte_eal_pci_probe is unacceptable, then I propose that I remove 
> the EAL changes for --vdev initialization of bonding devices from this release and
> only support the C API, and that an overall strategy is discussed for static device
>  initialization that is developed for 1.7.1 or 1.8.
>
I would assert that if you can't write the pmd to support the existing pmd
interfaces, then this probably needs to wait until 1.8.  I'll bow to consensus
on that, but I would think that, if it gets in now, the
additional work won't happen. 

Neil

> > > > >
> > > > > +static int
> > > > > +rte_eth_dev_name_unique(const char* name)
> > > > > +{
> > > > > +	unsigned i;
> > > > > +
> > > > > +	for (i = 0; i < nb_ports; i++) {
> > > > > +		if (strcmp(rte_eth_devices[i].data->name, name) == 0)
> > > > > +			return -1;
> > > > > +	}
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  struct rte_eth_dev *
> > > > > -rte_eth_dev_allocate(void)
> > > > > +rte_eth_dev_allocate(const char* name)
> > > > >  {
> > > > >  	struct rte_eth_dev *eth_dev;
> > > > >
> > > > > @@ -165,23 +179,37 @@ rte_eth_dev_allocate(void)
> > > > >  	if (rte_eth_dev_data == NULL)
> > > > >  		rte_eth_dev_data_alloc();
> > > > >
> > > > > +	if (rte_eth_dev_name_unique(name)) {
> > > > > +		PMD_DEBUG_TRACE("Ethernet Device with name %s already
> > > > allocated!\n");
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > This seems fairly racy if you allow dynamic device creation at run time from
> > the
> > > > application, if multiple threads attempt to create bonds in parallel.
> > >
> > > True but if this is an issue then there probably should be some locking around
> > this allocation anyway.
> > Agreed the allocation of the port id is also racy.
> > 
> > > >
> > > > >  /**
> > > > > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c
> > > > b/lib/librte_pmd_pcap/rte_eth_pcap.c
> > > > Hmm, we're modifying other pmds for the naming feature, I think it would be
> > best
> > > > split out into a separate patch.  Something entitled "support unique interface
> > > > naming for virtual pmds" or something.
> > >
> > > True, but if I do that, I will need to break support for virtual devices in this patch
> > set, and
> > > I would prefer to keep this functionality working initial release of link bonding.
> > Also another
> > > patchset will be required at some point in the near future to tidy up/rationalize
> > the current
> > > handling of virtual devices identification and these changes do not change
> > functionally how
> > > these devices work.
> > 
> > I'm not sure I'm following what you're saying here.  If you split this into
> > another patch set, and order earlier in this series, then you won't break
> > anything.  I'm not suggesting a functional change here, just a organizational
> > one, so that this change doesn't do more than the changelog indicates.
> > 
> 
> Sorry I misunderstood, I thought that you where proposing that I create a completely new 
> patchset separate to the link bonding release set for these changes.
> 
> > Neil
> 
> Declan


More information about the dev mailing list