[dpdk-dev] [PATCH v2 00/13] introduce fail-safe PMD

Neil Horman nhorman at tuxdriver.com
Sat Mar 18 20:51:47 CET 2017


On Fri, Mar 17, 2017 at 11:56:21AM +0100, Gaëtan Rivet wrote:
> On Thu, Mar 16, 2017 at 04:50:43PM -0400, Neil Horman wrote:
> > On Wed, Mar 15, 2017 at 03:25:37PM +0100, Gaëtan Rivet wrote:
> > > On Wed, Mar 15, 2017 at 12:15:56PM +0100, Thomas Monjalon wrote:
> > > > 2017-03-15 03:28, Bruce Richardson:
> > > > > On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
> > > > > > - In the bonding, the init and configuration steps are still the
> > > > > >  responsibility of the application and no one else. The bonding PMD
> > > > > >  captures the device, re-applies its configuration upon dev_configure()
> > > > > > >  which is actually re-applying part of the configuration already present
> > > > > > within the slave eth_dev (cf rte_eth_dev_config_restore).
> > > > > >
> > > > > > - In the fail-safe, the init and configuration are both the
> > > > > >  responsibilities of the fail-safe PMD itself, not the application
> > > > > >  anymore. This handling of these responsibilities in lieu of the
> > > > > >  application is the whole point of the "deferred hot-plug" support, of
> > > > > >  proposing a simple implementation to the user.
> > > > > >
> > > > > > This change in responsibilities is the bulk of the fail-safe code. It
> > > > > > would have to be added as-is to the bonding. Verifying the correctness
> > > > > > of the sync of the initialization phase (acceptable states of a device
> > > > > > following several events registered by the fail-safe PMD) and the
> > > > > > configuration items between the state the application believes it is in
> > > > > > and the fail-safe knows it is in, is the bulk of the fail-safe code.
> > > > > >
> > > > > > This function is not overlapping with that of the bonding. The reason I
> > > > > > did not add this whole architecture to the bonding is that when I tried
> > > > > > to do so, I found that I only had two possibilities:
> > > > > >
> > > > > > - The current slave handling path is kept, and we only add a new one
> > > > > >  with additional functionalities: full init and conf handling with
> > > > > >  extended parsing capabilities.
> > > > > >
> > > > > > > - The current slave handling is scrapped and replaced entirely by the new
> > > > > > >  slave management. The old capturing of existing devices is not done
> > > > > >  anymore.
> > > > > >
> > > > > > > The first solution is not acceptable, because we effectively end up with
> > > > > > a maintenance nightmare by having to validate two types of slaves with
> > > > > > differing capabilities, differing initialization paths and differing
> > > > > > configuration code.  This is extremely awkward and architecturally
> > > > > > unsound. This is essentially the same as having the exact code of the
> > > > > > > fail-safe as an aside in the bonding, maintaining exactly the same
> > > > > > breadth of code while having muddier interfaces and organization.
> > > > > >
> > > > > > The second solution is not acceptable, because we are bending the whole
> > > > > > existing bonding API to our whim. We could just as well simply rename
> > > > > > the fail-safe PMD as bonding, add a few grouping capabilities and call
> > > > > > it a day. This is not acceptable for users.
> > > > > >
> > > > > If the first solution is indeed not an option, why do you think this
> > > > > second one would be unacceptable for users? If the functionality remains
> > > > > the same, I don't see how it matters much for users which driver
> > > > > provides it or where the code originates.
> > > > >
> > > 
> > > The problem with the second solution is also that bonding is not only a PMD.
> > > It exposes its own public API that existing applications rely on, see
> > > rte_eth_bond_*() definitions in rte_eth_bond.h.
> > > 
> > > Although bonding instances can be set up through command-line options,
> > > target "users" are mainly applications explicitly written to use it.
> > > This must be preserved for no other reason than it hasn't been deprecated.
> > > 
> > I fail to see how either of your points is relevant.  The fact that the bonding
> > pmd exposes an api to the application has no bearing on its ability to implement
> > a hot plug function.
> > 
> 
> This depends on the API making sense in the context of the new
> functionality.
> 
Well, the api should always make sense in the context of any added
functionality, but it seems to me that's just another way of saying you might
need to make some modifications to the bonding api.  I'm not saying that's
necessarily going to have to be the case, but while I'm a big proponent of ABI
stability, I would support an API change to add valid functionality if it were a
big enough feature.  Though again, I don't think that's really necessary if you
rethink the model a little bit.

> This API offers to add and remove slaves to a bonding and to configure them.
> In the fail-safe arch, it is not possible to add and remove slaves from the
> grouping. Doing so would mean adding and removing devices from internal EAL
> structures.
> 
Ok, so you update the bonding option parser to include a fail-safe mode, which
only supports two slaves, one of which is an instance of the null-pmd and the
other is to be added at a later date in response to a hot plug event.  In the
fail safe mode, after initial option parsing, bonds operating in this mode
return an error to the application when/if it attempts to remove a slave from a
bond.  That doesn't seem so hard to me.
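
To make that concrete, here is a rough sketch of the remove path only.  All
names are made up for illustration (this is not the rte_eth_bond API), and it
assumes a new fail-safe mode value set once by the option parser:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is the rte_eth_bond API. */
enum sketch_bond_mode {
	SKETCH_MODE_ACTIVE_BACKUP,
	SKETCH_MODE_FAILSAFE,	/* assumed mode: null slave + deferred real slave */
};

#define SKETCH_MAX_SLAVES 8

struct sketch_bond {
	enum sketch_bond_mode mode;
	int slaves[SKETCH_MAX_SLAVES];	/* port ids */
	size_t nb_slaves;
};

/* Remove a slave; refuse when the bond runs in the fail-safe mode. */
static int
sketch_bond_slave_remove(struct sketch_bond *bond, int port_id)
{
	size_t i;

	assert(bond != NULL);
	if (bond->mode == SKETCH_MODE_FAILSAFE)
		return -ENOTSUP;	/* slave set is fixed after option parsing */

	for (i = 0; i < bond->nb_slaves; i++) {
		if (bond->slaves[i] == port_id) {
			/* Fill the hole with the last entry; order is not kept. */
			bond->slaves[i] = bond->slaves[--bond->nb_slaves];
			return 0;
		}
	}
	return -ENODEV;
}
```

The other modes keep their current behavior; only a bond in the fail-safe mode
rejects the request.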

> It is also invalid to try to configure a fail-safe slave. An application
> only configures a fail-safe device, which will in turn configure its slaves.
> This separation follows from the nature of a device failover.
> 
See my previous mail, you augment the null pmd to allow storage of arbitrary
configuration strings or key/value pairs when operating as a bonded slave.  The
bond is then capable of retrieving configuration from that null instance and
pushing it to the real slave on a hot plug event.
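
Roughly, and again with hypothetical names (the null pmd has no such store
today, and the apply callback is just a stand-in for whatever the bond would
call on the real slave), the store-and-replay idea looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_KV 16

struct sketch_kv {
	const char *key;
	const char *value;
};

/* Stand-in for the config a null-pmd slave would hold (assumption). */
struct sketch_null_slave {
	struct sketch_kv kv[SKETCH_MAX_KV];
	size_t nb_kv;
};

/* Hypothetical hook the bond uses to program one setting on the real slave. */
typedef int (*sketch_apply_t)(void *real_dev, const char *key,
			      const char *value);

/* Record a setting; a repeated key overwrites the earlier value. */
static int
sketch_null_store(struct sketch_null_slave *ns, const char *key,
		  const char *value)
{
	size_t i;

	assert(ns != NULL);
	for (i = 0; i < ns->nb_kv; i++) {
		if (strcmp(ns->kv[i].key, key) == 0) {
			ns->kv[i].value = value;
			return 0;
		}
	}
	if (ns->nb_kv == SKETCH_MAX_KV)
		return -1;	/* store is full */
	ns->kv[ns->nb_kv].key = key;
	ns->kv[ns->nb_kv].value = value;
	ns->nb_kv++;
	return 0;
}

/* On hot plug: push every stored setting to the real slave, in order. */
static int
sketch_replay_config(const struct sketch_null_slave *ns, sketch_apply_t apply,
		     void *real_dev)
{
	size_t i;

	for (i = 0; i < ns->nb_kv; i++) {
		int ret = apply(real_dev, ns->kv[i].key, ns->kv[i].value);

		if (ret != 0)
			return ret;	/* stop at the first failure */
	}
	return 0;
}

/* Example callback: counts settings instead of touching hardware. */
static int
sketch_count_apply(void *real_dev, const char *key, const char *value)
{
	(void)key;
	(void)value;
	++*(int *)real_dev;
	return 0;
}
```

The store only keeps pointers, so the caller owns the strings; a real version
would have to copy them.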

> As seen previously, the fail-safe PMD handles different responsibilities
> from the bonding PMD. It is thus necessary to make different assumptions
> concerning what it can and cannot do with a slave.
> 
Only because you seem to refuse to think about the failsafe model in any other
way than the one you have implemented.

> > > Also, trying to implement this API for the device failover function would
> > > imply a device capture down to the devargs parsing level. This means that
> > > a PMD could request taking over a device, messing with the internals of the
> > > EAL: devargs list and busses lists of devices. This seems unacceptable.
> > > 
> > Why?  You just said yourself above that, while there is a devargs interface to
> > the bonding driver, there is also an api, which is the more used method to
> > configure bonding.  I'm not sure I agree with that, but I think it's beside the
> > point.  Your PMD also requires configuration, and it appears necessary that
> > you do so from the command line (you need to specifically enumerate the
> > subdevices that you intend to provide failsafe behavior to).  I see no reason
> > why such a feature can't be added to bonding, and the null pmd used as a
> > stand-in device, should the enumerated device not yet exist.
> > 
> > To your argument about taking over a device, I don't see how you find
> > that unacceptable, as it is precisely what the bonding driver does today, in the
> > sense that it allows an application to assign a master/slave relationship to
> > devices right now.  I see no reason that we can't convey the right and ability
> > for bonding to do that dynamically based on configuration.
> > 
> 
> No, the bonding PMD does not take over a device. It only cares about the
> ether layer for its link failover. It does not care about parsing parameters
> of a slave, probing devices, detaching drivers. It does not remove a device
> from the pci_device_list in the EAL for example.
> 
Again, please take a moment and think about how else your failsafe model might
be implemented in the context of bonding.  Right now, you are asserting that
your failsafe model can't be implemented in any other way because of the
decisions you have made.  If you change how you think of failsafe, you'll see it
can be done in other ways.

> Doing so would imply exposing private internal structures from the EAL,
> messing with elements reserved while doing the EAL init. This requires
> controlling a priority in the device initialization order to create the
> over-arching ones last (which is a hacky solution). It would wreak havoc
> with the DPDK arch.
> 
> The fail-safe PMD does not rely on the EAL for handling its slaves. This is
> what I explained before, when I touched upon the differing responsibilities
> implied by the differences in nature between a link failover and a device
> failover.
> 
> > > The bonding API is thus in conflict with the concept of a device failover in
> > > the context of the current DPDK arch.
> > > 
> > I really don't see how you get to this from your argument above.
> > 
> 
> The current DPDK arch does not expose EAL elements to be modified by PMDs,
> and with good reasons. In this context, it is not possible to handle slaves
> correctly for a device failover in the bonding PMD,
> because the bonding PMD from the get-go expects the EAL to handle its slaves
> on a device level.
> 
> > > > > Despite all the discussion, it still just doesn't make sense to me to
> > > > > have more than one DPDK driver to handle failover - be it link or
> > > > > device. If nothing else, it's going to be awkward to explain to users
> > > > > that if they want fail-over for when a link goes down they have to use
> > > > > driver A, but if they want fail-over when a NIC gets hotplugged they use
> > > > > driver B, and if they want both kinds of failover - which would surely
> > > > > be the expected case - they need to use both drivers. The usability is
> > > > > a problem here.
> > > 
> > > Having both kinds of failover in the same PMD will always lead to the first
> > > solution in some form or another.
> > > 
> > It really isn't because you can model hotplug behavior as a trivial form of the
> > failover that bonding does now (i.e. failover between a null device and a
> > preferred real device).
> > 
> 
> The preferred real device still has to be created / destroyed. It still
> relies on EAL entry points for handling. It still puts additional
> responsibilities on a PMD. Those responsibilities are expressed in sub
> layers clearly defined in the fail-safe PMD. You would have to create these
> sub-layers in some form in the bonding for it to be able to create a
> preferred real device at some point. This additional way of handling slaves
> has already been discussed as inducing a messy architecture to the bonding
> PMD.
> 
> > > I am sure we can document all this in a way that does not cause users
> > > confusion, with the help of community feedback such as yours.
> > > 
> > > Perhaps "net_failsafe" is a misnomer? We also thought about "net_persistent"
> > > or "net_hotplug". Any other ideas?
> > > 
> > > It is also possible for me to remove the failover support from this series,
> > > only providing deferred hot-plug handling at first. I could then send the
> > > failover support as separate patches to better assert that it is a useful,
> > > secondary feature that is essentially free to implement.
> > > 
> > I think that's solving the wrong problem.  I've no issue with the functionality
> > in this patch, it's really the implementation that we are all arguing against.
> > 
> > > >
> > > > It seems everybody agrees on the need for the failsafe code.
> > > > We are just discussing the right place to implement it.
> > > >
> > > > Gaetan, moving this code in the bonding PMD means replacing the bonding
> > > > API design by the failsafe design, right?
> > > > With the failsafe design in the bonding PMD, is it possible to keep other
> > > > bonding features?
> > > 
> > > As seen previously, the bonding API is incompatible with device failover.
> > > 
> > It's not been seen previously, you asserted it to be so, and I certainly disagree
> > with that assertion.  I think others might too.
> > 
> 
> I also explained at length my assertion. I can certainly expand further if
> necessary, but you need to point out the elements you disagree with.
> 
> > Additionally, it's not really in line with this discussion, but in looking at
> > your hotplug detection code, I think it is somewhat lacking.  Currently you seem
> > to implement this with a timer that wakes up and checks for device existence,
> > which is pretty substandard in my mind.  That's going to waste cpu cycles that
> > might lead to packet loss.  I'd really prefer to see you augment the eal library
> > with event handling code (it can tie into udev in linux and kqueue in bsd), and
> > create a generic event hook that we can use to detect device adds/removes
> > without having to wake up constantly to see if anything has changed.
> > 
> > 
> 
> I think it's fine. We can discuss it further once we agree on the form the
> hot-plug implementation will take in the DPDK.
> 


Well, until then, I feel like we're talking past one another at this point, and
so I'll just say this driver is a NAK for me.

Neil


> > > > Having some features enabled solely for one kind of failover, while having
> > > > specific code paths for both, seems unnecessarily complicated to me,
> > > > following suit with my previous points about the first solution.
> > > 
> > > >
> > > > In case we do not have a consensus in the following days, I suggest to add
> > > > this topic in the next techboard meeting agenda.
> 
> Best regards,
> -- 
> Gaëtan Rivet
> 6WIND
> 

