[dpdk-dev] [PATCH v2 4/4] net/failsafe: fix removed device handling

Matan Azrad matan at mellanox.com
Thu Dec 14 14:07:31 CET 2017


Hi Gaetan

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet at 6wind.com]
> Sent: Thursday, December 14, 2017 12:49 PM
> To: Matan Azrad <matan at mellanox.com>
> Cc: Adrien Mazarguil <adrien.mazarguil at 6wind.com>; Thomas Monjalon
> <thomas at monjalon.net>; dev at dpdk.org; stable at dpdk.org
> Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
> 
> On Thu, Dec 14, 2017 at 10:40:22AM +0000, Matan Azrad wrote:
> > Hi Gaetan
> >
> 
> <snip>
> 
> > > >
> > > > > If you add this check in the iterator itself, you would skip
> > > > > removed devices before attempting to operate on them, right?
> > > > >
> > > > > Then it should probably help with your issue, unless you tested it
> > > > > and verified that it didn't?
> > > >
> > > > Something like this:
> > > >
> > > > ---8<---
> > > >
> > > > diff --git a/drivers/net/failsafe/failsafe_private.h
> > > > b/drivers/net/failsafe/failsafe_private.h
> > > > index d81cc3ca6..62ddc0689 100644
> > > > --- a/drivers/net/failsafe/failsafe_private.h
> > > > +++ b/drivers/net/failsafe/failsafe_private.h
> > > > @@ -316,8 +316,12 @@ fs_find_next(struct rte_eth_dev *dev,
> > > >         subs = PRIV(dev)->subs;
> > > >         tail = PRIV(dev)->subs_tail;
> > > >         while (sid < tail) {
> > > > +               if (min_state > DEV_PROBED &&
> > > > +                   fs_is_removed(&sub[sid]))
> > > > +                       goto next;
> > > >                 if (subs[sid].state >= min_state)
> > > >                         break;
> > > > +next:
> > > >                 sid++;
> > > >         }
> > > >         *sid_out = sid;
> > > >
> > > > --->8---
> > > >
> > > > Only issue being that it is completely racy, but as this MT-unsafe
> > > > property is inescapable we might as well ignore it and go for KISS.
> > > >
> > > > > If that's enough, I would prefer it to having this additional
> > > > > check added to all rte_eth operations.
> > > >
> > >
> > > Ok, actually you were right here to do it this way. The "is_removed"
> > > check needs to happen after the operation attempt to effectively
> > > mitigate the possible race. Checking before attempting the call will
> > > be much less effective.
> > >
> > > That being said, would it be cleaner to have eth_dev ops return
> > > -ENODEV directly, and check against it within fail-safe?
> > >
> >
> > I think that according to the "is_removed" semantics we must return a
> > Boolean value (any value other than '0' means that the device is
> > removed), like other functions in the C library (for example isspace()).
> >
> 
> Sure, I wasn't discussing the interface proposed by
> rte_eth_dev_is_removed().
> 
> What I meant was to ask whether checking rte_eth_dev_is_removed()
> would be more interesting in the ethdev layer, making the eth_dev_ops
> return -ENODEV regardless of the previous error if this check is supported by
> the driver and signal that the port is removed.
> 
> I think this information could be interesting to other systems, not just
> fail-safe.
> 

Ok. Got you now.
Interesting approach. Plan:
	1. Update fs_link_update to use rte_eth* functions.
	2. Maybe -EIO is preferable, because -ENODEV is already used for the "no port" error?
	3. Update all relevant rte_eth* functions to use "is_removed" in their error flows (1 patch for the flow APIs and 1 for the others).
	4. Change the fs checks in error flows to check the rte_eth* return values.
	5. Remove CC stable from the commit message.

What do you think?
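
To make items 2-4 concrete, here is a rough, standalone sketch of the
idea. It is not DPDK code: struct mock_port and every mock_* function are
hypothetical stand-ins invented for illustration (mock_is_removed() plays
the role of rte_eth_dev_is_removed()), and -EIO is only the value
suggested in item 2, not a settled choice.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a port; NOT the real struct rte_eth_dev. */
struct mock_port {
	bool removed; /* what the driver's "is_removed" check would report */
	int op_ret;   /* value returned by the underlying driver operation */
};

/* Plays the role of rte_eth_dev_is_removed(): non-zero means removed. */
static int
mock_is_removed(const struct mock_port *port)
{
	return port->removed;
}

/*
 * Assumed ethdev-layer helper (item 3): on the error path only, check
 * whether the port is removed and, if so, report -EIO regardless of the
 * original error. Successful calls are passed through untouched.
 */
static int
mock_eth_err(const struct mock_port *port, int ret)
{
	if (ret == 0)
		return 0;
	if (mock_is_removed(port))
		return -EIO;
	return ret;
}

/* An ethdev-style wrapper around some driver operation. */
static int
mock_eth_op(const struct mock_port *port)
{
	return mock_eth_err(port, port->op_ret);
}

/*
 * Fail-safe side (item 4): look only at the return value of the wrapper.
 * -EIO is treated as "sub-device removed" and skipped; any other error
 * is propagated to the caller.
 */
static int
mock_fs_op(const struct mock_port *subs, int nb_subs)
{
	int i;
	int ret;

	for (i = 0; i < nb_subs; i++) {
		ret = mock_eth_op(&subs[i]);
		if (ret == -EIO)
			continue;
		if (ret != 0)
			return ret;
	}
	return 0;
}
```

Note that the removal check runs after the operation attempt, as
discussed above. One open point in this sketch: a driver that returns
-EIO for an unrelated reason would be indistinguishable from a removed
device.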

> --
> Gaëtan Rivet
> 6WIND

