[dpdk-dev] [PATCH] net/mlx4: workaround to verbs wrong error return

Adrien Mazarguil adrien.mazarguil at 6wind.com
Tue Aug 1 12:50:37 CEST 2017


Hi Matan,

(snipping a bit of unnecessary context)

On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote:
[...]
> > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
[...]
> > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote:
[...]
> > > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
[...]
> > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote:
[...]
> > > > > @@ -1278,7 +1287,10 @@ struct rte_flow *
> > > > >  	for (flow = LIST_FIRST(&priv->flows);
> > > > >  	     flow;
> > > > >  	     flow = LIST_NEXT(flow, next)) {
> > > > > -		claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > > > +		/* Current verbs does not allow to check real
> > > > > +		 * errors when the device was plugged out.
> > > > > +		 */
> > > > > +		ibv_destroy_flow(flow->ibv_flow);
> > > > >  		flow->ibv_flow = NULL;
> > > > >  		DEBUG("Flow %p removed", (void *)flow);
> > > > >  	}
> > > > > --
> > > > > 1.8.3.1
> > > > >
> > > >
> > > > This approach looks way too intrusive. How about making the
> > > > claim_zero() definition not fail but still complain when compiled
> > > > against a broken Verbs version instead?
> > > >
> > > >  #include "mlx4_autoconf.h"
> > > >
> > > >  [...]
> > > >
> > > >  #ifndef HAVE_BROKEN_VERBS
> > > >  #define claim_zero(...) assert((__VA_ARGS__) == 0)  #else /*
> > > > HAVE_BROKEN_VERBS */  #define claim_zero(...) \
> > > >      (void)(((__VA_ARGS__) == 0) || \
> > > >             DEBUG("Assertion `" # __VA_ARGS__ "' failed (IGNORED)"))
> > > > #endif /* HAVE_BROKEN_VERBS */
> > > >
> > > > You could use auto-config-h.sh to generate the HAVE_BROKEN_VERBS
> > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some
> > > > symbol, macro or type that only exists or doesn't exist yet in
> > > > problematic releases for instance.
> > > >
> > >
> > > I agree with the dependence on broken verbs but there are other places
> > > in mlx4 code which use claim_zero assertion, So this suggestion will
> > > hurt other validations.
> > 
> > Well, half broken is no better than completely broken in my opinion, so while
> > Verbs is being repaired, users debugging the mlx4 PMD will temporarily get
> > debug traces without the ensuing abort(). At least the behavior will be
> > consistent.
> > 
> > Think about it, they already have to go out of their way to enable
> > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using hot-plug
> > but still use a buggy Verbs version, they can disable HAVE_BROKEN_VERBS to
> > revert to the normal behavior.
> > 
> 
> priv_flow_validate and priv_mac_addr_add functions calls also are wrapped by claim_zero,
> These are not ibv_destroy functions and don't depend only in broken verbs,
> The user want to be aborted in those cases otherwise he would have put there trace print 
> as you suggest.

As the only exceptions, if you had to change something it would be these
instance in order to be less intrusive. But I suggest you not to since, 
again, this is a workaround for a problem that is not under PMD control.

> > > What's about to create new define depend on broken verbs for the specific
> > assertions?
> > > It will be still intrusive but more accurate.
> > 
> > One reason I prefer the code to remain unchanged is that I'm currently
> > refactoring the entire PMD. Maintaining the above patch (picking the right
> > ibv_*() calls that return a consistent value) will be difficult and an intrusive
> > patch won't be reverted easily once Verbs is fixed.
> >
> 
> You can just find all claim_zero_new and replace it with claim_zero. 

So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work
consistently or will each of them have to be validated against a possible
assert() failure during hot-plug?

Note that ibv_destroy_qp() is only one example among many, the work you've
done for this patch will have to be performed every time new code is
added. I don't find it particularly convenient.

> > All these claim_zero() checks ensure the PMD destroys Verbs resources in
> > the proper order (e.g. a flow before the QP it is associated with). If the
> > return value of any of these cannot be relied on, it's useless to only check
> > some of them.
> 
> 
> priv_flow_validate and priv_mac_addr_add functions are not destroy verbs resources.

Right, and that's not a problem. Unfortunate users still get a nice
debugging message in the unlikely case of a failure for these.

> > Moreover if ibv_destroy_something() wrongly returns an error when the
> > device is unplugged, I think this can happen to the calls not part of your
> > patch, i.e. all of them, so working around it at the macro definition level
> > makes sense.
> 
> I checked with failsafe tests and found that only the specific destroy functions are problematic.

What about other applications and corner cases, such as when the device is
unplugged while the PMD is being initialized? Since the control path is
bound by system calls, the PMD might actually sleep for a non negligible
amount of time (there is really no upper limit to how long) and the device
could disappear at any point. Subsequent ibv_*() calls would return
unexpected errors.

I'm sure you cannot verify all corner cases, so let's avoid them.

> > If you don't know what symbol can be relied on in OFED 4.2 to define
> > HAVE_BROKEN_VERBS (which is just an example, you can use another name
> > BTW), maybe you can add a compilation option to enable manually in case of
> > trouble? Something verbose like:
> > 
> >  CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n
> > 
> > Which will have to be documented.

What about the above suggestion?

-- 
Adrien Mazarguil
6WIND


More information about the dev mailing list