[dpdk-dev] [PATCH] net/mlx4: workaround to verbs wrong error return

Matan Azrad matan at mellanox.com
Tue Aug 1 14:18:36 CEST 2017



> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> Sent: Tuesday, August 1, 2017 1:51 PM
> To: Matan Azrad <matan at mellanox.com>
> Cc: dev at dpdk.org; Thomas Monjalon <thomas at monjalon.net>; Olga Shern
> <olgas at mellanox.com>; stable at dpdk.org
> Subject: Re: [PATCH] net/mlx4: workaround to verbs wrong error return
> 
> Hi Matan,
> 
> (snipping a bit of unnecessary context)
> 
> On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote:
> [...]
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> [...]
> > > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote:
> [...]
> > > > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> [...]
> > > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote:
> [...]
> > > > > > @@ -1278,7 +1287,10 @@ struct rte_flow *
> > > > > >  	for (flow = LIST_FIRST(&priv->flows);
> > > > > >  	     flow;
> > > > > >  	     flow = LIST_NEXT(flow, next)) {
> > > > > > -		claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > > > > +		/* Current verbs does not allow to check real
> > > > > > +		 * errors when the device was plugged out.
> > > > > > +		 */
> > > > > > +		ibv_destroy_flow(flow->ibv_flow);
> > > > > >  		flow->ibv_flow = NULL;
> > > > > >  		DEBUG("Flow %p removed", (void *)flow);
> > > > > >  	}
> > > > > > --
> > > > > > 1.8.3.1
> > > > > >
> > > > >
> > > > > This approach looks way too intrusive. How about making the
> > > > > claim_zero() definition not fail but still complain when
> > > > > compiled against a broken Verbs version instead?
> > > > >
> > > > >  #include "mlx4_autoconf.h"
> > > > >
> > > > >  [...]
> > > > >
> > > > >  #ifndef HAVE_BROKEN_VERBS
> > > > >  #define claim_zero(...) assert((__VA_ARGS__) == 0)  #else /*
> > > > > HAVE_BROKEN_VERBS */  #define claim_zero(...) \
> > > > >      (void)(((__VA_ARGS__) == 0) || \
> > > > >             DEBUG("Assertion `" # __VA_ARGS__ "' failed
> > > > > (IGNORED)")) #endif /* HAVE_BROKEN_VERBS */
> > > > >
> > > > > You could use auto-config-h.sh to generate the
> HAVE_BROKEN_VERBS
> > > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some
> > > > > symbol, macro or type that only exists or doesn't exist yet in
> > > > > problematic releases for instance.
> > > > >
> > > >
> > > > I agree with the dependence on broken verbs but there are other
> > > > places in mlx4 code which use claim_zero assertion, So this
> > > > suggestion will hurt other validations.
> > >
> > > Well, half broken is no better than completely broken in my opinion,
> > > so while Verbs is being repaired, users debugging the mlx4 PMD will
> > > temporarily get debug traces without the ensuing abort(). At least
> > > the behavior will be consistent.
> > >
> > > Think about it, they already have to go out of their way to enable
> > > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using
> > > hot-plug but still use a buggy Verbs version, they can disable
> > > HAVE_BROKEN_VERBS to revert to the normal behavior.
> > >
> >
> > priv_flow_validate and priv_mac_addr_add functions calls also are
> > wrapped by claim_zero, These are not ibv_destroy functions and don't
> > depend only in broken verbs, The user want to be aborted in those
> > cases otherwise he would have put there trace print as you suggest.
> 
> As the only exceptions, if you had to change something it would be these
> instance in order to be less intrusive. But I suggest you not to since, again,
> this is a workaround for a problem that is not under PMD control.
> 
> > > > What's about to create new define depend on broken verbs for the
> > > > specific
> > > assertions?
> > > > It will be still intrusive but more accurate.
> > >
> > > One reason I prefer the code to remain unchanged is that I'm
> > > currently refactoring the entire PMD. Maintaining the above patch
> > > (picking the right
> > > ibv_*() calls that return a consistent value) will be difficult and
> > > an intrusive patch won't be reverted easily once Verbs is fixed.
> > >
> >
> > You can just find all claim_zero_new and replace it with claim_zero.
> 
> So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work
> consistently or will each of them have to be validated against a possible
> assert() failure during hot-plug?
> 
> Note that ibv_destroy_qp() is only one example among many, the work
> you've done for this patch will have to be performed every time new code is
> added. I don't find it particularly convenient.
> 
> > > All these claim_zero() checks ensure the PMD destroys Verbs
> > > resources in the proper order (e.g. a flow before the QP it is
> > > associated with). If the return value of any of these cannot be
> > > relied on, it's useless to only check some of them.
> >
> >
> > priv_flow_validate and priv_mac_addr_add functions are not destroy
> verbs resources.
> 
> Right, and that's not a problem. Unfortunate users still get a nice debugging
> message in the unlikely case of a failure for these.
> 

I don't think the user want debugging message for this functions - he want to be aborted
and to stop the program.
He could add DEBUG prints if he had want, those behaviors are really different.


> > > Moreover if ibv_destroy_something() wrongly returns an error when
> > > the device is unplugged, I think this can happen to the calls not
> > > part of your patch, i.e. all of them, so working around it at the
> > > macro definition level makes sense.
> >
> > I checked with failsafe tests and found that only the specific destroy
> functions are problematic.
> 
> What about other applications and corner cases, such as when the device is
> unplugged while the PMD is being initialized? Since the control path is bound
> by system calls, the PMD might actually sleep for a non negligible amount of
> time (there is really no upper limit to how long) and the device could
> disappear at any point. Subsequent ibv_*() calls would return unexpected
> errors.
> 
> I'm sure you cannot verify all corner cases, so let's avoid them.
> 
> > > If you don't know what symbol can be relied on in OFED 4.2 to define
> > > HAVE_BROKEN_VERBS (which is just an example, you can use another
> > > name BTW), maybe you can add a compilation option to enable manually
> > > in case of trouble? Something verbose like:
> > >
> > >  CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n
> > >
> > > Which will have to be documented.
> 
> What about the above suggestion?
> 

So, I don't think there is perfect solution to this issue. 

I will take your suggestion to depend claim_zero in verbs version.
Firstly, I will check if I can get the information for broken verbs from somewhere,
If not, I will use compilation option.


> --
> Adrien Mazarguil
> 6WIND


More information about the dev mailing list