[dpdk-dev] [PATCH] net/failsafe: fix exec parameter parsing error flow

Matan Azrad matan at mellanox.com
Wed Aug 30 17:32:46 CEST 2017


Hi Gaetan

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet at 6wind.com]
> Sent: Wednesday, August 30, 2017 5:25 PM
> To: Matan Azrad <matan at mellanox.com>
> Cc: dev at dpdk.org; Raslan Darawsheh <rasland at mellanox.com>;
> stable at dpdk.org
> Subject: Re: [PATCH] net/failsafe: fix exec parameter parsing error flow
> 
> On Wed, Aug 30, 2017 at 06:11:47AM +0000, Matan Azrad wrote:
> > Hi Gaetan
> >
> > > -----Original Message-----
> > > From: Gaëtan Rivet [mailto:gaetan.rivet at 6wind.com]
> > > Sent: Tuesday, August 29, 2017 7:34 PM
> > > To: Matan Azrad <matan at mellanox.com>
> > > Cc: dev at dpdk.org; Raslan Darawsheh <rasland at mellanox.com>;
> > > stable at dpdk.org
> > > Subject: Re: [PATCH] net/failsafe: fix exec parameter parsing error
> > > flow
> > >
> > > Hi Matan,
> > >
> > > On Tue, Aug 29, 2017 at 05:59:08PM +0300, Matan Azrad wrote:
> > > > > The faulty code returns a success value when the output stream of
> > > > > the executed process is empty (EOF).
> > > > > This causes a segmentation fault: failsafe polls this command line
> > > > > again, then gets success and tries to hotplug-add the sub-device
> > > > > by dereferencing an uninitialized pointer.
> > > >
> > >
> > > This is a bug and should be fixed, thanks.
> > >
> 
> Actually I am unable to reproduce this bug.
> 
> Do you have a fail-safe command line that would showcase this behavior?

testpmd -n 4  --vdev="net_failsafe0,mac=00:15:5d:44:4b:17,exec(/root/dfailsafe.sh,preferred,00:15:5d:44:4b:17),exec(/root/dfailsafe.sh,fallback,00:15:5d:44:4b:17,0)"  -w 0000:00:00.0 --  --burst=64 --mbcache=512 --portmask 0xf -i  --txd=4096 --rxd=4096  --enable-scatter  --nb-cores=7  --rxq=2 --txq=2 --rss-udp  --txqflags=0

Just run the exec with a nonexistent sh script.
 
> 
> > > > > Moreover, when the output is not empty but incorrect, failsafe
> > > > > returns an error from its probe function, while the expected
> > > > > behavior is to keep polling until the output is correct.
> > > >
> > >
> > > The expected behavior is for the fail-safe to return an error if the
> > > execution of the given command returns an error.
> > >
> > > > The intention is that users writing such a script would be able to
> > > > output a blank line in case there is nothing to probe, but still
> > > > remain aware of issues during the execution of the command.
> > >
> > > The fail-safe ignores errors pertaining to absent devices due to its nature.
> > > This does not mean that it should ignore all errors and try to keep
> > > on going while everything else is on fire.
> > >
> > > The contract with the user is that "blank line" without other errors
> > > means "absent device". Garbled output or return code != 0 means
> > > runtime error and should be thrown to the user / application.
> > >
> >
> > OK, good, I would have signed this contract :)
> >
> > What about the case where the output is not empty and parsing fails
> > with an error during the polling process?
> > I think that in the current code failsafe just continues normally and
> > tries again at the next polling time.
> > Because of this code I thought that if an error occurs we should poll
> > again...
> >
> 
> It depends on whether the fail-safe has already been initialized or not.
> During the initialization phase, any error other than -ENODEV means that
> it must stop and force the user to look into it.
> 
> When initialization has finished, if a polling error occurs, the fail-safe
> will try to minimize service disruption to the potentially existing
> sub-devices. It thus discards the error and will try again later.
> 
> > Can you please add it (the contract) to the failsafe documentation for
> > the exec parameter?
> >

Can you answer the above question?
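
For the documentation, the contract as described in this thread could be
sketched roughly as follows. This is only an illustration, not the
failsafe implementation: classify_exec(), decide() and enum fs_action
are hypothetical names, and the -EIO and -EINVAL codes are assumptions
(the thread only says "runtime error").

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical actions, named for this sketch only. */
enum fs_action { FS_CONTINUE, FS_RETRY_LATER, FS_ABORT };

/*
 * Map the result of an exec command to an errno-style code.
 * exit_code is the command's exit code, output its first line and
 * parse_ret the result of parsing that line as a device definition.
 */
static int
classify_exec(const char *output, int exit_code, int parse_ret)
{
	/* A nonzero return code is a runtime error (code assumed). */
	if (exit_code != 0)
		return -EIO;
	/* A blank line without other errors means "absent device". */
	if (strspn(output, " \t\r\n") == strlen(output))
		return -ENODEV;
	/* Garbled output is a runtime error (code assumed). */
	if (parse_ret != 0)
		return -EINVAL;
	return 0;
}

/* Decide what to do with the code, depending on the current phase. */
static enum fs_action
decide(int ret, bool initialized)
{
	if (ret == 0)
		return FS_CONTINUE;
	/* An absent device is always tolerated and polled again later. */
	if (ret == -ENODEV)
		return FS_RETRY_LATER;
	/* Other errors stop initialization; once initialized they are
	 * discarded and the command is tried again later. */
	return initialized ? FS_RETRY_LATER : FS_ABORT;
}
```

With this reading, a garbled line during initialization gives FS_ABORT,
while the same line during polling gives FS_RETRY_LATER.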

> > > > > The fix changes the return value to -ENODEV for this sub-device
> > > > > in both cases.
> > > > > This way, failsafe keeps trying to parse this sub-device parameter
> > > > > with the exec method until the output is correct.
> > > >
> > >
> > > The issue is that this portion of the code will be heavily modified
> > > anyway. The errno handling is erroneous and must be fixed, which is
> > > in conflict with your patch.
> > >
> > > > I will send the intended fix shortly, referencing this patch and the
> > > > issue you highlighted, but the two patches won't be compatible.
> > >
> >
> > Good, no problems.
> >
> > > > Fixes: a0194d828100 ("net/failsafe: add flexible device
> > > > definition")
> > > > Fixes: 35ffe4208140 ("net/failsafe: fix missing pclose after
> > > > popen")
> > > > Cc: stable at dpdk.org
> > > >
> > > > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > > > ---
> > > >  drivers/net/failsafe/failsafe_args.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/failsafe/failsafe_args.c
> > > > b/drivers/net/failsafe/failsafe_args.c
> > > > index 645c885..61c55df 100644
> > > > --- a/drivers/net/failsafe/failsafe_args.c
> > > > +++ b/drivers/net/failsafe/failsafe_args.c
> > > > @@ -157,12 +157,16 @@ fs_execute_cmd(struct sub_device *sdev,
> char
> > > *cmdline)
> > > >  	ret = fs_parse_device(sdev, output);
> > > >  	if (ret) {
> > > >  		ERROR("Parsing device '%s' failed", output);
> > > > +		ret = -ENODEV;
> >
> > Remove the above line so that the probe function reports the error.
> >
> > > >  		goto ret_pclose;
> > > >  	}
> > > >  ret_pclose:
> > > >  	pclose_ret = pclose(fp);
> > > >  	if (pclose_ret) {
> > > > -		pclose_ret = errno;
> > > > +		if (errno == 0)
> > > > +			errno = -(pclose_ret = ret);
> > > > +		else
> > > > +			pclose_ret = errno;
> > > >  		ERROR("pclose: %s", strerror(errno));
> > > >  		errno = old_err;
> > > >  		return pclose_ret;
> > > > --
> > > > 2.7.4
> > > >
> > >
> > > Best regards,
> > > --
> > > Gaëtan Rivet
> > > 6WIND
> >
> > Thanks,
> > Matan Azrad
> 
> --
> Gaëtan Rivet
> 6WIND

Regards
Matan Azrad

