[dpdk-stable] [dpdk-dev] [PATCH] net/failsafe: check correct error code while handling sub-device add

Gaëtan Rivet grive at u256.net
Fri Oct 9 18:20:03 CEST 2020


On 05/10/20 11:42 +0200, Gaëtan Rivet wrote:
> Hi,
> 
> On 02/10/20 17:01 -0700, Long Li wrote:
> > From: Long Li <longli at microsoft.com>
> > 
> > When adding a sub-device, it's possible that the sub-device is configured
> > successfully but later fails to start. This error should not be masked.
> 
> Some of those errors are meant to be masked: -EIO, when the device is
> marked as removed at the ethdev level (see eth_err() in rte_ethdev.c:819).
> 
> > The driver needs to check the error status to prevent endless loop of
> > trying to start the sub-device.
> 
> If the ethdev layer error is due to the device being removed, and
> failsafe loops on trying to sync the eth device to its own state, then
> an RMV event should have been emitted but wasn't or it was missed by
> failsafe.
> 
> If the ethdev layer error is *not* due to the device being removed, the
> error should be != -EIO, and sdev->remove should not be set, so fs_err()
> should not mask it and it should be seen by the app.
> 
> Can you provide the following details:
> 
>  * What is the return code of rte_eth_dev_start() that is masked in your
>    start loop?
> 
>  * Is the device marked as removed in failsafe?
> 
>  * Is the device marked as removed in ethdev?
> 
>  * Was there an RMV event generated for the device? Whether yes or no,
>    is it correct?
> 
> Thanks,
> 

Hello Li,

I've found the previous mail thread [1] where you described how you got this
error. In your description, you say that you unplug the device and then
quickly replug it, before any event is processed?

If that's the case, it looks like a clear race condition in which the
removal event of the device is missed. I would not say yet that the
bug is in failsafe; it could be in ethdev.

Can you please check whether the device removal event was properly
generated in rte_ethdev? Failsafe (like any other hotplug support layer)
depends on it, so it should be verified to work first.

Thanks,

[1]: http://mails.dpdk.org/archives/dev/2020-September/182977.html

-- 
Gaëtan

