[dpdk-dev] [PATCH] net/failsafe: fix Rx clean race

Ferruh Yigit ferruh.yigit at intel.com
Thu Oct 26 21:10:21 CEST 2017


On 10/26/2017 9:20 AM, Gaëtan Rivet wrote:
> Hello Matan,
> 
> I think the commit log could be shorter.
> Proposing this, feel free to expand it if you prefer.
> 
> ---8<---
> 
> When removing a device, the fail-safe checks that it is not within its
> datapath before cleaning it.
> 
> When checking whether an Rx burst should be performed on a device, the
> remove flag is not checked. Thus the port could still enter its datapath
> and miss a removal round. Furthermore, there is a race between the
> thread removing the device and the polling thread.
> 
> Check the remove flag before entering a sub-device Rx burst when in safe
> mode. This check mitigates the aforementioned race condition.
> 
> --->8---
> 
> Otherwise,
> 
> On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
>> On plug-out, the RMV interrupt callback sets the remove flag of the
>> removed sub-device. The next hotplug alarm cycle should read this flag
>> and, if the data path is clean, remove the sub-device.
>>
>> When the application calls the fail-safe Rx burst, fail-safe tries to
>> call the rx_burst function of every STARTED sub-device. The remove flag
>> is not checked here, so fail-safe may call the rx_burst function of the
>> removed sub-device.
>>
>> The two cases above run in different threads, so there is a race between
>> the Rx clean check of the removed sub-device and the rx_burst call on
>> that same sub-device; the call makes the sub-device Rx unclean.
>>
>> If the application calls rx_burst in a loop, the probability of finding
>> Rx clean is low, especially when there are few sub-devices or when the
>> rx_burst function of the removed sub-device takes a long time.
>>
>> Each time the sub-device data path is unclean, the next opportunity to
>> check it is only in the next hotplug alarm cycle; the default time
>> between cycles is 2 seconds.
>>
>> While fail-safe keeps trying to remove the sub-device, the sub-device
>> may appear again, and fail-safe cannot plug it back in until the
>> removal process is completed. During this time fail-safe may lose the
>> primary sub-device services, which may hurt application performance.
>>
>> This patch adds a remove flag check in the safe rx_burst function.
>> This way, at most one more hotplug alarm cycle is necessary to get
>> the sub-device clean for actual removal.
>>
>> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
>> Cc: stable at dpdk.org
>>
>> Signed-off-by: Matan Azrad <matan at mellanox.com>
> 
> Acked-by: Gaetan Rivet <gaetan.rivet at 6wind.com>

Applied to dpdk-next-net/master, thanks.

(used suggested commit log, thanks.)

