[dpdk-dev] [dpdk-stable] [PATCH v3] net/failsafe: fix calling device during RMV events

Ferruh Yigit ferruh.yigit at intel.com
Wed Nov 29 20:17:50 CET 2017


On 10/23/2017 1:36 AM, Gaëtan Rivet wrote:
> On Mon, Oct 23, 2017 at 07:17:41AM +0000, Ophir Munk wrote:
>> Hi Gaetan,
>> Thanks for your quick reply. Please see comments inline.
>>
>> Regards,
>> Ophir
>>
>>> -----Original Message-----
>>> From: Gaëtan Rivet [mailto:gaetan.rivet at 6wind.com]
>>> Sent: Friday, October 20, 2017 1:35 PM
>>> To: Ophir Munk <ophirmu at mellanox.com>
>>> Cc: dev at dpdk.org; Thomas Monjalon <thomas at monjalon.net>; Olga Shern
>>> <olgas at mellanox.com>; stable at dpdk.org
>>> Subject: Re: [PATCH v3] net/failsafe: fix calling device during RMV events
>>>
>>> Hi Ophir,
>>>
>>> Sorry about the delay,
>>> I have a few remarks, I think this patch could be simpler.
>>>
>>> First, about the commit logline:
>>> "calling device" is not descriptive enough. I'd suggest
>>>
>>>     net/failsafe: fix device configuration during RMV events
>>>
>>> But I'm not a native speaker either, so use it if you think it is better, or don't,
>>> it's only a suggestion :).
>>>
>>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
>>>> This commit prevents control path operations from failing after a sub
>>>> device removal.
>>>>
>>>> Following are the failure steps:
>>>> 1. The physical device is removed due to change in one of PF
>>>> parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
>>>> Within 2 seconds Interrupt thread initializes the actual device
>>>> removal, then every 2 seconds it tries to re-sync (plug in) the
>>>> device. The trials fail as long as VF parameter mismatches the PF
>>> parameter.
>>>> 4. A control thread initiates a control operation on failsafe which
>>>> initiates this operation on the device.
>>>> 5. A race condition occurs between the control thread and interrupt
>>>> thread when accessing the device data structures.
>>>>
>>>> This commit prevents the race condition in step 5. Before this commit
>>>> if a device was removed and then a control thread operation was
>>>> initiated on failsafe - in some cases failsafe called the sub device
>>>> operation instead of avoiding it. Such cases could lead to operations
>>> failures.
>>>>
>>>
>>> This is a nitpick, but as said earlier, this is not preventing the race condition.
>>> This race is still present and can still wreak havok on unsuspecting users.
>>>
>>> If an application has a weak threading model, it will be subject to this race
>>> condition still. It is possible to prevent it fully with proper care from the
>>> application standpoint, but this is not specific to fail-safe and does not
>>> concern us here.
>>>
>>> Anyway, it's really a nitpick, I just wanted to point it out. This is not too
>>> important for this patch.
>>>
>>>> This commit fixes failsafe criteria to determine when the device is
>>>> removed such that it will avoid calling the sub device operations
>>>> during that time and will only call them otherwise.
>>>>
>>>> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
>>>> Cc: stable at dpdk.org
>>>>
>>>> Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
>>>> ---
>>>> v3:
>>>> 1. Rebase v2
>>>>
>>>> 2. Please ignore checkpatch checks on arguments re-usage - they are
>>> confirmed.
>>>> 	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-
>>> effects?
>>>> 	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:
>>>>
>>>> 3. Add rationales (copy from an email which accompanied v2):
>>>>
>>>> On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
>>>>>
>>>>> Hi Ophir,
>>>>>
>>>>> On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
>>>>>> This commit prevents control path operations from failing after a
>>>>>> sub device has informed failsafe it has been removed.
>>>>>>
>>>>>> Before this commit if a device was removed and then a control path
>>>>>
>>>>> Here are the steps if I understood correctly:
>>>>>
>>>>> 0. The physical device is removed
>>>>> 1. The interrupt thread flags the device 2. A control lcore
>>>>> initiates a control operation 3. The alarm triggers, waking up the eal-intr-
>>> thread,
>>>>>    initiating the actual device removal.
>>>>> 4. Race condition occurs between control lcore and interrupt thread.
>>>>>
>>>>> "if a device was removed" is ambiguous I think (are we speaking
>>>>> about the physical port? Is it only flagged? Is it after the removal of the
>>> device itself?).
>>>>> From the context I gather that you mean the device is flagged to be
>>>>> removed, but it won't be as clear in a few month when we revisit this bug
>>> :) .
>>>>>
>>>>> Could you please rephrase this so that the whole context of the
>>>>> issue is available?
>>>>>
>>>>
>>>> Done. Commit message was rephrased based on your comments
>>>>
>>>>>> operations was initiated on failsafe - in some cases failsafe
>>>>>> called the sub device operation instead of avoiding it. Such cases
>>>>>> could lead to operations failures.
>>>>>>
>>>>>> This commit fixes failsafe criteria to determine when the device
>>>>>> is removed such that it will avoid calling the sub device
>>>>>> operations during that time and will only call them otherwise.
>>>>>>
>>>>>
>>>>> This commit mitigates the race condition, reducing the probability
>>>>> for it to have an effect. It does not, however, remove this race
>>>>> condition, which is inherent to the DPDK architecture at the moment.
>>>>>
>>>>> A proper fix, a more detailed workaround and additional
>>>>> documentation warning users writing applications to mind their threads
>>> could be interesting.
>>>>>
>>>>
>>>> The race condition occurs in the last step and may lead to
>>>> segmentation faults (accessing data structures of the same device by 2
>>>> threads) The previous steps ("the physical device is removed", etc) were not
>>> recreated and tested but probably cannot lead to segmentation fault.
>>>>
>>>>> But let's focus on this patch for the time being.
>>>>>
>>>>>> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
>>>>>> Cc: stable at dpdk.org
>>>>>>
>>>>>> Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
>>>>>> ---
>>>>>>  drivers/net/failsafe/failsafe_ether.c |  1 +
>>>>>>  drivers/net/failsafe/failsafe_ops.c   | 52
>>>>> +++++++++++++++++++++++++++++------
>>>>>>  2 files changed, 45 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/failsafe/failsafe_ether.c
>>>>>> b/drivers/net/failsafe/failsafe_ether.c
>>>>>> index a3a8cce..1def110 100644
>>>>>> --- a/drivers/net/failsafe/failsafe_ether.c
>>>>>> +++ b/drivers/net/failsafe/failsafe_ether.c
>>>>>> @@ -378,6 +378,7 @@
>>>>>
>>>>> Could you please generate your patches with the function name in the
>>> diff?
>>>>
>>>> Done
>>>>
>>>>>
>>>>>>  				      i);
>>>>>>  				goto err_remove;
>>>>>>  			}
>>>>>> +			sdev->remove = 0;
>>>>>
>>>>> You are adding this here, within failsafe_eth_dev_state_sync, and
>>>>> removing it from the dev_configure ops.
>>>>>
>>>>> 10 lines above, the call to dev_configure is done, meaning that the
>>>>> remove flag was resetted at this point.
>>>>>
>>>>> Can you explain why you prefer resetting the flag here?
>>>>>
>>>>> The position of this flag reset will be dependent upon my subsequent
>>>>> remarks anyway, so hold that thought :) .
>>>>>
>>>>
>>>> The motivation for resetting the "remove" flag within
>>> failsafe_eth_dev_state_sync is as follows:
>>>> Previously to this patch the "remove" flag was designed to signal the need
>>> to remove the sub device.
>>>> Once the sub device was removed and before being reconfigured the
>>> "remove" flag was reset.
>>>>
>>>> After this patch the scope of the "remove" flag was *extended* to
>>>> indicate the sub device status as being "plugged out" by resetting this flag
>>> only after a successful call to failsafe_eth_dev_state_sync().
>>>> The "plug out" status could last a very long time (seconds, minutes, days,
>>> weeks, ...).
>>>>
>>>> Previously to this patch failsafe based the "plugged out" status on
>>>> the sub device state as being below ACTIVE however every 2 seconds
>>>> dev_configure() was called where the sub device was assigned sdev-
>>>>> state = DEV_ACTIVE; therefore the sub device state became ACTIVE for
>>> some time every 2 seconds.
>>>> This is where the race condition occurred: failsafe considered the sub
>>>> device as "Plugged in" for some time every 2 seconds (based on its ACTIVE
>>> state) while it was actually plugged out.
>>>>
>>>> After this patch the "Plugged out" status is based on the "remove" flag.
>>>>
>>>
>>> Sorry, I do not agree with this semantical change on the "remove" flag.
>>> You are essentially adding a new device state, which could be fine per se, but
>>> should not be done here.
>>>
>>> The enum dev_state is there for this purpose.
>>>
>>> The flag dev->remove, calls for an operation to be done upon the concerned
>>> device. It is not meant to become a new device state.
>>>
>>> A point about the work methodoly here: if you wanted to change this
>>> semantic, which could be legitimate and sometimes called for, you should
>>> have proposed it either during a discussion in a response to my previous
>>> email, or introducing the change as a separate patch. This point is important
>>> enough for it to have its own patch, meaning we would have a whole thread
>>> dedicated to it instead of having to interleave commentaries between
>>> related-but-separate diffs on the code.
>>>
>>> But anyway, if you think you need to express a PLUGOUT state, I'd suggest
>>> adding a state between DEV_UNDEFINED and DEV_PARSED.
>>> DEV_UNDEFINED means that the device is in limbo and has no existence per
>>> se (its parsing failed for example, it is not clear whether the parameters are
>>> correct, etc...). DEV_PLUGOUT could mean then that the device has been
>>> successfully probed at least once, meaning that it could possibly have
>>> residuals from this probing still there, or specific care to be taken when
>>> manipulating it.
>>>
>>> However, I'm not yet convinced that this new state is necessary. I think you
>>> can mitigate this race condition without having to add it. If you insist in
>>> introducing this state, please do so in a separate patch, with proper
>>> definition about the meaning of this state:
>>>
>>>   + When it should be valid for a device to be in this state.
>>>   + Which operation corresponds to getting into and out of this state.
>>>   + Why this state is interesting and what could not be expressed before
>>>     that is thus being fixed by introducing this state.
>>>
>>> But please verify twice whether you absolutely need to complexify the
>>> current fail-safe internals before going all in and basing your work upon it :)
>>>
>>
>> Indeed what I am currently missing in failsafe is knowing the device is in a PLUGOUT state.
>> Your suggestion to add a new state DEV_PLUGOUT cannot be used with the current implementation
>> as the device states are modified during an alarm hotplug handling every 2 seconds.
>> In fs_hotplug_alarm() we call failsafe_eth_dev_state_sync(dev) which eventually calls ->dev_configure(dev) 
>> where we assign: sdev->state = DEV_ACTIVE; 
>> then when sync fails fs_hotplug_alarm() calls failsafe_dev_remove(dev) which will call fs_dev_remove(sdev); where the sub devices
>> states are changed from ACTIVE down to DEV_UNDEFINED.
>> Having said that it means that during a PLUGOUT event - the states are modified with each invocation of the fs_hotplug_alarm
>> every 2 seconds. So even if we added DEV_PLUGOUT state - it will not remain fixed during the hotplug alarm handling.
>> I have also verified all of this with printouts.
>> When seeing a sub device in state "DEV_ACTIVE" - we cannot tell whether the device is currently in "PLUGOUT situation" or "PLUGIN situation"
>> This allows operations such as fs_mtu_set() on sub-devices which are in "PLUGOUT situation" while their state is 
>> DEV_ACTIVE to be manipulated, which I think should have been avoided.
>>
>> Please note fs_mtu_set() implementation:
>> tatic int fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>> {
>>   ..
>>         FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
>> 	// ***** We are here while the device can be in a "PLUGOUT situation" ***
>>
>> To summarize:
>> I am missing a way to know in failsafe that a sub-device is currently plugged out
> 
> (sdev->state < DEV_ACTIVE && !sdev->remove) means that the device is
> plugged out.
> 
>> 1. I suggested extending the "remove" flag scope for this purpose. It has minimal changes with current failsafe implementation. You prefer not using "remove".
> 
> I prefer using it, but as a flag, not as a device state.
> 
>> 2. You suggested adding a new state DEV_PLUGOUT. I don't think it will work with current implementation (as explained above) or may require a redesign of current implementation.
>>
> 
> I do not suggest adding a new state DEV_PLUGOUT.
> I suggest using sdev->remove properly.
> 
>> Can you suggest another way?
>>
> 
> 0. In a separate commit, move the
>       sdev->remove = 0;
>    from fs_dev_configure, into the case DEV_UNDEFINED of the switch
>    within fs_dev_remove. This is cleaner and more logical anyway.
> 
> 1. Check that sdev->remove is not set in fs_find_next.
>    If sdev->remove is set, then the device should be skipped.
> 
> 2. In failsafe_dev_remove, do not use the FOREACH_SUBDEV_STATE iterator,
>    but manually iterate over all sub-devices using the subs_tail and
>    subs_head values.
>    As the generic iterator would skip over devices which have
>    sdev->remove set, this function would not work anymore.
> 
> 3. Find the places I have missed that needs this manual iterator to be
>    used instead of the FOREACH_SUBDEV{,_STATE} ones. I think there is at
>    least other place that I cannot recall, there might be more.
> 
> If you think this does not work, please tell me why, because then it
> means that I have misunderstood something about the race condition you
> are trying to fix.

Reminder of this patch remaining from previous release.

> 



More information about the dev mailing list