[dpdk-dev] [PATCH v3] net/failsafe: fix calling device during RMV events

Ophir Munk ophirmu at mellanox.com
Mon Oct 23 09:17:41 CEST 2017


Hi Gaetan,
Thanks for your quick reply. Please see comments inline.

Regards,
Ophir

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet at 6wind.com]
> Sent: Friday, October 20, 2017 1:35 PM
> To: Ophir Munk <ophirmu at mellanox.com>
> Cc: dev at dpdk.org; Thomas Monjalon <thomas at monjalon.net>; Olga Shern
> <olgas at mellanox.com>; stable at dpdk.org
> Subject: Re: [PATCH v3] net/failsafe: fix calling device during RMV events
> 
> Hi Ophir,
> 
> Sorry about the delay,
> I have a few remarks, I think this patch could be simpler.
> 
> First, about the commit logline:
> "calling device" is not descriptive enough. I'd suggest
> 
>     net/failsafe: fix device configuration during RMV events
> 
> But I'm not a native speaker either, so use it if you think it is better, or don't,
> it's only a suggestion :).
> 
> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> > This commit prevents control path operations from failing after a sub
> > device removal.
> >
> > Following are the failure steps:
> > 1. The physical device is removed due to change in one of PF
> > parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
> > Within 2 seconds Interrupt thread initializes the actual device
> > removal, then every 2 seconds it tries to re-sync (plug in) the
> > device. The trials fail as long as VF parameter mismatches the PF
> parameter.
> > 4. A control thread initiates a control operation on failsafe which
> > initiates this operation on the device.
> > 5. A race condition occurs between the control thread and interrupt
> > thread when accessing the device data structures.
> >
> > This commit prevents the race condition in step 5. Before this commit
> > if a device was removed and then a control thread operation was
> > initiated on failsafe - in some cases failsafe called the sub device
> > operation instead of avoiding it. Such cases could lead to operations
> failures.
> >
> 
> This is a nitpick, but as said earlier, this is not preventing the race condition.
> This race is still present and can still wreak havok on unsuspecting users.
> 
> If an application has a weak threading model, it will be subject to this race
> condition still. It is possible to prevent it fully with proper care from the
> application standpoint, but this is not specific to fail-safe and does not
> concern us here.
> 
> Anyway, it's really a nitpick, I just wanted to point it out. This is not too
> important for this patch.
> 
> > This commit fixes failsafe criteria to determine when the device is
> > removed such that it will avoid calling the sub device operations
> > during that time and will only call them otherwise.
> >
> > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
> > ---
> > v3:
> > 1. Rebase v2
> >
> > 2. Please ignore checkpatch checks on arguments re-usage - they are
> confirmed.
> > 	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-
> effects?
> > 	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:
> >
> > 3. Add rationales (copy from an email which accompanied v2):
> >
> > On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
> > >
> > > Hi Ophir,
> > >
> > > On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
> > > > This commit prevents control path operations from failing after a
> > > > sub device has informed failsafe it has been removed.
> > > >
> > > > Before this commit if a device was removed and then a control path
> > >
> > > Here are the steps if I understood correctly:
> > >
> > > 0. The physical device is removed
> > > 1. The interrupt thread flags the device 2. A control lcore
> > > initiates a control operation 3. The alarm triggers, waking up the eal-intr-
> thread,
> > >    initiating the actual device removal.
> > > 4. Race condition occurs between control lcore and interrupt thread.
> > >
> > > "if a device was removed" is ambiguous I think (are we speaking
> > > about the physical port? Is it only flagged? Is it after the removal of the
> device itself?).
> > > From the context I gather that you mean the device is flagged to be
> > > removed, but it won't be as clear in a few month when we revisit this bug
> :) .
> > >
> > > Could you please rephrase this so that the whole context of the
> > > issue is available?
> > >
> >
> > Done. Commit message was rephrased based on your comments
> >
> > > > operations was initiated on failsafe - in some cases failsafe
> > > > called the sub device operation instead of avoiding it. Such cases
> > > > could lead to operations failures.
> > > >
> > > > This commit fixes failsafe criteria to determine when the device
> > > > is removed such that it will avoid calling the sub device
> > > > operations during that time and will only call them otherwise.
> > > >
> > >
> > > This commit mitigates the race condition, reducing the probability
> > > for it to have an effect. It does not, however, remove this race
> > > condition, which is inherent to the DPDK architecture at the moment.
> > >
> > > A proper fix, a more detailed workaround and additional
> > > documentation warning users writing applications to mind their threads
> could be interesting.
> > >
> >
> > The race condition occurs in the last step and may lead to
> > segmentation faults (accessing data structures of the same device by 2
> > threads) The previous steps ("the physical device is removed", etc) were not
> recreated and tested but probably cannot lead to segmentation fault.
> >
> > > But let's focus on this patch for the time being.
> > >
> > > > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > > > Cc: stable at dpdk.org
> > > >
> > > > Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
> > > > ---
> > > >  drivers/net/failsafe/failsafe_ether.c |  1 +
> > > >  drivers/net/failsafe/failsafe_ops.c   | 52
> > > +++++++++++++++++++++++++++++------
> > > >  2 files changed, 45 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > > > b/drivers/net/failsafe/failsafe_ether.c
> > > > index a3a8cce..1def110 100644
> > > > --- a/drivers/net/failsafe/failsafe_ether.c
> > > > +++ b/drivers/net/failsafe/failsafe_ether.c
> > > > @@ -378,6 +378,7 @@
> > >
> > > Could you please generate your patches with the function name in the
> diff?
> >
> > Done
> >
> > >
> > > >  				      i);
> > > >  				goto err_remove;
> > > >  			}
> > > > +			sdev->remove = 0;
> > >
> > > You are adding this here, within failsafe_eth_dev_state_sync, and
> > > removing it from the dev_configure ops.
> > >
> > > 10 lines above, the call to dev_configure is done, meaning that the
> > > remove flag was resetted at this point.
> > >
> > > Can you explain why you prefer resetting the flag here?
> > >
> > > The position of this flag reset will be dependent upon my subsequent
> > > remarks anyway, so hold that thought :) .
> > >
> >
> > The motivation for resetting the "remove" flag within
> failsafe_eth_dev_state_sync is as follows:
> > Previously to this patch the "remove" flag was designed to signal the need
> to remove the sub device.
> > Once the sub device was removed and before being reconfigured the
> "remove" flag was reset.
> >
> > After this patch the scope of the "remove" flag was *extended* to
> > indicate the sub device status as being "plugged out" by resetting this flag
> only after a successful call to failsafe_eth_dev_state_sync().
> > The "plug out" status could last a very long time (seconds, minutes, days,
> weeks, ...).
> >
> > Previously to this patch failsafe based the "plugged out" status on
> > the sub device state as being below ACTIVE however every 2 seconds
> > dev_configure() was called where the sub device was assigned sdev-
> > >state = DEV_ACTIVE; therefore the sub device state became ACTIVE for
> some time every 2 seconds.
> > This is where the race condition occurred: failsafe considered the sub
> > device as "Plugged in" for some time every 2 seconds (based on its ACTIVE
> state) while it was actually plugged out.
> >
> > After this patch the "Plugged out" status is based on the "remove" flag.
> >
> 
> Sorry, I do not agree with this semantical change on the "remove" flag.
> You are essentially adding a new device state, which could be fine per se, but
> should not be done here.
> 
> The enum dev_state is there for this purpose.
> 
> The flag dev->remove, calls for an operation to be done upon the concerned
> device. It is not meant to become a new device state.
> 
> A point about the work methodoly here: if you wanted to change this
> semantic, which could be legitimate and sometimes called for, you should
> have proposed it either during a discussion in a response to my previous
> email, or introducing the change as a separate patch. This point is important
> enough for it to have its own patch, meaning we would have a whole thread
> dedicated to it instead of having to interleave commentaries between
> related-but-separate diffs on the code.
> 
> But anyway, if you think you need to express a PLUGOUT state, I'd suggest
> adding a state between DEV_UNDEFINED and DEV_PARSED.
> DEV_UNDEFINED means that the device is in limbo and has no existence per
> se (its parsing failed for example, it is not clear whether the parameters are
> correct, etc...). DEV_PLUGOUT could mean then that the device has been
> successfully probed at least once, meaning that it could possibly have
> residuals from this probing still there, or specific care to be taken when
> manipulating it.
> 
> However, I'm not yet convinced that this new state is necessary. I think you
> can mitigate this race condition without having to add it. If you insist in
> introducing this state, please do so in a separate patch, with proper
> definition about the meaning of this state:
> 
>   + When it should be valid for a device to be in this state.
>   + Which operation corresponds to getting into and out of this state.
>   + Why this state is interesting and what could not be expressed before
>     that is thus being fixed by introducing this state.
> 
> But please verify twice whether you absolutely need to complexify the
> current fail-safe internals before going all in and basing your work upon it :)
> 

Indeed what I am currently missing in failsafe is knowing the device is in a PLUGOUT state.
Your suggestion to add a new state DEV_PLUGOUT cannot be used with the current implementation
as the device states are modified during an alarm hotplug handling every 2 seconds.
In fs_hotplug_alarm() we call failsafe_eth_dev_state_sync(dev) which eventually calls ->dev_configure(dev) 
where we assign: sdev->state = DEV_ACTIVE; 
then when sync fails fs_hotplug_alarm() calls failsafe_dev_remove(dev) which will call fs_dev_remove(sdev); where the sub devices
states are changed from ACTIVE down to DEV_UNDEFINED.
Having said that it means that during a PLUGOUT event - the states are modified with each invocation of the fs_hotplug_alarm
every 2 seconds. So even if we added DEV_PLUGOUT state - it will not remain fixed during the hotplug alarm handling.
I have also verified all of this with printouts.
When seeing a sub device in state "DEV_ACTIVE" - we cannot tell whether the device is currently in "PLUGOUT situation" or "PLUGIN situation"
This allows operations such as fs_mtu_set() on sub-devices which are in "PLUGOUT situation" while their state is 
DEV_ACTIVE to be manipulated, which I think should have been avoided.

Please note fs_mtu_set() implementation:
tatic int fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
  ..
        FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
	// ***** We are here while the device can be in a "PLUGOUT situation" ***

To summarize:
I am missing a way to know in failsafe that a sub-device is currently plugged out
1. I suggested extending the "remove" flag scope for this purpose. It has minimal changes with current failsafe implementation. You prefer not using "remove".
2. You suggested adding a new state DEV_PLUGOUT. I don't think it will work with current implementation (as explained above) or may require a redesign of current implementation.

Can you suggest another way?

> > > >  		}
> > > >  	}
> > > >  	/*
> > > > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > > > b/drivers/net/failsafe/failsafe_ops.c
> > > > index ff9ad15..314d53d 100644
> > > > --- a/drivers/net/failsafe/failsafe_ops.c
> > > > +++ b/drivers/net/failsafe/failsafe_ops.c
> > > > @@ -232,7 +232,6 @@
> > > >  			dev->data->dev_conf.intr_conf.lsc = 0;
> > > >  		}
> > > >  		DEBUG("Configuring sub-device %d", i);
> > > > -		sdev->remove = 0;
> > > >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> > > >  					dev->data->nb_rx_queues,
> > > >  					dev->data->nb_tx_queues,
> > > > @@ -311,6 +310,8 @@
> > > >  	int ret;
> > > >
> > > >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > > > +		if (sdev->remove)
> > > > +			continue;
> > > >  		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d",
> > > i);
> > > >  		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
> > > >  		if (ret) {
> > > > @@ -330,6 +331,8 @@
> > > >  	int ret;
> > > >
> > > >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > > > +		if (sdev->remove)
> > > > +			continue;
> > >
> > > For this change and all the others:
> > >
> > > I think it might be best to have this check added to fs_find_next directly.
> > >
> > > Most of the call to the iterators are done within dev_ops, so it
> > > makes sense I think to have it there.
> > >
> > > But then there'd be an issue with the sub-EAL iterations done on
> > > previously- removed ports, as the removed flag is precisely resetted
> > > too late. The function failsafe_dev_remove would also need to have a
> > > manual iteration upon the sub-devices instead of using the macro.
> > >
> > > I think you can actually reset this flag within fs_dev_remove,
> > > instead of the next plug-in, then having this check within
> > > fs_find_next
> > > *should* not be a problem.
> > >
> >
> > With the new scope of "remove" flag (remaining set to 1 as long as the sub
> device is "plugged out"
> > which may last for a very long time) we cannot reset it in
> > fs_dev_remove which is called every 2 seconds.
> >
> 
> With the remove flag staying as it is, I think it should thus be resetted within
> fs_dev_remove. Actually I think it both helps you write you fix, and clarify the
> meaning and intended purpose of this flag.
> 
> > > I think you should break up those changes in two: first move the
> > > flag reset to fs_dev_remove instead of fs_dev_configure, then add
> > > this check to the iterator.
> > >
> 
> Please, do this fix this way. I think moving the dev->remove flag can have
> subtile consequences, and I'd like to have a specific commit to trace back
> which one is responsible.
> 
> > > This way, a git bisect should allow us to pinpoint more easily any
> > > new bug as both changes have the potential to introduce subtle ones.
> > >
> 
> Well, like I said :).
> 
> >
> > I suggest defining a new macro
> >
> > FOREACH_SUBDEV_ACTIVE(sdev, i, dev)  { ...
> >
> > that will replace all cases of:
> >
> > FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > 		if (sdev->remove)
> > 			continue;
> >
> > In order to support the new macro I added a "check_remove" flag to
> > fs_find_next (which is based on your idea above: "I think it might be best to
> have this check added to fs_find_next directly").
> >
> 
> I'd prefer avoiding multiplying the macros.

I agree. Should be avoided.

> There are already two iterators. You add one, which now means that there
> are two ways of iterating upon active devices: using you new macro, and
> using the old one. The difference between the two would be difficult to
> know, without profound knowledge of the rest of the code: that in one
> place the flag is checked, and in the other it is not.
> 
> As such, I suggest you check in all cases that the flag is not set. This
> simplifies the use of these macros and the conditions in which their use
> is correct.
> 
> This means that you have to manually iterate in places where this flag
> should be ignored. I listed these places in my previous email, but I may
> have missed some, please be careful.
> 

I already did it in v1 then changed it in V2/3 based on reviews (probably my misunderstanding)

> Thanks,
> --
> Gaëtan Rivet
> 6WIND


More information about the dev mailing list