[dpdk-dev,v3] net/failsafe: fix calling device during RMV events

Message ID 1507243328-11287-1-git-send-email-ophirmu@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Ophir Munk Oct. 5, 2017, 10:42 p.m. UTC
  This commit prevents control path operations from failing after a sub
device removal.

Following are the failure steps:
1. The physical device is removed due to change in one of PF parameters
(e.g. MTU)
2. The interrupt thread flags the device
3. Within 2 seconds Interrupt thread initializes the actual device removal,
then every 2 seconds it tries to re-sync (plug in) the device. The trials
fail as long as VF parameter mismatches the PF parameter.
4. A control thread initiates a control operation on failsafe which
initiates this operation on the device.
5. A race condition occurs between the control thread and interrupt thread
when accessing the device data structures.

This commit prevents the race condition in step 5. Before this commit if a
device was removed and then a control thread operation was initiated on
failsafe - in some cases failsafe called the sub device operation instead
of avoiding it. Such cases could lead to operations failures.

This commit fixes failsafe criteria to determine when the device is removed
such that it will avoid calling the sub device operations during that time
and will only call them otherwise.

Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Cc: stable@dpdk.org

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
---
v3:
1. Rebase v2

2. Please ignore checkpatch checks on arguments re-usage - they are confirmed.
	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-effects?
	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:

3. Add rationales (copy from an email which accompanied v2):

On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
> 
> Hi Ophir,
> 
> On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
> > This commit prevents control path operations from failing after a 
> > sub device has informed failsafe it has been removed.
> >
> > Before this commit if a device was removed and then a control path
> 
> Here are the steps if I understood correctly:
> 
> 0. The physical device is removed
> 1. The interrupt thread flags the device 2. A control lcore initiates 
> a control operation 3. The alarm triggers, waking up the eal-intr-thread,
>    initiating the actual device removal.
> 4. Race condition occurs between control lcore and interrupt thread.
> 
> "if a device was removed" is ambiguous I think (are we speaking about 
> the physical port? Is it only flagged? Is it after the removal of the device itself?).
> From the context I gather that you mean the device is flagged to be 
> removed, but it won't be as clear in a few month when we revisit this bug :) .
> 
> Could you please rephrase this so that the whole context of the issue 
> is available?
> 

Done. Commit message was rephrased based on your comments 

> > operations was initiated on failsafe - in some cases failsafe called 
> > the sub device operation instead of avoiding it. Such cases could 
> > lead to operations failures.
> >
> > This commit fixes failsafe criteria to determine when the device is 
> > removed such that it will avoid calling the sub device operations 
> > during that time and will only call them otherwise.
> >
> 
> This commit mitigates the race condition, reducing the probability for 
> it to have an effect. It does not, however, remove this race 
> condition, which is inherent to the DPDK architecture at the moment.
> 
> A proper fix, a more detailed workaround and additional documentation 
> warning users writing applications to mind their threads could be interesting.
> 

The race condition occurs in the last step and may lead to segmentation faults (accessing data structures 
of the same device by 2 threads) The previous steps ("the physical device is removed", etc) were not 
recreated and tested but probably cannot lead to segmentation fault. 

> But let's focus on this patch for the time being.
> 
> > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c |  1 +
> >  drivers/net/failsafe/failsafe_ops.c   | 52
> +++++++++++++++++++++++++++++------
> >  2 files changed, 45 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index a3a8cce..1def110 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -378,6 +378,7 @@
> 
> Could you please generate your patches with the function name in the diff?

Done 

> 
> >  				      i);
> >  				goto err_remove;
> >  			}
> > +			sdev->remove = 0;
> 
> You are adding this here, within failsafe_eth_dev_state_sync, and 
> removing it from the dev_configure ops.
> 
> 10 lines above, the call to dev_configure is done, meaning that the 
> remove flag was resetted at this point.
> 
> Can you explain why you prefer resetting the flag here?
> 
> The position of this flag reset will be dependent upon my subsequent 
> remarks anyway, so hold that thought :) .
> 

The motivation for resetting the "remove" flag within failsafe_eth_dev_state_sync is as follows:
Previously to this patch the "remove" flag was designed to signal the need to remove the sub device. 
Once the sub device was removed and before being reconfigured the "remove" flag was reset. 

After this patch the scope of the "remove" flag was *extended* to indicate the sub device status as 
being "plugged out" by resetting this flag only after a successful call to failsafe_eth_dev_state_sync(). 
The "plug out" status could last a very long time (seconds, minutes, days, weeks, ...).

Previously to this patch failsafe based the "plugged out" status on the sub device state as being below 
ACTIVE however every 2 seconds dev_configure() was called where the sub device was assigned sdev-
>state = DEV_ACTIVE; therefore the sub device state became ACTIVE for some time every 2 seconds. 
This is where the race condition occurred: failsafe considered the sub device as "Plugged in" for some 
time every 2 seconds (based on its ACTIVE state) while it was actually plugged out. 

After this patch the "Plugged out" status is based on the "remove" flag.

> >  		}
> >  	}
> >  	/*
> > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > b/drivers/net/failsafe/failsafe_ops.c
> > index ff9ad15..314d53d 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -232,7 +232,6 @@
> >  			dev->data->dev_conf.intr_conf.lsc = 0;
> >  		}
> >  		DEBUG("Configuring sub-device %d", i);
> > -		sdev->remove = 0;
> >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> >  					dev->data->nb_rx_queues,
> >  					dev->data->nb_tx_queues,
> > @@ -311,6 +310,8 @@
> >  	int ret;
> >
> >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		if (sdev->remove)
> > +			continue;
> >  		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d",
> i);
> >  		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
> >  		if (ret) {
> > @@ -330,6 +331,8 @@
> >  	int ret;
> >
> >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		if (sdev->remove)
> > +			continue;
> 
> For this change and all the others:
> 
> I think it might be best to have this check added to fs_find_next directly.
> 
> Most of the call to the iterators are done within dev_ops, so it makes 
> sense I think to have it there.
> 
> But then there'd be an issue with the sub-EAL iterations done on 
> previously- removed ports, as the removed flag is precisely resetted 
> too late. The function failsafe_dev_remove would also need to have a 
> manual iteration upon the sub-devices instead of using the macro.
> 
> I think you can actually reset this flag within fs_dev_remove, instead 
> of the next plug-in, then having this check within fs_find_next 
> *should* not be a problem.
> 

With the new scope of "remove" flag (remaining set to 1 as long as the sub device is "plugged out" 
which may last for a very long time) we cannot reset it in fs_dev_remove which is called every 2 
seconds.

> I think you should break up those changes in two: first move the flag 
> reset to fs_dev_remove instead of fs_dev_configure, then add this 
> check to the iterator.
> 
> This way, a git bisect should allow us to pinpoint more easily any new 
> bug as both changes have the potential to introduce subtle ones.
> 

I suggest defining a new macro 

FOREACH_SUBDEV_ACTIVE(sdev, i, dev)  { ...

that will replace all cases of:

FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
		if (sdev->remove)
			continue;

In order to support the new macro I added a "check_remove" flag to fs_find_next (which is based on 
your idea above: "I think it might be best to have this check added to fs_find_next directly"). 

> >  		DEBUG("Calling rte_eth_dev_set_link_down on sub_device
> %d", i);
> >  		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
> >  		if (ret) {
> > @@ -517,8 +520,11 @@
> >  	struct sub_device *sdev;
> >  	uint8_t i;
> >
> > -	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
> > +	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		if (sdev->remove)
> > +			continue;
> >  		rte_eth_promiscuous_enable(PORT_ID(sdev));
> > +	}
> >  }
> >
> >  static void
> 
> <snip>
> 
> > --
> > 1.8.3.1
> >
> 
> Thanks,
> --
> Gaetan Rivet
> 6WIND

 drivers/net/failsafe/failsafe_ether.c   |  1 +
 drivers/net/failsafe/failsafe_ops.c     | 31 +++++++++++++++----------------
 drivers/net/failsafe/failsafe_private.h | 26 ++++++++++++++++++++++----
 3 files changed, 38 insertions(+), 20 deletions(-)
  

Comments

Gaëtan Rivet Oct. 20, 2017, 10:35 a.m. UTC | #1
Hi Ophir,

Sorry about the delay,
I have a few remarks, I think this patch could be simpler.

First, about the commit logline:
"calling device" is not descriptive enough. I'd suggest

    net/failsafe: fix device configuration during RMV events

But I'm not a native speaker either, so use it if you think it is
better, or don't, it's only a suggestion :).

On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> This commit prevents control path operations from failing after a sub
> device removal.
> 
> Following are the failure steps:
> 1. The physical device is removed due to change in one of PF parameters
> (e.g. MTU)
> 2. The interrupt thread flags the device
> 3. Within 2 seconds Interrupt thread initializes the actual device removal,
> then every 2 seconds it tries to re-sync (plug in) the device. The trials
> fail as long as VF parameter mismatches the PF parameter.
> 4. A control thread initiates a control operation on failsafe which
> initiates this operation on the device.
> 5. A race condition occurs between the control thread and interrupt thread
> when accessing the device data structures.
> 
> This commit prevents the race condition in step 5. Before this commit if a
> device was removed and then a control thread operation was initiated on
> failsafe - in some cases failsafe called the sub device operation instead
> of avoiding it. Such cases could lead to operations failures.
> 

This is a nitpick, but as said earlier, this is not preventing the race
condition. This race is still present and can still wreak havok on
unsuspecting users.

If an application has a weak threading model, it will be subject to this
race condition still. It is possible to prevent it fully with proper
care from the application standpoint, but this is not specific to
fail-safe and does not concern us here.

Anyway, it's really a nitpick, I just wanted to point it out. This is
not too important for this patch.

> This commit fixes failsafe criteria to determine when the device is removed
> such that it will avoid calling the sub device operations during that time
> and will only call them otherwise.
> 
> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---
> v3:
> 1. Rebase v2
> 
> 2. Please ignore checkpatch checks on arguments re-usage - they are confirmed.
> 	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-effects?
> 	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:
> 
> 3. Add rationales (copy from an email which accompanied v2):
> 
> On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
> > 
> > Hi Ophir,
> > 
> > On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
> > > This commit prevents control path operations from failing after a 
> > > sub device has informed failsafe it has been removed.
> > >
> > > Before this commit if a device was removed and then a control path
> > 
> > Here are the steps if I understood correctly:
> > 
> > 0. The physical device is removed
> > 1. The interrupt thread flags the device 2. A control lcore initiates 
> > a control operation 3. The alarm triggers, waking up the eal-intr-thread,
> >    initiating the actual device removal.
> > 4. Race condition occurs between control lcore and interrupt thread.
> > 
> > "if a device was removed" is ambiguous I think (are we speaking about 
> > the physical port? Is it only flagged? Is it after the removal of the device itself?).
> > From the context I gather that you mean the device is flagged to be 
> > removed, but it won't be as clear in a few month when we revisit this bug :) .
> > 
> > Could you please rephrase this so that the whole context of the issue 
> > is available?
> > 
> 
> Done. Commit message was rephrased based on your comments 
> 
> > > operations was initiated on failsafe - in some cases failsafe called 
> > > the sub device operation instead of avoiding it. Such cases could 
> > > lead to operations failures.
> > >
> > > This commit fixes failsafe criteria to determine when the device is 
> > > removed such that it will avoid calling the sub device operations 
> > > during that time and will only call them otherwise.
> > >
> > 
> > This commit mitigates the race condition, reducing the probability for 
> > it to have an effect. It does not, however, remove this race 
> > condition, which is inherent to the DPDK architecture at the moment.
> > 
> > A proper fix, a more detailed workaround and additional documentation 
> > warning users writing applications to mind their threads could be interesting.
> > 
> 
> The race condition occurs in the last step and may lead to segmentation faults (accessing data structures 
> of the same device by 2 threads) The previous steps ("the physical device is removed", etc) were not 
> recreated and tested but probably cannot lead to segmentation fault. 
> 
> > But let's focus on this patch for the time being.
> > 
> > > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > > ---
> > >  drivers/net/failsafe/failsafe_ether.c |  1 +
> > >  drivers/net/failsafe/failsafe_ops.c   | 52
> > +++++++++++++++++++++++++++++------
> > >  2 files changed, 45 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > > b/drivers/net/failsafe/failsafe_ether.c
> > > index a3a8cce..1def110 100644
> > > --- a/drivers/net/failsafe/failsafe_ether.c
> > > +++ b/drivers/net/failsafe/failsafe_ether.c
> > > @@ -378,6 +378,7 @@
> > 
> > Could you please generate your patches with the function name in the diff?
> 
> Done 
> 
> > 
> > >  				      i);
> > >  				goto err_remove;
> > >  			}
> > > +			sdev->remove = 0;
> > 
> > You are adding this here, within failsafe_eth_dev_state_sync, and 
> > removing it from the dev_configure ops.
> > 
> > 10 lines above, the call to dev_configure is done, meaning that the 
> > remove flag was resetted at this point.
> > 
> > Can you explain why you prefer resetting the flag here?
> > 
> > The position of this flag reset will be dependent upon my subsequent 
> > remarks anyway, so hold that thought :) .
> > 
> 
> The motivation for resetting the "remove" flag within failsafe_eth_dev_state_sync is as follows:
> Previously to this patch the "remove" flag was designed to signal the need to remove the sub device. 
> Once the sub device was removed and before being reconfigured the "remove" flag was reset. 
> 
> After this patch the scope of the "remove" flag was *extended* to indicate the sub device status as 
> being "plugged out" by resetting this flag only after a successful call to failsafe_eth_dev_state_sync(). 
> The "plug out" status could last a very long time (seconds, minutes, days, weeks, ...).
> 
> Previously to this patch failsafe based the "plugged out" status on the sub device state as being below 
> ACTIVE however every 2 seconds dev_configure() was called where the sub device was assigned sdev-
> >state = DEV_ACTIVE; therefore the sub device state became ACTIVE for some time every 2 seconds. 
> This is where the race condition occurred: failsafe considered the sub device as "Plugged in" for some 
> time every 2 seconds (based on its ACTIVE state) while it was actually plugged out. 
> 
> After this patch the "Plugged out" status is based on the "remove" flag.
> 

Sorry, I do not agree with this semantical change on the "remove" flag.
You are essentially adding a new device state, which could be fine per
se, but should not be done here.

The enum dev_state is there for this purpose.

The flag dev->remove, calls for an operation to be done upon the
concerned device. It is not meant to become a new device state.

A point about the work methodoly here: if you wanted to change this
semantic, which could be legitimate and sometimes called for, you should
have proposed it either during a discussion in a response to my previous
email, or introducing the change as a separate patch. This point is
important enough for it to have its own patch, meaning we would have a
whole thread dedicated to it instead of having to interleave
commentaries between related-but-separate diffs on the code.

But anyway, if you think you need to express a PLUGOUT state, I'd
suggest adding a state between DEV_UNDEFINED and DEV_PARSED.
DEV_UNDEFINED means that the device is in limbo and has no existence per
se (its parsing failed for example, it is not clear whether the
parameters are correct, etc...). DEV_PLUGOUT could mean then that the
device has been successfully probed at least once, meaning that it could
possibly have residuals from this probing still there, or specific care
to be taken when manipulating it.

However, I'm not yet convinced that this new state is necessary. I think
you can mitigate this race condition without having to add it. If you
insist in introducing this state, please do so in a separate patch, with
proper definition about the meaning of this state:

  + When it should be valid for a device to be in this state.
  + Which operation corresponds to getting into and out of this state.
  + Why this state is interesting and what could not be expressed before
    that is thus being fixed by introducing this state.

But please verify twice whether you absolutely need to complexify the
current fail-safe internals before going all in and basing your work
upon it :)

> > >  		}
> > >  	}
> > >  	/*
> > > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > > b/drivers/net/failsafe/failsafe_ops.c
> > > index ff9ad15..314d53d 100644
> > > --- a/drivers/net/failsafe/failsafe_ops.c
> > > +++ b/drivers/net/failsafe/failsafe_ops.c
> > > @@ -232,7 +232,6 @@
> > >  			dev->data->dev_conf.intr_conf.lsc = 0;
> > >  		}
> > >  		DEBUG("Configuring sub-device %d", i);
> > > -		sdev->remove = 0;
> > >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> > >  					dev->data->nb_rx_queues,
> > >  					dev->data->nb_tx_queues,
> > > @@ -311,6 +310,8 @@
> > >  	int ret;
> > >
> > >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > > +		if (sdev->remove)
> > > +			continue;
> > >  		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d",
> > i);
> > >  		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
> > >  		if (ret) {
> > > @@ -330,6 +331,8 @@
> > >  	int ret;
> > >
> > >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > > +		if (sdev->remove)
> > > +			continue;
> > 
> > For this change and all the others:
> > 
> > I think it might be best to have this check added to fs_find_next directly.
> > 
> > Most of the call to the iterators are done within dev_ops, so it makes 
> > sense I think to have it there.
> > 
> > But then there'd be an issue with the sub-EAL iterations done on 
> > previously- removed ports, as the removed flag is precisely resetted 
> > too late. The function failsafe_dev_remove would also need to have a 
> > manual iteration upon the sub-devices instead of using the macro.
> > 
> > I think you can actually reset this flag within fs_dev_remove, instead 
> > of the next plug-in, then having this check within fs_find_next 
> > *should* not be a problem.
> > 
> 
> With the new scope of "remove" flag (remaining set to 1 as long as the sub device is "plugged out" 
> which may last for a very long time) we cannot reset it in fs_dev_remove which is called every 2 
> seconds.
> 

With the remove flag staying as it is, I think it should thus be
resetted within fs_dev_remove. Actually I think it both helps you write
you fix, and clarify the meaning and intended purpose of this flag.

> > I think you should break up those changes in two: first move the flag 
> > reset to fs_dev_remove instead of fs_dev_configure, then add this 
> > check to the iterator.
> > 

Please, do this fix this way. I think moving the dev->remove flag can
have subtile consequences, and I'd like to have a specific commit to
trace back which one is responsible.

> > This way, a git bisect should allow us to pinpoint more easily any new 
> > bug as both changes have the potential to introduce subtle ones.
> > 

Well, like I said :).

> 
> I suggest defining a new macro 
> 
> FOREACH_SUBDEV_ACTIVE(sdev, i, dev)  { ...
> 
> that will replace all cases of:
> 
> FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> 		if (sdev->remove)
> 			continue;
> 
> In order to support the new macro I added a "check_remove" flag to fs_find_next (which is based on 
> your idea above: "I think it might be best to have this check added to fs_find_next directly"). 
> 

I'd prefer avoiding multiplying the macros.
There are already two iterators. You add one, which now means that there
are two ways of iterating upon active devices: using you new macro, and
using the old one. The difference between the two would be difficult to
know, without profound knowledge of the rest of the code: that in one
place the flag is checked, and in the other it is not.

As such, I suggest you check in all cases that the flag is not set. This
simplifies the use of these macros and the conditions in which their use
is correct.

This means that you have to manually iterate in places where this flag
should be ignored. I listed these places in my previous email, but I may
have missed some, please be careful.

Thanks,
  
Ophir Munk Oct. 23, 2017, 7:17 a.m. UTC | #2
Hi Gaetan,
Thanks for your quick reply. Please see comments inline.

Regards,
Ophir

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Friday, October 20, 2017 1:35 PM
> To: Ophir Munk <ophirmu@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> <olgas@mellanox.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] net/failsafe: fix calling device during RMV events
> 
> Hi Ophir,
> 
> Sorry about the delay,
> I have a few remarks, I think this patch could be simpler.
> 
> First, about the commit logline:
> "calling device" is not descriptive enough. I'd suggest
> 
>     net/failsafe: fix device configuration during RMV events
> 
> But I'm not a native speaker either, so use it if you think it is better, or don't,
> it's only a suggestion :).
> 
> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> > This commit prevents control path operations from failing after a sub
> > device removal.
> >
> > Following are the failure steps:
> > 1. The physical device is removed due to change in one of PF
> > parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
> > Within 2 seconds Interrupt thread initializes the actual device
> > removal, then every 2 seconds it tries to re-sync (plug in) the
> > device. The trials fail as long as VF parameter mismatches the PF
> parameter.
> > 4. A control thread initiates a control operation on failsafe which
> > initiates this operation on the device.
> > 5. A race condition occurs between the control thread and interrupt
> > thread when accessing the device data structures.
> >
> > This commit prevents the race condition in step 5. Before this commit
> > if a device was removed and then a control thread operation was
> > initiated on failsafe - in some cases failsafe called the sub device
> > operation instead of avoiding it. Such cases could lead to operations
> failures.
> >
> 
> This is a nitpick, but as said earlier, this is not preventing the race condition.
> This race is still present and can still wreak havok on unsuspecting users.
> 
> If an application has a weak threading model, it will be subject to this race
> condition still. It is possible to prevent it fully with proper care from the
> application standpoint, but this is not specific to fail-safe and does not
> concern us here.
> 
> Anyway, it's really a nitpick, I just wanted to point it out. This is not too
> important for this patch.
> 
> > This commit fixes failsafe criteria to determine when the device is
> > removed such that it will avoid calling the sub device operations
> > during that time and will only call them otherwise.
> >
> > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > ---
> > v3:
> > 1. Rebase v2
> >
> > 2. Please ignore checkpatch checks on arguments re-usage - they are
> confirmed.
> > 	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-
> effects?
> > 	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:
> >
> > 3. Add rationales (copy from an email which accompanied v2):
> >
> > On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
> > >
> > > Hi Ophir,
> > >
> > > On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
> > > > This commit prevents control path operations from failing after a
> > > > sub device has informed failsafe it has been removed.
> > > >
> > > > Before this commit if a device was removed and then a control path
> > >
> > > Here are the steps if I understood correctly:
> > >
> > > 0. The physical device is removed
> > > 1. The interrupt thread flags the device 2. A control lcore
> > > initiates a control operation 3. The alarm triggers, waking up the eal-intr-
> thread,
> > >    initiating the actual device removal.
> > > 4. Race condition occurs between control lcore and interrupt thread.
> > >
> > > "if a device was removed" is ambiguous I think (are we speaking
> > > about the physical port? Is it only flagged? Is it after the removal of the
> device itself?).
> > > From the context I gather that you mean the device is flagged to be
> > > removed, but it won't be as clear in a few month when we revisit this bug
> :) .
> > >
> > > Could you please rephrase this so that the whole context of the
> > > issue is available?
> > >
> >
> > Done. Commit message was rephrased based on your comments
> >
> > > > operations was initiated on failsafe - in some cases failsafe
> > > > called the sub device operation instead of avoiding it. Such cases
> > > > could lead to operations failures.
> > > >
> > > > This commit fixes failsafe criteria to determine when the device
> > > > is removed such that it will avoid calling the sub device
> > > > operations during that time and will only call them otherwise.
> > > >
> > >
> > > This commit mitigates the race condition, reducing the probability
> > > for it to have an effect. It does not, however, remove this race
> > > condition, which is inherent to the DPDK architecture at the moment.
> > >
> > > A proper fix, a more detailed workaround and additional
> > > documentation warning users writing applications to mind their threads
> could be interesting.
> > >
> >
> > The race condition occurs in the last step and may lead to
> > segmentation faults (accessing data structures of the same device by 2
> > threads) The previous steps ("the physical device is removed", etc) were not
> recreated and tested but probably cannot lead to segmentation fault.
> >
> > > But let's focus on this patch for the time being.
> > >
> > > > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > > > ---
> > > >  drivers/net/failsafe/failsafe_ether.c |  1 +
> > > >  drivers/net/failsafe/failsafe_ops.c   | 52
> > > +++++++++++++++++++++++++++++------
> > > >  2 files changed, 45 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > > > b/drivers/net/failsafe/failsafe_ether.c
> > > > index a3a8cce..1def110 100644
> > > > --- a/drivers/net/failsafe/failsafe_ether.c
> > > > +++ b/drivers/net/failsafe/failsafe_ether.c
> > > > @@ -378,6 +378,7 @@
> > >
> > > Could you please generate your patches with the function name in the
> diff?
> >
> > Done
> >
> > >
> > > >  				      i);
> > > >  				goto err_remove;
> > > >  			}
> > > > +			sdev->remove = 0;
> > >
> > > You are adding this here, within failsafe_eth_dev_state_sync, and
> > > removing it from the dev_configure ops.
> > >
> > > 10 lines above, the call to dev_configure is done, meaning that the
> > > remove flag was resetted at this point.
> > >
> > > Can you explain why you prefer resetting the flag here?
> > >
> > > The position of this flag reset will be dependent upon my subsequent
> > > remarks anyway, so hold that thought :) .
> > >
> >
> > The motivation for resetting the "remove" flag within
> failsafe_eth_dev_state_sync is as follows:
> > Previously to this patch the "remove" flag was designed to signal the need
> to remove the sub device.
> > Once the sub device was removed and before being reconfigured the
> "remove" flag was reset.
> >
> > After this patch the scope of the "remove" flag was *extended* to
> > indicate the sub device status as being "plugged out" by resetting this flag
> only after a successful call to failsafe_eth_dev_state_sync().
> > The "plug out" status could last a very long time (seconds, minutes, days,
> weeks, ...).
> >
> > Previously to this patch failsafe based the "plugged out" status on
> > the sub device state as being below ACTIVE however every 2 seconds
> > dev_configure() was called where the sub device was assigned sdev-
> > >state = DEV_ACTIVE; therefore the sub device state became ACTIVE for
> some time every 2 seconds.
> > This is where the race condition occurred: failsafe considered the sub
> > device as "Plugged in" for some time every 2 seconds (based on its ACTIVE
> state) while it was actually plugged out.
> >
> > After this patch the "Plugged out" status is based on the "remove" flag.
> >
> 
> Sorry, I do not agree with this semantical change on the "remove" flag.
> You are essentially adding a new device state, which could be fine per se, but
> should not be done here.
> 
> The enum dev_state is there for this purpose.
> 
> The flag dev->remove, calls for an operation to be done upon the concerned
> device. It is not meant to become a new device state.
> 
> A point about the work methodoly here: if you wanted to change this
> semantic, which could be legitimate and sometimes called for, you should
> have proposed it either during a discussion in a response to my previous
> email, or introducing the change as a separate patch. This point is important
> enough for it to have its own patch, meaning we would have a whole thread
> dedicated to it instead of having to interleave commentaries between
> related-but-separate diffs on the code.
> 
> But anyway, if you think you need to express a PLUGOUT state, I'd suggest
> adding a state between DEV_UNDEFINED and DEV_PARSED.
> DEV_UNDEFINED means that the device is in limbo and has no existence per
> se (its parsing failed for example, it is not clear whether the parameters are
> correct, etc...). DEV_PLUGOUT could mean then that the device has been
> successfully probed at least once, meaning that it could possibly have
> residuals from this probing still there, or specific care to be taken when
> manipulating it.
> 
> However, I'm not yet convinced that this new state is necessary. I think you
> can mitigate this race condition without having to add it. If you insist in
> introducing this state, please do so in a separate patch, with proper
> definition about the meaning of this state:
> 
>   + When it should be valid for a device to be in this state.
>   + Which operation corresponds to getting into and out of this state.
>   + Why this state is interesting and what could not be expressed before
>     that is thus being fixed by introducing this state.
> 
> But please verify twice whether you absolutely need to complexify the
> current fail-safe internals before going all in and basing your work upon it :)
> 

Indeed what I am currently missing in failsafe is knowing the device is in a PLUGOUT state.
Your suggestion to add a new state DEV_PLUGOUT cannot be used with the current implementation
as the device states are modified during an alarm hotplug handling every 2 seconds.
In fs_hotplug_alarm() we call failsafe_eth_dev_state_sync(dev) which eventually calls ->dev_configure(dev) 
where we assign: sdev->state = DEV_ACTIVE; 
then when sync fails fs_hotplug_alarm() calls failsafe_dev_remove(dev) which will call fs_dev_remove(sdev); where the sub devices
states are changed from ACTIVE down to DEV_UNDEFINED.
Having said that it means that during a PLUGOUT event - the states are modified with each invocation of the fs_hotplug_alarm
every 2 seconds. So even if we added DEV_PLUGOUT state - it will not remain fixed during the hotplug alarm handling.
I have also verified all of this with printouts.
When seeing a sub device in state "DEV_ACTIVE" - we cannot tell whether the device is currently in "PLUGOUT situation" or "PLUGIN situation"
This allows operations such as fs_mtu_set() on sub-devices which are in "PLUGOUT situation" while their state is 
DEV_ACTIVE to be manipulated, which I think should have been avoided.

Please note fs_mtu_set() implementation:
tatic int fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
  ..
        FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
	// ***** We are here while the device can be in a "PLUGOUT situation" ***

To summarize:
I am missing a way to know in failsafe that a sub-device is currently plugged out
1. I suggested extending the "remove" flag scope for this purpose. It has minimal changes with current failsafe implementation. You prefer not using "remove".
2. You suggested adding a new state DEV_PLUGOUT. I don't think it will work with current implementation (as explained above) or may require a redesign of current implementation.

Can you suggest another way?

> > > >  		}
> > > >  	}
> > > >  	/*
> > > > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > > > b/drivers/net/failsafe/failsafe_ops.c
> > > > index ff9ad15..314d53d 100644
> > > > --- a/drivers/net/failsafe/failsafe_ops.c
> > > > +++ b/drivers/net/failsafe/failsafe_ops.c
> > > > @@ -232,7 +232,6 @@
> > > >  			dev->data->dev_conf.intr_conf.lsc = 0;
> > > >  		}
> > > >  		DEBUG("Configuring sub-device %d", i);
> > > > -		sdev->remove = 0;
> > > >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> > > >  					dev->data->nb_rx_queues,
> > > >  					dev->data->nb_tx_queues,
> > > > @@ -311,6 +310,8 @@
> > > >  	int ret;
> > > >
> > > >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > > > +		if (sdev->remove)
> > > > +			continue;
> > > >  		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d",
> > > i);
> > > >  		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
> > > >  		if (ret) {
> > > > @@ -330,6 +331,8 @@
> > > >  	int ret;
> > > >
> > > >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > > > +		if (sdev->remove)
> > > > +			continue;
> > >
> > > For this change and all the others:
> > >
> > > I think it might be best to have this check added to fs_find_next directly.
> > >
> > > Most of the call to the iterators are done within dev_ops, so it
> > > makes sense I think to have it there.
> > >
> > > But then there'd be an issue with the sub-EAL iterations done on
> > > previously- removed ports, as the removed flag is precisely resetted
> > > too late. The function failsafe_dev_remove would also need to have a
> > > manual iteration upon the sub-devices instead of using the macro.
> > >
> > > I think you can actually reset this flag within fs_dev_remove,
> > > instead of the next plug-in, then having this check within
> > > fs_find_next
> > > *should* not be a problem.
> > >
> >
> > With the new scope of "remove" flag (remaining set to 1 as long as the sub
> device is "plugged out"
> > which may last for a very long time) we cannot reset it in
> > fs_dev_remove which is called every 2 seconds.
> >
> 
> With the remove flag staying as it is, I think it should thus be resetted within
> fs_dev_remove. Actually I think it both helps you write you fix, and clarify the
> meaning and intended purpose of this flag.
> 
> > > I think you should break up those changes in two: first move the
> > > flag reset to fs_dev_remove instead of fs_dev_configure, then add
> > > this check to the iterator.
> > >
> 
> Please, do this fix this way. I think moving the dev->remove flag can have
> subtile consequences, and I'd like to have a specific commit to trace back
> which one is responsible.
> 
> > > This way, a git bisect should allow us to pinpoint more easily any
> > > new bug as both changes have the potential to introduce subtle ones.
> > >
> 
> Well, like I said :).
> 
> >
> > I suggest defining a new macro
> >
> > FOREACH_SUBDEV_ACTIVE(sdev, i, dev)  { ...
> >
> > that will replace all cases of:
> >
> > FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > 		if (sdev->remove)
> > 			continue;
> >
> > In order to support the new macro I added a "check_remove" flag to
> > fs_find_next (which is based on your idea above: "I think it might be best to
> have this check added to fs_find_next directly").
> >
> 
> I'd prefer avoiding multiplying the macros.

I agree. Should be avoided.

> There are already two iterators. You add one, which now means that there
> are two ways of iterating upon active devices: using you new macro, and
> using the old one. The difference between the two would be difficult to
> know, without profound knowledge of the rest of the code: that in one
> place the flag is checked, and in the other it is not.
> 
> As such, I suggest you check in all cases that the flag is not set. This
> simplifies the use of these macros and the conditions in which their use
> is correct.
> 
> This means that you have to manually iterate in places where this flag
> should be ignored. I listed these places in my previous email, but I may
> have missed some, please be careful.
> 

I already did it in v1 then changed it in V2/3 based on reviews (probably my misunderstanding)

> Thanks,
> --
> Gaëtan Rivet
> 6WIND
  
Gaëtan Rivet Oct. 23, 2017, 8:36 a.m. UTC | #3
On Mon, Oct 23, 2017 at 07:17:41AM +0000, Ophir Munk wrote:
> Hi Gaetan,
> Thanks for your quick reply. Please see comments inline.
> 
> Regards,
> Ophir
> 
> > -----Original Message-----
> > From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> > Sent: Friday, October 20, 2017 1:35 PM
> > To: Ophir Munk <ophirmu@mellanox.com>
> > Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > <olgas@mellanox.com>; stable@dpdk.org
> > Subject: Re: [PATCH v3] net/failsafe: fix calling device during RMV events
> > 
> > Hi Ophir,
> > 
> > Sorry about the delay,
> > I have a few remarks, I think this patch could be simpler.
> > 
> > First, about the commit logline:
> > "calling device" is not descriptive enough. I'd suggest
> > 
> >     net/failsafe: fix device configuration during RMV events
> > 
> > But I'm not a native speaker either, so use it if you think it is better, or don't,
> > it's only a suggestion :).
> > 
> > On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> > > This commit prevents control path operations from failing after a sub
> > > device removal.
> > >
> > > Following are the failure steps:
> > > 1. The physical device is removed due to change in one of PF
> > > parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
> > > Within 2 seconds Interrupt thread initializes the actual device
> > > removal, then every 2 seconds it tries to re-sync (plug in) the
> > > device. The trials fail as long as VF parameter mismatches the PF
> > parameter.
> > > 4. A control thread initiates a control operation on failsafe which
> > > initiates this operation on the device.
> > > 5. A race condition occurs between the control thread and interrupt
> > > thread when accessing the device data structures.
> > >
> > > This commit prevents the race condition in step 5. Before this commit
> > > if a device was removed and then a control thread operation was
> > > initiated on failsafe - in some cases failsafe called the sub device
> > > operation instead of avoiding it. Such cases could lead to operations
> > failures.
> > >
> > 
> > This is a nitpick, but as said earlier, this is not preventing the race condition.
> > This race is still present and can still wreak havok on unsuspecting users.
> > 
> > If an application has a weak threading model, it will be subject to this race
> > condition still. It is possible to prevent it fully with proper care from the
> > application standpoint, but this is not specific to fail-safe and does not
> > concern us here.
> > 
> > Anyway, it's really a nitpick, I just wanted to point it out. This is not too
> > important for this patch.
> > 
> > > This commit fixes failsafe criteria to determine when the device is
> > > removed such that it will avoid calling the sub device operations
> > > during that time and will only call them otherwise.
> > >
> > > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > > ---
> > > v3:
> > > 1. Rebase v2
> > >
> > > 2. Please ignore checkpatch checks on arguments re-usage - they are
> > confirmed.
> > > 	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-
> > effects?
> > > 	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:
> > >
> > > 3. Add rationales (copy from an email which accompanied v2):
> > >
> > > On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
> > > >
> > > > Hi Ophir,
> > > >
> > > > On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
> > > > > This commit prevents control path operations from failing after a
> > > > > sub device has informed failsafe it has been removed.
> > > > >
> > > > > Before this commit if a device was removed and then a control path
> > > >
> > > > Here are the steps if I understood correctly:
> > > >
> > > > 0. The physical device is removed
> > > > 1. The interrupt thread flags the device 2. A control lcore
> > > > initiates a control operation 3. The alarm triggers, waking up the eal-intr-
> > thread,
> > > >    initiating the actual device removal.
> > > > 4. Race condition occurs between control lcore and interrupt thread.
> > > >
> > > > "if a device was removed" is ambiguous I think (are we speaking
> > > > about the physical port? Is it only flagged? Is it after the removal of the
> > device itself?).
> > > > From the context I gather that you mean the device is flagged to be
> > > > removed, but it won't be as clear in a few month when we revisit this bug
> > :) .
> > > >
> > > > Could you please rephrase this so that the whole context of the
> > > > issue is available?
> > > >
> > >
> > > Done. Commit message was rephrased based on your comments
> > >
> > > > > operations was initiated on failsafe - in some cases failsafe
> > > > > called the sub device operation instead of avoiding it. Such cases
> > > > > could lead to operations failures.
> > > > >
> > > > > This commit fixes failsafe criteria to determine when the device
> > > > > is removed such that it will avoid calling the sub device
> > > > > operations during that time and will only call them otherwise.
> > > > >
> > > >
> > > > This commit mitigates the race condition, reducing the probability
> > > > for it to have an effect. It does not, however, remove this race
> > > > condition, which is inherent to the DPDK architecture at the moment.
> > > >
> > > > A proper fix, a more detailed workaround and additional
> > > > documentation warning users writing applications to mind their threads
> > could be interesting.
> > > >
> > >
> > > The race condition occurs in the last step and may lead to
> > > segmentation faults (accessing data structures of the same device by 2
> > > threads) The previous steps ("the physical device is removed", etc) were not
> > recreated and tested but probably cannot lead to segmentation fault.
> > >
> > > > But let's focus on this patch for the time being.
> > > >
> > > > > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > > > > ---
> > > > >  drivers/net/failsafe/failsafe_ether.c |  1 +
> > > > >  drivers/net/failsafe/failsafe_ops.c   | 52
> > > > +++++++++++++++++++++++++++++------
> > > > >  2 files changed, 45 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > > > > b/drivers/net/failsafe/failsafe_ether.c
> > > > > index a3a8cce..1def110 100644
> > > > > --- a/drivers/net/failsafe/failsafe_ether.c
> > > > > +++ b/drivers/net/failsafe/failsafe_ether.c
> > > > > @@ -378,6 +378,7 @@
> > > >
> > > > Could you please generate your patches with the function name in the
> > diff?
> > >
> > > Done
> > >
> > > >
> > > > >  				      i);
> > > > >  				goto err_remove;
> > > > >  			}
> > > > > +			sdev->remove = 0;
> > > >
> > > > You are adding this here, within failsafe_eth_dev_state_sync, and
> > > > removing it from the dev_configure ops.
> > > >
> > > > 10 lines above, the call to dev_configure is done, meaning that the
> > > > remove flag was resetted at this point.
> > > >
> > > > Can you explain why you prefer resetting the flag here?
> > > >
> > > > The position of this flag reset will be dependent upon my subsequent
> > > > remarks anyway, so hold that thought :) .
> > > >
> > >
> > > The motivation for resetting the "remove" flag within
> > failsafe_eth_dev_state_sync is as follows:
> > > Previously to this patch the "remove" flag was designed to signal the need
> > to remove the sub device.
> > > Once the sub device was removed and before being reconfigured the
> > "remove" flag was reset.
> > >
> > > After this patch the scope of the "remove" flag was *extended* to
> > > indicate the sub device status as being "plugged out" by resetting this flag
> > only after a successful call to failsafe_eth_dev_state_sync().
> > > The "plug out" status could last a very long time (seconds, minutes, days,
> > weeks, ...).
> > >
> > > Previously to this patch failsafe based the "plugged out" status on
> > > the sub device state as being below ACTIVE however every 2 seconds
> > > dev_configure() was called where the sub device was assigned sdev-
> > > >state = DEV_ACTIVE; therefore the sub device state became ACTIVE for
> > some time every 2 seconds.
> > > This is where the race condition occurred: failsafe considered the sub
> > > device as "Plugged in" for some time every 2 seconds (based on its ACTIVE
> > state) while it was actually plugged out.
> > >
> > > After this patch the "Plugged out" status is based on the "remove" flag.
> > >
> > 
> > Sorry, I do not agree with this semantical change on the "remove" flag.
> > You are essentially adding a new device state, which could be fine per se, but
> > should not be done here.
> > 
> > The enum dev_state is there for this purpose.
> > 
> > The flag dev->remove, calls for an operation to be done upon the concerned
> > device. It is not meant to become a new device state.
> > 
> > A point about the work methodoly here: if you wanted to change this
> > semantic, which could be legitimate and sometimes called for, you should
> > have proposed it either during a discussion in a response to my previous
> > email, or introducing the change as a separate patch. This point is important
> > enough for it to have its own patch, meaning we would have a whole thread
> > dedicated to it instead of having to interleave commentaries between
> > related-but-separate diffs on the code.
> > 
> > But anyway, if you think you need to express a PLUGOUT state, I'd suggest
> > adding a state between DEV_UNDEFINED and DEV_PARSED.
> > DEV_UNDEFINED means that the device is in limbo and has no existence per
> > se (its parsing failed for example, it is not clear whether the parameters are
> > correct, etc...). DEV_PLUGOUT could mean then that the device has been
> > successfully probed at least once, meaning that it could possibly have
> > residuals from this probing still there, or specific care to be taken when
> > manipulating it.
> > 
> > However, I'm not yet convinced that this new state is necessary. I think you
> > can mitigate this race condition without having to add it. If you insist in
> > introducing this state, please do so in a separate patch, with proper
> > definition about the meaning of this state:
> > 
> >   + When it should be valid for a device to be in this state.
> >   + Which operation corresponds to getting into and out of this state.
> >   + Why this state is interesting and what could not be expressed before
> >     that is thus being fixed by introducing this state.
> > 
> > But please verify twice whether you absolutely need to complexify the
> > current fail-safe internals before going all in and basing your work upon it :)
> > 
> 
> Indeed what I am currently missing in failsafe is knowing the device is in a PLUGOUT state.
> Your suggestion to add a new state DEV_PLUGOUT cannot be used with the current implementation
> as the device states are modified during an alarm hotplug handling every 2 seconds.
> In fs_hotplug_alarm() we call failsafe_eth_dev_state_sync(dev) which eventually calls ->dev_configure(dev) 
> where we assign: sdev->state = DEV_ACTIVE; 
> then when sync fails fs_hotplug_alarm() calls failsafe_dev_remove(dev) which will call fs_dev_remove(sdev); where the sub devices
> states are changed from ACTIVE down to DEV_UNDEFINED.
> Having said that it means that during a PLUGOUT event - the states are modified with each invocation of the fs_hotplug_alarm
> every 2 seconds. So even if we added DEV_PLUGOUT state - it will not remain fixed during the hotplug alarm handling.
> I have also verified all of this with printouts.
> When seeing a sub device in state "DEV_ACTIVE" - we cannot tell whether the device is currently in "PLUGOUT situation" or "PLUGIN situation"
> This allows operations such as fs_mtu_set() on sub-devices which are in "PLUGOUT situation" while their state is 
> DEV_ACTIVE to be manipulated, which I think should have been avoided.
> 
> Please note fs_mtu_set() implementation:
> tatic int fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> {
>   ..
>         FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
> 	// ***** We are here while the device can be in a "PLUGOUT situation" ***
> 
> To summarize:
> I am missing a way to know in failsafe that a sub-device is currently plugged out

(sdev->state < DEV_ACTIVE && !sdev->remove) means that the device is
plugged out.

> 1. I suggested extending the "remove" flag scope for this purpose. It has minimal changes with current failsafe implementation. You prefer not using "remove".

I prefer using it, but as a flag, not as a device state.

> 2. You suggested adding a new state DEV_PLUGOUT. I don't think it will work with current implementation (as explained above) or may require a redesign of current implementation.
> 

I do not suggest adding a new state DEV_PLUGOUT.
I suggest using sdev->remove properly.

> Can you suggest another way?
> 

0. In a separate commit, move the
      sdev->remove = 0;
   from fs_dev_configure, into the case DEV_UNDEFINED of the switch
   within fs_dev_remove. This is cleaner and more logical anyway.

1. Check that sdev->remove is not set in fs_find_next.
   If sdev->remove is set, then the device should be skipped.

2. In failsafe_dev_remove, do not use the FOREACH_SUBDEV_STATE iterator,
   but manually iterate over all sub-devices using the subs_tail and
   subs_head values.
   As the generic iterator would skip over devices which have
   sdev->remove set, this function would not work anymore.

3. Find the places I have missed that needs this manual iterator to be
   used instead of the FOREACH_SUBDEV{,_STATE} ones. I think there is at
   least other place that I cannot recall, there might be more.

If you think this does not work, please tell me why, because then it
means that I have misunderstood something about the race condition you
are trying to fix.
  
Ferruh Yigit Nov. 29, 2017, 7:17 p.m. UTC | #4
On 10/23/2017 1:36 AM, Gaëtan Rivet wrote:
> On Mon, Oct 23, 2017 at 07:17:41AM +0000, Ophir Munk wrote:
>> Hi Gaetan,
>> Thanks for your quick reply. Please see comments inline.
>>
>> Regards,
>> Ophir
>>
>>> -----Original Message-----
>>> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
>>> Sent: Friday, October 20, 2017 1:35 PM
>>> To: Ophir Munk <ophirmu@mellanox.com>
>>> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Olga Shern
>>> <olgas@mellanox.com>; stable@dpdk.org
>>> Subject: Re: [PATCH v3] net/failsafe: fix calling device during RMV events
>>>
>>> Hi Ophir,
>>>
>>> Sorry about the delay,
>>> I have a few remarks, I think this patch could be simpler.
>>>
>>> First, about the commit logline:
>>> "calling device" is not descriptive enough. I'd suggest
>>>
>>>     net/failsafe: fix device configuration during RMV events
>>>
>>> But I'm not a native speaker either, so use it if you think it is better, or don't,
>>> it's only a suggestion :).
>>>
>>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
>>>> This commit prevents control path operations from failing after a sub
>>>> device removal.
>>>>
>>>> Following are the failure steps:
>>>> 1. The physical device is removed due to change in one of PF
>>>> parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
>>>> Within 2 seconds Interrupt thread initializes the actual device
>>>> removal, then every 2 seconds it tries to re-sync (plug in) the
>>>> device. The trials fail as long as VF parameter mismatches the PF
>>> parameter.
>>>> 4. A control thread initiates a control operation on failsafe which
>>>> initiates this operation on the device.
>>>> 5. A race condition occurs between the control thread and interrupt
>>>> thread when accessing the device data structures.
>>>>
>>>> This commit prevents the race condition in step 5. Before this commit
>>>> if a device was removed and then a control thread operation was
>>>> initiated on failsafe - in some cases failsafe called the sub device
>>>> operation instead of avoiding it. Such cases could lead to operations
>>> failures.
>>>>
>>>
>>> This is a nitpick, but as said earlier, this is not preventing the race condition.
>>> This race is still present and can still wreak havok on unsuspecting users.
>>>
>>> If an application has a weak threading model, it will be subject to this race
>>> condition still. It is possible to prevent it fully with proper care from the
>>> application standpoint, but this is not specific to fail-safe and does not
>>> concern us here.
>>>
>>> Anyway, it's really a nitpick, I just wanted to point it out. This is not too
>>> important for this patch.
>>>
>>>> This commit fixes failsafe criteria to determine when the device is
>>>> removed such that it will avoid calling the sub device operations
>>>> during that time and will only call them otherwise.
>>>>
>>>> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
>>>> ---
>>>> v3:
>>>> 1. Rebase v2
>>>>
>>>> 2. Please ignore checkpatch checks on arguments re-usage - they are
>>> confirmed.
>>>> 	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-
>>> effects?
>>>> 	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:
>>>>
>>>> 3. Add rationales (copy from an email which accompanied v2):
>>>>
>>>> On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
>>>>>
>>>>> Hi Ophir,
>>>>>
>>>>> On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
>>>>>> This commit prevents control path operations from failing after a
>>>>>> sub device has informed failsafe it has been removed.
>>>>>>
>>>>>> Before this commit if a device was removed and then a control path
>>>>>
>>>>> Here are the steps if I understood correctly:
>>>>>
>>>>> 0. The physical device is removed
>>>>> 1. The interrupt thread flags the device 2. A control lcore
>>>>> initiates a control operation 3. The alarm triggers, waking up the eal-intr-
>>> thread,
>>>>>    initiating the actual device removal.
>>>>> 4. Race condition occurs between control lcore and interrupt thread.
>>>>>
>>>>> "if a device was removed" is ambiguous I think (are we speaking
>>>>> about the physical port? Is it only flagged? Is it after the removal of the
>>> device itself?).
>>>>> From the context I gather that you mean the device is flagged to be
>>>>> removed, but it won't be as clear in a few month when we revisit this bug
>>> :) .
>>>>>
>>>>> Could you please rephrase this so that the whole context of the
>>>>> issue is available?
>>>>>
>>>>
>>>> Done. Commit message was rephrased based on your comments
>>>>
>>>>>> operations was initiated on failsafe - in some cases failsafe
>>>>>> called the sub device operation instead of avoiding it. Such cases
>>>>>> could lead to operations failures.
>>>>>>
>>>>>> This commit fixes failsafe criteria to determine when the device
>>>>>> is removed such that it will avoid calling the sub device
>>>>>> operations during that time and will only call them otherwise.
>>>>>>
>>>>>
>>>>> This commit mitigates the race condition, reducing the probability
>>>>> for it to have an effect. It does not, however, remove this race
>>>>> condition, which is inherent to the DPDK architecture at the moment.
>>>>>
>>>>> A proper fix, a more detailed workaround and additional
>>>>> documentation warning users writing applications to mind their threads
>>> could be interesting.
>>>>>
>>>>
>>>> The race condition occurs in the last step and may lead to
>>>> segmentation faults (accessing data structures of the same device by 2
>>>> threads) The previous steps ("the physical device is removed", etc) were not
>>> recreated and tested but probably cannot lead to segmentation fault.
>>>>
>>>>> But let's focus on this patch for the time being.
>>>>>
>>>>>> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
>>>>>> ---
>>>>>>  drivers/net/failsafe/failsafe_ether.c |  1 +
>>>>>>  drivers/net/failsafe/failsafe_ops.c   | 52
>>>>> +++++++++++++++++++++++++++++------
>>>>>>  2 files changed, 45 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/failsafe/failsafe_ether.c
>>>>>> b/drivers/net/failsafe/failsafe_ether.c
>>>>>> index a3a8cce..1def110 100644
>>>>>> --- a/drivers/net/failsafe/failsafe_ether.c
>>>>>> +++ b/drivers/net/failsafe/failsafe_ether.c
>>>>>> @@ -378,6 +378,7 @@
>>>>>
>>>>> Could you please generate your patches with the function name in the
>>> diff?
>>>>
>>>> Done
>>>>
>>>>>
>>>>>>  				      i);
>>>>>>  				goto err_remove;
>>>>>>  			}
>>>>>> +			sdev->remove = 0;
>>>>>
>>>>> You are adding this here, within failsafe_eth_dev_state_sync, and
>>>>> removing it from the dev_configure ops.
>>>>>
>>>>> 10 lines above, the call to dev_configure is done, meaning that the
>>>>> remove flag was resetted at this point.
>>>>>
>>>>> Can you explain why you prefer resetting the flag here?
>>>>>
>>>>> The position of this flag reset will be dependent upon my subsequent
>>>>> remarks anyway, so hold that thought :) .
>>>>>
>>>>
>>>> The motivation for resetting the "remove" flag within
>>> failsafe_eth_dev_state_sync is as follows:
>>>> Previously to this patch the "remove" flag was designed to signal the need
>>> to remove the sub device.
>>>> Once the sub device was removed and before being reconfigured the
>>> "remove" flag was reset.
>>>>
>>>> After this patch the scope of the "remove" flag was *extended* to
>>>> indicate the sub device status as being "plugged out" by resetting this flag
>>> only after a successful call to failsafe_eth_dev_state_sync().
>>>> The "plug out" status could last a very long time (seconds, minutes, days,
>>> weeks, ...).
>>>>
>>>> Previously to this patch failsafe based the "plugged out" status on
>>>> the sub device state as being below ACTIVE however every 2 seconds
>>>> dev_configure() was called where the sub device was assigned sdev-
>>>>> state = DEV_ACTIVE; therefore the sub device state became ACTIVE for
>>> some time every 2 seconds.
>>>> This is where the race condition occurred: failsafe considered the sub
>>>> device as "Plugged in" for some time every 2 seconds (based on its ACTIVE
>>> state) while it was actually plugged out.
>>>>
>>>> After this patch the "Plugged out" status is based on the "remove" flag.
>>>>
>>>
>>> Sorry, I do not agree with this semantical change on the "remove" flag.
>>> You are essentially adding a new device state, which could be fine per se, but
>>> should not be done here.
>>>
>>> The enum dev_state is there for this purpose.
>>>
>>> The flag dev->remove, calls for an operation to be done upon the concerned
>>> device. It is not meant to become a new device state.
>>>
>>> A point about the work methodoly here: if you wanted to change this
>>> semantic, which could be legitimate and sometimes called for, you should
>>> have proposed it either during a discussion in a response to my previous
>>> email, or introducing the change as a separate patch. This point is important
>>> enough for it to have its own patch, meaning we would have a whole thread
>>> dedicated to it instead of having to interleave commentaries between
>>> related-but-separate diffs on the code.
>>>
>>> But anyway, if you think you need to express a PLUGOUT state, I'd suggest
>>> adding a state between DEV_UNDEFINED and DEV_PARSED.
>>> DEV_UNDEFINED means that the device is in limbo and has no existence per
>>> se (its parsing failed for example, it is not clear whether the parameters are
>>> correct, etc...). DEV_PLUGOUT could mean then that the device has been
>>> successfully probed at least once, meaning that it could possibly have
>>> residuals from this probing still there, or specific care to be taken when
>>> manipulating it.
>>>
>>> However, I'm not yet convinced that this new state is necessary. I think you
>>> can mitigate this race condition without having to add it. If you insist in
>>> introducing this state, please do so in a separate patch, with proper
>>> definition about the meaning of this state:
>>>
>>>   + When it should be valid for a device to be in this state.
>>>   + Which operation corresponds to getting into and out of this state.
>>>   + Why this state is interesting and what could not be expressed before
>>>     that is thus being fixed by introducing this state.
>>>
>>> But please verify twice whether you absolutely need to complexify the
>>> current fail-safe internals before going all in and basing your work upon it :)
>>>
>>
>> Indeed what I am currently missing in failsafe is knowing the device is in a PLUGOUT state.
>> Your suggestion to add a new state DEV_PLUGOUT cannot be used with the current implementation
>> as the device states are modified during an alarm hotplug handling every 2 seconds.
>> In fs_hotplug_alarm() we call failsafe_eth_dev_state_sync(dev) which eventually calls ->dev_configure(dev) 
>> where we assign: sdev->state = DEV_ACTIVE; 
>> then when sync fails fs_hotplug_alarm() calls failsafe_dev_remove(dev) which will call fs_dev_remove(sdev); where the sub devices
>> states are changed from ACTIVE down to DEV_UNDEFINED.
>> Having said that it means that during a PLUGOUT event - the states are modified with each invocation of the fs_hotplug_alarm
>> every 2 seconds. So even if we added DEV_PLUGOUT state - it will not remain fixed during the hotplug alarm handling.
>> I have also verified all of this with printouts.
>> When seeing a sub device in state "DEV_ACTIVE" - we cannot tell whether the device is currently in "PLUGOUT situation" or "PLUGIN situation"
>> This allows operations such as fs_mtu_set() on sub-devices which are in "PLUGOUT situation" while their state is 
>> DEV_ACTIVE to be manipulated, which I think should have been avoided.
>>
>> Please note fs_mtu_set() implementation:
>> tatic int fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>> {
>>   ..
>>         FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
>> 	// ***** We are here while the device can be in a "PLUGOUT situation" ***
>>
>> To summarize:
>> I am missing a way to know in failsafe that a sub-device is currently plugged out
> 
> (sdev->state < DEV_ACTIVE && !sdev->remove) means that the device is
> plugged out.
> 
>> 1. I suggested extending the "remove" flag scope for this purpose. It has minimal changes with current failsafe implementation. You prefer not using "remove".
> 
> I prefer using it, but as a flag, not as a device state.
> 
>> 2. You suggested adding a new state DEV_PLUGOUT. I don't think it will work with current implementation (as explained above) or may require a redesign of current implementation.
>>
> 
> I do not suggest adding a new state DEV_PLUGOUT.
> I suggest using sdev->remove properly.
> 
>> Can you suggest another way?
>>
> 
> 0. In a separate commit, move the
>       sdev->remove = 0;
>    from fs_dev_configure, into the case DEV_UNDEFINED of the switch
>    within fs_dev_remove. This is cleaner and more logical anyway.
> 
> 1. Check that sdev->remove is not set in fs_find_next.
>    If sdev->remove is set, then the device should be skipped.
> 
> 2. In failsafe_dev_remove, do not use the FOREACH_SUBDEV_STATE iterator,
>    but manually iterate over all sub-devices using the subs_tail and
>    subs_head values.
>    As the generic iterator would skip over devices which have
>    sdev->remove set, this function would not work anymore.
> 
> 3. Find the places I have missed that needs this manual iterator to be
>    used instead of the FOREACH_SUBDEV{,_STATE} ones. I think there is at
>    least other place that I cannot recall, there might be more.
> 
> If you think this does not work, please tell me why, because then it
> means that I have misunderstood something about the race condition you
> are trying to fix.

Reminder of this patch remaining from previous release.

>
  
Thomas Monjalon Jan. 18, 2018, 10:22 p.m. UTC | #5
29/11/2017 20:17, Ferruh Yigit:
> >>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> >>>> This commit prevents control path operations from failing after a sub
> >>>> device removal.
> >>>>
> >>>> Following are the failure steps:
> >>>> 1. The physical device is removed due to change in one of PF
> >>>> parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
> >>>> Within 2 seconds Interrupt thread initializes the actual device
> >>>> removal, then every 2 seconds it tries to re-sync (plug in) the
> >>>> device. The trials fail as long as VF parameter mismatches the PF
> >>> parameter.
> >>>> 4. A control thread initiates a control operation on failsafe which
> >>>> initiates this operation on the device.
> >>>> 5. A race condition occurs between the control thread and interrupt
> >>>> thread when accessing the device data structures.
> >>>>
> >>>> This commit prevents the race condition in step 5. Before this commit
> >>>> if a device was removed and then a control thread operation was
> >>>> initiated on failsafe - in some cases failsafe called the sub device
> >>>> operation instead of avoiding it. Such cases could lead to operations
> >>> failures.
[...]
> 
> Reminder of this patch remaining from previous release.

Gaetan, what is the decision for this possible race condition?
Can we try to fix it in 18.02?
  
Gaëtan Rivet Jan. 18, 2018, 11:35 p.m. UTC | #6
On Thu, Jan 18, 2018 at 11:22:51PM +0100, Thomas Monjalon wrote:
> 29/11/2017 20:17, Ferruh Yigit:
> > >>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote:
> > >>>> This commit prevents control path operations from failing after a sub
> > >>>> device removal.
> > >>>>
> > >>>> Following are the failure steps:
> > >>>> 1. The physical device is removed due to change in one of PF
> > >>>> parameters (e.g. MTU) 2. The interrupt thread flags the device 3.
> > >>>> Within 2 seconds Interrupt thread initializes the actual device
> > >>>> removal, then every 2 seconds it tries to re-sync (plug in) the
> > >>>> device. The trials fail as long as VF parameter mismatches the PF
> > >>> parameter.
> > >>>> 4. A control thread initiates a control operation on failsafe which
> > >>>> initiates this operation on the device.
> > >>>> 5. A race condition occurs between the control thread and interrupt
> > >>>> thread when accessing the device data structures.
> > >>>>
> > >>>> This commit prevents the race condition in step 5. Before this commit
> > >>>> if a device was removed and then a control thread operation was
> > >>>> initiated on failsafe - in some cases failsafe called the sub device
> > >>>> operation instead of avoiding it. Such cases could lead to operations
> > >>> failures.
> [...]
> > 
> > Reminder of this patch remaining from previous release.
> 
> Gaetan, what is the decision for this possible race condition?

This patchset had several issues that I outlined.

> Can we try to fix it in 18.02?

These patches could go in with a rework. If you feel like it I can
review those fixes in the coming weeks if new versions are submitted.
  
Matan Azrad Feb. 8, 2018, 12:20 p.m. UTC | #7
This series trys to mitigate failsafe race between control commands to
the asynchronic plug-out\in processes.

A full fix is required and will be sent later.

v4(Matan):
Rebase on top of 18.02-rc3.
Extend the fix for other control commands.
Fix hotplug alarm cancel.

V3(Ophir):
Rebase v2.
Add rationales (copy from an email which accompanied v2).


Matan Azrad (1):
  net/failsafe: fix hotplug alarm cancel

Ophir Munk (1):
  net/failsafe: fix calling device during RMV events

 drivers/net/failsafe/failsafe.c         | 18 ++++++------
 drivers/net/failsafe/failsafe_ether.c   |  2 ++
 drivers/net/failsafe/failsafe_flow.c    |  8 +++---
 drivers/net/failsafe/failsafe_ops.c     | 50 ++++++++++++++++++++-------------
 drivers/net/failsafe/failsafe_private.h | 26 ++++++++++++++---
 5 files changed, 66 insertions(+), 38 deletions(-)
  
Matan Azrad Feb. 8, 2018, 4:34 p.m. UTC | #8
This series trys to mitigate failsafe race between control commands to the asynchronic plug-out\in processes.

A full fix is required and will be sent later.

v5(Matan):
Change defines names to failsafe convention (UNSAFE).
split a fix patch.

v4(Matan):
Rebase on top of 18.02-rc3.
Extend the fix for other control commands.
Fix hotplug alarm cancel.

V3(Ophir):
Rebase v2.
Add rationales (copy from an email which accompanied v2).


Matan Azrad (3):
  net/failsafe: fix hotplug alarm cancel
  net/failsafe: fix removal scope
  net/failsafe: fix calling device during RMV events

 drivers/net/failsafe/failsafe.c         | 20 +++++++++----------
 drivers/net/failsafe/failsafe_eal.c     |  2 +-
 drivers/net/failsafe/failsafe_ether.c   |  4 +++-
 drivers/net/failsafe/failsafe_ops.c     | 28 ++++++++++++++++++---------
 drivers/net/failsafe/failsafe_private.h | 34 ++++++++++++++++++++++++++-------
 5 files changed, 59 insertions(+), 29 deletions(-)
  
Matan Azrad Feb. 11, 2018, 5:24 p.m. UTC | #9
This series fixes failsafe race between control commands to the asynchronic plug-out\in processes.

V6(matan):
Full lock based fix.
Change the remove flag scope until SW resources release. 

v5(Matan):
Change defines names to failsafe convention (UNSAFE).
split a fix patch.

v4(Matan):
Rebase on top of 18.02-rc3.
Extend the fix for other control commands.
Fix hotplug alarm cancel.

V3(Ophir):
Rebase v2.
Add rationales (copy from an email which accompanied v2).


Matan Azrad (3):
  net/failsafe: fix hotplug alarm cancel
  net/failsafe: fix removal scope
  net/failsafe: fix hotplug races

 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |  53 +++++++++---
 drivers/net/failsafe/failsafe_ether.c   |   7 +-
 drivers/net/failsafe/failsafe_flow.c    |  20 ++++-
 drivers/net/failsafe/failsafe_ops.c     | 149 ++++++++++++++++++++++++++------
 drivers/net/failsafe/failsafe_private.h |  62 +++++++++++--
 6 files changed, 248 insertions(+), 44 deletions(-)
  
Matan Azrad Feb. 12, 2018, 8:51 p.m. UTC | #10
This series fixes failsafe race between control commands to the asynchronic plug-out\in processes.

V7(matan):
improve commit logs.
return back emty line.
return back description wrongly removed.

V6(matan):
Full lock based fix.
Change the remove flag scope until SW resources release. 

v5(Matan):
Change defines names to failsafe convention (UNSAFE).
split a fix patch.

v4(Matan):
Rebase on top of 18.02-rc3.
Extend the fix for other control commands.
Fix hotplug alarm cancel.

V3(Ophir):
Rebase v2.
Add rationales (copy from an email which accompanied v2).



Matan Azrad (3):
  net/failsafe: fix hotplug alarm cancel
  net/failsafe: fix removal scope
  net/failsafe: fix hotplug races

 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |  53 +++++++++---
 drivers/net/failsafe/failsafe_ether.c   |   6 ++
 drivers/net/failsafe/failsafe_flow.c    |  20 ++++-
 drivers/net/failsafe/failsafe_ops.c     | 149 ++++++++++++++++++++++++++------
 drivers/net/failsafe/failsafe_private.h |  56 ++++++++++++
 6 files changed, 248 insertions(+), 37 deletions(-)
  
Gaëtan Rivet Feb. 13, 2018, 1:31 p.m. UTC | #11
Hi Matan,

On Mon, Feb 12, 2018 at 08:51:39PM +0000, Matan Azrad wrote:
> This series fixes failsafe race between control commands to the asynchronic plug-out\in processes.
> 

Thanks for tackling this complicated issue.
For the series:

Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

I hope there will be more work on this matter soon.

> V7(matan):
> improve commit logs.
> return back emty line.
> return back description wrongly removed.
> 
> V6(matan):
> Full lock based fix.
> Change the remove flag scope until SW resources release. 
> 
> v5(Matan):
> Change defines names to failsafe convention (UNSAFE).
> split a fix patch.
> 
> v4(Matan):
> Rebase on top of 18.02-rc3.
> Extend the fix for other control commands.
> Fix hotplug alarm cancel.
> 
> V3(Ophir):
> Rebase v2.
> Add rationales (copy from an email which accompanied v2).
> 
> 
> 
> Matan Azrad (3):
>   net/failsafe: fix hotplug alarm cancel
>   net/failsafe: fix removal scope
>   net/failsafe: fix hotplug races
> 
>  drivers/net/failsafe/Makefile           |   1 +
>  drivers/net/failsafe/failsafe.c         |  53 +++++++++---
>  drivers/net/failsafe/failsafe_ether.c   |   6 ++
>  drivers/net/failsafe/failsafe_flow.c    |  20 ++++-
>  drivers/net/failsafe/failsafe_ops.c     | 149 ++++++++++++++++++++++++++------
>  drivers/net/failsafe/failsafe_private.h |  56 ++++++++++++
>  6 files changed, 248 insertions(+), 37 deletions(-)
> 
> -- 
> 1.9.5
>
  
Thomas Monjalon Feb. 13, 2018, 4:12 p.m. UTC | #12
13/02/2018 14:31, Gaëtan Rivet:
> Hi Matan,
> 
> On Mon, Feb 12, 2018 at 08:51:39PM +0000, Matan Azrad wrote:
> > This series fixes failsafe race between control commands to the asynchronic plug-out\in processes.
> > 
> 
> Thanks for tackling this complicated issue.
> For the series:
> 
> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

Applied, thanks
  
De Lara Guarch, Pablo Feb. 13, 2018, 8:58 p.m. UTC | #13
Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, February 13, 2018 4:12 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org; Gaëtan Rivet <gaetan.rivet@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH v7 0/3] failsafe: fix hotplug races
> 
> 13/02/2018 14:31, Gaëtan Rivet:
> > Hi Matan,
> >
> > On Mon, Feb 12, 2018 at 08:51:39PM +0000, Matan Azrad wrote:
> > > This series fixes failsafe race between control commands to the
> asynchronic plug-out\in processes.
> > >
> >
> > Thanks for tackling this complicated issue.
> > For the series:
> >
> > Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
> 
> Applied, thanks

There is a compilation error due to this patch on FreeBSD:

drivers/net/failsafe/failsafe_private.h:377:53: error: format specifies
type 'unsigned long' but the argument has type 'pthread_t' (aka 'struct pthread *') [-Werror,-Wformat]
        DEBUG("Hot-plug mutex was locked by thread %lu%s", pthread_self(),


I am not sure how to print a pthread_t, so I can just report the issue.

Thanks,
Pablo
  
Matan Azrad Feb. 13, 2018, 9:13 p.m. UTC | #14
From: De Lara Guarch, Pablo [mailto:pablo.de.lara.guarch@intel.com]
> Hi,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Tuesday, February 13, 2018 4:12 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: dev@dpdk.org; Gaëtan Rivet <gaetan.rivet@6wind.com>
> > Subject: Re: [dpdk-dev] [PATCH v7 0/3] failsafe: fix hotplug races
> >
> > 13/02/2018 14:31, Gaëtan Rivet:
> > > Hi Matan,
> > >
> > > On Mon, Feb 12, 2018 at 08:51:39PM +0000, Matan Azrad wrote:
> > > > This series fixes failsafe race between control commands to the
> > asynchronic plug-out\in processes.
> > > >
> > >
> > > Thanks for tackling this complicated issue.
> > > For the series:
> > >
> > > Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
> >
> > Applied, thanks
> 
> There is a compilation error due to this patch on FreeBSD:
> 
> drivers/net/failsafe/failsafe_private.h:377:53: error: format specifies type
> 'unsigned long' but the argument has type 'pthread_t' (aka 'struct pthread *')
> [-Werror,-Wformat]
>         DEBUG("Hot-plug mutex was locked by thread %lu%s", pthread_self(),
> 
> 
> I am not sure how to print a pthread_t, so I can just report the issue.
>
Can you check with (unsigned long int) conversion?

 
> Thanks,
> Pablo
  
Thomas Monjalon Feb. 13, 2018, 9:21 p.m. UTC | #15
13/02/2018 22:13, Matan Azrad:
>  From: De Lara Guarch, Pablo [mailto:pablo.de.lara.guarch@intel.com]
> > There is a compilation error due to this patch on FreeBSD:
> > 
> > drivers/net/failsafe/failsafe_private.h:377:53: error: format specifies type
> > 'unsigned long' but the argument has type 'pthread_t' (aka 'struct pthread *')
> > [-Werror,-Wformat]
> >         DEBUG("Hot-plug mutex was locked by thread %lu%s", pthread_self(),
> > 
> > 
> > I am not sure how to print a pthread_t, so I can just report the issue.
> >
> Can you check with (unsigned long int) conversion?

On FreeBSD, pthread_t is:
typedef struct  pthread *pthread_t;

pthread_t is not portable and should not be printed.
I am preparing a patch to enable pthread_t debugging only in Linux.
  

Patch

diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 0c0748f..42e9808 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -389,6 +389,7 @@  failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 				      i);
 				goto err_remove;
 			}
+			sdev->remove = 0;
 		}
 	}
 	/*
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index e0f1b0b..b3cac40 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -232,7 +232,6 @@  fs_dev_configure(struct rte_eth_dev *dev)
 			dev->data->dev_conf.intr_conf.lsc = 0;
 		}
 		DEBUG("Configuring sub-device %d", i);
-		sdev->remove = 0;
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
 					dev->data->nb_tx_queues,
@@ -310,7 +309,7 @@  fs_dev_set_link_up(struct rte_eth_dev *dev)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
 		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
 		if (ret) {
@@ -329,7 +328,7 @@  fs_dev_set_link_down(struct rte_eth_dev *dev)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
 		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
 		if (ret) {
@@ -517,7 +516,7 @@  fs_promiscuous_enable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_promiscuous_enable(PORT_ID(sdev));
 }
 
@@ -527,7 +526,7 @@  fs_promiscuous_disable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_promiscuous_disable(PORT_ID(sdev));
 }
 
@@ -537,7 +536,7 @@  fs_allmulticast_enable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_allmulticast_enable(PORT_ID(sdev));
 }
 
@@ -547,7 +546,7 @@  fs_allmulticast_disable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_allmulticast_disable(PORT_ID(sdev));
 }
 
@@ -559,7 +558,7 @@  fs_link_update(struct rte_eth_dev *dev,
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling link_update on sub_device %d", i);
 		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
 		if (ret && ret != -1) {
@@ -602,7 +601,7 @@  fs_stats_reset(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		rte_eth_stats_reset(PORT_ID(sdev));
 		memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
 	}
@@ -700,7 +699,7 @@  fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
 		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
 		if (ret) {
@@ -719,7 +718,7 @@  fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
 		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
 		if (ret) {
@@ -753,7 +752,7 @@  fs_flow_ctrl_set(struct rte_eth_dev *dev,
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
 		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
 		if (ret) {
@@ -774,7 +773,7 @@  fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
 	/* No check: already done within the rte_eth_dev_mac_addr_remove
 	 * call for the fail-safe device.
 	 */
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
 				&dev->data->mac_addrs[index]);
 	PRIV(dev)->mac_addr_pool[index] = 0;
@@ -791,7 +790,7 @@  fs_mac_addr_add(struct rte_eth_dev *dev,
 	uint8_t i;
 
 	RTE_ASSERT(index < FAILSAFE_MAX_ETHADDR);
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
 		if (ret) {
 			ERROR("Operation rte_eth_dev_mac_addr_add failed for sub_device %"
@@ -813,7 +812,7 @@  fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
 }
 
@@ -832,7 +831,7 @@  fs_filter_ctrl(struct rte_eth_dev *dev,
 		*(const void **)arg = &fs_flow_ops;
 		return 0;
 	}
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_filter_ctrl on sub_device %d", i);
 		ret = rte_eth_dev_filter_ctrl(PORT_ID(sdev), type, op, arg);
 		if (ret) {
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d2d92af..03e1f58 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -225,10 +225,23 @@  extern int mac_from_arg;
  * dev:   (struct rte_eth_dev *), fail-safe ethdev
  * state: (enum dev_state), minimum acceptable device state
  */
+
 #define FOREACH_SUBDEV_STATE(s, i, dev, state)		\
-	for (s = fs_find_next((dev), 0, state, &i);	\
+	for (s = fs_find_next((dev), 0, state, 0, &i);	\
 	     s != NULL;					\
-	     s = fs_find_next((dev), i + 1, state, &i))
+	     s = fs_find_next((dev), i + 1, state, 0, &i))
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices
+ * in ACTIVE state and not removed due to RMV event
+ * s:     (struct sub_device *), iterator
+ * i:     (uint8_t), increment
+ * dev:   (struct rte_eth_dev *), fail-safe ethdev
+ */
+#define FOREACH_SUBDEV_ACTIVE(s, i, dev)				\
+	for (s = fs_find_next((dev), 0, DEV_ACTIVE, 1, &i);	\
+	     s != NULL;						\
+	     s = fs_find_next((dev), i + 1, DEV_ACTIVE, 1, &i))
 
 /**
  * Iterator construct over fail-safe sub-devices:
@@ -303,6 +316,7 @@  static inline struct sub_device *
 fs_find_next(struct rte_eth_dev *dev,
 	     uint8_t sid,
 	     enum dev_state min_state,
+		 uint8_t check_remove,
 	     uint8_t *sid_out)
 {
 	struct sub_device *subs;
@@ -311,8 +325,12 @@  fs_find_next(struct rte_eth_dev *dev,
 	subs = PRIV(dev)->subs;
 	tail = PRIV(dev)->subs_tail;
 	while (sid < tail) {
-		if (subs[sid].state >= min_state)
-			break;
+		if (subs[sid].state >= min_state) {
+			if (check_remove == 0)
+				break;
+			if (PRIV(dev)->subs[sid].remove == 0)
+				break;
+		}
 		sid++;
 	}
 	*sid_out = sid;