[dpdk-dev,v5,2/3] net/failsafe: fix removal scope

Message ID 1518107653-15466-3-git-send-email-matan@mellanox.com
State Superseded, archived
Delegated to: Ferruh Yigit

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Matan Azrad Feb. 8, 2018, 4:34 p.m. UTC
The fail-safe PMD uses a per-sub-device flag called "remove" to indicate
the scope in which the sub-device is not synchronized with the fail-safe
state.

This flag is set when fail-safe receives an RMV notification about the
physical removal of the sub-device, and it should be unset only when the
sub-device has completed all the configurations needed to bring it back
to the fail-safe state.
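To make the flag semantics concrete, here is a minimal standalone C model. It is a sketch, not the fail-safe driver code: `struct sub_device`, `enum dev_state` and the `state`/`remove` fields echo names used in this thread, but the layout is a reduced, hypothetical one, and the intermediate states and the helpers `sdev_on_rmv_event()` and `sdev_in_sync()` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified sub-device life cycle; the intermediate states are assumed. */
enum dev_state {
	DEV_UNDEFINED,
	DEV_PARSED,
	DEV_PROBED,
	DEV_ACTIVE,	/* configured */
	DEV_STARTED,
};

/* Hypothetical, reduced view of a fail-safe sub-device. */
struct sub_device {
	enum dev_state state;
	int remove;	/* 1 while out of sync with the fail-safe state */
};

/* RMV notification: the device is physically gone, mark it unsynchronized. */
static void
sdev_on_rmv_event(struct sub_device *sdev)
{
	assert(sdev != NULL);
	sdev->remove = 1;
}

/*
 * A control path may touch the sub-device only if it has reached at least
 * min_state and is not inside the unsynchronized scope.
 */
static bool
sdev_in_sync(const struct sub_device *sdev, enum dev_state min_state)
{
	assert(sdev != NULL);
	return sdev->state >= min_state && !sdev->remove;
}
```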

The previous code wrongly unset the flag while configuring the
sub-device, just before calling its PMD dev_configure() operation, that
is, before all the configurations were done.

Unset the remove flag only after the sub-device has successfully
reached the fail-safe state.

Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_ether.c | 2 ++
 drivers/net/failsafe/failsafe_ops.c   | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)
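The fix described in the commit message can be modelled with a short standalone C sketch. The types are simplified, hypothetical stand-ins rather than the driver's real definitions; `sdev_sync()` stands in for the plug-in resynchronization, its steps are reduced to configure and start, and `fail_step` only injects failures. Mirroring the two lines the patch adds, the flag is cleared after configuration when the fail-safe port is not started, after start otherwise, and never on failure.

```c
#include <assert.h>
#include <stddef.h>

enum dev_state { DEV_UNDEFINED, DEV_PARSED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

struct sub_device {
	enum dev_state state;
	int remove;	/* 1 while out of sync with the fail-safe state */
};

/* Which step, if any, should fail (failure injection for the model). */
enum sync_step { STEP_NONE, STEP_CONFIGURE, STEP_START };

/*
 * Bring a re-plugged sub-device up to the fail-safe state fs_state.
 * "remove" is cleared only once that state has actually been reached.
 * Returns 0 on success, -1 if a step failed (flag left set).
 */
static int
sdev_sync(struct sub_device *sdev, enum dev_state fs_state,
	  enum sync_step fail_step)
{
	assert(sdev != NULL);
	/* Stand-in for configure and queue setup. */
	if (fail_step == STEP_CONFIGURE)
		return -1;
	sdev->state = DEV_ACTIVE;
	if (fs_state < DEV_STARTED) {
		/* Fail-safe port not started: being configured is in sync. */
		sdev->remove = 0;
		return 0;
	}
	/* Stand-in for the start operation. */
	if (fail_step == STEP_START)
		return -1;
	sdev->state = DEV_STARTED;
	sdev->remove = 0;
	return 0;
}
```

In this model, a failed start leaves the sub-device at `DEV_ACTIVE` with `remove` still set, so it stays outside the synchronized scope until a later synchronization succeeds.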
  

Comments

Gaëtan Rivet Feb. 8, 2018, 5:19 p.m. UTC | #1
Hi Matan,

Thanks for dealing with this.

On Thu, Feb 08, 2018 at 04:34:12PM +0000, Matan Azrad wrote:
> The fail-safe PMD uses a per-sub-device flag called "remove" to indicate
> the scope in which the sub-device is not synchronized with the fail-safe
> state.
> 
> This flag is set when fail-safe receives an RMV notification about the
> physical removal of the sub-device, and it should be unset only when the
> sub-device has completed all the configurations needed to bring it back
> to the fail-safe state.
> 
> The previous code wrongly unset the flag while configuring the
> sub-device, just before calling its PMD dev_configure() operation, that
> is, before all the configurations were done.
> 
> Unset the remove flag only after the sub-device has successfully
> reached the fail-safe state.
> 

I'm not sure this is the right way to do this.
I think it's clear that it was a mistake to set sdev->remove to 0
only during fs_dev_configure.

The flag itself only means "there is something to be done on this
device, please clean up".

Once the clean-up has happened, then the flag is not necessary anymore
and should be reset.

So I thought that this fix would actually put the flag reset within
fs_dev_remove, right before reinstalling the hotplug alarm.

At this point, the device state would have been set back to
DEV_UNDEFINED, so the remove flag is unnecessary for any operation
trying to avoid unplugged slaves.
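The alternative suggested here, clearing the flag at the end of the removal path, might look like this in a simplified standalone C model. `sdev_remove_model()`, `hotplug_alarm_install()` and `hotplug_alarm_armed` are hypothetical stand-ins for `fs_dev_remove` and the hotplug alarm re-installation; the real teardown steps are collapsed into a single state assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dev_state { DEV_UNDEFINED, DEV_PARSED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

struct sub_device {
	enum dev_state state;
	int remove;	/* "something to clean up on this device" */
};

/* Hypothetical stand-in for the fail-safe hotplug alarm. */
static bool hotplug_alarm_armed;

static void
hotplug_alarm_install(void)
{
	hotplug_alarm_armed = true;
}

/*
 * Model of the suggested clean-up: tear the sub-device down to
 * DEV_UNDEFINED, then reset "remove" right before re-arming the alarm.
 * Per the review comment, the DEV_UNDEFINED state alone is then enough
 * to keep state-filtered operations away from the unplugged sub-device.
 */
static void
sdev_remove_model(struct sub_device *sdev)
{
	assert(sdev != NULL);
	/* stop, close and detach are collapsed into one step here */
	sdev->state = DEV_UNDEFINED;
	sdev->remove = 0;
	hotplug_alarm_install();
}
```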

The "remove" flag is initialized to 0 when sub-devices are allocated
(during fail-safe init). This means that there would be a difference in
the state of the slave between its first initialization and any
subsequent init, after one successful plugout.
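The asymmetry described above can be checked with a small worked example in a simplified, hypothetical C model: a sub-device allocated with `calloc()` at init starts at `DEV_UNDEFINED` with `remove == 0`, whereas, with the patch under review, a sub-device that went through one plugout is back at `DEV_UNDEFINED` with `remove` still set until its resynchronization completes. `sdev_plugout_patched()` and `sdev_same_view()` are illustrative helpers, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum dev_state { DEV_UNDEFINED, DEV_PARSED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

struct sub_device {
	enum dev_state state;
	int remove;
};

/* With the patch: RMV sets the flag, the removal path resets only the state. */
static void
sdev_plugout_patched(struct sub_device *sdev)
{
	assert(sdev != NULL);
	sdev->remove = 1;		/* RMV notification */
	sdev->state = DEV_UNDEFINED;	/* removal path, flag kept */
}

/* True if two sub-devices look identical to the plug-in path. */
static bool
sdev_same_view(const struct sub_device *a, const struct sub_device *b)
{
	return a->state == b->state && a->remove == b->remove;
}
```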

> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c | 2 ++
>  drivers/net/failsafe/failsafe_ops.c   | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index 4c6e938..ca42376 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -377,6 +377,8 @@
>  				      i);
>  				goto err_remove;
>  			}
> +			if (PRIV(dev)->state < DEV_STARTED)
> +				sdev->remove = 0;

Here the remove flag should already be 0. If it isn't, this is a
(logical) bug, which should be properly addressed instead of patched
in this way.

>  		}
>  	}
>  	/*
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index 7a67e16..a7c2dba 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -131,7 +131,6 @@
>  			dev->data->dev_conf.intr_conf.lsc = 0;
>  		}
>  		DEBUG("Configuring sub-device %d", i);
> -		sdev->remove = 0;

This is correct.

>  		ret = rte_eth_dev_configure(PORT_ID(sdev),
>  					dev->data->nb_rx_queues,
>  					dev->data->nb_tx_queues,
> @@ -197,6 +196,7 @@
>  			return ret;
>  		}
>  		sdev->state = DEV_STARTED;
> +		sdev->remove = 0;

This seems unnecessary if this operation has already been performed once
the device has been properly removed.

>  	}
>  	if (PRIV(dev)->state < DEV_STARTED)
>  		PRIV(dev)->state = DEV_STARTED;
> -- 
> 1.8.3.1
>
  
Matan Azrad Feb. 8, 2018, 7:03 p.m. UTC | #2
Hi Gaetan

From: Gaëtan Rivet, Thursday, February 8, 2018 7:20 PM
> Hi Matan,
> 
> Thanks for dealing with this.
> 
> On Thu, Feb 08, 2018 at 04:34:12PM +0000, Matan Azrad wrote:
> > The fail-safe PMD uses a per-sub-device flag called "remove" to indicate
> > the scope in which the sub-device is not synchronized with the fail-safe
> > state.
> >
> > This flag is set when fail-safe receives an RMV notification about the
> > physical removal of the sub-device, and it should be unset only when the
> > sub-device has completed all the configurations needed to bring it back
> > to the fail-safe state.
> >
> > The previous code wrongly unset the flag while configuring the
> > sub-device, just before calling its PMD dev_configure() operation, that
> > is, before all the configurations were done.
> >
> > Unset the remove flag only after the sub-device has successfully
> > reached the fail-safe state.
> >
> 
> I'm not sure this is the right way to do this.
> I think it's clear that it was a mistake to set sdev->remove to 0 only during
> fs_dev_configure.
> 
> The flag itself only means "there is something to be done on this device,
> please clean up".
> 
> Once the clean-up has happened, then the flag is not necessary anymore
> and should be reset.
> 
> So I thought that this fix would actually put the flag reset within
> fs_dev_remove, right before reinstalling the hotplug alarm.
> 
> At this point, the device state would have been set back to DEV_UNDEFINED,
> so the remove flag is unnecessary for any operation trying to avoid
> unplugged slaves.
> 
> The "remove" flag is initialized to 0 when sub-devices are allocated (during
> fail-safe init). This means that there would be a difference in the state of the
> slave between its first initialization and any subsequent init, after one
> successful plugout.
> 

But what about the plug-in process?
Do you want to allow control commands on a sub-device while it is being plugged in?

Unsetting the remove flag in fs_dev_remove allows control commands to run in parallel with the plug-in process.

Maybe the flag should be renamed to "unsynchronized".
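The concern raised here can be sketched with a simplified standalone C model. The assumption, made for illustration only, is that control commands skip sub-devices whose `remove` flag is set and otherwise reach every sub-device at or above a minimum state; `cmd_reaches()` and `cmd_during_plugin()` are hypothetical helpers. The scenario pauses the plug-in right after configuration, before the sub-device is fully synchronized, and issues a command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dev_state { DEV_UNDEFINED, DEV_PARSED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

struct sub_device {
	enum dev_state state;
	int remove;
};

/* Assumed filter applied by control commands in this model. */
static bool
cmd_reaches(const struct sub_device *sdev, enum dev_state min_state)
{
	assert(sdev != NULL);
	return sdev->state >= min_state && !sdev->remove;
}

/*
 * Plug out a started sub-device, start plugging it back in, and issue a
 * control command while only the configure step has completed.
 * clear_in_remove selects where the flag is reset: in the removal path
 * (review suggestion) or at the end of resynchronization (this patch).
 */
static bool
cmd_during_plugin(bool clear_in_remove)
{
	struct sub_device sdev = { DEV_STARTED, 0 };

	sdev.remove = 1;		/* RMV notification */
	sdev.state = DEV_UNDEFINED;	/* removal path */
	if (clear_in_remove)
		sdev.remove = 0;
	sdev.state = DEV_ACTIVE;	/* plug-in: configured, not yet in sync */
	return cmd_reaches(&sdev, DEV_ACTIVE);
}
```

Under these assumptions, resetting the flag in the removal path lets the command land on a half-synchronized sub-device, while keeping it set until resynchronization completes filters the command out.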

> > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c | 2 ++
> >  drivers/net/failsafe/failsafe_ops.c   | 2 +-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index 4c6e938..ca42376 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -377,6 +377,8 @@
> >  				      i);
> >  				goto err_remove;
> >  			}
> > +			if (PRIV(dev)->state < DEV_STARTED)
> > +				sdev->remove = 0;
> 
> Here the remove flag should already be 0. If it isn't, this is a
> (logical) bug, which should be properly addressed instead of patched in this
> way.

Same answer as above.

> >  		}
> >  	}
> >  	/*
> > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > b/drivers/net/failsafe/failsafe_ops.c
> > index 7a67e16..a7c2dba 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -131,7 +131,6 @@
> >  			dev->data->dev_conf.intr_conf.lsc = 0;
> >  		}
> >  		DEBUG("Configuring sub-device %d", i);
> > -		sdev->remove = 0;
> 
> This is correct.
> 
> >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> >  					dev->data->nb_rx_queues,
> >  					dev->data->nb_tx_queues,
> > @@ -197,6 +196,7 @@
> >  			return ret;
> >  		}
> >  		sdev->state = DEV_STARTED;
> > +		sdev->remove = 0;
> 
> This seems unnecessary if this operation has already been performed once
> the device has been properly removed.

Same answer as above.
 
> >  	}
> >  	if (PRIV(dev)->state < DEV_STARTED)
> >  		PRIV(dev)->state = DEV_STARTED;
> > --
> > 1.8.3.1
> >
> 
> --
> Gaëtan Rivet
> 6WIND
  

Patch

diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 4c6e938..ca42376 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -377,6 +377,8 @@ 
 				      i);
 				goto err_remove;
 			}
+			if (PRIV(dev)->state < DEV_STARTED)
+				sdev->remove = 0;
 		}
 	}
 	/*
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 7a67e16..a7c2dba 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -131,7 +131,6 @@ 
 			dev->data->dev_conf.intr_conf.lsc = 0;
 		}
 		DEBUG("Configuring sub-device %d", i);
-		sdev->remove = 0;
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
 					dev->data->nb_tx_queues,
@@ -197,6 +196,7 @@ 
 			return ret;
 		}
 		sdev->state = DEV_STARTED;
+		sdev->remove = 0;
 	}
 	if (PRIV(dev)->state < DEV_STARTED)
 		PRIV(dev)->state = DEV_STARTED;