[dpdk-dev,v1] net/failsafe: fix VLAN stripping configuration

Message ID 1509567158-15670-1-git-send-email-ophirmu@mellanox.com
State Superseded, archived
Delegated to: Ferruh Yigit
Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Ophir Munk Nov. 1, 2017, 8:12 p.m. UTC
The failsafe device has VLAN stripping configured at startup. However,
once a sub-device is found to be incapable of VLAN stripping, failsafe
updates its configuration and removes VLAN stripping from it.
This update occurs only once, at startup. On a later plug-in attempt,
if VLAN stripping is requested by the failsafe configuration but not
supported by the sub-device, failsafe cannot recover and the
sub-device remains permanently in the plugged-out state.

The sequence of events leading to this situation is as follows:
1. Start testpmd with failsafe, where mlx4 (not capable of VLAN
   stripping) is a sub-device. Expected printout:
   PMD: net_failsafe: Disabling VLAN stripping offload
2. Execute:
   testpmd> port stop all
   testpmd> port config all max-pkt-len 2048
   testpmd> port start all
3. Plug out the sub-device (e.g. disable SR-IOV).
4. Plug it back in (e.g. enable SR-IOV).
5. Expected result: failsafe successfully configures and starts its
   sub-devices.
   Actual result: failsafe fails continuously with these messages:
   PMD: net_failsafe: VLAN stripping offload requested but not supported by sub_device 0
   PMD: net_failsafe: device already configured, cannot fix live configuration
   PMD: net_failsafe: Unable to synchronize sub device state

Root cause analysis: at startup, failsafe removes VLAN stripping from
its configuration. Executing "port config all max-pkt-len 2048" makes
testpmd mark the failsafe port as needing a configuration update. On
"port start all", testpmd then overwrites the failsafe configuration
with its own configuration, which includes VLAN stripping.

During the plug-in attempt, failsafe refuses to update its
configuration by removing VLAN stripping, since it has already updated
its configuration at startup: the check refuses any change once the
device is active and more than one sub-device is present.

The fix is to remove this one-time configuration limitation, so that
the configuration can also be updated during plug-in attempts.

Cc: stable@dpdk.org
Fixes: bbc6a53dda44 ("net/failsafe: support Rx offload capabilities")

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
---
The commit message includes bug and fix descriptions
---
 drivers/net/failsafe/failsafe_ops.c | 10 ----------
 1 file changed, 10 deletions(-)
  

Comments

Gaëtan Rivet Nov. 2, 2017, 1:52 p.m. UTC | #1
On Wed, Nov 01, 2017 at 08:12:38PM +0000, Ophir Munk wrote:
> Root cause analysis: at startup, failsafe removes VLAN stripping from
> its configuration. Executing "port config all max-pkt-len 2048" makes
> testpmd mark the failsafe port as needing a configuration update. On
> "port start all", testpmd then overwrites the failsafe configuration
> with its own configuration, which includes VLAN stripping.
> 

Have you tried launching testpmd with the option

"--disable-hw-vlan"

as your mlx4 port does not support it?

  
Gaëtan Rivet Nov. 2, 2017, 2:16 p.m. UTC | #2
On Thu, Nov 02, 2017 at 02:52:16PM +0100, Gaëtan Rivet wrote:
> Have you tried launching testpmd with the option
> 
> "--disable-hw-vlan"
> 
> as your mlx4 port does not support it?
> 

On second thought, I think there is a simple solution:

The fail-safe should stop trying to be clever with port configuration.
On rte_eth_dev_configure, simply apply the user configuration (without
trying to detect support and disabling flags on the fly).

If a PMD has an issue with the requested configuration, it should warn
the user. If it has an issue but does not warn, that is a bug in this
PMD. This is the case for mlx4: whether or not the PMD changes its
behavior is its own decision, as long as users are fine with it.

So a proper fix would be to remove the checks (fs_port_offload_validate
and fs_port_disable_offload) and depend on the sub-device for proper
configuration vetting.

Thoughts?

  
Ophir Munk Nov. 3, 2017, 9:52 a.m. UTC | #3
Hi,
Please see below

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Thursday, November 02, 2017 4:16 PM
> To: Ophir Munk <ophirmu@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> <olgas@mellanox.com>; stable@dpdk.org
> Subject: Re: [PATCH v1] net/failsafe: fix VLAN stripping configuration
> 
> On Thu, Nov 02, 2017 at 02:52:16PM +0100, Gaëtan Rivet wrote:
> > Have you tried launching testpmd with the option
> >
> > "--disable-hw-vlan"
> >
> > as your mlx4 port does not support it?
> >
> 
> On second thought, I think there is a simple solution:
> 
> The fail-safe should stop trying to be clever with port configuration.
> On rte_eth_dev_configure, simply apply the user configuration (without
> trying to detect support and disabling flags on the fly).
> 
> If a PMD has an issue with the requested configuration, it should warn
> the user. If it has an issue but does not warn, that is a bug in this
> PMD. This is the case for mlx4: whether or not the PMD changes its
> behavior is its own decision, as long as users are fine with it.
> 
> So a proper fix would be to remove the checks (fs_port_offload_validate and
> fs_port_disable_offload) and depend on the sub-device for proper
> configuration vetting.
> 
> Thoughts?

Agreed. I have sent v2 based on your suggestion. 

  

Patch

diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index f460551..953ee65 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -187,16 +187,6 @@ 
 			continue;
 		DEBUG("Checking capabilities for sub_device %d", i);
 		while ((capa_flag = fs_port_offload_validate(dev, sdev))) {
-			/*
-			 * Refuse to change configuration if multiple devices
-			 * are present and we already have configured at least
-			 * some of them.
-			 */
-			if (PRIV(dev)->state >= DEV_ACTIVE &&
-			    PRIV(dev)->subs_tail > 1) {
-				ERROR("device already configured, cannot fix live configuration");
-				return -1;
-			}
 			ret = fs_port_disable_offload(&dev->data->dev_conf,
 						      capa_flag);
 			if (ret) {