[dpdk-dev,v2] net/bonding: fix link status check

Message ID 20171129154200.14436-1-tomaszx.kulasek@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply issues

Commit Message

Tomasz Kulasek Nov. 29, 2017, 3:42 p.m. UTC
  Some devices needs more time to initialize and bring interface up. When
link is down the link properties are not valid, e.g. link_speed is
reported as 0 and this is not a valid speed for slave as well as for whole
bonding.

During NIC (and bonding) initialization there's concurrency between
updating link status and adding slave to the bonding.

This patch:

 - adds delay before configuring bonding (if link is down) to be sure that
   link status of new slave is valid,
 - propagates information about link status from first slave with link up
   instead of first slave at all, to be sure that link speed is valid.

Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
v2 changes:
 - Checkpatch warnings,
 - Improved code style

 drivers/net/bonding/rte_eth_bond_8023ad_private.h |  1 +
 drivers/net/bonding/rte_eth_bond_api.c            | 51 +++++++++++++++++------
 2 files changed, 39 insertions(+), 13 deletions(-)
  

Comments

Ferruh Yigit Jan. 17, 2018, 4:02 p.m. UTC | #1
On 11/29/2017 3:42 PM, Tomasz Kulasek wrote:
> Some devices needs more time to initialize and bring interface up. When
> link is down the link properties are not valid, e.g. link_speed is
> reported as 0 and this is not a valid speed for slave as well as for whole
> bonding.
> 
> During NIC (and bonding) initialization there's concurrency between
> updating link status and adding slave to the bonding.
> 
> This patch:
> 
>  - adds delay before configuring bonding (if link is down) to be sure that
>    link status of new slave is valid,
>  - propagates information about link status from first slave with link up
>    instead of first slave at all, to be sure that link speed is valid.
> 
> Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
> v2 changes:
>  - Checkpatch warnings,
>  - Improved code style
Hi Declan,

Any comment on this patch?

Thanks,
ferruh
  
Thomas Monjalon Feb. 6, 2018, 8:52 p.m. UTC | #2
17/01/2018 17:02, Ferruh Yigit:
> On 11/29/2017 3:42 PM, Tomasz Kulasek wrote:
> > Some devices needs more time to initialize and bring interface up. When
> > link is down the link properties are not valid, e.g. link_speed is
> > reported as 0 and this is not a valid speed for slave as well as for whole
> > bonding.
> > 
> > During NIC (and bonding) initialization there's concurrency between
> > updating link status and adding slave to the bonding.
> > 
> > This patch:
> > 
> >  - adds delay before configuring bonding (if link is down) to be sure that
> >    link status of new slave is valid,
> >  - propagates information about link status from first slave with link up
> >    instead of first slave at all, to be sure that link speed is valid.
> > 
> > Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> > v2 changes:
> >  - Checkpatch warnings,
> >  - Improved code style
> Hi Declan,
> 
> Any comment on this patch?

Any news?
  
Chas Williams Feb. 12, 2018, 6:26 p.m. UTC | #3
It's not clear to me that link_properties_valid() is even correct.  Nothing
prevents an adapter from later negotiating a lower speed and would fail
this test.  If both adapters are set to autoneg, that should be sufficient
but nothing enforces the speed match after the slaves are configured.  So
what is the point of this check?

On Tue, Feb 6, 2018 at 3:52 PM, Thomas Monjalon <thomas@monjalon.net> wrote:

> 17/01/2018 17:02, Ferruh Yigit:
> > On 11/29/2017 3:42 PM, Tomasz Kulasek wrote:
> > > Some devices needs more time to initialize and bring interface up. When
> > > link is down the link properties are not valid, e.g. link_speed is
> > > reported as 0 and this is not a valid speed for slave as well as for
> whole
> > > bonding.
> > >
> > > During NIC (and bonding) initialization there's concurrency between
> > > updating link status and adding slave to the bonding.
> > >
> > > This patch:
> > >
> > >  - adds delay before configuring bonding (if link is down) to be sure
> that
> > >    link status of new slave is valid,
> > >  - propagates information about link status from first slave with link
> up
> > >    instead of first slave at all, to be sure that link speed is valid.
> > >
> > > Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
> > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > ---
> > > v2 changes:
> > >  - Checkpatch warnings,
> > >  - Improved code style
> > Hi Declan,
> >
> > Any comment on this patch?
>
> Any news?
>
>
>
  
Stephen Hemminger Feb. 16, 2018, 8:13 p.m. UTC | #4
On Wed, 29 Nov 2017 16:42:00 +0100
Tomasz Kulasek <tomaszx.kulasek@intel.com> wrote:

> +	/* Some devices needs more time to initialize and bring interface up.
> +	 * While link status up is preferable we wait some time to be sure that
> +	 * link status of slave is valid.
> +	 */
> +	if (slave_eth_dev->data->dev_link.link_status == ETH_LINK_DOWN) {
> +		rte_delay_ms(100);
> +		rte_eth_link_get_nowait(slave_port_id, &link_props);
> +		while ((link_props.link_status == ETH_LINK_DOWN) &&
> +				(retries > 0)) {
> +			rte_delay_ms(100);
> +			rte_eth_link_get_nowait(slave_port_id, &link_props);
> +			retries--;
> +		}
> +	}
> +

Why use nowait and a loop, when there is already a waiting version of eth_link_get?
  
Ferruh Yigit June 13, 2018, 4:10 p.m. UTC | #5
On 2/12/2018 6:26 PM, Chas Williams wrote:
> It's not clear to me that link_properties_valid() is even correct.  Nothing
> prevents an adapter from later negotiating a lower speed and would fail this
> test.  If both adapters are set to autoneg, that should be sufficient but
> nothing enforces the speed match after the slaves are configured.  So what is
> the point of this check?

Reminder of this patch.

This is waiting in the backlog for a long time. It is not even clear if the
patch is still valid or not.

Also based on missing response to Chas' clarification request, I am for
dropping/rejecting the patch.

@Declan, @Radu please chime in if this patch is required/valid.

> 
> On Tue, Feb 6, 2018 at 3:52 PM, Thomas Monjalon <thomas@monjalon.net
> <mailto:thomas@monjalon.net>> wrote:
> 
>     17/01/2018 17:02, Ferruh Yigit:
>     > On 11/29/2017 3:42 PM, Tomasz Kulasek wrote:
>     > > Some devices needs more time to initialize and bring interface up. When
>     > > link is down the link properties are not valid, e.g. link_speed is
>     > > reported as 0 and this is not a valid speed for slave as well as for whole
>     > > bonding.
>     > >
>     > > During NIC (and bonding) initialization there's concurrency between
>     > > updating link status and adding slave to the bonding.
>     > >
>     > > This patch:
>     > >
>     > >  - adds delay before configuring bonding (if link is down) to be sure that
>     > >    link status of new slave is valid,
>     > >  - propagates information about link status from first slave with link up
>     > >    instead of first slave at all, to be sure that link speed is valid.
>     > >
>     > > Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
>     > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com <mailto:tomaszx.kulasek@intel.com>>
>     > > ---
>     > > v2 changes:
>     > >  - Checkpatch warnings,
>     > >  - Improved code style
>     > Hi Declan,
>     >
>     > Any comment on this patch?
> 
>     Any news?
> 
> 
>
  
Ferruh Yigit June 18, 2018, 8:39 a.m. UTC | #6
On 6/13/2018 5:10 PM, Ferruh Yigit wrote:
> On 2/12/2018 6:26 PM, Chas Williams wrote:
>> It's not clear to me that link_properties_valid() is even correct.  Nothing
>> prevents an adapter from later negotiating a lower speed and would fail this
>> test.  If both adapters are set to autoneg, that should be sufficient but
>> nothing enforces the speed match after the slaves are configured.  So what is
>> the point of this check?
> 
> Reminder of this patch.
> 
> This is waiting in the backlog for a long time. It is not even clear if the
> patch is still valid or not.
> 
> Also based on missing response to Chas' clarification request, I am for
> dropping/rejecting the patch.
> 
> @Declan, @Radu please chime in if this patch is required/valid.

Marked as rejected.

> 
>>
>> On Tue, Feb 6, 2018 at 3:52 PM, Thomas Monjalon <thomas@monjalon.net
>> <mailto:thomas@monjalon.net>> wrote:
>>
>>     17/01/2018 17:02, Ferruh Yigit:
>>     > On 11/29/2017 3:42 PM, Tomasz Kulasek wrote:
>>     > > Some devices needs more time to initialize and bring interface up. When
>>     > > link is down the link properties are not valid, e.g. link_speed is
>>     > > reported as 0 and this is not a valid speed for slave as well as for whole
>>     > > bonding.
>>     > >
>>     > > During NIC (and bonding) initialization there's concurrency between
>>     > > updating link status and adding slave to the bonding.
>>     > >
>>     > > This patch:
>>     > >
>>     > >  - adds delay before configuring bonding (if link is down) to be sure that
>>     > >    link status of new slave is valid,
>>     > >  - propagates information about link status from first slave with link up
>>     > >    instead of first slave at all, to be sure that link speed is valid.
>>     > >
>>     > > Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
>>     > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com <mailto:tomaszx.kulasek@intel.com>>
>>     > > ---
>>     > > v2 changes:
>>     > >  - Checkpatch warnings,
>>     > >  - Improved code style
>>     > Hi Declan,
>>     >
>>     > Any comment on this patch?
>>
>>     Any news?
>>
>>
>>
>
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_8023ad_private.h b/drivers/net/bonding/rte_eth_bond_8023ad_private.h
index 433c700..0a72beb 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad_private.h
+++ b/drivers/net/bonding/rte_eth_bond_8023ad_private.h
@@ -180,6 +180,7 @@  struct mode8023ad_private {
 	rte_eth_bond_8023ad_ext_slowrx_fn slowrx_cb;
 	uint8_t external_sm;
 
+	uint8_t slave_link_valid;
 	struct rte_eth_link slave_link;
 	/***< slave link properties */
 
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 980e636..82b6525 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -33,6 +33,7 @@ 
 
 #include <string.h>
 
+#include <rte_cycles.h>
 #include <rte_mbuf.h>
 #include <rte_malloc.h>
 #include <rte_ethdev.h>
@@ -239,6 +240,7 @@ 
 	struct bond_dev_private *internals;
 	struct rte_eth_link link_props;
 	struct rte_eth_dev_info dev_info;
+	int retries = 10;
 
 	bonded_eth_dev = &rte_eth_devices[bonded_port_id];
 	internals = bonded_eth_dev->data->dev_private;
@@ -262,6 +264,21 @@ 
 		return -1;
 	}
 
+	/* Some devices needs more time to initialize and bring interface up.
+	 * While link status up is preferable we wait some time to be sure that
+	 * link status of slave is valid.
+	 */
+	if (slave_eth_dev->data->dev_link.link_status == ETH_LINK_DOWN) {
+		rte_delay_ms(100);
+		rte_eth_link_get_nowait(slave_port_id, &link_props);
+		while ((link_props.link_status == ETH_LINK_DOWN) &&
+				(retries > 0)) {
+			rte_delay_ms(100);
+			rte_eth_link_get_nowait(slave_port_id, &link_props);
+			retries--;
+		}
+	}
+
 	slave_add(internals, slave_eth_dev);
 
 	/* We need to store slaves reta_size to be able to synchronize RETA for all
@@ -270,15 +287,16 @@ 
 	internals->slaves[internals->slave_count].reta_size = dev_info.reta_size;
 
 	if (internals->slave_count < 1) {
+		/* Reset link status information for bonded slaves (it will be
+		 * taken from first valid link status)
+		 */
+		internals->mode4.slave_link_valid = 0;
+
 		/* if MAC is not user defined then use MAC of first slave add to
 		 * bonded device */
 		if (!internals->user_defined_mac)
 			mac_address_set(bonded_eth_dev, slave_eth_dev->data->mac_addrs);
 
-		/* Inherit eth dev link properties from first slave */
-		link_properties_set(bonded_eth_dev,
-				&(slave_eth_dev->data->dev_link));
-
 		/* Make primary slave */
 		internals->primary_port = slave_port_id;
 		internals->current_primary_port = slave_port_id;
@@ -302,14 +320,6 @@ 
 		internals->tx_offload_capa &= dev_info.tx_offload_capa;
 		internals->flow_type_rss_offloads &= dev_info.flow_type_rss_offloads;
 
-		if (link_properties_valid(bonded_eth_dev,
-				&slave_eth_dev->data->dev_link) != 0) {
-			RTE_BOND_LOG(ERR, "Invalid link properties for slave %d"
-					" in bonding mode %d", slave_port_id,
-					internals->mode);
-			return -1;
-		}
-
 		/* RETA size is GCD of all slaves RETA sizes, so, if all sizes will be
 		 * the power of 2, the lower one is GCD
 		 */
@@ -321,6 +331,21 @@ 
 			internals->candidate_max_rx_pktlen = dev_info.max_rx_pktlen;
 	}
 
+	if (link_props.link_status == ETH_LINK_UP) {
+		if (internals->mode4.slave_link_valid == 0) {
+			internals->mode4.slave_link_valid = 1;
+			/* Inherit link properties from first valid */
+			link_properties_set(bonded_eth_dev,
+					&slave_eth_dev->data->dev_link);
+		} else if (link_properties_valid(bonded_eth_dev,
+				&slave_eth_dev->data->dev_link) != 0) {
+			RTE_BOND_LOG(ERR, "Invalid link properties for slave %d"
+					" in bonding mode %d", slave_port_id,
+					internals->mode);
+			return -1;
+		}
+	}
+
 	bonded_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf &=
 			internals->flow_type_rss_offloads;
 
@@ -352,7 +377,7 @@ 
 			if (internals->active_slave_count == 0 &&
 			    !internals->user_defined_primary_port)
 				bond_ethdev_primary_set(internals,
-							slave_port_id);
+						slave_port_id);
 
 			if (find_slave_by_id(internals->active_slaves,
 					     internals->active_slave_count,