[dpdk-dev] net/bonding: fix link properties with autoneg

Message ID 20180213225430.15556-1-3chas3@gmail.com (mailing list archive)
State Rejected, archived
Delegated to: Ferruh Yigit

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Chas Williams Feb. 13, 2018, 10:54 p.m. UTC
  From: Chas Williams <chas3@att.com>

If a link is carrier down and using autonegotiation, then the PMD may not
have detected a speed yet.  In this case the best we can do is ignore the
link speed and duplex since they aren't valid.  To be completely correct,
there should be additional checks to prevent a slave that negotiates a
different speed from being activated.

Signed-off-by: Chas Williams <chas3@att.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
  

Comments

Thomas Monjalon Feb. 13, 2018, 11:03 p.m. UTC | #1
13/02/2018 23:54, Chas Williams:
> From: Chas Williams <chas3@att.com>
> 
> If a link is carrier down and using autonegotiation, then the PMD may not
> have detected a speed yet.  In this case the best we can do is ignore the
> link speed and duplex since they aren't valid.  To be completely correct,
> there should be additional checks to prevent a slave that negotiates a
> different speed from being activated.
> 
> Signed-off-by: Chas Williams <chas3@att.com>

Please add a Fixes line to all your fix patches. Thanks.
  
Matan Azrad April 16, 2018, 8:06 a.m. UTC | #2
Hi Chas

From: Chas Williams, Wednesday, February 14, 2018 12:55 AM
> If a link is carrier down and using autonegotiation, then the PMD may not
> have detected a speed yet.  In this case the best we can do is ignore the link
> speed and duplex since they aren't valid.

Ok for this.

>  To be completely correct, there
> should be additional checks to prevent a slave that negotiates a different
> speed from being activated.

It looks like every change in the link properties should cause an LSC interrupt.
You could handle it in the bonding LSC interrupt and deactivate the device there.
You should also deal with the case of the first slave: what happens if the first slave has invalid link properties?
How can you know that the speed/duplex mode is invalid?
Are we sure LACP mode can run with autonegotiation?
  

> 
> Signed-off-by: Chas Williams <chas3@att.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 92ad688..5559879 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1545,9 +1545,10 @@ link_properties_valid(struct rte_eth_dev *ethdev,
>  	if (bond_ctx->mode == BONDING_MODE_8023AD) {
>  		struct rte_eth_link *bond_link = &bond_ctx->mode4.slave_link;
> 
> -		if (bond_link->link_duplex != slave_link->link_duplex ||
> -			bond_link->link_autoneg != slave_link->link_autoneg ||
> -			bond_link->link_speed != slave_link->link_speed)
> +		if (bond_link->link_autoneg != slave_link->link_autoneg ||
> +		    (bond_link->link_autoneg != ETH_LINK_AUTONEG &&
> +		     (bond_link->link_duplex != slave_link->link_duplex ||
> +		      bond_link->link_speed != slave_link->link_speed)))
>  			return -1;
>  	}
> 
> --
> 2.9.5
  
Chas Williams April 16, 2018, 4:44 p.m. UTC | #3
On Mon, Apr 16, 2018 at 4:06 AM, Matan Azrad <matan@mellanox.com> wrote:
> Hi Chas
>
> From: Chas Williams, Wednesday, February 14, 2018 12:55 AM
>> If a link is carrier down and using autonegotiation, then the PMD may not
>> have detected a speed yet.  In this case the best we can do is ignore the link
>> speed and duplex since they aren't valid.
>
> Ok for this.
>
>>  To be completely correct, there
>> should be additional checks to prevent a slave that negotiates a different
>> speed from being activated.
>
> It looks like every change in the link properties should cause an LSC interrupt.
> You could handle it in the bonding LSC interrupt and deactivate the device there.
> You should also deal with the case of the first slave: what happens if the first slave has invalid link properties?
> How can you know that the speed/duplex mode is invalid?
> Are we sure LACP mode can run with autonegotiation?

Yes, I am pretty sure bonding doesn't get this right when the interfaces aren't
link up.  While what bonding is doing is likely wrong, it doesn't mean that the
behavior of the PMDs is correct in leaving the link_status unset until the first
LSC interrupt.

I plan to get around to looking at this bonding problem in a little bit.  Luckily
it seems that we always tend to get matched links, and even if bonding is
advertising the wrong aggregate speed and duplex we are fine for now.  It
wouldn't pass close inspection by a protocol analyzer though.

>>
>> Signed-off-by: Chas Williams <chas3@att.com>
>> ---
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 92ad688..5559879 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -1545,9 +1545,10 @@ link_properties_valid(struct rte_eth_dev *ethdev,
>>       if (bond_ctx->mode == BONDING_MODE_8023AD) {
>>               struct rte_eth_link *bond_link = &bond_ctx->mode4.slave_link;
>>
>> -             if (bond_link->link_duplex != slave_link->link_duplex ||
>> -                     bond_link->link_autoneg != slave_link->link_autoneg ||
>> -                     bond_link->link_speed != slave_link->link_speed)
>> +             if (bond_link->link_autoneg != slave_link->link_autoneg ||
>> +                 (bond_link->link_autoneg != ETH_LINK_AUTONEG &&
>> +                  (bond_link->link_duplex != slave_link->link_duplex ||
>> +                   bond_link->link_speed != slave_link->link_speed)))
>>                       return -1;
>>       }
>>
>> --
>> 2.9.5
>
  
Matan Azrad April 16, 2018, 7:09 p.m. UTC | #4
Hi Chas

From: Chas Williams, Monday, April 16, 2018 7:44 PM
> On Mon, Apr 16, 2018 at 4:06 AM, Matan Azrad <matan@mellanox.com> wrote:
> > Hi Chas
> >
> > From: Chas Williams, Wednesday, February 14, 2018 12:55 AM
> >> If a link is carrier down and using autonegotiation, then the PMD may
> >> not have detected a speed yet.  In this case the best we can do is
> >> ignore the link speed and duplex since they aren't valid.
> >
> > Ok for this.
> >
> >>  To be completely correct, there
> >> should be additional checks to prevent a slave that negotiates a
> >> different speed from being activated.
> >
> > It looks like every change in the link properties should cause an LSC interrupt.
> > You could handle it in the bonding LSC interrupt and deactivate the device there.
> > You should also deal with the case of the first slave: what happens if the
> > first slave has invalid link properties?
> > How can you know that the speed/duplex mode is invalid?
> > Are we sure LACP mode can run with autonegotiation?
>
> Yes, I am pretty sure bonding doesn't get this right when the interfaces
> aren't link up.  While what bonding is doing is likely wrong, it doesn't mean
> that the behavior of the PMDs is correct in leaving the link_status unset
> until the first LSC interrupt.
>
> I plan to get around to looking at this bonding problem in a little bit.  Luckily it
> seems that we always tend to get matched links and even if bonding is
> advertising the wrong aggregate speed and duplex we are fine for now.  It
> wouldn't pass close inspection by a protocol analyzer though.
>

So, are you going to fix it?
If not, I think you can open a bug in Bugzilla.

> >>
> >> Signed-off-by: Chas Williams <chas3@att.com>
> >> ---
> >>  drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> index 92ad688..5559879 100644
> >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> @@ -1545,9 +1545,10 @@ link_properties_valid(struct rte_eth_dev *ethdev,
> >>       if (bond_ctx->mode == BONDING_MODE_8023AD) {
> >>               struct rte_eth_link *bond_link = &bond_ctx->mode4.slave_link;
> >>
> >> -             if (bond_link->link_duplex != slave_link->link_duplex ||
> >> -                     bond_link->link_autoneg != slave_link->link_autoneg ||
> >> -                     bond_link->link_speed != slave_link->link_speed)
> >> +             if (bond_link->link_autoneg != slave_link->link_autoneg ||
> >> +                 (bond_link->link_autoneg != ETH_LINK_AUTONEG &&
> >> +                  (bond_link->link_duplex != slave_link->link_duplex ||
> >> +                   bond_link->link_speed != slave_link->link_speed)))
> >>                       return -1;
> >>       }
> >>
> >> --
> >> 2.9.5
> >
  
Ferruh Yigit June 14, 2018, 5:04 p.m. UTC | #5
On 4/16/2018 8:09 PM, Matan Azrad wrote:
> Hi Chas
> 
> From: Chas Williams, Monday, April 16, 2018 7:44 PM
>> On Mon, Apr 16, 2018 at 4:06 AM, Matan Azrad <matan@mellanox.com>
>> wrote:
>>> Hi Chas
>>>
>>> From: Chas Williams, Wednesday, February 14, 2018 12:55 AM
>>>> If a link is carrier down and using autonegotiation, then the PMD may
>>>> not have detected a speed yet.  In this case the best we can do is
>>>> ignore the link speed and duplex since they aren't valid.
>>>
>>> Ok for this.
>>>
>>>>  To be completely correct, there
>>>> should be additional checks to prevent a slave that negotiates a
>>>> different speed from being activated.
>>>
>>> It looks like every change in the link properties should cause an LSC interrupt.
>>> You could handle it in the bonding LSC interrupt and deactivate the device there.
>>> You should also deal with the case of the first slave: what happens if the
>>> first slave has invalid link properties?
>>> How can you know that the speed/duplex mode is invalid?
>>> Are we sure LACP mode can run with autonegotiation?
>>
>> Yes, I am pretty sure bonding doesn't get this right when the interfaces
>> aren't link up.  While what bonding is doing is likely wrong, it doesn't mean
>> that the behavior of the PMDs is correct in leaving the link_status unset
>> until the first LSC interrupt.
>>
>> I plan to get around to looking at this bonding problem in a little bit.  Luckily it
>> seems that we always tend to get matched links and even if bonding is
>> advertising the wrong aggregate speed and duplex we are fine for now.  It
>> wouldn't pass close inspection by a protocol analyzer though.
>>
> 
> So, are you going to fix it?
> If not, I think you can open a bug in Bugzilla.

Hi Matan, Chas,

What is the latest status of the patch?
I guess another issue is also discussed here; is it still valid?

Thanks,
ferruh
  
Chas Williams June 16, 2018, 5:29 p.m. UTC | #6
On Thu, Jun 14, 2018 at 1:04 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 4/16/2018 8:09 PM, Matan Azrad wrote:
> > Hi Chas
> >
> > From: Chas Williams, Monday, April 16, 2018 7:44 PM
> >> On Mon, Apr 16, 2018 at 4:06 AM, Matan Azrad <matan@mellanox.com>
> >> wrote:
> >>> Hi Chas
> >>>
> >>> From: Chas Williams, Wednesday, February 14, 2018 12:55 AM
> >>>> If a link is carrier down and using autonegotiation, then the PMD may
> >>>> not have detected a speed yet.  In this case the best we can do is
> >>>> ignore the link speed and duplex since they aren't valid.
> >>>
> >>> Ok for this.
> >>>
> >>>>  To be completely correct, there
> >>>> should be additional checks to prevent a slave that negotiates a
> >>>> different speed from being activated.
> >>>
> >>> It looks like every change in the link properties should cause an LSC interrupt.
> >>> You could handle it in the bonding LSC interrupt and deactivate the device there.
> >>> You should also deal with the case of the first slave: what happens if the
> >>> first slave has invalid link properties?
> >>> How can you know that the speed/duplex mode is invalid?
> >>> Are we sure LACP mode can run with autonegotiation?
> >>
> >> Yes, I am pretty sure bonding doesn't get this right when the interfaces
> >> aren't link up.  While what bonding is doing is likely wrong, it doesn't mean
> >> that the behavior of the PMDs is correct in leaving the link_status unset
> >> until the first LSC interrupt.
> >>
> >> I plan to get around to looking at this bonding problem in a little bit.  Luckily it
> >> seems that we always tend to get matched links and even if bonding is
> >> advertising the wrong aggregate speed and duplex we are fine for now.  It
> >> wouldn't pass close inspection by a protocol analyzer though.
> >>
> >
> > So, are you going to fix it?
> > If not, I think you can open a bug in Bugzilla.
>
> Hi Matan, Chas,
>
> What is the latest status of the patch?
> I guess another issue is also discussed here; is it still valid?
>
> Thanks,
> ferruh
>


I think this issue is better addressed by
http://dpdk.org/dev/patchwork/patch/40572/

There's just a little more cleanup that needs to be done in that patch.
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 92ad688..5559879 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1545,9 +1545,10 @@  link_properties_valid(struct rte_eth_dev *ethdev,
 	if (bond_ctx->mode == BONDING_MODE_8023AD) {
 		struct rte_eth_link *bond_link = &bond_ctx->mode4.slave_link;
 
-		if (bond_link->link_duplex != slave_link->link_duplex ||
-			bond_link->link_autoneg != slave_link->link_autoneg ||
-			bond_link->link_speed != slave_link->link_speed)
+		if (bond_link->link_autoneg != slave_link->link_autoneg ||
+		    (bond_link->link_autoneg != ETH_LINK_AUTONEG &&
+		     (bond_link->link_duplex != slave_link->link_duplex ||
+		      bond_link->link_speed != slave_link->link_speed)))
 			return -1;
 	}
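
For readers following the thread, the effect of the change can be sketched outside of DPDK. The block below is a minimal, self-contained illustration, not the bonding PMD's code: `struct link_props`, `LINK_FIXED`, `LINK_AUTONEG` and both function names are hypothetical stand-ins for `struct rte_eth_link`, `ETH_LINK_AUTONEG` and `link_properties_valid()`, and the numeric values are assumptions made for the example. It compares the check before the patch with the relaxed check after it.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK link definitions (values assumed). */
enum { LINK_FIXED = 0, LINK_AUTONEG = 1 };

struct link_props {
	uint32_t speed;   /* Mbps; 0 models "not negotiated yet" */
	uint8_t duplex;   /* 0 = half, 1 = full */
	uint8_t autoneg;  /* LINK_FIXED or LINK_AUTONEG */
};

/* Check before the patch: speed, duplex and autoneg must all match. */
int link_valid_strict(const struct link_props *bond,
		      const struct link_props *slave)
{
	if (bond->duplex != slave->duplex ||
	    bond->autoneg != slave->autoneg ||
	    bond->speed != slave->speed)
		return -1;
	return 0;
}

/*
 * Check after the patch: autoneg must still match, but speed and duplex
 * are only compared when the bond's reference link is not autonegotiating,
 * since an autonegotiating link that is carrier down may not report them.
 */
int link_valid_relaxed(const struct link_props *bond,
		       const struct link_props *slave)
{
	if (bond->autoneg != slave->autoneg ||
	    (bond->autoneg != LINK_AUTONEG &&
	     (bond->duplex != slave->duplex ||
	      bond->speed != slave->speed)))
		return -1;
	return 0;
}
```

Under these assumptions, a carrier-down autonegotiating slave that still reports speed 0 is rejected by the strict check and accepted by the relaxed one. The relaxed check also accepts an autonegotiating slave that later settles on a different speed, which is the gap the commit message and the review comments point out.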