Bug 551 - LACP failover with 802.3ad bond mode 4 takes a long time
Summary: LACP failover with 802.3ad bond mode 4 takes a long time
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: 20.11
Hardware: All All
Importance: Normal major
Target Milestone: ---
Assignee: Declan Doherty
URL:
Depends on:
Blocks:
 
Reported: 2020-10-09 20:43 CEST by Kiran
Modified: 2022-05-19 10:09 CEST



Attachments
Proposed patch (3.34 KB, patch)
2021-03-10 22:44 CET, Nandini

Description Kiran 2020-10-09 20:43:15 CEST
When one of the slaves of an 802.3ad (mode 4) bond is disabled, the switchover takes almost 6 seconds, which is not acceptable for telcos. We need sub-second switchover, as in Linux.

Tested with a Juniper QFX switch.

The reason is that the system ID changes (to that of the other slave device) when one of the active slaves goes down. This triggers LACP renegotiation, so the link takes a long time to converge.

Is the system ID expected to be different for each link? Shouldn't it be the same for all links?

As the gdb output below shows, the system ID of slave 0 is {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbc}
and the system ID of slave 1 is {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbd}.

Because of this, the system ID changes when the active slave goes down.

I have shown this to Declan Doherty <declan.doherty@intel.com>.

----- Logs from DPDK application ----- 
Breakpoint 1, rx_machine (internals=0x11409edf00, slave_id=1, lacp=0x13ae8b4be)
at /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c:326
326     /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c: No such file or directory.
(gdb) p/x *port
$2 = {actor_state = 0x3f, actor = {system_priority = 0xffff, system = {addr_bytes = {0xac, 0x1f, 0x6b,
0x8d, 0xd7, 0xbd}}, key = 0x2100, port_priority = 0xff00, port_number = 0x200},
partner_state = 0x3f, partner = {system_priority = 0x7f00, system = {addr_bytes = {0x5c, 0x45, 0x27,
0x49, 0x64, 0x8c}}, key = 0x1500, port_priority = 0x7f00, port_number = 0x1000},
sm_flags = 0x202, selected = 0x2, forced_rx_flags = 0x1, current_while_timer = 0x23e52ecad117e2,
periodic_timer = 0x23e52d82f5c5a2, wait_while_timer = 0x23dae7bc8488e2,
tx_machine_timer = 0x23e52d4e8213a1, tx_marker_timer = 0x0, aggregator_port_id = 0x0,
mbuf_pool = 0x113f5db580, rx_ring = 0x113fa00bc0, tx_ring = 0x113fa00980, rx_marker_timer = 0x0,
warning_timer = 0x23db47b16ff07d, warnings_to_show = 0x10, slow_pool = 0x0}
 
Breakpoint 1, rx_machine (internals=0x11409edf00, slave_id=0, lacp=0x13d7ce87e)
at /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c:326
326     in /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c
(gdb) p/x *port
$3 = {actor_state = 0x8f, actor = {system_priority = 0xffff, system = {addr_bytes = {0xac, 0x1f, 0x6b,
0x8d, 0xd7, 0xbc}}, key = 0x2100, port_priority = 0xff00, port_number = 0x100},
partner_state = 0x3f, partner = {system_priority = 0x7f00, system = {addr_bytes = {0x5c, 0x45, 0x27,
0x49, 0x64, 0x8c}}, key = 0x1500, port_priority = 0x7f00, port_number = 0x1100},
sm_flags = 0x202, selected = 0x2, forced_rx_flags = 0x1, current_while_timer = 0x23e537ad7f62bc,
periodic_timer = 0x23e5369a1fcadb, wait_while_timer = 0x23da0985aca4f4,
tx_machine_timer = 0x23e53665ac40d8, tx_marker_timer = 0x0, aggregator_port_id = 0x0,
mbuf_pool = 0x114022b800, rx_ring = 0x1140607600, tx_ring = 0x11406073c0, rx_marker_timer = 0x0,
warning_timer = 0x23e536a73cfc50, warnings_to_show = 0x10, slow_pool = 0x0}

---------


---- Logs from Juniper QFX switch ----
 
root@a6-qfx1# run show lacp interfaces ae20
Aggregated interface: ae20
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/23      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/23    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
    LACP protocol:        Receive State  Transmit State          Mux State
      xe-0/0/23                 Current   Fast periodic Collecting distributing
      xe-0/0/20                 Current   Fast periodic Collecting distributing
 
[edit]
root@a6-qfx1# run show interfaces ae20 extensive | find LACP
    LACP info:        Role     System             System      Port    Port  Port
                             priority          identifier  priority  number   key
      xe-0/0/20.0    Actor        127  5c:45:27:49:64:8c       127      17    21
      xe-0/0/20.0  Partner      65535  ac:1f:6b:8d:d7:bc       255       1    33
      xe-0/0/23.0    Actor        127  5c:45:27:49:64:8c       127      16    21
      xe-0/0/23.0  Partner      65535  ac:1f:6b:8d:d7:bc       255       2    33
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0          1541355     1441147            0            0
      xe-0/0/23.0          1727402     1601884            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0                0           0            0            0
      xe-0/0/23.0                0           0            0            0
    Protocol eth-switch, MTU: 9216, Generation: 164, Route table: 0
      Flags: None
 
 
05:28:19.157776  In LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System ac:1f:6b:8d:d7:bc, System Priority 65535, Key 33, Port 2, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Partner Information TLV (0x02), length 20
          System 5c:45:27:49:64:8c, System Priority 127, Key 21, Port 16, Port Priority 127
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Collector Information TLV (0x03), length 16
          Max Delay 0
        Terminator TLV (0x00), length 0
 
 
[edit]
root@a6-qfx1#
 
[edit]
root@a6-qfx1# set interfaces xe-0/0/20 disable
 
root@a6-qfx1# commit
commit complete
 
[edit]
root@a6-qfx1# run show lacp interfaces ae20
Aggregated interface: ae20
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/23      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/23    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20      Actor    No   Yes    No   No   No   Yes     Fast    Active
      xe-0/0/20    Partner    No   Yes    No   No   No   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State
      xe-0/0/23                 Current   Fast periodic Collecting distributing
      xe-0/0/20           Port disabled     No periodic           Detached
 
[edit]
root@a6-qfx1# run show interfaces ae20 extensive | find LACP
    LACP info:        Role     System             System      Port    Port  Port
                             priority          identifier  priority  number   key
      xe-0/0/20.0    Actor        127  5c:45:27:49:64:8c       127      17    21
      xe-0/0/20.0  Partner      65535  ac:1f:6b:8d:d7:bd         1      17    33
      xe-0/0/23.0    Actor        127  5c:45:27:49:64:8c       127      16    21
      xe-0/0/23.0  Partner      65535  ac:1f:6b:8d:d7:bd       255       2    33 =>>> note that the server's system ID has changed from :bc to :bd
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0          1541397     1441186            0            0
      xe-0/0/23.0          1727471     1601948            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0                0           0            0            0
      xe-0/0/23.0                0           0            0            0
    Protocol eth-switch, MTU: 9216, Generation: 164, Route table: 0
      Flags: None
 
[edit]
root@a6-qfx1#
 
05:28:20.306105  In LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System ac:1f:6b:8d:d7:bd, System Priority 65535, Key 33, Port 2, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting]
        Partner Information TLV (0x02), length 20
          System 5c:45:27:49:64:8c, System Priority 127, Key 21, Port 16, Port Priority 127
          State Flags [Activity, Timeout, Aggregation, Synchronization]
        Collector Information TLV (0x03), length 16
          Max Delay 0
        Terminator TLV (0x00), length 0
 
--------
Comment 1 Kiran 2020-10-21 09:54:18 CEST
Hi Folks, any update on this one?
Comment 2 Nandini 2021-03-10 22:44:29 CET
Created attachment 147
Proposed patch

This patch also addresses https://bugs.dpdk.org/show_bug.cgi?id=550.
Comment 3 gaoxiangliu0 2022-05-19 10:09:09 CEST
Comment on attachment 147
Proposed patch

Hi Nandini,
Is there a plan to submit this patch for DPDK 21.11?
