When one of the bond slaves in 802.3ad mode is disabled, the switchover takes almost 6 seconds, which is not acceptable for telcos. We need sub-second switchover time, as in Linux. Tested with a Juniper QFX switch.

The reason is that the system ID changes (to that of the other slave device) when one of the active slaves goes down. This causes re-negotiation, which is why convergence takes so long.

Is the system ID expected to be different for each link? Shouldn't it be the same for all links?

As the gdb output below shows:
system ID of slave 0 is {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbc}
system ID of slave 1 is {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbd}
Because of this, the system ID changes when the active slave goes down.

This has been shown to Doherty, Declan <declan.doherty@intel.com>

----- Logs from DPDK application -----

Breakpoint 1, rx_machine (internals=0x11409edf00, slave_id=1, lacp=0x13ae8b4be) at /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c:326
326 /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c: No such file or directory.
(gdb) p/x *port
$2 = {actor_state = 0x3f, actor = {system_priority = 0xffff, system = {addr_bytes = {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbd}}, key = 0x2100, port_priority = 0xff00, port_number = 0x200}, partner_state = 0x3f, partner = {system_priority = 0x7f00, system = {addr_bytes = {0x5c, 0x45, 0x27, 0x49, 0x64, 0x8c}}, key = 0x1500, port_priority = 0x7f00, port_number = 0x1000}, sm_flags = 0x202, selected = 0x2, forced_rx_flags = 0x1, current_while_timer = 0x23e52ecad117e2, periodic_timer = 0x23e52d82f5c5a2, wait_while_timer = 0x23dae7bc8488e2, tx_machine_timer = 0x23e52d4e8213a1, tx_marker_timer = 0x0, aggregator_port_id = 0x0, mbuf_pool = 0x113f5db580, rx_ring = 0x113fa00bc0, tx_ring = 0x113fa00980, rx_marker_timer = 0x0, warning_timer = 0x23db47b16ff07d, warnings_to_show = 0x10, slow_pool = 0x0}

Breakpoint 1, rx_machine (internals=0x11409edf00, slave_id=0, lacp=0x13d7ce87e) at /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c:326
326 in /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c
(gdb) p/x *port
$3 = {actor_state = 0x8f, actor = {system_priority = 0xffff, system = {addr_bytes = {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbc}}, key = 0x2100, port_priority = 0xff00, port_number = 0x100}, partner_state = 0x3f, partner = {system_priority = 0x7f00, system = {addr_bytes = {0x5c, 0x45, 0x27, 0x49, 0x64, 0x8c}}, key = 0x1500, port_priority = 0x7f00, port_number = 0x1100}, sm_flags = 0x202, selected = 0x2, forced_rx_flags = 0x1, current_while_timer = 0x23e537ad7f62bc, periodic_timer = 0x23e5369a1fcadb, wait_while_timer = 0x23da0985aca4f4, tx_machine_timer = 0x23e53665ac40d8, tx_marker_timer = 0x0, aggregator_port_id = 0x0, mbuf_pool = 0x114022b800, rx_ring = 0x1140607600, tx_ring = 0x11406073c0, rx_marker_timer = 0x0, warning_timer = 0x23e536a73cfc50, warnings_to_show = 0x10, slow_pool = 0x0}

---------

---- Logs from Juniper QFX switch ----

root@a6-qfx1# run show lacp interfaces ae20
Aggregated interface: ae20
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/23      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/23    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
    LACP protocol:   Receive State   Transmit State   Mux State
      xe-0/0/23      Current         Fast periodic    Collecting distributing
      xe-0/0/20      Current         Fast periodic    Collecting distributing

[edit]
root@a6-qfx1# run show interfaces ae20 extensive | find LACP
    LACP info:        Role     System   System              Port       Port     Port
                               priority identifier          priority   number   key
      xe-0/0/20.0    Actor     127      5c:45:27:49:64:8c   127        17       21
      xe-0/0/20.0    Partner   65535    ac:1f:6b:8d:d7:bc   255        1        33
      xe-0/0/23.0    Actor     127      5c:45:27:49:64:8c   127        16       21
      xe-0/0/23.0    Partner   65535    ac:1f:6b:8d:d7:bc   255        2        33
    LACP Statistics:     LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0        1541355     1441147            0            0
      xe-0/0/23.0        1727402     1601884            0            0
    Marker Statistics:   Marker Rx   Resp Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0              0           0            0            0
      xe-0/0/23.0              0           0            0            0
    Protocol eth-switch, MTU: 9216, Generation: 164, Route table: 0
      Flags: None

05:28:19.157776 In LACPv1, length 110
    Actor Information TLV (0x01), length 20
      System ac:1f:6b:8d:d7:bc, System Priority 65535, Key 33, Port 2, Port Priority 255
      State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
    Partner Information TLV (0x02), length 20
      System 5c:45:27:49:64:8c, System Priority 127, Key 21, Port 16, Port Priority 127
      State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
    Collector Information TLV (0x03), length 16
      Max Delay 0
    Terminator TLV (0x00), length 0

[edit]
root@a6-qfx1#

[edit]
root@a6-qfx1# set interfaces xe-0/0/20 disable
root@a6-qfx1# commit
commit complete

[edit]
root@a6-qfx1# run show lacp interfaces ae20
Aggregated interface: ae20
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/23      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/23    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20      Actor    No   Yes    No   No   No   Yes     Fast    Active
      xe-0/0/20    Partner    No   Yes    No   No   No   Yes     Fast   Passive
    LACP protocol:   Receive State   Transmit State   Mux State
      xe-0/0/23      Current         Fast periodic    Collecting distributing
      xe-0/0/20      Port disabled   No periodic      Detached

[edit]
root@a6-qfx1# run show interfaces ae20 extensive | find LACP
    LACP info:        Role     System   System              Port       Port     Port
                               priority identifier          priority   number   key
      xe-0/0/20.0    Actor     127      5c:45:27:49:64:8c   127        17       21
      xe-0/0/20.0    Partner   65535    ac:1f:6b:8d:d7:bd   1          17       33
      xe-0/0/23.0    Actor     127      5c:45:27:49:64:8c   127        16       21
      xe-0/0/23.0    Partner   65535    ac:1f:6b:8d:d7:bd   255        2        33   =>>> notice the change in Linux system-id “:bc to :bd”
    LACP Statistics:     LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0        1541397     1441186            0            0
      xe-0/0/23.0        1727471     1601948            0            0
    Marker Statistics:   Marker Rx   Resp Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0              0           0            0            0
      xe-0/0/23.0              0           0            0            0
    Protocol eth-switch, MTU: 9216, Generation: 164, Route table: 0
      Flags: None

[edit]
root@a6-qfx1#

05:28:20.306105 In LACPv1, length 110
    Actor Information TLV (0x01), length 20
      System ac:1f:6b:8d:d7:bd, System Priority 65535, Key 33, Port 2, Port Priority 255
      State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting]
    Partner Information TLV (0x02), length 20
      System 5c:45:27:49:64:8c, System Priority 127, Key 21, Port 16, Port Priority 127
      State Flags [Activity, Timeout, Aggregation, Synchronization]
    Collector Information TLV (0x03), length 16
      Max Delay 0
    Terminator TLV (0x00), length 0

--------
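
For anyone cross-checking the gdb dumps against the switch output, here is a minimal Python sketch that decodes the raw values. It is only a reading aid, not part of DPDK. The helper names (`decode_state`, `swap16`, `format_mac`, `distinct_system_ids`) and the `PORTS` table are made up for illustration. The state bit names follow the standard LACP actor/partner state layout. The byte swap assumes the 16-bit fields in the dump are held in network byte order on a little-endian host, which is inferred from the fact that 0x2100 then decodes to key 33, matching what the QFX reports.

```python
import struct

# LACP actor/partner state bits (standard LACP state octet layout).
LACP_STATE_BITS = [
    (0x01, "Activity"),
    (0x02, "Timeout"),
    (0x04, "Aggregation"),
    (0x08, "Synchronization"),
    (0x10, "Collecting"),
    (0x20, "Distributing"),
    (0x40, "Defaulted"),
    (0x80, "Expired"),
]


def decode_state(state):
    """Return the names of the flags set in an LACP state octet."""
    return [name for bit, name in LACP_STATE_BITS if state & bit]


def swap16(value):
    """Byte-swap a 16-bit value (assumption: gdb shows big-endian fields
    as read on a little-endian host)."""
    return struct.unpack("<H", struct.pack(">H", value))[0]


def format_mac(addr_bytes):
    """Format six address bytes the way the switch prints a system ID."""
    return ":".join(f"{b:02x}" for b in addr_bytes)


# Actor values copied from the two gdb dumps in this report, keyed by slave_id.
PORTS = {
    0: {"actor_state": 0x8F, "system": [0xAC, 0x1F, 0x6B, 0x8D, 0xD7, 0xBC],
        "key": 0x2100, "port_priority": 0xFF00, "port_number": 0x100},
    1: {"actor_state": 0x3F, "system": [0xAC, 0x1F, 0x6B, 0x8D, 0xD7, 0xBD],
        "key": 0x2100, "port_priority": 0xFF00, "port_number": 0x200},
}


def distinct_system_ids(ports):
    """Collect the distinct actor system IDs; more than one entry means the
    slaves do not advertise a common system ID."""
    return sorted({format_mac(p["system"]) for p in ports.values()})


if __name__ == "__main__":
    for slave_id, p in sorted(PORTS.items()):
        print(f"slave {slave_id}: system {format_mac(p['system'])}, "
              f"key {swap16(p['key'])}, port {swap16(p['port_number'])}, "
              f"port priority {swap16(p['port_priority'])}, "
              f"state {decode_state(p['actor_state'])}")
    print("distinct system IDs:", distinct_system_ids(PORTS))
```

With the values above, the two slaves report two different actor system IDs, which is the condition described at the top of this report.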
Hi Folks, any update on this one?
Created attachment 147
Proposed patch

This patch addresses https://bugs.dpdk.org/show_bug.cgi?id=550 as well.
Comment on attachment 147
Proposed patch

Hi Nandini, is there a plan to propose the patch for DPDK 21.11?