There are two ports in my bond, and the two hosts are connected through a switch. I enabled the DPDK debug output with the macro RTE_LIBRTE_BOND_DEBUG_8023AD.

Port 0 MAC: ac:f9:70:88:f3:26
Port 1 MAC: ac:f9:70:88:f3:27
BOND MAC: ac:f9:70:88:f3:26

When tx_machine sends LACP frames with the Port 1 MAC (ac:f9:70:88:f3:27) as the actor system address, the handshake fails.

When the LACP handshake fails, the log looks like this:
----------
997 [Port 0: rx_machine] -> INITIALIZE
997 [Port 0: periodic_machine] -> NO_PERIODIC ( begind LACP active )
997 [Port 0: mux_machine] -> DETACHED
997 [Port 0: selection_logic] -> SELECTED: ID= 1
    aggregator found aggregator ID= 1
997 [Port 0: mux_machine] DETACHED -> WAITING
1995 [Port 1: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:27, key=2100, p_pri=FF00 p_num=0200
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
1995 [Port 0: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:27, key=2100, p_pri=FF00 p_num=0100
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
2095 [Port 1: mux_machine] ATTACHED Entered
2594 [Port 1: tx_machine] Sending LACP frame
----------

When the LACP handshake succeeds, the log looks like this:
----------
0 [Port 0: rx_machine] -> INITIALIZE
0 [Port 0: periodic_machine] -> NO_PERIODIC ( begind LACP active )
0 [Port 0: mux_machine] -> DETACHED
99 [Port 0: mux_machine] DETACHED -> WAITING
Waiting for slaves to become active...
Port 2 MAC: ac:f9:70:88:f3:26
236 [Port 1: rx_machine] -> INITIALIZE
236 [Port 1: periodic_machine] -> NO_PERIODIC ( begind LACP active )
236 [Port 1: mux_machine] -> DETACHED
236 [Port 1: selection_logic] -> SELECTED: ID= 0
    aggregator found aggregator ID= 0
236 [Port 1: mux_machine] DETACHED -> WAITING
1034 [Port 0: tx_machine] Sending LACP frame
1034 [Port 0: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:26, key=2100, p_pri=FF00 p_num=0100
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
1234 [Port 1: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:26, key=2100, p_pri=FF00 p_num=0200
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
2032 [Port 0: tx_machine] Sending LACP frame
2332 [Port 1: rx_machine] LACP -> CURRENT
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=0080, system=F8:98:EF:69:83:91, key=417F, p_pri=0080 p_num=0600
    state={ ACT TIMEOUT AGG }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=AC:F9:70:88:F3:26, key=2100, p_pri=FF00 p_num=0200
    state={ ACT AGG DEF EXP }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
2332 [Port 1: mux_machine] ATTACHED Entered
----------

From my observation: when the log prints "SELECTED: ID= 1", the wrong MAC address is used to send the LACP frames. The selection_logic function chooses the wrong aggregator_port_id here.
rte_eth_bond_8023ad.c:749
    case AGG_STABLE:
        if (default_slave == slaves_count)
            new_agg_id = slaves[slave_id];
        else
            new_agg_id = slaves[default_slave]; // sometimes new_agg_id will be 1

Why does the LACP handshake succeed only sometimes? The "slaves" array is filled in a nondeterministic order by the function "activate_slave". When port 0 occupies slaves[0], it works correctly.
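To illustrate the order dependence described above, here is a minimal standalone model of the quoted AGG_STABLE branch. It is not the DPDK function: agg_stable_pick is a hypothetical name, and the index values passed to it are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the AGG_STABLE branch quoted in the report.
 * Hypothetical helper, not part of DPDK. It returns the port id that
 * would become the aggregator for the given inputs.
 */
static uint16_t
agg_stable_pick(const uint16_t *slaves, uint16_t slaves_count,
                uint16_t default_slave, uint16_t slave_id)
{
    /* Both branches index into slaves[], so the array must be non-empty. */
    assert(slaves_count > 0);

    if (default_slave == slaves_count) {
        /* No suitable aggregator was found: mirror the quoted fallback. */
        assert(slave_id < slaves_count);
        return slaves[slave_id];
    }

    /* A suitable aggregator index was found: the result is whatever
     * port id happens to sit at that position in the array. */
    return slaves[default_slave];
}
```

With the array ordered {0, 1}, agg_stable_pick(slaves, 2, 0, 0) returns port 0. With the array ordered {1, 0}, the same call returns port 1, which is consistent with the "SELECTED: ID= 1" line in the failing log. In this model both branches depend on the position of a port in the array rather than on the port id itself.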
Chas - can you please check? Thanks
index b77a37d..48242a9 100644
--- a/FStackV1.12/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/FStackV1.12/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -658,6 +658,25 @@ max_index(uint64_t *a, int n)
     return max_i;
 }
 
+static uint16_t
+min_index(uint16_t *a, uint16_t n)
+{
+    if (n <= 0)
+        return -1;
+
+    int i, min_i = 0;
+    uint64_t min = a[0];
+
+    for (i = 1; i < n; ++i) {
+        if (a[i] < min) {
+            min = a[i];
+            min_i = i;
+        }
+    }
+
+    return min_i;
+}
+
 /**
  * Function assigns port to aggregator.
  *
@@ -728,7 +747,11 @@ selection_logic(struct bond_dev_private *internals, uint16_t slave_id)
         if (default_slave == slaves_count)
             new_agg_id = slaves[slave_id];
         else
-            new_agg_id = slaves[default_slave];
+        {
+            //new_agg_id = slaves[default_slave];
+            agg_new_idx = min_index(slaves, slaves_count);
+            new_agg_id = slaves[agg_new_idx];
+        }
         break;
     default:
         if (default_slave == slaves_count)

This is my patch. It works well for me.
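A few details in min_index may be worth tightening before a formal submission: the function is declared to return uint16_t but returns -1 on empty input (which becomes 65535), `n <= 0` on an unsigned value is the same as `n == 0`, and the running minimum is held in a uint64_t although the elements are uint16_t. The following is only a sketch of one possible cleanup, not the submitted patch; min_index_u16 and lowest_port_id are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the smallest element of a[0..n-1],
 * or -1 if the array is empty. Returning int keeps -1 distinguishable
 * from every valid index.
 */
static int
min_index_u16(const uint16_t *a, uint16_t n)
{
    int i, min_i = 0;

    if (n == 0)
        return -1;

    for (i = 1; i < n; ++i) {
        if (a[i] < a[min_i])
            min_i = i;
    }

    return min_i;
}

/*
 * Hypothetical wrapper: pick the lowest port id from the active list,
 * so the result does not depend on the order in which ports were added.
 */
static uint16_t
lowest_port_id(const uint16_t *slaves, uint16_t slaves_count)
{
    int idx = min_index_u16(slaves, slaves_count);

    assert(idx >= 0); /* caller must pass at least one active port */
    return slaves[idx];
}
```

For both {0, 1} and {1, 0}, lowest_port_id returns port 0, which is the behavior the patch aims for in the else branch.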
Markopeng - can you submit the patch to the mailing list formally? Thanks
Thanks, this works for me.