[dpdk-users] bonding driver LACP mode issues

Alex Kiselev kiselev99 at gmail.com
Sat Jul 1 20:23:08 CEST 2017


Hi!

While working with the bonding driver in mode 4 (LACP), I have several times run
into a situation where the link aggregation port stopped forwarding packets after
some time of normal operation. Recreating the aggregation group on the switch didn't
help. The only way out was to restart my application.

I started investigating the source code of the bonding driver and discovered
that the rx_machine() function doesn't follow IEEE Std 802.1AX-2008.

It looks like the following part of the rx_machine() code implements
the recordPDU function described in section "5.4.9 Functions" of the standard.

		bool match = port->actor.system_priority ==
			lacp->partner.port_params.system_priority &&
			is_same_ether_addr(&agg->actor.system,
			&lacp->partner.port_params.system) &&
			port->actor.port_priority ==
			lacp->partner.port_params.port_priority &&
			port->actor.port_number ==
			lacp->partner.port_params.port_number;
			
		...

		/* If LACP partner params match this port actor params */
		if (match == true && ACTOR_STATE(port, AGGREGATION) ==
				PARTNER_STATE(port, AGGREGATION))
			PARTNER_STATE_SET(port, SYNCHRONIZATION);
		else if (!PARTNER_STATE(port, AGGREGATION) && ACTOR_STATE(port,
				AGGREGATION))
			PARTNER_STATE_SET(port, SYNCHRONIZATION);
		else
			PARTNER_STATE_CLR(port, SYNCHRONIZATION);

Problem #1:
According to the recordPDU function, the "Partner_Key" parameter carried in the
received PDU should be compared to Actor_Oper_Port_Key.
But the bonding driver doesn't do this. It only compares system_priority, system,
port_priority and port_number when evaluating the match variable.
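
A minimal sketch of the comparison with the key included. The struct and
function names below (port_params, params_match) are hypothetical stand-ins,
not the driver's actual types, and the MAC address is compared as a raw
6-byte array:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-port parameters carried in an LACPDU. */
struct port_params {
	uint16_t system_priority;
	uint8_t  system[6];	/* system MAC address */
	uint16_t key;
	uint16_t port_priority;
	uint16_t port_number;
};

/*
 * Return true when the Partner fields of a received PDU describe this
 * actor, with the key compared alongside the other four parameters.
 */
static bool
params_match(const struct port_params *actor,
		const struct port_params *pdu_partner)
{
	return actor->system_priority == pdu_partner->system_priority &&
		memcmp(actor->system, pdu_partner->system, 6) == 0 &&
		actor->key == pdu_partner->key &&
		actor->port_priority == pdu_partner->port_priority &&
		actor->port_number == pdu_partner->port_number;
}
```

With this check, a PDU whose Partner_Key differs from the actor's key no
longer counts as a match, even if every other field is identical.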

Problem #2:
Also, the standard indicates that:
"Partner_Oper_Port_State.Synchronization is set to TRUE if all of these parameters match,
Actor_State.Synchronization in the received PDU is set to TRUE, and LACP will actively
maintain the link in the aggregation."

But the bonding driver doesn't check that Actor_State.Synchronization in the received PDU is set to TRUE.

Problem #3:
Also, the standard indicates that:
"Partner_Oper_Port_State.Synchronization is also set to TRUE if the value of
Actor_State.Aggregation in the received PDU is set to FALSE (i.e., indicates an Individual
link), Actor_State.Synchronization in the received PDU is set to TRUE, and LACP will
actively maintain the link."

The bonding driver only partly follows that rule: it doesn't check
that Actor_State.Synchronization in the received PDU is set to TRUE.
It also checks
ACTOR_STATE(port, AGGREGATION)
but the standard doesn't say anything about this.
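
Taken together, the two rules quoted in Problem #2 and Problem #3 reduce to a
small predicate. This is only a sketch of my reading of the text. The function
and parameter names are hypothetical, and the "LACP will actively maintain the
link" condition is passed in as a plain boolean that the caller is assumed to
compute:

```c
#include <stdbool.h>

/*
 * Decide Partner_Oper_Port_State.Synchronization from a received PDU.
 *   match               - the PDU's Partner fields match this port's actor
 *                         parameters (key and Aggregation bit included)
 *   pdu_actor_aggr      - Actor_State.Aggregation in the received PDU
 *   pdu_actor_sync      - Actor_State.Synchronization in the received PDU
 *   actively_maintained - "LACP will actively maintain the link",
 *                         computed by the caller (assumption)
 */
static bool
partner_sync(bool match, bool pdu_actor_aggr, bool pdu_actor_sync,
		bool actively_maintained)
{
	/* Both rules require these two conditions. */
	if (!pdu_actor_sync || !actively_maintained)
		return false;

	/* Rule 1: parameters match. Rule 2: partner reports an Individual link. */
	return match || !pdu_actor_aggr;
}
```

Note that neither branch looks at the local ACTOR_STATE(port, AGGREGATION)
on its own.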

My proposal is to replace the partner state sync flag evaluation block with the
following one in order to follow the standard more strictly:

		/* If LACP partner params match this port actor params */
		if ((match == true && lacp->partner.port_params.key == port->actor.key &&
				  ACTOR_STATE(port, AGGREGATION) == PARTNER_STATE(port, AGGREGATION) &&
				  STATE_FLAG(lacp->actor.state, SYNCHRONIZATION) == true) ||
			(STATE_FLAG(lacp->actor.state, AGGREGATION) == false &&
				  STATE_FLAG(lacp->actor.state, SYNCHRONIZATION) == true)
			)
			PARTNER_STATE_SET(port, SYNCHRONIZATION);
		else
			PARTNER_STATE_CLR(port, SYNCHRONIZATION);

...

#define STATE_FLAG(_p, _f) (!!CHECK_FLAGS(_p, STATE_ ## _f))
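
For illustration, here is a self-contained version of the macro. CHECK_FLAGS
and the STATE_* values are defined here as assumptions (the bit positions
follow the LACP state octet layout). In the driver they come from its own
headers:

```c
#include <stdint.h>

/* Assumed definitions for illustration only; the driver supplies its own. */
#define STATE_LACP_ACTIVE		0x01
#define STATE_LACP_SHORT_TIMEOUT	0x02
#define STATE_AGGREGATION		0x04
#define STATE_SYNCHRONIZATION		0x08

#define CHECK_FLAGS(_variable, _f) ((_variable) & (_f))

/* Test one flag in an arbitrary state byte, normalized to 0 or 1. */
#define STATE_FLAG(_p, _f) (!!CHECK_FLAGS(_p, STATE_ ## _f))
```

The !! normalizes the result to 0 or 1, which is what makes the "== true"
and "== false" comparisons in the condition above safe.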


I am not sure yet whether the problems described above are what causes the driver
to get stuck in a kind of deadlock in my application, but I think they might be
the source of my problem.

Could someone take a look at my suggestions and help
me find out why my LACP bonding port doesn't work correctly?

Thank you.

--
Alex Kiselev

