[dpdk-dev] [PATCH 2/2] bond: add mode 4 support

Wodkowski, PawelX pawelx.wodkowski at intel.com
Thu Sep 18 10:07:31 CEST 2014


> > +int
> > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > +		uint8_t slave_pos)
> > +{
> > +	struct bond_dev_private *internals = bond_dev->data->dev_private;
> > +	struct mode8023ad_data *data = &internals->mode4;
> > +	struct port *port;
> > +	uint8_t i;
> > +
> > +	bond_mode_8023ad_stop(bond_dev);
> > +
> > +	/* Exclude slave from transmit policy. If this slave is an aggregator
> > +	 * make all aggregated slaves unselected to force sellection logic
> > +	 * to select suitable aggregator for this port	 */
> > +	for (i = 0; i < internals->active_slave_count; i++) {
> > +		port = &data->port_list[slave_pos];
> > +		if (port->used_agregator_idx == slave_pos) {
> > +			port->selected = UNSELECTED;
> > +			port->actor_state &= ~(STATE_SYNCHRONIZATION | STATE_DISTRIBUTING |
> > +				STATE_COLLECTING);
> > +
> > +			/* Use default aggregator */
> > +			port->used_agregator_idx = i;
> > +		}
> > +	}
> > +
> > +	port = &data->port_list[slave_pos];
> > +	timer_cancel(&port->current_while_timer);
> > +	timer_cancel(&port->periodic_timer);
> > +	timer_cancel(&port->wait_while_timer);
> > +	timer_cancel(&port->tx_machine_timer);
> > +
> These all seem rather racy.  Alarm callbacks are executed with the alarm list
> locks not held.  So there is every possibility that you could execute these (or
> any timer_cancel calls in this PMD) in parallel with the internal state machine
> timer callback, and leave either with a corrupted timer list (resulting from a
> double free between here, and the actual callback site),

I don't think so. Yes, callbacks are executed without the alarm list lock held,
but that is not an issue, because access to the list itself is guarded by the
lock and the ap->executing variable, so the list will not be trashed. Check the
source of eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().

> or a timer that is
> actually still pending when a slave is removed.
> 
This is not an issue either, but the real problem may be similar. I assumed that
alarms are atomic, but when I looked at the rte alarms more closely I saw a race
condition between rte_eal_alarm_cancel() called from bond_mode_8023ad_stop()
and rte_eal_alarm_set() called from the state machines' callback. This needs to
be reworked in some way.
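
To illustrate the kind of rework I mean, here is a minimal sketch. It is not
code from the patch and not the EAL alarm API: struct sm_timer and the
sm_timer_*() helpers are hypothetical names that only model the interleaving,
and a pthread mutex stands in for whatever lock the PMD would really use. The
idea is that the stop path sets a "stopping" flag under the same lock the
callback takes before re-arming, so a callback that was already running when
stop was called cannot leave a pending alarm behind.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for one state machine timer; not a DPDK type. */
struct sm_timer {
	pthread_mutex_t lock;
	bool armed;    /* models "an alarm is pending for this timer" */
	bool stopping; /* set by the stop path; forbids re-arming */
};

static void
sm_timer_init(struct sm_timer *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->armed = false;
	t->stopping = false;
}

/* Models the re-arm done from the callback. Refuses once stop has begun. */
static bool
sm_timer_rearm(struct sm_timer *t)
{
	bool ok;

	pthread_mutex_lock(&t->lock);
	ok = !t->stopping;
	if (ok)
		t->armed = true;
	pthread_mutex_unlock(&t->lock);
	return ok;
}

/* Models the stop path: cancel whatever is pending and block re-arming. */
static void
sm_timer_stop(struct sm_timer *t)
{
	pthread_mutex_lock(&t->lock);
	t->stopping = true;
	t->armed = false;
	pthread_mutex_unlock(&t->lock);
}

/* Models the alarm callback: the alarm fired, so it is no longer pending;
 * after the state machine work it tries to re-arm itself. */
static bool
sm_timer_callback(struct sm_timer *t)
{
	pthread_mutex_lock(&t->lock);
	t->armed = false;
	pthread_mutex_unlock(&t->lock);

	/* ... state machine work would run here, outside the lock ... */

	return sm_timer_rearm(t);
}

/* Walks through the problematic ordering: a callback finishing after stop. */
static int
sm_timer_demo(void)
{
	struct sm_timer t;

	sm_timer_init(&t);
	assert(sm_timer_callback(&t));  /* normal operation: re-armed */
	sm_timer_stop(&t);
	assert(!sm_timer_callback(&t)); /* late callback: re-arm refused */
	assert(!t.armed);               /* nothing left pending after stop */
	return 0;
}
```

Without the stopping check in sm_timer_rearm(), the late callback in
sm_timer_demo() would set armed again after sm_timer_stop() returned, which is
the situation I am worried about. Whether a flag like this is enough for the
real PMD, or the cancel also has to wait for a running callback, is still open.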

