[dpdk-dev] [PATCH 2/2] bond: add mode 4 support

Neil Horman nhorman at tuxdriver.com
Thu Sep 18 18:02:34 CEST 2014


On Thu, Sep 18, 2014 at 08:07:31AM +0000, Wodkowski, PawelX wrote:
> > > +int
> > > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > > +		uint8_t slave_pos)
> > > +{
> > > +	struct bond_dev_private *internals = bond_dev->data->dev_private;
> > > +	struct mode8023ad_data *data = &internals->mode4;
> > > +	struct port *port;
> > > +	uint8_t i;
> > > +
> > > +	bond_mode_8023ad_stop(bond_dev);
> > > +
> > > +	/* Exclude slave from transmit policy. If this slave is an aggregator
> > > +	 * make all aggregated slaves unselected to force sellection logic
> > > +	 * to select suitable aggregator for this port	 */
> > > +	for (i = 0; i < internals->active_slave_count; i++) {
> > > +		port = &data->port_list[slave_pos];
> > > +		if (port->used_agregator_idx == slave_pos) {
> > > +			port->selected = UNSELECTED;
> > > +			port->actor_state &= ~(STATE_SYNCHRONIZATION |
> > STATE_DISTRIBUTING |
> > > +				STATE_COLLECTING);
> > > +
> > > +			/* Use default aggregator */
> > > +			port->used_agregator_idx = i;
> > > +		}
> > > +	}
> > > +
> > > +	port = &data->port_list[slave_pos];
> > > +	timer_cancel(&port->current_while_timer);
> > > +	timer_cancel(&port->periodic_timer);
> > > +	timer_cancel(&port->wait_while_timer);
> > > +	timer_cancel(&port->tx_machine_timer);
> > > +
> > These all seem rather racy.  Alarm callbacks are executed with the alarm list
> > locks not held.  So there is every possibility that you could execute these (or
> > any timer_cancel calls in this PMD in parallel with the internal state machine
> > timer callback, and leave either with a corrupted timer list (resulting from a
> > double free between here, and the actual callback site),
> 
> I don't think so. Yes, callbacks are executed with  alarm list locks not held, but 
> this is not the issue because access to list itself is guarded by lock and 
> ap->executing variable. So list will not be trashed. Check source of 
> eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().
> 
Yes, you're right, the list is probably safe wht the executing bit.

> > or a timer that is
> > actually still pending when a slave is removed.
> > 
> This is not the issue also, but problem might be similar. I assumed that alarms
> are atomic but when I looked at rte alarms closer I saw a race condition
> between and rte_eal_alarm_cancel() from  bond_mode_8023ad_stop()
> and rte_eal_alarm_set() from state machines callback. This need to be 
> reworked in some way.

Yes, this is what I was referring to:

CPU0				CPU1
rte_eal_alarm_callback		bond_8023ad_deactivate_slave
-bond_8023_ad_periodic_cb	timer_cancel
timer_set

If those timer functions operate on the same timer, the result is that you can
leave the stop/deactivate slave paths with a timer function for that slave still
pending. The bonding mode needs some internal state to serialize those
operations and determine if the timer should be reactivated.

Neil



More information about the dev mailing list