[dpdk-stable] [dpdk-dev] [PATCH] net/bonding: fix slave activation simultaneously

Matan Azrad matan at mellanox.com
Tue May 15 10:24:26 CEST 2018


Hi

From: Chas Williams
> There's possibly an issue here:
> 
>         /* If the device isn't started don't handle interrupts */
>         if (!bonded_eth_dev->data->dev_started)
>                 return rc;
> 
>         /* verify that port_id is a valid slave of bonded port */
>         for (i = 0; i < internals->slave_count; i++) {
>                 if (internals->slaves[i].port_id == port_id) {
>                         valid_slave = 1;
>                         break;
>                 }
>         }
> 
>         if (!valid_slave)
>                 return rc;
> 
>         /* Synchronize lsc callback parallel calls either by real link event
>          * from the slaves PMDs or by the bonding PMD itself.
>          */
>         rte_spinlock_lock(&internals->lsc_lock);
> 
> Nothing keeps the master thread from modifying internals while this is running
> and your slave port may suddenly no longer be a slave port by the time you get
> to lsc_lock.
> I think there is probably an issue dev_started as well.  Again, nothing keeps the
> LSC thread from attempting to activate a slave that may have just been
> stopped/removed.

Yes, you right, the same issue for data-path functions, I think there is a big synchronization work that should be done for the bonding PMD.
This patch just synchronizes the callback parallel calls and solved an issue saw in mlx5 device, and do not pretend to solve all the sync issues.

> 
> On Mon, May 14, 2018 at 8:41 AM, Ferruh Yigit <ferruh.yigit at intel.com>
> wrote:
> On 5/14/2018 12:45 PM, Doherty, Declan wrote:
> > On 24/04/2018 12:29 PM, Matan Azrad wrote:
> >> The bonding PMD decides to activate\deactivate its slaves according
> >> to the slaves link statuses.
> >> Thus, it registers to the LSC events of the slaves ports and
> >> activates\deactivates them from its LSC callbacks called
> >> asynchronously by the host thread when the slave link status is changed.
> >>
> >> In addition, the bonding PMD uses the callback for slave activation
> >> when it tries to start it, this operation is probably called by the
> >> master thread.
> >>
> >> Consequently, a slave may be activated in the same time by two
> >> different threads and may cause a lot of optional errors, for
> >> example, slave mempool recreation with the same name causes an error.
> >>
> >> Synchronize the critical section in the LSC callback using a special
> >> new spinlock.
> >>
> >> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
> >> Fixes: a45b288ef21a ("bond: support link status polling")
> >> Cc: stable at dpdk.org
> >>
> >> Signed-off-by: Matan Azrad <matan at mellanox.com>
> >> ---
> > ...
> >>
> >
> > Acked-by: Declan Doherty <declan.doherty at intel.com>
> >
> Applied to dpdk-next-net/master, thanks.
> 




More information about the stable mailing list