[dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory

Diogo Behrens diogo.behrens at huawei.com
Thu Aug 27 10:56:46 CEST 2020


Hi Phil,

thanks for taking a look at this. Indeed, the cmpxchg will have release semantics, but implicit barriers do not provide the same ordering guarantees as DMB ISH, ie, atomic_thread_fence(__ATOMIC_ACQ_REL).

Let me try to explain the scenario. Here is a pseudo-ARM assembly with the instructions involved:

STR me->locked  = 1   // 1: initialize the node
LDAXR pred = tail  // 2: cmpxchg
STLXR tail = me    // 3: cmpxchg
STR pred->next = me // 4: the problematic store

Now, ARM allows instruction 1 and 2 to be reordered, and also instructions 3 and 4 to be reordered:

LDAXR pred = tail  // 2: cmpxchg  (acquires can be moved up)
STR me-> locked = 1   // 1: initialize the node
STR pred->next = me // 4: the problematic store
STLXR tail = me    // 3: cmpxchg (releases can be moved down)

And since stores 1 and 4 can be reordered too, the following execution is possible:

LDAXR pred = tail  // 2: cmpxchg
STR pred->next = me // 4: the problematic store
STR me-> locked = 1   // 1: initialize the node
STLXR tail = me    // 3: cmpxchg

In this subtle situation, the locking-thread can overwrite the store to next->locked of the unlocking-thread (if it occurs between 4 and 1), and subsequently hang waiting for me->locked to become 0.

We couldn't reproduce this with our available hardware, but we "reproduced" it with the RMEM model checker, which precisely models the ARM memory model (https://github.com/rems-project/rmem). The GenMC model checker (https://github.com/MPI-SWS/genmc) also finds the issue, but its intermediate memory model is slightly more general than the ARM model.

The release attached to store (4) prevents (1) to reordering with it.

Please, let me know if that makes sense or if I am missing something.

Thanks,
-Diogo

-----Original Message-----
From: Phil Yang [mailto:Phil.Yang at arm.com] 
Sent: Wednesday, August 26, 2020 12:17 PM
To: Diogo Behrens <diogo.behrens at huawei.com>
Cc: dev at dpdk.org; nd <nd at arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
Subject: RE: [PATCH] librte_eal: fix mcslock hang on weak memory

Diogo Behrens <diogo.behrens at huawei.com> writes:

> Subject: [PATCH] librte_eal: fix mcslock hang on weak memory
> 
>     The initialization me->locked=1 in lock() must happen before
>     next->locked=0 in unlock(), otherwise a thread may hang forever,
>     waiting me->locked become 0. On weak memory systems (sHi Puch as ARMv8),
>     the current implementation allows me->locked=1 to be reordered with
>     announcing the node (pred->next=me) and, consequently, to be
>     reordered with next->locked=0 in unlock().
> 
>     This fix adds a release barrier to pred->next=me, forcing
>     me->locked=1 to happen before this operation.
> 
> Signed-off-by: Diogo Behrens <diogo.behrens at huawei.com>
> ---
>  lib/librte_eal/include/generic/rte_mcslock.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/include/generic/rte_mcslock.h
> b/lib/librte_eal/include/generic/rte_mcslock.h
> index 2bef28351..ce553f547 100644
> --- a/lib/librte_eal/include/generic/rte_mcslock.h
> +++ b/lib/librte_eal/include/generic/rte_mcslock.h
> @@ -68,7 +68,14 @@ rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t
> *me)
>  		 */
>  		return;
>  	}
> -	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
> +	/* The store to me->next above should also complete before the
> node is
> +	 * visible to predecessor thread releasing the lock. Hence, the store
> +	 * prev->next also requires release semantics. Note that, for
> example,
> +	 * on ARM, the release semantics in the exchange operation is not
> +	 * strong as a release fence and is not sufficient to enforce the
> +	 * desired order here.


Hi Diogo,

I didn't quite understand why the exchange operation with ACQ_REL memory ordering is not sufficient.
It emits 'stlxr' on armv8.0-a and 'swpal' on armv8.1-a (with LSE support). 
Both of these two instructions contain a release semantics. 

Please check the URL for the detail.
https://gcc.godbolt.org/z/Mc4373

BTW, if you captured a deadlock issue on your testbed, could you please share your test case here?

Thanks,
Phil


> +	 */
> +	__atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);
> 
>  	/* The while-load of me->locked should not move above the previous
>  	 * store to prev->next. Otherwise it will cause a deadlock. Need a
> --
> 2.28.0
> 



More information about the dev mailing list