[dpdk-stable] patch 'ring/c11: keep deterministic order allowing retry to work' has been queued to stable release 18.08.1

Kevin Traynor ktraynor at redhat.com
Fri Nov 23 11:27:02 CET 2018


Hi,

FYI, your patch has been queued to stable release 18.08.1

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/29/18. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the patch applied
to the branch. If the code is different (ie: not only metadata diffs), due for example to
a change in context or macro names, please double check it.

Thanks.

Kevin Traynor

---
>From 20a505bd4cc15804bfafd8e401c19c14cf2fe17c Mon Sep 17 00:00:00 2001
From: Gavin Hu <gavin.hu at arm.com>
Date: Fri, 9 Nov 2018 19:42:46 +0800
Subject: [PATCH] ring/c11: keep deterministic order allowing retry to work

[ upstream commit 86757c2c3ed5006940f93725d39131dfb0d09b60 ]

Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
   reasons (running out of cpu time, preempted,...)
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
   forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail
   beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.

cpu1(producer)      cpu2(producer)          cpu3(consumer)
load r->prod.head
    ^               load r->prod.head
    |               load r->cons.tail
    |               store r->prod.head(+n)
  stalled           <-- enqueue ----->
    |               store r->prod.tail(+n)
    |                                        load r->cons.head
    |                                        load r->prod.tail
    |                                        store r->cons.head(+n)
    |                                        <...dequeue.....>
    v                                        store r->cons.tail(+n)
load r->cons.tail

For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retry the flow with the new one. This retry flow works ok on
strong ordering platform(eg:x86). But for weak ordering platforms(arm,
ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new
but cons.tail becomes too old, the retry flow, based on the detection of
outdated head, does not trigger as expected, thus the outdate cons.tail
causes wrong free_entries.

Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries.

The fix is to keep the deterministic order of two loads allowing the retry
to work.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

The results showed the thread fence degrade the performance slightly, but
it is required for correctness.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")

Signed-off-by: Gavin Hu <gavin.hu at arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
Reviewed-by: Steve Capper <steve.capper at arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl at arm.com>
---
 lib/librte_ring/rte_ring_c11_mem.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 7bc74a4cb..dc49a998f 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -67,4 +67,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		n = max;
 
+		/* Ensure the head is read before tail */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
@@ -140,4 +143,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		n = max;
 
+		/* Ensure the head is read before tail */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
-- 
2.19.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2018-11-23 10:22:55.753727339 +0000
+++ 0058-ring-c11-keep-deterministic-order-allowing-retry-to-.patch	2018-11-23 10:22:54.000000000 +0000
@@ -1,8 +1,10 @@
-From 86757c2c3ed5006940f93725d39131dfb0d09b60 Mon Sep 17 00:00:00 2001
+From 20a505bd4cc15804bfafd8e401c19c14cf2fe17c Mon Sep 17 00:00:00 2001
 From: Gavin Hu <gavin.hu at arm.com>
 Date: Fri, 9 Nov 2018 19:42:46 +0800
 Subject: [PATCH] ring/c11: keep deterministic order allowing retry to work
 
+[ upstream commit 86757c2c3ed5006940f93725d39131dfb0d09b60 ]
+
 Use case scenario:
 1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
    reasons (running out of cpu time, preempted,...)
@@ -65,7 +67,6 @@
 it is required for correctness.
 
 Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
-Cc: stable at dpdk.org
 
 Signed-off-by: Gavin Hu <gavin.hu at arm.com>
 Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>


More information about the stable mailing list