[dpdk-dev] [PATCH v5 1/1] ring: guarantee load/load order in enqueue and dequeue
Jia He
hejianet at gmail.com
Fri Nov 10 02:51:09 CET 2017
We watched a rte panic of mbuf_autotest in our qualcomm arm64 server.
In __rte_ring_move_cons_head()
...
do {
/* Restore n as it may change every loop */
n = max;
*old_head = r->cons.head; //1st load
const uint32_t prod_tail = r->prod.tail; //2nd load
cpu1(producer) cpu2(consumer) cpu3(consumer)
load r->prod.tail
in enqueue:
load r->cons.tail
load r->prod.head
store r->prod.tail
load r->cons.head
load r->prod.tail
...
store r->cons.{head,tail}
load r->cons.head
In weak memory order architectures(powerpc,arm), the 2nd load might be
reodered before the 1st load, that makes *entries is bigger than we
wanted. This nasty reording messed enque/deque up. Then, r->cons.head
will be bigger than prod_tail, then make *entries very big and the
consumer will go forward incorrectly.
After this patch, even with above context switches, the old cons.head
will be recaculated after failure of rte_atomic32_cmpset. So no race
conditions left.
There is no such issue on X86, because X86 is strong memory order model.
But rte_smp_rmb() doesn't have impact on runtime performance on X86, so
keep the same code without architectures specific concerns.
Signed-off-by: Jia He <jia.he at hxt-semitech.com>
Signed-off-by: jie2.liu at hxt-semitech.com
Signed-off-by: bing.zhao at hxt-semitech.com
---
lib/librte_ring/rte_ring.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 5e9b3b7..3e8085a 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -409,6 +409,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
n = max;
*old_head = r->prod.head;
+
+ /* add rmb barrier to avoid load/load reorder in weak
+ * memory model. It is noop on x86 */
+ rte_smp_rmb();
+
const uint32_t cons_tail = r->cons.tail;
/*
* The subtraction is done between two unsigned 32bits value
@@ -517,6 +522,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
n = max;
*old_head = r->cons.head;
+
+ /* add rmb barrier to avoid load/load reorder in weak
+ * memory model. It is noop on x86 */
+ rte_smp_rmb();
+
const uint32_t prod_tail = r->prod.tail;
/* The subtraction is done between two unsigned 32bits value
* (the result is always modulo 32 bits even if we have
--
2.7.4
More information about the dev
mailing list