[dpdk-dev,v5,1/1] ring: guarantee load/load order in enqueue and dequeue

Message ID 1510278669-8489-2-git-send-email-hejianet@gmail.com (mailing list archive)
State Superseded, archived
Checks

Context               Check    Description
ci/checkpatch         warning  coding style issues
ci/Intel-compilation  success  Compilation OK

Commit Message

Jia He Nov. 10, 2017, 1:51 a.m. UTC
  We observed an rte_panic() from mbuf_autotest on our Qualcomm arm64 server.
In __rte_ring_move_cons_head()
...
        do {
                /* Restore n as it may change every loop */
                n = max;

                *old_head = r->cons.head;                //1st load
                const uint32_t prod_tail = r->prod.tail; //2nd load

cpu1(producer)          cpu2(consumer)          cpu3(consumer)
                        load r->prod.tail
in enqueue:
load r->cons.tail
load r->prod.head

store r->prod.tail

                                                load r->cons.head
                                                load r->prod.tail
                                                ...
                                                store r->cons.{head,tail}
                        load r->cons.head

On weak memory order architectures (powerpc, arm), the 2nd load might
be reordered before the 1st load, which makes *entries bigger than
intended. This nasty reordering breaks enqueue/dequeue: r->cons.head
can then be bigger than the stale prod_tail, the unsigned subtraction
wraps around, *entries becomes very big, and the consumer moves
forward incorrectly.

After this patch, even with the above context switches, the old
cons.head is recalculated after a failure of rte_atomic32_cmpset, so
no race condition is left.

There is no such issue on x86, because x86 has a strong memory order
model. But rte_smp_rmb() has no runtime performance impact on x86, so
the same code is kept without architecture-specific special cases.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Signed-off-by: jie2.liu@hxt-semitech.com
Signed-off-by: bing.zhao@hxt-semitech.com
---
 lib/librte_ring/rte_ring.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
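
To make the failure arithmetic concrete, a minimal standalone sketch
(not DPDK code; the values are hypothetical) of the modulo-2^32
subtraction __rte_ring_move_cons_head() performs when the reordered,
stale prod.tail value sits next to a freshly loaded cons.head:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		/* hypothetical values: prod.tail was loaded early and
		 * is stale; cons.head already reflects the progress of
		 * the other consumers */
		uint32_t prod_tail = 100;
		uint32_t cons_head = 103;

		/* same unsigned subtraction as
		 * *entries = prod_tail - *old_head */
		uint32_t entries = prod_tail - cons_head;

		/* prints entries = 4294967293: the "very big" *entries
		 * that sends the consumer far past the producer */
		printf("entries = %u\n", entries);
		return 0;
	}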
  

Comments

Jerin Jacob Nov. 10, 2017, 2:46 a.m. UTC | #1
-----Original Message-----
> Date: Fri, 10 Nov 2017 01:51:09 +0000
> From: Jia He <hejianet@gmail.com>
> To: jerin.jacob@caviumnetworks.com, dev@dpdk.org, olivier.matz@6wind.com
> Cc: konstantin.ananyev@intel.com, bruce.richardson@intel.com,
>  jianbo.liu@arm.com, hemant.agrawal@nxp.com, Jia He <hejianet@gmail.com>,
>  Jia He <jia.he@hxt-semitech.com>, jie2.liu@hxt-semitech.com,
>  bing.zhao@hxt-semitech.com
> Subject: [PATCH v5 1/1] ring: guarantee load/load order in enqueue and
>  dequeue
> X-Mailer: git-send-email 2.7.4
> 
> We observed an rte_panic() from mbuf_autotest on our Qualcomm arm64 server.

I think the above text can be improved. Something like below.


Fix for the rte_panic() in mbuf_autotest on a qualcomm arm64 server
(...SoC name...)

Root cause:
> In __rte_ring_move_cons_head()
> ...
>         do {
>                 /* Restore n as it may change every loop */
>                 n = max;
> 
>                 *old_head = r->cons.head;                //1st load
>                 const uint32_t prod_tail = r->prod.tail; //2nd load
> 
> cpu1(producer)          cpu2(consumer)          cpu3(consumer)
>                         load r->prod.tail
> in enqueue:
> load r->cons.tail
> load r->prod.head
> 
> store r->prod.tail
> 
>                                                 load r->cons.head
>                                                 load r->prod.tail
>                                                 ...
>                                                 store r->cons.{head,tail}
>                         load r->cons.head
> 
> On weak memory order architectures (powerpc, arm), the 2nd load might
> be reordered before the 1st load, which makes *entries bigger than
> intended. This nasty reordering breaks enqueue/dequeue: r->cons.head
> can then be bigger than the stale prod_tail, the unsigned subtraction
> wraps around, *entries becomes very big, and the consumer moves
> forward incorrectly.
>
> After this patch, even with the above context switches, the old
> cons.head is recalculated after a failure of rte_atomic32_cmpset, so
> no race condition is left.
>
> There is no such issue on x86, because x86 has a strong memory order
> model. But rte_smp_rmb() has no runtime performance impact on x86, so
> the same code is kept without architecture-specific special cases.
> 
> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
> Signed-off-by: jie2.liu@hxt-semitech.com
> Signed-off-by: bing.zhao@hxt-semitech.com

➜ [master][dpdk.org] $ ./devtools/checkpatches.sh 

### ring: guarantee load/load order in enqueue and dequeue

WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
separate line
#58: FILE: lib/librte_ring/rte_ring.h:414:
+		 * memory model. It is noop on x86 */

WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
separate line
#70: FILE: lib/librte_ring/rte_ring.h:527:
+		 * memory model. It is noop on x86 */

total: 0 errors, 2 warnings, 22 lines checked

0/1 valid patch
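
For reference, the trailing-*/ layout that the two warnings above ask
for would be (same comment text, only reflowed):

		/* add rmb barrier to avoid load/load reorder in weak
		 * memory model. It is noop on x86
		 */
		rte_smp_rmb();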
➜ [master][dpdk.org] $ ./devtools/check-git-log.sh 
Wrong tag:
	Signed-off-by: jie2.liu@hxt-semitech.com
	Signed-off-by: bing.zhao@hxt-semitech.com


With above fixes:
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
  
Jianbo Liu Nov. 10, 2017, 3:12 a.m. UTC | #2
The 11/10/2017 08:16, Jerin Jacob wrote:
> -----Original Message-----
> > Date: Fri, 10 Nov 2017 01:51:09 +0000
> > From: Jia He <hejianet@gmail.com>
> > To: jerin.jacob@caviumnetworks.com, dev@dpdk.org, olivier.matz@6wind.com
> > Cc: konstantin.ananyev@intel.com, bruce.richardson@intel.com,
> >  jianbo.liu@arm.com, hemant.agrawal@nxp.com, Jia He <hejianet@gmail.com>,
> >  Jia He <jia.he@hxt-semitech.com>, jie2.liu@hxt-semitech.com,
> >  bing.zhao@hxt-semitech.com
> > Subject: [PATCH v5 1/1] ring: guarantee load/load order in enqueue and
> >  dequeue
> > X-Mailer: git-send-email 2.7.4
> >
> > We observed an rte_panic() from mbuf_autotest on our Qualcomm arm64 server.
>
> I think the above text can be improved. Something like below.
>
>
> Fix for the rte_panic() in mbuf_autotest on a qualcomm arm64 server
> (...SoC name...)
>
> Root cause:
> > In __rte_ring_move_cons_head()
> > ...
> >         do {
> >                 /* Restore n as it may change every loop */
> >                 n = max;
> >
> >                 *old_head = r->cons.head;                //1st load
> >                 const uint32_t prod_tail = r->prod.tail; //2nd load
> >
> > cpu1(producer)          cpu2(consumer)          cpu3(consumer)
> >                         load r->prod.tail
> > in enqueue:
> > load r->cons.tail
> > load r->prod.head
> >
> > store r->prod.tail
> >
> >                                                 load r->cons.head
> >                                                 load r->prod.tail
> >                                                 ...
> >                                                 store r->cons.{head,tail}
> >                         load r->cons.head
> >
> > On weak memory order architectures (powerpc, arm), the 2nd load might
> > be reordered before the 1st load, which makes *entries bigger than
> > intended. This nasty reordering breaks enqueue/dequeue: r->cons.head
> > can then be bigger than the stale prod_tail, the unsigned subtraction
> > wraps around, *entries becomes very big, and the consumer moves
> > forward incorrectly.
> >
> > After this patch, even with the above context switches, the old
> > cons.head is recalculated after a failure of rte_atomic32_cmpset, so
> > no race condition is left.
> >
> > There is no such issue on x86, because x86 has a strong memory order
> > model. But rte_smp_rmb() has no runtime performance impact on x86, so
> > the same code is kept without architecture-specific special cases.
> >
> > Signed-off-by: Jia He <jia.he@hxt-semitech.com>
> > Signed-off-by: jie2.liu@hxt-semitech.com
> > Signed-off-by: bing.zhao@hxt-semitech.com
>
> ➜ [master][dpdk.org] $ ./devtools/checkpatches.sh
>
> ### ring: guarantee load/load order in enqueue and dequeue
>
> WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
> separate line
> #58: FILE: lib/librte_ring/rte_ring.h:414:
> +              * memory model. It is noop on x86 */
>
> WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
> separate line
> #70: FILE: lib/librte_ring/rte_ring.h:527:
> +              * memory model. It is noop on x86 */
>
> total: 0 errors, 2 warnings, 22 lines checked
>
> 0/1 valid patch
> ➜ [master][dpdk.org] $ ./devtools/check-git-log.sh
> Wrong tag:
>       Signed-off-by: jie2.liu@hxt-semitech.com
>       Signed-off-by: bing.zhao@hxt-semitech.com
>
>
> With above fixes:
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>
>
>

Acked-by: Jianbo Liu <jianbo.liu@arm.com>

And add it to stable.
Cc: stable@dpdk.org

  
Ananyev, Konstantin Nov. 10, 2017, 9:59 a.m. UTC | #3
> -----Original Message-----
> From: Jia He [mailto:hejianet@gmail.com]
> Sent: Friday, November 10, 2017 1:51 AM
> To: jerin.jacob@caviumnetworks.com; dev@dpdk.org; olivier.matz@6wind.com
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; jianbo.liu@arm.com;
> hemant.agrawal@nxp.com; Jia He <hejianet@gmail.com>; Jia He <jia.he@hxt-semitech.com>; jie2.liu@hxt-semitech.com; bing.zhao@hxt-
> semitech.com
> Subject: [PATCH v5 1/1] ring: guarantee load/load order in enqueue and dequeue
> 
> We observed an rte_panic() from mbuf_autotest on our Qualcomm arm64 server.
> In __rte_ring_move_cons_head()
> ...
>         do {
>                 /* Restore n as it may change every loop */
>                 n = max;
> 
>                 *old_head = r->cons.head;                //1st load
>                 const uint32_t prod_tail = r->prod.tail; //2nd load
> 
> cpu1(producer)          cpu2(consumer)          cpu3(consumer)
>                         load r->prod.tail
> in enqueue:
> load r->cons.tail
> load r->prod.head
> 
> store r->prod.tail
> 
>                                                 load r->cons.head
>                                                 load r->prod.tail
>                                                 ...
>                                                 store r->cons.{head,tail}
>                         load r->cons.head
> 
> On weak memory order architectures (powerpc, arm), the 2nd load might
> be reordered before the 1st load, which makes *entries bigger than
> intended. This nasty reordering breaks enqueue/dequeue: r->cons.head
> can then be bigger than the stale prod_tail, the unsigned subtraction
> wraps around, *entries becomes very big, and the consumer moves
> forward incorrectly.
>
> After this patch, even with the above context switches, the old
> cons.head is recalculated after a failure of rte_atomic32_cmpset, so
> no race condition is left.
>
> There is no such issue on x86, because x86 has a strong memory order
> model. But rte_smp_rmb() has no runtime performance impact on x86, so
> the same code is kept without architecture-specific special cases.
> 
> Signed-off-by: Jia He <jia.he@hxt-semitech.com>
> Signed-off-by: jie2.liu@hxt-semitech.com
> Signed-off-by: bing.zhao@hxt-semitech.com
> ---
>  lib/librte_ring/rte_ring.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 5e9b3b7..3e8085a 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -409,6 +409,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>  		n = max;
> 
>  		*old_head = r->prod.head;
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86 */
> +		rte_smp_rmb();
> +
>  		const uint32_t cons_tail = r->cons.tail;
>  		/*
>  		 *  The subtraction is done between two unsigned 32bits value
> @@ -517,6 +522,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
>  		n = max;
> 
>  		*old_head = r->cons.head;
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86 */
> +		rte_smp_rmb();
> +
>  		const uint32_t prod_tail = r->prod.tail;
>  		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.7.4
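
To see how a failed rte_atomic32_cmpset forces the recalculation the
commit message relies on, here is a paraphrased sketch (trimmed, not
verbatim; the single-consumer branch is omitted) of the fixed consumer
loop from 17.11-era lib/librte_ring/rte_ring.h, including the retry
that the diff context above cuts off:

	do {
		/* Restore n as it may change every loop */
		n = max;

		*old_head = r->cons.head;                /* 1st load */

		/* the fix: keep the 2nd load from being hoisted
		 * above the 1st on weakly ordered CPUs */
		rte_smp_rmb();

		const uint32_t prod_tail = r->prod.tail; /* 2nd load */

		/* modulo-2^32 subtraction; sane only if the two loads
		 * above are observed in program order */
		*entries = prod_tail - *old_head;

		if (n > *entries)
			n = (behavior == RTE_RING_QUEUE_FIXED) ?
					0 : *entries;
		if (unlikely(n == 0))
			return 0;

		*new_head = *old_head + n;
		/* CAS failure means another consumer moved cons.head:
		 * loop and recompute everything from fresh loads */
		success = rte_atomic32_cmpset(&r->cons.head,
				*old_head, *new_head);
	} while (unlikely(success == 0));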
  

Patch

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 5e9b3b7..3e8085a 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -409,6 +409,11 @@  __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
 		n = max;
 
 		*old_head = r->prod.head;
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86 */
+		rte_smp_rmb();
+
 		const uint32_t cons_tail = r->cons.tail;
 		/*
 		 *  The subtraction is done between two unsigned 32bits value
@@ -517,6 +522,11 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		n = max;
 
 		*old_head = r->cons.head;
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86 */
+		rte_smp_rmb();
+
 		const uint32_t prod_tail = r->prod.tail;
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
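
A note on the "noop on x86" comment: roughly what rte_smp_rmb() expands
to in the EAL headers of this era (paraphrased from the per-arch
rte_atomic headers; check them for the authoritative definitions):

	/* x86: the hardware does not reorder loads with other loads,
	 * so only compiler reordering has to be stopped -- no
	 * instruction is emitted */
	#define rte_smp_rmb() rte_compiler_barrier()

	/* arm64: a real load/load barrier instruction is required */
	#define dmb(opt) asm volatile("dmb " #opt : : : "memory")
	#define rte_smp_rmb() dmb(ishld)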