[dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Oct 19 22:02:38 CEST 2017



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, October 19, 2017 3:15 PM
> To: Zhao, Bing <ilovethull at 163.com>; Jia He <hejianet at gmail.com>; Jerin Jacob <jerin.jacob at caviumnetworks.com>
> Cc: Olivier MATZ <olivier.matz at 6wind.com>; dev at dpdk.org; jia.he at hxt-semitech.com; jie2.liu at hxt-semitech.com; bing.zhao at hxt-
> semitech.com
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
> 
> >
> > Hi,
> >
> > On 2017/10/19 18:02, Ananyev, Konstantin wrote:
> > >
> > > Hi Jia,
> > >
> > >>
> > >> Hi
> > >>
> > >>
> > >> On 10/13/2017 9:02 AM, Jia He Wrote:
> > >>> Hi Jerin
> > >>>
> > >>>
> > >>> On 10/13/2017 1:23 AM, Jerin Jacob Wrote:
> > >>>> -----Original Message-----
> > >>>>> Date: Thu, 12 Oct 2017 17:05:50 +0000
> > >>>>>
> > >> [...]
> > >>>> On the same lines,
> > >>>>
> > >>>> Jia He, jie2.liu, bing.zhao,
> > >>>>
> > >>>> Is this patch based on code review or did you see this issue on any of
> > >>>> the arm/ppc targets? arm64 will have a performance impact with this change.
> > >> sorry, I missed one important piece of information.
> > >> Our platform is an aarch64 server with 46 CPUs.
> > >> If we reduce the number of involved CPUs, the bug occurs less frequently.
> > >>
> > >> Yes, the mb barrier impacts performance, but correctness is more
> > >> important, isn't it ;-)
> > >> Maybe we can find a more lightweight barrier here?
> > >>
> > >> Cheers,
> > >> Jia
> > >>> Based on mbuf_autotest, rte_panic is invoked within seconds.
> > >>>
> > >>> PANIC in test_refcnt_iter():
> > >>> (lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
> > >>> 1: [./test(rte_dump_stack+0x38) [0x58d868]]
> > >>> Aborted (core dumped)
> > >>>
> > >
> > > So is it only reproducible with the mbuf refcnt test?
> > > Could it be reproduced with some 'pure' ring test
> > > (no mempools/mbuf refcnts, etc.)?
> > > The reason I am asking is that in that test we also have mbuf refcnt updates
> > > (that is what the test was created for) and we do some optimizations there too
> > > to avoid excessive atomic updates.
> > > BTW, if the problem is not reproducible without mbuf refcnt,
> > > may I suggest extending the test with:
> > >    - a check that the enqueue() operation was successful
> > >    - a walk through the pool that checks/prints the refcnt of each mbuf.
> > > Hopefully that would give us some extra information about what is going wrong here.
> > > Konstantin
> > >
> > >
> > Currently, the issue has only been found in this case on the ARM
> > platform; we are not sure how it behaves on the X86_64 platform.
> 
> I understand that it is only reproducible on arm so far.
> What I am asking is: with dpdk, is there any other way to reproduce it (on arm)
> other than running mbuf_autotest?
> Something really simple that does not use mbuf/mempool etc.?
> Just do dequeue/enqueue from multiple threads and check data integrity at the end?
> If not - what makes you think that the problem is precisely in the rte_ring code?
> Why not in rte_mbuf, let's say?
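
To illustrate what I mean by "really simple", a rough sketch could be something
like the code below. This is hypothetical code, not an existing DPDK test; the
names ordering_test, ring_worker, RING_SIZE and ITERATIONS are made up.

#include <stdio.h>
#include <stdint.h>
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_launch.h>
#include <rte_ring.h>
#include <rte_debug.h>

#define RING_SIZE  1024
#define ITERATIONS 100000

static struct rte_ring *test_ring;

/* each worker enqueues a non-NULL token and takes one back, many times */
static int
ring_worker(void *arg)
{
        uintptr_t token = (uintptr_t)arg + 1;
        void *obj;
        unsigned int i;

        for (i = 0; i < ITERATIONS; i++) {
                while (rte_ring_enqueue(test_ring, (void *)token) != 0)
                        ;
                while (rte_ring_dequeue(test_ring, &obj) != 0)
                        ;
                if (obj == NULL)
                        rte_panic("dequeued a corrupted entry\n");
        }
        return 0;
}

int
main(int argc, char **argv)
{
        unsigned int lcore;

        if (rte_eal_init(argc, argv) < 0)
                return -1;

        /* default flags: multi-producer/multi-consumer ring */
        test_ring = rte_ring_create("ordering_test", RING_SIZE,
                                    rte_socket_id(), 0);
        if (test_ring == NULL)
                return -1;

        RTE_LCORE_FOREACH_SLAVE(lcore)
                rte_eal_remote_launch(ring_worker, (void *)(uintptr_t)lcore,
                                      lcore);
        rte_eal_mp_wait_lcore();

        /* every worker dequeued exactly as many entries as it enqueued,
         * so the ring must be empty here */
        printf("ring empty at the end: %s\n",
               rte_ring_empty(test_ring) ? "yes" : "NO - BUG");
        return 0;
}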

Actually I reread your original mail and finally got your point.
If I understand you correctly, the problem with read reordering here is that
after we read prod.tail, but before we read cons.head,
both cons.head and prod.tail might be updated,
but the change to prod.tail might go unnoticed by us.
As an example:
time 0 (cons.head == 0, prod.tail == 0):
prod_tail = r->prod.tail; /* due to read reordering */
/* prod_tail == 0 */

time 1 (cons.head == 5, prod.tail == 5):
*old_head = r->cons.head;
/* cons.head == 5 */
*entries = (prod_tail - *old_head);
/* *entries == (0 - 5) == 0xFFFFFFFB */

And we will move our cons.head forward, even though there are no filled entries in the ring.
Is that right?
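
To put that in code, here is a trimmed-down sketch of the multi-consumer head
update, roughly following __rte_ring_move_cons_head() (size/behavior handling
dropped), with the spot marked where, as I understand it, the patch wants to
add the read barrier:

do {
        *old_head = r->cons.head;

        /* without a barrier here the CPU is free to load prod.tail
         * before cons.head, which is the reordering described above */
        rte_smp_rmb();

        const uint32_t prod_tail = r->prod.tail;
        /* number of filled entries as seen by this consumer */
        *entries = (prod_tail - *old_head);

        if (n > *entries)
                n = *entries;
        if (unlikely(n == 0))
                return 0;

        *new_head = *old_head + n;
        success = rte_atomic32_cmpset(&r->cons.head, *old_head, *new_head);
} while (unlikely(success == 0));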

As a side note, I wonder why we have here:
*entries = (prod_tail - *old_head);
instead of:
*entries = r->size + prod_tail - *old_head;
?
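
Just to make that concrete with made-up numbers (say r->size == 8,
prod.tail == 10, cons.head == 7, i.e. three filled entries):

*entries = (prod_tail - *old_head);           /* 10 - 7     == 3  */
*entries = r->size + prod_tail - *old_head;   /* 8 + 10 - 7 == 11 */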

Konstantin

> 
> > In another
> > mail of this thread, we made a simple test based on this, captured
> > some information, and pasted it there. (I pasted the patch there :-))
> 
> Are you talking about that one:
> http://dpdk.org/dev/patchwork/patch/30405/
> ?
> It still uses test/test/test_mbuf.c...,
> but anyway I don't really understand how mbuf_autotest is supposed
> to work with these changes:
> @@ -730,7 +739,7 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
> rte_ring_enqueue(refcnt_mbuf_ring, m);
>                          }
>                  }
> -               rte_pktmbuf_free(m);
> +               // rte_pktmbuf_free(m);
>          }
> @@ -741,6 +750,12 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
>          while (!rte_ring_empty(refcnt_mbuf_ring))
>                  ;
> 
> +       if (NULL != m) {
> +               if (1 != rte_mbuf_refcnt_read(m))
> +                       printf("m ref is %u\n", rte_mbuf_refcnt_read(m));
> +               rte_pktmbuf_free(m);
> +       }
> +
>          /* check that all mbufs are back into mempool by now */
>          for (wn = 0; wn != REFCNT_MAX_TIMEOUT; wn++) {
>                  if ((i = rte_mempool_avail_count(refcnt_pool)) == n) {
> 
> That means all your mbufs (except the last one) will still be allocated.
> So the test would fail - as it should, I think.
> 
> > And
> > it seems that Juhamatti & Jacod found some reverting action several
> > months ago.
> 
> Didn't get that one either.
> Konstantin


