[dpdk-dev] A question of DPDK ring buffer

Chen, Bo D bo.d.chen at intel.com
Tue Aug 20 11:13:32 CEST 2013


Hi Olivier,

Please see my comments.
	do {
		prod_head = r->prod.head;
		cons_tail = r->cons.tail;
		prod_next = prod_head + n;
		success = rte_atomic32_cmpset(&r->prod.head, prod_head, prod_next);
		
		/*
		 * Why not enqueue the data here? It would be just a couple of pointer assignments, which does not take much time.
		 * Then the entire CAS loop would contain both the pointer adjustment and the data enqueue, and the dequeue operation would not have a chance to interfere with data production.
		 * The wait loop that follows could then be removed accordingly.
		 */

	} while (unlikely(success == 0));

	/*
	while (unlikely(r->prod.tail != prod_head))
		rte_pause();

	r->prod.tail = prod_next;
	*/


Regards,
Bob


-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
Sent: Tuesday, August 20, 2013 4:22 PM
To: Bob Chen
Cc: dev
Subject: Re: [dpdk-dev] A question of DPDK ring buffer

Hello Bob,

> OK, here is the question: Why DPDK has to maintain that public 
> prod_tail structure? Is it really necessary to endure a while loop here?

If you remove this wait loop, you can trigger an issue. Imagine a case where core 0 wants to add an object to the ring: it does the CAS, modifying prod_head. At this point it is interrupted for some reason (maybe by the kernel) before writing the object pointer into the ring, and thus before the modification of prod_tail.

During this time, core 1 wants to enqueue another object: it does the CAS, then writes the object pointer, then modifies prod_tail (without waiting for core 0, as we removed the wait loop).

Now the ring state is wrong: it shows 2 objects, but one object pointer is invalid. If you try to dequeue the objects, it will return a bad pointer.
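
The interleaving above can be replayed in a single thread to make the bad state visible. This is only a minimal sketch, not the rte_ring code: struct demo_ring, DEMO_SLOTS and demo_replay_without_wait() are hypothetical names, and plain assignments stand in for the CAS because nothing runs concurrently here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 4u	/* hypothetical ring size, not a DPDK constant */
static_assert(DEMO_SLOTS >= 2, "the replay needs at least two slots");

/* Hypothetical, stripped-down producer side of a ring. */
struct demo_ring {
	uint32_t prod_head;	/* next slot a producer may reserve */
	uint32_t prod_tail;	/* slots below this index are visible to consumers */
	void *slots[DEMO_SLOTS];
};

/*
 * Replays the scenario without the wait loop and returns the pointer
 * a consumer would read first (slot 0).
 */
static void *demo_replay_without_wait(struct demo_ring *r, void *obj_core1)
{
	/* core 0: the CAS succeeds and reserves slot 0, then the core stalls
	 * before writing its object pointer. */
	uint32_t head0 = r->prod_head;
	r->prod_head = head0 + 1;

	/* core 1: reserves slot 1, writes its pointer, and publishes
	 * prod_tail at once because the wait loop was removed. */
	uint32_t head1 = r->prod_head;
	r->prod_head = head1 + 1;
	r->slots[head1 % DEMO_SLOTS] = obj_core1;
	r->prod_tail = head1 + 1;	/* tail jumps from 0 to 2 */

	/* consumer: sees two entries and dequeues slot 0, which core 0
	 * has not written yet. */
	return r->slots[head0 % DEMO_SLOTS];
}
```

On a zero-initialized ring the function returns whatever was left in slot 0 (NULL here), although prod_tail already claims two valid entries.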

Of course, interruption by the kernel should be avoided as much as possible, but even without being interrupted, a similar scenario can occur if one core is slower than another to enqueue its data (due to a cache miss for instance, or because the first core enqueues more objects than the other).
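
For comparison, here is a hedged sketch of a multi-producer enqueue that keeps the wait loop. It uses C11 atomics rather than rte_atomic32_cmpset() and rte_pause(), and struct sketch_ring, sketch_mp_enqueue() and sketch_sc_dequeue() are made-up names for illustration, so treat it as an approximation of the idea rather than the actual rte_ring implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical size */
#define RING_MASK (RING_SIZE - 1u)
static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct sketch_ring {
	_Atomic uint32_t prod_head;	/* slots reserved by producers */
	_Atomic uint32_t prod_tail;	/* slots published to the consumer */
	_Atomic uint32_t cons_tail;	/* slots already consumed */
	void *slots[RING_SIZE];
};

/* Multi-producer enqueue of n objects: 0 on success, -1 if not enough room. */
static int sketch_mp_enqueue(struct sketch_ring *r, void *const *objs, uint32_t n)
{
	uint32_t head = atomic_load(&r->prod_head);
	uint32_t next;

	/* Reserve n slots by moving prod_head with a CAS; on failure the
	 * CAS reloads head and the loop retries. */
	do {
		uint32_t used = head - atomic_load(&r->cons_tail);
		if (n > RING_SIZE - used)
			return -1;
		next = head + n;
	} while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

	/* Write the pointers outside the CAS loop; other producers may be
	 * filling their own reserved slots at the same time. */
	for (uint32_t i = 0; i < n; i++)
		r->slots[(head + i) & RING_MASK] = objs[i];

	/* Wait until every earlier reservation has been published, so that
	 * prod_tail never moves past a slot that is still being written. */
	while (atomic_load(&r->prod_tail) != head)
		;	/* the code under discussion calls rte_pause() here */

	atomic_store(&r->prod_tail, next);
	return 0;
}

/* Single-consumer dequeue of one object: 0 on success, -1 if empty. */
static int sketch_sc_dequeue(struct sketch_ring *r, void **obj)
{
	uint32_t tail = atomic_load(&r->cons_tail);

	if (tail == atomic_load(&r->prod_tail))
		return -1;
	*obj = r->slots[tail & RING_MASK];
	atomic_store(&r->cons_tail, tail + 1);
	return 0;
}
```

Only the slot reservation happens inside the CAS loop. The copy cannot safely move inside it, because the loop body may run again after a failed CAS and the slots belong to this producer only once the CAS has succeeded.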

To convince yourself, you can remove the wait loop and run the ring test in app/test/test_ring.c; I expect it will fail.

Regards,
Olivier

