[dpdk-dev] rte_ring's dequeue appears to be slow
Dor Green
dorgreen1 at gmail.com
Mon Apr 6 14:18:21 CEST 2015
I have an app which captures packets on a single core and then passes
them to multiple workers on different lcores, using ring queues.
While I manage to capture packets at 10Gbps, when I send them to the
processing lcores there is substantial packet loss. At first I figured
it was the processing I do on the packets and optimized that, which
helped a little but did not alleviate the problem.
I used Intel VTune Amplifier to profile the program, and in every
profiling run I did, the majority of the program's time was spent in
"__rte_ring_sc_do_dequeue" (about 70%). I was wondering if anyone can
tell me how to optimize this, whether I'm using the queues incorrectly,
or whether I'm perhaps doing the profiling wrong (I do find it odd
that this dequeuing is so slow).
My program's architecture is as follows (constants replaced with their actual values):
A queue is created for each processing lcore:
rte_ring_create(qname, 1024*1024, NUMA_SOCKET,
                RING_F_SP_ENQ | RING_F_SC_DEQ);
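For completeness, the creation step in context looks roughly like this
(a simplified sketch; the failure check and the lc[] assignment are my
assumptions about the surrounding code, not a verbatim excerpt):

    #include <rte_ring.h>
    #include <rte_debug.h>

    /* one single-producer/single-consumer ring per worker lcore;
     * the count (1024*1024) must be a power of two */
    struct rte_ring *q = rte_ring_create(qname, 1024*1024, NUMA_SOCKET,
                                         RING_F_SP_ENQ | RING_F_SC_DEQ);
    if (q == NULL)
        rte_panic("cannot create ring %s\n", qname);
    lc[queue_index].queue = q;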
The capture core enqueues packets one by one, to each of the queues
(the packet burst size is 256):
rte_ring_sp_enqueue(lc[queue_index].queue, (void *const)pkts[i]);
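In context, the fan-out loop looks something like the following sketch
(simplified; nb_rx and the drop-on-full handling are my assumptions
about the surrounding code — in 2.0 rte_ring_sp_enqueue returns
-ENOBUFS when the ring is full and nothing was enqueued):

    #include <errno.h>
    #include <rte_mbuf.h>

    /* pkts[] holds the rx burst of up to 256 packets (nb_rx of them);
     * queue_index selects the worker each packet goes to */
    for (i = 0; i < nb_rx; i++) {
        if (rte_ring_sp_enqueue(lc[queue_index].queue,
                                (void *const)pkts[i]) == -ENOBUFS)
            rte_pktmbuf_free(pkts[i]);  /* ring full: drop, don't leak */
    }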
These are then dequeued in bulk in the processing lcores:
rte_ring_sc_dequeue_bulk(lc->queue, (void**) &mbufs, 128);
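Sketched in full, the consumer side is something like this (the array
declaration, the loop, and process_packet() are hypothetical, only to
show how I use the return value; note that in 2.0 the bulk variant is
all-or-nothing: it returns -ENOENT and dequeues nothing unless all 128
entries are available):

    void *mbufs[128];

    for (;;) {
        if (rte_ring_sc_dequeue_bulk(lc->queue, mbufs, 128) != 0)
            continue;   /* fewer than 128 entries ready: try again */
        for (i = 0; i < 128; i++)
            process_packet((struct rte_mbuf *)mbufs[i]);
    }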
I'm using 16 1GB hugepages, running the new 2.0 version. If there's
any further info required about the program, let me know.
Thank you.