[dpdk-dev] [PATCH v2 1/3] mempool: add stack (lifo) mempool handler

Olivier Matz olivier.matz at 6wind.com
Mon Jun 20 10:17:36 CEST 2016


Hi David,

On 06/17/2016 04:18 PM, Hunt, David wrote:
>> After reading it, I realize that it's nearly exactly the same code as
>> in "app/test: test external mempool handler".
>> http://patchwork.dpdk.org/dev/patchwork/patch/12896/
>>
>> We should drop one of them. If this stack handler is really useful for
>> a performance use-case, it could go in librte_mempool. At first
>> read, the code looks like a demo example: it uses a simple spinlock for
>> concurrent accesses to the common pool. Maybe the mempool cache hides
>> this cost; in that case we could also consider removing the use of the
>> rte_ring.
> 
> While I agree that the code is similar, the handler in the test is a
> ring-based handler, whereas this patch adds an array-based handler.

Not sure I'm getting what you are saying. Do you mean stack instead
of array?

Actually, both are stacks when talking about bulks of objects. If we
consider the objects one by one, it's true that the order will differ.
But as discussed in [1], the cache code already reverses the order of
objects when doing a mempool_get(). I'd say the reversing in the cache
code is not really needed (only the order of object bulks should remain
the same). A rte_memcpy() looks to be faster, but it would require some
real-life tests to validate or invalidate this theory.
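
To illustrate the two options, here is a rough sketch in plain C. It is
not the actual DPDK cache code: struct obj_cache and both function names
are made up for this mail, and memcpy() stands in for rte_memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-lcore cache: a flat array of object
 * pointers, of which the first len entries are in use. */
struct obj_cache {
	void *objs[64];
	size_t len;
};

/* Take n objects one by one from the top of the cache: the objects
 * inside the returned bulk come out in reversed order. */
static void
cache_get_reversed(struct obj_cache *c, void **out, size_t n)
{
	size_t i;

	assert(n <= c->len);
	for (i = 0; i < n; i++)
		out[i] = c->objs[c->len - 1 - i];
	c->len -= n;
}

/* Take the top n objects as a single block: the order inside the bulk
 * is preserved, and the copy is one memcpy() call. */
static void
cache_get_memcpy(struct obj_cache *c, void **out, size_t n)
{
	assert(n <= c->len);
	memcpy(out, &c->objs[c->len - n], n * sizeof(void *));
	c->len -= n;
}
```

Both variants hand out the most recently cached bulk first; they differ
only in the order of objects inside that bulk. Whether the block copy is
actually faster is exactly what would need to be measured.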

So to conclude, I still think the code in app/test and the code in
lib/mempool are quite similar, and only one of them should be kept.
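
For reference, the pattern I have in mind when I say "a simple spinlock
around a common pool" is roughly the following. This is only a sketch
under my own assumptions, not the code from either patch: struct
stack_pool, STACK_POOL_SIZE and the function names are hypothetical, and
a C11 atomic_flag stands in for rte_spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define STACK_POOL_SIZE 1024	/* hypothetical fixed capacity */

/* Hypothetical common pool: an array of object pointers used as a
 * LIFO, protected by a single spinlock. */
struct stack_pool {
	atomic_flag lock;
	size_t len;
	void *objs[STACK_POOL_SIZE];
};

static void
stack_pool_init(struct stack_pool *s)
{
	assert(s != NULL);
	atomic_flag_clear(&s->lock);
	s->len = 0;
}

static void
stack_pool_lock(struct stack_pool *s)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->lock,
			memory_order_acquire))
		;
}

static void
stack_pool_unlock(struct stack_pool *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Push n objects. Returns 0 on success, -1 if there is not enough room. */
static int
stack_pool_put(struct stack_pool *s, void * const *objs, size_t n)
{
	size_t i;
	int ret = -1;

	stack_pool_lock(s);
	if (n <= STACK_POOL_SIZE - s->len) {
		for (i = 0; i < n; i++)
			s->objs[s->len++] = objs[i];
		ret = 0;
	}
	stack_pool_unlock(s);
	return ret;
}

/* Pop n objects, most recently pushed first. Returns 0 on success,
 * -1 if fewer than n objects are stored. */
static int
stack_pool_get(struct stack_pool *s, void **objs, size_t n)
{
	size_t i;
	int ret = -1;

	stack_pool_lock(s);
	if (n <= s->len) {
		for (i = 0; i < n; i++)
			objs[i] = s->objs[--s->len];
		ret = 0;
	}
	stack_pool_unlock(s);
	return ret;
}
```

Every put and get from any core serializes on the same lock, so the
cost depends a lot on how often the per-lcore cache has to fall back to
the common pool.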

[1] http://www.dpdk.org/ml/archives/dev/2016-May/039873.html

> I think that the case for leaving it in as a test for the standard
> handler as part of the previous mempool handler is valid, but maybe
> there is a case for removing it if we add the stack handler. Maybe a
> future patch?
> 
>> Do you have some performance numbers? Do you know if it scales
>> with the number of cores?
> 
> For the mempool_perf_autotest, I'm seeing a 30% increase in performance
> for the local cache use-case for 1 - 36 cores (results vary within
> those tests between 10-45% gain, but with an average of 30% gain over
> all the tests).
> 
> However, for the tests with no local cache configured, throughput of
> the enqueue/dequeue drops by about 30%, with the 36-core case yielding
> the largest drop of 40%. So this handler would not be recommended in
> no-cache applications.

Interesting, thanks. If you also have real-life (I mean network)
performance tests, I'd be interested too.

Ideally, we should have documentation explaining in which cases one
handler or another should be used. However, if we don't know this
today, I'm not opposed to adding this new handler in 16.07, letting
people do their tests and comment, and then describing it properly for
16.11.

What do you think?


Regards,
Olivier

