[dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy

Hunt, David david.hunt at intel.com
Fri Jun 24 17:56:39 CEST 2016


Hi Jerin,

I just ran a couple of tests on this patch on the latest master head on 
a couple of machines: an older quad-socket E5-4650 and a quad-socket 
E5-2699 v3.

E5-4650:
I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the 
cached tests.

E5-2699 v3:
I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the 
cached tests.

These are purely autotest comparisons; I don't have traffic generator 
results. But based on the above, I don't think there are any performance 
issues with the patch.

Regards,
Dave.




On 24/5/2016 4:17 PM, Jerin Jacob wrote:
> On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote:
>> Hi Jerin,
>>
>>
>> On 05/24/2016 04:50 PM, Jerin Jacob wrote:
>>> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>>> ---
>>>   lib/librte_mempool/rte_mempool.h | 5 ++---
>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>> index ed2c110..ebe399a 100644
>>> --- a/lib/librte_mempool/rte_mempool.h
>>> +++ b/lib/librte_mempool/rte_mempool.h
>>> @@ -74,6 +74,7 @@
>>>   #include <rte_memory.h>
>>>   #include <rte_branch_prediction.h>
>>>   #include <rte_ring.h>
>>> +#include <rte_memcpy.h>
>>>   
>>>   #ifdef __cplusplus
>>>   extern "C" {
>>> @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>   		    unsigned n, __rte_unused int is_mp)
>>>   {
>>>   	struct rte_mempool_cache *cache;
>>> -	uint32_t index;
>>>   	void **cache_objs;
>>>   	unsigned lcore_id = rte_lcore_id();
>>>   	uint32_t cache_size = mp->cache_size;
>>> @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>   	 */
>>>   
>>>   	/* Add elements back into the cache */
>>> -	for (index = 0; index < n; ++index, obj_table++)
>>> -		cache_objs[index] = *obj_table;
>>> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>>>   
>>>   	cache->len += n;
>>>   
>>>
>> The commit title should be "mempool" instead of "mbuf".
> I will fix it.
>
>> Are you seeing some performance improvement by using rte_memcpy()?
> Yes, in some cases. In the default case, the loop was replaced with memcpy by
> the compiler itself (gcc 5.3). But when I tried the external mempool manager
> patch, performance dropped by almost 800Kpps. Debugging further, it turns out
> that an unrelated change in the external mempool manager patch was knocking
> out the memcpy. An explicit rte_memcpy brought back 500Kpps. The remaining
> 300Kpps drop is still unexplained (in my test setup, packets are in the local
> cache, so it must be something to do with a __mempool_put_bulk text alignment
> change or similar).
>
> Has anyone else observed a performance drop with the external pool manager?
>
> Jerin
>
>> Regards
>> Olivier


