[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

Xie, Huawei huawei.xie at intel.com
Tue Feb 23 06:35:08 CET 2016


On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>> Hi,
>>
>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>> the library ABI and should not be listed in the version map.
>>>
>>> I assume it's inline for performance reasons, but then you lose the
>>> benefits of dynamic linking such as the ability to fix bugs and/or
>>> improve it by just updating the library. Since the point of having a
>>> bulk API is to improve performance by reducing the number of calls
>>> required, does it really have to be inline? As in, have you actually
>>> measured the difference between inline and non-inline and decided it's
>>> worth all the downsides?
>> Agree with Panu. It would be interesting to compare the performance
>> between inline and non-inline to decide whether to inline it or not.
> Will update after I have gathered more data. Inlining could show an
> obvious performance difference in some cases.

Panu and Olivier:
I wrote a simple benchmark. It runs 10M rounds; in each round, 8 mbufs
are allocated through the bulk API and then freed.
These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
Btw, I have removed some outliers, which show up in roughly 1 out of 10
runs. Sometimes the observed user CPU usage suddenly disappeared, and I
have no clue what happened.
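
For readers who want to see the shape of such a measurement, here is a
minimal sketch of the loop described above. It is not the benchmark used
for the numbers in this mail: the pool is a toy free list, every name in
it (toy_pool, toy_alloc_bulk_inline, toy_alloc_bulk_call, run_rounds) is
hypothetical, and clock() stands in for a cycle counter. In a DPDK build
the corresponding calls would be rte_pktmbuf_alloc_bulk(),
rte_pktmbuf_free() and a TSC read such as rte_rdtsc(). The noinline
attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define POOL_SIZE 64            /* hypothetical pool capacity */

/* Stand-in for a mempool: a LIFO stack of pointers to free objects. */
struct toy_pool {
    void *free_objs[POOL_SIZE];
    size_t n_free;
    char storage[POOL_SIZE][64];
};

static void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->free_objs[i] = p->storage[i];
    p->n_free = POOL_SIZE;
}

/* All-or-nothing bulk allocation: 0 on success, -1 if the pool is short. */
static inline int toy_alloc_bulk_inline(struct toy_pool *p, void **objs,
                                        size_t n)
{
    if (p->n_free < n)
        return -1;
    for (size_t i = 0; i < n; i++)
        objs[i] = p->free_objs[--p->n_free];
    return 0;
}

/* Same body, but forced behind a real function call. */
static __attribute__((noinline)) int toy_alloc_bulk_call(struct toy_pool *p,
                                                         void **objs, size_t n)
{
    return toy_alloc_bulk_inline(p, objs, n);
}

static void toy_free(struct toy_pool *p, void *obj)
{
    assert(p->n_free < POOL_SIZE);
    p->free_objs[p->n_free++] = obj;
}

/*
 * Allocate `burst` objects and free them again, `rounds` times.
 * Returns elapsed CPU time in seconds, or -1.0 if an allocation failed.
 */
static double run_rounds(struct toy_pool *p, unsigned long rounds,
                         size_t burst, int use_inline)
{
    void *objs[POOL_SIZE];
    clock_t start = clock();

    for (unsigned long r = 0; r < rounds; r++) {
        int rc = use_inline ? toy_alloc_bulk_inline(p, objs, burst)
                            : toy_alloc_bulk_call(p, objs, burst);
        if (rc != 0)
            return -1.0;
        for (size_t i = 0; i < burst; i++)
            toy_free(p, objs[i]);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling run_rounds(&pool, 10000000UL, 8, 1) and then again with
use_inline set to 0 gives the two timings to compare. Absolute numbers
from a toy pool say nothing about mbuf performance; only the structure of
the loop carries over.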

With 8 mbufs allocated, inlining gives about a 6% performance increase.
inline            non-inline
2780738888        2950309416
2834853696        2951378072
2823015320        2954500888
2825060032        2958939912
2824499804        2898938284
2810859720        2944892796
2852229420        3014273296
2787308500        2956809852
2793337260        2958674900
2822223476        2954346352
2785455184        2925719136
2821528624        2937380416
2822922136        2974978604
2776645920        2947666548
2815952572        2952316900
2801048740        2947366984
2851462672        2946469004
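
The relative difference can be recomputed from the table rather than read
off by eye. The sketch below averages the two columns above and reports
how many more cycles the non-inline variant needs relative to inline, in
percent. The helper names are made up for illustration, and summarizing
the runs by their arithmetic mean (rather than the median or minimum) is
an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycle counts copied from the 8-mbuf table above, one entry per run. */
static const uint64_t inline_cycles[] = {
    2780738888ULL, 2834853696ULL, 2823015320ULL, 2825060032ULL,
    2824499804ULL, 2810859720ULL, 2852229420ULL, 2787308500ULL,
    2793337260ULL, 2822223476ULL, 2785455184ULL, 2821528624ULL,
    2822922136ULL, 2776645920ULL, 2815952572ULL, 2801048740ULL,
    2851462672ULL,
};

static const uint64_t noninline_cycles[] = {
    2950309416ULL, 2951378072ULL, 2954500888ULL, 2958939912ULL,
    2898938284ULL, 2944892796ULL, 3014273296ULL, 2956809852ULL,
    2958674900ULL, 2954346352ULL, 2925719136ULL, 2937380416ULL,
    2974978604ULL, 2947666548ULL, 2952316900ULL, 2947366984ULL,
    2946469004ULL,
};

#define N_RUNS (sizeof(inline_cycles) / sizeof(inline_cycles[0]))

_Static_assert(sizeof(inline_cycles) == sizeof(noninline_cycles),
               "both columns must have the same number of runs");

/* Arithmetic mean of n cycle counts. */
static double mean_cycles(const uint64_t *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return sum / (double)n;
}

/* Extra cycles of the non-inline variant relative to inline, in percent. */
static double noninline_overhead_pct(void)
{
    double inl = mean_cycles(inline_cycles, N_RUNS);
    double non = mean_cycles(noninline_cycles, N_RUNS);

    return (non - inl) / inl * 100.0;
}
```

Printing noninline_overhead_pct() with "%.2f" gives the figure to set
against the estimate quoted above; swapping in the 16-mbuf columns works
the same way.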

With 16 mbufs allocated, we can still observe a clear performance
difference, though only 1%-2%:

inline            non-inline
5519987084        5669902680
5538416096        5737646840
5578934064        5590165532
5548131972        5767926840
5625585696        5831345628
5558282876        5662223764
5445587768        5641003924
5559096320        5775258444
5656437988        5743969272
5440939404        5664882412
5498875968        5785138532
5561652808        5737123940
5515211716        5627775604
5550567140        5630790628
5665964280        5589568164
5591295900        5702697308

With 32/64 mbufs allocated, the variation in the data itself hides the
performance difference.

So we prefer to keep it inline for performance.

>> Also, it would be nice to have a simple test function in
>> app/test/test_mbuf.c. For instance, you could update
>> test_one_pktmbuf() to take a mbuf pointer as a parameter and remove
>> the mbuf allocation from the function. Then it could be called with
>> a mbuf allocated with rte_pktmbuf_alloc() (like before) and with
>> all the mbufs of rte_pktmbuf_alloc_bulk().

I don't quite follow. Do you mean we write two cases, one that allocates
mbufs through rte_pktmbuf_alloc_bulk() and one that uses
rte_pktmbuf_alloc()? That would be good to have. I could do this after
this patch.
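
One possible reading of the suggestion, sketched with stand-in types so
that it compiles without DPDK: the per-mbuf checks move into a function
that takes the mbuf as a parameter, and that function is then run on one
mbuf from the single-allocation path and on every mbuf returned by the
bulk path. toy_mbuf, toy_alloc() and toy_alloc_bulk() are hypothetical
placeholders for struct rte_mbuf, rte_pktmbuf_alloc() and
rte_pktmbuf_alloc_bulk(); the checks themselves are only examples, not
the contents of test_one_pktmbuf().

```c
#include <assert.h>
#include <stddef.h>

#define BULK 8                      /* hypothetical burst size */

/* Stand-in for an mbuf: just the fields the example checks look at. */
struct toy_mbuf {
    unsigned refcnt;
    unsigned data_len;
};

static struct toy_mbuf pool_storage[BULK + 1];
static size_t pool_next;            /* next unused slot in pool_storage */

/* Single allocation: NULL when the toy pool is exhausted. */
static struct toy_mbuf *toy_alloc(void)
{
    if (pool_next >= BULK + 1)
        return NULL;
    struct toy_mbuf *m = &pool_storage[pool_next++];
    m->refcnt = 1;
    m->data_len = 0;
    return m;
}

/* All-or-nothing bulk allocation: 0 on success, -1 if not enough left. */
static int toy_alloc_bulk(struct toy_mbuf **mbufs, size_t n)
{
    assert(mbufs != NULL);
    if (pool_next + n > BULK + 1)
        return -1;
    for (size_t i = 0; i < n; i++)
        mbufs[i] = toy_alloc();
    return 0;
}

/* Checks shared by both allocation paths; returns 0 on pass. */
static int test_one_mbuf(const struct toy_mbuf *m)
{
    if (m == NULL)
        return -1;
    if (m->refcnt != 1 || m->data_len != 0)
        return -1;
    return 0;
}

/* Run the shared checks on one single-alloc mbuf and on a whole burst. */
static int test_alloc_paths(void)
{
    struct toy_mbuf *bulk[BULK];

    if (test_one_mbuf(toy_alloc()) != 0)
        return -1;
    if (toy_alloc_bulk(bulk, BULK) != 0)
        return -1;
    for (size_t i = 0; i < BULK; i++)
        if (test_one_mbuf(bulk[i]) != 0)
            return -1;
    return 0;
}
```

The point of the split is that the bulk path gets exactly the same
per-mbuf checks as the single path, without duplicating them.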
>>
>> Regards,
>> Olivier
>>
>


