[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
Xie, Huawei
huawei.xie at intel.com
Tue Feb 23 06:35:08 CET 2016
On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>> Hi,
>>
>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>> the library ABI and should not be listed in the version map.
>>>
>>> I assume it's inline for performance reasons, but then you lose the
>>> benefits of dynamic linking, such as the ability to fix bugs and/or
>>> improve it by just updating the library. Since the point of having a
>>> bulk API is to improve performance by reducing the number of calls
>>> required, does it really have to be inline? As in, have you actually
>>> measured the difference between inline and non-inline and decided it's
>>> worth all the downsides?
>> Agree with Panu. It would be interesting to compare the performance
>> between inline and non-inline to decide whether to inline it or not.
> Will update after I have gathered more data. Inlining could show an
> obvious performance difference in some cases.
Panu and Olivier:
I wrote a simple benchmark. It runs 10M rounds; in each round, 8 mbufs
are allocated through the bulk API and then freed.
These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
Btw, I have removed some outlier data points, which occur at a frequency
of about 1 in 10. Sometimes the observed user usage suddenly disappeared;
I have no clue what happened.
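For reference, the measurement loop described above can be sketched as follows. This is not the benchmark that produced the numbers: the pool is a hypothetical fixed-size LIFO stand-in (fake_mbuf, bulk_get_inline, bulk_get_call and the other names are invented for this sketch), it times with clock_gettime() rather than the CPU cycle counter, and __attribute__((noinline)) assumes GCC or Clang. A real run would call rte_pktmbuf_alloc_bulk() and rte_pktmbuf_free() on a DPDK mempool.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define POOL_SIZE 64u /* hypothetical pool capacity */
#define BURST 8u      /* objects per round, as in the first measurement */

/* Hypothetical stand-in for an mbuf; not the DPDK structure. */
struct fake_mbuf { int id; };

static struct fake_mbuf storage[POOL_SIZE];
static struct fake_mbuf *free_stack[POOL_SIZE];
static unsigned int n_free;

/* Put every object on the free stack. */
static void pool_init(void)
{
    for (n_free = 0; n_free < POOL_SIZE; n_free++)
        free_stack[n_free] = &storage[n_free];
}

/* All-or-nothing bulk get: 0 on success, -1 if fewer than n are free. */
static inline int bulk_get_inline(struct fake_mbuf **m, unsigned int n)
{
    if (n_free < n)
        return -1;
    for (unsigned int i = 0; i < n; i++)
        m[i] = free_stack[--n_free];
    return 0;
}

/* Same logic behind a real function call (GCC/Clang attribute). */
__attribute__((noinline))
static int bulk_get_call(struct fake_mbuf **m, unsigned int n)
{
    return bulk_get_inline(m, n);
}

/* Return one object to the pool. */
static void put_one(struct fake_mbuf *m)
{
    assert(n_free < POOL_SIZE);
    free_stack[n_free++] = m;
}

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `rounds` rounds of "get BURST objects, then free them" (inline). */
static uint64_t bench_inline(unsigned long rounds)
{
    struct fake_mbuf *m[BURST];
    uint64_t start = now_ns();

    for (unsigned long r = 0; r < rounds; r++) {
        if (bulk_get_inline(m, BURST) != 0)
            break;
        for (unsigned int i = 0; i < BURST; i++)
            put_one(m[i]);
    }
    return now_ns() - start;
}

/* Same loop, but the bulk get goes through the non-inlined function. */
static uint64_t bench_call(unsigned long rounds)
{
    struct fake_mbuf *m[BURST];
    uint64_t start = now_ns();

    for (unsigned long r = 0; r < rounds; r++) {
        if (bulk_get_call(m, BURST) != 0)
            break;
        for (unsigned int i = 0; i < BURST; i++)
            put_one(m[i]);
    }
    return now_ns() - start;
}
```

Calling pool_init() and then bench_inline(10000000) and bench_call(10000000) returns the elapsed nanoseconds for each variant. With a toy pool like this the compiler may simplify much of the inline loop, so the absolute numbers say little about DPDK itself.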
With 8 mbufs allocated, there is about a 6% performance increase using inline.
inline     non-inline
2780738888 2950309416
2834853696 2951378072
2823015320 2954500888
2825060032 2958939912
2824499804 2898938284
2810859720 2944892796
2852229420 3014273296
2787308500 2956809852
2793337260 2958674900
2822223476 2954346352
2785455184 2925719136
2821528624 2937380416
2822922136 2974978604
2776645920 2947666548
2815952572 2952316900
2801048740 2947366984
2851462672 2946469004
With 16 mbufs allocated, we can still observe a clear performance
difference, though only 1%-2%.
inline     non-inline
5519987084 5669902680
5538416096 5737646840
5578934064 5590165532
5548131972 5767926840
5625585696 5831345628
5558282876 5662223764
5445587768 5641003924
5559096320 5775258444
5656437988 5743969272
5440939404 5664882412
5498875968 5785138532
5561652808 5737123940
5515211716 5627775604
5550567140 5630790628
5665964280 5589568164
5591295900 5702697308
With 32/64 mbufs allocated, the variation in the data itself hides
the performance difference.
So we prefer using inline for performance.
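The percentages quoted above can be rechecked from the listed rows. A minimal sketch: the arrays are copied from the two tables, while mean_u64 and extra_percent are names chosen for this example, and "extra cycles of the non-inline build relative to the inline mean" is one assumed definition of the performance difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycle counts copied from the 8-mbuf table. */
static const uint64_t inline8[] = {
    2780738888, 2834853696, 2823015320, 2825060032, 2824499804, 2810859720,
    2852229420, 2787308500, 2793337260, 2822223476, 2785455184, 2821528624,
    2822922136, 2776645920, 2815952572, 2801048740, 2851462672,
};
static const uint64_t call8[] = {
    2950309416, 2951378072, 2954500888, 2958939912, 2898938284, 2944892796,
    3014273296, 2956809852, 2958674900, 2954346352, 2925719136, 2937380416,
    2974978604, 2947666548, 2952316900, 2947366984, 2946469004,
};

/* Cycle counts copied from the 16-mbuf table. */
static const uint64_t inline16[] = {
    5519987084, 5538416096, 5578934064, 5548131972, 5625585696, 5558282876,
    5445587768, 5559096320, 5656437988, 5440939404, 5498875968, 5561652808,
    5515211716, 5550567140, 5665964280, 5591295900,
};
static const uint64_t call16[] = {
    5669902680, 5737646840, 5590165532, 5767926840, 5831345628, 5662223764,
    5641003924, 5775258444, 5743969272, 5664882412, 5785138532, 5737123940,
    5627775604, 5630790628, 5589568164, 5702697308,
};

#define N8  (sizeof(inline8) / sizeof(inline8[0]))
#define N16 (sizeof(inline16) / sizeof(inline16[0]))

/* Arithmetic mean of n samples. */
static double mean_u64(const uint64_t *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return sum / (double)n;
}

/* Extra cycles of the non-inline runs relative to the inline mean, in %. */
static double extra_percent(const uint64_t *inl, const uint64_t *call, size_t n)
{
    double mean_inline = mean_u64(inl, n);

    return 100.0 * (mean_u64(call, n) - mean_inline) / mean_inline;
}
```

extra_percent(inline8, call8, N8) and extra_percent(inline16, call16, N16) give the relative gap for each table. Comparing means ignores the run-to-run spread, which matters for the 32/64 case mentioned above.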
>> Also, it would be nice to have a simple test function in
>> app/test/test_mbuf.c. For instance, you could update
>> test_one_pktmbuf() to take a mbuf pointer as a parameter and remove
>> the mbuf allocation from the function. Then it could be called with
>> a mbuf allocated with rte_pktmbuf_alloc() (like before) and with
>> all the mbufs of rte_pktmbuf_alloc_bulk().
I don't quite follow. Do you mean that we write two cases, one allocating
mbufs through rte_pktmbuf_alloc_bulk and one using rte_pktmbuf_alloc? That
is good to have. I could do this after this patch.
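One possible reading of the suggestion, sketched with hypothetical stand-ins (fake_mbuf, fake_alloc, fake_alloc_bulk, fake_free, test_one_mbuf) rather than the real DPDK calls or the real body of test_one_pktmbuf(): the per-mbuf checks take the mbuf as a parameter, and the caller runs them once on a singly allocated mbuf and once on each mbuf from a bulk allocation.

```c
#include <assert.h>
#include <stddef.h>

#define NB_MBUF 4u /* hypothetical pool size */

/* Hypothetical stand-in for an mbuf; not the DPDK structure. */
struct fake_mbuf {
    unsigned int data_len;
    int in_use;
};

static struct fake_mbuf pool[NB_MBUF];

/* Return the first free object, reset, or NULL if the pool is empty. */
static struct fake_mbuf *fake_alloc(void)
{
    for (unsigned int i = 0; i < NB_MBUF; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].data_len = 0;
            return &pool[i];
        }
    }
    return NULL;
}

static void fake_free(struct fake_mbuf *m)
{
    assert(m != NULL && m->in_use);
    m->in_use = 0;
}

/* All-or-nothing: on failure, give back what was taken and return -1. */
static int fake_alloc_bulk(struct fake_mbuf **m, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        m[i] = fake_alloc();
        if (m[i] == NULL) {
            while (i > 0)
                fake_free(m[--i]);
            return -1;
        }
    }
    return 0;
}

/* Per-mbuf checks: the mbuf is passed in instead of allocated here. */
static int test_one_mbuf(struct fake_mbuf *m)
{
    if (m == NULL || m->data_len != 0)
        return -1;
    m->data_len = 64; /* stand-in for the real per-mbuf checks */
    return m->data_len == 64 ? 0 : -1;
}

/* Run the same checks on a single allocation and on a bulk allocation. */
static int test_single_and_bulk(void)
{
    struct fake_mbuf *burst[NB_MBUF];
    struct fake_mbuf *m = fake_alloc();
    int ret = test_one_mbuf(m);

    if (m != NULL)
        fake_free(m);
    if (ret != 0 || fake_alloc_bulk(burst, NB_MBUF) != 0)
        return -1;
    for (unsigned int i = 0; i < NB_MBUF; i++) {
        if (test_one_mbuf(burst[i]) != 0)
            ret = -1;
        fake_free(burst[i]);
    }
    return ret;
}
```

test_single_and_bulk() returns 0 when both paths pass. Sharing one check function keeps the two allocation paths held to the same expectations.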
>>
>> Regards,
>> Olivier
>>
>