[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Jianbo Liu jianbo.liu at linaro.org
Mon Sep 26 07:38:58 CEST 2016


On 26 September 2016 at 13:25, Wang, Zhihong <zhihong.wang at intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
>> Sent: Monday, September 26, 2016 1:13 PM
>> To: Wang, Zhihong <zhihong.wang at intel.com>
>> Cc: Thomas Monjalon <thomas.monjalon at 6wind.com>; dev at dpdk.org; Yuanhan
>> Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
>> <maxime.coquelin at redhat.com>
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 25 September 2016 at 13:41, Wang, Zhihong <zhihong.wang at intel.com>
>> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> >> Sent: Friday, September 23, 2016 9:41 PM
>> >> To: Jianbo Liu <jianbo.liu at linaro.org>
>> >> Cc: dev at dpdk.org; Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
>> >> <yuanhan.liu at linux.intel.com>; Maxime Coquelin
>> >> <maxime.coquelin at redhat.com>
>> ....
>> > This patch does help in ARM for small packets like 64B sized ones,
>> > this actually proves the similarity between x86 and ARM in terms
>> > of caching optimization in this patch.
>> >
>> > My estimation is based on:
>> >
>> >  1. The last patch is for mrg_rxbuf=on, and since you said it helps
>> >     perf, we can ignore it for now when we discuss mrg_rxbuf=off
>> >
>> >  2. Vhost enqueue perf =
>> >     Ring overhead + Virtio header overhead + Data memcpy overhead
>> >
>> >  3. This patch helps small-packet traffic, which means it helps
>> >     ring + virtio header operations
>> >
>> >  4. So, when you say perf drops when the packet size is larger than
>> >     512B, this is most likely caused by memcpy on ARM not working
>> >     well with this patch
>> >
>> > I'm not saying glibc's memcpy is not good enough, it's just that
>> > this is a rather special use case. And since we see that specialized
>> > memcpy + this patch gives significantly better performance than
>> > other combinations on x86, we suggest hand-crafting a specialized
>> > memcpy for it.
>> >
>> > Of course on ARM this is still just my speculation, and we need to
>> > either prove it or find the actual root cause.
>> >
>> > It would be **REALLY HELPFUL** if you could test this patch on ARM
>> > for the mrg_rxbuf=on case to see whether this patch is in fact
>> > helpful to ARM at all, since mrg_rxbuf=on is the more widely used
>> > case.
>> >
>> Actually it's worse than with mrg_rxbuf=off.
>
> I mean comparing the perf of the original vs. the original + patch with
> mrg_rxbuf turned on. Is there any perf improvement?
>
Yes, orig + patch + on is better than orig + on, but orig + patch + on
is worse than orig + patch + off.
