[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Jianbo Liu jianbo.liu at linaro.org
Mon Sep 26 07:12:46 CEST 2016


On 25 September 2016 at 13:41, Wang, Zhihong <zhihong.wang at intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> Sent: Friday, September 23, 2016 9:41 PM
>> To: Jianbo Liu <jianbo.liu at linaro.org>
>> Cc: dev at dpdk.org; Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
>> <yuanhan.liu at linux.intel.com>; Maxime Coquelin
>> <maxime.coquelin at redhat.com>
....
> This patch does help on ARM for small packets like 64B-sized ones,
> which actually proves the similarity between x86 and ARM in terms
> of the caching optimization in this patch.
>
> My estimation is based on:
>
>  1. The last patch is for mrg_rxbuf=on, and since you said it helps
>     perf, we can ignore it for now while we discuss mrg_rxbuf=off
>
>  2. Vhost enqueue perf =
>     Ring overhead + Virtio header overhead + Data memcpy overhead
>     (see the sketch after this list)
>
>  3. This patch helps small-packet traffic, which means it helps
>     ring + virtio header operations
>
>  4. So, when you say perf drops when packet size is larger than
>     512B, this is most likely caused by memcpy on ARM not working
>     well with this patch
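>
> To make item 2 concrete, here is a minimal sketch of where the
> three cost terms land in a simplified enqueue loop. The struct and
> helper names are hypothetical, for illustration only -- this is
> not the actual vhost code:
>
>     /* Hypothetical types/helpers, illustrating the cost split. */
>     static inline void
>     enqueue_burst(struct vq *vq, struct pkt **pkts, uint16_t count)
>     {
>             uint16_t i;
>
>             for (i = 0; i < count; i++) {
>                     /* Ring overhead: find and claim a descriptor */
>                     uint16_t d = next_avail_desc(vq);
>                     /* Virtio header overhead: fill virtio_net_hdr */
>                     write_virtio_net_hdr(vq, d);
>                     /* Data memcpy overhead: copy the payload */
>                     copy_pkt_to_desc(vq, d, pkts[i]);
>             }
>             /* Ring overhead: publish the filled descriptors */
>             update_used_idx(vq, count);
>     }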
>
> I'm not saying glibc's memcpy is not good enough, it's just that
> this is a rather special use case. And since we see that specialized
> memcpy + this patch gives significantly better performance than
> other combinations on x86, we suggest hand-crafting a specialized
> memcpy for it.
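>
> To be concrete about what "specialized" means here, a minimal
> sketch (illustration only, not the patch code) could copy in 16B
> vector chunks and let the tail chunk over-copy, assuming x86 with
> SSE2 and assuming vring descriptor buffers have at least 15 bytes
> of slack past 'len' so the overshoot is safe:
>
>     #include <stdint.h>
>     #include <stddef.h>
>     #include <emmintrin.h>
>
>     static inline void
>     vhost_copy(void *dst, const void *src, size_t len)
>     {
>             uint8_t *d = (uint8_t *)dst;
>             const uint8_t *s = (const uint8_t *)src;
>             size_t n;
>
>             /* Unaligned 16B SSE chunks; the final iteration
>              * intentionally overshoots (up to 15B, assumed safe)
>              * to avoid a byte-by-byte tail loop on short packets. */
>             for (n = 0; n < len; n += 16)
>                     _mm_storeu_si128((__m128i *)(d + n),
>                             _mm_loadu_si128((const __m128i *)(s + n)));
>     }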
>
> Of course on ARM this is still just my speculation, and we need to
> either prove it or find the actual root cause.
>
> It would be **REALLY HELPFUL** if you could help test this patch on
> ARM for the mrg_rxbuf=on cases, to see if this patch is in fact
> helpful to ARM at all, since mrg_rxbuf=on is the more widely used
> case.
>
Actually, mrg_rxbuf=on is worse than mrg_rxbuf=off.

