[dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path

Maxime Coquelin maxime.coquelin at redhat.com
Mon Mar 6 15:11:28 CET 2017



On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
> On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
>>
>>
>> On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
>>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
>>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
>>>>>> This patch aligns the Virtio-net header on a cache-line boundary to
>>>>>> optimize cache utilization, as it puts the Virtio-net header (which
>>>>>> is always accessed) on the same cache line as the packet header.
>>>>>>
>>>>>> For example with an application that forwards packets at L2 level,
>>>>>> a single cache-line will be accessed with this patch, instead of
>>>>>> two before.
>>>>>
>>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)?
>>>>
>>>> No, I tested with 64 bytes packets only.
>>>
>>> Oh, my bad, I overlooked it. While you were saying "a single cache
>>> line", I was thinking putting the virtio net hdr and the "whole"
>>> packet data in single cache line, which is not possible for pkt
>>> size 64B.
>>>
>>>> I ran some more tests this morning with different packet sizes,
>>>> and also changed the mbuf size on the guest side to get
>>>> multi-buffer packets:
>>>>
>>>> +-------+--------+--------+-------------------------+
>>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
>>>> +-------+--------+--------+-------------------------+
>>>> |    64 |   2048 |  11.05 |                   11.78 |
>>>> |   128 |   2048 |  10.66 |                   11.48 |
>>>> |   256 |   2048 |  10.47 |                   11.21 |
>>>> |   512 |   2048 |  10.22 |                   10.88 |
>>>> |  1024 |   2048 |   7.65 |                    7.84 |
>>>> |  1500 |   2048 |   6.25 |                    6.45 |
>>>> |  2000 |   2048 |   5.31 |                    5.43 |
>>>> |  2048 |   2048 |   5.32 |                    4.25 |
>>>> |  1500 |    512 |   3.89 |                    3.98 |
>>>> |  2048 |    512 |   1.96 |                    2.02 |
>>>> +-------+--------+--------+-------------------------+
>>>
>>> Could you share more info, say is it a PVP test? Is mergeable on?
>>> What's the fwd mode?
>>
>> No, this is not a PVP benchmark; I have neither another server nor a
>> packet generator connected back-to-back to my Haswell machine.
>>
>> This is a simple micro-benchmark, with the vhost PMD in txonly and the
>> Virtio PMD in rxonly. In this configuration, mergeable is ON and no
>> offload is disabled in the QEMU cmdline.
>
> Okay, I see. So the boost, as you have stated, comes from reducing two
> cache line accesses to one. Before that, vhost writes 2 cache lines,
> while the virtio pmd reads 2 cache lines: one for reading the header,
> another one for reading the ether header, for updating xstats (there
> is no ether access in the fwd mode you tested).
>
>> That's why I would be interested in more testing on recent hardware
>> with PVP benchmark. Is it something that could be run in Intel lab?
>
> I think Yao Lei could help on that? But as stated, I think it may
> break the performance for big packets. And I also won't expect a big
> boost even for 64B in PVP test, judging that it's only a 6% boost in
> micro benchmarking.

That would be great.
Note that on SandyBridge, where I see a drop in performance with the
microbenchmark, I get a 4% gain on the PVP benchmark. So on recent hardware
that shows a gain on the microbenchmark, I'm curious about the gain with the
PVP benchmark.
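
For reference, the "two cache lines versus one" arithmetic discussed above
can be sketched as follows. This is only an illustration, not the patch
itself: the 64-byte cache line, 128-byte mbuf headroom, 12-byte Virtio-net
header (mergeable buffers) and cache-line-aligned buffer start are all
assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All sizes below are assumptions made for illustration only. */
#define CACHE_LINE 64u   /* assumed cache-line size in bytes */
#define HEADROOM   128u  /* assumed mbuf headroom, buffer start is line-aligned */
#define VNET_HDR   12u   /* assumed Virtio-net header size (mergeable buffers) */
#define ETH_HDR    14u   /* Ethernet header size */

/* Number of distinct cache lines covered by the byte range [start, start + len). */
static unsigned lines_touched(uint32_t start, uint32_t len)
{
    assert(len > 0);
    return (start + len - 1) / CACHE_LINE - start / CACHE_LINE + 1;
}

/* Header placed right before the packet data, at the end of the headroom. */
static unsigned lines_hdr_before_data(void)
{
    return lines_touched(HEADROOM - VNET_HDR, VNET_HDR + ETH_HDR);
}

/* Header placed on a cache-line boundary, packet data right after it. */
static unsigned lines_hdr_aligned(void)
{
    return lines_touched(HEADROOM, VNET_HDR + ETH_HDR);
}
```

Under these assumptions, the Virtio-net header plus the Ethernet header
span two cache lines in the first layout (offsets 116 to 141) and a single
one in the second (offsets 128 to 153).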

Cheers,
Maxime

