[dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set

Maxime Coquelin maxime.coquelin at redhat.com
Wed Nov 9 15:51:56 CET 2016


Hi Pierre,

On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
> Sorry for the late reply.
>
>
>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin at redhat.com> a écrit :
>>
>> Hi Pierre,
>>
>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>> Current virtio driver advertises VERSION_1 support,
>>> but does not handle device's VERSION_1 support when
>>> sending packets (it looks for ANY_LAYOUT feature,
>>> which is absent).
>>>
>>> This patch enables 'can_push' in tx path when VERSION_1
>>> is advertised by the device.
>>>
>>> This significantly improves small packets forwarding rate
>>> towards devices advertising VERSION_1 feature.
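
(For context, the change boils down to relaxing the can_push condition in
the Tx path so that VERSION_1 is accepted in addition to ANY_LAYOUT; a
rough sketch reusing the PMD's vtpci_with_feature() helper, not the exact
patch:)

    /* Sketch: allow pushing the virtio-net header into the mbuf
     * headroom when either ANY_LAYOUT or VERSION_1 was negotiated. */
    if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
         vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
        txm->nb_segs == 1 &&
        rte_pktmbuf_headroom(txm) >= hdr_size)
            can_push = 1;
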
>> I think it depends on whether offloading is enabled or not.
>> If no offloading is enabled, I measured a significant drop.
>> Indeed, when no offloading is enabled, the Tx path in Virtio
>> does not access the virtio header before your patch, as the header is
>> memset to zero at device init time.
>> With your patch, it gets memset to zero at every transmit in the hot
>> path.
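
To illustrate what I mean by hot-path cost: with can_push, the per-packet
sequence is roughly the following (sketch, using rte_pktmbuf_prepend()):

    /* Sketch: the virtio-net header now lives in the mbuf headroom,
     * so it is re-initialized for every packet instead of once at
     * device init time. */
    struct virtio_net_hdr *hdr;

    hdr = (struct virtio_net_hdr *)
            rte_pktmbuf_prepend(txm, sizeof(*hdr));
    memset(hdr, 0, sizeof(*hdr));   /* per-packet store in the hot path */
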
>
> Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
Not anymore, if no offload features have been negotiated:
I wrote a patch that landed in v16.11 to skip the header parsing in
that case.
That said, we still have to access its descriptor.
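
The check is done against the negotiated feature bits, something along
these lines (illustrative sketch, not the exact in-tree code):

    /* Illustrative: only parse the virtio-net header on the vhost side
     * if at least one Tx offload feature has been negotiated. */
    #define TX_OFFLOAD_FEATURES ((1ULL << VIRTIO_NET_F_CSUM) | \
                                 (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
                                 (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
                                 (1ULL << VIRTIO_NET_F_HOST_UFO))

    if (dev->features & TX_OFFLOAD_FEATURES)
            vhost_dequeue_offload(hdr, m);
    /* otherwise, skip header parsing entirely */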

> And accessing two descriptors (with the address resolution and memory fetch that come with it)
> is a costly operation compared to a single one.
> In the case indirect descriptors are used, this is 1 descriptor access instead of 3.
I agree this is far from being optimal.
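
For a single-segment mbuf, the descriptor accounting in the Tx enqueue
path roughly comes down to this (sketch):

    /* Ring slots consumed per packet (sketch):
     *  - indirect: 1 slot, plus an indirect table to fill and fetch
     *  - chained:  nb_segs + 1 slots (separate header descriptor)
     *  - can_push: nb_segs slots (header pushed into the first buffer)
     */
    slots = use_indirect ? 1 : (txm->nb_segs + !can_push);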

> And compared to the case where chained descriptors are used, this doubles the number of packets that you can put in your queue.
>
> Those are the results in my PHY -> VM (testpmd) -> PHY setup
> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>
> When chained buffers are used for dpdk's TX: 2x2.13Mpps
> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
When I tried it, I also ran a 0% loss PVP benchmark, and I got the
opposite results: the chained and indirect cases were significantly better.

My PVP setup used a single NIC and a single Virtio PMD: NIC-to-VM
forwarding was done in IO mode with testpmd on the host, and Rx->Tx
forwarding used macswap mode on the guest side.
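
For reference, the forwarding modes are just testpmd's built-in ones,
selected along these lines (port and core layout are setup-specific):

    # host side: NIC <-> vhost ports, plain IO forwarding
    testpmd> set fwd io
    testpmd> start

    # guest side: virtio port, macswap forwarding
    testpmd> set fwd macswap
    testpmd> start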

I also saw some performance regression when running a simple testpmd
test on both ends.

Yuanhan, did you run any benchmarks with your series enabling
ANY_LAYOUT?

>
> I must also note that qemu 2.5 does not seem to handle VERSION_1 and ANY_LAYOUT correctly.
> The patch I am proposing here works with qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not catastrophic, but note that you will not see the performance improvement in some cases with qemu 2.5.

Thanks for the info.

Regards,
Maxime

