[dpdk-dev] [PATCH 5/5] net/virtio: fix TSO when mbuf is shared

Olivier Matz olivier.matz at 6wind.com
Tue Jan 17 12:18:25 CET 2017

Hi Yuanhan,

On Mon, 16 Jan 2017 14:48:19 +0800, Yuanhan Liu
<yuanhan.liu at linux.intel.com> wrote:
> On Mon, Jan 09, 2017 at 06:46:25PM +0100, Olivier Matz wrote:
> > The virtio specification requires that the L4 checksum be set to
> > the pseudo-header checksum. You can search for "pseudo header" in
> > the following doc:
> > http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.pdf
> > 
> > In particular, the spec says that if we use the csum flag, we
> > must set the checksum to the pseudo-header checksum, and if we do
> > TSO, we must set the csum flag.
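As a reminder of what the spec asks for: the pseudo-header sum covers the IPv4 source and destination addresses, the protocol number and the TCP length, and it is stored uncomplemented in the TCP checksum field, so the device only has to add the TCP header and payload and take the one's complement. A minimal standalone sketch for IPv4, assuming host-byte-order inputs and no DPDK helpers (the function names are hypothetical):

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Sum of the IPv4/TCP pseudo header: source address, destination
 * address, a zero byte, the protocol (6 = TCP) and the TCP length
 * (header + payload). Inputs and result are in host byte order.
 * The result is NOT complemented: this is the value placed in the
 * TCP checksum field (after htons()) when the csum flag is used.
 */
static uint16_t ipv4_tcp_phdr_sum(uint32_t src, uint32_t dst,
				  uint16_t tcp_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += 6;		/* zero byte followed by IPPROTO_TCP */
	sum += tcp_len;
	return csum_fold(sum);
}
```

For example, for 192.168.0.1 -> 192.168.0.2 with a 20-byte TCP segment, ipv4_tcp_phdr_sum(0xC0A80001, 0xC0A80002, 20) returns 0x816E.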
> > 
> > We can check that this is really needed with Linux vhost by
> > replaying the test plan described at [1].
> > 
> > [1] http://dpdk.org/ml/archives/dev/2016-October/048793.html
> > 
> > If we add the following patch to disable the checksum fix (on top of
> > this patchset), test 1 ("large packets (lro/tso)") no longer works.
> > 
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -224,6 +224,9 @@
> >         uint32_t tmp;
> >         int shared = 0;
> >  
> > +        if (1)
> > +               return 0;
> > +
> >         /* mbuf is write-only, we need to copy the headers in a linear buffer */
> >         if (unlikely(rte_pktmbuf_data_is_shared(m, 0, hdrlen))) {
> >                 shared = 1;
> > 
> > 
> > In one direction ("flow1" in the test description), large packets
> > are transmitted from the host on the ixgbe interface and received by
> > the guest. Then, testpmd bridges the packets to the virtio interface,
> > but the packets are not received by the host.
> I hope I will have time to dig into this further, since, honestly, I
> don't quite like this patch: it makes things unmaintainable.

Well, I'm not that proud of the patch, but it's the best solution
I've found. Nevertheless, saying it makes things unmaintainable seems
a bit excessive to me :)

The option of reallocating an mbuf, then copying and fixing the network
headers in it, looks even more complex to me (that was my first approach).

> Besides that, I think we have a similar issue with NIC drivers. See the
> rte_net_intel_cksum_flags_prepare() function introduced in commit
> 4fb7e803eb1a ("ethdev: add Tx preparation").

Yes, that was discussed a bit. See [1] and the subsequent mails.

My opinion is that tx_burst() should not change the mbuf data; it has
always been like this. For Intel NICs, there is no issue: since the DPDK
API is derived from the Intel NIC API, there is nothing to fix in the
mbuf data.

For tx_prepare(), the API explicitly states that it may update the data.
If tx_prepare() becomes mandatory, it will naturally fix this issue
without modifying the driver, because the pseudo-header checksum will be
calculated in tx_prepare().

An alternative is to mark this as a known issue for now, and wait until
tx_prepare() is mandatory.

> Cc'ing more people here. Some quick background for them: NIC drivers
> doing TSO need to change the mbuf (e.g., to update the checksum).
> However, as Stephen pointed out, we cannot do that if the mbuf is
> shared, and I don't see such checks in the driver code either.
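
Here, "shared" means that someone else can still see the packet data: either the mbuf is a clone (an indirect mbuf attached to another mbuf's buffer), or its reference count is greater than one. A toy model of that check, using a simplified structure rather than the real struct rte_mbuf (the type and function names are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of an mbuf (hypothetical, NOT struct rte_mbuf): an
 * indirect mbuf points at the buffer of a direct mbuf instead of
 * owning its own data.
 */
struct toy_mbuf {
	uint16_t refcnt;		/* references held on this mbuf */
	struct toy_mbuf *direct;	/* NULL if this mbuf owns its buffer */
};

/*
 * Headers may be modified in place only if nobody else can observe
 * the buffer: the mbuf owns its data and holds the only reference.
 */
static bool toy_headers_writable(const struct toy_mbuf *m)
{
	if (m->direct != NULL)		/* clone: data belongs to another mbuf */
		return false;
	return m->refcnt == 1;		/* refcnt > 1: referenced elsewhere too */
}
```

In DPDK, the same information is available through RTE_MBUF_INDIRECT() and rte_mbuf_refcnt_read().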
>
> > There are at least 2 options for this one:
> > 
> > - try to use 2 different descriptors (the patch is probably harder,
> >   and it may slow down the case where ANY_LAYOUT is supported)
> > 
> > - refuse to initialize with TSO enabled if ANY_LAYOUT is not
> >   supported.
> > 
> > If you think ANY_LAYOUT is almost always supported today, we could
> > choose option 2. Let me know your preference here.
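
Option 2 boils down to a check of the negotiated feature bits when the port is configured. A minimal sketch, assuming the bit numbers from Linux's virtio headers (VIRTIO_NET_F_HOST_TSO4 = 11, VIRTIO_NET_F_HOST_TSO6 = 12, VIRTIO_F_ANY_LAYOUT = 27); the helper names are hypothetical and not the DPDK driver's code:

```c
#include <stdint.h>

/* Feature bit numbers as defined in Linux's virtio headers. */
#define VIRTIO_NET_F_HOST_TSO4	11
#define VIRTIO_NET_F_HOST_TSO6	12
#define VIRTIO_F_ANY_LAYOUT	27

static int has_feature(uint64_t features, unsigned int bit)
{
	return (int)((features >> bit) & 1);
}

/*
 * Hypothetical configure-time check for option 2: refuse to enable
 * TSO unless the device offers host TSO and also accepts arbitrary
 * descriptor layouts (ANY_LAYOUT).
 * Returns 0 if the requested configuration is acceptable, -1 if not.
 */
static int check_tso_config(uint64_t negotiated, int want_tso)
{
	int host_tso = has_feature(negotiated, VIRTIO_NET_F_HOST_TSO4) ||
		       has_feature(negotiated, VIRTIO_NET_F_HOST_TSO6);

	if (!want_tso)
		return 0;
	if (!host_tso || !has_feature(negotiated, VIRTIO_F_ANY_LAYOUT))
		return -1;
	return 0;
}
```

The cost of this option is that TSO silently becomes unavailable on devices that do not offer ANY_LAYOUT.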
> Maybe we could go with a simpler one: COW (copy-on-write). Yes, it costs
> more, but this case should be rare, so it should be okay, right?
> Besides, we only need to copy the first (head) mbuf.

Could you detail what you mean by COW in this context? Do you mean
reallocating a new mbuf? If so, it's not only a matter of cost:

- There is no mempool pointer associated with Tx queues, so we cannot
  allocate an mbuf. Reusing the mempool pointer from the current mbuf
  looks risky, because it can be a special pool, like a pool dedicated
  to clones, without data.

- It makes allocation errors quite hard to handle; it would require some
  rework of the Tx functions.
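
One way to get a writable copy of the headers without allocating an mbuf, in the spirit of the "copy the headers in a linear buffer" comment in the diff quoted above, is to copy them into scratch memory that the driver preallocates per Tx queue. A minimal sketch, assuming such a scratch area exists (the helper and its parameters are hypothetical, not the actual patch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to packet headers that may be
 * modified. If the packet data is shared, the first hdr_len bytes are
 * copied into a scratch area preallocated by the caller (e.g. one slot
 * per Tx descriptor), so the Tx path never allocates an mbuf and has
 * no allocation-failure case. Returns NULL if the scratch area is too
 * small to hold the headers.
 */
static uint8_t *writable_headers(uint8_t *data, size_t hdr_len, bool shared,
				 uint8_t *scratch, size_t scratch_len)
{
	if (!shared)
		return data;		/* sole owner: patch in place */
	if (hdr_len > scratch_len)
		return NULL;		/* headers do not fit */
	memcpy(scratch, data, hdr_len);
	return scratch;			/* patch the private copy instead */
}
```

The caller would then fix the checksum in the returned buffer and describe the packet to the device with one descriptor for the copied headers plus the untouched remainder of the original data, so the shared case costs an extra descriptor.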

Thanks for your review and the discussion. Let me know what you think.

