[dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy

Maxime Coquelin maxime.coquelin at redhat.com
Tue Aug 23 17:40:20 CEST 2016



On 08/23/2016 04:31 PM, Yuanhan Liu wrote:
> On Tue, Aug 23, 2016 at 04:04:30PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
>>> The basic idea of Tx zero copy is that, instead of copying data from
>>> the desc buf, we let the mbuf reference the desc buf addr directly.
>>>
>>> Doing so, however, has one major issue: we can't update the used ring
>>> at the end of rte_vhost_dequeue_burst. Because no copy is done there,
>>> an update of the used ring would let the driver reclaim the desc buf,
>>> and DPDK might then reference a stale memory region.
>>>
>>> To update the used ring properly, this patch does several tricks:
>>>
>>> - When an mbuf references a desc buf, its refcnt is incremented by 1.
>>>
>>>  This pins the mbuf, so that an mbuf free from the DPDK app won't
>>>  actually free it; it just decrements the refcnt back.
>>>
>>> - All such mbufs are chained together (in a tailq).
>>>
>>>  The chain is checked on every entry to rte_vhost_dequeue_burst, to
>>>  see whether an mbuf has been freed (i.e. its refcnt is back to 1).
>>>  If so, it means we are the last user of this mbuf and it is safe
>>>  to update the used ring.
>>>
>>> - "struct zcopy_mbuf" is introduced, to associate an mbuf with the
>>>  right desc idx (a sketch of this bookkeeping follows the quoted
>>>  diffstat below).
>>>
>>> Tx zero copy is introduced for performance reasons, and some rough
>>> tests show about a 40% performance boost for 1400B packets. For small
>>> packets (e.g. 64B), it actually slows things down a bit. That is
>>> expected, because this patch introduces some extra work, which
>>> outweighs the benefit of saving the copy of a few bytes.
>>>
>>> Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
>>> ---
>>> lib/librte_vhost/vhost.c      |   2 +
>>> lib/librte_vhost/vhost.h      |  21 ++++++
>>> lib/librte_vhost/vhost_user.c |  41 +++++++++-
>>> lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
>>> 4 files changed, 214 insertions(+), 19 deletions(-)
>>>
>> ...
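A minimal sketch of the bookkeeping described in the commit message
above. Only the identifiers that appear in the quoted hunk below
(zmbuf_list, nr_zmbuf, desc_idx, put_zmbuf, mbuf_is_consumed) come from
the patch; the exact field layout and helper bodies here are
assumptions.

#include <sys/queue.h>
#include <stdbool.h>
#include <rte_mbuf.h>

struct zcopy_mbuf {
	TAILQ_ENTRY(zcopy_mbuf) next;	/* links into vq->zmbuf_list */
	struct rte_mbuf *mbuf;		/* mbuf referencing the desc buf */
	uint32_t desc_idx;		/* desc to return via the used ring */
	int in_use;			/* slot allocation flag (assumed) */
};

/*
 * The mbuf was pinned by bumping its refcnt before it was handed to
 * the application, so a rte_pktmbuf_free() from the app only drops the
 * refcnt back to 1.  refcnt == 1 therefore means the app is done with
 * the mbuf and the used ring can be updated safely.
 */
static inline bool
mbuf_is_consumed(struct rte_mbuf *m)
{
	while (m) {
		if (rte_mbuf_refcnt_read(m) > 1)
			return false;
		m = m->next;
	}

	return true;
}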
>>
>>> rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>> 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
>>> @@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>> 	if (unlikely(vq->enabled == 0))
>>> 		return 0;
>>>
>>> +	if (dev->tx_zero_copy) {
>>> +		struct zcopy_mbuf *zmbuf, *next;
>>> +		int nr_updated = 0;
>>> +
>>> +		for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
>>> +		     zmbuf != NULL; zmbuf = next) {
>>> +			next = TAILQ_NEXT(zmbuf, next);
>>> +
>>> +			if (mbuf_is_consumed(zmbuf->mbuf)) {
>>> +				used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> +				update_used_ring(dev, vq, used_idx,
>>> +						 zmbuf->desc_idx);
>>> +				nr_updated += 1;
>>> +
>>> +				TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
>>> +				rte_pktmbuf_free(zmbuf->mbuf);
>>> +				put_zmbuf(zmbuf);
>>> +				vq->nr_zmbuf -= 1;
>>> +			}
>> Shouldn't you break the loop here as soon as a mbuf is not consumed?
>
> I have thought of that as well, as a micro optimization. But I was
> wondering: what if the mbuf at the head of the list is still pinned
> by the DPDK app? Then the whole chain behind it would be blocked.
> This should be rare, but I think we should plan for the worst case.
>
> Besides that, the performance boost I got is quite decent, so I think
> we can drop this micro optimization.

Forget my comment, this was a misunderstanding of the code on my side.

Regards,
Maxime
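As an aside, the trade-off discussed above can be made concrete. A
hypothetical early-break variant of the reclaim loop (not what the
patch does; update_used_ring(), put_zmbuf() and the vq fields are the
identifiers from the quoted hunk, the function wrapper is invented for
illustration) would look like this:

static void
reclaim_zmbufs_early_break(struct virtio_net *dev,
			   struct vhost_virtqueue *vq)
{
	struct zcopy_mbuf *zmbuf, *next;
	uint16_t used_idx;

	for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
	     zmbuf != NULL; zmbuf = next) {
		next = TAILQ_NEXT(zmbuf, next);

		/*
		 * Stop at the first mbuf the app still holds: cheaper per
		 * call, but one pinned mbuf at the head of the list blocks
		 * reclaim of every descriptor queued behind it.
		 */
		if (!mbuf_is_consumed(zmbuf->mbuf))
			break;

		used_idx = vq->last_used_idx++ & (vq->size - 1);
		update_used_ring(dev, vq, used_idx, zmbuf->desc_idx);

		TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
		rte_pktmbuf_free(zmbuf->mbuf);
		put_zmbuf(zmbuf);
		vq->nr_zmbuf -= 1;
	}
}

The patch's full scan avoids that head-of-line blocking: the used ring
allows out-of-order completion, since each used entry carries its own
desc idx, so any consumed mbuf can be reclaimed regardless of its
position in the list.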

