[dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy

Maxime Coquelin maxime.coquelin at redhat.com
Tue Aug 23 16:04:30 CEST 2016



On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
> The basic idea of Tx zero copy is, instead of copying data from the
> desc buf, here we let the mbuf reference the desc buf addr directly.
>
> Doing so, however, has one major issue: we can't update the used ring
> at the end of rte_vhost_dequeue_burst. Because we don't do the copy
> here, an update of the used ring would let the driver to reclaim the
> desc buf. As a result, DPDK might reference a stale memory region.
>
> To update the used ring properly, this patch does several tricks:
>
> - when mbuf references a desc buf, refcnt is added by 1.
>
>   This is to pin lock the mbuf, so that a mbuf free from the DPDK
>   won't actually free it, instead, refcnt is subtracted by 1.
>
> - We chain all those mbuf together (by tailq)
>
>   And we check it every time on the rte_vhost_dequeue_burst entrance,
>   to see if the mbuf is freed (when refcnt equals to 1). If that
>   happens, it means we are the last user of this mbuf and we are
>   safe to update the used ring.
>
> - "struct zcopy_mbuf" is introduced, to associate an mbuf with the
>   right desc idx.
>
> Tx zero copy is introduced for performance reason, and some rough tests
> show about 40% perfomance boost for packet size 1400B. FOr small packets,
> (e.g. 64B), it actually slows a bit down. That is expected because this
> patch introduces some extra works, and it outweighs the benefit from
> saving few bytes copy.
>
> Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> ---
>  lib/librte_vhost/vhost.c      |   2 +
>  lib/librte_vhost/vhost.h      |  21 ++++++
>  lib/librte_vhost/vhost_user.c |  41 +++++++++-
>  lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
>  4 files changed, 214 insertions(+), 19 deletions(-)
>
...

>  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> @@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  	if (unlikely(vq->enabled == 0))
>  		return 0;
>
> +	if (dev->tx_zero_copy) {
> +		struct zcopy_mbuf *zmbuf, *next;
> +		int nr_updated = 0;
> +
> +		for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> +		     zmbuf != NULL; zmbuf = next) {
> +			next = TAILQ_NEXT(zmbuf, next);
> +
> +			if (mbuf_is_consumed(zmbuf->mbuf)) {
> +				used_idx = vq->last_used_idx++ & (vq->size - 1);
> +				update_used_ring(dev, vq, used_idx,
> +						 zmbuf->desc_idx);
> +				nr_updated += 1;
> +
> +				TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> +				rte_pktmbuf_free(zmbuf->mbuf);
> +				put_zmbuf(zmbuf);
> +				vq->nr_zmbuf -= 1;
> +			}
Shouldn't you break the loop here as soon as a mbuf is not consumed?
Indeed, they might not be consumed sequentially, and would cause
last_used_idx to be incremented whereas it shouldn't.



More information about the dev mailing list