[dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy
Maxime Coquelin
maxime.coquelin at redhat.com
Tue Aug 23 16:04:30 CEST 2016
On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
> The basic idea of Tx zero copy is, instead of copying data from the
> desc buf, here we let the mbuf reference the desc buf addr directly.
>
> Doing so, however, has one major issue: we can't update the used ring
> at the end of rte_vhost_dequeue_burst. Because we don't do the copy
> here, an update of the used ring would let the driver to reclaim the
> desc buf. As a result, DPDK might reference a stale memory region.
>
> To update the used ring properly, this patch does several tricks:
>
> - when mbuf references a desc buf, refcnt is added by 1.
>
> This is to pin lock the mbuf, so that a mbuf free from the DPDK
> won't actually free it, instead, refcnt is subtracted by 1.
>
> - We chain all those mbuf together (by tailq)
>
> And we check it every time on the rte_vhost_dequeue_burst entrance,
> to see if the mbuf is freed (when refcnt equals to 1). If that
> happens, it means we are the last user of this mbuf and we are
> safe to update the used ring.
>
> - "struct zcopy_mbuf" is introduced, to associate an mbuf with the
> right desc idx.
>
> Tx zero copy is introduced for performance reason, and some rough tests
> show about 40% perfomance boost for packet size 1400B. FOr small packets,
> (e.g. 64B), it actually slows a bit down. That is expected because this
> patch introduces some extra works, and it outweighs the benefit from
> saving few bytes copy.
>
> Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> ---
> lib/librte_vhost/vhost.c | 2 +
> lib/librte_vhost/vhost.h | 21 ++++++
> lib/librte_vhost/vhost_user.c | 41 +++++++++-
> lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
> 4 files changed, 214 insertions(+), 19 deletions(-)
>
...
> rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> @@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> if (unlikely(vq->enabled == 0))
> return 0;
>
> + if (dev->tx_zero_copy) {
> + struct zcopy_mbuf *zmbuf, *next;
> + int nr_updated = 0;
> +
> + for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> + zmbuf != NULL; zmbuf = next) {
> + next = TAILQ_NEXT(zmbuf, next);
> +
> + if (mbuf_is_consumed(zmbuf->mbuf)) {
> + used_idx = vq->last_used_idx++ & (vq->size - 1);
> + update_used_ring(dev, vq, used_idx,
> + zmbuf->desc_idx);
> + nr_updated += 1;
> +
> + TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> + rte_pktmbuf_free(zmbuf->mbuf);
> + put_zmbuf(zmbuf);
> + vq->nr_zmbuf -= 1;
> + }
Shouldn't you break the loop here as soon as a mbuf is not consumed?
Indeed, they might not be consumed sequentially, and would cause
last_used_idx to be incremented whereas it shouldn't.
More information about the dev
mailing list