[dpdk-dev] [PATCH v3 00/19] Vhost-user: Implement device IOTLB support

Yuanhan Liu yliu at fridaylinux.org
Fri Oct 6 08:24:26 CEST 2017


Series applied to dpdk-next-virtio.

Thanks.

	--yliu

On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote:
> This v3 lists the feature in the release note, and fixes the bug in
> is_vring_iotlb_update() reported by Yuanhan.
> 
> The purpose of this series is to add support for
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves the guest safety by enabling the
> possibility to isolate the Virtio device.
> 
> It makes possible to use Virtio PMD in guest with using VFIO driver
> without enable_unsafe_noiommu_mode parameter set, so that the DPDK
> application on guest can only access memory its has been allowed to,
> and preventing malicious/buggy DPDK application in guest to make
> vhost-user backend write random guest memory. Note that Virtio-net
> Kernel driver also support IOMMU.
> 
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of Qemu v2.10 release.
> 
> Performance-wise, even if this RFC has still room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with PVP benchmark:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 1G host/1G guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
> 
> This is explained because IOTLB misses, which are very costly, only
> happen at startup time. Indeed, once used, the buffers are not
> invalidated, so if the IOTLB cache is large enough, there will be only
> cache hits. Also, the use of 1G huge pages improves the IOTLB cache
> searching time by reducing the number of entries.
> 
> With 2M hugepages, a performance degradation is seen with IOMMU on:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 2M host/2M guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21)
> 
> A possible improvement would be to merge contiguous IOTLB entries sharing
> the same permissions. A very rough patch implementing this idea fixes
> the performance degradation (18.8Mpps), but the required work to clean
> it would delay this series after v17.11.
> 
> With dynamic mappings (i.e. Virtio-net kernel driver), this is another
> story. The performance is so poor it makes it almost unusable. Indeed,
> since the Kernel driver unmaps the buffers as soon as they are handled,
> almost all descriptors buffers addresses translations result in an IOTLB
> miss. There is not much that can be done on DPDK side, except maybe
> batching IOTLB miss requests no to break bursts, but it would require
> a big rework. In Qemu, we may consider enabling IOMMU MAP notifications,
> so that DPDK receives the IOTLB updates without having to send IOTLB miss
> request.
> 
> Regarding the design choices:
>  - I initially intended to use userspace RCU library[1] for the cache
> implementation, but it would have added an external dependency, and the
> lib is not available in all distros. Qemu for example got rid of this
> dependency by copying some of the userspace RCU lib parts into Qemu tree,
> but this is not possible with DPDK due to licensing issues (RCU lib is
> LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
> locks.
>  - I initially implemented a per-device IOTLB cache, but the concurrent
> acccesses on the IOTLB lock had huge impact on performance (~-40% in
> bidirectionnal, expect even worse with multiqueue). I move to a per-
> virtqueue IOTLB design, which prevents this concurrency.
>  - The slave IOTLB miss request supports reply-ack feature in spec, but
> this version doesn't block or busy-wait for the corresponding update so
> that other queues sharing the same lcore can be processed in the meantime.
> 
> For those who would like to test the series, I made it available on
> gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
> the intel_iommu=on parameter, and the guest should be started with and
> iommu device attached to the virtio-net device. For example:
> 
> ./qemu-system-x86_64 \
>   -enable-kvm -m 4096 -smp 2 \
>   -M q35,kernel-irqchip=split \
>   -cpu host \
>   -device intel-iommu,device-iotlb=on,intremap \
>   -device ioh3420,id=root.1,chassis=1 \
>   -chardev socket,id=char0,path=/tmp/vhost-user1 \
>   -netdev type=vhost-user,id=hn2,chardev=char0 \
>   -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
> ...
> 
> [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
> [1]: http://liburcu.org/
> [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
> 
> Changes since v1:
> =================
> - Add feature support in release note (Yuanhan)
> - Fix return value in is_vring_iotlb_update() (Yuanhan)
> 
> Changes since v1:
> =================
> - Implement random cache entry eviction instead of removing all entries
>   when cache is full (Yuanhan)
> - Adds bounds check on vring_idx (Remy)
> - Initialize slave_req_fd to -1 (Tiwei)
> - Remove virtio-net device lock (Tiwei)
> - Simplify vhost_user_iotlb_cache_remove() (Tiwei)
> - Squash iotlb lock usage optimization patch
> - Inline noiommu part of vhost_iova_to_vva to remove performance
>   regressions with IOMMU=off
> - Reworked iotlb files copyrights
> 
> Changes since RFC:
> ==================
> - Fix memory leak in error patch reported by Jens
> - Rework wait for IOTLB update by stopping the burst to let other
>   queues to be processed, if any. It implies the introduction an
>   iotlb_pending_list, so that iotlb miss requests aren't sent multiple
>   times for a same address.
> - Optimize iotlb lock usage to recover to same as IOMMU off performance
> - Fix device locking issue in rte_vhost_dequeue_burst() error path
> - Change virtio_dev_rx error handling for consistency with mergeable rx,
>   and to ease returning in case of IOTLB misses.
> - Fix checkpatch warnings reported by checkpatch at dpdk.org
> 
> Maxime Coquelin (19):
>   Revert "vhost: workaround MQ fails to startup"
>   vhost: make error handling consistent in rx path
>   vhost: prepare send_vhost_message() to slave requests
>   vhost: add support to slave requests channel
>   vhost: declare missing IOMMU-related definitions for old kernels
>   vhost: add iotlb helper functions
>   vhost: iotlb: add pending miss request list and helpers
>   vhost-user: add support to IOTLB miss slave requests
>   vhost: initialize vrings IOTLB caches
>   vhost-user: handle IOTLB update and invalidate requests
>   vhost: introduce guest IOVA to backend VA helper
>   vhost: use the guest IOVA to host VA helper
>   vhost: enable rings at the right time
>   vhost: don't dereference invalid dev pointer after its reallocation
>   vhost: postpone rings addresses translation
>   vhost-user: translate ring addresses when IOMMU enabled
>   vhost-user: iommu: postpone device creation until ring are mapped
>   vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
>   vhost: enable IOMMU support
> 
>  doc/guides/rel_notes/release_17_11.rst |   4 +
>  lib/librte_vhost/Makefile              |   4 +-
>  lib/librte_vhost/iotlb.c               | 352 +++++++++++++++++++++++++++++++++
>  lib/librte_vhost/iotlb.h               |  76 +++++++
>  lib/librte_vhost/vhost.c               | 115 +++++++++--
>  lib/librte_vhost/vhost.h               |  61 +++++-
>  lib/librte_vhost/vhost_user.c          | 312 +++++++++++++++++++++++++----
>  lib/librte_vhost/vhost_user.h          |  20 +-
>  lib/librte_vhost/virtio_net.c          |  96 +++++++--
>  9 files changed, 962 insertions(+), 78 deletions(-)
>  create mode 100644 lib/librte_vhost/iotlb.c
>  create mode 100644 lib/librte_vhost/iotlb.h
> 
> -- 
> 2.13.6


More information about the dev mailing list