[PATCH v5 00/26] Add VDUSE support to Vhost library

Maxime Coquelin maxime.coquelin at redhat.com
Wed Jun 7 16:58:46 CEST 2023



On 6/7/23 08:48, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Tuesday, June 6, 2023 4:18 PM
>> To: dev at dpdk.org; Xia, Chenbo <chenbo.xia at intel.com>;
>> david.marchand at redhat.com; mkp at redhat.com; fbl at redhat.com;
>> jasowang at redhat.com; Liang, Cunming <cunming.liang at intel.com>; Xie, Yongji
>> <xieyongji at bytedance.com>; echaudro at redhat.com; eperezma at redhat.com;
>> amorenoz at redhat.com; lulu at redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
>>
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA Device in Userspace. It enables
>> implementing a Virtio device in userspace and attaching it
>> to the kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. In that case, the device is visible
>> to the kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> Or it can be exposed to userspace via the vhost-vdpa
>> driver, through a vhost-vdpa chardev that can be passed
>> to QEMU or the Virtio-user PMD.
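>>
>> For instance, once the device is bound to vhost-vdpa, it can be
>> consumed with the Virtio-user PMD (illustrative example, assuming the
>> chardev shows up as /dev/vhost-vdpa-0):
>> # ./build/app/dpdk-testpmd --no-pci --vdev=net_virtio_user0,path=/dev/vhost-vdpa-0 -- -i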
>>
>> While VDUSE support is already available in the upstream
>> kernel, a couple of patches are required to support the
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>>
>> Benchmark results:
>> ==================
>>
>> On this v2, the PVP reference benchmark has been run and compared
>> with Vhost-user.
>>
>> When doing macswap forwarding in the workload, no difference is seen.
>> When doing io forwarding in the workload, we see a 4% performance
>> degradation with VDUSE compared to Vhost-user/Virtio-user. It is
>> explained by the use of the IOTLB layer in the Vhost library when using
>> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
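>>
>> To make the difference concrete, here is a self-contained conceptual
>> sketch of the per-descriptor translation step (illustrative names, not
>> the actual lib/vhost internals):
>>
>> #include <stddef.h>
>> #include <stdint.h>
>>
>> struct iotlb_entry {
>> 	uint64_t iova;   /* guest I/O virtual address */
>> 	uint64_t uaddr;  /* backend virtual address */
>> 	uint64_t size;
>> };
>>
>> /* With VDUSE, every descriptor address goes through a cache lookup
>>  * like this on the datapath; Vhost-user/Virtio-user setups instead
>>  * translate through memory regions mapped once up front. */
>> static void *
>> iova_to_vva(struct iotlb_entry *cache, size_t n, uint64_t iova,
>> 	    uint64_t size)
>> {
>> 	for (size_t i = 0; i < n; i++) {
>> 		if (iova >= cache[i].iova &&
>> 		    iova + size <= cache[i].iova + cache[i].size)
>> 			return (void *)(uintptr_t)
>> 				(cache[i].uaddr + (iova - cache[i].iova));
>> 	}
>> 	return NULL; /* IOTLB miss: request the mapping, then retry */
>> }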
>>
>> Usage:
>> ======
>>
>> 1. Probe required Kernel modules
>> # modprobe vdpa
>> # modprobe vduse
>> # modprobe virtio-vdpa
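>>
>> The vdpa tool should then list the VDUSE management device
>> (illustrative check, exact output depends on the kernel version):
>> # vdpa mgmtdev show
>> vduse:
>>   supported_classes net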
>>
>> 2. Build (requires VDUSE kernel headers to be available)
>> # meson build
>> # ninja -C build
>>
>> 3. Create a VDUSE device (vduse0) using Vhost PMD with
>> testpmd (with 4 queue pairs in this example)
>> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9 -- -i --txq=4 --rxq=4
>>
>> 4. Attach the VDUSE device to the vDPA bus
>> # vdpa dev add name vduse0 mgmtdev vduse
>> => The virtio-net netdev shows up (eth0 here)
>> # ip l show eth0
>> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
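>>
>> The netdev can then be configured like any other interface, e.g.
>> (illustrative, not part of the test steps above):
>> # ip link set dev eth0 up
>> # ip addr add 192.168.100.1/24 dev eth0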
>>
>> 5. Start/stop traffic in testpmd
>> testpmd> start
>> testpmd> show port stats 0
>>    ######################## NIC statistics for port 0  ########################
>>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>>    RX-errors: 0
>>    RX-nombuf:  0
>>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
>>
>>    Throughput (since last show)
>>    Rx-pps:            0          Rx-bps:            0
>>    Tx-pps:            0          Tx-bps:            0
>>
>> ##########################################################################
>> ##
>> testpmd> stop
>>
>> 6. Detach the VDUSE device from the vDPA bus
>> # vdpa dev del vduse0
>>
>> 7. Quit testpmd
>> testpmd> quit
>>
>> Known issues & remaining work:
>> ==============================
>> - Fix issue in FD manager (still polling while FD has been removed)
>> - Add Netlink support in Vhost library
>> - Support device reconnection
>>   -> A temporary patch to support reconnection via a tmpfs file is
>>      available; the upstream solution would be in-kernel and is being
>>      developed.
>>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
>> - Support packed ring
>> - Provide more performance benchmark results
>>
>> Changes in v5:
>> ==============
>> - Delay starting/stopping the device until after the VDUSE event has
>>    been replied to, in order to avoid a deadlock encountered when
>>    testing with OVS.
> 
> Could you explain more to help me understand the deadlock issue?

Sure.

The v5 fixes an ABBA deadlock involving the OVS mutex and the kernel
rtnl_lock(), two OVS threads and the vdpa tool process.

We have an OVS bridge with a mlx5 port already added.
We add the vduse port to the same bridge.
Then we use the iproute2 vdpa tool to attach the vduse device to the
kernel vdpa bus. When doing this, the rtnl lock is taken while the
virtio-net device is probed, and VDUSE_SET_STATUS gets sent and waits
for its reply.

This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
handler, and if the DRIVER_OK bit is set, the Vhost .new_device()
callback is called, which triggers a bridge reconfiguration.

On bridge reconfiguration, the mlx5 port takes the OVS mutex and
performs an ioctl(), which tries to take the rtnl lock, but it is
already owned by the vdpa tool.

The vduse_events thread is stuck waiting for the OVS mutex, so the
reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
process is stuck for 30 seconds, until a timeout happens.

When the timeout happens, everything is unblocked, but the VDUSE device
has been marked as broken and so is not usable anymore.

I can reproduce the issue and provide you with the backtraces of the
different threads if you wish.

Anyway, I think it makes sense to perform the device startup after
having replied to the VDUSE_SET_STATUS request, as the reply just means
the device has taken the new driver status into account.
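
To make the ordering concrete, here is a minimal sketch of the idea,
using the request/response structures from the VDUSE uAPI
(linux/vduse.h); vduse_device_start() is a hypothetical stand-in for
the actual startup path, not the DPDK function:

#include <linux/vduse.h>
#include <linux/virtio_config.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the actual startup path, which creates
 * the virtqueues and calls the .new_device() callback. */
static void
vduse_device_start(int dev_fd)
{
	(void)dev_fd;
}

static void
vduse_events_handler(int dev_fd)
{
	struct vduse_dev_request req;
	struct vduse_dev_response resp;

	if (read(dev_fd, &req, sizeof(req)) != sizeof(req))
		return;

	memset(&resp, 0, sizeof(resp));
	resp.request_id = req.request_id;
	resp.result = VDUSE_REQ_RESULT_OK;

	switch (req.type) {
	case VDUSE_SET_STATUS:
		/* Reply first: this releases the kernel side, and with
		 * it the vdpa tool process holding rtnl_lock. */
		write(dev_fd, &resp, sizeof(resp));
		/* Only then start the device: .new_device() may now take
		 * application-level locks (e.g. the OVS mutex) without
		 * blocking the reply. */
		if (req.s.status & VIRTIO_CONFIG_S_DRIVER_OK)
			vduse_device_start(dev_fd);
		break;
	default:
		write(dev_fd, &resp, sizeof(resp));
		break;
	}
}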

Hope it clarifies, let me know if you need more details.

Thanks,
Maxime

> Thanks,
> Chenbo
> 
>> - Mention the lack of reconnection support in the release notes.
>>
>> Changes in v4:
>> ==============
>> - Applied patch 1 and patch 2 from v3
>> - Rebased on top of Eelco series
>> - Fix coredump clear in IOTLB cache removal (David)
>> - Remove unneeded ret variable in vhost_vring_inject_irq (David)
>> - Fixed release note (David, Chenbo)
>>
>> Changes in v2/v3:
>> =================
>> - Fixed mem_set_dump() parameter (patch 4)
>> - Fixed accidental comment change (patch 7, Chenbo)
>> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
>> - Move change from patch 12 to 13 (Chenbo)
>> - Enable locks annotation for control queue (Patch 17)
>> - Send control queue notification when used descriptors enqueued (Patch 17)
>> - Lock control queue IOTLB lock (Patch 17)
>> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
>> - Set VDUSE dev FD as NONBLOCK (Patch 18)
>> - Enable more Virtio features (Patch 18)
>> - Remove calls to pthread_setcancelstate() (Patch 22)
>> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
>> (Patch 22)
>> - Use RTE_DIM() to get requests string array size (Patch 22)
>> - Set reply result for IOTLB update message (Patch 25, Chenbo)
>> - Fix queues enablement with multiqueue (Patch 26)
>> - Move kickfd creation for better logging (Patch 26)
>> - Improve logging (Patch 26)
>> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
>> - Enable CVQ notifications once handler is installed (Patch 27)
>> - Don't advertise multiqueue and control queue if the app only requests a
>> single queue pair (Patch 27)
>> - Add release notes
>>
>> Maxime Coquelin (26):
>>    vhost: fix IOTLB entries overlap check with previous entry
>>    vhost: add helper of IOTLB entries coredump
>>    vhost: add helper for IOTLB entries shared page check
>>    vhost: don't dump unneeded pages with IOTLB
>>    vhost: change to single IOTLB cache per device
>>    vhost: add offset field to IOTLB entries
>>    vhost: add page size info to IOTLB entry
>>    vhost: retry translating IOVA after IOTLB miss
>>    vhost: introduce backend ops
>>    vhost: add IOTLB cache entry removal callback
>>    vhost: add helper for IOTLB misses
>>    vhost: add helper for interrupt injection
>>    vhost: add API to set max queue pairs
>>    net/vhost: use API to set max queue pairs
>>    vhost: add control virtqueue support
>>    vhost: add VDUSE device creation and destruction
>>    vhost: add VDUSE callback for IOTLB miss
>>    vhost: add VDUSE callback for IOTLB entry removal
>>    vhost: add VDUSE callback for IRQ injection
>>    vhost: add VDUSE events handler
>>    vhost: add support for virtqueue state get event
>>    vhost: add support for VDUSE status set event
>>    vhost: add support for VDUSE IOTLB update event
>>    vhost: add VDUSE device startup
>>    vhost: add multiqueue support to VDUSE
>>    vhost: add VDUSE device stop
>>
>>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
>>   doc/guides/rel_notes/release_23_07.rst |  12 +
>>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
>>   lib/vhost/iotlb.c                      | 333 +++++++------
>>   lib/vhost/iotlb.h                      |  45 +-
>>   lib/vhost/meson.build                  |   5 +
>>   lib/vhost/rte_vhost.h                  |  17 +
>>   lib/vhost/socket.c                     |  72 ++-
>>   lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>>   lib/vhost/vduse.h                      |  33 ++
>>   lib/vhost/version.map                  |   1 +
>>   lib/vhost/vhost.c                      |  70 ++-
>>   lib/vhost/vhost.h                      |  57 ++-
>>   lib/vhost/vhost_user.c                 |  51 +-
>>   lib/vhost/vhost_user.h                 |   2 +-
>>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>>   lib/vhost/virtio_net_ctrl.h            |  10 +
>>   17 files changed, 1409 insertions(+), 238 deletions(-)
>>   create mode 100644 lib/vhost/vduse.c
>>   create mode 100644 lib/vhost/vduse.h
>>   create mode 100644 lib/vhost/virtio_net_ctrl.c
>>   create mode 100644 lib/vhost/virtio_net_ctrl.h
>>
>> --
>> 2.40.1
> 


