[dpdk-dev] [PATCH 0/4] virtio support for container

Qiu, Michael michael.qiu at intel.com
Tue Jan 26 07:02:03 CET 2016


On 1/11/2016 2:43 AM, Tan, Jianfeng wrote:
> This patchset provides a high-performance networking interface (virtio)
> for container-based DPDK applications. Starting DPDK apps in containers
> with exclusive ownership of NIC devices is beyond its scope.
> The basic idea is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps by
> rte_eal_init(). To minimize the change, we reuse the existing virtio
> frontend driver code (drivers/net/virtio/).
>  
> Compared to the QEMU/VM case, the virtio device framework (which translates
> I/O port r/w operations into the unix socket/cuse protocol and is originally
> provided by QEMU) is integrated into the virtio frontend driver. So this
> converged driver plays both the role of the original frontend driver and the
> role of the QEMU device framework.
>  
> The major difference lies in how to calculate relative addresses for vhost.
> The underlying principle is: based on one or multiple shared memory segments,
> vhost maintains a reference table with the base address and length of each
> segment, so that an address coming from the VM (usually a GPA, Guest Physical
> Address) can be translated into a vhost-recognizable address (named VVA,
> Vhost Virtual Address). To decrease the overhead of address translation, we
> should maintain as few segments as possible. In the VM case, the GPA is
> always locally continuous. In the container case, the CVA (Container Virtual
> Address) can be used instead. Specifically (a sketch of the lookup follows
> this list):
> a. when set_base_addr is called, the CVA is used;
> b. when preparing RX descriptors, the CVA is used;
> c. when transmitting packets, the CVA is filled into TX descriptors;
> d. in TX and CQ headers, the CVA is used.
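>
> The lookup vhost performs can be pictured roughly as below. This is only an
> illustrative sketch; the struct and function names are invented here and are
> not the ones used by this patchset or by librte_vhost.
>
> #include <stdint.h>
> #include <stddef.h>
> #include <stdio.h>
>
> struct mem_region {
> 	uint64_t guest_addr; /* GPA in the VM case, CVA in the container case */
> 	uint64_t host_addr;  /* VVA: where vhost mapped this segment */
> 	uint64_t size;
> };
>
> /* Translate an address coming from the frontend into a VVA. */
> static void *to_vva(const struct mem_region *r, size_t n, uint64_t addr)
> {
> 	for (size_t i = 0; i < n; i++) {
> 		if (addr >= r[i].guest_addr &&
> 		    addr < r[i].guest_addr + r[i].size)
> 			return (void *)(uintptr_t)
> 				(r[i].host_addr + (addr - r[i].guest_addr));
> 	}
> 	return NULL; /* not backed by any shared segment */
> }
>
> int main(void)
> {
> 	/* One segment only: the fewer segments, the cheaper each lookup. */
> 	struct mem_region r[] = { { 0x100000, 0x7f0000000000ULL, 0x200000 } };
> 	printf("%p\n", to_vva(r, 1, 0x180000));
> 	return 0;
> }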
>  
> How to share memory? In the VM case, QEMU always shares the whole physical
> memory layout with the backend. But it is not feasible for a container, as a
> process, to share all of its virtual memory regions with the backend. So only
> specified virtual memory regions (of the shared type) are sent to the backend.
> The limitation is that only addresses inside these regions can be used to
> transmit or receive packets; the sketch below shows the kind of region meant
> here.
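>
> A minimal sketch of such a region, assuming a single memory-backed file on
> hugetlbfs (the path, size and error handling are illustrative only; this is
> the idea behind --single-file, not the patch's actual code):
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	const size_t len = 2UL * 1024 * 1024;        /* one 2 MB hugepage */
> 	int fd = open("/dev/hugepages/cvio_mem", O_CREAT | O_RDWR, 0600);
>
> 	if (fd < 0 || ftruncate(fd, len) < 0) {
> 		perror("backing file");
> 		return 1;
> 	}
> 	/* MAP_SHARED is what lets the backend see the same pages once the
> 	 * fd is handed over on the vhost unix socket (SCM_RIGHTS). */
> 	void *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> 	if (va == MAP_FAILED) {
> 		perror("mmap");
> 		return 1;
> 	}
> 	printf("CVA base %p, length %zu\n", va, len);
> 	munmap(va, len);
> 	close(fd);
> 	return 0;
> }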
>
> Known issues:
>
> a. When used with vhost-net, root privilege is required to create the tap
> device inside the container.
> b. Control queue and multi-queue are not supported yet.
> c. When the --single-file option is used, the socket_id of the memory may be
> wrong. (Use "numactl -N x -m x" to work around this for now.)
>  
> How to use?
>
> a. Apply this patchset.
>
> b. To compile container apps:
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>
> c. To build a docker image, use the Dockerfile below:
> $: cat ./Dockerfile
> FROM ubuntu:latest
> WORKDIR /usr/src/dpdk
> COPY . /usr/src/dpdk
> ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
> $: docker build -t dpdk-app-l2fwd .
>
> d. To use with vhost-user:
> $: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
> 	--socket-mem 1024,1024 -- -p 0x1 --stats 1
> $: docker run -i -t -v <path_to_vhost_unix_socket>:/var/run/usvhost \
> 	-v /dev/hugepages:/dev/hugepages \
> 	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
> 	--vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1
>
> e. To use with vhost-net:
> $: modprobe vhost
> $: modprobe vhost-net
> $: docker run -i -t --privileged \
> 	-v /dev/vhost-net:/dev/vhost-net \
> 	-v /dev/net/tun:/dev/net/tun \
> 	-v /dev/hugepages:/dev/hugepages \
> 	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
> 	--vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1

We'd better add an ifname argument, like
--vdev=eth_cvio0,path=/dev/vhost-net,ifname=tap0, so that the user can add
the tap device to a bridge first.
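
For instance, assuming the tap comes up as tap0 in the same network namespace
as an existing bridge br0 (both names are only placeholders):
$: brctl addif br0 tap0
$: ip link set dev tap0 up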

Thanks,
Michael
>
> By the way, it is not necessary to run it inside a container.
>
> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>
> Jianfeng Tan (4):
>   mem: add --single-file to create single mem-backed file
>   mem: add API to obtain memory-backed file info
>   virtio/vdev: add ways to interact with vhost
>   virtio/vdev: add a new vdev named eth_cvio
>
>  config/common_linuxapp                     |   5 +
>  drivers/net/virtio/Makefile                |   4 +
>  drivers/net/virtio/vhost.c                 | 734 +++++++++++++++++++++++++++++
>  drivers/net/virtio/vhost.h                 | 192 ++++++++
>  drivers/net/virtio/virtio_ethdev.c         | 338 ++++++++++---
>  drivers/net/virtio/virtio_ethdev.h         |   4 +
>  drivers/net/virtio/virtio_pci.h            |  52 +-
>  drivers/net/virtio/virtio_rxtx.c           |  11 +-
>  drivers/net/virtio/virtio_rxtx_simple.c    |  14 +-
>  drivers/net/virtio/virtqueue.h             |  13 +-
>  lib/librte_eal/common/eal_common_options.c |  17 +
>  lib/librte_eal/common/eal_internal_cfg.h   |   1 +
>  lib/librte_eal/common/eal_options.h        |   2 +
>  lib/librte_eal/common/include/rte_memory.h |  16 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c   |  82 +++-
>  15 files changed, 1392 insertions(+), 93 deletions(-)
>  create mode 100644 drivers/net/virtio/vhost.c
>  create mode 100644 drivers/net/virtio/vhost.h
>


