[dpdk-dev] [PATCH v2 0/5] virtio support for container

Thomas Monjalon thomas.monjalon at 6wind.com
Wed Apr 13 18:14:41 CEST 2016


Hi Jianfeng,

Thanks for raising the container issues and proposing some solutions.
General comments below.

2016-02-05 19:20, Jianfeng Tan:
> This patchset provides a high-performance networking interface (virtio)
> for container-based DPDK applications. Starting DPDK apps in containers
> with exclusive ownership of NIC devices is beyond its scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the changes, we reuse the already-existing
> virtio frontend driver code (drivers/net/virtio/).
>  
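As an illustration of that usage, here is a minimal sketch of how such an
eth_cvio port could be brought up through rte_eal_init(); the vdev parameter
names (e.g. "path") and the use of --no-huge are assumptions for illustration,
not the patchset's exact syntax.

/* Minimal sketch; "path" and the device argument layout are assumptions.
 * rte_eal_init(), --vdev and --no-huge are standard DPDK EAL interfaces. */
#include <rte_eal.h>

int
main(void)
{
    char *eal_args[] = {
        "app",
        "--no-huge",                        /* container case: no hugepage mount assumed */
        "--vdev=eth_cvio0,path=/tmp/sock0", /* hypothetical device parameters */
    };
    int nargs = sizeof(eal_args) / sizeof(eal_args[0]);

    if (rte_eal_init(nargs, eal_args) < 0)
        return -1;
    /* the eth_cvio port is now visible through the normal ethdev API */
    return 0;
}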
> Compared to the QEMU/VM case, the virtio device framework (which
> translates I/O port read/write operations into the unix socket/cuse
> protocol, and is originally provided by QEMU) is integrated into the
> virtio frontend driver. So this converged driver actually plays both the
> role of the original frontend driver and the role of the QEMU device
> framework.
>  
> The major difference lies in how relative addresses are calculated for
> vhost. The principle of virtio is that, based on one or more shared memory
> segments, vhost maintains a reference table with the base address and
> length of each segment, so that an address coming from the VM (usually a
> GPA, Guest Physical Address) can be translated into a vhost-recognizable
> address (a VVA, Vhost Virtual Address). To reduce the overhead of address
> translation, we should maintain as few segments as possible. In the VM
> case, the GPA is always locally contiguous. In the container case, the
> CVA (Container Virtual Address) can be used. Specifically (see the sketch
> below):
> a. when set_base_addr is called, the CVA is used;
> b. when preparing RX descriptors, the CVA is used;
> c. when transmitting packets, the CVA is filled into the TX descriptors;
> d. in the TX and CQ headers, the CVA is used.
>  
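To make the address handling concrete, below is a minimal sketch of the kind
of per-segment translation described above; the structure and function names
are hypothetical, not taken from the patchset.

/* Illustrative only: translate a guest-side address (GPA or CVA) into a
 * vhost virtual address (VVA) using a per-segment reference table. */
#include <stdint.h>
#include <stddef.h>

struct mem_region {
    uint64_t guest_addr; /* GPA in the VM case, CVA in the container case */
    uint64_t host_addr;  /* VVA: where vhost mapped this segment */
    uint64_t size;
};

static void *
to_vva(const struct mem_region *reg, size_t nregions, uint64_t addr)
{
    size_t i;

    /* fewer segments means fewer iterations here, which is why the cover
     * letter argues for keeping the number of segments small */
    for (i = 0; i < nregions; i++) {
        if (addr >= reg[i].guest_addr &&
            addr < reg[i].guest_addr + reg[i].size)
            return (void *)(uintptr_t)
                (addr - reg[i].guest_addr + reg[i].host_addr);
    }
    return NULL; /* address is outside the shared segments */
}

With a flat CVA mapping this table can stay at one (or very few) entries,
which is the stated motivation for minimizing the number of segments.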
> How is memory shared? In the VM case, QEMU always shares the whole
> physical memory layout with the backend. But it is not feasible for a
> container, as a process, to share all of its virtual memory regions with
> the backend. So only specified virtual memory regions (of the shared type)
> are sent to the backend. The limitation is that only addresses within
> these regions can be used to transmit or receive packets (see the sketch
> below).
> 
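As an illustration of the kind of explicitly shared region meant here, the
sketch below creates a MAP_SHARED segment whose file descriptor could be
handed to the vhost backend over a unix socket; the helper name and flow are
assumptions, not the patchset's implementation.

/* Illustrative only: create an explicitly shared segment; the fd can later
 * be passed to the backend (e.g. via SCM_RIGHTS) so it can map the same
 * pages, and the returned CVA is what the frontend puts into descriptors. */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

static void *
create_shared_segment(const char *name, size_t len, int *fd_out)
{
    void *va;
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, len) < 0) {
        close(fd);
        return NULL;
    }
    va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (va == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd; /* shared with the backend */
    return va;    /* CVA usable for packet buffers */
}

Only memory backed by such shared mappings would be usable for TX/RX, which
matches the limitation stated above.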
> Known issues
> 
> a. When used with vhost-net, root privilege is required to create the tap
> device inside the container.
> b. The control queue and multi-queue are not supported yet.
> c. When the --single-file option is used, the socket_id of the memory may
> be wrong. (Use "numactl -N x -m x" to work around this for now.)

There are 2 different topics in this patchset:
1/ How to provide networking in containers
2/ How to provide memory in containers

1/ You have decided to use the virtio spec to bridge the host
with its containers. But there is no virtio device in a container
and no vhost interface in the host (except the kernel one).
So you are extending virtio to work as a vdev inside the container.
Could you explain what the datapath is between virtio and the host app?
Does it need to use a fake device from QEMU, as Tetsuya has done?

Do you think there could be some alternatives to vhost/virtio in containers?

2/ The memory management is already a mess and it's getting worse.
I think we need to think about the requirements first and then write a
proper implementation to cover every identified need.
I have started a new thread to cover this part:
	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/37445

