[dpdk-dev] [RFC 0/5] virtio support for container

Tan, Jianfeng jianfeng.tan at intel.com
Tue Nov 24 07:19:07 CET 2015



> -----Original Message-----
> From: Zhuangyanying [mailto:ann.zhuangyanying at huawei.com]
> Sent: Tuesday, November 24, 2015 11:53 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: mst at redhat.com; mukawa at igel.co.jp; nakajima.yoshihiro at lab.ntt.co.jp;
> Qiu, Michael; Guohongzhen; Zhoujingbin; Zhangbo (Oscar); gaoxiaoqiu;
> Zhbzg; Xie, Huawei
> Subject: RE: [RFC 0/5] virtio support for container
> 
> 
> 
> > -----Original Message-----
> > From: Jianfeng Tan [mailto:jianfeng.tan at intel.com]
> > Sent: Friday, November 06, 2015 2:31 AM
> > To: dev at dpdk.org
> > Cc: mst at redhat.com; mukawa at igel.co.jp; nakajima.yoshihiro at lab.ntt.co.jp;
> > michael.qiu at intel.com; Guohongzhen; Zhoujingbin; Zhuangyanying; Zhangbo
> > (Oscar); gaoxiaoqiu; Zhbzg; huawei.xie at intel.com; Jianfeng Tan
> > Subject: [RFC 0/5] virtio support for container
> >
...
> > 2.1.4
> 
> This patch raises a good idea: adding an extra abstracted IO layer, which
> would make it simple to extend the functionality to a kernel-mode switch
> (such as OVS). That's great.
> But I have one question here:
>     It's the issue of VHOST_USER_SET_MEM_TABLE. You allocate memory from
> the tmpfs filesystem with just one fd, so rte_memseg_info_get() can be used
> to directly get the memory topology. However, things change in kernel
> space, because the mempool has to be created on each container's
> hugetlbfs (rather than tmpfs), which is separate for each container; and
> finally there is the ioctl's parameter to consider.
>        My solution is as follows, for your reference:
> 	/* Region 0: the mbuf mempool backing the RX queue */
> 	reg = mem->regions;
> 	reg->guest_phys_addr = (__u64)((struct virtqueue *)
> 		(dev->data->rx_queues[0]))->mpool->elt_va_start;
> 	reg->userspace_addr = reg->guest_phys_addr;
> 	reg->memory_size = ((struct virtqueue *)
> 		(dev->data->rx_queues[0]))->mpool->elt_va_end -
> 		reg->guest_phys_addr;
> 
> 	/* Region 1: the virtio-net header area of the TX queue */
> 	reg = mem->regions + 1;
> 	reg->guest_phys_addr = (__u64)(((struct virtqueue *)
> 		(dev->data->tx_queues[0]))->virtio_net_hdr_mem);
> 	reg->userspace_addr = reg->guest_phys_addr;
> 	reg->memory_size = vq_size * internals->vtnet_hdr_size;
> 	   But it's a little ugly, any better idea?

Hi Yanying,

Your solution seems OK to me when used with kernel vhost-net, because the
vhost kthread simply shares the same mm_struct with the virtio process. But it
will not work with vhost-user, which realizes memory sharing by passing fds in
sendmsg(). Worse, it will not work with userspace vhost_cuse (see
lib/librte_vhost/vhost_cuse/) either, because the current implementation
assumes that the VM's physical memory is backed by one huge file. Actually,
what we need to do is enhance userspace vhost_cuse so that it supports
cross-file memory regions.
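
Just to illustrate the difference (a simplified sketch, not the actual
librte_vhost code): with vhost-user, each memory region's backing fd is
attached to the VHOST_USER_SET_MEM_TABLE message as SCM_RIGHTS ancillary
data, so the backend can mmap() the region itself. A bare address, as in
your snippet, is not enough. The helper and payload layout below are only
illustrative:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int
send_msg_with_fd(int sock, void *payload, size_t len, int fd)
{
	struct iovec iov = { .iov_base = payload, .iov_len = len };
	char control[CMSG_SPACE(sizeof(fd))];
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control;
	msgh.msg_controllen = sizeof(control);

	cmsg = CMSG_FIRSTHDR(&msgh);
	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* kernel duplicates fd into receiver */
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msgh, 0) < 0 ? -1 : 0;
}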

Below are some possible solutions to support hugetlbfs, FYI:

To support hugetlbfs, my previous idea was to use the -v option of "docker run"
to map hugetlbfs into the container's /dev/shm, so that we can create a "huge"
shm file on hugetlbfs. But this does not seem to be accepted by others.
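
A rough sketch of what that would look like inside the container, assuming
hugetlbfs has been bind-mounted at /dev/shm via "docker run -v" (the file
name and helper below are only illustrative); the resulting fd is what we
would later hand to vhost-user:

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static void *
map_shared_hugemem(size_t size, int *out_fd)
{
	void *va;
	int fd = open("/dev/shm/vhost-mem", O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return NULL;
	/* size must be a multiple of the hugepage size */
	if (ftruncate(fd, size) < 0 ||
	    (va = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0)) == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*out_fd = fd;
	return va;
}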

You mentioned that DPDK now creates a file for each hugepage. Maybe we just
need to share all of these hugepage files with vhost. To minimize the memory
translation effort, we would require that as few pages as possible be used.
Would this solution be acceptable to you? A rough sketch is below.
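
The sketch builds one region per DPDK memseg (i.e. per hugepage-backed
segment). struct mem_region merely mirrors the region layout of the
vhost-user message, and collecting the fd of each backing hugepage file is
not shown, since DPDK does not expose those fds directly; names here are
illustrative:

#include <stdint.h>
#include <rte_config.h>
#include <rte_memory.h>

struct mem_region {
	uint64_t guest_phys_addr;
	uint64_t userspace_addr;
	uint64_t memory_size;
	uint64_t mmap_offset;
};

static int
fill_mem_table(struct mem_region *regions, int max)
{
	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
	int i, n = 0;

	for (i = 0; i < RTE_MAX_MEMSEG && n < max; i++) {
		if (ms[i].addr == NULL || ms[i].len == 0)
			continue;
		regions[n].guest_phys_addr = ms[i].phys_addr;
		regions[n].userspace_addr = (uintptr_t)ms[i].addr;
		regions[n].memory_size = ms[i].len;
		regions[n].mmap_offset = 0;
		n++;
	}
	/* the fewer hugepages in use, the fewer regions to translate */
	return n;
}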

Thanks,
Jianfeng
