[dpdk-dev] [PATCH v2 5/7] net/virtio_user: add vhost kernel support

Tan, Jianfeng jianfeng.tan at intel.com
Wed Jan 11 04:13:09 CET 2017



On 1/11/2017 10:42 AM, Jason Wang wrote:
>
>
> On 2017年01月10日 14:11, Tan, Jianfeng wrote:
>>
>> Hi Jason,
>>
>>
>> On 1/9/2017 12:39 PM, Jason Wang wrote:
>>>
>>>
>>> On 2016年12月23日 15:14, Jianfeng Tan wrote:
>>>> This patch add support vhost kernel as the backend for virtio_user.
>>>> Three main hook functions are added:
>>>>    - vhost_kernel_setup() to open char device, each vq pair needs one
>>>>      vhostfd;
>>>>    - vhost_kernel_ioctl() to communicate control messages with vhost
>>>>      kernel module;
>>>>    - vhost_kernel_enable_queue_pair() to open tap device and set it
>>>>      as the backend of corresonding vhost fd (that is to say, vq 
>>>> pair).
>>>>
>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>>>> ---
>>>>
>> [...]
>>>> +/* TUNSETIFF ifr flags */
>>>> +#define IFF_TAP          0x0002
>>>> +#define IFF_NO_PI        0x1000
>>>> +#define IFF_ONE_QUEUE    0x2000
>>>> +#define IFF_VNET_HDR     0x4000
>>>> +#define IFF_MULTI_QUEUE  0x0100
>>>> +#define IFF_ATTACH_QUEUE 0x0200
>>>> +#define IFF_DETACH_QUEUE 0x0400
>>>
>>> Do we really want to duplicate those things which has been exposed 
>>> by uapi here?
>>
>> You mean those defined in <linux/if_tun.h>? Redefine those common 
>> macros, or include standard header file, with respective pros and 
>> cons. DPDK prefers the redefinition way as far as I understand, 
>> doesn't it?
>>
>
> Well, if you really want to do this, you may want to use an 
> independent file. Then you can sync it with linux headers with a bash 
> script.

Agreed!

[...]
>>>> +static struct vhost_memory_kernel *
>>>> +prepare_vhost_memory_kernel(void)
>>>> +{
>>>> +    uint32_t i, j, k = 0;
>>>> +    struct rte_memseg *seg;
>>>> +    struct vhost_memory_region *mr;
>>>> +    struct vhost_memory_kernel *vm;
>>>> +
>>>> +    vm = malloc(sizeof(struct vhost_memory_kernel) +
>>>> +            VHOST_KERNEL_MAX_REGIONS *
>>>> +            sizeof(struct vhost_memory_region));
>>>> +
>>>> +    for (i = 0; i < RTE_MAX_MEMSEG; ++i) {
>>>> +        seg = &rte_eal_get_configuration()->mem_config->memseg[i];
>>>> +        if (!seg->addr)
>>>> +            break;
>>>
>>> If we're sure the number of regions is less than 64(or the module 
>>> parameter read from /sys), can we avoid the iteration here?
>>
>> The "if" statement under the "for" statement can save us from all 
>> RTE_MAX_MEMSEG iteration.
>
> But if we know the number of regions is short than the limit, there's 
> even no need for this?

The problem is: we don't have a variable to know how many segments are 
there in DPDK. Anyway, we need to report error in the situation that # 
of segments > # of regions. Or can you give a more specific example?

[...]
>>>> +    if (dev->ifname)
>>>> +        strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
>>>> +    else
>>>> +        strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ);
>>>> +    if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) {
>>>> +        PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno));
>>>> +        goto error;
>>>> +    }
>>>
>>> This requires CAP_NET_ADMIN, so we should really consider to accept 
>>> a pre-created fd here.
>>
>> It sounds like a future work for me. So far, all DPDK apps are 
>> running in privileged mode (including CAP_NET_ADMIN?).
>>
>
> That's not safe. 

Yes.

> Accepting a per-created fd can solve this, and can even support macvtap

So you are proposing a way to specify the virtio_user parameter like, 
--vdev=virtio_user,tapfd=fd1,xxx, right? As replied in your another 
comment, it's a great way to go. But I'm afraid we have no time to 
implement that in this cycle.

Thanks,
Jianfeng


More information about the dev mailing list