[dpdk-dev] [PATCH v2 5/7] net/virtio_user: add vhost kernel support

Jason Wang jasowang at redhat.com
Wed Jan 11 04:23:02 CET 2017



On 2017年01月11日 11:13, Tan, Jianfeng wrote:
>
>
> On 1/11/2017 10:42 AM, Jason Wang wrote:
>>
>>
>> On 2017年01月10日 14:11, Tan, Jianfeng wrote:
>>>
>>> Hi Jason,
>>>
>>>
>>> On 1/9/2017 12:39 PM, Jason Wang wrote:
>>>>
>>>>
>>>> On 2016年12月23日 15:14, Jianfeng Tan wrote:
>>>>> This patch add support vhost kernel as the backend for virtio_user.
>>>>> Three main hook functions are added:
>>>>>    - vhost_kernel_setup() to open char device, each vq pair needs one
>>>>>      vhostfd;
>>>>>    - vhost_kernel_ioctl() to communicate control messages with vhost
>>>>>      kernel module;
>>>>>    - vhost_kernel_enable_queue_pair() to open tap device and set it
>>>>>      as the backend of corresonding vhost fd (that is to say, vq 
>>>>> pair).
>>>>>
>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>>>>> ---
>>>>>
>>> [...]
>>>>> +/* TUNSETIFF ifr flags */
>>>>> +#define IFF_TAP          0x0002
>>>>> +#define IFF_NO_PI        0x1000
>>>>> +#define IFF_ONE_QUEUE    0x2000
>>>>> +#define IFF_VNET_HDR     0x4000
>>>>> +#define IFF_MULTI_QUEUE  0x0100
>>>>> +#define IFF_ATTACH_QUEUE 0x0200
>>>>> +#define IFF_DETACH_QUEUE 0x0400
>>>>
>>>> Do we really want to duplicate those things which has been exposed 
>>>> by uapi here?
>>>
>>> You mean those defined in <linux/if_tun.h>? Redefine those common 
>>> macros, or include standard header file, with respective pros and 
>>> cons. DPDK prefers the redefinition way as far as I understand, 
>>> doesn't it?
>>>
>>
>> Well, if you really want to do this, you may want to use an 
>> independent file. Then you can sync it with linux headers with a bash 
>> script.
>
> Agreed!
>
> [...]
>>>>> +static struct vhost_memory_kernel *
>>>>> +prepare_vhost_memory_kernel(void)
>>>>> +{
>>>>> +    uint32_t i, j, k = 0;
>>>>> +    struct rte_memseg *seg;
>>>>> +    struct vhost_memory_region *mr;
>>>>> +    struct vhost_memory_kernel *vm;
>>>>> +
>>>>> +    vm = malloc(sizeof(struct vhost_memory_kernel) +
>>>>> +            VHOST_KERNEL_MAX_REGIONS *
>>>>> +            sizeof(struct vhost_memory_region));
>>>>> +
>>>>> +    for (i = 0; i < RTE_MAX_MEMSEG; ++i) {
>>>>> +        seg = &rte_eal_get_configuration()->mem_config->memseg[i];
>>>>> +        if (!seg->addr)
>>>>> +            break;
>>>>
>>>> If we're sure the number of regions is less than 64(or the module 
>>>> parameter read from /sys), can we avoid the iteration here?
>>>
>>> The "if" statement under the "for" statement can save us from all 
>>> RTE_MAX_MEMSEG iteration.
>>
>> But if we know the number of regions is short than the limit, there's 
>> even no need for this?
>
> The problem is: we don't have a variable to know how many segments are 
> there in DPDK. Anyway, we need to report error in the situation that # 
> of segments > # of regions. Or can you give a more specific example?
>
> [...]

If we don't know #segments, I'm fine with current code.

>>>>> +    if (dev->ifname)
>>>>> +        strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
>>>>> +    else
>>>>> +        strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ);
>>>>> +    if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) {
>>>>> +        PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno));
>>>>> +        goto error;
>>>>> +    }
>>>>
>>>> This requires CAP_NET_ADMIN, so we should really consider to accept 
>>>> a pre-created fd here.
>>>
>>> It sounds like a future work for me. So far, all DPDK apps are 
>>> running in privileged mode (including CAP_NET_ADMIN?).
>>>
>>
>> That's not safe. 
>
> Yes.
>
>> Accepting a per-created fd can solve this, and can even support macvtap
>
> So you are proposing a way to specify the virtio_user parameter like, 
> --vdev=virtio_user,tapfd=fd1,xxx, right? As replied in your another 
> comment, it's a great way to go. But I'm afraid we have no time to 
> implement that in this cycle.
>
> Thanks,
> Jianfeng

Ok, just want to show its advantages. It can be added on top.

And two more suggestions:
- better to split tap support out of vhost file
- kernel support more than 8 queues on recent kernel, so there's no need 
to limit it to 8. When running on old kernel, TUNSETIFF will fail which 
can give user a hint to reduce #queues.

Thanks




More information about the dev mailing list