[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

Tetsuya Mukawa mukawa at igel.co.jp
Thu Dec 24 08:58:50 CET 2015


On 2015/12/24 14:37, Rich Lane wrote:
> On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa at igel.co.jp> wrote:
>
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
>> yuanhan.liu at linux.intel.com>
>>> wrote:
>>>
>>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>>> in a few
>>>>> ways:
>>>> Rich, thanks for the info!
>>>>
>>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>>> link
>>>>> status interrupt).
>>>>> 2. new_device: Add first queue to datapath.
>>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>>> triggered at the beginning, when the default queue (the first queue) is
>>>> enabled.
>>>>
>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
>>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>>> when
>>>>> qemu is killed).
>>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>>> when destroy_device() is called.
>>>>
>>> That would be good.
>>>
>>>
>>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>>
>>>> 1. Link status interrupt.
>>>>
>>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>>> down().
>>>>
>>>>
>>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>>> should
>>>>> cover the first queue at new_device and removal of all queues at
>>>>> destroy_device.
>>>> As stated above, vring_state_changed() should be able to do that, except
>>>> the one on destroy_device(), which is not done yet.
>>>>
>>>>> 3. Per-queue or per-device NUMA node info.
>>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>>
>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to
>> be
>>> added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg.
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
>>
> You're right, I'd mistakenly thought that the PMD controlled the void *
> passed to the callback.
>
> Here's a thought:
>
>     struct rte_eth_vhost_queue_event {
>         uint16_t queue_id;
>         bool rx;
>         bool enable;
>     };
>
>     int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
> rte_eth_vhost_queue_event *event);
>
> On receiving the ethdev event the application could repeatedly call
> rte_eth_vhost_get_queue_event
> to find out what happened.

Hi Rich and Yuanhan,

I guess we have 2 possible implementations here.

1. rte_eth_vhost_get_queue_event() returns each event.
2. rte_eth_vhost_get_queue_status() returns the current status of the queues.

I guess option 2 is the more generic way to handle interrupts from a
device driver.
In the case of option 1, if the DPDK application doesn't call
rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events,
which may exhaust memory.
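
Just to make option 1 concrete, below is a rough application-side sketch,
assuming the struct and function signature Rich proposed above plus a
placeholder app_set_queue_state() helper; none of this exists in the tree
yet.

#include <stdbool.h>
#include <stdint.h>

/* The struct and function as proposed above -- not in the tree yet. */
struct rte_eth_vhost_queue_event {
        uint16_t queue_id;
        bool rx;
        bool enable;
};

int rte_eth_vhost_get_queue_event(uint8_t port_id,
                                  struct rte_eth_vhost_queue_event *event);

/* Placeholder for however the application adds/removes a queue from its
 * datapath. */
void app_set_queue_state(uint8_t port_id, uint16_t queue_id, bool rx,
                         bool enable);

/* Hypothetical drain loop for option 1, run when the ethdev event fires:
 * keep pulling events out of the vhost PMD until none are left (assuming
 * the call returns 0 on success and a negative value when empty). */
static void
drain_vhost_queue_events(uint8_t port_id)
{
        struct rte_eth_vhost_queue_event ev;

        while (rte_eth_vhost_get_queue_event(port_id, &ev) == 0)
                app_set_queue_state(port_id, ev.queue_id, ev.rx, ev.enable);
}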

One more example is the current link status interrupt handling.
The ethdev API just returns the current status of the port.
What do you think?
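
For example, on RTE_ETH_EVENT_INTR_LSC a typical application just does
something like the following (a minimal sketch against the DPDK 2.2-era
ethdev API):

#include <stdio.h>
#include <rte_ethdev.h>

/* The LSC event carries no payload; the application simply reads the
 * current link state back from ethdev when it is told something changed. */
static void
report_link(uint8_t port_id)
{
        struct rte_eth_link link;

        rte_eth_link_get_nowait(port_id, &link);
        printf("port %u: link %s\n", (unsigned)port_id,
               link.link_status ? "up" : "down");
}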

>
> An issue with having the application dig into struct virtio_net is that it
> can only be safely accessed from
> a callback on the vhost thread.

Here is an example of how a callback handler registered by the DPDK
application is invoked from the PMD:

  _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);

The above function is called by the interrupt handling thread of the PMD.

Please check the implementation of the above function.
A callback handler that the DPDK application registers is called in the
"interrupt handling context".
(I mean the interrupt handling thread of the PMD also calls the callback
handler of the DPDK application.)
Anyway, I guess the callback handler of the DPDK application can access
"struct virtio_net" safely.

> A typical application running its control
> plane on lcore 0 would need to
> copy all the relevant info from struct virtio_net before sending it over.

Could you please describe it in more detail?
Sorry, I probably don't correctly understand which restriction makes you
copy the data.
(As described above, the callback handler registered by the DPDK
application can safely access "struct virtio_net". Does this solve the
copy issue?)

> As you mentioned, queues for a single vhost port could be located on
> different NUMA nodes. I think this
> is an uncommon scenario but if needed you could add an API to retrieve the
> NUMA node for a given port
> and queue.
>

I agree this is very specific to vhost, because in the case of a generic
PCI device, all queues of a port are on the same NUMA node.
Anyway, because it's so specific to vhost, I am not sure we should add an
ethdev API to handle this.

If we handle it with a vhost PMD API, we probably have 2 options here as
well.

1. Extend "struct rte_eth_vhost_queue_event", and use
rte_eth_vhost_get_queue_event() like you described:
struct rte_eth_vhost_queue_event
{
        uint16_t queue_id;
        bool rx;
        bool enable;
+       int socket_id;
};

2. rte_eth_vhost_get_queue_status() returns the current socket_ids of all
queues.
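
To make option 2 concrete as well, a status-style API could look something
like this; both the struct and the function below are purely illustrative
and do not exist yet:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue status as option 2 might expose it. */
struct rte_eth_vhost_queue_status {
        bool enabled;           /* queue currently enabled by the guest */
        int  socket_id;         /* NUMA node the vring memory lives on */
};

/* Hypothetical: fill 'rx' and 'tx' (arrays of 'num' entries each) with the
 * current state of the port's queues and return the number of queues
 * written, so the application always reads the latest state instead of a
 * backlog of events. */
int rte_eth_vhost_get_queue_status(uint8_t port_id,
                                   struct rte_eth_vhost_queue_status *rx,
                                   struct rte_eth_vhost_queue_status *tx,
                                   uint16_t num);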

Tetsuya

