[dpdk-dev] [RFC PATCH 0/6] Restructure EAL device model for bus support

Shreyansh Jain shreyansh.jain at nxp.com
Wed Nov 23 10:45:02 CET 2016


I should have replied to this earlier, apologies.

On Sunday 20 November 2016 09:00 PM, David Marchand wrote:
> On Thu, Nov 17, 2016 at 6:29 AM, Shreyansh Jain <shreyansh.jain at nxp.com> wrote:
>> DPDK has been inherently a PCI inclined framework. Because of this, the
>> design of device tree (or list) within DPDK is also PCI inclined. A non-PCI
>> device doesn't have a way of being expressed without using hooks started from
>> EAL to PMD.
>>
>> With this cover letter, some patches are presented which try to break this
>> strict linkage of EAL with PCI devices. Aim is to generalize the device
>> hierarchy on the lines of how Linux handles it:
>>
>>         device A1
>>           |
>>   +==.===='==============.============+ Bus A.
>>      |                    `--> driver A11     \
>>   device A2                `-> driver A12      \______
>>                                                 |CPU |
>>                                                 /`````
>>         device A1                              /
>>           |                                   /
>>   +==.===='==============.============+ Bus A`
>>      |                    `--> driver B11
>>   device A2                `-> driver B12
>>
>> Simply put:
>>  - a bus is connect to CPU (or core)
>>  - devices are conneted to Bus
>>  - drivers are running instances which manage one or more devices
>>  - bus is responsible for identifying devices (and interrupt propogation)
>>  - driver is responsible for initializing the device
>>
>> (*Reusing text from email [1])
>> In context of DPDK EAL:
>>  - a generic bus (not a driver, not a device). I don't know how to categorize
>>    a bus. It is certainly not a device, and then handler for a bus (physical)
>>    can be considered a 'bus driver'. So, just 'rte_bus'.
>>  - there is a bus for each physical implementation (or virtual). So, a rte_bus
>>    Object for PCI, VDEV, ABC, DEF and so on.
>>  - Buses are registered just like a PMD - RTE_PMD_BUS_REGISTER()
>>  - Each registered bus is part of a doubly list.
>>    -- Each device inherits rte_bus
>>    -- Each driver inherits rte_bus
>>    -- Device and Drivers lists are part of rte_bus
>>  - eth_driver is no more required - it was just a holder for PMDs to register
>>    themselves. It can be replaced with rte_xxx_driver and corresponding init/
>>    uninit moved to rte_driver
>>  - rte_eth_dev modified to disassociate itself from rte_pci_device and connect
>>    to generic rte_device
>>
>> Once again, improvising from [1]:
>>
>>                                   __ rte_bus_list
>>                                  /
>>                      +----------'---+
>>                      |rte_bus       |
>>                      | driver_list------> List of rte_bus specific
>>                      | device_list----    devices
>>                      | scan         | `-> List of rte_bus associated
>>                      | match        |     drivers
>>                      | dump         |
>>                      | ..some refcnt| (#)
>>                      +--|------|----+
>>               _________/        \_________
>>     +--------/----+                     +-\---------------+
>>     |rte_device   |                     |rte_driver       |
>>     | rte_bus     |                     | rte_bus         |
>>     | rte_driver  |(#)                  | init            |
>>     |             |                     | uninit          |
>>     |  devargs    |                     | dev_private_size|
>>     +---||--------+                     | drv_flags       |(#)
>>         ||                              | intr_handle(2*) |(#)
>>         | \                             +----------\\\----+
>>         |  \_____________                           \\\
>>         |                \                          |||
>>  +------|---------+ +----|----------+               |||
>>  |rte_pci_device  | |rte_xxx_device | (4*)          |||
>>  | PCI specific   | | xxx device    |               |||
>>  | info (mem,)    | | specific fns  |              / | \
>>  +----------------+ +---------------+             /  |  \
>>                             _____________________/  /    \
>>                            /                    ___/      \
>>             +-------------'--+    +------------'---+    +--'------------+
>>             |rte_pci_driver  |    |rte_vdev_driver |    |rte_xxx_driver |
>>             | PCI id table,  |    | <probably,     |    | ....          |
>>             | other driver   |    |  nothing>      |    +---------------+
>>             | data           |    +----------------+
>>             +----------------+
>>
>>     (1*) Problem is that probe functions have different arguments. So,
>>          generalizing them might be some rework in the respective device
>>          layers
>>     (2*) Interrupt handling for each driver type might be different. I am not
>>          sure how to generalize that either. This is grey area for me.
>>     (3*) Probably exposing a bitmask for device capabilities. Nothing similar
>>          exists now to relate it. Don't know if that is useful. Allowing
>>          applications to question a device about what it supports and what it
>>          doesn't - making it more flexible at application layer (but more code
>>          as well.)
>>     (4*) Even vdev would be an instantiated as a device. It is not being done
>>          currently.
>>     (#)  Items which are still pending
>>
>> With this cover letter, some patches have been posted. These are _only_ for
>> discussion purpose. They are not complete and neither compilable.
>> All the patches, except 0001, have sufficient context about what the changes
>> are and rationale for same. Obviously, code is best answer.
>
> As said by Jan B., I too think that splitting the patches into smaller
> patchsets is important.
>
> Looking at your datamodel, some quick comments :
> - Why init/uninit in rte_driver ? Why not probe/remove ?

No specific reason. At first I wasn't planning to replace driver->init 
with driver->probe (which rte_pci_driver) is doing. But, I will do this 
in v2.

> - I don't think dev_private_size makes sense in rte_driver. The bus is
> responsible for creating rte_device objects (embedded or not in
> rte_$bus_device objects). If a driver needs to allocate some priv (for
> whatever needs), the driver should do the allocation itself (or maybe
> a helper for current eth_driver), then reference the original
> rte_$bus_device object it got from the probe method.

first off, this came as a suggestion on the ML somewhere as far as I 
remember.
Secondly, your point makes sense. I will move it back.

>
>
> For a first patchset, I would see:
> - introduce the rte_bus object. In rte_eal_init, for each bus, we call
> the scan method. Then, for each bus, we find the appropriate
> rte_driver using the bus match method then call the probe method. If
> the probe succeeds, the rte_device points to the associated
> rte_driver,
> - migrate the pci scan code to a pci bus (scan looks at sysfs for
> linux / ioctl for bsd + devargs for blacklist / whitelist ?), match is
> the same at what is done in rte_eal_pci_probe_one_driver() at the
> moment,
> - migrate the vdev init code to a vdev bus (scan looks at devargs):
> this is new, we must create rte_device objects for vdev drivers to use
> later

Thanks for outlay - I agree with that.

>
> Then we can talk about the next steps once the bus is in place.

I will post the new set probably tomorrow or day-after.

>
>

-
Shreyansh


More information about the dev mailing list