[dpdk-dev] [RFC PATCH 0/4] pktdev

Wiles, Keith keith.wiles at intel.com
Mon Apr 20 15:30:53 CEST 2015



On 4/20/15, 8:19 AM, "Wiles, Keith" <keith.wiles at intel.com> wrote:

>
>
>From: Marc Sune <marc.sune at bisdn.de<mailto:marc.sune at bisdn.de>>
>Date: Monday, April 20, 2015 at 1:51 AM
>To: Keith Wiles <keith.wiles at intel.com<mailto:keith.wiles at intel.com>>,
>"dev at dpdk.org<mailto:dev at dpdk.org>" <dev at dpdk.org<mailto:dev at dpdk.org>>
>Subject: Re: [dpdk-dev] [RFC PATCH 0/4] pktdev
>
>
>
>On 17/04/15 21:50, Wiles, Keith wrote:
>
>Hi Marc and Bruce,
>
>Hi Keith, Bruce,
>
>
>On 4/17/15, 1:49 PM, "Marc Sune"
><marc.sune at bisdn.de><mailto:marc.sune at bisdn.de> wrote:
>
>
>
>On 17/04/15 17:16, Bruce Richardson wrote:
>
>
>Hi all,
>
>to continue this discussion a bit more, here is my, slightly different,
>slant
>on what a pktdev abstraction may look like.
>
>The primary objective I had in mind when drafting this is to provide the
>minimal abstraction that can be *easily* used as a common device
>abstraction for
>existing (and future) device types to be passed to dataplane code. The
>patchset
>demonstrates this by defining a minimal interface for pktdev - since I
>firmly
>believe the interface should be as small as possible - and then showing
>how that
>common interface can be used to unify rings and ethdevs under a common
>API for the
>datapath. I believe any attempt to unify things much beyond this to the
>control
>plane or setup phase is not worth doing - at least not initially - as at
>init time the code always needs to be aware of the underlying resource
>type in
>order to configure it properly for dataplane use.
>
>The overall objective I look to achieve is illustrated by the final
>patch in
>the series, which is a sample app where the same code is used for all
>cores,
>irrespective of the underlying device type.
>
>To get to that point, patch 1 defines the minimal API - just RX and TX.
>The .c
>file in the library is empty for simplicity, though I would see some
>functionality moving there when/if it makes sense e.g. the callback
>support
>from ethdev, as is done in Keith's patchset.
>Patch 2 then makes very minimal changes to ethdev to allow ethdevs to
>be used
>as pktdevs, and to make use of the pktdev functions when appropriate
>Patch 3 was, for me, the key test for this implementation - how hard
>was it to
>make an rte_ring usable as a pktdev too. Two single-line functions for
>RX/TX
>and a separate "converter" function proved to be all that was necessary
>here -
>and I believe simpler solutions may be possible too, as the extra
>structures
>allocated on conversion could be merged into the rte_ring structure
>itself and
>initialized on ring creation if we prefer that option. It is
>hoped/presumed that
>wrapping other structures, such as KNI, may prove to be just as easily
>done.
>[Not attempted yet - left as an exercise for the reader :-)].
>
>Now, in terms of pktdev vs ethdev, there is nothing in this proposal
>that
>cannot also be done using ethdev AFAIK. However, pktdev as outlined here
>should make the process far easier than trying to create a full PMD for
>something.
>All NIC specific functions, including things like stop/start, are
>stripped out,
>as they don't make sense for an rte_ring or other software objects.
>Also, the other thing this provides is that we can move away from just
>using
>port ids. Instead in the same way as we now reference
>rings/mempools/KNIs etc
>via pointer, we can do the same with ethernet ports as pktdevs on the
>data path.
>There was discussion previously on moving beyond 8-bit port ids. If we
>look to
>use ethdev as a common abstraction, I feel that change will soon have
>to be made
>causing a large amount of code churn.
>
>
>Hi Richard,
>
>First thank you both for taking the time to look at this. I did not not
>reply to Keith because you Richard summarized most of my concerns.
>
>I had a brief look to this second proposal. It is more aligned to what I
>had in mind. But still I feel it is slightly too complicated. I don't
>like much the necessary (in your approach) MACRO-like pkt_dev_data
>struct. It is also slightly inconvenient that the user has to do:
>
>+       struct rte_pkt_dev *in = rte_eth_get_dev(0);
>
>+               struct rte_pkt_dev *out = rte_ring_get_dev(
>+                               rte_ring_create(name, 4096,
>rte_socket_id(), 0));
>
>
>
>What about something like (~pseudo-code):
>
>rte_pkt_dev_data.h:
>
>   enum rte_pkt_dev_type{
>        RTE_PKT_DEV_ETH,
>        RTE_PKT_DEV_RING,
>        RTE_PKT_DEV_KNI,
>        //Keep adding as more PMDs are supported
>   };
>
>
>   //This struct may be redundant if there is nothing more
>   struct rte_pkt_dev_data{
>        enum rte_pkt_dev_type;
>        //Placeholder, maybe we need more...
>   };
>
>   //Make RX/TX pktdev APIs more readable, but not really needed
>   typedef void pkt_dev_t;
>
>(In all PMDs and e.g. KNI and RINGs):
>
> struct rte_eth_dev {
>    struct rte_pkt_dev_data pkt_dev;//
><++++++++++++++++++++++++++++++++++
>    eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>    eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function.
>*/
>    struct rte_eth_dev_data *data;  /**< Pointer to device data */
>    /...
>
>rte_pkt_dev.h:
>
>      #include <rte_ethdev.h>
>      //Include PMD (and non-PMD) TX/RX headers...
>
>   static inline uint16_t
>   rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id,
>             struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>   {
>        switch (((struct rte_pkt_dev_data*)dev)->type){
>            case  RTE_PKT_DEV_ETH:
>                struct rte_eth_dev* eth_dev = (struct
>rte_eth_dev*)pkt_dev;
>                rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts);
>                break;
>            case RTE_PKT_DEV_RING:
>                //...
>        }
>   }
>
>   //...
>
>
>I do not see the functional different from my proposal other then a bit
>more search/replace, which does effect some of the PMD¹s. The changes to
>the PMD is just to use the macros as we now have a nested structure.
>
>I think what I am proposing is different. See below.
>
Start of Keith's comment.
>
>The nesting and movement some structures and code to a common location is
>to provide a better way to add other devices. Moving code into a common
>set of structures and APIs should only improve the code as we get more
>devices added to DPDK. The basic flow and design is the same.
End of Keith’s comment.
>
>In your proposal you are movig TX/RX queues as well as some shared stuff
>and, most importantly, you are using function pointers to execute the
>RX/TX routines.
>
>What I was proposing is to try to add the minimum common shared state in
>order to properly demultiplex the RX/TX call and have a common set of
>abstract calls (the pkt_dev type). In a way, I was proposing to
>deliberately not have a shared struct rte_dev_data because I think the
>internals of the "pkt_dev" can be very different across devices (e.g.
>queues in kni vs eth port vs. crypto?). I treat the pkt_dev as a "black
>box" that conforms to TX/RX API, leaving the developer of that device to
>define its internal structures as it better suites the needs. I only use
>each of the specific device type TX/RX APIs (external to us, pkt_dev
>library) in rte_pkt_dev.h. This also simplifies the refactor required to
>eventually integrate the rte_pkt_dev library and builds it "on top" of
>the existing APIs.
>
>The other important difference with both, Bruce and your approach, and
>mine is the use of function pointers for RX/TX. I don't use them, which
>makes the entire abstracted TX/RX (including the final RX/TX routines
>itself) functions be "inlinable".

Making something inlinable is not based on using a pointer or not, is that
what you mean?
>
>Btw, I forgot to add something basic in the previous pseudo-code. The
>different types have to be conditionally compiled according to
>compiled-in DPDK libs:
>
>rte_pkt_dev.h:
>
>    #include <rte_config.h>
>
>    //Eth devices
>    #ifdef RTE_LIBRTE_ETHER
>    #include <rte_ethdev.h>
>    #endif
>
>    //KNI
>    #ifdef RTE_LIBRTE_KNI
>    #include <rte_kni.h>
>    #endif
>
>    //...
>    //Include PMD (and non-PMD) TX/RX headers...
>
>   static inline uint16_t
>   rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id,
>             struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>   {
>        switch (((struct rte_pkt_dev_data*)dev)->type){
>            #ifdef RTE_LIBRTE_ETHER
>            case  RTE_PKT_DEV_ETH:
>                struct rte_eth_dev* eth_dev = (struct
>rte_eth_dev*)pkt_dev;
>                rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts);
>                break;
>            #endif
>
>            #ifdef RTE_LIBRTE_KNI
>            case RTE_PKT_DEV_KNI:
>                //...
>                break;
>            #endif
>
>            default:
>                    //Corrupted type or unsupported (without compiled
>support)
>                    //Ignore or fail(fatal error)?
>                    break;
>        }
>   }
>
>   //...
>
>This adds a switch into the performance path, not a great solution IMO.
>
>
>From rte_common_device.h (format screwed with cut/paste and not all of the
>code)
>
>
>struct rte_dev_data {
>        char name[RTE_DEV_NAME_MAX_LEN]; /**< Unique identifier name */
>
>        void    **rx_queues;            /**< Array of pointers to RX
>queues. */
>        void    **tx_queues;            /**< Array of pointers to TX
>queues. */
>
>        uint16_t nb_rx_queues;          /**< Number of RX queues. */
>        uint16_t nb_tx_queues;          /**< Number of TX queues. */
>
>        uint16_t mtu;                   /**< Maximum Transmission Unit. */
>        uint8_t  unit_id;               /**< Unit/Port ID for this
>instance */
>        uint8_t  _pad0;                 /* alignment filler */
>
>        void    *dev_private;           /**< PMD-specific private data */
>        uint64_t rx_mbuf_alloc_failed;  /**< RX ring mbuf allocation
>failures. */
>        uint32_t min_rx_buf_size;       /**< Common rx buffer size
>handled by all
>queues */
>};
>
>struct rte_dev_info {
>        struct rte_pci_device   *pci_dev;   /**< Device PCI information.
>*/
>        const char      *driver_name;           /**< Device Driver name.
>*/
>        unsigned int    if_index;               /**< Index to bound host
>interface, or 0
>if none. */
>        /* Use if_indextoname() to translate into an interface name. */
>        unsigned int    _pad0;
>};
>
>
>
>struct rte_dev {
>        dev_rx_burst_t          rx_pkt_burst;   /**< Pointer to PMD
>receive
>function. */
>        dev_tx_burst_t          tx_pkt_burst;   /**< Pointer to PMD
>transmit
>function. */
>        void                                    *data;          /**<
>Pointer to device data */
>        struct rte_dev_global   *gbl_data;      /**< Pointer to device
>global data
>*/
>        const struct rte_dev_drv *driver;       /**< Driver for this
>device */
>        void                                    *dev_ops;       /**<
>Functions exported by PMD */
>        struct rte_pci_device   *pci_dev;       /**< PCI info. supplied by
>probing */
>        struct rte_dev_info             *dev_info;              /**<
>Pointer to the device info
>structure */
>
>        /** User application callback for interrupts if present */
>        struct rte_dev_cb_list  link_intr_cbs;
>        /**
>        * User-supplied functions called from rx_burst to post-process
>        * received packets before passing them to the user
>        */
>        struct rte_dev_rxtx_callback **post_rx_burst_cbs;
>        /**
>        * User-supplied functions called from tx_burst to pre-process
>        * received packets before passing them to the driver for
>transmission.
>        */
>        struct rte_dev_rxtx_callback **pre_tx_burst_cbs;
>
>What is the indended use of these callbacks and what is the benefit? If
>the user is supplying the callback, why not putting it directly after the
>RX and before the TX call in user's code (anyway)?

Keith’s Comment.
>
>Just moving the callback routines next to the Rx/Tx routine pointers does
>not make a lot of different in this structure, right?
Keith’s comment end.
>
>
>
>        enum rte_dev_type               dev_type;       /**< Flag
>indicating the device type */
>        uint8_t                 attached;        /**< Flag indicating the
>port is
>attached */
>};
>
>From rte_ethdev.h
>
>struct rte_eth_dev_info {
>        struct rte_dev_info     di;     /**< Common device information */
>
>        /* Rest of structure is for private device data */
>        uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>        uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
>pkt. */
>        uint16_t max_rx_queues; /**< Maximum number of RX queues. */
>        uint16_t max_tx_queues; /**< Maximum number of TX queues. */
>        uint32_t max_mac_addrs; /**< Maximum number of MAC addresses. */
>        uint32_t max_hash_mac_addrs;
>        /** Maximum number of hash MAC addresses for MTA and UTA. */
>        uint16_t max_vfs; /**< Maximum number of VFs. */
>        uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
>        uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
>        uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
>        uint16_t reta_size;
>        /**< Device redirection table size, the total number of entries.
>*/
>        /** Bit mask of RSS offloads, the bit offset also means flow type
>*/
>        uint64_t flow_type_rss_offloads;
>        struct rte_eth_rxconf default_rxconf; /**< Default RX
>configuration */
>        struct rte_eth_txconf default_txconf; /**< Default TX
>configuration */
>        uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
>        uint16_t vmdq_queue_num; /**< Queue number for VMDQ pools. */
>        uint16_t vmdq_pool_base; /**< First ID of VMDQ pools. */
>};
>
>struct rte_eth_dev_data {
>        struct rte_dev_data             dd;                     /**<
>Common device data */
>
>        /* Rest of the structure is private data */
>        struct rte_eth_dev_sriov sriov;    /**< SRIOV data */
>
>        struct rte_eth_link dev_link;
>        /**< Link-level information & status */
>
>        struct rte_eth_conf dev_conf;   /**< Configuration applied to
>device. */
>        struct ether_addr* mac_addrs;   /**< Device Ethernet Link
>address. */
>        uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
>        /** bitmap array of associating Ethernet MAC addresses to pools */
>        struct ether_addr* hash_mac_addrs;
>
>        /** Device Ethernet MAC addresses of hash filtering. */
>        uint8_t promiscuous  : 1,   /**< RX promiscuous mode ON(1) /
>OFF(0). */
>                scattered_rx : 1,       /**< RX of scattered packets is
>ON(1) / OFF(0) */
>                all_multicast: 1,   /**< RX all multicast mode ON(1) /
>OFF(0). */
>                dev_started  : 1;   /**< Device state: STARTED(1) /
>STOPPED(0). */
>};
>
>In the case of rte_ethdev.h the rte_eth_dev turns into a macro '#define
>rte_eth_dev rte_dev', in other devices they may have a rte_xyz_dev
>structure rte_dev as the first member and then private members. Also later
>if we find something private to ethdev then a structure can be created.
>
>
>Most of the code changes outside of these changes are because using a
>macro to fix the code was easier. Rewriting the functions to use real
>variables instead of casting void pointer could yield without the macros.
>Not sure as the performance impact in rewriting some of these functions to
>eliminate the macros.
>
>This is the heart of what I am proposing and it just so happens to allow
>Bruce¹ ideas as well. I have another patch I have not sent that includes
>Bruce¹s ideas using my suggestions.
>
>Being able to use a single API for all devices does have its benefits IMO,
>but I want to cleanup ethdev to allow for other devices to be added to the
>system.
>
>At some point (when the cryptodev is done) I want to add it to this
>framework, plus add more APIs for crypto similar to Linux Crypto API or
>something we can agree is a reasonable set of APIs for DPDK.
>
>I currently don't see the benefit of abstracting beyond the RX/TX
>functions, and I see more cons than pros. Doing so we will also make
>librte_pkt_dev sort of mandatory, even if you are not using it our
>abstracted RX/TX functions at all.

Keith Comment.
>
>I do not see why we would not pull these common items out. Can you list
>your cons as I do not see your point here.
Keith Comment
>
>
>regards
>marc
>
>
>
>Regards,
>
>++Keith
>
>
>Maybe it is oversimplified? With this approach tough, we would barely
>touch the PMDs and the functionality can be built on top of the existing
>PMDs.
>
>Thoughts?
>Marc
>
>
>
>
>Bruce Richardson (4):
>   Add example pktdev implementation
>   Make ethdev explicitly a subclass of pktdev
>   add support for a ring to be a pktdev
>   example app showing pktdevs used in a chain
>
>  config/common_bsdapp           |   5 +
>  config/common_linuxapp         |   5 +
>  examples/pktdev/Makefile       |  57 +++++++++++
>  examples/pktdev/basicfwd.c     | 222
>+++++++++++++++++++++++++++++++++++++++++
>  lib/Makefile                   |   1 +
>  lib/librte_ether/rte_ethdev.h  |  26 ++---
>  lib/librte_pktdev/Makefile     |  56 +++++++++++
>  lib/librte_pktdev/rte_pktdev.c |  35 +++++++
>  lib/librte_pktdev/rte_pktdev.h | 144 ++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring.c     |  41 ++++++++
>  lib/librte_ring/rte_ring.h     |   3 +
>  11 files changed, 582 insertions(+), 13 deletions(-)
>  create mode 100644 examples/pktdev/Makefile
>  create mode 100644 examples/pktdev/basicfwd.c
>  create mode 100644 lib/librte_pktdev/Makefile
>  create mode 100644 lib/librte_pktdev/rte_pktdev.c
>  create mode 100644 lib/librte_pktdev/rte_pktdev.h
>
>
>



More information about the dev mailing list