[dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring

Billy McFall bmcfall at redhat.com
Tue Dec 20 15:15:50 CET 2016


Thank you for your responses; see inline.

On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
<adrien.mazarguil at 6wind.com> wrote:
> On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
>>
>>
>> > -----Original Message-----
>> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
>> > Sent: Tuesday, December 20, 2016 11:28 AM
>> > To: Billy McFall <bmcfall at redhat.com>
>> > Cc: thomas.monjalon at 6wind.com; Lu, Wenzhuo <wenzhuo.lu at intel.com>; dev at dpdk.org; Stephen Hemminger
>> > <stephen at networkplumber.org>
>> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
>> >
>> > Hi Billy,
>> >
>> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
>> > > Add a new API to force free consumed buffers on TX ring. API will return
>> > > the number of packets freed (0-n) or error code if feature not supported
>> > > (-ENOTSUP) or input invalid (-ENODEV).
>> > >
>> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
>> > > in local buffer, the API also accepts *buffer and *sent. Before
>> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
>> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
>> > > if threshold is not met.
>> > >
>> > > Signed-off-by: Billy McFall <bmcfall at redhat.com>
>> > > ---
>> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
>> > >  1 file changed, 56 insertions(+)
>> > >
>> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> > > index 9678179..e3f2be4 100644
>> > > --- a/lib/librte_ether/rte_ethdev.h
>> > > +++ b/lib/librte_ether/rte_ethdev.h
>> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
>> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
>> > >  /**< @internal Check DD bit of specific RX descriptor */
>> > >
>> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
>> > > +/**< @internal Force mbufs to be freed from TX ring. */
>> > > +
>> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
>> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
>> > >
>> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
>> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
>> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
>> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
>> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
>> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
>> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
>> > >  }
>> > >
>> > >  /**
>> > > + * Request the driver to free mbufs currently cached by the driver. The
>> > > + * driver will only free the mbuf if it is no longer in use.
>> > > + *
>> > > + * @param port_id
>> > > + *   The port identifier of the Ethernet device.
>> > > + * @param queue_id
>> > > + *   The index of the transmit queue through which output packets must be
>> > > + *   sent.
>> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
>> > > + *   to rte_eth_dev_configure().
>> > > + * @param free_cnt
>> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
>> > > + *   should be freed. Note that a packet may be using multiple mbufs.
>> > > + * @param buffer
>> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
>> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
>> > > + *   if buffer has already been flushed.
>> > > + * @param sent
>> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
>> > > + *   If *buffer is supplied, *sent must also be supplied.
>> > > + * @return
>> > > + *   Failure: < 0
>> > > + *     -ENODEV: Invalid interface
>> > > + *     -ENOTSUP: Driver does not support function
>> > > + *   Success: >= 0
>> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
>> > > + *     are in use.
>> > > + */
>> > > +
>> > > +static inline int
>> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
>> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
>> > > +{
>> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> > > +
>> > > + /* Validate Input Data. Bail if not valid or not supported. */
>> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
>> > > +
>> > > + /*
>> > > +  * If transmit buffer is provided and there are still packets to be
>> > > +  * sent, then send them before attempting to free pending mbufs.
>> > > +  */
>> > > + if (buffer && sent)
>> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>> > > +
>> > > + /* Call driver to free pending mbufs. */
>> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
>> > > +                 free_cnt);
>> > > +}
>> > > +
>> > > +/**
>> > >   * Configure a callback for buffered packets which cannot be sent
>> > >   *
>> > >   * Register a specific callback to be called when an attempt is made to send
>> >

I will remove the buffer/sent parameters. It will be the application's
responsibility to make sure rte_eth_tx_buffer_flush() is called.
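
As a rough illustration of the call order this implies for the application,
here is a minimal sketch. It does not use DPDK: the toy_* types and functions
are hypothetical stand-ins for rte_eth_tx_buffer_flush() and for a cleanup
call without the buffer/sent parameters, and the final signature may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy TX queue, not a DPDK structure. */
struct toy_txq {
	uint32_t in_ring; /* mbufs currently held by the TX ring */
	uint32_t done;    /* subset of in_ring already transmitted */
};

/* Hypothetical stand-in for struct rte_eth_dev_tx_buffer. */
struct toy_tx_buffer {
	uint16_t length;  /* packets collected but not yet sent */
};

/* Stand-in for rte_eth_tx_buffer_flush(): move buffered packets to the ring. */
static uint16_t
toy_tx_buffer_flush(struct toy_txq *q, struct toy_tx_buffer *b)
{
	uint16_t sent;

	assert(q != NULL && b != NULL);
	sent = b->length;
	q->in_ring += sent;
	b->length = 0;
	return sent;
}

/*
 * Stand-in for the simplified cleanup call (no buffer/sent parameters).
 * free_cnt == 0 means "free everything that is done".
 */
static int
toy_tx_done_cleanup(struct toy_txq *q, uint32_t free_cnt)
{
	uint32_t n;

	if (q == NULL)
		return -ENODEV;
	n = q->done;
	if (free_cnt != 0 && free_cnt < n)
		n = free_cnt;
	q->done -= n;
	q->in_ring -= n;
	return (int)n;
}

/* Application-side sequence: flush the local buffer, then request cleanup. */
static int
app_flush_then_cleanup(struct toy_txq *q, struct toy_tx_buffer *b,
		uint32_t free_cnt)
{
	(void)toy_tx_buffer_flush(q, b);
	return toy_tx_done_cleanup(q, free_cnt);
}
```

The point is only that the flush becomes an explicit step in the application
rather than a side effect of the cleanup call.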

I don't feel strongly about the free_cnt parameter. It was in the original
request so that, if there was a large ring buffer, the API could bail early
without having to go through the entire ring. It might be a little
unrealistic for the application to truly know how many mbufs it wants freed.
Also, as an example, the i40e driver already has an i40e_tx_free_bufs(...)
function, so by dropping the free_cnt parameter, this function could be
reused without having to account for free_cnt.
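
To make the "bail early" idea concrete, here is a small sketch of a ring walk
that stops once free_cnt packets have been released. The sketch_* names and
the slot layout are hypothetical and do not match any real PMD; it also
assumes a packet may span several mbufs, as noted in the patch comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u

/* Hypothetical descriptor slot; real PMD rings differ per driver. */
struct sketch_slot {
	bool has_mbuf; /* an mbuf is attached to this slot */
	bool done;     /* hardware reported the descriptor as done */
	bool last_seg; /* this mbuf is the final segment of a packet */
};

struct sketch_ring {
	struct sketch_slot slot[SKETCH_RING_SIZE];
	uint32_t next_to_clean; /* oldest slot that may still hold an mbuf */
};

/*
 * Free up to free_cnt packets (0 = no limit) and return the number of
 * packets freed. *visited reports how many slots were released, to show
 * how much of the ring was walked.
 */
static int
sketch_cleanup(struct sketch_ring *r, uint32_t free_cnt, uint32_t *visited)
{
	uint32_t pkts = 0;
	uint32_t seen = 0;

	assert(r != NULL && visited != NULL);
	while (seen < SKETCH_RING_SIZE) {
		struct sketch_slot *s = &r->slot[r->next_to_clean];

		if (!s->has_mbuf || !s->done)
			break; /* still in use: nothing more to free */
		seen++;
		s->has_mbuf = false; /* stands in for freeing the mbuf */
		s->done = false;
		r->next_to_clean = (r->next_to_clean + 1) % SKETCH_RING_SIZE;
		if (s->last_seg) {
			pkts++;
			if (free_cnt != 0 && pkts == free_cnt)
				break; /* bail early, rest of ring untouched */
		}
	}
	*visited = seen;
	return (int)pkts;
}
```

With free_cnt == 0 the loop simply runs until it reaches a slot that is still
in use, which is the behaviour a reused driver helper would give.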

>> > Just a thought to follow-up on Stephen's comment to further simplify this
>> > API, how about not adding any new eth_dev_ops but instead defining what
>> > should happen during an empty TX burst call (tx_burst() with 0 packets).
>> >

In the original API request thread (see the dpdk-dev mailing list from
11/21/2016, subject "Adding API to force freeing consumed buffers in TX ring"),
overloading the existing API with nb_pkts == 0 was suggested, and the consensus
was to go with a new API. I lean towards a new API since this is a special case
most applications won't use, but I will go with the community on whether to
enhance the existing burst functionality or add a new API.

>> > Several PMDs already have a check for this scenario and start by cleaning up
>> > completed packets anyway, they effectively partially implement this
>> > definition for free already.
>>
>> Many PMDs  start by cleaning up only when number of free entries
>> drop below some point.

True, but the original request for this API was for the scenario where packets
are being flooded and the application wants to reuse mbufs to avoid a packet
copy. So the API was to request that the driver free "done" mbufs regardless
of any threshold.
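
A small sketch of the difference, under simplified assumptions: the demo_*
names are hypothetical, and the threshold check only approximates what a
typical TX burst path does before sending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue state, not taken from any PMD. */
struct demo_txq {
	uint32_t nb_free;     /* free descriptors */
	uint32_t nb_done;     /* transmitted, but mbufs still held by the ring */
	uint32_t free_thresh; /* burst path cleans only below this value */
};

/* Release every "done" mbuf and return how many were released. */
static uint32_t
demo_reclaim(struct demo_txq *q)
{
	uint32_t n;

	assert(q != NULL);
	n = q->nb_done;
	q->nb_free += n;
	q->nb_done = 0;
	return n;
}

/* Threshold-driven cleanup, as done at the start of a TX burst. */
static uint32_t
demo_burst_cleanup(struct demo_txq *q)
{
	if (q->nb_free >= q->free_thresh)
		return 0; /* enough free entries: done mbufs stay in the ring */
	return demo_reclaim(q);
}

/* Explicit cleanup request: ignores the threshold. */
static uint32_t
demo_forced_cleanup(struct demo_txq *q)
{
	return demo_reclaim(q);
}
```

With 100 free descriptors and a threshold of 32, the burst path leaves all
"done" mbufs in the ring, while the explicit request returns them at once.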

>> Also in that case the author would have to modify (and test) all existing TX routines.
>> So I think a separate API call seems more plausible.
>
> Not necessarily, as I understand this API in its current form only suggests
> that a PMD should release a few mbufs from a queue if possible, without any
> guarantee, PMDs are not forced to comply.
>
> I think the threshold you mention is a valid reason not to release them, and
> it wouldn't change a thing to existing tx_burst() implementations in the
> meantime (only documentation).
>
> This threshold could also be bypassed rather painlessly in the
> "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in a
> way or another.
>
>> Though I agree with the previous comment from Stephen that the last two
>> parameters are redundant and would just overcomplicate things.
>> Konstantin
>>
>> >
>> > The main difference with this API would be that you wouldn't know how many
>> > mbufs were freed and wouldn't collect them into an array. However most
>> > applications have one mbuf pool and/or know where they come from, so they
>> > can just query the pool or attempt to re-allocate from it after doing empty
>> > bursts in case of starvation.
>> >
>> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
>
> --
> Adrien Mazarguil
> 6WIND

