[dpdk-dev,1/3] ethdev: New API to free consumed buffers in TX ring

Message ID 20161216124851.2640-2-bmcfall@redhat.com (mailing list archive)
State Superseded, archived

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel compilation  success  Compilation OK

Commit Message

Billy McFall Dec. 16, 2016, 12:48 p.m. UTC
Add a new API to force-free consumed buffers on the TX ring. The API returns
the number of packets freed (0-n), or an error code if the feature is not
supported (-ENOTSUP) or the input is invalid (-ENODEV).

Because rte_eth_tx_buffer() may be used, and mbufs may still be held
in a local buffer, the API also accepts *buffer and *sent. Before
attempting to free, rte_eth_tx_buffer_flush() is called to make sure
all mbufs are sent to the Tx ring. rte_eth_tx_buffer_flush() is called even
if the buffer threshold has not been met.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
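The order of operations in the commit message (validate the port, flush any locally buffered packets, then ask the driver to free completed mbufs) can be sketched as a self-contained toy model. All `toy_*` names and structures are hypothetical stand-ins, not DPDK types. The sketch only counts packets, assumes one mbuf per packet, and reuses the patch's -ENODEV/-ENOTSUP return convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK structures; not the real rte_* types. */
#define TOY_MAX_PORTS 4

struct toy_tx_buffer {
	uint16_t length;          /* packets still held in the local buffer */
};

struct toy_txq {
	uint16_t in_ring;         /* packets handed to the TX ring */
	uint16_t done;            /* of those, packets the NIC has completed */
};

typedef int (*toy_done_cleanup_t)(struct toy_txq *txq, uint32_t free_cnt);

struct toy_port {
	int valid;
	toy_done_cleanup_t tx_done_cleanup;  /* NULL => driver lacks support */
	struct toy_txq txq;
};

static struct toy_port toy_ports[TOY_MAX_PORTS];

/* Moves every locally buffered packet to the ring; returns how many. */
static uint16_t toy_buffer_flush(struct toy_port *p, struct toy_tx_buffer *b)
{
	uint16_t n = b->length;

	p->txq.in_ring += n;
	b->length = 0;
	return n;
}

/* Driver callback: frees completed packets, at most free_cnt (0 = all). */
static int toy_cleanup_cb(struct toy_txq *txq, uint32_t free_cnt)
{
	uint32_t n;

	assert(txq != NULL);
	n = txq->done;
	if (free_cnt != 0 && n > free_cnt)
		n = free_cnt;
	txq->done -= n;
	txq->in_ring -= n;
	return (int)n;
}

/* Same order as the commit message: validate, flush, then free. */
static int toy_tx_done_cleanup(unsigned int port_id, uint32_t free_cnt,
			       struct toy_tx_buffer *buffer, uint16_t *sent)
{
	struct toy_port *p;

	if (port_id >= TOY_MAX_PORTS || !toy_ports[port_id].valid)
		return -ENODEV;
	p = &toy_ports[port_id];
	if (p->tx_done_cleanup == NULL)
		return -ENOTSUP;
	/* Flush even if the buffer threshold has not been reached. */
	if (buffer != NULL && sent != NULL)
		*sent = toy_buffer_flush(p, buffer);
	return p->tx_done_cleanup(&p->txq, free_cnt);
}
```

For example, with three completed packets in the ring and two still buffered locally, a call with `free_cnt = 0` flushes the two buffered packets (`*sent == 2`) and then reports three packets freed.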
  

Comments

Stephen Hemminger Dec. 16, 2016, 4:28 p.m. UTC | #1
On Fri, 16 Dec 2016 07:48:49 -0500
Billy McFall <bmcfall@redhat.com> wrote:

> /**
> + * Request the driver to free mbufs currently cached by the driver. The
> + * driver will only free the mbuf if it is no longer in use.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param free_cnt
> + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> + *   should be freed. Note that a packet may be using multiple mbufs.
> + * @param buffer
> + *   Buffer used to collect packets to be sent. If provided, the buffer will
> + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> + *   if buffer has already been flushed.
> + * @param sent
> + *   Pointer to return number of packets sent if buffer has packets to be sent.
> + *   If *buffer is supplied, *sent must also be supplied.
> + * @return
> + *   Failure: < 0
> + *     -ENODEV: Invalid interface
> + *     -ENOTSUP: Driver does not support function
> + *   Success: >= 0
> + *     0-n: Number of packets freed. More packets may still remain in ring that
> + *     are in use.
> + */
> +
> +static inline int
> +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> +		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)


This API is more complex than it needs to be.
For the typical use case, an OOM kind of cleanup, this is overkill.
There is no need for:
  free_cnt - the device driver should just free all
  buffer/sent - the application should not care.

The DPDK model is that once mbufs are passed to the device, the device "owns"
them. I think changing that model is just going to break things for
no gain. It does make sense to have a "please clean up your mbufs" call.
If the application is using special mbufs, then it can use the normal
callback-on-done model.
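Here is a minimal sketch of the reduced call Stephen describes. It assumes a driver callback that takes only the queue and frees every completed mbuf. The `toy_*` names are hypothetical. The ops-table dispatch with a NULL check follows the same pattern as RTE_FUNC_PTR_OR_ERR_RET in the posted patch. The `-EINVAL` check on `queue_id` is an added assumption, since the posted patch does not validate the queue index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced callback: just "please clean up your mbufs". */
typedef int (*toy_tx_done_cleanup_t)(void *txq);

struct toy_dev_ops {
	toy_tx_done_cleanup_t tx_done_cleanup;  /* NULL if the PMD lacks it */
};

struct toy_dev {
	const struct toy_dev_ops *ops;
	void *tx_queues[2];
	uint16_t nb_tx_queues;
};

/* Toy PMD queue: counts completed and outstanding descriptors. */
struct toy_txq {
	uint16_t done;
	uint16_t outstanding;
};

/* PMD side: free every completed mbuf and report how many were freed. */
static int toy_pmd_tx_done_cleanup(void *q)
{
	struct toy_txq *txq = q;
	int freed;

	assert(txq != NULL);
	freed = txq->done;
	txq->outstanding -= txq->done;
	txq->done = 0;
	return freed;
}

/* Generic layer: validate, then dispatch through the ops table. */
static int toy_eth_tx_done_cleanup(struct toy_dev *dev, uint16_t queue_id)
{
	if (dev == NULL)
		return -ENODEV;
	if (queue_id >= dev->nb_tx_queues)
		return -EINVAL;        /* assumption: not checked in the patch */
	if (dev->ops->tx_done_cleanup == NULL)
		return -ENOTSUP;
	return dev->ops->tx_done_cleanup(dev->tx_queues[queue_id]);
}
```

Dropping `free_cnt`, `buffer` and `sent` leaves the PMD with a single obvious contract: free whatever is done and report the count. Any flushing of an application-side buffer stays with the application.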
  
Adrien Mazarguil Dec. 20, 2016, 11:27 a.m. UTC | #2
Hi Billy,

On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 9678179..e3f2be4 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
>  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
>  /**< @internal Check DD bit of specific RX descriptor */
>  
> +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> +/**< @internal Force mbufs to be freed from TX ring. */
> +
>  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
>  	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
>  
> @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
>  	eth_rx_disable_intr_t      rx_queue_intr_disable;
>  	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
>  	eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> +	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>  	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>  	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
>  	flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
>  }
>  
>  /**
> + * Request the driver to free mbufs currently cached by the driver. The
> + * driver will only free the mbuf if it is no longer in use.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param free_cnt
> + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> + *   should be freed. Note that a packet may be using multiple mbufs.
> + * @param buffer
> + *   Buffer used to collect packets to be sent. If provided, the buffer will
> + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> + *   if buffer has already been flushed.
> + * @param sent
> + *   Pointer to return number of packets sent if buffer has packets to be sent.
> + *   If *buffer is supplied, *sent must also be supplied.
> + * @return
> + *   Failure: < 0
> + *     -ENODEV: Invalid interface
> + *     -ENOTSUP: Driver does not support function
> + *   Success: >= 0
> + *     0-n: Number of packets freed. More packets may still remain in ring that
> + *     are in use.
> + */
> +
> +static inline int
> +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> +		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	/* Validate Input Data. Bail if not valid or not supported. */
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> +
> +	/*
> +	 * If transmit buffer is provided and there are still packets to be
> +	 * sent, then send them before attempting to free pending mbufs.
> +	 */
> +	if (buffer && sent)
> +		*sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> +
> +	/* Call driver to free pending mbufs. */
> +	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> +			free_cnt);
> +}
> +
> +/**
>   * Configure a callback for buffered packets which cannot be sent
>   *
>   * Register a specific callback to be called when an attempt is made to send

Just a thought, following up on Stephen's comment to further simplify this
API: how about not adding any new eth_dev_ops, but instead defining what
should happen during an empty TX burst call (tx_burst() with 0 packets)?

Several PMDs already check for this scenario and start by cleaning up
completed packets anyway; they effectively already implement part of this
definition for free.

The main difference with this API would be that you wouldn't know how many
mbufs were freed and wouldn't collect them into an array. However, most
applications have one mbuf pool and/or know where their mbufs come from, so
they can just query the pool or attempt to re-allocate from it after doing
empty bursts in case of starvation.

  
Ananyev, Konstantin Dec. 20, 2016, 12:17 p.m. UTC | #3
> Just a thought, following up on Stephen's comment to further simplify this
> API: how about not adding any new eth_dev_ops, but instead defining what
> should happen during an empty TX burst call (tx_burst() with 0 packets)?
> 
> Several PMDs already check for this scenario and start by cleaning up
> completed packets anyway; they effectively already implement part of this
> definition for free.

Many PMDs start cleaning up only when the number of free entries
drops below some threshold.
Also, in that case the author would have to modify (and test) all existing TX routines.
So I think a separate API call seems more plausible.
I do agree with Stephen's previous comment, though, that the last two parameters
are redundant and would just overcomplicate things.
tin

> --
> Adrien Mazarguil
> 6WIND
  
Adrien Mazarguil Dec. 20, 2016, 12:58 p.m. UTC | #4
On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
> > Just a thought, following up on Stephen's comment to further simplify this
> > API: how about not adding any new eth_dev_ops, but instead defining what
> > should happen during an empty TX burst call (tx_burst() with 0 packets)?
> > 
> > Several PMDs already check for this scenario and start by cleaning up
> > completed packets anyway; they effectively already implement part of this
> > definition for free.
> 
> Many PMDs start cleaning up only when the number of free entries
> drops below some threshold.
> Also, in that case the author would have to modify (and test) all existing TX routines.
> So I think a separate API call seems more plausible.

Not necessarily. As I understand it, this API in its current form only suggests
that a PMD should release a few mbufs from a queue if possible, without any
guarantee; PMDs are not forced to comply.

I think the threshold you mention is a valid reason not to release them, and
it wouldn't change anything in existing tx_burst() implementations in the
meantime (only the documentation).

This threshold could also be bypassed rather painlessly in the
"if (unlikely(nb_pkts == 0))" case that all PMDs already check for in one
way or another.

> I do agree with Stephen's previous comment, though, that the last two parameters
> are redundant and would just overcomplicate things.
> tin
  
Billy McFall Dec. 20, 2016, 2:15 p.m. UTC | #5
Thank you for your responses, see inline.

On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
>> > > +static inline int
>> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
>> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)

I will remove the buffer/sent parameters. It will be the application's
responsibility to make sure rte_eth_tx_buffer_flush() is called.

I don't feel strongly about the free_cnt parameter. It was in the original
request so that, with a large ring, the API could bail out early without
having to walk the entire ring. It might be a little unrealistic for the
application to truly know how many mbufs it wants freed. Also, as an example,
the i40e driver already has an i40e_tx_free_bufs(...) function, so by dropping
the free_cnt parameter, this function could be reused without having to
account for free_cnt.
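The `free_cnt` semantics from the posted doc comment can be sketched as a toy model. A value of 0 means free everything possible. Any other value stops the loop after `free_cnt` packets. The model assumes in-order completion and one mbuf per packet (the doc comment notes a packet may span several mbufs). The `toy_*` names are hypothetical, and the sketch is not modeled on `i40e_tx_free_bufs()`. It only shows how a limit lets the loop bail out early, and why a plain "free all done" routine needs no such parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 8   /* hypothetical; real rings are usually larger */

struct toy_txq {
	bool done[TOY_RING_SIZE];  /* per-descriptor completion flag */
	uint16_t head;             /* oldest in-use descriptor */
	uint16_t used;             /* descriptors currently in use */
};

/*
 * free_cnt == 0 means "free everything that is done". The walk always stops
 * at the first descriptor the NIC has not completed (in-order assumption),
 * so packets still in use remain in the ring.
 */
static int toy_done_cleanup(struct toy_txq *q, uint32_t free_cnt)
{
	int freed = 0;

	assert(q->used <= TOY_RING_SIZE);
	while (q->used > 0 && q->done[q->head]) {
		if (free_cnt != 0 && (uint32_t)freed == free_cnt)
			break;                  /* early bail requested by caller */
		q->done[q->head] = false;   /* the mbuf would be freed here */
		q->head = (uint16_t)((q->head + 1) % TOY_RING_SIZE);
		q->used--;
		freed++;
	}
	return freed;
}
```

Without `free_cnt`, the loop reduces to "walk until the first non-completed descriptor", which is the shape an existing driver-internal free routine can often be reused for.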

>> > Just a thought, following up on Stephen's comment to further simplify this
>> > API: how about not adding any new eth_dev_ops, but instead defining what
>> > should happen during an empty TX burst call (tx_burst() with 0 packets)?
>> >

In the original API request thread (dpdk-dev mailing list, 11/21/2016, subject
"Adding API to force freeing consumed buffers in TX ring"), overloading the
existing API with nb_pkts == 0 was suggested, and the consensus was to go with
a new API. I lean towards a new API since this is a special case that most
applications won't use, but I will go with the community on whether to enhance
the existing burst functionality or add a new API.

>> > Several PMDs already check for this scenario and start by cleaning up
>> > completed packets anyway; they effectively already implement part of this
>> > definition for free.
>>
>> Many PMDs start cleaning up only when the number of free entries
>> drops below some threshold.

True, but the original request for this API was for the scenario where packets
are being flooded and the application wants to reuse mbufs to avoid a packet
copy. So the API asks the driver to free "done" mbufs regardless of any
threshold.

>> Also, in that case the author would have to modify (and test) all existing TX routines.
>> So I think a separate API call seems more plausible.
>
> Not necessarily. As I understand it, this API in its current form only suggests
> that a PMD should release a few mbufs from a queue if possible, without any
> guarantee; PMDs are not forced to comply.
>
> I think the threshold you mention is a valid reason not to release them, and
> it wouldn't change anything in existing tx_burst() implementations in the
> meantime (only the documentation).
>
> This threshold could also be bypassed rather painlessly in the
> "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in one
> way or another.
>
>> I do agree with Stephen's previous comment, though, that the last two parameters
>> are redundant and would just overcomplicate things.
>> tin
>>
>
> --
> Adrien Mazarguil
> 6WIND
  
Adrien Mazarguil Dec. 23, 2016, 9:45 a.m. UTC | #6
Hi Billy,

On Tue, Dec 20, 2016 at 09:15:50AM -0500, Billy McFall wrote:
> Thank you for your responses, see inline.
> 
> On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> >> > Sent: Tuesday, December 20, 2016 11:28 AM
> >> > To: Billy McFall <bmcfall@redhat.com>
> >> > Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
> >> > <stephen@networkplumber.org>
> >> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
> >> >
> >> > Hi Billy,
> >> >
> >> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> >> > > Add a new API to force free consumed buffers on TX ring. API will return
> >> > > the number of packets freed (0-n) or error code if feature not supported
> >> > > (-ENOTSUP) or input invalid (-ENODEV).
> >> > >
> >> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
> >> > > in local buffer, the API also accepts *buffer and *sent. Before
> >> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
> >> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
> >> > > if threshold is not met.
> >> > >
> >> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> >> > > ---
> >> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
> >> > >  1 file changed, 56 insertions(+)
> >> > >
> >> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> >> > > index 9678179..e3f2be4 100644
> >> > > --- a/lib/librte_ether/rte_ethdev.h
> >> > > +++ b/lib/librte_ether/rte_ethdev.h
> >> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
> >> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
> >> > >  /**< @internal Check DD bit of specific RX descriptor */
> >> > >
> >> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> >> > > +/**< @internal Force mbufs to be freed from TX ring. */
> >> > > +
> >> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
> >> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
> >> > >
> >> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
> >> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
> >> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
> >> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> >> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> >> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
> >> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
> >> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> >> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
> >> > >  }
> >> > >
> >> > >  /**
> >> > > + * Request the driver to free mbufs currently cached by the driver. The
> >> > > + * driver will only free the mbuf if it is no longer in use.
> >> > > + *
> >> > > + * @param port_id
> >> > > + *   The port identifier of the Ethernet device.
> >> > > + * @param queue_id
> >> > > + *   The index of the transmit queue through which output packets must be
> >> > > + *   sent.
> >> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> >> > > + *   to rte_eth_dev_configure().
> >> > > + * @param free_cnt
> >> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> >> > > + *   should be freed. Note that a packet may be using multiple mbufs.
> >> > > + * @param buffer
> >> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
> >> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> >> > > + *   if buffer has already been flushed.
> >> > > + * @param sent
> >> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
> >> > > + *   If *buffer is supplied, *sent must also be supplied.
> >> > > + * @return
> >> > > + *   Failure: < 0
> >> > > + *     -ENODEV: Invalid interface
> >> > > + *     -ENOTSUP: Driver does not support function
> >> > > + *   Success: >= 0
> >> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
> >> > > + *     are in use.
> >> > > + */
> >> > > +
> >> > > +static inline int
> >> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> >> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> >> > > +{
> >> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >> > > +
> >> > > + /* Validate Input Data. Bail if not valid or not supported. */
> >> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> >> > > +
> >> > > + /*
> >> > > +  * If transmit buffer is provided and there are still packets to be
> >> > > +  * sent, then send them before attempting to free pending mbufs.
> >> > > +  */
> >> > > + if (buffer && sent)
> >> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >> > > +
> >> > > + /* Call driver to free pending mbufs. */
> >> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> >> > > +                 free_cnt);
> >> > > +}
> >> > > +
> >> > > +/**
> >> > >   * Configure a callback for buffered packets which cannot be sent
> >> > >   *
> >> > >   * Register a specific callback to be called when an attempt is made to send
> >> >
> 
> I will remove the buffer/sent parameters. It will be the application's
> responsibility to make sure rte_eth_tx_buffer_flush() is called.
> 
> I don't feel strongly about the free_cnt parameter. It was in the original
> request so that, with a large ring, the API could bail out early without
> having to walk the entire ring. It might be a little unrealistic for the
> application to truly know how many mbufs it wants freed. Also, as an
> example, the i40e driver already has an i40e_tx_free_bufs(...) function, so
> by dropping the free_cnt parameter, this function could be reused without
> having to account for free_cnt.
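With buffer/sent dropped, the application would flush any rte_eth_tx_buffer() state itself before asking for cleanup. Below is a minimal, DPDK-free sketch of that ordering and of the proposed free_cnt semantics (0 means "all completed packets", otherwise an upper bound). All model_* names are hypothetical; the model assumes one mbuf per packet and in-order completion, and it does not reproduce i40e_tx_free_bufs() or any other driver.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RING_SIZE 8u /* hypothetical ring capacity */

/* Hypothetical TX queue model, not a DPDK structure. */
struct model_txq {
	uint16_t nb_held; /* packets whose mbufs are still attached to the ring */
	uint16_t nb_done; /* how many of the oldest nb_held the NIC has finished */
};

/* Stand-in for rte_eth_tx_buffer_flush(): move buffered packets onto the
 * ring (as far as there is room) and return how many were enqueued. */
static uint16_t model_flush(struct model_txq *q, uint16_t *buffered)
{
	uint16_t room = (uint16_t)(MODEL_RING_SIZE - q->nb_held);
	uint16_t n = (uint16_t)(*buffered < room ? *buffered : room);

	q->nb_held = (uint16_t)(q->nb_held + n);
	*buffered = (uint16_t)(*buffered - n);
	return n;
}

/* Simulate the NIC completing the n oldest in-flight packets. */
static void model_complete(struct model_txq *q, uint16_t n)
{
	uint16_t in_flight = (uint16_t)(q->nb_held - q->nb_done);

	q->nb_done = (uint16_t)(q->nb_done + (n < in_flight ? n : in_flight));
}

/* Best-effort cleanup with the proposed free_cnt semantics: free at most
 * free_cnt completed packets, 0 meaning "all completed packets". Packets
 * the NIC has not finished with are never touched. */
static int model_tx_done_cleanup(struct model_txq *q, uint32_t free_cnt)
{
	uint32_t n = q->nb_done;

	assert(q->nb_done <= q->nb_held);
	if (free_cnt != 0 && free_cnt < n)
		n = free_cnt;
	q->nb_held = (uint16_t)(q->nb_held - n);
	q->nb_done = (uint16_t)(q->nb_done - n);
	return (int)n;
}
```

In real code the flush step would be rte_eth_tx_buffer_flush() and the cleanup step the proposed rte_eth_tx_done_cleanup() call; the sketch only shows that packets still sitting in the software buffer cannot be reclaimed until they have reached the ring and completed.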
> 
> >> > Just a thought to follow-up on Stephen's comment to further simplify this
> >> > API, how about not adding any new eth_dev_ops but instead defining what
> >> > should happen during an empty TX burst call (tx_burst() with 0 packets).
> >> >
> 
> In the original API request thread (dpdk-dev mailing list, 11/21/2016,
> subject "Adding API to force freeing consumed buffers in TX ring"),
> overloading the existing API with nb_pkts == 0 was suggested, and the
> consensus was to go with a new API. I lean towards a new API since this is
> a special case most applications won't use, but I will go with the
> community on whether to enhance the existing burst functionality or add a
> new API.

OK, I've just read the original thread.

> >> > Several PMDs already have a check for this scenario and start by cleaning up
> >> > completed packets anyway, they effectively partially implement this
> >> > definition for free already.
> >>
> >> Many PMDs start cleaning up only when the number of free entries
> >> drops below some threshold.
> 
> True, but the original request for this API was for the scenario where
> packets are being flooded and the application wants to reuse the mbufs to
> avoid a packet copy. So the API was meant to request that the driver free
> "done" mbufs regardless of any threshold.
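To make the threshold point concrete, here is a hypothetical queue model (the thr_* names are invented for illustration, not taken from any PMD): the normal burst path reclaims completed descriptors only when free descriptors fall below free_thresh, so completed mbufs can stay parked on a lightly loaded ring. Two ways of forcing reclamation are sketched side by side: a dedicated call, as in this patch, and the empty-burst overload suggested in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TX queue model; the thr_* names are not DPDK APIs. */
struct thr_txq {
	uint16_t ring_size;
	uint16_t nb_free;     /* descriptors available for new packets */
	uint16_t nb_done;     /* sent by the NIC, mbufs not yet reclaimed */
	uint16_t free_thresh; /* lazy path reclaims only below this level */
};

/* Return every completed descriptor (and its mbuf) to the free count. */
static uint16_t reclaim(struct thr_txq *q)
{
	uint16_t n = q->nb_done;

	assert(q->nb_free + q->nb_done <= q->ring_size);
	q->nb_free = (uint16_t)(q->nb_free + n);
	q->nb_done = 0;
	return n;
}

/* Lazy burst path: reclaim only when the ring is running low. */
static uint16_t thr_tx_burst(struct thr_txq *q, uint16_t nb_pkts)
{
	uint16_t n;

	if (q->nb_free < q->free_thresh)
		reclaim(q);
	n = (uint16_t)(nb_pkts < q->nb_free ? nb_pkts : q->nb_free);
	q->nb_free = (uint16_t)(q->nb_free - n);
	return n;
}

/* Option 1 (this patch): a dedicated call that ignores the threshold. */
static uint16_t thr_tx_done_cleanup(struct thr_txq *q)
{
	return reclaim(q);
}

/* Option 2 (suggested in the thread): an empty burst forces a reclaim. */
static uint16_t thr_tx_burst_overloaded(struct thr_txq *q, uint16_t nb_pkts)
{
	if (nb_pkts == 0) {
		reclaim(q);
		return 0;
	}
	return thr_tx_burst(q, nb_pkts);
}
```

In this model nb_done is set by hand to simulate NIC completions. The overloaded variant returns 0 whether or not anything was reclaimed, which is the trade-off noted further down the thread: the caller no longer learns how many mbufs were freed.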

Understood, so it's more than just a polite suggestion to PMDs that
implement this call. In my opinion it's still better to avoid adding a new
callback for that purpose, since applications cannot rely on a specific
outcome: it cannot guarantee that any mbuf will be freed, much like calling
tx_burst() with 0 packets.
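Since the cleanup outcome is only best effort, an application would treat it as a hint. A minimal sketch under that assumption: a counter-based pool model (hypothetical, not rte_mempool) where an allocation failure triggers a reclaim attempt and a retry, bounded so that a reclaim which frees nothing cannot spin forever. ring_cleanup() is a made-up stand-in for either the proposed call or an empty tx_burst().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-pool model (not rte_mempool): each mbuf is either in
 * the pool or still attached to a TX ring. */
struct pool_model {
	unsigned int in_pool;     /* available for allocation */
	unsigned int on_ring;     /* handed to the TX ring, not yet returned */
	unsigned int reclaimable; /* of on_ring, completed by the NIC */
};

static bool pool_alloc(struct pool_model *p)
{
	if (p->in_pool == 0)
		return false;
	p->in_pool--;
	return true;
}

/* Best-effort reclaim: it may legitimately free nothing. */
static unsigned int ring_cleanup(struct pool_model *p)
{
	unsigned int n = p->reclaimable;

	assert(n <= p->on_ring);
	p->on_ring -= n;
	p->in_pool += n;
	p->reclaimable = 0;
	return n;
}

/* Allocate, falling back to at most max_retries reclaim attempts. The
 * cleanup result is ignored: only the retried allocation matters. */
static bool alloc_with_cleanup(struct pool_model *p, unsigned int max_retries)
{
	for (unsigned int i = 0;; i++) {
		if (pool_alloc(p))
			return true;
		if (i == max_retries)
			return false;
		(void)ring_cleanup(p);
	}
}
```

With DPDK the allocation step would be rte_pktmbuf_alloc(), which returns NULL when the pool is empty; what to do after the last retry (drop, back off, copy) is left to the application.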

That's a separate discussion; however, perhaps making struct eth_dev_ops part
of the public API was not such a good idea after all. We're unable to
maintain ABI compatibility across releases because of it.

New callbacks would be met with less resistance (at least on my side) if
this whole ABI compat thing was not an issue.
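The ABI concern can be illustrated with a trimmed, hypothetical pair of ops tables. The patch adds tx_done_cleanup in the middle of struct eth_dev_ops, after tx_queue_release, which shifts every later member. Several helpers in rte_ethdev.h are static inline (as is the proposed rte_eth_tx_done_cleanup()), so the member offsets they use are compiled into application binaries. The three- and four-member structs below are illustrative only, not the real struct eth_dev_ops.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_fn)(void *arg);

/* Hypothetical, trimmed "before" layout (not the real struct eth_dev_ops). */
struct ops_v1 {
	op_fn tx_queue_release;
	op_fn dev_led_on;
	op_fn dev_led_off;
};

/* Hypothetical "after" layout: a member inserted mid-structure, as the
 * patch does with tx_done_cleanup. */
struct ops_v2 {
	op_fn tx_queue_release;
	op_fn tx_done_cleanup;
	op_fn dev_led_on;
	op_fn dev_led_off;
};

/* A binary built against v1 reads dev_led_on at led_on_offset_v1(); with a
 * v2 library that slot now holds tx_done_cleanup instead. */
static size_t led_on_offset_v1(void)
{
	return offsetof(struct ops_v1, dev_led_on);
}

static size_t led_on_offset_v2(void)
{
	return offsetof(struct ops_v2, dev_led_on);
}
```

Appending new members at the end of the table instead would keep existing offsets stable, although the structure size would still change.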

> >> Also in that case the author would have to modify (and test) all existing TX routines.
> >> So I think a separate API call seems more plausible.
> >
> > Not necessarily. As I understand it, this API in its current form only
> > suggests that a PMD should release a few mbufs from a queue if possible,
> > without any guarantee; PMDs are not forced to comply.
> >
> > I think the threshold you mention is a valid reason not to release them, and
> > it wouldn't change a thing in existing tx_burst() implementations in the
> > meantime (only documentation).
> >
> > This threshold could also be bypassed rather painlessly in the
> > "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in one
> > way or another.
> >
> >> Though I agree with Stephen's previous comment that the last two parameters
> >> are redundant and would just overcomplicate things.
> >> Konstantin
> >>
> >> >
> >> > The main difference with this API would be that you wouldn't know how many
> >> > mbufs were freed and wouldn't collect them into an array. However, most
> >> > applications have one mbuf pool and/or know where they come from, so they
> >> > can just query the pool or attempt to re-allocate from it after doing empty
> >> > bursts in case of starvation.
> >> >
> >> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
> >
> > --
> > Adrien Mazarguil
> > 6WIND
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..e3f2be4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1150,6 +1150,9 @@  typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be freed from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1467,6 +1470,7 @@  struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable;
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
 	eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
 	flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
@@ -2943,6 +2947,58 @@  rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @param buffer
+ *   Buffer used to collect packets to be sent. If provided, the buffer will
+ *   be flushed, even if the current length is less than buffer->size. Pass NULL
+ *   if buffer has already been flushed.
+ * @param sent
+ *   Pointer to return number of packets sent if buffer has packets to be sent.
+ *   If *buffer is supplied, *sent must also be supplied.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+
+static inline int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
+		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/*
+	 * If transmit buffer is provided and there are still packets to be
+	 * sent, then send them before attempting to free pending mbufs.
+	 */
+	if (buffer && sent)
+		*sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send