[dpdk-dev,v7,1/3] ethdev: new API to free consumed buffers in Tx ring

Message ID 20170315180226.5999-2-bmcfall@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context                 Check     Description
ci/Intel-compilation    success   Compilation OK
ci/checkpatch           success   coding style OK

Commit Message

Billy McFall March 15, 2017, 6:02 p.m. UTC
  Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/conf.py                      |  7 +++++--
 doc/guides/nics/features/default.ini    |  4 +++-
 doc/guides/prog_guide/poll_mode_drv.rst | 28 ++++++++++++++++++++++++++++
 doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
 lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h           | 31 +++++++++++++++++++++++++++++++
 6 files changed, 87 insertions(+), 4 deletions(-)
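
The return convention described in the commit message (a count of freed packets, -ENOTSUP, or -ENODEV) can be illustrated with a small self-contained model. This is only a sketch and not DPDK code: `toy_dev`, `toy_devices`, `toy_driver_cleanup` and `toy_tx_done_cleanup` are hypothetical names, the Tx ring is reduced to a counter of completed mbufs, and the toy checks the port id before indexing the device table.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 2u

/* Same shape as the driver callback added by the patch, but a toy type. */
typedef int (*toy_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);

struct toy_dev {
	toy_tx_done_cleanup_t tx_done_cleanup; /* NULL: driver lacks support */
	void *txq;                             /* opaque per-queue state */
};

/* Toy driver: the queue state is just a count of completed mbufs. */
static int toy_driver_cleanup(void *txq, uint32_t free_cnt)
{
	uint32_t *done = txq;
	/* free_cnt == 0 means "free everything that can be freed". */
	uint32_t n = (free_cnt == 0 || free_cnt > *done) ? *done : free_cnt;

	*done -= n;
	return (int)n;
}

static uint32_t toy_ring_done = 5;
static struct toy_dev toy_devices[TOY_MAX_PORTS] = {
	{ toy_driver_cleanup, &toy_ring_done }, /* port 0: callback present */
	{ NULL, NULL },                         /* port 1: no callback */
};

/* Hypothetical wrapper that follows the documented return values. */
int toy_tx_done_cleanup(unsigned int port_id, uint32_t free_cnt)
{
	if (port_id >= TOY_MAX_PORTS)
		return -ENODEV;              /* invalid interface */

	struct toy_dev *dev = &toy_devices[port_id];

	if (dev->tx_done_cleanup == NULL)
		return -ENOTSUP;             /* driver does not implement it */

	return dev->tx_done_cleanup(dev->txq, free_cnt);
}
```

A caller would treat any negative value as an error and any value of zero or more as the number of packets released; a result of 0 is a success that simply freed nothing.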
  

Comments

Olivier Matz March 23, 2017, 10:37 a.m. UTC | #1
Hi Billy,

On Wed, 15 Mar 2017 14:02:24 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> Add a new API to force free consumed buffers on Tx ring. API will return
> the number of packets freed (0-n) or error code if feature not supported
> (-ENOTSUP) or input invalid (-ENODEV).
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  doc/guides/conf.py                      |  7 +++++--
>  doc/guides/nics/features/default.ini    |  4 +++-
>  doc/guides/prog_guide/poll_mode_drv.rst | 28 ++++++++++++++++++++++++++++
>  doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
>  lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
>  lib/librte_ether/rte_ethdev.h           | 31 +++++++++++++++++++++++++++++++
>  6 files changed, 87 insertions(+), 4 deletions(-)
> 

[...]

> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
>  
>      When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
>  
> +Free Tx mbuf on Demand
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +Many of the drivers don't release the mbuf back to the mempool, or local cache, immediately after the packet has been
> +transmitted.
> +Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
> +crossed or free the mbuf when a slot in the Tx ring is needed.
> +
> +An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
> +This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
> +``tx_rs_thresh`` has been crossed.
> +There are two scenarios when an application may want the mbuf released immediately:
> +
> +* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
> +  multi-cast).
> +  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
> +  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
> +  count on the packet is decremented.
> +  Then the same packet can be transmitted to the next destination interface.

By reading this paragraph, it's not so clear to me that the packet
that will be transmitted on all interfaces will be different from
one port to another.

Maybe it could be reworded to insist on that?


> +
> +* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
> +  The application may want to reset to a clean state.

I'd reword into:

Some applications are designed to make multiple runs, like a packet generator.
Between each run, the application may want to reset to a clean state.

What do you mean by "clean state"? All mbufs returned into the mempools?
Why would a packet generator need that? For performance?

Also, do we want to ensure that all packets are actually transmitted?
Can we do that with this API or should we use another API like
rte_eth_tx_descriptor_status() [1] ?

[1] http://dpdk.org/dev/patchwork/patch/21549/


Thanks,
Olivier
  
Billy McFall March 23, 2017, 1:32 p.m. UTC | #2
Thank you for your comments. See inline.

On Thu, Mar 23, 2017 at 6:37 AM, Olivier MATZ <olivier.matz@6wind.com>
wrote:

> Hi Billy,
>
> On Wed, 15 Mar 2017 14:02:24 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> > Add a new API to force free consumed buffers on Tx ring. API will return
> > the number of packets freed (0-n) or error code if feature not supported
> > (-ENOTSUP) or input invalid (-ENODEV).
> >
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > ---
> >  doc/guides/conf.py                      |  7 +++++--
> >  doc/guides/nics/features/default.ini    |  4 +++-
> >  doc/guides/prog_guide/poll_mode_drv.rst | 28 ++++++++++++++++++++++++++++
> >  doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
> >  lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
> >  lib/librte_ether/rte_ethdev.h           | 31 +++++++++++++++++++++++++++++++
> >  6 files changed, 87 insertions(+), 4 deletions(-)
> >
>
> [...]
>
> > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > @@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
> >
> >      When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
> >
> > +Free Tx mbuf on Demand
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Many of the drivers don't release the mbuf back to the mempool, or local cache, immediately after the packet has been
> > +transmitted.
> > +Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
> > +crossed or free the mbuf when a slot in the Tx ring is needed.
> > +
> > +An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
> > +This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
> > +``tx_rs_thresh`` has been crossed.
> > +There are two scenarios when an application may want the mbuf released immediately:
> > +
> > +* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
> > +  multi-cast).
> > +  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
> > +  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
> > +  count on the packet is decremented.
> > +  Then the same packet can be transmitted to the next destination interface.
>
> By reading this paragraph, it's not so clear to me that the packet
> that will be transmitted on all interfaces will be different from
> one port to another.
>
> Maybe it could be reworded to insist on that?
>
>
What if I add the following sentence:

  Then the same packet can be transmitted to the next destination interface.
+ The application is still responsible for managing any packet manipulations needed between the different destination
+ interfaces, but a packet copy can be avoided.


>
> > +
> > +* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
> > +  The application may want to reset to a clean state.
>
> I'd reword into:
>
> Some applications are designed to make multiple runs, like a packet
> generator.
> Between each run, the application may want to reset to a clean state.
>
> What do you mean by "clean state"? All mbufs returned into the mempools?
> Why would a packet generator need that? For performance?
>

Reworded as you suggested, then attempted to explain a 'clean state'.
Also reworded the last sentence a little.

+ * Some applications are designed to make multiple runs, like a packet generator.
+   For performance reasons and consistency between runs, the application may want to reset back to an initial state
+   between each run, where all mbufs are returned to the mempool.
+   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each destination interface it has been using
+   to request it to release all of its used mbufs.


> Also, do we want to ensure that all packets are actually transmitted?
>

Added an additional sentence to indicate that this API doesn't manage
whether or not the packet has been transmitted.

  Then the same packet can be transmitted to the next destination interface.
  The application is still responsible for managing any packet manipulations needed between the different destination
  interfaces, but a packet copy can be avoided.
+  This API is independent of whether the packet was transmitted or dropped, only that the mbuf is no longer in use by
+  the interface.


> Can we do that with this API or should we use another API like
> rte_eth_tx_descriptor_status() [1] ?
>
> [1] http://dpdk.org/dev/patchwork/patch/21549/
>

I read through this patch. This API doesn't indicate if the packet was
transmitted or dropped (I think that is what you were asking). This API
could be used by the application to determine if the mbuf has been
freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
in value. Did I miss your point?

>
> Thanks,
> Olivier
>

Thanks,
Billy McFall
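
The flooding pattern discussed in this message (transmit, poll the cleanup call until the reference count drops, then reuse the same mbuf for the next interface) can be sketched as a toy model. None of this is DPDK code: `toy_mbuf`, `toy_port`, `toy_tx`, `toy_tx_done_cleanup` and `toy_flood` are hypothetical stand-ins, and the assumption that the "NIC" finishes after a fixed number of cleanup polls is a modelling shortcut. A real application would read the count with `rte_mbuf_refcnt_read()`, as mentioned above.

```c
#include <stddef.h>

/* Toy stand-ins, NOT DPDK types. */
struct toy_mbuf {
	int refcnt;              /* 1 = only the application holds it */
};

struct toy_port {
	struct toy_mbuf *held;   /* mbuf still sitting in the Tx ring, if any */
	int polls_until_done;    /* cleanup calls before the "NIC" is finished */
};

/* Queue one mbuf; the ring keeps its reference until cleanup releases it. */
void toy_tx(struct toy_port *p, struct toy_mbuf *m, int polls_until_done)
{
	p->held = m;
	p->polls_until_done = polls_until_done;
}

/* Returns the number of packets freed (0 or 1 in this model). */
int toy_tx_done_cleanup(struct toy_port *p)
{
	if (p->held == NULL)
		return 0;
	if (p->polls_until_done > 0) {  /* still in use: nothing to free */
		p->polls_until_done--;
		return 0;
	}
	p->held->refcnt--;              /* the ring drops its reference */
	p->held = NULL;
	return 1;
}

/* Send one mbuf to every port in turn without copying it.
 * Returns the total number of cleanup polls that were needed. */
int toy_flood(struct toy_mbuf *m, struct toy_port *ports, int nb_ports)
{
	int polls = 0;

	for (int i = 0; i < nb_ports; i++) {
		/* Per-port header edits would go here; the application
		 * remains responsible for them. */
		m->refcnt++;            /* extra reference handed to the ring */
		toy_tx(&ports[i], m, 2);
		while (m->refcnt > 1) { /* wait until the ring lets go */
			toy_tx_done_cleanup(&ports[i]);
			polls++;
		}
	}
	return polls;
}

/* Demo wrapper: flood across three ports; -1 if the mbuf was not handed
 * back with a single reference. */
int toy_flood_demo(void)
{
	struct toy_mbuf m = { 1 };
	struct toy_port ports[3] = { { NULL, 0 }, { NULL, 0 }, { NULL, 0 } };
	int polls = toy_flood(&m, ports, 3);

	return (m.refcnt == 1) ? polls : -1;
}
```

The point of the model is that the application never copies the packet; it trades the copy for a polling loop on each port.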
  
Olivier Matz March 24, 2017, 12:46 p.m. UTC | #3
Hi Billy,

On Thu, 23 Mar 2017 09:32:14 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> Thank you for your comments. See inline.
> 

[...]

> What if I add the following sentence:
> 
>   Then the same packet can be transmitted to the next destination interface.
> + The application is still responsible for managing any packet manipulations needed between the different destination
> + interfaces, but a packet copy can be avoided.

looks good, thanks.



> > > +
> > > +* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
> > > +  The application may want to reset to a clean state.
> >
> > I'd reword into:
> >
> > Some applications are designed to make multiple runs, like a packet
> > generator.
> > Between each run, the application may want to reset to a clean state.
> >
> > What do you mean by "clean state"? All mbufs returned into the mempools?
> > Why would a packet generator need that? For performance?
>
> Reworded as you suggested, then attempted to explain a 'clean state'.
> Also reworded the last sentence a little.
>
> + * Some applications are designed to make multiple runs, like a packet generator.
> +   For performance reasons and consistency between runs, the application may want to reset back to an initial state
> +   between each run, where all mbufs are returned to the mempool.
> +   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each destination interface it has been using
> +   to request it to release all of its used mbufs.

ok, looks clearer to me, thanks


> > Also, do we want to ensure that all packets are actually transmitted?
> >  
> 
> Added an additional sentence to indicate that this API doesn't manage
> whether or not the packet has been transmitted.
> 
>   Then the same packet can be transmitted to the next destination interface.
>   The application is still responsible for managing any packet manipulations needed between the different destination
>   interfaces, but a packet copy can be avoided.
> +  This API is independent of whether the packet was transmitted or dropped, only that the mbuf is no longer in use by
> +  the interface.

ok


> > Can we do that with this API or should we use another API like
> > rte_eth_tx_descriptor_status() [1] ?
> >
> > [1] http://dpdk.org/dev/patchwork/patch/21549/
> >
> I read through this patch. This API doesn't indicate if the packet was
> transmitted or dropped (I think that is what you were asking). This API
> could be used by the application to determine if the mbuf has been
> freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
> in value. Did I miss your point?

Maybe my question was not clear :)
Let me try to reword it.

For a traffic generator use-case, a dummy algorithm may be:

1/ send packets in a loop until a condition is met (ex: packet count reached)
2/ call rte_eth_tx_done_cleanup()
3/ read stats for report

I think there is something missing between 1/ and 2/, to ensure that
all packets that were in the tx queue are processed (either transmitted
or dropped). If that's not the case, both steps 2/ and 3/ will not
behave as expected:
- all mbufs won't be returned to the pool
- statistics may be wrong

Maybe a simple wait() could do the job.
Using a combination of rte_eth_tx_done_cleanup() + rte_eth_tx_descriptor_status()
is probably also a solution.

Do you confirm rte_eth_tx_done_cleanup() does not check that?

Thanks
Olivier
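
The gap between steps 1/ and 2/ can be made concrete with a toy model of a Tx queue. This is a sketch under stated assumptions and not DPDK code: `toy_txq`, `toy_tx_burst`, `toy_hw_complete`, `toy_tx_done_cleanup` and `toy_generator_run` are hypothetical names, the ring is reduced to three counters, and the drain loop stands in for whatever wait or descriptor-status polling a real application would use.

```c
#include <stdint.h>

#define TOY_RING_SIZE 8u
#define TOY_POOL_SIZE 8u

/* Toy Tx queue, NOT a DPDK structure. */
struct toy_txq {
	uint32_t in_flight; /* queued, the "NIC" has not finished with them */
	uint32_t done;      /* finished, but the mbuf is still held by the ring */
	uint32_t pool_free; /* mbufs currently available in the toy mempool */
};

static uint32_t toy_min(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Take up to n mbufs from the pool and queue them; returns how many. */
uint32_t toy_tx_burst(struct toy_txq *q, uint32_t n)
{
	uint32_t room = TOY_RING_SIZE - q->in_flight - q->done;
	uint32_t k = toy_min(n, toy_min(room, q->pool_free));

	q->pool_free -= k;
	q->in_flight += k;
	return k;
}

/* Simulate the NIC completing up to n queued packets. */
void toy_hw_complete(struct toy_txq *q, uint32_t n)
{
	uint32_t k = toy_min(n, q->in_flight);

	q->in_flight -= k;
	q->done += k;
}

/* Frees only completed entries; free_cnt == 0 means "as many as possible".
 * Returns the number of packets freed. */
int toy_tx_done_cleanup(struct toy_txq *q, uint32_t free_cnt)
{
	uint32_t k = (free_cnt == 0) ? q->done : toy_min(free_cnt, q->done);

	q->done -= k;
	q->pool_free += k;
	return (int)k;
}

/* One generator run: send, optionally drain, clean up.
 * Returns the number of mbufs back in the pool afterwards. */
uint32_t toy_generator_run(int drain_first)
{
	struct toy_txq q = { 0, 0, TOY_POOL_SIZE };

	toy_tx_burst(&q, 6);       /* 1/ send packets */
	toy_hw_complete(&q, 2);    /* the NIC has only finished some of them */

	if (drain_first) {
		/* missing step: wait until nothing is pending */
		while (q.in_flight > 0)
			toy_hw_complete(&q, 1);
	}

	toy_tx_done_cleanup(&q, 0); /* 2/ request release of used mbufs */
	return q.pool_free;         /* 3/ what a report would see */
}
```

Without the drain step the cleanup call returns only the mbufs whose transmission had already completed, so the pool is not back to its initial size; with the drain step every mbuf returns.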
  
Billy McFall March 24, 2017, 1:18 p.m. UTC | #4
On Fri, Mar 24, 2017 at 8:46 AM, Olivier Matz <olivier.matz@6wind.com>
wrote:

[...]

> Maybe my question was not clear :)
> Let me try to reword it.
>
> For a traffic generator use-case, a dummy algorithm may be:
>
> 1/ send packets in a loop until a condition is met (ex: packet count reached)
> 2/ call rte_eth_tx_done_cleanup()
> 3/ read stats for report
>
> I think there is something missing between 1/ and 2/, to ensure that
> all packets that were in the tx queue are processed (either transmitted
> or dropped). If that's not the case, both steps 2/ and 3/ will not
> behave as expected:
> - all mbufs won't be returned to the pool
> - statistics may be wrong
>
> Maybe a simple wait() could do the job.
> Using a combination of rte_eth_tx_done_cleanup() + rte_eth_tx_descriptor_status()
> is probably also a solution.
>
> Do you confirm rte_eth_tx_done_cleanup() does not check that?

Confirm. rte_eth_tx_done_cleanup() does not check that. In the flooding case,
the application is expected to poll rte_eth_tx_done_cleanup() until some condition
is met, like ref_count of given packet is decremented. So on the packetGen case, the
application would need to wait some time and/or call rte_eth_tx_descriptor_status()
as you suggested.

My original patch returned RTE_DONE (no more packets pending),
RTE_PROCESSING (freed what I could but there are still packets in the queue)
or -ERRNO for error. Then packets freed count was returned via a pointer in
the param list.
That would have solved what you are asking, but that was shot down as being
overkill.

Should I add another sentence to the packet generator bullet indicating that it is the
application's job to make sure no more packets are pending? Like:

  In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each destination interface it has been using
  to request it to release all of its used mbufs.
+ It is the application's responsibility to ensure all packets have been processed by the destination interface.
+ Use rte_eth_tx_descriptor_status() to obtain the status of the transmit queue.

> Thanks
> Olivier
  
Olivier Matz March 24, 2017, 1:30 p.m. UTC | #5
On Fri, 24 Mar 2017 09:18:54 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> On Fri, Mar 24, 2017 at 8:46 AM, Olivier Matz <olivier.matz@6wind.com>
> wrote:

[...]

> > > I read through this patch. This API doesn't indicate if the packet was  
> > > transmitted or dropped (I think that is what you were asking). This API
> > > could be used by the application to determine if the mbuf has been
> > > freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
> > > in value. Did I miss your point?  
> >
> > Maybe my question was not clear :)
> > Let me try to reword it.
> >
> > For a traffic generator use-case, a dummy algorithm may be:
> >
> > 1/ send packets in a loop until a condition is met (ex: packet count
> > reached)
> > 2/ call rte_eth_tx_done_cleanup()
> > 3/ read stats for report
> >
> > I think there is something missing between 1/ and 2/, to ensure that
> > all packets that were in the tx queue are processed (either transmitted
> > or dropped). If that's not the case, both steps 2/ and 3/ will not
> > behave as expected:
> > - all mbufs won't be returned to the pool
> > - statistics may be wrong
> >
> > Maybe a simple wait() could do the job.
> > Using a combination of rte_eth_tx_done_cleanup() +
> > rte_eth_tx_descriptor_status()
> > is probably also a solution.
> >
> > Do you confirm rte_eth_tx_done_cleanup() does not check that?
> >
> Confirm. rte_eth_tx_done_cleanup() does not check that. In the flooding case,
> the application is expected to poll rte_eth_tx_done_cleanup() until some condition
> is met, like ref_count of given packet is decremented. So on the packetGen case, the
> application would need to wait some time and/or call rte_eth_tx_descriptor_status()
> as you suggested.
>
> My original patch returned RTE_DONE (no more packets pending),
> RTE_PROCESSING (freed what I could but there are still packets in the queue)
> or -ERRNO for error. Then packets freed count was returned via a pointer in
> the param list.
> That would have solved what you are asking, but that was shot down as being
> overkill.
>
> Should I add another sentence to the packet generator bullet indicating that it is the
> application's job to make sure no more packets are pending? Like:
>
>   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each destination interface it has been using
>   to request it to release all of its used mbufs.
> + It is the application's responsibility to ensure all packets have been processed by the destination interface.
> + Use rte_eth_tx_descriptor_status() to obtain the status of the transmit queue.

Thanks for the clarification.
Not sure the sentence is required, since rte_eth_tx_descriptor_status()
is not included yet.

Regards,
Olivier
  

Patch

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 34c62de..4cac26d 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -64,6 +64,9 @@ 
 
 master_doc = 'index'
 
+# Maximum feature description string length
+feature_str_len = 25
+
 # Figures, tables and code-blocks automatically numbered if they have caption
 numfig = True
 
@@ -300,7 +303,7 @@  def print_table_body(outfile, num_cols, ini_files, ini_data, default_features):
 def print_table_row(outfile, feature, line):
     """ Print a single row of the table with fixed formatting. """
     line = line.rstrip()
-    print('   {:<20}{}'.format(feature, line), file=outfile)
+    print('   {:<{}}{}'.format(feature, feature_str_len, line), file=outfile)
 
 
 def print_table_divider(outfile, num_cols):
@@ -309,7 +312,7 @@  def print_table_divider(outfile, num_cols):
     column_dividers = ['='] * num_cols
     line += ' '.join(column_dividers)
 
-    feature = '=' * 20
+    feature = '=' * feature_str_len
 
     print_table_row(outfile, feature, line)
 
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 299078f..0135c0c 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -3,7 +3,8 @@ 
 ;
 ; This file defines the features that are valid for inclusion in
 ; the other driver files and also the order that they appear in
-; the features table in the documentation.
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
 ;
 [Features]
 Speed capabilities   =
@@ -11,6 +12,7 @@  Link status          =
 Link status event    =
 Queue status event   =
 Rx interrupt         =
+Free Tx mbuf on demand =
 Queue start/stop     =
 MTU update           =
 Jumbo frame          =
diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index d4c92ea..21f7a9d 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -249,6 +249,34 @@  One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
 
     When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
 
+Free Tx mbuf on Demand
+~~~~~~~~~~~~~~~~~~~~~~
+
+Many of the drivers don't release the mbuf back to the mempool, or local cache, immediately after the packet has been
+transmitted.
+Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
+crossed or free the mbuf when a slot in the Tx ring is needed.
+
+An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
+This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
+``tx_rs_thresh`` has been crossed.
+There are two scenarios when an application may want the mbuf released immediately:
+
+* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
+  multi-cast).
+  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
+  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
+  count on the packet is decremented.
+  Then the same packet can be transmitted to the next destination interface.
+
+* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
+  The application may want to reset to a clean state.
+  In this case, it may want to call the ``rte_eth_tx_done_cleanup()`` API to request each destination interface it has
+  been using to release all of its used mbufs.
+
+To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature in the *Network Interface
+Controller Drivers* document.
+
 Hardware Offload
 ~~~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4b90036..7b9c92c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,11 +41,16 @@  New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
-
 * **Added powerpc support in pci probing for vfio-pci devices.**
 
   sPAPR IOMMU based pci probing enabled for vfio-pci devices.
 
+* **Added free Tx mbuf on demand API.**
+
+  Added a new function ``rte_eth_tx_done_cleanup()`` which allows an application
+  to request the driver to release mbufs from their Tx ring that are no longer
+  in use, independent of whether or not the ``tx_rs_thresh`` has been crossed.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..b796e7d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1275,6 +1275,20 @@  rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 	return ret;
 }
 
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
 void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4be217c..b3ee872 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@  typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be freed from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1488,6 +1491,7 @@  struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3178,6 +3182,33 @@  rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata);
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibility to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt);
+
+/**
  * The eth device event type for interrupt, and maybe others in the future.
  */
 enum rte_eth_event_type {