[dpdk-dev] [RFC 0/9] get Rx and Tx used descriptors

Olivier Matz olivier.matz at 6wind.com
Fri Jan 13 17:44:09 CET 2017


Hi,

On Thu, 24 Nov 2016 10:54:12 +0100, Olivier Matz
<olivier.matz at 6wind.com> wrote:
> This RFC patchset introduces a new ethdev API function
> rte_eth_tx_queue_count() which is the tx counterpart of
> rte_eth_rx_queue_count(). It implements this API on some
> Intel drivers for reference, and it also optimizes the
> implementation of rte_eth_rx_queue_count().
> 

I'm planning to send a new version of this patchset, fixing the issues
seen by Ferruh, plus a bug fix in the e1000 implementation.

Does anyone have any comment about the new API or about questions
raised in the cover letter? Especially about the real meaning of "used
descriptor": should it include the descriptors hold by the driver?

Any comment about the method (binary search to find the used
descriptors)?

I'm also wondering about adding rte_eth_tx_descriptor_done() in the API
at the same time.

Regards,
Olivier




> The usage of these functions can be:
> - on Rx, anticipate that the cpu is not fast enough to process
>   all incoming packets, and take dispositions to solve the
>   problem (add more cpus, drop specific packets, ...)
> - on Tx, detect that the link is overloaded, and take dispositions
>   to solve the problem (notify flow control, drop specific
>   packets)
> 
> The tests I've done (instrumenting testpmd) show that browsing
> the descriptors linearly is slow when the ring size increases.
> Accessing the head/tail registers through pci is also slow
> whatever the size of the ring. A binary search is a good compromise
> that gives quite good performance whatever the size of the ring.
> 
> Remaining question are about:
> - should we keep this name? I'd say "queue_count" is quite confusing,
>   and I would expect it returns the number of queues, not the
>   number of used descriptors
> - how shall we count the used descriptors, knowing that the driver
>   can hold some to free them by bulk, which reduces the effective
>   size of the ring
> 
> I would be happy to have some feedback about this RFC before
> I send it as a patch.
> 
> Here are some helpers to understand the code more easily (I sometimes
> make some shortcuts between like 1 pkt == 1 desc).
> 
> RX side
> =======
> 
> - sw advances the tail pointer
> - hw advances the head pointer
> - the software populates the ring with descs to buffers that are
> filled when the hw receives packets
> - head == tail means there is no available buffer for hw to receive a
> packet
> - head points to the next descriptor to be filled
> - hw owns all descriptors between [head...tail]
> - when a packet is written in a descriptor, the DD (descriptor done)
>   bit is set, and the head is advanced
> - the driver never reads the head (needs a pci transaction), instead
> it monitors the DD bit of next descriptor
> - when a filled packet is retrieved by the software, the descriptor
> has to be populated with a new empty buffer. This is not done for each
>   packet: the driver holds them and waits until it has many
> descriptors to populate, and do it by bulk.
>   (by the way, it means that the effective size a queue of size=N is
>   lower than N since these descriptors cannot be used by the hw)
> 
> rxq->rx_tail: current value of the sw tail (the idx of the next
> packet to be received). The real tail (hw) can be different since the
> driver can hold descriptors.
> rxq->nb_rx_hold: number of held descriptors
> rxq->rxrearm_nb: same, but for vector driver
> rxq->rx_free_thresh: when the number of held descriptors reaches this
> threshold, descriptors are populated with buffers to be filled, and
> sw advances the tail
> 
> Example with a ring size of 64:
> 
> |----------------------------------------------------------------|
> |                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
> |                    x buffers filled with data by hw x          |
> |                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
> |----------------------------------------------------------------|
>                      ^hw_tail=20
>                                     ^sw_tail=35
>                                                        ^hw_head=54
>                      <--- nb_hold -->
>                                     <-pkts in hw queue->
> 
> The descriptors marked with 'x' has their DD bit set, the other
>   (' ') reference empty buffers.
> The next packet to be received by software is at index 35.
> The software holds 15 descriptors that will be rearmed later.
> There are 19 packets waiting in the hw queue.
> 
> We want the function rx_queue_count() to return the number of
> "used" descriptors. The first question is: what does that mean
> exactly? Should it be pkts_in_hw_queue or pkts_in_hw_queue + nb_hold?
> The current implementation returns pkts_in_hw_queue, but including
> nb_hold could be useful to know how many descriptors are really
> free (= size - used).
> 
> The current implementation checks the DD bit starting from sw_tail,
> every 4 packets. It can be quite slow for large rings. An alternative
> is to read the head register, but it's also slow.
> 
> This patchset optimizes rx_queue_count() by doing a binary
> search (checking for DD) between sw_tail and hw_tail, instead of a
> linear search.
> 
> TX side
> =======
> 
> - sw advances the tail pointer
> - hw advances the head pointer
> - the software populates the ring with full buffers to be sent by
>   the hw
> - head points to the in-progress descriptor.
> - sw writes new descriptors at tail
> - head == tail means that the transmit queue is empty
> - when the hw has processed a descriptor, it sets the DD bit if
>   the descriptor has the RS (report status) bit.
> - the driver never reads the head (needs a pci transaction), instead
> it monitors the DD bit of a descriptor that has the RS bit
> 
> txq->tx_tail: sw value for tail register
> txq->tx_free_thresh: free buffers if count(free descriptors) < this
> value txq->tx_rs_thresh: RS bit is set every X descriptor
> txq->tx_next_dd: next desc to scan for DD bit
> txq->tx_next_rs: next desc to set RS bit
> txq->last_desc_cleaned: last descriptor that have been cleaned
> txq->nb_tx_free: number of free descriptors
> 
> Example:
> 
> |----------------------------------------------------------------|
> |               D       R       R       R                        |
> |        ............xxxxxxxxxxxxxxxxxxxxxxxxx                   |
> |        <descs sent><- descs not sent yet  ->                   |
> |        ............xxxxxxxxxxxxxxxxxxxxxxxxx                   |
> |----------------------------------------------------------------|
>         ^last_desc_cleaned=8                    ^next_rs=47
>                 ^next_dd=15                   ^tail=45
>                      ^hw_head=20
> 
>                      <----  nb_used  --------->
> 
> The hardware is currently processing the descriptor 20
> 'R' means the descriptor has the RS bit
> 'D' means the descriptor has the DD + RS bits
> 'x' are packets in txq (not sent)
> '.' are packet already sent but not freed by sw
> 
> In this example, we have rs_thres=8. On next call to
> ixgbe_tx_free_bufs(), some buffers will be freed.
> 
> The new implementation does a binary search (checking for DD) between
> next_dd and tail.
> 
> 
> 
> Olivier Matz (9):
>   ethdev: clarify api comments of rx queue count
>   ethdev: move queue id check in generic layer
>   ethdev: add handler for Tx queue descriptor count
>   net/ixgbe: optimize Rx queue descriptor count
>   net/ixgbe: add handler for Tx queue descriptor count
>   net/igb: optimize rx queue descriptor count
>   net/igb: add handler for tx queue descriptor count
>   net/e1000: optimize rx queue descriptor count
>   net/e1000: add handler for tx queue descriptor count
> 
>  drivers/net/e1000/e1000_ethdev.h |  10 +++-
>  drivers/net/e1000/em_ethdev.c    |   1 +
>  drivers/net/e1000/em_rxtx.c      | 109
> ++++++++++++++++++++++++++++------ drivers/net/e1000/igb_ethdev.c
> |   1 + drivers/net/e1000/igb_rxtx.c     | 109
> ++++++++++++++++++++++++++++------ drivers/net/i40e/i40e_rxtx.c
> |   5 -- drivers/net/ixgbe/ixgbe_ethdev.c |   1 +
>  drivers/net/ixgbe/ixgbe_ethdev.h |   4 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c   | 123
> +++++++++++++++++++++++++++++++++------
> drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
> drivers/net/nfp/nfp_net.c        |   6 --
> lib/librte_ether/rte_ethdev.h    |  48 +++++++++++++-- 12 files
> changed, 344 insertions(+), 75 deletions(-)
> 



More information about the dev mailing list