[dpdk-dev] [RFC 0/9] get Rx and Tx used descriptors

Olivier Matz olivier.matz at 6wind.com
Thu Nov 24 10:54:12 CET 2016


This RFC patchset introduces a new ethdev API function
rte_eth_tx_queue_count() which is the tx counterpart of
rte_eth_rx_queue_count(). It implements this API on some
Intel drivers for reference, and it also optimizes the
implementation of rte_eth_rx_queue_count().

The usage of these functions can be:
- on Rx, anticipate that the cpu is not fast enough to process
  all incoming packets, and take dispositions to solve the
  problem (add more cpus, drop specific packets, ...)
- on Tx, detect that the link is overloaded, and take dispositions
  to solve the problem (notify flow control, drop specific
  packets)

The tests I've done (instrumenting testpmd) show that browsing
the descriptors linearly is slow when the ring size increases.
Accessing the head/tail registers through pci is also slow
whatever the size of the ring. A binary search is a good compromise
that gives quite good performance whatever the size of the ring.

Remaining question are about:
- should we keep this name? I'd say "queue_count" is quite confusing,
  and I would expect it returns the number of queues, not the
  number of used descriptors
- how shall we count the used descriptors, knowing that the driver
  can hold some to free them by bulk, which reduces the effective
  size of the ring

I would be happy to have some feedback about this RFC before
I send it as a patch.

Here are some helpers to understand the code more easily (I sometimes
make some shortcuts between like 1 pkt == 1 desc).

RX side
=======

- sw advances the tail pointer
- hw advances the head pointer
- the software populates the ring with descs to buffers that are filled
  when the hw receives packets
- head == tail means there is no available buffer for hw to receive a packet
- head points to the next descriptor to be filled
- hw owns all descriptors between [head...tail]
- when a packet is written in a descriptor, the DD (descriptor done)
  bit is set, and the head is advanced
- the driver never reads the head (needs a pci transaction), instead it
  monitors the DD bit of next descriptor
- when a filled packet is retrieved by the software, the descriptor has
  to be populated with a new empty buffer. This is not done for each
  packet: the driver holds them and waits until it has many descriptors
  to populate, and do it by bulk.
  (by the way, it means that the effective size a queue of size=N is
  lower than N since these descriptors cannot be used by the hw)

rxq->rx_tail: current value of the sw tail (the idx of the next packet to
  be received). The real tail (hw) can be different since the driver can
  hold descriptors.
rxq->nb_rx_hold: number of held descriptors
rxq->rxrearm_nb: same, but for vector driver
rxq->rx_free_thresh: when the number of held descriptors reaches this threshold,
  descriptors are populated with buffers to be filled, and sw advances the tail

Example with a ring size of 64:

|----------------------------------------------------------------|
|                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
|                    x buffers filled with data by hw x          |
|                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
|----------------------------------------------------------------|
                     ^hw_tail=20
                                    ^sw_tail=35
                                                       ^hw_head=54
                     <--- nb_hold -->
                                    <-pkts in hw queue->

The descriptors marked with 'x' has their DD bit set, the other
  (' ') reference empty buffers.
The next packet to be received by software is at index 35.
The software holds 15 descriptors that will be rearmed later.
There are 19 packets waiting in the hw queue.

We want the function rx_queue_count() to return the number of
"used" descriptors. The first question is: what does that mean
exactly? Should it be pkts_in_hw_queue or pkts_in_hw_queue + nb_hold?
The current implementation returns pkts_in_hw_queue, but including
nb_hold could be useful to know how many descriptors are really
free (= size - used).

The current implementation checks the DD bit starting from sw_tail,
every 4 packets. It can be quite slow for large rings. An alternative
is to read the head register, but it's also slow.

This patchset optimizes rx_queue_count() by doing a binary
search (checking for DD) between sw_tail and hw_tail, instead of a
linear search.

TX side
=======

- sw advances the tail pointer
- hw advances the head pointer
- the software populates the ring with full buffers to be sent by
  the hw
- head points to the in-progress descriptor.
- sw writes new descriptors at tail
- head == tail means that the transmit queue is empty
- when the hw has processed a descriptor, it sets the DD bit if
  the descriptor has the RS (report status) bit.
- the driver never reads the head (needs a pci transaction), instead it
  monitors the DD bit of a descriptor that has the RS bit

txq->tx_tail: sw value for tail register
txq->tx_free_thresh: free buffers if count(free descriptors) < this value
txq->tx_rs_thresh: RS bit is set every X descriptor
txq->tx_next_dd: next desc to scan for DD bit
txq->tx_next_rs: next desc to set RS bit
txq->last_desc_cleaned: last descriptor that have been cleaned
txq->nb_tx_free: number of free descriptors

Example:

|----------------------------------------------------------------|
|               D       R       R       R                        |
|        ............xxxxxxxxxxxxxxxxxxxxxxxxx                   |
|        <descs sent><- descs not sent yet  ->                   |
|        ............xxxxxxxxxxxxxxxxxxxxxxxxx                   |
|----------------------------------------------------------------|
        ^last_desc_cleaned=8                    ^next_rs=47
                ^next_dd=15                   ^tail=45
                     ^hw_head=20

                     <----  nb_used  --------->

The hardware is currently processing the descriptor 20
'R' means the descriptor has the RS bit
'D' means the descriptor has the DD + RS bits
'x' are packets in txq (not sent)
'.' are packet already sent but not freed by sw

In this example, we have rs_thres=8. On next call to ixgbe_tx_free_bufs(),
some buffers will be freed.

The new implementation does a binary search (checking for DD) between next_dd
and tail.



Olivier Matz (9):
  ethdev: clarify api comments of rx queue count
  ethdev: move queue id check in generic layer
  ethdev: add handler for Tx queue descriptor count
  net/ixgbe: optimize Rx queue descriptor count
  net/ixgbe: add handler for Tx queue descriptor count
  net/igb: optimize rx queue descriptor count
  net/igb: add handler for tx queue descriptor count
  net/e1000: optimize rx queue descriptor count
  net/e1000: add handler for tx queue descriptor count

 drivers/net/e1000/e1000_ethdev.h |  10 +++-
 drivers/net/e1000/em_ethdev.c    |   1 +
 drivers/net/e1000/em_rxtx.c      | 109 ++++++++++++++++++++++++++++------
 drivers/net/e1000/igb_ethdev.c   |   1 +
 drivers/net/e1000/igb_rxtx.c     | 109 ++++++++++++++++++++++++++++------
 drivers/net/i40e/i40e_rxtx.c     |   5 --
 drivers/net/ixgbe/ixgbe_ethdev.c |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.h |   4 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   | 123 +++++++++++++++++++++++++++++++++------
 drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
 drivers/net/nfp/nfp_net.c        |   6 --
 lib/librte_ether/rte_ethdev.h    |  48 +++++++++++++--
 12 files changed, 344 insertions(+), 75 deletions(-)

-- 
2.8.1



More information about the dev mailing list