[dpdk-dev] A question about (poor) rte_ethdev internal rx/tx callbacks design

Andrew Rybchenko arybchenko at solarflare.com
Tue Nov 14 07:24:27 CET 2017


On 11/13/2017 10:33 PM, Ilya Matveychikov wrote:
>> On Nov 13, 2017, at 9:15 PM, Adrien Mazarguil <adrien.mazarguil at 6wind.com> wrote:
>>
>> On Mon, Nov 13, 2017 at 02:56:23PM +0400, Ilya Matveychikov wrote:
>>>> On Nov 13, 2017, at 2:39 PM, Adrien Mazarguil <adrien.mazarguil at 6wind.com> wrote:
>>>>
>>>> On Sat, Nov 11, 2017 at 09:18:45PM +0400, Ilya Matveychikov wrote:
>>>>> Folks,
>>>>>
>>>>> Are you serious about this:
>>>>>
>>>>> typedef uint16_t (*eth_rx_burst_t)(void *rxq,
>>>>> 				   struct rte_mbuf **rx_pkts,
>>>>> 				   uint16_t nb_pkts);
>>>>> typedef uint16_t (*eth_tx_burst_t)(void *txq,
>>>>> 				   struct rte_mbuf **tx_pkts,
>>>>> 				   uint16_t nb_pkts);
>>>>>
>>>>> I’m not surprised that every PMD stores port_id in each and every queue, since having just the queue as an argument doesn’t let the driver get at the device. So the question is: why not use something like:
>>>>>
>>>>> typedef uint16_t (*eth_rx_burst_t)(void *dev, uint16_t queue_id,
>>>>> 				   struct rte_mbuf **rx_pkts,
>>>>> 				   uint16_t nb_pkts);
>>>>> typedef uint16_t (*eth_tx_burst_t)(void *dev, uint16_t queue_id,
>>>>> 				   struct rte_mbuf **tx_pkts,
>>>>> 				   uint16_t nb_pkts);
>>>> I assume it's because the rte_eth_[rt]x_burst() wrappers already pay the price
>>>> for that indirection; doing it twice would be redundant.
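>>>>
>>>> For reference, the inline wrapper already looks roughly like this
>>>> (simplified; debug checks and the RX callback hooks are omitted):
>>>>
>>>> static inline uint16_t
>>>> rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
>>>> 		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
>>>> {
>>>> 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>>>>
>>>> 	/* dev is resolved once here; the PMD callback only ever
>>>> 	 * receives the opaque per-queue pointer */
>>>> 	return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>> 				    rx_pkts, nb_pkts);
>>>> }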
>>> Agreed, no need to do it twice. We can pass the dev pointer along with the queue pointer, not
>>> just the queue’s index.
>>>
>>>> Basically the cost of storing a back-pointer to dev or a queue index in each
>>>> Rx/Tx queue structure is minor compared to saving a couple of CPU cycles
>>>> wherever we can.
>>> Not sure about that. More data to store means more cache space occupied. Note that every queue
>>> carries at least 4 bytes more than it actually needs, and RTE_MAX_QUEUES_PER_PORT defaults
>>> to 1024, so we may end up with 4 KB of extra data per port....
>> Note that queues are only allocated if requested by the application, so there's
>> really not much overhead involved.
> Yeah, you are mostly right here.
>
>> Also to echo Konstantin's reply and clarify mine, PMDs normally do not
>> access this structure from their data plane. This pointer, if needed, is
>> normally stored away from hot regions accessed during TX/RX, usually at the
>> end of TX/RX structures and only for the convenience of management
>> operations. It therefore has no measurable impact on the CPU cache.
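>>
>> To illustrate the typical pattern (purely illustrative, not taken from any
>> specific PMD; the struct and field names are made up):
>>
>> struct pmd_rxq {
>> 	/* hot: touched on every rx burst */
>> 	volatile void *rx_ring;
>> 	struct rte_mempool *mp;
>> 	uint16_t rx_tail;
>> 	uint16_t nb_rx_desc;
>>
>> 	/* cold: control path only (setup, stats, release) */
>> 	uint16_t queue_id;
>> 	uint16_t port_id;
>> 	struct pmd_adapter *adapter;	/* back-pointer, never on fast path */
>> };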
>>
> I did some research into how the drivers implement their rx/tx queues, and now I want to share the
> findings and some thoughts about them (see the details at the end):
>
> 1) All drivers define their tx/rx queues as structures
> 2) The current design assumes that it's enough to pass an opaque rx/tx queue pointer to the driver,
>     and frankly speaking it is. But..
> 3) Most drivers want not only the queue pointer but at least the queue_id and port_id as well, and
>     most of them also want a pointer to the device's internal data.
>
> The way each driver solves issue (3) is data duplication. In other words, every queue ends up carrying
> this information (queue_id, port_id and a dev_priv pointer) inside.
>
> My question was, and still is, about this design. I'm not sure it's the best way to do it, keeping in
> mind that queue_id could be calculated using pointer difference and port_id could be stored just
> once per device (see the sketch below). But that would require changing the internal interface, sure.
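>
> A hypothetical sketch of what I mean (assuming the PMD keeps its rx queues
> in one contiguous per-device array; all names here are made up):
>
> struct my_rxq {
> 	uint16_t nb_desc;	/* ... plus the usual per-queue ring state */
> };
>
> struct my_priv {
> 	uint16_t port_id;	/* stored once per device */
> 	struct my_rxq rxq[RTE_MAX_QUEUES_PER_PORT];
> };
>
> static inline uint16_t
> my_rxq_index(const struct my_priv *priv, const struct my_rxq *rxq)
> {
> 	/* queue_id recovered by pointer difference instead of being
> 	 * duplicated inside every queue structure */
> 	return (uint16_t)(rxq - priv->rxq);
> }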
>
> And as I promised here is the result of the research on rx/tx queues:

<...>

> drivers/net/sfc:
>    struct sfc_ef10_rxq { dp { queue_id, port_id } }
>    struct sfc_ef10_txq { dp { queue_id, port_id } }

These are not used on the data/fast path (as Adrien stated above), so they
do not affect cache usage efficiency etc.

