[dpdk-dev] [PATCH] doc: announce changes to ethdev rxconf structure

Andrew Rybchenko arybchenko at solarflare.com
Sun Aug 30 14:58:57 CEST 2020


On 8/3/20 7:51 PM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko at solarflare.com>
>> Sent: Monday, August 3, 2020 18:31
>> To: Slava Ovsiienko <viacheslavo at mellanox.com>; dev at dpdk.org
>> Cc: Matan Azrad <matan at mellanox.com>; Raslan Darawsheh
>> <rasland at mellanox.com>; Thomas Monjalon <thomas at monjalon.net>;
>> ferruh.yigit at intel.com; jerinjacobk at gmail.com;
>> stephen at networkplumber.org; ajit.khaparde at broadcom.com;
>> maxime.coquelin at redhat.com; olivier.matz at 6wind.com;
>> david.marchand at redhat.com
>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
>>
>> Hi Slava,
>>
>> On 8/3/20 6:18 PM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thanks for the comment, please, see below.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <arybchenko at solarflare.com>
>>>> Sent: Monday, August 3, 2020 17:31
>>>> To: Slava Ovsiienko <viacheslavo at mellanox.com>; dev at dpdk.org
>>>> Cc: Matan Azrad <matan at mellanox.com>; Raslan Darawsheh
>>>> <rasland at mellanox.com>; Thomas Monjalon <thomas at monjalon.net>;
>>>> ferruh.yigit at intel.com; jerinjacobk at gmail.com;
>>>> stephen at networkplumber.org; ajit.khaparde at broadcom.com;
>>>> maxime.coquelin at redhat.com; olivier.matz at 6wind.com;
>>>> david.marchand at redhat.com
>>>> Subject: Re: [PATCH] doc: announce changes to ethdev
>>>> rxconf structure
>>>>
>>>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>>>>> The DPDK datapath in the transmit direction is very flexible.
>>>>> The applications can build multi-segment packets and manage almost
>>>>> all data aspects - the memory pools the segments are allocated
>>>>> from, the segment lengths, the memory attributes like external,
>>>>> registered, etc.
>>>>>
>>>>> In the receive direction, the datapath is much less flexible: the
>>>>> applications can only specify the memory pool to configure the
>>>>> receive queue and nothing more. In order to extend the receive
>>>>> datapath capabilities it is proposed to add new fields to the
>>>>> rte_eth_rxconf structure:
>>>>>
>>>>> struct rte_eth_rxconf {
>>>>>     ...
>>>>>     uint16_t rx_split_num; /* number of segments to split */
>>>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>>>     ...
>>>>> };
>>>>>
>>>>> A non-zero value of the rx_split_num field configures the receive
>>>>> queue to split ingress packets into multiple segments placed in
>>>>> mbufs allocated from various memory pools according to the specified
>>>>> lengths. A zero value of the rx_split_num field provides backward
>>>>> compatibility and the queue is configured in the regular way (with
>>>>> single/multiple mbufs of the same data buffer length allocated from
>>>>> a single memory pool).
>>>>
>>>> From the above description it is not 100% clear how it will coexist with:
>>>>  - existing mb_pool argument of the rte_eth_rx_queue_setup()
>>>>  - DEV_RX_OFFLOAD_SCATTER
>>>
>>> The DEV_RX_OFFLOAD_SCATTER flag is required to be reported and
>>> configured for the new feature, to indicate that the application is
>>> prepared to handle multi-segment packets.
>>
>> I hope it will be mentioned in the feature documentation in the future, but
>> I'm not 100% sure that it is required. See below.
> I suppose there is a hierarchy:
> - the application configures DEV_RX_OFFLOAD_SCATTER on the port and tells in this way:
> "Hey, driver, I'm ready to handle multi-segment packets". Readiness in general.
> - the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants to split, in a particular way:
> "Hey, driver, please, drop ten bytes here, here and here, and the rest - over there"

My idea is to keep SCATTER and BUFFER_SPLIT independent.
SCATTER is the ability to build multi-segment packets, taking
as many mbufs from the main RxQ mempool as required.
BUFFER_SPLIT is support for multiple mempools and splitting
received packets as specified.

>>>
>>> But SCATTER just tells that the ingress packet length can exceed
>>> the mbuf data buffer length and a chain of mbufs must be built to
>>> store the entire packet. But there is a limitation - all mbufs are
>>> allocated from the same memory pool, and all data buffers have the
>>> same length.
>>> The new feature provides an opportunity to allocate mbufs from the
>>> desired pools and to specify the length of each buffer/part.
>>
>> Yes, it is clear, but what happens if a packet does not fit into the
>> provided pool chain? Is the last pool used many times? Maybe it is
>> logical to use the Rx queue setup mb_pool as well for this purpose?
>> I.e. use the pools suggested here only once and use mb_pool many times
>> for the rest if SCATTER is supported, and only once if SCATTER is not
>> supported.
> 
> It could be easily configured w/o involving the SCATTER flag - just
> specify the last pool multiple times. I.e.:
> pool 0 - 14B
> pool 1 - 20B
> ...
> pool N - 512B
> pool N - 512B
> pool N - 512B, sum of length >= max packet size 1518

I see, but IMHO it is excessive. Pools 0 .. N-1 could be
provided in the buffer split config, and the Nth should be the main
RxQ mempool, plus SCATTER enabled.

> It was supposed that the sum of lengths in the array covers the maximal packet size.
> Currently there is a limitation on packet size; for example, the mlx5 PMD
> just drops packets with a length exceeding the one the queue is configured for.
> 
>>
>>>
>>>>  - DEV_RX_OFFLOAD_HEADER_SPLIT
>>> The new feature (let's name it "BUFFER_SPLIT") might be supported
>>> in conjunction with HEADER_SPLIT (say, split the rest of the data
>>> after the header) or rejected if HEADER_SPLIT is configured on the
>>> port, depending on the PMD implementation (return ENOTSUP if both
>>> features are requested on the same port).
>>
>> OK, consider making SCATTER and BUFFER_SPLIT independent as suggested
>> above.
> Sorry, do you mean HEADER_SPLIT and BUFFER_SPLIT?

See above.

>>
>>>
>>>> How will the application know that the feature is supported? Limitations?
>>> It is subject for further discussion, I see two options:
>>>  - introduce the DEV_RX_OFFLOAD_BUFFER_SPLIT flag
>>
>> +1
> OK, got it.
> 
>>
>>> - return ENOTSUP/EINVAL from rx_queue_setup() if the feature is
>>>   requested (the mp parameter is supposed to be NULL in this case)
>>
>> I'd say that it should be used only for corner cases which are hard to
>> formalize. It could be important to know the maximum number of buffers
>> to split, the total length which could be split from the remainder, and
>> limitations on split lengths.
> Agree, the dedicated OFFLOAD flag seems to be preferable.
> 
> With best regards, Slava
> 
>>
>>>
>>>> Is it always split by specified/fixed length?
>>> Yes, it is a simple feature: it splits the data into buffers with the
>>> required memory attributes provided by the specified pools, according
>>> to the fixed lengths.
>>> It should be OK for protocols like eCPRI or some tunneling.
>>
>> I see. Thanks.
>>
>>>
>>>> What happens if header length is actually different?
>>> It is a per-queue configuration; packets might be sorted between the
>>> queues with the rte_flow engine.
>>> The supposed use case is to filter out specific protocol packets (say
>>> eCPRI with a fixed header length) and split them on a specific Rx queue.
>>
>> Got it.
>>
>> Thanks,
>> Andrew.
>>
>>>
>>>
>>> With best regards,
>>> Slava
>>>
>>>>
>>>>> The new approach would allow splitting ingress packets into
>>>>> multiple parts pushed to memory with different attributes.
>>>>> For example, the packet headers can be pushed to the embedded data
>>>>> buffers within mbufs and the application data into external
>>>>> buffers attached to mbufs allocated from different memory pools.
>>>>> The memory attributes of the split parts may differ as well - for
>>>>> example, the application data may be pushed into external memory
>>>>> located on a dedicated physical device, say a GPU or NVMe. This
>>>>> would improve the DPDK receive datapath flexibility while preserving
>>>>> compatibility with the existing API.
>>>>>
>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo at mellanox.com>
>>>>> ---
>>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>>> b/doc/guides/rel_notes/deprecation.rst
>>>>> index ea4cfa7..cd700ae 100644
>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>> @@ -99,6 +99,11 @@ Deprecation Notices
>>>>>    In 19.11 PMDs will still update the field even when the offload is not
>>>>>    enabled.
>>>>>
>>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>>>>> +  queues to split ingress packets into multiple segments according to the
>>>>> +  specified lengths into the buffers allocated from the specified
>>>>> +  memory pools. The backward compatibility to existing API is preserved.
>>>>> +
>>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>>>>    will be deprecated in 20.11 and will be removed in 21.11.
>>>>>    Existing ``rte_eth_rx_descriptor_status`` and
>>>>> ``rte_eth_tx_descriptor_status``
>>>
> 


