[dpdk-dev] [RFC] mempool: implement index-based per core cache

Dharmik Thakkar Dharmik.Thakkar at arm.com
Wed Nov 3 16:12:45 CET 2021


Hi,

Thank you everyone for the comments! I am currently working on making the global pool ring’s implementation index based as well.
Once done, I will send a patch for community review. I will also make it a compile-time option.

> On Oct 31, 2021, at 3:14 AM, Morten Brørup <mb at smartsharesystems.com> wrote:
> 
>> From: Morten Brørup
>> Sent: Saturday, 30 October 2021 12.24
>> 
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Honnappa
>>> Nagarahalli
>>> Sent: Monday, 4 October 2021 18.36
>>> 
>>> <snip>
>>>> 
>>>> 
>>>>>>> Current mempool per core cache implementation is based on
>>>>>>> pointers. For most architectures, each pointer consumes 64 bits.
>>>>>>> Replace it with an index-based implementation, wherein each
>>>>>>> buffer is addressed by (pool address + index).
>> 
>> I like Dharmik's suggestion very much. CPU cache is a critical and
>> limited resource.
>> 
>> DPDK has a tendency of using pointers where indexes could be used
>> instead. I suppose pointers provide the additional flexibility of
>> mixing entries from different memory pools, e.g. multiple mbuf pools.
>> 

Agreed, thank you!
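For context, a minimal sketch of the idea (illustrative only, not the actual patch; the structure and helper names below are made up): the per-core cache holds 32-bit indices instead of 64-bit pointers, and the object address is recovered from (pool base address + index).

#include <stdint.h>

/* Illustrative only: a cache entry shrinks from 8 bytes to 4 bytes,
 * so the same number of cached objects needs half the CPU cache. */
struct idx_cache {
        uint32_t len;
        uint32_t objs[512];     /* 2 KB instead of 4 KB for 512 pointers */
};

/* Recover the object address from the pool base and the stored index
 * (here the index is simply a byte offset from the pool base). */
static inline void *
idx2obj(void *pool_base, uint32_t idx)
{
        return (char *)pool_base + idx;
}

/* Convert an object address back to an index on cache put. */
static inline uint32_t
obj2idx(void *pool_base, void *obj)
{
        return (uint32_t)((uintptr_t)obj - (uintptr_t)pool_base);
}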

>>>>>> 
>>>>>> I don't think it is going to work:
>>>>>> On 64-bit systems the difference between the pool address and its
>>>>>> element address could be bigger than 4 GB.
>>>>> Are you talking about a case where the memory pool size is more
>>>>> than 4 GB?
>>>> 
>>>> That is one possible scenario.
>> 
>> That could be solved by making the index an element index instead of a
>> pointer offset: address = (pool address + index * element size).
> 
> Or instead of scaling the index with the element size, which is only known at run time, the index could be more efficiently scaled by a compile-time constant such as RTE_MEMPOOL_ALIGN (= RTE_CACHE_LINE_SIZE). With a cache line size of 64 bytes, that would allow indexing into mempools up to 256 GB in size.
> 
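A rough sketch of the two scaling options being discussed (illustrative only; MEMPOOL_ALIGN below stands in for RTE_MEMPOOL_ALIGN). Scaling by a compile-time constant of 64 bytes lets a 32-bit index reach 2^32 * 64 B = 256 GB, but it requires every object to start at a 64-byte aligned offset within the pool:

#include <stdint.h>
#include <stddef.h>

#define MEMPOOL_ALIGN 64        /* stand-in for RTE_MEMPOOL_ALIGN */

/* Option 1: index scaled by the element size, known only at run time
 * (one integer multiply per conversion). */
static inline void *
idx2obj_eltsz(void *base, uint32_t idx, size_t total_elt_sz)
{
        return (char *)base + (size_t)idx * total_elt_sz;
}

/* Option 2: index scaled by a compile-time constant; the multiply
 * becomes a shift, and 2^32 * 64 B = 256 GB is addressable. */
static inline void *
idx2obj_align(void *base, uint32_t idx)
{
        return (char *)base + (size_t)idx * MEMPOOL_ALIGN;
}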

Looking at this snippet [1] from rte_mempool_op_populate_helper(), an ‘offset’ is added to prevent objects from crossing page boundaries. If my understanding is correct, using the element index instead of a pointer offset will pose a challenge for some of these corner cases.

[1]
        for (i = 0; i < max_objs; i++) {                                           
                /* avoid objects to cross page boundaries */
                if (check_obj_bounds(va + off, pg_sz, total_elt_sz) < 0) {
                        off += RTE_PTR_ALIGN_CEIL(va + off, pg_sz) - (va + off);
                        if (flags & RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ)
                                off += total_elt_sz -
                                        (((uintptr_t)(va + off - 1) %
                                                total_elt_sz) + 1);
                }
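To illustrate the corner case with made-up numbers (pg_sz = 4096, total_elt_sz = 1000, and a simplified offset-only version of the check above): once the helper skips to the next page, the n-th element no longer sits at base + n * total_elt_sz, so a plain element index can no longer be turned back into an address with a single multiply.

#include <stdio.h>
#include <stddef.h>

int main(void)
{
        const size_t pg_sz = 4096, total_elt_sz = 1000; /* made-up sizes */
        size_t off = 0;
        unsigned int i;

        for (i = 0; i < 6; i++) {
                /* simplified version of the page-boundary check above:
                 * skip to the next page if the object would cross it */
                if (off / pg_sz != (off + total_elt_sz - 1) / pg_sz)
                        off = (off / pg_sz + 1) * pg_sz;
                printf("elt %u: actual off %zu, i * total_elt_sz = %zu\n",
                       i, off, i * total_elt_sz);
                off += total_elt_sz;
        }
        /* elt 4 lands at offset 4096, not 4000, and elt 5 at 5096, not
         * 5000: a byte offset (or a per-element table) is needed, not
         * just the element number. */
        return 0;
}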

>> 
>>>> Another possibility - the user populates the mempool himself with
>>>> some external memory by calling rte_mempool_populate_iova() directly.
>>> Is the concern that IOVA might not be contiguous for all the memory
>>> used by the mempool?
>>> 
>>>> I suppose such a situation can even occur with normal
>>>> rte_mempool_create(), though it should be a really rare one.
>>> All in all, this feature needs to be configurable during compile
>>> time.
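One possible shape for such a compile-time switch (purely a sketch; the option name is made up for illustration):

#include <stdint.h>

#ifdef RTE_MEMPOOL_INDEX_BASED_CACHE   /* hypothetical config option */
typedef uint32_t mempool_cache_obj_t;  /* 4 bytes per cached object */
#else
typedef void *mempool_cache_obj_t;     /* 8 bytes per cached object */
#endif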
> 


