Bug in rte_mempool_do_generic_get?

Morten Brørup mb at smartsharesystems.com
Fri Feb 24 13:13:19 CET 2023


> From: Harris, James R [mailto:james.r.harris at intel.com] 
> Sent: Friday, 24 February 2023 04.03
> 
> Hi,
> 
> I've tracked down a regression in SPDK to DPDK commit a2833ecc5 ("mempool: fix get objects from mempool with cache").

The problem probably goes all the way back to the introduction of the cache flush threshold, which effectively increased the cache size to 1.5 times the configured cache size, in this commit:
http://git.dpdk.org/dpdk/commit/lib/librte_mempool/rte_mempool.h?id=ea5dd2744b90b330f07fd10f327ab99ef55c7266

It might even go further back.

> 
> Here's an example that demonstrates the problem:
> 
> Allocate mempool with 2048 buffers and cache size 256.
> Core 0 allocates 512 buffers.  Mempool pulls 512 + 256 buffers from backing ring, returns 512 of them to caller, puts the other 256 in core 0 cache.  Backing ring now has 1280 buffers.
> Core 1 allocates 512 buffers.  Mempool pulls 512 + 256 buffers from backing ring, returns 512 of them to caller, puts the other 256 in core 1 cache.  Backing ring now has 512 buffers.
> Core 2 allocates 512 buffers.  Mempool pulls remaining 512 buffers from backing ring and returns all of them to caller.  Backing ring now has 0 buffers.
> Core 3 tries to allocate 512 buffers and it fails.
> 
> In the SPDK case, we don't really need or use the mempool cache in this case, so changing the cache size to 0 fixes the problem and is what we're going to move forward with.

If you are not making get/put requests smaller than the cache size, then yes, having no cache is the best solution.

> 
> But the behavior did cause a regression so I thought I'd mention it here.

Thank you.

> If you have a mempool with 2048 objects, shouldn't 4 cores each be able to do a 512 buffer bulk get, regardless of the configured cache size?

No, the scenario you described above is the expected behavior. Objects held in one core's cache are unavailable to other cores. I think this is documented somewhere, but I cannot find where right now.

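To make the accounting concrete, here is a rough toy model of the scenario quoted above. It is not DPDK code: it only tracks object counts, and struct toy_pool, toy_get_bulk() and toy_run() are names made up for this sketch. It assumes, as in the description, that each core's cache is empty when it does its bulk get, and that a get first tries to pull n + cache size from the backing ring, then falls back to n.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CORES 8

/* Toy model only: counters instead of a real ring and object pointers. */
struct toy_pool {
    unsigned ring;                  /* objects left in the backing ring */
    unsigned cache[TOY_MAX_CORES];  /* objects parked in each core's cache */
    unsigned cache_size;            /* configured per-core cache size */
};

/* Bulk get of n objects on one core whose cache is assumed to be empty. */
static bool toy_get_bulk(struct toy_pool *p, unsigned core, unsigned n)
{
    assert(core < TOY_MAX_CORES);
    unsigned want = n + p->cache_size;

    if (p->ring >= want) {
        /* Enough in the ring to serve the request and fill the cache. */
        p->ring -= want;
        p->cache[core] += p->cache_size;
        return true;
    }
    if (p->ring >= n) {
        /* Only enough for the request itself; the cache stays empty. */
        p->ring -= n;
        return true;
    }
    /* Objects parked in other cores' caches cannot be used here. */
    return false;
}

/* One bulk get of n on each of n_cores cores; returns how many succeeded. */
static unsigned toy_run(unsigned pool_size, unsigned cache_size,
                        unsigned n_cores, unsigned n)
{
    struct toy_pool p = { .ring = pool_size, .cache_size = cache_size };
    unsigned ok = 0;

    assert(n_cores <= TOY_MAX_CORES);
    for (unsigned core = 0; core < n_cores; core++)
        if (toy_get_bulk(&p, core, n))
            ok++;
    return ok;
}
```

In this model, toy_run(2048, 256, 4, 512) succeeds on three cores only, while toy_run(2048, 0, 4, 512) succeeds on all four, matching the reported behavior and the cache size 0 workaround.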

Furthermore, since the effective per-core cache size is 1.5 * configured cache size, a configured cache size of 256 may leave up to 384 objects in each per-core cache.

With 4 cores, you can expect up to 3 * 384 = 1152 objects sitting in the caches of the other cores. If you want to be able to pull 512 objects with each core, the pool size should be 4 * 512 + 1152 = 3200 objects.

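The same arithmetic as a small sketch, in case it helps with sizing other pools. The helper names are hypothetical, not DPDK APIs, and the 1.5 factor is the effective cache size mentioned above.

```c
#include <assert.h>

/* Hypothetical helper: worst-case number of objects left in one per-core
 * cache, taken as 1.5 * the configured cache size. */
static unsigned worst_case_cache_len(unsigned cache_size)
{
    return cache_size + cache_size / 2;
}

/* Hypothetical helper: pool size that lets every core do one bulk get of
 * 'bulk' objects while the other cores' caches are full to the worst case. */
static unsigned pool_size_for_bulk_get(unsigned n_cores, unsigned bulk,
                                       unsigned cache_size)
{
    assert(n_cores > 0);
    return n_cores * bulk + (n_cores - 1) * worst_case_cache_len(cache_size);
}
```

For the numbers in this thread, pool_size_for_bulk_get(4, 512, 256) gives 3200, and with a cache size of 0 it gives 4 * 512 = 2048.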

