[PATCH 1/1] mempool: implement index-based per core cache

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Jan 13 11:37:29 CET 2022


Hi Dharmik,

> >
> >> The current mempool per-core cache implementation stores pointers to mbufs.
> >> On 64-bit architectures, each pointer consumes 8B.
> >> This patch replaces it with an index-based implementation,
> >> wherein each buffer is addressed by (pool base address + index).
> >> It reduces the amount of memory/cache required for the per-core cache.
> >>
> >> L3Fwd performance testing reveals minor improvements in cache
> >> performance (L1 and L2 misses reduced by 0.60%),
> >> with no change in throughput.
> >
> > I am really sceptical about this patch and the whole idea in general:
> > - From what I read above, there is no real performance improvement observed.
> >   (In fact, on my IA boxes mempool_perf_autotest reports a ~20% slowdown;
> >   see below for more details.)
> 
> Currently, the optimizations (loop unrolling and vectorization) are only implemented for ARM64.
> Similar optimizations can be implemented for x86 platforms, which should close the performance gap
> and, in my understanding, should give better performance for a bulk size of 32.

That might be, but I still don't see the reason for such an effort.
As you mentioned, there is no performance improvement in 'real' apps (l3fwd, etc.)
on ARM64, even with the vectorized version of the code.
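
Just to make sure we are talking about the same scheme, here is a rough sketch
of the idea as I understand it (the identifiers below are made up for
illustration, they are not the actual patch code):

#include <stdint.h>

/* Store a 4B offset from the pool base in the per-core cache
 * instead of an 8B object pointer. */
static inline uint32_t
obj_to_index(void *pool_base, void *obj)
{
	/* only valid if obj lies within 4GB above pool_base */
	return (uint32_t)((uintptr_t)obj - (uintptr_t)pool_base);
}

static inline void *
index_to_obj(void *pool_base, uint32_t idx)
{
	return (void *)((uintptr_t)pool_base + idx);
}

Either way (byte offset or a true object index), the trade-off is an extra
add/sub per object on every get/put in exchange for halving the size of a
cache entry.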

> > - The space utilization difference looks negligible too.
> 
> Sorry, I did not understand this point.

As I understand it, one of the expectations from that patch was to
reduce the memory/cache required, which should improve cache utilization
(fewer misses, etc.).
Though I think such improvements would be negligible and wouldn't
cause any real performance gain.
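
To put some rough numbers behind that (assuming a 512-entry per-core cache,
purely illustrative - not the exact layout used by rte_mempool_cache):

#include <stdint.h>

/* per-core cache storage for 512 entries */
static void    *cache_objs_ptr[512]; /* 512 * 8B = 4096B -> 64 cache lines of 64B */
static uint32_t cache_objs_idx[512]; /* 512 * 4B = 2048B -> 32 cache lines of 64B */

So the saving is around 2KB per lcore cache, which is small compared to the
overall working set of a typical run-to-completion app.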

> > - The change introduces a new build-time config option with a major limitation:
> >   All memzones in a pool have to be within the same 4GB boundary.
> >   To address it properly, extra changes will be required in init(/populate) part of the code.
> 
> I agree with the above-mentioned challenges, and I am currently working on resolving these issues.
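
For reference, this is roughly the kind of extra check the populate path would
need with this scheme (a sketch only - the function name and where it would be
called from are hypothetical):

#include <stdint.h>
#include <stddef.h>

/* Reject a memory chunk that cannot be addressed as
 * pool_base + 32-bit offset. Assumes chunk_addr >= pool_base. */
static int
chunk_within_4g(void *pool_base, void *chunk_addr, size_t chunk_len)
{
	uintptr_t off_end = ((uintptr_t)chunk_addr + chunk_len) -
			    (uintptr_t)pool_base;

	return off_end <= UINT32_MAX ? 0 : -1;
}

Making sure that condition always holds (or handling its failure gracefully)
is exactly the extra init/populate work I was referring to.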

I still think that to justify such changes, some really noticeable performance
improvement needs to be demonstrated: a double-digit speedup for l3fwd/ipsec-secgw/...
Otherwise it is just not worth the hassle.
 
> >   All that will complicate the mempool code, making it more error-prone
> >   and harder to maintain.
> > But, as there is no real gain in return, there is no point in adding such extra complexity at all.
> >
> > Konstantin
> >

