[PATCH 1/1] mempool: implement index-based per core cache
Ananyev, Konstantin
konstantin.ananyev at intel.com
Thu Jan 13 11:37:29 CET 2022
Hi Dharmik,
> >
> >> Current mempool per core cache implementation stores pointers to mbufs
> >> On 64b architectures, each pointer consumes 8B
> >> This patch replaces it with index-based implementation,
> >> wherein each buffer is addressed by (pool base address + index)
> >> It reduces the amount of memory/cache required for per core cache
> >>
> >> L3Fwd performance testing reveals minor improvements in the cache
> >> performance (L1 and L2 misses reduced by 0.60%)
> >> with no change in throughput
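
For clarity, the index-based addressing described above can be sketched roughly as follows (a minimal illustration under the patch's stated same-4GB-range assumption; the function names are hypothetical and not taken from the actual patch):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of index-based cache entries: store a 4-byte offset from the
 * pool base instead of an 8-byte pointer. This only works if every
 * object in the pool lies within 4GB of pool_base. */
static inline uint32_t
obj_to_index(const void *pool_base, const void *obj)
{
	uintptr_t off = (uintptr_t)obj - (uintptr_t)pool_base;
	assert(off <= UINT32_MAX);	/* same-4GB-boundary limitation */
	return (uint32_t)off;
}

static inline void *
index_to_obj(void *pool_base, uint32_t idx)
{
	return (void *)((uintptr_t)pool_base + idx);
}
```

Halving each cache entry from 8B to 4B doubles how many entries fit per cache line, which is where the hoped-for L1/L2 miss reduction would come from.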
> >
> > I feel really sceptical about that patch and the whole idea in general:
> > - From what I read above there is no real performance improvement observed.
> > (In fact on my IA boxes mempool_perf_autotest reports ~20% slowdown,
> > see below for more details).
>
> Currently, the optimizations (loop unroll and vectorization) are only implemented for ARM64.
> Similar optimizations can be implemented for x86 platforms which should close the performance gap
> and in my understanding should give better performance for a bulk size of 32.
Might be, but I still don't see the reason for such effort.
As you mentioned, there is no performance improvement in 'real' apps (l3fwd, etc.)
on ARM64, even with the vectorized version of the code.
> > - The space utilization difference looks negligible too.
>
> Sorry, I did not understand this point.
As I understand it, one of the expectations of this patch was to
reduce the memory/cache footprint, which should improve cache utilization
(fewer misses, etc.).
Though I think such improvements would be negligible and wouldn't
cause any real performance gain.
> > - The change introduces a new build time config option with a major limitation:
> > All memzones in a pool have to be within the same 4GB boundary.
> > To address it properly, extra changes will be required in init(/populate) part of the code.
>
> I agree with the above-mentioned challenges, and I am currently working on resolving these issues.
I still think that to justify such changes some really noticeable performance
improvement needs to be demonstrated: double-digit speedup for l3fwd/ipsec-secgw/...
Otherwise it is just not worth the hassle.
> > All that will complicate mempool code, will make it more error prone
> > and harder to maintain.
> > But, as there is no real gain in return - no point to add such extra complexity at all.
> >
> > Konstantin
> >