[dpdk-dev] TX performance regression caused by the mbuf cacheline split

Paul Emmerich emmericp at net.in.tum.de
Tue May 12 01:18:31 CEST 2015


Found a really simple solution that almost restores the original 
performance: just add a prefetch on alloc. For some reason, I assumed 
that this was already done since the troublesome commit I investigated 
mentioned something about prefetching... I guess the commit referred to 
the hardware prefetcher in the CPU.

Adding an explicit prefetch instruction in the mbuf alloc function gives a 
throughput of 12.7 Mpps with the simple tx path and 10.35 Mpps with the 
full-featured tx path in my benchmark.
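
To illustrate the idea, here is a minimal, self-contained sketch of a 
prefetch on alloc. It is not the actual patch: toy_mbuf, toy_pool and 
toy_alloc are made-up names, the 64-byte cache line is assumed, and 
which cache line to prefetch is also an assumption (here the second 
one, as if the alloc/tx code writes to it right afterwards). 
__builtin_prefetch is the GCC/Clang builtin; it is only a hint and 
never changes program results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size */
#define POOL_SIZE  8    /* tiny pool, for illustration only */

/* Hypothetical buffer header spanning two cache lines. */
struct toy_mbuf {
    /* first cache line */
    void *buf_addr;
    uint16_t data_len;
    /* second cache line (forced by the alignment attribute) */
    struct toy_mbuf *next __attribute__((aligned(CACHE_LINE)));
    uint64_t tx_offload;
} __attribute__((aligned(CACHE_LINE)));

_Static_assert(offsetof(struct toy_mbuf, next) == CACHE_LINE,
               "next must start the second cache line");

/* Hypothetical pool: a fixed array plus a stack of free pointers. */
struct toy_pool {
    struct toy_mbuf bufs[POOL_SIZE];
    struct toy_mbuf *free_list[POOL_SIZE];
    size_t n_free;
};

void toy_pool_init(struct toy_pool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->free_list[i] = &p->bufs[i];
    p->n_free = POOL_SIZE;
}

/* Pop a buffer and hint the CPU to fetch its second cache line. */
struct toy_mbuf *toy_alloc(struct toy_pool *p)
{
    if (p->n_free == 0)
        return NULL;

    struct toy_mbuf *m = p->free_list[--p->n_free];

    /* rw = 1: we expect to write; locality = 3: keep in all cache levels */
    __builtin_prefetch(&m->next, 1, 3);
    return m;
}
```

The point of issuing the hint inside the alloc function is that the 
caller usually does some other work before touching the buffer, which 
gives the memory access time to complete in the background.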

DPDK 1.7.1 was at 14.1/10.7 Mpps. I guess I can live with that, since 
I'm primarily interested in the full-featured path and the drop from 
10.7 to ~10.4 was due to another change.

Patch: https://github.com/dpdk-org/dpdk/pull/2
I also sent an email to the mailing list.

I think the rx path could also benefit from prefetching somewhere.


Paul


