[PATCH] eal: non-temporal memcpy
Morten Brørup
mb at smartsharesystems.com
Mon Oct 10 09:35:08 CEST 2022
Mattias, Konstantin, Honnappa, Stephen,
In my patch for non-temporal memcpy, I have been aiming to use non-temporal stores as much as possible. E.g. copying 16 bytes to a 16-byte aligned address will be done using non-temporal store instructions.
Now, I am seriously considering this alternative:
Only using non-temporal stores for complete cache lines, and using normal stores for partial cache lines.
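The cache-line-only alternative could look roughly like this. It is a minimal sketch, not the code in the patch. It assumes x86-64 with SSE2 and 64-byte cache lines, and the names memcpy_nt_full_lines(), nt_copy_line() and CACHE_LINE_SIZE are hypothetical. The partial head and tail go through ordinary memcpy(), so only complete destination cache lines are written with _mm_stream_si128().

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_stream_si128 */
#include <xmmintrin.h> /* SSE: _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical helper: copy one complete cache line using non-temporal
 * stores. dst must be cache-line aligned; src may be unaligned. */
static void nt_copy_line(void *dst, const void *src)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    assert((uintptr_t)dst % CACHE_LINE_SIZE == 0);
    for (int i = 0; i < CACHE_LINE_SIZE / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
}

/* Hypothetical sketch: non-temporal stores only for complete destination
 * cache lines; partial lines at the start and end use normal stores. */
static void memcpy_nt_full_lines(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Head: bytes up to the next cache line boundary (normal stores). */
    size_t head = (CACHE_LINE_SIZE - (uintptr_t)d % CACHE_LINE_SIZE) % CACHE_LINE_SIZE;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    /* Body: complete, aligned cache lines (non-temporal stores). */
    while (len >= CACHE_LINE_SIZE) {
        nt_copy_line(d, s);
        d += CACHE_LINE_SIZE;
        s += CACHE_LINE_SIZE;
        len -= CACHE_LINE_SIZE;
    }

    /* Tail: remaining partial cache line (normal stores). */
    memcpy(d, s, len);

    /* Non-temporal stores are weakly ordered; fence before later stores. */
    _mm_sfence();
}
```

The trailing _mm_sfence() is there because non-temporal stores are weakly ordered on x86; whether a real implementation should fence inside the copy or leave that to the caller is a separate design choice.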
I think it will make things simpler when an application mixes normal and non-temporal stores, e.g. an application writing metadata (a pcap header) followed by packet data.
The disadvantage is that copying a burst of 32 packets will, in the worst case, pollute 64 cache lines (one at the start and one at the end of each packet's copied data), i.e. 4 KiB of data cache with 64-byte cache lines. If copying to a consecutive memory area, e.g. a packet capture buffer, it will pollute 33 cache lines (because the start of packet #2 is in the same cache line as the end of packet #1, etc.).
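The pollution arithmetic can be checked with a small counting sketch. It simulates destination addresses as plain integers, assumes 64-byte cache lines, and counts the distinct cache lines that each copy covers only partially, i.e. the lines that would receive normal stores under the complete-lines-only policy. struct copy_range and count_partial_lines() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical destination range of one packet copy (simulated address). */
struct copy_range {
    size_t dst;
    size_t len;
};

static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Count distinct cache lines covered only partially by a burst of copies,
 * i.e. lines written with normal (cache-polluting) stores.
 * `lines` is scratch space with room for 2 * n entries. */
static size_t count_partial_lines(const struct copy_range *r, size_t n, size_t *lines)
{
    size_t cnt = 0;

    for (size_t i = 0; i < n; i++) {
        if (r[i].len == 0)
            continue;
        size_t start = r[i].dst;
        size_t end = start + r[i].len; /* one past the last byte */
        assert(end > start);           /* reject address wraparound */
        if (start % CACHE_LINE_SIZE != 0)
            lines[cnt++] = start / CACHE_LINE_SIZE; /* partial first line */
        if (end % CACHE_LINE_SIZE != 0)
            lines[cnt++] = end / CACHE_LINE_SIZE;   /* partial last line */
    }

    /* Lines shared by adjacent copies must only be counted once. */
    qsort(lines, cnt, sizeof(lines[0]), cmp_size);
    size_t distinct = 0;
    for (size_t i = 0; i < cnt; i++)
        if (i == 0 || lines[i] != lines[i - 1])
            distinct++;
    return distinct;
}
```

For 32 packets of 100 bytes, each starting at a misaligned address in its own 4 KiB region, this yields 64 lines (64 × 64 B = 4 KiB). With the same packets packed back to back from offset 1, adjacent packets share their boundary lines and the count drops to 33.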
What do you think?
PS: Non-temporal loads are easy to work with, so don't worry about that.
Med venlig hilsen / Kind regards,
-Morten Brørup