[PATCH v2] eal: fix unaligned loads/stores in rte_memcpy_generic

Luc Pelletier lucp.at.work at gmail.com
Sun Jan 16 15:09:49 CET 2022


> X86 always allows unaligned access. Irregardless of what tools say.
> Why impose additional overhead in performance critical code.

Let me preface my response by saying that I'm not a C compiler developer.
Hopefully someone who is will read this and chime in.

I agree that x86 allows unaligned loads and stores. However, the C standard
doesn't: accessing an object through a pointer that is not correctly aligned
for its type is undefined behavior. This means that the code relies on
undefined behavior. It may do the right thing all the time, almost all the
time, some of the time... it's undefined. It may work now but it may stop
working in the future.
Here's a good discussion on Stack Overflow about unaligned accesses in C on
x86:

https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment/46790815#46790815

There's no way to do the unaligned store/load in C (that I know of) without
invoking undefined behavior. I can see two options: either write the code in
assembly, or use some other C construct that doesn't rely on undefined
behavior.
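
As a rough illustration of the second option, here is a minimal sketch. It is
not the patch itself; copy_bytes and load_u64_unaligned are hypothetical names
used only for this example. It contrasts the cast-and-dereference pattern
(shown only in a comment, since it is undefined on a misaligned pointer) with
a byte-wise copy through unsigned char pointers, which is well defined for any
alignment.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Undefined behavior if p is not suitably aligned for uint64_t,
 * even though the x86 hardware would execute the load just fine:
 *
 *     uint64_t v = *(const uint64_t *)p;
 */

/* Hypothetical helper: byte-wise copy, well defined for any alignment. */
static inline void
copy_bytes(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < n; i++)
		d[i] = s[i];
}

/* Hypothetical helper: read 8 bytes from a possibly misaligned address. */
static inline uint64_t
load_u64_unaligned(const void *p)
{
	uint64_t v;

	copy_bytes(&v, p, sizeof(v));
	return v;
}
```

Whether the compiler turns the loop into a single load depends on the compiler
and optimization level; the point here is only that the source no longer
relies on undefined behavior.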

While the for loop may seem slower than the other options, it surprisingly
results in fewer load/store operations in certain scenarios. For example, if
n == 15 and it's known at compile-time, the compiler will generate 2
overlapping qword load/store operations (rather than the 4 that the current
code performs).
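
To make the n == 15 case concrete, here is a small sketch under stated
assumptions. The function names are hypothetical and this is not the patch
code. copy15 is the plain loop with a constant length; copy15_overlap spells
out by hand which bytes two overlapping 8-byte copies would cover (0..7 and
7..14). Actual code generation depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: plain byte loop, no alignment assumptions. */
static inline void
copy_loop(uint8_t *dst, const uint8_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/* The length is a compile-time constant, so the compiler is free to
 * unroll or widen the loop as it sees fit. */
static inline void
copy15(uint8_t *dst, const uint8_t *src)
{
	copy_loop(dst, src, 15);
}

/* Same result written as two 8-byte chunks that overlap by one byte:
 * bytes 0..7, then bytes 7..14. Byte 7 is simply copied twice.
 * Assumes dst and src do not overlap each other. */
static inline void
copy15_overlap(uint8_t *dst, const uint8_t *src)
{
	copy_loop(dst, src, 8);
	copy_loop(dst + 7, src + 7, 8);
}
```

Both functions leave the same 15 bytes in dst; the second form only shows why
two 8-byte operations are enough to cover a 15-byte copy.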

All that being said, I can go back to something similar to my first patch,
using inline assembly and making sure this time that it works for 32-bit too.
I will post a patch in a few minutes that does exactly that. Maintainers can
then chime in with their preferred option.

