[PATCH v2] eal: fix unaligned loads/stores in rte_memcpy_generic

Stephen Hemminger stephen at networkplumber.org
Sun Jan 16 17:32:20 CET 2022


On Sun, 16 Jan 2022 09:09:49 -0500
Luc Pelletier <lucp.at.work at gmail.com> wrote:

> > X86 always allows unaligned access, regardless of what tools say.
> > Why impose additional overhead in performance-critical code?
> 
> Let me preface my response by saying that I'm not a C compiler developer.
> Hopefully someone who is will read this and chime in.
> 
> I agree that X86 allows unaligned store/load. However, the C standard doesn't,
> and says that it's undefined behavior. This means that the code relies on
> undefined behavior. It may do the right thing all the time, almost all the time,
> some of the time... it's undefined. It may work now but it may stop
> working in the future.
> Here's a good discussion on SO about unaligned accesses in C on x86:
> 
> https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment/46790815#46790815
> 
> There's no way to do the unaligned store/load in C (that I know of) without
> invoking undefined behavior. I can see 2 options: either write the code in
> assembly, or use some other C construct that doesn't rely on undefined
> behavior.
> 
> While the for loop may seem slower than the other options, it surprisingly
> results in fewer load/store operations in certain scenarios. For example, if
> n == 15 and it's known at compile-time, the compiler will generate 2
> overlapping qword load/store operations (rather than the 4 that the current
> code performs).
> 
> All that being said, I can go back to something similar to my first patch,
> using inline assembly and making sure this time that it works for 32-bit too.
> I will post a patch in a few minutes that does exactly that. Maintainers can
> then chime in with their preferred option.
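
For reference, a minimal sketch of the two ideas in the quoted message:
an unaligned load/store written as a fixed-size memcpy (defined behavior
in C, and typically lowered by compilers to a single move on x86), and
covering 8 <= n <= 16 bytes with two overlapping qword copies. The helper
names are hypothetical and are not part of rte_memcpy; this is an
illustration under those assumptions, not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not DPDK API. */

/* Load 8 bytes from a possibly unaligned address without a pointer cast. */
static inline uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Store 8 bytes to a possibly unaligned address. */
static inline void store_u64_unaligned(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}

/*
 * Copy n bytes, assuming 8 <= n <= 16 and non-overlapping buffers.
 * The first qword covers bytes [0, 8) and the second covers [n - 8, n);
 * for n < 16 the two ranges overlap, which is harmless because both
 * loads happen before either store.
 */
static inline void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head = load_u64_unaligned(src);
    uint64_t tail = load_u64_unaligned((const uint8_t *)src + n - 8);

    store_u64_unaligned(dst, head);
    store_u64_unaligned((uint8_t *)dst + n - 8, tail);
}
```

With n == 15 this issues two loads and two stores, matching the count
described in the quoted text.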

I would propose that DPDK have the same kind of define as the kernel
for SAFE_UNALIGNED_ACCESS. The C standard has to apply to all architectures,
but DPDK will make the choice to be fast rather than standards-conformant.
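
A rough sketch of what such a define could look like. The macro name
EXAMPLE_SAFE_UNALIGNED_ACCESS and the helper example_load_u32 are made up
for illustration; they are not existing DPDK or kernel identifiers, and
the architecture check is an assumption about where the fast path would
be enabled.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical build-time switch: 1 on targets where the project decides
 * unaligned access is acceptable, 0 elsewhere. Can be overridden with -D.
 */
#ifndef EXAMPLE_SAFE_UNALIGNED_ACCESS
#if defined(__x86_64__) || defined(__i386__)
#define EXAMPLE_SAFE_UNALIGNED_ACCESS 1
#else
#define EXAMPLE_SAFE_UNALIGNED_ACCESS 0
#endif
#endif

/* Hypothetical helper: load 32 bits from an address that may be unaligned. */
static inline uint32_t example_load_u32(const void *p)
{
#if EXAMPLE_SAFE_UNALIGNED_ACCESS
    /*
     * Fast path: direct dereference. This is still undefined behavior
     * per the C standard when p is misaligned; the define records a
     * deliberate choice to rely on the hardware anyway.
     */
    return *(const uint32_t *)p;
#else
    /* Portable path: memcpy has defined behavior for any alignment. */
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
#endif
}
```

Sanitizers such as UBSan would still flag the fast path on misaligned
pointers, which is the trade-off this kind of define makes explicit.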

