[dpdk-dev] [PATCH] Clean up rte_memcpy.h file
Ravi Kerur
rkerur at gmail.com
Wed Apr 15 23:04:58 CEST 2015
On Tue, Apr 14, 2015 at 7:53 PM, Stephen Hemminger <stephen at networkplumber.org> wrote:
> On Tue, 14 Apr 2015 14:31:53 -0700
> Ravi Kerur <rkerur at gmail.com> wrote:
>
> > +
> > + for (i = 0; i < 2; i++)
> > + rte_mov32(dst + i * 32, src + i * 32);
> > }
> Unless you force compiler to unroll the loop, it will be slower.
>
I did the following:
1. Used sample code from Intel to verify that the CPU supports those
instructions.
2. Compared the code generated with and without the loop using
gcc -O3 -m64 -S (gcc version 4.8.2). There was no difference in the
generated code between the "loop" and "no-loop" versions, although I
had at least expected some difference.
3. Ran "make test" and compared the "memcpy perf" results.
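A minimal sketch of the two variants being compared, assuming a hypothetical mov32() helper built on memcpy() as a stand-in for rte_mov32(). The names mov64_loop(), mov64_unrolled(), cpu_has_avx2() and check_mov64_variants() are illustrative, not DPDK APIs, and cpu_has_avx2() uses GCC's __builtin_cpu_supports() in place of the Intel sample code mentioned in step 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's rte_mov32(): copies 32 bytes.
 * The real rte_mov32() is built on SIMD loads and stores. */
static inline void mov32(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 32);
}

/* 64-byte copy written as a loop, as in the patch.
 * Non-static so that gcc -S keeps it in the assembly output. */
void mov64_loop(uint8_t *dst, const uint8_t *src)
{
	int i;

	for (i = 0; i < 2; i++)
		mov32(dst + i * 32, src + i * 32);
}

/* The same 64-byte copy with the loop unrolled by hand. */
void mov64_unrolled(uint8_t *dst, const uint8_t *src)
{
	mov32(dst, src);
	mov32(dst + 32, src + 32);
}

/* Returns 1 if the running CPU reports AVX2 support (GCC/clang
 * builtin, x86 only), otherwise 0. */
int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx2") != 0;
#else
	return 0;
#endif
}

/* Copies a known byte pattern with both variants and asserts that
 * each produces an exact copy of the source. */
void check_mov64_variants(void)
{
	uint8_t src[64], a[64], b[64];
	int i;

	for (i = 0; i < 64; i++)
		src[i] = (uint8_t)i;

	mov64_loop(a, src);
	mov64_unrolled(b, src);
	assert(memcmp(a, src, sizeof(a)) == 0);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

To repeat step 2, compile this file with gcc -O3 -m64 -S and compare the bodies of mov64_loop and mov64_unrolled in the .s output. Whether they match depends on the compiler version, the flags, and how the real rte_mov32() is defined, so the result for this memcpy()-based stand-in may differ from what the patch itself produces.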