[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Stephen Hemminger
stephen at networkplumber.org
Tue Jan 20 18:15:38 CET 2015
On Mon, 19 Jan 2015 09:53:34 +0800
zhihong.wang at intel.com wrote:
> Main code changes:
>
> 1. Differentiate architectural features based on CPU flags
>
> a. Implement separate move functions for SSE/AVX/AVX2 to fully utilize cache bandwidth
>
> b. Implement a separate copy flow optimized for each target architecture
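The per-ISA move functions described in 1a might look roughly like the sketch below, which uses SSE and AVX2 load/store intrinsics; the function names are illustrative only, not the actual symbols in rte_memcpy.h.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical 16-byte move helper for the SSE path (names are
 * illustrative, not DPDK's). Unaligned load + unaligned store. */
static inline void
mov16_sse(uint8_t *dst, const uint8_t *src)
{
	__m128i xmm0 = _mm_loadu_si128((const __m128i *)src);
	_mm_storeu_si128((__m128i *)dst, xmm0);
}

#ifdef __AVX2__
/* 32-byte move helper for the AVX2 path: one ymm register moves
 * twice the data per instruction, doubling per-instruction width. */
static inline void
mov32_avx2(uint8_t *dst, const uint8_t *src)
{
	__m256i ymm0 = _mm256_loadu_si256((const __m256i *)src);
	_mm256_storeu_si256((__m256i *)dst, ymm0);
}
#endif
```

Dispatching between such helpers at compile time (via CPU-flag macros like __AVX2__) is what lets each build target use its full register width.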
>
> 2. Rewrite the memcpy function "rte_memcpy"
>
> a. Add store alignment
>
> b. Add load alignment based on architectural features
>
> c. Put the block-copy loop into inline move functions for better control of instruction order
>
> d. Eliminate unnecessary MOVs
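The "store aligning" step in 2a can be sketched in portable C as follows: copy an unaligned head so that the destination pointer reaches a 16-byte boundary, then run the bulk loop against aligned store slots. This is an assumption about the general technique, not the patch's actual code; the 16-byte memcpy calls stand in for the aligned vector stores the real implementation would use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of store aligning (hypothetical name). */
static void *
copy_store_aligned(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;
	size_t head = (uintptr_t)d & 15;   /* offset past a 16B boundary */

	if (head != 0 && n >= 16) {
		head = 16 - head;          /* bytes until the boundary */
		memcpy(d, s, head);        /* unaligned head copy */
		d += head; s += head; n -= head;
		/* d is now 16-byte aligned: the bulk loop below always
		 * stores to aligned addresses, which is the fast case. */
	}
	while (n >= 16) {                  /* bulk loop, aligned stores */
		memcpy(d, s, 16);
		d += 16; s += 16; n -= 16;
	}
	if (n)                             /* trailing bytes */
		memcpy(d, s, n);
	return dst;
}
```

Aligning on the store side is usually preferred because misaligned stores are more expensive than misaligned loads on most x86 microarchitectures; 2b then adds load alignment only where the ISA benefits from it.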
>
> 3. Rewrite the inline move functions
>
> a. Add move functions for unaligned load cases
>
> b. Change instruction order in copy loops for better pipeline utilization
>
> c. Use intrinsics instead of assembly code
>
> 4. Remove the slow glibc call for constant-size copies
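One common way to avoid an out-of-line library call when the copy size is a compile-time constant is GCC's __builtin_constant_p, sketched below. This is a guess at the general technique, not the patch's actual macro; the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inline path for small copies. With a constant n the
 * compiler expands the memcpy into plain register moves, so no call
 * into glibc is emitted. */
static inline void *
small_copy(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Illustrative dispatch macro: constant sizes up to 64 bytes take the
 * inlinable path; everything else falls back to the library call. */
#define fast_copy(d, s, n)                        \
	(__builtin_constant_p(n) && (n) <= 64     \
	     ? small_copy((d), (s), (n))          \
	     : memcpy((d), (s), (n)))
```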
>
> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
Dumb question: why not fix glibc memcpy instead?
What is special about rte_memcpy?