[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

Stephen Hemminger stephen at networkplumber.org
Tue Jan 20 18:15:38 CET 2015


On Mon, 19 Jan 2015 09:53:34 +0800
zhihong.wang at intel.com wrote:

> Main code changes:
> 
> 1. Differentiate architectural features based on CPU flags
> 
>     a. Implement separate move functions for SSE/AVX/AVX2 to make full use of cache bandwidth
> 
>     b. Implement a separate copy flow specifically optimized for the target architecture
> 
> 2. Rewrite the memcpy function "rte_memcpy"
> 
>     a. Add store alignment
> 
>     b. Add load alignment based on architectural features
> 
>     c. Put the block copy loop into inline move functions for better control of instruction order
> 
>     d. Eliminate unnecessary MOVs
> 
> 3. Rewrite the inline move functions
> 
>     a. Add move functions for unaligned load cases
> 
>     b. Change instruction order in copy loops for better pipeline utilization
> 
>     c. Use intrinsics instead of assembly code
> 
> 4. Remove slow glibc call for constant copies
> 
> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
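
For illustration of items 1a and 3c above, a minimal sketch of compile-time
CPU-flag dispatch between SSE and AVX2 move helpers written with intrinsics.
This is not the patch's actual code; the helper name "copy32" and the exact
structure are hypothetical, with selection keyed off DPDK's per-machine
RTE_MACHINE_CPUFLAG_AVX2 macro.

/*
 * Illustrative sketch only: separate 32-byte move helpers selected at
 * compile time by a CPU-flag macro, for use by a common copy loop.
 */
#include <stdint.h>
#include <immintrin.h>

#ifdef RTE_MACHINE_CPUFLAG_AVX2

/* AVX2: one 32-byte load/store pair per block. */
static inline void
copy32(uint8_t *dst, const uint8_t *src)
{
        __m256i ymm0 = _mm256_loadu_si256((const __m256i *)src);
        _mm256_storeu_si256((__m256i *)dst, ymm0);
}

#else /* SSE */

/* SSE: two 16-byte load/store pairs per 32-byte block. */
static inline void
copy32(uint8_t *dst, const uint8_t *src)
{
        __m128i xmm0 = _mm_loadu_si128((const __m128i *)src);
        __m128i xmm1 = _mm_loadu_si128((const __m128i *)(src + 16));
        _mm_storeu_si128((__m128i *)dst, xmm0);
        _mm_storeu_si128((__m128i *)(dst + 16), xmm1);
}

#endif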
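
A minimal sketch of the store-alignment idea in item 2a, again hypothetical
code rather than the patch itself, assuming SSE2 and a 16-byte store
alignment target: copy one unaligned 16-byte head, advance both pointers so
the destination becomes aligned, then run the bulk loop with aligned stores
while loads stay unaligned.

/*
 * Illustrative sketch of store alignment (item 2a), not the patch code.
 * The overlap between the head copy and the first loop iteration is
 * harmless because src and dst advance together.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>

static inline void *
copy_with_aligned_stores(void *dst, const void *src, size_t n)
{
        uint8_t *d = dst;
        const uint8_t *s = src;

        if (n >= 16) {
                /* Unaligned 16-byte head copy. */
                size_t head = 16 - ((uintptr_t)d & 15);
                _mm_storeu_si128((__m128i *)d,
                                 _mm_loadu_si128((const __m128i *)s));
                d += head;
                s += head;
                n -= head;

                /* Bulk loop: stores are now 16-byte aligned. */
                while (n >= 16) {
                        _mm_store_si128((__m128i *)d,
                                        _mm_loadu_si128((const __m128i *)s));
                        d += 16;
                        s += 16;
                        n -= 16;
                }
        }
        memcpy(d, s, n);        /* tail (< 16 bytes) */
        return dst;
}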

Dumb question: why not fix glibc memcpy instead?
What is special about rte_memcpy?


