[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

Jim Thompson jim at netgate.com
Sun Jan 25 21:02:51 CET 2015

> On Jan 20, 2015, at 11:15 AM, Stephen Hemminger <stephen at networkplumber.org> wrote:
> 
> On Mon, 19 Jan 2015 09:53:34 +0800
> zhihong.wang at intel.com wrote:
> 
>> Main code changes:
>> 
>> 1. Differentiate architectural features based on CPU flags
>> 
>>    a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
>> 
>>    b. Implement separated copy flow specifically optimized for target architecture
>> 
>> 2. Rewrite the memcpy function "rte_memcpy"
>> 
>>    a. Add store aligning
>> 
>>    b. Add load aligning based on architectural features
>> 
>>    c. Put block copy loop into inline move functions for better control of instruction order
>> 
>>    d. Eliminate unnecessary MOVs
>> 
>> 3. Rewrite the inline move functions
>> 
>>    a. Add move functions for unaligned load cases
>> 
>>    b. Change instruction order in copy loops for better pipeline utilization
>> 
>>    c. Use intrinsics instead of assembly code
>> 
>> 4. Remove slow glibc call for constant copies
>> 
>> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
> 
> Dumb question: why not fix glibc memcpy instead?
> What is special about rte_memcpy?

In addition to the other points, FreeBSD doesn't use glibc on the target platform (though it is used on, say, MIPS), and FreeBSD is a supported DPDK platform.

So glibc isn't a solution. 
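
To illustrate the point, here is a rough sketch (not code from the patch) of a copy routine whose fast path is picked by the compiler's feature macros rather than by whichever libc the platform ships. The name sketch_copy is made up for this example, and the only assumption is a compiler that defines __SSE2__ and provides <emmintrin.h> on x86; everywhere else it falls back to a plain byte loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/*
 * Hypothetical helper, not part of the patch or of DPDK.
 * Copies n bytes from src to dst (regions must not overlap) and returns dst.
 */
static inline void *
sketch_copy(void *dst, const void *src, size_t n)
{
	uint8_t *d = (uint8_t *)dst;
	const uint8_t *s = (const uint8_t *)src;

#if defined(__SSE2__)
	/* Fast path: 16-byte blocks using unaligned loads and stores. */
	while (n >= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_storeu_si128((__m128i *)d, v);
		s += 16;
		d += 16;
		n -= 16;
	}
#endif
	/* Tail bytes, or the whole copy when SSE2 is not available. */
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dst;
}
```

The same pattern would extend to __AVX__ and __AVX2__ with wider loads, which is roughly what item 1 of the patch description is about. An optimizing compiler may still turn the tail loop into a library call, so this only shows the selection mechanism and is not a tuned implementation.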

Jim

