[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Jim Thompson
jim at netgate.com
Sun Jan 25 21:02:51 CET 2015
> On Jan 20, 2015, at 11:15 AM, Stephen Hemminger <stephen at networkplumber.org> wrote:
>
> On Mon, 19 Jan 2015 09:53:34 +0800
> zhihong.wang at intel.com wrote:
>
>> Main code changes:
>>
>> 1. Differentiate architectural features based on CPU flags
>>
>> a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
>>
>> b. Implement separated copy flow specifically optimized for target architecture
>>
>> 2. Rewrite the memcpy function "rte_memcpy"
>>
>> a. Add store aligning
>>
>> b. Add load aligning based on architectural features
>>
>> c. Put block copy loop into inline move functions for better control of instruction order
>>
>> d. Eliminate unnecessary MOVs
>>
>> 3. Rewrite the inline move functions
>>
>> a. Add move functions for unaligned load cases
>>
>> b. Change instruction order in copy loops for better pipeline utilization
>>
>> c. Use intrinsics instead of assembly code
>>
>> 4. Remove slow glibc call for constant copies
>>
>> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
>
> Dumb question: why not fix glibc memcpy instead?
> What is special about rte_memcpy?
In addition to the other points, FreeBSD doesn't use glibc on the target platform (but it is used on, say, MIPS), and FreeBSD is a supported DPDK platform.
So glibc isn't a solution.
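As a rough illustration of the kind of copy routine the patch description talks about (selected by CPU flags, written with intrinsics, and not tied to any particular libc), here is a minimal sketch. It is not the code from the patch: the name `sketch_copy` is made up for this example, it only does unaligned loads and stores, and it leaves out the store/load aligning and instruction ordering the patch mentions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

/*
 * Hypothetical helper (not from the patch): copy n bytes using the widest
 * vector move the compiler flags allow, falling back to a byte loop.
 * The choice is made at compile time from the CPU feature macros, so the
 * result does not depend on which libc the platform ships.
 */
static inline void *
sketch_copy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

#if defined(__AVX2__)
	/* 32-byte blocks with unaligned AVX2 loads/stores */
	while (n >= 32) {
		__m256i v = _mm256_loadu_si256((const __m256i *)s);
		_mm256_storeu_si256((__m256i *)d, v);
		d += 32;
		s += 32;
		n -= 32;
	}
#endif
#if defined(__SSE2__)
	/* 16-byte blocks with unaligned SSE2 loads/stores */
	while (n >= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_storeu_si128((__m128i *)d, v);
		d += 16;
		s += 16;
		n -= 16;
	}
#endif
	/* Remaining tail, or the whole copy when no vector ISA is enabled */
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dst;
}
```

Built with, say, -mavx2 this takes the 32-byte path; with plain x86-64 defaults it takes the SSE2 path; anywhere else it is just the byte loop. Like memcpy, it assumes the two buffers do not overlap.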
Jim