[dpdk-dev] [PATCH v2 1/3] eal/x86: run-time dispatch over memcpy

Li, Xiaoyun xiaoyun.li at intel.com
Wed Sep 20 08:57:33 CEST 2017


Hi all,
After further investigation, we have found some benefits in the patchset.
So the plan is to add a config parameter, CONFIG_RTE_ENABLE_RUNTIME_DISPATCH.
By default, the value is "n" and the current memcpy code is used.
Only if users set it to "y" will the run-time dispatch code (without inlining) be used.
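
To illustrate the idea, here is a minimal sketch of how such a switch could look. It is not the patch itself. It assumes the option surfaces as a macro named RTE_ENABLE_RUNTIME_DISPATCH, and the names copy_generic, copy_avx2, copy_select and example_memcpy are hypothetical. Both copy routines just call memcpy so that the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the ISA-specific copy routines. */
void *copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

void *copy_avx2(void *dst, const void *src, size_t n)
{
    /* A real version would use AVX2 loads/stores here. */
    return memcpy(dst, src, n);
}

#ifdef RTE_ENABLE_RUNTIME_DISPATCH
/* "y": run-time dispatch. Every copy is an indirect call, so no inlining. */
static void *(*copy_impl)(void *, const void *, size_t) = copy_generic;

/* Runs before main() and picks an implementation from the CPU flags. */
__attribute__((constructor))
static void copy_select(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        copy_impl = copy_avx2;
#endif
}

static void *example_memcpy(void *dst, const void *src, size_t n)
{
    assert(copy_impl != NULL);
    return copy_impl(dst, src, n);
}
#else
/* "n" (default): the ISA is chosen at build time and the call can inline. */
static inline void *example_memcpy(void *dst, const void *src, size_t n)
{
#ifdef __AVX2__
    return copy_avx2(dst, src, n);
#else
    return copy_generic(dst, src, n);
#endif
}
#endif
```

Callers would use example_memcpy(dst, src, len) in both configurations. Only the "y" build pays for the indirect call, which is where the small-size slowdown described in the quoted message below would come from.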


Best Regards,
Xiaoyun Li




> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Li, Xiaoyun
> Sent: Tuesday, September 12, 2017 10:27
> To: Wang, Liang-min <liang-min.wang at intel.com>; Richardson, Bruce
> <bruce.richardson at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>
> Cc: Zhang, Qi Z <qi.z.zhang at intel.com>; Lu, Wenzhuo
> <wenzhuo.lu at intel.com>; Zhang, Helin <helin.zhang at intel.com>;
> pierre at emutex.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/x86: run-time dispatch over
> memcpy
> 
> Hi ALL
> 
> After investigating, I found that most DPDK code already uses run-time
> dispatch. Only rte_memcpy chooses the ISA at build time.
> 
> There are two ways to modify memcpy. The first is function pointers and
> the other is function multi-versioning in GCC.
> 
> But memcpy has been heavily optimized and benefits from being fully inlined.
> If it is changed to run-time dispatch via function pointers, the perf will drop
> a lot, especially when the copy size is small.
> 
> And function multi-versioning in GCC only works for C++. GCC 6 is said to
> support it for C, but in my trial it did not work for C.
> 
> 
> 
> The attachment contains the perf results of memcpy with my patch, without my
> patch, and with the original DPDK code but without inlining.
> 
> It's just for comparison, so for now I have only tested on Broadwell, using AVX2.
> 
> The results are from running test/test/test_memcpy_perf.c.
> 
> (C = compile-time constant)
> 
> /* Do aligned tests where size is a variable */
> 
> /* Do aligned tests where size is a compile-time constant */
> 
> /* Do unaligned tests where size is a variable */
> 
> /* Do unaligned tests where size is a compile-time constant */
> 
> 
> 
> "4-7" means DPDK takes time 4 and glibc takes time 7.
> 
> For sizes smaller than 128 bytes, this patch's perf is bad, even worse than
> glibc.
> 
> As the size grows, the perf is better than glibc but worse than the original DPDK.
> 
> And when the size grows above about 1024 bytes, it performs similarly to the
> original DPDK.
> 
> Furthermore, if inlining is removed from the original DPDK, the perf is similar
> to the perf with the patch.
> 
> The different situations (4 types, such as cache to cache) perform differently,
> but the trend is the same (as size grows, perf grows).
> 
> 
> 
> So if dynamic dispatch is needed, it sacrifices some perf and requires compiling
> for the minimum target (e.g. compile for target avx, run on avx, avx2, avx512f).
> 
> 
> 
> Thus, I think this feature shouldn't be delivered in this release.
> 
> 
> 
> Best Regards,
> 
> Xiaoyun Li

