[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
Wang, Zhihong
zhihong.wang at intel.com
Mon Sep 18 07:10:40 CEST 2017
> Hi Zhihong Wang
>
> I tested the AVX512 rte_memcpy and found that the performance for OVS-DPDK
> is lower than with the AVX2 rte_memcpy.
Hi Haifeng,
AVX512 memcpy is marked as experimental and disabled by default; its
benefit varies from case to case, so enable it only after the specific case
(SW + HW setup with the expected data pattern) has been verified.
BTW, micro benchmarks like test_memcpy_perf are not recommended for
reporting memcpy performance, as they are unlikely to reflect the
performance of real-world applications. For more details, see
https://software.intel.com/en-us/articles/performance-optimization-of-memcpy-in-dpdk
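
To make the micro benchmark point a bit more concrete, here is a minimal
sketch. It is not taken from DPDK or from the article above: copy_pass() and
its parameters are hypothetical names, it uses plain memcpy() rather than
rte_memcpy(), and it assumes (for illustration only) that one reason tight
benchmark loops mislead is that they keep reusing the same cache-hot buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * copy_pass() is a hypothetical helper, not a DPDK API.
 * It performs `copies` memcpy() calls of `len` bytes each, rotating through
 * `slots` separate source/destination slots:
 *   slots == 1    -> the same buffers are reused, so they stay cache-hot
 *                    (what a micro benchmark tends to measure);
 *   large `slots` -> successive copies touch different memory, which is
 *                    closer to a stream of packets.
 * Returns the CPU time spent in the copy loop in seconds, or -1.0 on
 * invalid arguments or allocation failure.
 */
static double copy_pass(size_t len, size_t slots, size_t copies)
{
    if (len == 0 || slots == 0 || len > SIZE_MAX / slots)
        return -1.0;

    size_t total = len * slots;
    uint8_t *src = malloc(total);
    uint8_t *dst = malloc(total);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers up front so page faults are not part of the timing. */
    memset(src, 0xA5, total);
    memset(dst, 0, total);

    clock_t start = clock();
    for (size_t i = 0; i < copies; i++) {
        size_t off = (i % slots) * len;
        memcpy(dst + off, src + off, len);
    }
    clock_t end = clock();

    /* Sanity check: every slot that was written must match its source. */
    size_t touched = copies < slots ? copies : slots;
    assert(memcmp(dst, src, touched * len) == 0);

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing copy_pass(1024, 1, n) with copy_pass(1024, slots, n), where slots
is chosen so that the total size exceeds the last-level cache, gives a rough
feel for how far a cache-hot number can be from a streaming one. Even then it
says nothing about a full OVS-DPDK datapath, so testing with the real
application remains the deciding factor.
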
Thanks
Zhihong
>
> Results of the VM loop test for OVS-DPDK:
> avx512 is *15*Gbps
> perf data:
> 0.52 │ vmovdq (%r8,%r10,1),%zmm0
> 95.33 │ sub $0x40,%r9
> 0.45 │ add $0x40,%r8
> 0.60 │ vmovdq %zmm0,-0x40(%r8)
> 1.84 │ cmp $0x3f,%r9
> │ ↓ ja f20
> │ lea -0x40(%rsi),%r8
> 0.15 │ or $0xffffffffffffffc0,%rsi
> 0.21 │ and $0xffffffffffffffc0,%r8
> 0.00 │ lea 0x40(%rsi,%r8,1),%rsi
> 0.00 │ vmovdq (%rcx,%rsi,1),%zmm0
> 0.22 │ vmovdq %zmm0,(%rdx,%rsi,1)
> 0.67 │ ↓ jmpq c78
> │ mov -0x128(%rbp),%rdi
> │ rex.R
> │ .byte 0x89
> │ popfq
>
> avx2 is *18.8*Gbps
> perf data:
> 0.96 │ add %r9,%r13
> 66.04 │ vmovdq (%rdx),%ymm0
> 1.20 │ sub $0x40,%rdi
> 1.53 │ add $0x40,%rdx
> 10.83 │ vmovdq %ymm0,-0x40(%rdx,%r15,1)
> 8.64 │ vmovdq -0x20(%rdx),%ymm0
> 7.58 │ vmovdq %ymm0,-0x40(%rdx,%r13,1)
>
>
> dpdk version: v17.05
> ovs version: 2.8.90
> qemu version: QEMU emulator version 2.9.94 (v2.10.0-rc4-dirty)
>
> gcc version: gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
> kernel version: 3.10.0
>
>
> compile dpdk:
> CONFIG_RTE_ENABLE_AVX512=y
> export DPDK_DIR=$PWD
> export DPDK_TARGET=x86_64-native-linuxapp-gcc
> export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
> make install T=$DPDK_TARGET DESTDIR=install
>
> compile ovs:
> sh boot.sh
> ./configure CFLAGS="-g -O2" --with-dpdk=$DPDK_BUILD --prefix=/usr --localstatedir=/var --sysconfdir=/etc
> make -j
> make install
>
> The test for dpdk test_memcpy_perf:
> avx2:
> ** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
> ======= ============== ============== ============== ==============
>    Size Cache to cache   Cache to mem   Mem to cache     Mem to mem
> (bytes)        (ticks)        (ticks)        (ticks)        (ticks)
> ------- -------------- -------------- -------------- --------------
> ========================== 32B aligned ============================
>      64       6 -   10      27 -   52      30 -   39      56 -   97
>     512      24 -   44     251 -  271     145 -  217     396 -  447
>    1024      35 -   78     394 -  433     252 -  319     609 -  670
> ------- -------------- -------------- -------------- --------------
> C    64       3 -    9      28 -   31      29 -   40      55 -   66
> C   512      25 -   55     253 -  268     139 -  268     397 -  410
> C  1024      32 -   83     394 -  416     250 -  396     612 -  687
> =========================== Unaligned =============================
>      64       8 -    9      85 -   71      45 -   45     125 -  121
>     512      33 -   49     282 -  305     153 -  252     420 -  478
>    1024      42 -   83     409 -  491     259 -  389     640 -  748
> ------- -------------- -------------- -------------- --------------
> C    64       4 -    9      42 -   46      39 -   46      76 -   90
> C   512      33 -   55     280 -  272     153 -  281     421 -  415
> C  1024      41 -   83     407 -  427     258 -  405     578 -  701
> ======= ============== ============== ============== ==============
>
> avx512:
> ** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
> ======= ============== ============== ============== ==============
>    Size Cache to cache   Cache to mem   Mem to cache     Mem to mem
> (bytes)        (ticks)        (ticks)        (ticks)        (ticks)
> ------- -------------- -------------- -------------- --------------
> ========================== 64B aligned ============================
>      64       6 -    9      18 -   33      24 -   38      40 -   65
>     512      18 -   44     178 -  262     138 -  218     309 -  429
>    1024      27 -   79     338 -  430     250 -  322     560 -  674
> ------- -------------- -------------- -------------- --------------
> C    64       3 -    9      18 -   20      23 -   41      39 -   50
> C   512      15 -   54     205 -  270     134 -  268     304 -  409
> C  1024      24 -   83     371 -  414     242 -  400     550 -  692
> =========================== Unaligned =============================
>      64       8 -    9      87 -   74      45 -   48     125 -  118
>     512      23 -   49     298 -  311     150 -  250     437 -  482
>    1024      36 -   83     427 -  505     259 -  406     633 -  754
> ------- -------------- -------------- -------------- --------------
> C    64       4 -    9      42 -   46      39 -   46      76 -   94
> C   512      23 -   55     246 -  277     152 -  290     349 -  426
> C  1024      38 -   83     398 -  431     258 -  416     634 -  725
> ======= ============== ============== ============== ==============
>