[dpdk-stable] AVX512 bug on SkyLake

Ananyev, Konstantin konstantin.ananyev at intel.com
Sun Nov 11 15:15:37 CET 2018


Hi Thomas,

> 
> Below is my conclusion for this bug.
> An expert of x86 is required to follow-up.
> 
> Summary:
> 	- CPU: Intel Skylake
> 	- Linux environment: Ubuntu 18.04
> 	- Compiler: GCC 7 or 8
> 	- Scenario: testpmd crashes when it starts forwarding
> 	- Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> 	- Context: inline rte_memcpy() is called from
> 			inline rte_mempool_put_bulk(), called from
> 			mlx5_tx_complete() (inline or not)
> 	- Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> 
> Latest status can be found in Bugzilla:
> 	https://bugs.dpdk.org/show_bug.cgi?id=97#c35


Looking at the disassembled output from the bug report, it seems that the
problem is not in the vmovdqu8 instruction itself, but in the wrong offsets
generated by the compiler:

    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
   vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
    vmovups XMMWORD PTR [rsi+0x20],xmm0
    vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
    vmovups XMMWORD PTR [rsi+0x40],xmm0
    vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]

I think the first load should be:
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]

The same goes for the next two offsets: 0x4 and 0x6 should be 0x40 and 0x60 respectively.
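
To illustrate the arithmetic (a hypothetical sketch, not the actual
rte_memcpy code): assuming the copy is unrolled in 32-byte blocks, each
split into two 16-byte halves as the vinserti128/vextracti128 pairs in the
listing suggest, the source offsets would be expected to follow the pattern
below. The helper names are made up for illustration only.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of DPDK: byte offsets of the two
 * 16-byte halves of the n-th 32-byte block in an unrolled copy. */

/* Low half: loaded by the vmovdqu8 into xmm0. */
static size_t low_half_offset(size_t block)
{
	return block * 32;
}

/* High half: loaded by the vinserti128 into the upper lane of ymm0. */
static size_t high_half_offset(size_t block)
{
	return block * 32 + 16;
}
```

For blocks 1, 2 and 3 this gives low-half offsets of 0x20, 0x40 and 0x60,
matching the destination offsets in the listing, while the high-half
offsets 0x30 and 0x50 agree with what the compiler actually emitted. So
only the low-half source displacements look wrong.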

Not sure what causes the compiler to behave that way.
BTW, looking through the testpmd objdump output, it seems that only the mlx5 driver
exhibits this problem (I didn't actually enable mlx4; it probably has the same problem).
That looks a bit weird to me.
Konstantin


