[Bug 1135] [DPDK-22.11][asan]vhost_event_idx_interrupt/wake_up_split_ring_vhost_user_cores_with_event_idx_interrupt_mode_16_queues the backend feedback EAL Error when relunch the dpdk-l3fwd

Mattias Rönnblom mattias.ronnblom at ericsson.com
Thu Dec 15 09:22:55 CET 2022


On 2022-11-30 08:21, bugzilla at dpdk.org wrote:
> https://bugs.dpdk.org/show_bug.cgi?id=1135
> 
>              Bug ID: 1135
>             Summary: [DPDK-22.11][asan]vhost_event_idx_interrupt/wake_up_sp
>                      lit_ring_vhost_user_cores_with_event_idx_interrupt_mod
>                      e_16_queues the backend feedback EAL Error when
>                      relunch the dpdk-l3fwd
>             Product: DPDK
>             Version: 22.11
>            Hardware: x86
>                  OS: Linux
>              Status: UNCONFIRMED
>            Severity: normal
>            Priority: Normal
>           Component: examples
>            Assignee: dev at dpdk.org
>            Reporter: dukaix.yuan at intel.com
>    Target Milestone: ---
> 
> [Environment]
> DPDK version: 22.11
> Other software versions: QEMU 7.1.0
> OS: Ubuntu 22.04.1 LTS/5.15.45-051545-generic
> Compiler: gcc -11.3.0
> Hardware platform: Intel(R) Xeon(R) Platinum 8280M CPU @ 2.70GHz
> NIC hardware: Ethernet Controller XL710 for 40GbE QSFP+ 1583
> NIC firmware:  9.00 0x8000c8d4 1.3179.0
> NIC driver: 2.20.12 i40e
> 
> [Test Setup]
> Steps to reproduce
> List the steps to reproduce the issue.
> 
> 1. Start the back-end with dpdk-l3fwd-power:
> 
> x86_64-native-linuxapp-gcc/examples/dpdk-l3fwd-power -l 1 -n 4 \
> --file-prefix=dpdk_2497935_20221129174620 --no-pci --log-level=9 \
> --vdev 'net_vhost0,iface=/root/dpdk/vhost-net0,queues=1,client=1' \
> -- -p 0x1 --parse-ptype 1 --config '(0,0,1)' --interrupt-only
> 
> 2. Start the front-end with QEMU:
> 
> taskset -c 20,21,22,23,24,25,26,27 /home/QEMU/qemu-7.1.0/bin/qemu-system-x86_64 \
> -name vm0 -enable-kvm -pidfile /tmp/.vm0.pid -daemonize -monitor \
> unix:/tmp/vm0_monitor.sock,server,nowait \
> -netdev user,id=nttsip1,hostfwd=tcp:10.239.252.220:6000-:22 \
> -device e1000,netdev=nttsip1 -cpu host -smp 8 -m 16384 \
> -object memory-backend-file,id=mem,size=16384M,mem-path=/mnt/huge,share=on \
> -numa node,memdev=mem -mem-prealloc \
> -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 \
> -device virtio-serial \
> -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 \
> -vnc :4 -drive file=/home/image/ubuntu2004.img \
> -chardev socket,id=char0,path=/root/dpdk/vhost-net0,server \
> -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
> -device virtio-net-pci,netdev=netdev0,mac=00:11:22:33:44:50,csum=on
> 
> 3. Quit and relaunch the back-end:
> 
> x86_64-native-linuxapp-gcc/examples/dpdk-l3fwd-power -l 1 -n 4 \
> --file-prefix=dpdk_2497935_20221129174620 --no-pci --log-level=9 \
> --vdev 'net_vhost0,iface=/root/dpdk/vhost-net0,queues=1,client=1' \
> -- -p 0x1 --parse-ptype 1 --config '(0,0,1)' --interrupt-only
> 
> [Show the output from the previous commands.]
> 

The output below is from the frontend, correct?

> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 10 Rx ctl error op 1 epfd -1 vec 11
> L3FWD_POWER: RX interrupt won't enable.
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 6 Rx ctl error op 1 epfd -1 vec 7
> L3FWD_POWER: RX interrupt won't enable.
> L3FWD_POWER: entering main interrupt loop on lcore 13
> L3FWD_POWER:  -- lcoreid=13 portid=0 rxqueueid=12
> L3FWD_POWER: entering main interrupt loop on lcore 16
> L3FWD_POWER:  -- lcoreid=16 portid=0 rxqueueid=15
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 14 Rx ctl error op 1 epfd -1 vec 15
> L3FWD_POWER: RX interrupt won't enable.
> L3FWD_POWER: lcore 2 is waked up from rx interrupt on port 0 queue 1
> L3FWD_POWER:  -- lcoreid=3 portid=0 rxqueueid=2
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 9 Rx ctl error op 1 epfd -1 vec 10
> L3FWD_POWER: RX interrupt won't enable.
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 12 Rx ctl error op 1 epfd -1 vec 13
> L3FWD_POWER: RX interrupt won't enable.
> L3FWD_POWER: entering main interrupt loop on lcore 6
> L3FWD_POWER:  -- lcoreid=6 portid=0 rxqueueid=5
> L3FWD_POWER: lcore 2 sleeps until interrupt triggers
> L3FWD_POWER: entering main interrupt loop on lcore 14
> L3FWD_POWER:  -- lcoreid=14 portid=0 rxqueueid=13
> L3FWD_POWER: entering main interrupt loop on lcore 1
> L3FWD_POWER:  -- lcoreid=1 portid=0 rxqueueid=0
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 4 Rx ctl error op 1 epfd -1 vec 5
> L3FWD_POWER: RX interrupt won't enable.
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 2 Rx ctl error op 1 epfd -1 vec 3
> L3FWD_POWER: RX interrupt won't enable.
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 15 Rx ctl error op 1 epfd -1 vec 16
> L3FWD_POWER: RX interrupt won't enable.
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 11 Rx ctl error op 1 epfd -1 vec 12
> L3FWD_POWER: RX interrupt won't enable.
> L3FWD_POWER: entering main interrupt loop on lcore 8
> L3FWD_POWER:  -- lcoreid=8 portid=0 rxqueueid=7
> VHOST_CONFIG: (/root/dpdk/vhost-net0) read message VHOST_USER_SET_VRING_KICK
> VHOST_CONFIG: (/root/dpdk/vhost-net0) vring kick idx:4 file:136
> L3FWD_POWER: entering main interrupt loop on lcore 9
> L3FWD_POWER:  -- lcoreid=9 portid=0 rxqueueid=8
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 7 Rx ctl error op 1 epfd -1 vec 8
> L3FWD_POWER: RX interrupt won't enable.
> VHOST_CONFIG: (/root/dpdk/vhost-net0) read message VHOST_USER_SET_VRING_CALL
> VHOST_CONFIG: (/root/dpdk/vhost-net0) vring call idx:4 file:138
> L3FWD_POWER: lcore 1 is waked up from rx interrupt on port 0 queue 0
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> L3FWD_POWER: lcore 1 sleeps until interrupt triggers
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 8 Rx ctl error op 1 epfd -1 vec 9
> L3FWD_POWER: RX interrupt won't enable.
> L3FWD_POWER: entering main interrupt loop on lcore 4
> L3FWD_POWER:  -- lcoreid=4 portid=0 rxqueueid=3
> p 0 q 5 Rx ctl error op 1 epfd -1 vec 6
> VHOST_CONFIG: (/root/dpdk/vhost-net0) read message VHOST_USER_SET_VRING_NUM
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 3 Rx ctl error op 1 epfd -1 vec 4
> L3FWD_POWER: RX interrupt won't enable.
> EAL: Error op 1 fd -1 epoll_ctl, Bad file descriptor
> p 0 q 13 Rx ctl error op 1 epfd -1 vec 14
> L3FWD_POWER: RX interrupt won't enable.
> L3FWD_POWER: RX interrupt won't enable.
> 
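
For reference, the EAL message looks like it comes from a plain
epoll_ctl() call: op 1 would be EPOLL_CTL_ADD (which has the value 1 on
Linux), and fd -1 is not a valid descriptor, hence EBADF ("Bad file
descriptor"). The "epfd -1" in the ethdev line is, as far as I can
tell, RTE_EPOLL_PER_THREAD and probably not the problem in itself. A
minimal standalone sketch of that failure, outside DPDK
(try_epoll_add() is a hypothetical helper, not a DPDK function):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to register 'fd' for EPOLLIN on 'epfd'.
 * Returns 0 on success, or the errno value set by epoll_ctl().
 */
int try_epoll_add(int epfd, int fd)
{
	struct epoll_event ev;

	/* Require a real epoll instance, so EBADF can only refer to 'fd'. */
	assert(epfd >= 0);

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = fd;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
		return 0;

	return errno;
}
```

With epfd obtained from epoll_create1(0), try_epoll_add(epfd, -1)
returns EBADF, while a valid descriptor (e.g., the read end of a pipe)
returns 0. If that is what happens here, the question would be why the
queue's interrupt fd is still -1 when l3fwd-power tries to register it.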
> [Expected Result]
> Explain what is the expected result in text or as an example output:
> 
> There is no error information.
> 
> [Regression]
> Is this issue a regression: Y
> 
> Version the regression was introduced: Specify git id if known.
> 
> Bad commit:
> 
> commit 1c9a7fba5c90e0422b517404499ed106f647bcff (HEAD, refs/bisect/bad)
> Author: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> Date:   Mon Jul 11 14:11:32 2022 +0200
> 
>      net: accept unaligned data in checksum routines
> 

Yux, do you think this commit is the root cause of this issue, as 
opposed to just being a change that exposed an already-existing issue 
in l3fwd-power or in some library/PMD it uses?

>      __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
>      data through an uint16_t pointer, which allowed the compiler to assume
>      the data was 16-bit aligned. This in turn would, with certain
>      architectures and compiler flag combinations, result in code with SIMD
>      load or store instructions with restrictions on data alignment.
> 
>      This patch keeps the old algorithm, but data is read using memcpy()
>      instead of direct pointer access, forcing the compiler to always
>      generate code that handles unaligned input. The _may_alias_ GCC
>      attribute is no longer needed.
> 
>      The data on which the Internet checksum functions operates are almost
>      always 16-bit aligned, but there are exceptions. In particular, the
>      PDCP protocol header may (literally) have an odd size.
> 
>      Performance impact seems to range from none to a very slight
>      regression.
> 
>      Bugzilla ID: 1035
>      Fixes: 6006818cfb26 ("net: new checksum functions")
>      Cc: stable at dpdk.org
> 
>      Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
>      Acked-by: Olivier Matz <olivier.matz at 6wind.com>
> 
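
To illustrate the technique the commit message describes, here is a
simplified sketch of summing 16-bit words via memcpy() rather than
through a uint16_t pointer. It is not the actual __rte_raw_cksum()
code; raw_sum16() and fold_sum16() are hypothetical names used only for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add the 16-bit words of 'buf' (native byte order) to 'sum'.
 * Each word is read with memcpy(), so the compiler may not assume
 * that 'buf' is 16-bit aligned. A trailing odd byte is zero-padded.
 */
uint32_t raw_sum16(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;
	const unsigned char *end;

	assert(buf != NULL || len == 0);
	if (len == 0)
		return sum;

	end = p + (len & ~(size_t)1);

	for (; p != end; p += sizeof(uint16_t)) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));
		sum += v;
	}

	if (len & 1) {
		uint16_t v = 0;

		memcpy(&v, end, 1);
		sum += v;
	}

	return sum;
}

/* Fold the 32-bit accumulator into a 16-bit ones' complement sum. */
uint16_t fold_sum16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The result is the same whether the buffer starts at an even or an odd
address. The sketch assumes the input is small enough (roughly 128 KiB
when starting from sum 0) that the 32-bit accumulator cannot overflow.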


