[Environment]
DPDK version: 92c0ad70ca version: 24.03-rc1
OS: RHEL9.0/5.14.0-70.13.1.el9_0.x86_64
Compiler: gcc version 11.2.1
Hardware platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
NIC hardware: Ethernet Controller XL710 for 40GbE QSFP+ 1583
NIC firmware:
  driver: i40e
  version: 2.24.6
  firmware-version: 9.40 0x8000ece4 1.3429.0

[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1, Build latest main dpdk24.03-rc1
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=shared x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf /root/tmp/dpdk_share_lib /root/shared_lib_dpdk
DESTDIR=/root/tmp/dpdk_share_lib ninja -C x86_64-native-linuxapp-gcc -j 110 install
mv /root/tmp/dpdk_share_lib/usr/local/lib /root/shared_lib_dpdk
ll /root/shared_lib_dpdk
cat /root/.bashrc | grep LD_LIBRARY_PATH
sed -i 's#export LD_LIBRARY_PATH=.*#export LD_LIBRARY_PATH=/root/shared_lib_dpdk#g' /root/.bashrc

2, Build LTS dpdk23.11.0
rm /root/dpdk
tar zxvf dpdk_abi.tar.gz -C ~
cd ~/dpdk/
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=shared x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf x86_64-native-linuxapp-gcc/lib
rm -rf x86_64-native-linuxapp-gcc/drivers

3, Bind nic
rmmod vfio_pci
rmmod vfio_iommu_type1
rmmod vfio
modprobe vfio
modprobe vfio-pci
usertools/dpdk-devbind.py --force --bind=vfio-pci 0000:18:00.0 0000:1a:00.0

4, Launch dpdk-test and run link_bonding_autotest
x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d /root/shared_lib_dpdk -a 0000:18:00.0 -a 0000:1a:00.0
RTE>>link_bonding_autotest

Show the output from the previous commands.
[root@ABI-80 dpdk]# x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d /root/shared_lib_dpdk -a 0000:18:00.0 -a 0000:1a:00.0
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0000:18:00.0 (socket 0)
i40e_GLQF_reg_init(): i40e device 0000:18:00.0 changed global register [0x002689a0]. original: 0x00000021, new: 0x00000029
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0000:1a:00.0 (socket 0)
i40e_GLQF_reg_init(): i40e device 0000:1a:00.0 changed global register [0x002689a0]. original: 0x00000021, new: 0x00000029
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>link_bonding_autotest
 + ------------------------------------------------------- +
 + Test Suite : Link Bonding Unit Test Suite
Segmentation fault (core dumped)

[Expected Result]
Test ok.

[Regression]
Is this issue a regression: (Y/N) Y

The first bad commit:
commit d4b9235f95de4f46f368627af256ed8080f20d65
Author: Jerin Jacob <jerinj@marvell.com>
Date:   Thu Jan 18 15:17:42 2024 +0530

    ethdev: add Tx queue used count query

    Introduce a new API to retrieve the number of used descriptors
    in a Tx queue. Applications can leverage this API in the fast path to
    inspect the Tx queue occupancy and take appropriate actions based on the
    available free descriptors.

    A notable use case could be implementing Random Early Discard (RED)
    in software based on Tx queue occupancy.

    Signed-off-by: Jerin Jacob <jerinj@marvell.com>
    Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
    Acked-by: Morten Brørup <mb@smartsharesystems.com>
    Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
    Reviewed-by: Ferruh Yigit <ferruh.yigit@amd.com>
Could you provide the backtrace of this crash?
[root@ABI-80 dpdk]# dmesg
[321775.556832] vfio-pci 0000:18:00.0: Masking broken INTx support
[321775.556908] vfio-pci 0000:18:00.0: vfio_ecap_init: hiding ecap 0x19@0x1d0
[321775.844843] vfio-pci 0000:1a:00.0: Masking broken INTx support
[321775.844947] vfio-pci 0000:1a:00.0: vfio_ecap_init: hiding ecap 0x19@0x1d0
[321781.820113] dpdk-test[2298656]: segfault at 50 ip 0000000000453d31 sp 00007ffdeefcda90 error 6 in dpdk-test[416000+1f6000]
[321781.820124] Code: f7 41 89 45 58 49 8d 4d 10 49 8d 44 24 10 49 89 47 58 49 89 4c 24 30 48 89 44 24 10 ba 01 00 00 00 49 8b 47 40 48 89 4c 24 18 <66> 89 50 50 b9 01 00 00 00 31 d2 49 8b 47 40 be 06 00 00 00 66 89
Is there any progress on this issue?
This looks like an i40e-specific issue. Can you reproduce it with any vdev or another HW PMD?
(In reply to Jerin from comment #5)
> Looks like i40e specific issue, can you reproduce with anything vdev or HW
> PMD

This is not an i40e-specific issue; it also reproduces on an ice NIC.

dmesg:
[269453.056342] vfio-pci 0000:4b:00.0: vfio_ecap_init: hiding ecap 0x19@0x1d0
[269453.056347] vfio-pci 0000:4b:00.0: vfio_ecap_init: hiding ecap 0x25@0x200
[269453.056349] vfio-pci 0000:4b:00.0: vfio_ecap_init: hiding ecap 0x26@0x210
[269453.056351] vfio-pci 0000:4b:00.0: vfio_ecap_init: hiding ecap 0x27@0x250
[269453.277824] vfio-pci 0000:4b:11.0: enabling device (0000 -> 0002)
[269453.600299] vfio-pci 0000:4b:11.0: enabling device (0000 -> 0002)
[269458.470533] dpdk-test[2956468]: segfault at 50 ip 0000563f05359a25 sp 00007fff5230ead0 error 6 in dpdk-test[563f05313000+220000]
[269458.470544] Code: f7 41 89 45 58 49 8d 4d 10 49 8d 44 24 10 49 89 46 58 49 89 4c 24 30 48 89 44 24 10 ba 01 00 00 00 49 8b 46 40 48 89 4c 24 18 <66> 89 50 50 b9 01 00 00 00 31 d2 49 8b 46 40 be 06 00 00 00 66 89

OS: Ubuntu22.04.3 LTS/5.15.0-91-generic/gcc version 11.4.0
The issue is due to the following change [1].

app/test/virtual_pmd.c uses the internal struct rte_eth_dev via virtual_ethdev_create(), which in turn is used by app/test/test_link_bonding.c. Because it depends on an internal structure, this test case is not valid for ABI testing.

[1]
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index b482cd12bb..f05f68a67c 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -58,6 +58,8 @@ struct rte_eth_dev {
 	eth_rx_queue_count_t rx_queue_count;
 	/** Check the status of a Rx descriptor */
 	eth_rx_descriptor_status_t rx_descriptor_status;
+	/** Get the number of used Tx descriptors */
+	eth_tx_queue_count_t tx_queue_count;
 	/** Check the status of a Tx descriptor */
 	eth_tx_descriptor_status_t tx_descriptor_status;
 	/** Pointer to PMD transmit mbufs reuse function */
How can I reproduce the issue? The 'link_bonding_autotest' test passes for me.

And which function/test fails in 'app/test/test_link_bonding.c'?
(In reply to Ferruh YIGIT from comment #8)
> How can I reproduce the issue, 'link_bonding_autotest' test passes for me?
>
> And which function/test fails in 'app/test/test_link_bonding.c'?

Hi Ferruh, this is an ABI test. Please refer to the steps in the description above.
(In reply to Jerin from comment #7)
> The issue is due to the following change[1]
>
> app/test/virtual_pmd.c is using internal struct rte_eth_dev struct via
> virtual_ethdev_create() and it's used by app/test/test_link_bonding.c.
> So this test case is not valid for testing the ABI as it is using internal
> structure.
>
>
> [1]
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index b482cd12bb..f05f68a67c 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -58,6 +58,8 @@ struct rte_eth_dev {
>  	eth_rx_queue_count_t rx_queue_count;
>  	/** Check the status of a Rx descriptor */
>  	eth_rx_descriptor_status_t rx_descriptor_status;
> +	/** Get the number of used Tx descriptors */
> +	eth_tx_queue_count_t tx_queue_count;
>  	/** Check the status of a Tx descriptor */
>  	eth_tx_descriptor_status_t tx_descriptor_status;
>  	/** Pointer to PMD transmit mbufs reuse function */

Hi Jerin, do you mean that we should close this Bugzilla and, starting from the bad commit, no longer run this case in ABI testing?
(In reply to jiang,yu from comment #9)
> (In reply to Ferruh YIGIT from comment #8)
> > How can I reproduce the issue, 'link_bonding_autotest' test passes for me?
> >
> > And which function/test fails in 'app/test/test_link_bonding.c'?
>
> Hi Ferruh, this is testing ABI. Steps you can refer to the above description.
>

I overlooked that this is ABI testing.

In that case I agree with Jerin: 'link_bonding_autotest' uses an internal API, so it is not suitable for ABI testing. Otherwise any change to an internal structure, such as "struct rte_eth_dev" in this case, will show up as an ABI issue.

+1 to remove this test from ABI testing.
Thanks Jerin and Ferruh. Closing this Bugzilla based on Jerin's and Ferruh's input.