Bug 1024 - [dpdk-22.07][meson test] driver-tests/link_bonding_mode4_autotest bond handshake failed
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: meson
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assignee: dev
Reported: 2022-06-02 11:55 CEST by liweiyuan
Modified: 2023-05-31 07:24 CEST

Description liweiyuan 2022-06-02 11:55:38 CEST
Environment
DPDK version: dpdk-22.07-rc0:eeab353b793109432538ba480a44de5dd3048032
Other software versions: name/version for QEMU, OVS, etc. Repeat as required.
OS: Ubuntu 22.04 LTS/ 5.15.0-27-generic
Compiler: gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Hardware platform: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
NIC hardware: Ethernet Controller XXV710 for 25GbE SFP28 158b.
NIC firmware: i40e-5.15.0-27-generic/8.70 0x8000c40f 1.3179.0

Test Setup
Steps to reproduce
List the steps to reproduce the issue.

1. Use the following command to build DPDK: 
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-gcc/
ninja -C x86_64-native-linuxapp-gcc/

2. Execute the following command in the dpdk directory. 
meson test -C x86_64-native-linuxapp-gcc link_bonding_mode4_autotest -t 0.1

Show the output from the previous commands.
1/1 DPDK:driver-tests / link_bonding_mode4_autotest FAIL           22.68s   killed by signal 11 SIGSEGV
09:33:43 DPDK_TEST=link_bonding_mode4_autotest MALLOC_PERTURB_=42 /root/dpdk/x86_64-native-linuxapp-gcc/app/test/dpdk-test
----------------------------------- output -----------------------------------
stdout:
RTE>>link_bonding_mode4_autotest
 + ------------------------------------------------------- +
 + Test Suite : Link Bonding mode 4 Unit Test Suite
 + ------------------------------------------------------- +
 + TestCase [ 0] : test_mode4_agg_mode_selection failed
 + TestCase [ 1] : test_mode4_lacp succeeded
 + TestCase [ 2] : test_mode4_rx succeeded
 + TestCase [ 3] : test_mode4_tx_burst succeeded
 + TestCase [ 4] : test_mode4_marker succeeded
 + TestCase [ 5] : test_mode4_expired succeeded
 + TestCase [ 6] : test_mode4_ext_ctrl succeeded
 + TestCase [ 7] : test_mode4_ext_lacp succeeded
 + ------------------------------------------------------- +
 + Test Suite Summary : Link Bonding mode 4 Unit Test Suite
 + ------------------------------------------------------- +
 + Tests Total :        8
 + Tests Skipped :      0
 + Tests Executed :     8
 + Tests Unsupported:   0
 + Tests Passed :       7
 + Tests Failed :       1
 + ------------------------------------------------------- +
Test Failed
RTE>>
stderr:
EAL: Detected CPU lcores: 72
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:158b) device: 0000:81:00.0 (socket 1)
APP: HPET is not enabled, using TSC as default timer
bond_ethdev_mode_set(1587) - Using mode 4, it is necessary to do TX burst and RX burst at least every 100ms.
Device with port_id=1 already started
Device with port_id=2 already started
Device with port_id=3 already started
Device with port_id=4 already started
EAL: Test assert bond_handshake line 643 failed: Bond handshake failed
EAL: Test assert test_mode4_agg_mode_selection line 680 failed: Initial handshake failed
Device with port_id=1 already stopped

Expected Result
Test OK

Regression
Is this issue a regression: (Y/N) Y
This was the first time the test was run with "meson test".
Comment 1 liweiyuan 2022-11-14 02:51:29 CET
1. Root cause:
In the test code, the wait loop runs 30 times with a 100 ms delay per iteration, so the total budget matches the short timeout (3 seconds).

After testing on 7 servers, one measured 2.9 s (pass) and the others measured 3.1 s (fail).
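
The arithmetic behind this can be sketched in C. This is a minimal, hypothetical model and not the actual test code: poll_until_ready, MAX_POLLS and POLL_DELAY_MS are assumed names, and the handshake is modelled simply as becoming ready after a fixed number of milliseconds.

```c
#include <stdbool.h>

/* Assumed values taken from the description above: 30 polls, 100 ms apart. */
#define MAX_POLLS     30
#define POLL_DELAY_MS 100

/*
 * Hypothetical model of the wait loop: the handshake is treated as done
 * once ready_after_ms of simulated time has elapsed. Returns true if that
 * happens within the MAX_POLLS * POLL_DELAY_MS budget (3000 ms).
 */
bool poll_until_ready(unsigned int ready_after_ms)
{
	unsigned int elapsed_ms = 0;

	for (int i = 0; i < MAX_POLLS; i++) {
		elapsed_ms += POLL_DELAY_MS;   /* stands in for a 100 ms delay */
		if (elapsed_ms >= ready_after_ms)
			return true;           /* handshake seen in time */
	}
	return false;                          /* budget exhausted */
}
```

Under these assumptions, poll_until_ready(2900) succeeds and poll_until_ready(3100) fails, which matches the pass/fail split seen on the 7 servers.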



2. Checking the 802.3 documentation:
LACP Session Timeout and Port Priority
You can set the timeout for a LACP session. The timeout value is the amount of time that a port-channel interface waits for a LACPDU from the remote system before terminating the LACP session. The default time out value is long (90 seconds); short is 3 seconds. You can also set the port priority. The higher the value the lower the priority. The priority range is 1 to 65535 and the default is 255.



It seems acceptable to set the timeout anywhere from 3 s to 90 s, so maybe the loop count of 30 should be increased?
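
One way to express that suggestion, as a sketch only: derive the loop count from the timeout plus a safety margin instead of hard-coding 30. The helper polls_for_timeout and any margin passed to it are illustrative assumptions, not names or values from the DPDK sources.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of polls needed to cover timeout_ms plus
 * margin_ms when polling every delay_ms, rounded up.
 */
unsigned int polls_for_timeout(unsigned int timeout_ms,
			       unsigned int margin_ms,
			       unsigned int delay_ms)
{
	unsigned int total_ms = timeout_ms + margin_ms;

	assert(delay_ms > 0);                          /* avoid division by zero */
	return (total_ms + delay_ms - 1) / delay_ms;   /* ceiling division */
}
```

For example, polls_for_timeout(3000, 0, 100) gives the current 30 iterations, while an assumed 500 ms margin gives 35, which would cover the 3.1 s case measured above.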
Comment 2 Ferruh YIGIT 2023-03-27 14:40:12 CEST
Is this issue seen by the latest version of DPDK (head of master branch), or is this only a v22.07 issue?
Comment 3 jiang,yu 2023-05-31 07:24:00 CEST
(In reply to Ferruh YIGIT from comment #2)
> Is this issue seen by the latest version of DPDK (head of master branch), or
> is this only a v22.07 issue?

It is still seen with the latest DPDK.
