Bug 14 - Kernel crash in the ixgbevf kernel module in "Intel(R) 10GbE PCI Express Virtual Function Driver Version: 4.0.3 Release: 1" and the latest version
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: other
Version: unspecified
Hardware: All Linux
Importance: Normal critical
Target Milestone: ---
Assignee: batmanustc
Reported: 2018-02-02 03:43 CET by batmanustc
Modified: 2018-09-16 00:13 CEST
CC List: 4 users



Attachments
vmcore-dmesg.txt (883.43 KB, text/plain)
2018-02-02 03:43 CET, batmanustc

Description batmanustc 2018-02-02 03:43:43 CET
Created attachment 3
vmcore-dmesg.txt

Hi all,

There is a kernel crash bug in the ixgbevf kernel module, "Intel(R) 10GbE PCI Express Virtual Function Driver Version: 4.0.3 Release: 1".

How to reproduce:
1. Use SR-IOV, like this:
sudo /usr/local/share/openvswitch/scripts/dpdk_nic_bind --status

Network devices using DPDK-compatible driver
============================================
0000:01:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=ixgbe
0000:01:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=ixgbe

Network devices using kernel driver
===================================
0000:01:10.0 'X540 Ethernet Controller Virtual Function' if=enp1s16 drv=ixgbevf unused=bak,igb_uio
0000:01:10.1 'X540 Ethernet Controller Virtual Function' if=enp1s16f1 drv=ixgbevf unused=bak,igb_uio
0000:08:00.0 'I350 Gigabit Network Connection' if=eth2 drv=igb unused=igb_uio
0000:08:00.1 'I350 Gigabit Network Connection' if=eth3 drv=igb unused=igb_uio

Other network devices
=====================
<none>

2. Bond enp1s16 and enp1s16f1 into bond1, via /etc/sysconfig/ifcfg-bond1.
3. Bond 0000:01:00.0 and 0000:01:00.1 into dpdkb2 in ovs-dpdk, via the DPDK API.
4. Stop bond1 (ifdown bond1), stop dpdkb2 (rte_eth_dev_stop), sleep 5 seconds, start dpdkb2 (rte_eth_dev_start), then start bond1 (ifup bond1).

After repeating this several times, the crash occurs. The attached vmcore-dmesg.txt contains the kernel log.

For a similar bug, see:
https://sourceforge.net/p/e1000/mailman/message/35173093/

This looks like an old bug. Please do not hide it; please fix it.
Comment 1 Qian 2018-02-08 04:21:57 CET
Looks like an ixgbevf/ixgbe kernel bug.
What are your DPDK version and your ixgbe kernel driver/firmware versions? You can check with ethtool -i device_id. Could you try a newer kernel version? You can also check the tested platforms listed in the release notes at http://dpdk.org/doc/guides/rel_notes/ for reference.
Comment 2 batmanustc 2018-05-02 04:16:45 CEST
This is my patch:

diff --git a/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c b/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c
index 88f87cc..1373ab8 100644
--- a/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c
+++ b/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c
@@ -3742,9 +3742,14 @@ static int ixgbevf_close(struct net_device *netdev)
 {
        struct ixgbevf_adapter *adapter = netdev_priv(netdev);

+    while(test_and_set_bit(__IXGBEVF_SERVICE_SCHED, &adapter->state))
+        msleep(1);
+
        if (netif_device_present(netdev))
                ixgbevf_close_suspend(adapter);

+    clear_bit(__IXGBEVF_SERVICE_SCHED, &adapter->state);
+
        return 0;
 }
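
Note on the patch above: it appears to use the __IXGBEVF_SERVICE_SCHED bit as a simple lock, so that ixgbevf_close() waits until the bit is free, holds it while tearing the device down, and releases it afterwards. The thread does not state this reasoning, so that reading is an assumption. The following is a minimal userspace sketch of the same test-and-set pattern, using C11 atomics and pthreads in place of the kernel's test_and_set_bit()/clear_bit()/msleep(). All names in it (service_task, close_path, run_bit_lock_demo) are hypothetical and are not driver code.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define ROUNDS 200

/* Stand-in for the __IXGBEVF_SERVICE_SCHED bit in adapter->state. */
static atomic_flag service_sched = ATOMIC_FLAG_INIT;

/* Bookkeeping used only to detect two threads inside the guarded region. */
static atomic_int inside = 0;
static atomic_int overlaps = 0;

/* Work that must not run concurrently (e.g. touching rings being freed). */
static void guarded_work(void)
{
    if (atomic_fetch_add(&inside, 1) != 0)
        atomic_fetch_add(&overlaps, 1);   /* someone else was already inside */
    atomic_fetch_sub(&inside, 1);
}

/* Models a periodic task that runs only if it wins the bit, else skips. */
static void *service_task(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        if (atomic_flag_test_and_set(&service_sched))
            continue;                     /* bit already held: skip this round */
        guarded_work();
        atomic_flag_clear(&service_sched);
    }
    return NULL;
}

/* Models the patched close path: wait for the bit, work, then release it. */
static void *close_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        while (atomic_flag_test_and_set(&service_sched))
            sched_yield();                /* stands in for msleep(1) */
        guarded_work();
        atomic_flag_clear(&service_sched);
    }
    return NULL;
}

/* Returns the number of detected overlaps (0 expected), or -1 on error. */
int run_bit_lock_demo(void)
{
    pthread_t svc, cls;

    atomic_store(&inside, 0);
    atomic_store(&overlaps, 0);
    if (pthread_create(&svc, NULL, service_task, NULL) != 0)
        return -1;
    if (pthread_create(&cls, NULL, close_path, NULL) != 0) {
        pthread_join(svc, NULL);
        return -1;
    }
    pthread_join(svc, NULL);
    pthread_join(cls, NULL);
    return atomic_load(&overlaps);
}
```

Calling run_bit_lock_demo() from a small main() (built with -pthread) should report no overlaps, because only the thread that flips the flag from clear to set enters the guarded region. Whether the real crash is caused by this particular race is not established anywhere in this report.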
Comment 3 Ajit Khaparde 2018-07-15 06:46:13 CEST
Qian,
Does the patch make sense? Or do you want to point the bug to someone else?

Thanks
Comment 4 Paul Stillwell 2018-08-28 18:01:35 CEST
Hi,

If you would like your patch reviewed, please submit the patch to intel-wired-lan@lists.osuosl.org. Please attach the patch (since mailers sometimes mess with the spacing in code) as well as put it in the body of the message. Here are a couple of suggestions before you send the patch:

1. Make sure your patch description has adequate information about what the problem is and how this patch solves the problem. What you have above doesn't contain enough information about why this fixes the issue you are seeing.

2. Make sure your whitespace is correct. The patch above could be suffering from an artifact when posting to the web forum, but make sure that in the code the tabs are set correctly and that everything lines up with the code around it.

3. You can also include steps to reproduce the issue in the email. If you can reproduce without DPDK then that is the best path to take since anyone reviewing the patch on the above mailing list may not be able to reproduce with DPDK.

4. I noticed that your patch is based on 4.0.3 and the current version of the ixgbevf driver is 4.3.5, so you will probably get better support if you reproduce on the 4.3.5 version of the driver and then submit your patch based on that.

Thanks!
Comment 5 Ajit Khaparde 2018-09-16 00:13:07 CEST
batmanustc@gmail.com

Do you have any update?
