Created attachment 3: vmcore-dmesg.txt

Hi all,

There is a kernel crash in the ixgbevf kernel module, "Intel(R) 10GbE PCI Express Virtual Function Driver", Version: 4.0.3, Release: 1.

How to reproduce:

1. Use SR-IOV, like this:

    sudo /usr/local/share/openvswitch/scripts/dpdk_nic_bind --status

    Network devices using DPDK-compatible driver
    ============================================
    0000:01:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=ixgbe
    0000:01:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=ixgbe

    Network devices using kernel driver
    ===================================
    0000:01:10.0 'X540 Ethernet Controller Virtual Function' if=enp1s16 drv=ixgbevf unused=bak,igb_uio
    0000:01:10.1 'X540 Ethernet Controller Virtual Function' if=enp1s16f1 drv=ixgbevf unused=bak,igb_uio
    0000:08:00.0 'I350 Gigabit Network Connection' if=eth2 drv=igb unused=igb_uio
    0000:08:00.1 'I350 Gigabit Network Connection' if=eth3 drv=igb unused=igb_uio

    Other network devices
    =====================
    <none>

2. Bond enp1s16 and enp1s16f1 into bond1 via /etc/sysconfig/ifcfg-bond1.
3. Bond 0000:01:00.0 and 0000:01:00.1 in ovs-dpdk into dpdkb2 via the DPDK API.
4. Stop bond1 (ifdown bond1), stop dpdkb2 (rte_eth_dev_stop), sleep 5 seconds, start dpdkb2 (rte_eth_dev_start), then start bond1 (ifup bond1).

After several iterations the crash occurs; the attachment is vmcore-dmesg.txt.

A similar bug is reported at https://sourceforge.net/p/e1000/mailman/message/35173093/. It looks like an old bug. Please do not hide the bug; please fix it.
Looks like an ixgbevf/ixgbe kernel bug. What are your DPDK version and your ixgbe kernel driver/firmware versions? You can use ethtool -i device_id to check. Could you try a newer kernel version? You can also check the tested platforms listed in http://dpdk.org/doc/guides/rel_notes/ as a reference.
This is my patch:

diff --git a/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c b/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c
index 88f87cc..1373ab8 100644
--- a/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c
+++ b/kmod/ixgbevf-4.0.3/src/ixgbevf_main.c
@@ -3742,9 +3742,14 @@ static int ixgbevf_close(struct net_device *netdev)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 
+	while(test_and_set_bit(__IXGBEVF_SERVICE_SCHED, &adapter->state))
+		msleep(1);
+
 	if (netif_device_present(netdev))
 		ixgbevf_close_suspend(adapter);
 
+	clear_bit(__IXGBEVF_SERVICE_SCHED, &adapter->state);
+
 	return 0;
 }
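For anyone reading the patch without the driver source at hand, here is a rough user-space sketch of the locking idea it relies on: the close path waits until it can take the "service scheduled" bit, does its teardown, then releases the bit. Everything named fake_* or *_user is hypothetical and only imitates the kernel's test_and_set_bit/clear_bit with C11 atomics. It also assumes the service task takes and releases the same bit around its work, which the patch implies but which should be checked against the real ixgbevf source. The kernel version loops with msleep(1); this sketch exposes a single non-blocking attempt so it can be exercised without threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for __IXGBEVF_SERVICE_SCHED in adapter->state. */
enum { FAKE_SERVICE_SCHED = 0 };

/* Hypothetical, heavily reduced adapter: just the state word and a flag. */
struct fake_adapter {
	atomic_ulong state;
	bool closed;
};

/* Atomically set bit nr and report whether it was already set. */
static bool test_and_set_bit_user(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Atomically clear bit nr. */
static void clear_bit_user(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Assumed service-task entry: claims the bit, fails if it is already held. */
static bool fake_service_schedule(struct fake_adapter *a)
{
	return !test_and_set_bit_user(FAKE_SERVICE_SCHED, &a->state);
}

/* Assumed service-task exit: releases the bit. */
static void fake_service_complete(struct fake_adapter *a)
{
	clear_bit_user(FAKE_SERVICE_SCHED, &a->state);
}

/*
 * One attempt of the close path from the patch. Returns false when the
 * service task holds the bit; the patched driver would msleep(1) and retry.
 */
static bool fake_close_try(struct fake_adapter *a)
{
	if (test_and_set_bit_user(FAKE_SERVICE_SCHED, &a->state))
		return false;

	a->closed = true;	/* stands in for ixgbevf_close_suspend() */
	clear_bit_user(FAKE_SERVICE_SCHED, &a->state);
	return true;
}
```

In this model, a close attempt made while the service task holds the bit is refused, and it succeeds once the service task completes. Whether reusing the scheduling bit as a lock is acceptable in the real driver is a question for the maintainers.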
Qian, Does the patch make sense? Or do you want to point the bug to someone else? Thanks
Hi,

If you would like your patch reviewed, please submit the patch to intel-wired-lan@lists.osuosl.org. Please attach the patch (since mailers sometimes mess with the spacing in code) as well as put it in the body of the message.

Here are a couple of suggestions before you send the patch:

1. Make sure your patch description has adequate information about what the problem is and how this patch solves the problem. What you have above doesn't contain enough information about why this fixes the issue you are seeing.
2. Make sure your whitespace is correct. The patch above could be suffering from an artifact when posting to the web forum, but make sure that in the code the tabs are set correctly and that everything lines up with the code around it.
3. You can also include steps to reproduce the issue in the email. If you can reproduce without DPDK then that is the best path to take since anyone reviewing the patch on the above mailing list may not be able to reproduce with DPDK.
4. I noticed that your patch is based on 4.0.3 and the current version of the ixgbevf driver is 4.3.5, so you will probably get better support if you reproduce on the 4.3.5 version of the driver and then submit your patch based on that.

Thanks!
batmanustc@gmail.com Do you have any update?