Bug 957 - IXGBE LSC IRQ configured state is lost on certain link down events
Summary: IXGBE LSC IRQ configured state is lost on certain link down events
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: 20.11
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2022-03-14 22:50 CET by Mike
Modified: 2023-04-17 16:48 CEST
CC List: 3 users




Description Mike 2022-03-14 22:50:49 CET
Hello,

We recently ran into an issue with DPDK 20.11 and the IXGBE driver operating in 10GBASE-T mode. We have been able to replicate the behavior with dpdk-testpmd and have not found any recent relevant fixes, so we hope someone can advise based on the information below. Based on our investigation, the current link-down handling appears not to preserve the interrupt mask configuration, specifically the LSC (link status change) bit, when the link partner causes a slow or bouncing link-down event.
Background: 
We recently started using a new 3rd party traffic generator card to test our application. When using this card in 10GBASE-T mode and toggling the link up/down, our application correctly detected the port going down. However, the DPDK IXGBE driver's handling of that first port-down event appears to permanently disable its LSC IRQ detection, so that no subsequent link up or down event from the external test card on this port is detected. The only way to restore link detection was to restart the DPDK port in our design (stop/start). To rule out our application, we switched to the stock testpmd application and observed exactly the same behavior.

Here is the data we believe you would find interesting:

NIC in question:

# lspci -D -nn | grep -F [0200] | grep 552
0000:03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad]
0000:03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad]
# dpdk-devbind.py -s | grep 552
0000:03:00.0 'Ethernet Connection X552/X557-AT 10GBASE-T 15ad' drv=vfio-pci unused=uio_pci_generic
0000:03:00.1 'Ethernet Connection X552/X557-AT 10GBASE-T 15ad' drv=vfio-pci unused=uio_pci_generic

We made the following logging changes (raising selected DEBUG messages to ERR and printing the interrupt mask state) to capture the relevant data:

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 5a30c39593..75a9f9163b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4497,7 +4497,7 @@ ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev)

     /* read-on-clear nic registers here */
    eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
-    PMD_DRV_LOG(DEBUG, "eicr %x", eicr);
+    PMD_DRV_LOG(ERR, "eicr %x", eicr);

     intr->flags = 0;

@@ -4614,7 +4613,7 @@ ixgbe_dev_interrupt_action(struct rte_eth_dev *dev)
          }
    }

-    PMD_DRV_LOG(DEBUG, "enable intr immediately");
+    PMD_DRV_LOG(ERR, "enable intr immediately, mask: 0x%08x, orig: 0x%08x, flags: 0x%08x", intr->mask, intr->mask_original, intr->flags);
    ixgbe_enable_intr(dev);

     return 0;
@@ -4648,7 +4647,9 @@ ixgbe_dev_interrupt_delayed_handler(void *param)

     ixgbe_disable_intr(hw);

-    eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
+   eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
+   PMD_DRV_LOG(ERR, "in delay func: eicr 0x%08x", eicr);
+   PMD_DRV_LOG(ERR, "enable intr delayed, mask: 0x%08x, orig: 0x%08x, flags: 0x%08x", intr->mask, intr->mask_original, intr->flags);
    if (eicr & IXGBE_EICR_MAILBOX)
          ixgbe_pf_mbx_process(dev);

With the above ERR-level logging in place, we captured the results below. The first set was generated with an older 3rd party traffic generator card and shows the IXGBE driver working correctly ("good"). The second set shows the failing ("bad") sequence with the new traffic generator card. Both 3rd party cards transition correctly between the down and up states.


######################################################################
# good sequence, both down detection and then up detection
######################################################################
# port transition from up to down
<27>1 2022-03-05T00:12:11.415436+00:00 - -  ixgbe_dev_interrupt_get_status(): eicr 100000
<27>1 2022-03-05T00:12:11.415489+00:00 - -  ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:12:11.425448+00:00 - -  ixgbe_dev_interrupt_get_status(): eicr 2000000
<27>1 2022-03-05T00:12:11.446191+00:00 - -  ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000000
<27>1 2022-03-05T00:12:15.415600+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:12:15.415655+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x02300000, flags: 0x00000000

# port transition from down to up
<27>1 2022-03-05T00:12:33.856734+00:00 - -  ixgbe_dev_interrupt_get_status(): eicr 2000000
<27>1 2022-03-05T00:12:33.877463+00:00 - -  ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02300000, orig: 0x00000000, flags: 0x00000000
<27>1 2022-03-05T00:12:34.203274+00:00 - -  ixgbe_dev_interrupt_get_status(): eicr 100000
<27>1 2022-03-05T00:12:34.207905+00:00 - -  ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:12:35.207994+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00100000
<27>1 2022-03-05T00:12:35.208027+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001

######################################################################
# bad sequence, detects down event, but does not see the up event
######################################################################
# port transition from up to down
<27>1 2022-03-05T00:13:00.377072+00:00 - -  ixgbe_dev_interrupt_get_status(): eicr 100000
<27>1 2022-03-05T00:13:00.377127+00:00 - -  ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:13:00.643788+00:00 - -  ixgbe_dev_interrupt_get_status(): eicr 2100000
<27>1 2022-03-05T00:13:00.664603+00:00 - -  ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02200000, flags: 0x00000001
<27>1 2022-03-05T00:13:01.664703+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:13:01.664738+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x02200000, flags: 0x00000001
<27>1 2022-03-05T00:13:04.377237+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:13:04.377269+00:00 - -  ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x00000000, flags: 0x00000000

# port transition from down to up
<nothing is logged: the LSC IRQ is no longer enabled after the link-down sequence above>
Comment 1 Muthurajan.Jayakumar 2022-05-18 17:37:18 CEST
Dear Mike, 
Thank you so much for posting this in DPDK Bugzilla.
The customer is asking for a follow-up.
Can I please ask for your guidance on how to get back to the customer with a status update?
Once again, thank you very much for filing this in DPDK Bugzilla.

Thanks
M Jay
Comment 2 Steve Yang 2022-05-23 11:57:55 CEST
Hello Mike,

I've attempted to replicate the "good sequence" you provided with an ixgbe card (82599ES), but never succeeded. It does not detect the correct link status (it is always 'up').
I've also tried different DPDK versions (from 19.11 to the latest, 22.03); the "good sequence" never occurred.

Here are my test steps:
- bind the ixgbe device to DPDK:
    > dpdk-devbind -b vfio-pci 0000:81:00.0;
- bring up the paired NIC device, e.g. ens802f0 (ice NIC), which is paired with 0000:81:00.0 (ixgbe):
    > ifconfig ens802f0 up
- launch testpmd:
    > dpdk-testpmd -c 0xf -n 4 -a 0000:81:00.0 -- -i
- check the link status:
    > show port summary all
    # at this step, the link status is 'up';
- bring down the paired NIC device:
    > ifconfig ens802f0 down
- check the link status again:
    > show port summary all
    # at this step, the link status is still 'up'; no interrupt is received.

Could you please give more detailed reproduction steps for the two sequences, or point out which major steps I missed?

Thanks & Regards,
Steve Yang.
Comment 3 Mike 2022-06-27 18:37:35 CEST
My testing was done using up/down state transitions in the external device the NIC is connected to. I was not manipulating the testpmd port up/down status, but rather tracking its response to link partner up/down transitions. Please confirm you are using this approach in your setup. In my case, I was able to use a "simulate cable break" option on the link partner. You may need to physically remove the Cat-6e cable to do the same if the link partner cannot do this correctly via SW commands.
Comment 4 Mike 2023-04-17 16:48:13 CEST
Bumping this defect to see if replication was possible as described.
