[dpdk-dev] net/qede: fix slow path completion timeout

Message ID 1527101333-16888-1-git-send-email-rasesh.mody@cavium.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Mody, Rasesh May 23, 2018, 6:48 p.m. UTC
  From: Shahed Shaikh <shahed.shaikh@cavium.com>

In 100G mode, we poll firmware slow path completion for every 1 second,
which is not enough and may result in completion timeout if
driver misses that window.

Patch "eal: set affinity for control threads" exposed this issue since
alarm callback runs in control thread context.

Fix this issue by update polling period to 100ms.

Fixes: d651ee4919cd ("eal: set affinity for control threads")
Fixes: 2af14ca79c0a ("net/qede: support 100G")
Cc: stable@dpdk.org

Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
---
 drivers/net/qede/qede_ethdev.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Ferruh Yigit May 24, 2018, 5:22 p.m. UTC | #1
On 5/23/2018 7:48 PM, Rasesh Mody wrote:
> From: Shahed Shaikh <shahed.shaikh@cavium.com>
> 
> In 100G mode, we poll firmware slow path completion for every 1 second,
> which is not enough and may result in completion timeout if
> driver misses that window.
> 
> Patch "eal: set affinity for control threads" exposed this issue since
> alarm callback runs in control thread context.
> 
> Fix this issue by update polling period to 100ms.
> 
> Fixes: d651ee4919cd ("eal: set affinity for control threads")
> Fixes: 2af14ca79c0a ("net/qede: support 100G")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>

Hi Rasesh,

I can see it fixes an issue introduced because of a change in this release,
what is the priority of the defect?
  
Mody, Rasesh May 24, 2018, 6:02 p.m. UTC | #2
Hi Ferruh,

> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]

> Sent: Thursday, May 24, 2018 10:22 AM

> 

> On 5/23/2018 7:48 PM, Rasesh Mody wrote:

> > From: Shahed Shaikh <shahed.shaikh@cavium.com>

> >

> > In 100G mode, we poll firmware slow path completion for every 1

> > second, which is not enough and may result in completion timeout if

> > driver misses that window.

> >

> > Patch "eal: set affinity for control threads" exposed this issue since

> > alarm callback runs in control thread context.

> >

> > Fix this issue by update polling period to 100ms.

> >

> > Fixes: d651ee4919cd ("eal: set affinity for control threads")

> > Fixes: 2af14ca79c0a ("net/qede: support 100G")

> > Cc: stable@dpdk.org

> >

> > Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>

> 

> Hi Rasesh,

> 

> I can see it fixes an issue introduced because of a change in this release, what

> is the priority of the defect?


This is important for us to get this change in as it fixes slow path completion timeout for our 100G adapters leading to PMD load failure. Please apply to 18.05.

Thanks!
-Rasesh
  
Ferruh Yigit May 25, 2018, 9:35 a.m. UTC | #3
On 5/24/2018 7:02 PM, Mody, Rasesh wrote:
> Hi Ferruh,
> 
>> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
>> Sent: Thursday, May 24, 2018 10:22 AM
>>
>> On 5/23/2018 7:48 PM, Rasesh Mody wrote:
>>> From: Shahed Shaikh <shahed.shaikh@cavium.com>
>>>
>>> In 100G mode, we poll firmware slow path completion for every 1
>>> second, which is not enough and may result in completion timeout if
>>> driver misses that window.
>>>
>>> Patch "eal: set affinity for control threads" exposed this issue since
>>> alarm callback runs in control thread context.
>>>
>>> Fix this issue by update polling period to 100ms.
>>>
>>> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>>> Fixes: 2af14ca79c0a ("net/qede: support 100G")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
>>
>> Hi Rasesh,
>>
>> I can see it fixes an issue introduced because of a change in this release, what
>> is the priority of the defect?
> 
> This is important for us to get this change in as it fixes slow path completion timeout for our 100G adapters leading to PMD load failure. Please apply to 18.05.

Applied to dpdk-next-net/master, thanks.
  

Patch

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 30b6519..338ddc1 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -16,7 +16,7 @@ 
 int qede_logtype_driver;
 
 static const struct qed_eth_ops *qed_ops;
-static int64_t timer_period = 1;
+#define QEDE_SP_TIMER_PERIOD	10000 /* 100ms */
 
 /* VXLAN tunnel classification mapping */
 const struct _qede_udp_tunn_types {
@@ -1698,7 +1698,7 @@  static void qede_poll_sp_sb_cb(void *param)
 	qede_interrupt_action(ECORE_LEADING_HWFN(edev));
 	qede_interrupt_action(&edev->hwfns[1]);
 
-	rc = rte_eal_alarm_set(timer_period * US_PER_S,
+	rc = rte_eal_alarm_set(QEDE_SP_TIMER_PERIOD,
 			       qede_poll_sp_sb_cb,
 			       (void *)eth_dev);
 	if (rc != 0) {
@@ -3093,7 +3093,7 @@  static int qede_common_dev_init(struct rte_eth_dev *eth_dev, bool is_vf)
 	 * interrupt vector but we need one for each engine.
 	 */
 	if (ECORE_IS_CMT(edev) && IS_PF(edev)) {
-		rc = rte_eal_alarm_set(timer_period * US_PER_S,
+		rc = rte_eal_alarm_set(QEDE_SP_TIMER_PERIOD,
 				       qede_poll_sp_sb_cb,
 				       (void *)eth_dev);
 		if (rc != 0) {