[2/3] net/bnxt: fix ver get HWRM command

Message ID: 20210304090728.26166-3-kalesh-anakkur.purayil@broadcom.com
State: Accepted, archived
Delegated to: Ajit Khaparde
Series: bnxt fixes

Checks

Context        Check    Description
ci/checkpatch  success  coding style OK

Commit Message

Kalesh A P March 4, 2021, 9:07 a.m. UTC
  From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

Fix HWRM_VER_GET command to handle DEV_NOT_RDY state.

Driver should fail probe if the device is not ready.
Conversely, the HWRM_VER_GET poll after reset can safely
retry until the existing timeout is exceeded.

Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
Cc: stable@dpdk.org

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
---
 drivers/net/bnxt/bnxt_hwrm.c | 9 +++++++++
 1 file changed, 9 insertions(+)
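
The two behaviours described in the commit message can be illustrated with a small standalone sketch. This is not the driver code: every `sketch_*` name, the `SKETCH_FLAGS_DEV_NOT_RDY` value and the simulated firmware are assumptions made for illustration, and the retry budget is counted in attempts here, whereas the commit message speaks of an existing timeout.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the DEV_NOT_RDY bit of the VER_GET response. */
#define SKETCH_FLAGS_DEV_NOT_RDY 0x1u

/* Hypothetical firmware model: reports "not ready" for the first N queries. */
struct sketch_fw {
	unsigned int not_ready_polls;
};

/* One VER_GET attempt: 0 when ready, -EAGAIN while DEV_NOT_RDY is set. */
static int sketch_ver_get(struct sketch_fw *fw)
{
	uint32_t flags = 0;

	if (fw->not_ready_polls > 0) {
		fw->not_ready_polls--;
		flags |= SKETCH_FLAGS_DEV_NOT_RDY;
	}
	return (flags & SKETCH_FLAGS_DEV_NOT_RDY) ? -EAGAIN : 0;
}

/* Probe path: a single attempt; a device that is not ready fails the probe. */
static int sketch_probe(struct sketch_fw *fw)
{
	return sketch_ver_get(fw);
}

/* Post-reset path: retry on -EAGAIN until the attempt budget runs out. */
static int sketch_poll_after_reset(struct sketch_fw *fw,
				   unsigned int max_attempts)
{
	unsigned int i;
	int rc;

	for (i = 0; i < max_attempts; i++) {
		rc = sketch_ver_get(fw);
		if (rc != -EAGAIN)
			return rc;
	}
	return -ETIMEDOUT;
}
```

In this model, a device that needs a few polls to become ready fails `sketch_probe()` with `-EAGAIN` but succeeds in `sketch_poll_after_reset()` as long as the budget is large enough.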
  

Comments

David Marchand March 1, 2022, 12:53 p.m. UTC | #1
Hello,

On Thu, Mar 4, 2021 at 9:45 AM Kalesh A P
<kalesh-anakkur.purayil@broadcom.com> wrote:
>
> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>
> Fix HWRM_VER_GET command to handle DEV_NOT_RDY state.
>
> Driver should fail probe if the device is not ready.
> Conversely, the HWRM_VER_GET poll after reset can safely
> retry until the existing timeout is exceeded.
>
> Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
> Cc: stable@dpdk.org
>
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
> Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>

This patch makes probing fail on a RHEL9 kernel with firmwares:
firmware-version: 20.6.143.0/pkg 20.06.04.06
and
firmware-version: 20.6.112.0

Simply reverting the patch fixes probing in my env.

This was reported by our QE team which is doing sanity checks on ovs
2.17 which uses dpdk v21.11.
https://bugzilla.redhat.com/show_bug.cgi?id=2055531
  
David Marchand March 17, 2022, 10:11 a.m. UTC | #2
Kalesh, Ajit,

On Tue, Mar 1, 2022 at 1:53 PM David Marchand <david.marchand@redhat.com> wrote:
> On Thu, Mar 4, 2021 at 9:45 AM Kalesh A P
> <kalesh-anakkur.purayil@broadcom.com> wrote:
> >
> > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> >
> > Fix HWRM_VER_GET command to handle DEV_NOT_RDY state.
> >
> > Driver should fail probe if the device is not ready.
> > Conversely, the HWRM_VER_GET poll after reset can safely
> > retry until the existing timeout is exceeded.
> >
> > Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
> > Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
>
> This patch makes probing fail on a RHEL9 kernel with firmwares:
> firmware-version: 20.6.143.0/pkg 20.06.04.06
> and
> firmware-version: 20.6.112.0
>
> Simply reverting the patch fixes probing in my env.
>
> This was reported by our QE team which is doing sanity checks on ovs
> 2.17 which uses dpdk v21.11.
> https://bugzilla.redhat.com/show_bug.cgi?id=2055531

QE confirmed reverting this patch fixed initialisation in their setup.
I reproduced the issue on another system with RHEL 8.5, and confirmed
that reverting the patch or upgrading the firmware fixes the issue.

I can understand new features in a PMD might require newer firmwares.
But breaking compatibility with older firmwares is a big no.

If there is no other option, I will send a revert for this patch.


Thanks.
  
Ajit Khaparde March 17, 2022, 4:48 p.m. UTC | #3
On Thu, Mar 17, 2022 at 3:11 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> Kalesh, Ajit,
>
> On Tue, Mar 1, 2022 at 1:53 PM David Marchand <david.marchand@redhat.com> wrote:
> > On Thu, Mar 4, 2021 at 9:45 AM Kalesh A P
> > <kalesh-anakkur.purayil@broadcom.com> wrote:
> > >
> > > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > >
> > > Fix HWRM_VER_GET command to handle DEV_NOT_RDY state.
> > >
> > > Driver should fail probe if the device is not ready.
> > > Conversely, the HWRM_VER_GET poll after reset can safely
> > > retry until the existing timeout is exceeded.
> > >
> > > Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > > Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
> > > Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
> >
> > This patch makes probing fail on a RHEL9 kernel with firmwares:
> > firmware-version: 20.6.143.0/pkg 20.06.04.06
> > and
> > firmware-version: 20.6.112.0
Oh wow! That's really old FW.

> >
> > Simply reverting the patch fixes probing in my env.
> >
> > This was reported by our QE team which is doing sanity checks on ovs
> > 2.17 which uses dpdk v21.11.
> > https://bugzilla.redhat.com/show_bug.cgi?id=2055531
>
> QE confirmed reverting this patch fixed initialisation in their setup.
> I reproduced the issue on another system with RHEL 8.5, and confirmed
> that reverting the patch or upgrading the firmware fixes the issue.
>
> I can understand new features in a PMD might require newer firmwares.
> But breaking compatibility with older firmwares is a big no.
I agree.
Ideally this should not have been sent as a fix targeting the super-old commit
804e746c7b73.

>
> If there is no other option, I will send a revert for this patch.
A version check may help avoid a complete revert.
But that will require some testing, which means we are looking at
mid-next week for the fix.

Submit the revert patch, just in case it takes longer than that.

Thanks
Ajit


>
>
> Thanks.
>
> --
> David Marchand
>
  
David Marchand March 18, 2022, 2:53 p.m. UTC | #4
Hi Ajit,

On Thu, Mar 17, 2022 at 5:48 PM Ajit Khaparde
<ajit.khaparde@broadcom.com> wrote:
> > On Tue, Mar 1, 2022 at 1:53 PM David Marchand <david.marchand@redhat.com> wrote:
> > > On Thu, Mar 4, 2021 at 9:45 AM Kalesh A P
> > > <kalesh-anakkur.purayil@broadcom.com> wrote:
> > > >
> > > > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > > >
> > > > Fix HWRM_VER_GET command to handle DEV_NOT_RDY state.
> > > >
> > > > Driver should fail probe if the device is not ready.
> > > > Conversely, the HWRM_VER_GET poll after reset can safely
> > > > retry until the existing timeout is exceeded.
> > > >
> > > > Fixes: 804e746c7b73 ("net/bnxt: add hardware resource manager init code")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > > > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > > > Reviewed-by: Randy Schacher <stuart.schacher@broadcom.com>
> > > > Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
> > >
> > > This patch makes probing fail on a RHEL9 kernel with firmwares:
> > > firmware-version: 20.6.143.0/pkg 20.06.04.06
> > > and
> > > firmware-version: 20.6.112.0
> Oh wow! That's really old FW.

Well, hard to tell it is old, from my side.

I found a few nics in our lab that show similar firmware versions.
As you can see, QE have some servers with such nics too.


> > If there is no other option, I will send a revert for this patch.
> A version check may help avoid a complete revert.
> But that will require some testing, which means we are looking at
> mid-next week for the fix.
>
> Submit the revert patch, just in case it takes longer than that.

I'll do that.
Thanks.
  

Patch

diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 02b0a21..5ef0845 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -1217,6 +1217,11 @@  int bnxt_hwrm_ver_get(struct bnxt *bp, uint32_t timeout)
 	else
 		HWRM_CHECK_RESULT();
 
+	if (resp->flags & HWRM_VER_GET_OUTPUT_FLAGS_DEV_NOT_RDY) {
+		rc = -EAGAIN;
+		goto error;
+	}
+
 	PMD_DRV_LOG(INFO, "%d.%d.%d:%d.%d.%d.%d\n",
 		resp->hwrm_intf_maj_8b, resp->hwrm_intf_min_8b,
 		resp->hwrm_intf_upd_8b, resp->hwrm_fw_maj_8b,
@@ -6045,6 +6050,10 @@  int bnxt_hwrm_poll_ver_get(struct bnxt *bp)
 	rc = bnxt_hwrm_send_message(bp, &req, sizeof(req), BNXT_USE_CHIMP_MB);
 
 	HWRM_CHECK_RESULT_SILENT();
+
+	if (resp->flags & HWRM_VER_GET_OUTPUT_FLAGS_DEV_NOT_RDY)
+		rc = -EAGAIN;
+
 	HWRM_UNLOCK();
 
 	return rc;
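
The review thread mentions that a version check might avoid a complete revert. A minimal sketch of that idea follows, under stated assumptions: the thread does not say which firmware versions report DEV_NOT_RDY reliably, so the cutoff is left as a caller-supplied parameter, and `sketch_fw_ver()`, `sketch_check_ready()` and the flag value are hypothetical names that do not come from the driver.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for HWRM_VER_GET_OUTPUT_FLAGS_DEV_NOT_RDY. */
#define SKETCH_FLAGS_DEV_NOT_RDY 0x1u

/* Pack a four-part firmware version into one comparable integer. */
static uint32_t sketch_fw_ver(uint8_t maj, uint8_t min, uint8_t bld,
			      uint8_t rsvd)
{
	return ((uint32_t)maj << 24) | ((uint32_t)min << 16) |
	       ((uint32_t)bld << 8) | rsvd;
}

/*
 * Honour DEV_NOT_RDY only when the firmware is at least min_trusted_ver.
 * Older firmware is treated as ready, which matches the behaviour before
 * the patch. The cutoff is an assumption supplied by the caller.
 */
static int sketch_check_ready(uint32_t flags, uint32_t fw_ver,
			      uint32_t min_trusted_ver)
{
	if (fw_ver < min_trusted_ver)
		return 0;
	return (flags & SKETCH_FLAGS_DEV_NOT_RDY) ? -EAGAIN : 0;
}
```

With such a gate, firmware below the chosen cutoff would probe as it did before the patch, while newer firmware would still get the DEV_NOT_RDY handling. Picking the real cutoff would need the testing mentioned in the thread.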