[dpdk-stable] [PATCH] net/bnxt: fix missing barriers in completion handling

Lance Richardson lance.richardson at broadcom.com
Fri Jul 9 16:48:35 CEST 2021


On Fri, Jul 9, 2021 at 2:00 AM Ruifeng Wang <Ruifeng.Wang at arm.com> wrote:
>
<snip>
> > +/**
> > + * Check validity of a completion ring entry. If the entry is valid, include a
> > + * C11 __ATOMIC_ACQUIRE fence to ensure that subsequent loads of fields in the
> > + * completion are not hoisted by the compiler or by the CPU to come before the
> > + * loading of the "valid" field.
> > + *
> > + * Note: the caller must not access any fields in the specified completion
> > + * entry prior to calling this function.
> > + *
> > + * @param cmp
> Nit, cmpl

Thanks, good catch. I'll fix this in v2.
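
For readers following along without the full patch, the check described in the quoted comment can be sketched roughly as below. This is only an illustration, not the code from the patch: the structure layout, the `sketch_` names, the position of the valid bit, and the convention that the producer flips the valid bit on each pass over the ring are all assumptions made for the example. The fence uses the GCC/Clang `__atomic_thread_fence` builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical completion layout; the real driver's structures differ. */
struct sketch_cmpl {
	uint32_t data;    /* payload written by the producer */
	uint32_t info_v;  /* bit 0 is assumed to be the "valid" bit */
};

#define SKETCH_CMPL_V 0x1u

/*
 * Return true if the entry for raw_cons has been written by the producer.
 * Assumes the producer flips the valid bit on each pass over the ring and
 * that ring_size is a power of two.
 */
static inline bool
sketch_cmpl_valid(const struct sketch_cmpl *cmpl, uint32_t raw_cons,
		  uint32_t ring_size)
{
	bool expected, valid;

	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);

	/* Even passes expect V == 1, odd passes expect V == 0. */
	expected = !(raw_cons & ring_size);
	valid = !!(cmpl->info_v & SKETCH_CMPL_V);
	if (valid != expected)
		return false;

	/* Keep later loads of cmpl fields from moving ahead of the load above. */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	return true;
}
```

The caller would read other fields of the entry only after this returns true, which is the constraint the "Note" in the comment is describing.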

<snip>
>
> >
> >       /* Check to see if hw has posted a completion for the descriptor. */
> > @@ -3327,7 +3327,7 @@ bnxt_tx_descriptor_status_op(void *tx_queue, uint16_t offset)
> >               cons = RING_CMPL(ring_mask, raw_cons);
> >               txcmp = (struct tx_cmpl *)&cp_desc_ring[cons];
> >
> > -             if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct))
> > +             if (!bnxt_cpr_cmp_valid(txcmp, raw_cons, ring_mask + 1))
> cpr->cp_ring_struct->ring_size can be used instead of 'ring_mask + 1'?
>
> >                       break;
> >
> >               if (CMP_TYPE(txcmp) == TX_CMPL_TYPE_TX_L2)
>
> <snip>
>
> > diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > index 263e6ec3c..13211060c 100644
> > --- a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > +++ b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > @@ -339,7 +339,7 @@ bnxt_handle_tx_cp_vec(struct bnxt_tx_queue *txq)
> >               cons = RING_CMPL(ring_mask, raw_cons);
> >               txcmp = (struct tx_cmpl *)&cp_desc_ring[cons];
> >
> > -             if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct))
> > +             if (!bnxt_cpr_cmp_valid(txcmp, raw_cons, ring_mask + 1))
> Same here. I think cpr->cp_ring_struct->ring_size can be used, which avoids the calculation.
> The same applies to some places in the other vector files.

It's true that cpr->cp_ring_struct->ring_size and ring_mask + 1 are
equivalent, but there doesn't seem to be a meaningful difference
between the two in the generated code.

Based on disassembly of x86 and Arm code for this function, the compiler
correctly determines that the value of ring_mask + 1 doesn't change within
the loop, so it is only computed once. The only difference would be in
whether an add instruction or a load instruction is used to put the value
in the register.
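
To make the point concrete, here is a small standalone sketch (not driver code; the `sketch_ring` structure and both function names are made up for illustration) with two loops that differ only in where the ring size comes from. In the first, `ring_mask + 1` is loop-invariant, so an optimizing compiler is free to compute it once before the loop; in the second, the size is loaded from the structure once.

```c
#include <stdint.h>

/* Hypothetical ring descriptor; field names are illustrative only. */
struct sketch_ring {
	uint32_t ring_size;  /* power of two */
	uint32_t ring_mask;  /* ring_size - 1 */
};

/* Count indices in [start, start + n) on an even pass, size derived from the mask. */
static uint32_t
count_even_pass_mask(const struct sketch_ring *r, uint32_t start, uint32_t n)
{
	uint32_t ring_mask = r->ring_mask;
	uint32_t i, count = 0;

	for (i = 0; i < n; i++) {
		/* ring_mask + 1 does not change inside the loop. */
		if (!((start + i) & (ring_mask + 1)))
			count++;
	}
	return count;
}

/* Same loop, but using the stored ring size directly. */
static uint32_t
count_even_pass_size(const struct sketch_ring *r, uint32_t start, uint32_t n)
{
	uint32_t ring_size = r->ring_size;
	uint32_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!((start + i) & ring_size))
			count++;
	}
	return count;
}
```

Both functions return the same result for any power-of-two ring; building with something like `-O2 -S` and comparing the output is one way to see how a given compiler treats the two forms.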

