[PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path

Ruifeng Wang Ruifeng.Wang at arm.com
Thu Sep 29 08:51:20 CEST 2022


> -----Original Message-----
> From: Ruifeng Wang
> Sent: Wednesday, June 29, 2022 7:41 PM
> To: Slava Ovsiienko <viacheslavo at nvidia.com>; Ali Alnubani <alialnu at nvidia.com>; Matan
> Azrad <matan at nvidia.com>
> Cc: dev at dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; stable at dpdk.org; nd
> <nd at arm.com>; nd <nd at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
> 
> > -----Original Message-----
> > From: Slava Ovsiienko <viacheslavo at nvidia.com>
> > Sent: Wednesday, June 29, 2022 3:55 PM
> > To: Ruifeng Wang <Ruifeng.Wang at arm.com>; Ali Alnubani
> > <alialnu at nvidia.com>; Matan Azrad <matan at nvidia.com>
> > Cc: dev at dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> > stable at dpdk.org; nd <nd at arm.com>; nd <nd at arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Hi, Ruifeng
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang <Ruifeng.Wang at arm.com>
> > > Sent: Monday, June 27, 2022 14:08
> > > To: Slava Ovsiienko <viacheslavo at nvidia.com>; Ali Alnubani
> > > <alialnu at nvidia.com>; Matan Azrad <matan at nvidia.com>
> > > Cc: dev at dpdk.org; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli at arm.com>;
> > > stable at dpdk.org; nd <nd at arm.com>; nd <nd at arm.com>
> > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in
> > > NEON vector path
> > >
> > > > -----Original Message-----
> > > > From: Slava Ovsiienko <viacheslavo at nvidia.com>
> > > > Sent: Monday, June 20, 2022 1:38 PM
> > > > To: Ali Alnubani <alialnu at nvidia.com>; Ruifeng Wang
> > > > <Ruifeng.Wang at arm.com>; Matan Azrad <matan at nvidia.com>
> > > > Cc: dev at dpdk.org; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli at arm.com>; stable at dpdk.org; nd <nd at arm.com>
> > > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in
> > > > NEON vector path
> > > >
> > > > Hi, Ruifeng
> > >
> > > Hi Slava,
> > >
> > > Thanks for your review.
> > > >
> > > > My apologies for review delay.
> > >
> > > Apologies too. I was on something else.
> > >
> > > > As far as I understand, the hypothetical problem scenario is:
> > > > - CPU core reorders reading of qwords of 16B vector
> > > > - core reads the second 8B of CQE (old CQE values)
> > > > - CQE update
> > > > - core reads the first 8B of CQE (new CQE values)
> > >
> > > Yes, this is the problem.
> > > >
> > > > How can the re-reading of CQEs resolve the issue?
> > > > This wrong scenario might happen on the second read as well, and we
> > > > would run into the same issue.
> > >
> > > Here we are trying to order the reading of a 16B vector (the high 8B
> > > containing op_own, and the low 8B without op_own).
> > > The first read loads the full 16B. After the barrier, the second read
> > > reloads and updates the low 8B (no op_own).
> > OK, I got the point, thank you for the explanation.
> > Can we avoid the first read of the low 8B (not containing the CQE owning field)?
> >
> > I mean updating this part to read only the upper 8B:
> >                 /* B.0 (CQE 3) load a block having op_own. */
> >                 c3 = vld1q_u64((uint64_t *)(p3 + 48));
> >                 /* B.0 (CQE 2) load a block having op_own. */
> >                 c2 = vld1q_u64((uint64_t *)(p2 + 48));
> >                 /* B.0 (CQE 1) load a block having op_own. */
> >                 c1 = vld1q_u64((uint64_t *)(p1 + 48));
> >                 /* B.0 (CQE 0) load a block having op_own. */
> >                 c0 = vld1q_u64((uint64_t *)(p0 + 48));
> >                 /* Synchronize for loading the rest of blocks. */
> >                 rte_io_rmb();
> >
> > Because the lower 8B will be overwritten by the second read (in your
> > patch), and the barrier ensures the correct order.
> 
> Hi Slava,
> 
> Yes, your suggestion is valid.
> Actually, I tried that approach: load the higher 8B + barrier + load the lower 8B +
> combine the two 8Bs into a vector.
> It also has no observable performance impact, but it generates more instructions than
> the current patch does (the 'combine' operation).
> So I kept the current approach.
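> 
> For reference, the variant I tried looked roughly like this for a single CQE
> (a sketch only; p0 and c0 are the names from the code quoted above, the +56
> offset is the op_own qword of that block, and vcreate/vcombine stand in for
> whatever combine sequence is actually generated):
> 
>                 /* Load the high 8B carrying op_own. */
>                 uint64_t hi = *(uint64_t *)(p0 + 56);
>                 /* Synchronize before loading the rest of the CQE. */
>                 rte_io_rmb();
>                 /* Load the low 8B and combine into one 16B vector. */
>                 uint64_t lo = *(uint64_t *)(p0 + 48);
>                 c0 = vcombine_u64(vcreate_u64(lo), vcreate_u64(hi));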
> 
> Thanks.
> >
Hi Slava,

Are there any further comments?

Thanks,
Ruifeng

