[PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path

Slava Ovsiienko viacheslavo at nvidia.com
Wed Jun 29 09:55:18 CEST 2022


Hi, Ruifeng

> -----Original Message-----
> From: Ruifeng Wang <Ruifeng.Wang at arm.com>
> Sent: Monday, June 27, 2022 14:08
> To: Slava Ovsiienko <viacheslavo at nvidia.com>; Ali Alnubani
> <alialnu at nvidia.com>; Matan Azrad <matan at nvidia.com>
> Cc: dev at dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> stable at dpdk.org; nd <nd at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> vector path
> 
> > -----Original Message-----
> > From: Slava Ovsiienko <viacheslavo at nvidia.com>
> > Sent: Monday, June 20, 2022 1:38 PM
> > To: Ali Alnubani <alialnu at nvidia.com>; Ruifeng Wang
> > <Ruifeng.Wang at arm.com>; Matan Azrad <matan at nvidia.com>
> > Cc: dev at dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> > stable at dpdk.org; nd <nd at arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Hi, Ruifeng
> 
> Hi Slava,
> 
> Thanks for your review.
> >
> > My apologies for review delay.
> 
> Apologies too. I was on something else.
> 
> > As far I understand the hypothetical problem scenario is:
> > - CPU core reorders reading of qwords of 16B vector
> > - core reads the second 8B of CQE (old CQE values)
> > - CQE update
> > - core reads the first 8B of CQE (new CQE values)
> 
> Yes, This is the problem.
> >
> > How the re-reading of CQEs can resolve the issue?
> > This wrong scenario might happen on the second read and we would run
> > into the same issue.
> 
> Here we are trying to ordering reading of a 16B vector (8B with op_own -
> high, and 8B without op_own - low).
> The first read will load 16B. The second read will load and update low
> 8B (no op_own).
OK, I got the point, thank you for the explanations.
Can we avoid the first reading of low 8B (no containing CQE owning field)? 

I mean to update this part to read only upper 8Bs:
                /* B.0 (CQE 3) load a block having op_own. */
                c3 = vld1q_u64((uint64_t *)(p3 + 48));
                /* B.0 (CQE 2) load a block having op_own. */
                c2 = vld1q_u64((uint64_t *)(p2 + 48));
                /* B.0 (CQE 1) load a block having op_own. */
                c1 = vld1q_u64((uint64_t *)(p1 + 48));
                /* B.0 (CQE 0) load a block having op_own. */
                c0 = vld1q_u64((uint64_t *)(p0 + 48));
                /* Synchronize for loading the rest of blocks. */
                rte_io_rmb();

Because lower 8Bs will be overlapped with the second read (in your patch) 
and barrier ensures the correct order.


With best regards,
Slava


More information about the stable mailing list