[PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path

Ruifeng Wang Ruifeng.Wang at arm.com
Mon Jun 27 13:08:21 CEST 2022


> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo at nvidia.com>
> Sent: Monday, June 20, 2022 1:38 PM
> To: Ali Alnubani <alialnu at nvidia.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>; Matan Azrad <matan at nvidia.com>
> Cc: dev at dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; stable at dpdk.org; nd <nd at arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> 
> Hi, Ruifeng

Hi Slava,

Thanks for your review.
> 
> My apologies for review delay.

Apologies from my side as well; I was occupied with other work.

> As far I understand the hypothetical problem scenario is:
> - CPU core reorders reading of qwords of 16B vector
> - core reads the second 8B of CQE (old CQE values)
> - CQE update
> - core reads the first 8B of CQE (new CQE values)

Yes, this is the problem.
> 
> How the re-reading of CQEs can resolve the issue?
> This wrong scenario might happen on the second read and we would run into
> the same issue.

Here we are trying to order the reads of a 16B vector (the high 8B containing op_own, and the low 8B without op_own).
The first read loads the full 16B. The second read reloads and updates only the low 8B (without op_own).
op_own indicates one of two CQE states: valid or invalid.
If the CQE status is invalid, there is no problem; the CQE is simply ignored this time around.
If the CQE status is valid, the second read ensures the rest of the CQE is no older than the high 8B (with op_own).
Assuming the NIC updates op_own no earlier than the rest of the CQE, the second read ensures the CQE content retrieved is correct.
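The scheme above can be sketched in portable C11 atomics. This is a simplified illustration, not the real mlx5 NEON code: the struct layout, names, and validity encoding are all hypothetical, and two 8B atomic loads stand in for the 16B vector load whose halves the CPU may reorder.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical simplified CQE: the low qword carries metadata, the
 * high qword carries the op_own/validity indication.  Names and
 * layout are illustrative only. */
struct sim_cqe {
	_Atomic uint64_t lo;	/* metadata, written by the "NIC" first */
	_Atomic uint64_t hi;	/* qword with op_own, written last */
};

#define CQE_VALID 1ULL

/* Double-read scheme from the thread: load both qwords, then reload
 * the low qword after an acquire fence.  If hi reports "valid", the
 * reloaded lo is no older than hi, so the pair is consistent
 * (assuming the NIC publishes op_own no earlier than the rest). */
static int cqe_read(struct sim_cqe *c, uint64_t *lo_out, uint64_t *hi_out)
{
	uint64_t hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
	uint64_t lo = atomic_load_explicit(&c->lo, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	/* Second read: refresh the low 8B (no op_own). */
	lo = atomic_load_explicit(&c->lo, memory_order_relaxed);

	if (!(hi & CQE_VALID))
		return 0;	/* invalid: ignored, re-polled later */
	*lo_out = lo;
	*hi_out = hi;
	return 1;
}

/* "NIC" side of the contract: write metadata first, op_own last. */
static void cqe_write(struct sim_cqe *c, uint64_t meta)
{
	atomic_store_explicit(&c->lo, meta, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->hi, CQE_VALID, memory_order_relaxed);
}
```

In the actual vector path the second load is a scalar reload of qword0 feeding back into the NEON register; the sketch only shows why the re-read plus barrier makes the retrieved CQE self-consistent.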

> 
> In my opinion, the right solution to cover potential reordering should be:
> - read CQE
> - check CQE status (first 8B)

We don't need to check the CQE status before the second read; see the explanation above.
> - read memory barrier
> - read the rest of CQE
> 
> With best regards,
> Slava
> 
> > -----Original Message-----
> > From: Ali Alnubani <alialnu at nvidia.com>
> > Sent: Thursday, May 19, 2022 17:56
> > To: Ruifeng Wang <ruifeng.wang at arm.com>; Matan Azrad
> > <matan at nvidia.com>; Slava Ovsiienko <viacheslavo at nvidia.com>
> > Cc: dev at dpdk.org; honnappa.nagarahalli at arm.com; stable at dpdk.org;
> > nd at arm.com
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang <ruifeng.wang at arm.com>
> > > Sent: Tuesday, January 4, 2022 5:01 AM
> > > To: Matan Azrad <matan at nvidia.com>; Slava Ovsiienko
> > > <viacheslavo at nvidia.com>
> > > Cc: dev at dpdk.org; honnappa.nagarahalli at arm.com; stable at dpdk.org;
> > > nd at arm.com; Ruifeng Wang <ruifeng.wang at arm.com>
> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > > vector path
> > >
> > > In the NEON vector PMD, a vector load loads two contiguous 8B
> > > qwords of descriptor data into a vector register. Since the vector
> > > load does not guarantee 16B atomicity, the read of the word that
> > > includes the op_own field could be reordered after reads of the
> > > other words. In that case, some words could contain invalid data.
> > >
> > > Reloaded qword0 after read barrier to update vector register. This
> > > ensures that the fetched data is correct.
> > >
> > > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > > drop.
> > >
> > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > > completions")
> > > Cc: stable at dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > > ---
> >
> > Tested with BlueField-2 and didn't see a performance impact.
> >
> > Tested-by: Ali Alnubani <alialnu at nvidia.com>
> >
> > Thanks,
> > Ali
