[dpdk-dev] [PATCH v2 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path

Ruifeng Wang ruifeng.wang at arm.com
Wed Sep 15 10:33:38 CEST 2021


Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates
that the rest of the descriptor words have valid values. Hence, the
word containing DD bit must be read first before reading the rest of
the descriptor words.

In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the word that includes DD field could be
reordered after read of other words. In this case, some words could
contain invalid data.

Read barrier is added after read of qword1 that includes DD field.
And qword0 is reloaded to update vector register. This ensures
that the fetched data is correct.

Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable at dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang at arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index b2683fda60..71191c7cc8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
 		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
 
+		/* Use acquire fence to order loads of descriptor qwords */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* A.2 reload qword0 to make it ordered after qword1 load */
+		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
-- 
2.25.1



More information about the dev mailing list