[dpdk-dev] net/mlx5: add prefetching Rx completion queue

Message ID 20170117020940.37453-1-yskoh@mellanox.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel compilation  fail     Compilation issues

Commit Message

Yongseok Koh Jan. 17, 2017, 2:09 a.m. UTC
On receiving a compressed session of Rx completions, prefetch every entry
to be invalidated. Also, invalidate consumed completions after every 8
mini-completions instead of waiting until the last entry is consumed. This
helps reduce jitter in rx_burst.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
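
The batching described in the commit message can be modeled outside the
driver. What follows is a minimal sketch under stated assumptions, not the
driver code: toy_cqe, toy_zip, RING_SIZE, CQE_INVALID, CQE_VALID,
TOY_PREFETCH and the toy_* functions are hypothetical names,
__builtin_prefetch stands in for rte_prefetch0, and the first batch boundary
is simplified to start + 8 (the patch itself keeps zip->na = zip->ca + 7 for
the first mini array).

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 64u     /* hypothetical CQ size, must be a power of two */
#define CQE_VALID 0x00u   /* hypothetical "owned by hardware/valid" marker */
#define CQE_INVALID 0xffu /* hypothetical stand-in for MLX5_CQE_INVALIDATE */

/* Stand-in for rte_prefetch0(); falls back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define TOY_PREFETCH(p) __builtin_prefetch(p)
#else
#define TOY_PREFETCH(p) ((void)(p))
#endif

struct toy_cqe {
	uint8_t op_own;
};

struct toy_zip {
	uint16_t ai;      /* number of mini-completions consumed so far */
	uint16_t ca;      /* first CQE index not yet invalidated (free-running) */
	uint16_t na;      /* end of the next batch to invalidate (free-running) */
	uint16_t cq_ci;   /* one past the last CQE of the session */
	uint16_t cqe_cnt; /* mini-completions in the session */
};

/* Invalidate ring entries in [idx, end); indexes are masked on access only. */
static uint16_t
toy_invalidate(struct toy_cqe *ring, uint16_t idx, uint16_t end)
{
	uint16_t n = 0;

	while (idx != end) {
		ring[idx & (RING_SIZE - 1)].op_own = CQE_INVALID;
		++idx;
		++n;
	}
	return n;
}

/* Start a compressed session and prefetch every entry it will invalidate. */
static void
toy_session_begin(const struct toy_cqe *ring, struct toy_zip *zip,
		  uint16_t start, uint16_t n)
{
	uint16_t idx;

	assert(n > 0 && n <= RING_SIZE);
	zip->ai = 0;
	zip->ca = start;
	zip->na = (uint16_t)(start + 8); /* simplification, see lead-in */
	zip->cq_ci = (uint16_t)(start + n);
	zip->cqe_cnt = n;
	for (idx = zip->ca; idx != zip->cq_ci; ++idx)
		TOY_PREFETCH(&ring[idx & (RING_SIZE - 1)]);
}

/* Consume one mini-completion; return how many CQEs this call invalidated. */
static uint16_t
toy_consume_one(struct toy_cqe *ring, struct toy_zip *zip)
{
	uint16_t done = 0;

	if ((++zip->ai & 7) == 0) {
		/* A full group of 8 was consumed: invalidate it right away. */
		done = toy_invalidate(ring, zip->ca, zip->na);
		zip->ca = zip->na;
		zip->na = (uint16_t)(zip->na + 8);
	}
	if (zip->ai == zip->cqe_cnt) {
		/* End of session: invalidate whatever is left. */
		done = (uint16_t)(done +
				  toy_invalidate(ring, zip->ca, zip->cq_ci));
		zip->ca = zip->cq_ci;
		zip->ai = 0;
	}
	return done;
}
```

In this model, a session of 20 mini-completions invalidates 8 entries on the
8th call, 8 on the 16th and the remaining 4 on the 20th, so no single call
has to write all 20 entries. Keeping ca, na and cq_ci free-running and
masking only when indexing the ring is what lets the idx != end loops work
across the ring wrap-around.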
  

Comments

Adrien Mazarguil Jan. 20, 2017, 4:56 p.m. UTC | #1
On Mon, Jan 16, 2017 at 06:09:40PM -0800, Yongseok Koh wrote:
> On receiving a compressed session of Rx completion, prefetch every entries
> to be invalidated. Also, invalidate consumed completions per every 8
> mini-completions, not to wait until the last entry is consumed. This helps
> to reduce jitter in rx_burst.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)

Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
  
Ferruh Yigit Jan. 20, 2017, 6:42 p.m. UTC | #2
On 1/20/2017 4:56 PM, Adrien Mazarguil wrote:
> On Mon, Jan 16, 2017 at 06:09:40PM -0800, Yongseok Koh wrote:
>> On receiving a compressed session of Rx completion, prefetch every entries
>> to be invalidated. Also, invalidate consumed completions per every 8
>> mini-completions, not to wait until the last entry is consumed. This helps
>> to reduce jitter in rx_burst.
>>
>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>> ---
>>  drivers/net/mlx5/mlx5_rxtx.c | 23 ++++++++++++++++++++---
>>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Applied to dpdk-next-net/master, thanks.
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 94157202c..2ae949295 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1156,6 +1156,7 @@  mlx5_rx_poll_len(struct rxq *rxq, volatile struct mlx5_cqe *cqe,
 	struct rxq_zip *zip = &rxq->zip;
 	uint16_t cqe_n = cqe_cnt + 1;
 	int len = 0;
+	uint16_t idx, end;
 
 	/* Process compressed data in the CQE and mini arrays. */
 	if (zip->ai) {
@@ -1166,6 +1167,14 @@  mlx5_rx_poll_len(struct rxq *rxq, volatile struct mlx5_cqe *cqe,
 		len = ntohl((*mc)[zip->ai & 7].byte_cnt);
 		*rss_hash = ntohl((*mc)[zip->ai & 7].rx_hash_result);
 		if ((++zip->ai & 7) == 0) {
+			/* Invalidate consumed CQEs */
+			idx = zip->ca;
+			end = zip->na;
+			while (idx != end) {
+				(*rxq->cqes)[idx & cqe_cnt].op_own =
+					MLX5_CQE_INVALIDATE;
+				++idx;
+			}
 			/*
 			 * Increment consumer index to skip the number of
 			 * CQEs consumed. Hardware leaves holes in the CQ
@@ -1175,8 +1184,9 @@  mlx5_rx_poll_len(struct rxq *rxq, volatile struct mlx5_cqe *cqe,
 			zip->na += 8;
 		}
 		if (unlikely(rxq->zip.ai == rxq->zip.cqe_cnt)) {
-			uint16_t idx = rxq->cq_ci + 1;
-			uint16_t end = zip->cq_ci;
+			/* Invalidate the rest */
+			idx = zip->ca;
+			end = zip->cq_ci;
 
 			while (idx != end) {
 				(*rxq->cqes)[idx & cqe_cnt].op_own =
@@ -1212,7 +1222,7 @@  mlx5_rx_poll_len(struct rxq *rxq, volatile struct mlx5_cqe *cqe,
 			 * special case the second one is located 7 CQEs after
 			 * the initial CQE instead of 8 for subsequent ones.
 			 */
-			zip->ca = rxq->cq_ci & cqe_cnt;
+			zip->ca = rxq->cq_ci;
 			zip->na = zip->ca + 7;
 			/* Compute the next non compressed CQE. */
 			--rxq->cq_ci;
@@ -1221,6 +1231,13 @@  mlx5_rx_poll_len(struct rxq *rxq, volatile struct mlx5_cqe *cqe,
 			len = ntohl((*mc)[0].byte_cnt);
 			*rss_hash = ntohl((*mc)[0].rx_hash_result);
 			zip->ai = 1;
+			/* Prefetch all the entries to be invalidated */
+			idx = zip->ca;
+			end = zip->cq_ci;
+			while (idx != end) {
+				rte_prefetch0(&(*rxq->cqes)[(idx) & cqe_cnt]);
+				++idx;
+			}
 		} else {
 			len = ntohl(cqe->byte_cnt);
 			*rss_hash = ntohl(cqe->rx_hash_res);