[dpdk-dev,v2,5/7] net/mlx4: remove unnecessary variables in Tx burst

Message ID 1508768520-4810-6-git-send-email-ophirmu@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Ophir Munk Oct. 23, 2017, 2:21 p.m. UTC
  From: Matan Azrad <matan@mellanox.com>

Remove usage of variables which doesn't add new information for
performance improvement.

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx4/mlx4_rxtx.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
  

Comments

Adrien Mazarguil Oct. 25, 2017, 4:49 p.m. UTC | #1
Hi Ophir/Matan,

On Mon, Oct 23, 2017 at 02:21:58PM +0000, Ophir Munk wrote:
> From: Matan Azrad <matan@mellanox.com>
> 
> Remove usage of variables which doesn't add new information for
> performance improvement.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>

I'm almost 100% sure this commit wasn't validated for performance on its
own. Don't mention "performance improvement" in that case.

If you're removing a couple of local variables for readability, just say
so.

More below.

> ---
>  drivers/net/mlx4/mlx4_rxtx.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index 014a6d3..e8d9a35 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -285,8 +285,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
>  	struct txq *txq = (struct txq *)dpdk_txq;
>  	unsigned int elts_head = txq->elts_head;
>  	const unsigned int elts_n = txq->elts_n;
> -	unsigned int elts_comp = 0;
> -	unsigned int bytes_sent = 0;
>  	unsigned int i;
>  	unsigned int max;
>  	struct mlx4_sq *sq = &txq->msq;
> @@ -498,8 +496,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
>  						       MLX4_BIT_WQE_OWN : 0));
>  		sq->head += nr_txbbs;
>  		elt->buf = buf;
> -		bytes_sent += buf->pkt_len;
> -		++elts_comp;
> +		txq->stats.obytes += buf->pkt_len;
>  		elts_head = elts_head_next;
>  	}
>  	/* Take a shortcut if nothing must be sent. */
> @@ -507,13 +504,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
>  		return 0;
>  	/* Increment send statistics counters. */
>  	txq->stats.opackets += i;
> -	txq->stats.obytes += bytes_sent;
>  	/* Make sure that descriptors are written before doorbell record. */
>  	rte_wmb();
>  	/* Ring QP doorbell. */
>  	rte_write32(txq->msq.doorbell_qpn, txq->msq.db);
>  	txq->elts_head = elts_head;
> -	txq->elts_comp += elts_comp;
> +	txq->elts_comp += i;
>  	return i;
>  }

For bytes_sent, reading these changes and assuming -O0 with the compiler
attempting to convert the code without reordering/improving things, this
replaces register variables used in a loop with memory operations on a large
structure through a pointer (txq->stats update for every iteration instead
of once at the end).

Are you really sure this is more optimized that way? Although the compiler
likely does it already with -O3, helping it avoid unnecessary memory writes
is good in my opinion.

OK for the removal of redundant elts_comp though. Although I'm pretty sure
once again the compiler didn't wait for this patch to optimize it away.
  

Patch

diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index 014a6d3..e8d9a35 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -285,8 +285,6 @@  mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	struct txq *txq = (struct txq *)dpdk_txq;
 	unsigned int elts_head = txq->elts_head;
 	const unsigned int elts_n = txq->elts_n;
-	unsigned int elts_comp = 0;
-	unsigned int bytes_sent = 0;
 	unsigned int i;
 	unsigned int max;
 	struct mlx4_sq *sq = &txq->msq;
@@ -498,8 +496,7 @@  mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 						       MLX4_BIT_WQE_OWN : 0));
 		sq->head += nr_txbbs;
 		elt->buf = buf;
-		bytes_sent += buf->pkt_len;
-		++elts_comp;
+		txq->stats.obytes += buf->pkt_len;
 		elts_head = elts_head_next;
 	}
 	/* Take a shortcut if nothing must be sent. */
@@ -507,13 +504,12 @@  mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return 0;
 	/* Increment send statistics counters. */
 	txq->stats.opackets += i;
-	txq->stats.obytes += bytes_sent;
 	/* Make sure that descriptors are written before doorbell record. */
 	rte_wmb();
 	/* Ring QP doorbell. */
 	rte_write32(txq->msq.doorbell_qpn, txq->msq.db);
 	txq->elts_head = elts_head;
-	txq->elts_comp += elts_comp;
+	txq->elts_comp += i;
 	return i;
 }