[dpdk-dev,1/2] net/mlx5: replace memory barrier type

Message ID 1503301622-14220-2-git-send-email-sagi@grimberg.me (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Sagi Grimberg Aug. 21, 2017, 7:47 a.m. UTC
  From: Shahaf Shuler <shahafs@mellanox.com>

The reason for the requirement of a barrier between the txq writes
and the doorbell record writes is to avoid a case where the device
reads the doorbell record's new value before the txq writes are flushed
to memory.

The current use of rte_wmb is not necessary, and can be replaced by
rte_compiler_barrier as it acts as a write memory barrier.
More details on this type of barrier can be found on [1]

Replacing the rte_wmb is also expected to improve the throughput.

[1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Alexander Solganik <solganik@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Nélio Laranjeiro Aug. 23, 2017, 11:39 a.m. UTC | #1
On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> From: Shahaf Shuler <shahafs@mellanox.com>
> 
> The reason for the requirement of a barrier between the txq writes
> and the doorbell record writes is to avoid a case where the device
> reads the doorbell record's new value before the txq writes are flushed
> to memory.
> 
> The current use of rte_wmb is not necessary, and can be replaced by
> rte_compiler_barrier as it acts as a write memory barrier.
> More details on this type of barrier can be found on [1]
> 
> Replacing the rte_wmb is also expected to improve the throughput.
> 
> [1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Signed-off-by: Alexander Solganik <solganik@gmail.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/net/mlx5/mlx5_rxtx.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 7de1d10863e5..59b9ff24fb82 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -602,7 +602,7 @@ mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
>  	uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
>  	volatile uint64_t *src = ((volatile uint64_t *)wqe);
>  
> -	rte_wmb();
> +	rte_compiler_barrier();
>  	*txq->qp_db = htonl(txq->wqe_ci);
>  	/* Ensure ordering between DB record and BF copy. */
>  	rte_wmb();
> -- 
> 2.7.4
 
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
  
Bruce Richardson Aug. 23, 2017, 1:11 p.m. UTC | #2
On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> > From: Shahaf Shuler <shahafs@mellanox.com>
> > 
> > The reason for the requirement of a barrier between the txq writes
> > and the doorbell record writes is to avoid a case where the device
> > reads the doorbell record's new value before the txq writes are flushed
> > to memory.
> > 
> > The current use of rte_wmb is not necessary, and can be replaced by
> > rte_compiler_barrier as it acts as a write memory barrier.
> > More details on this type of barrier can be found on [1]
> > 
> > Replacing the rte_wmb is also expected to improve the throughput.
> > 
> > [1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> > 
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > Signed-off-by: Alexander Solganik <solganik@gmail.com>
> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> > ---
> >  drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > index 7de1d10863e5..59b9ff24fb82 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -602,7 +602,7 @@ mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
> >  	uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
> >  	volatile uint64_t *src = ((volatile uint64_t *)wqe);
> >  
> > -	rte_wmb();
> > +	rte_compiler_barrier();
> >  	*txq->qp_db = htonl(txq->wqe_ci);
> >  	/* Ensure ordering between DB record and BF copy. */
> >  	rte_wmb();
> > -- 
> > 2.7.4
>  
> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> 
While a compiler barrier may do on platforms with strong ordering, I'm
wondering if the rte_smp_wmb() macro may be needed here to give compiler
barrier or actual memory barrier depending on platform?

/Bruce
  
Shahaf Shuler Aug. 24, 2017, 6:56 a.m. UTC | #3
Wednesday, August 23, 2017 4:12 PM, Bruce Richardson:
> On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> > On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> >
> > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> >
> While a compiler barrier may do on platforms with strong ordering, I'm
> wondering if the rte_smp_wmb() macro may be needed here to give
> compiler barrier or actual memory barrier depending on platform?

Thanks for the catch!

However, the description of rte_smp_wmb() not seems to fit our case here.
We don't try to sync between different lcores, rather between the device and a single lcore. 

Maybe rte_io_wmb fits better? 

> 
> /Bruce
  
Bruce Richardson Aug. 24, 2017, 9:27 a.m. UTC | #4
On Thu, Aug 24, 2017 at 06:56:11AM +0000, Shahaf Shuler wrote:
> Wednesday, August 23, 2017 4:12 PM, Bruce Richardson:
> > On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> > > On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> > >
> > > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > >
> > While a compiler barrier may do on platforms with strong ordering, I'm
> > wondering if the rte_smp_wmb() macro may be needed here to give
> > compiler barrier or actual memory barrier depending on platform?
> 
> Thanks for the catch!
> 
> However, the description of rte_smp_wmb() not seems to fit our case here.
> We don't try to sync between different lcores, rather between the device and a single lcore. 
> 
> Maybe rte_io_wmb fits better? 
> 
Yep. Looks about right.

/Bruce
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 7de1d10863e5..59b9ff24fb82 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -602,7 +602,7 @@  mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
 	uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
 	volatile uint64_t *src = ((volatile uint64_t *)wqe);
 
-	rte_wmb();
+	rte_compiler_barrier();
 	*txq->qp_db = htonl(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
 	rte_wmb();