[dpdk-dev,1/2] net/mlx5: replace memory barrier type
Checks
Commit Message
From: Shahaf Shuler <shahafs@mellanox.com>
The reason for the requirement of a barrier between the txq writes
and the doorbell record writes is to avoid a case where the device
reads the doorbell record's new value before the txq writes are flushed
to memory.
The current use of rte_wmb is not necessary, and can be replaced by
rte_compiler_barrier as it acts as a write memory barrier.
More details on this type of barrier can be found on [1]
Replacing the rte_wmb is also expected to improve the throughput.
[1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Alexander Solganik <solganik@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/net/mlx5/mlx5_rxtx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> From: Shahaf Shuler <shahafs@mellanox.com>
>
> The reason for the requirement of a barrier between the txq writes
> and the doorbell record writes is to avoid a case where the device
> reads the doorbell record's new value before the txq writes are flushed
> to memory.
>
> The current use of rte_wmb is not necessary, and can be replaced by
> rte_compiler_barrier as it acts as a write memory barrier.
> More details on this type of barrier can be found on [1]
>
> Replacing the rte_wmb is also expected to improve the throughput.
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Signed-off-by: Alexander Solganik <solganik@gmail.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 7de1d10863e5..59b9ff24fb82 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -602,7 +602,7 @@ mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
> uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
> volatile uint64_t *src = ((volatile uint64_t *)wqe);
>
> - rte_wmb();
> + rte_compiler_barrier();
> *txq->qp_db = htonl(txq->wqe_ci);
> /* Ensure ordering between DB record and BF copy. */
> rte_wmb();
> --
> 2.7.4
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> > From: Shahaf Shuler <shahafs@mellanox.com>
> >
> > The reason for the requirement of a barrier between the txq writes
> > and the doorbell record writes is to avoid a case where the device
> > reads the doorbell record's new value before the txq writes are flushed
> > to memory.
> >
> > The current use of rte_wmb is not necessary, and can be replaced by
> > rte_compiler_barrier as it acts as a write memory barrier.
> > More details on this type of barrier can be found on [1]
> >
> > Replacing the rte_wmb is also expected to improve the throughput.
> >
> > [1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > Signed-off-by: Alexander Solganik <solganik@gmail.com>
> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> > ---
> > drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > index 7de1d10863e5..59b9ff24fb82 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -602,7 +602,7 @@ mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
> > uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
> > volatile uint64_t *src = ((volatile uint64_t *)wqe);
> >
> > - rte_wmb();
> > + rte_compiler_barrier();
> > *txq->qp_db = htonl(txq->wqe_ci);
> > /* Ensure ordering between DB record and BF copy. */
> > rte_wmb();
> > --
> > 2.7.4
>
> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
>
While a compiler barrier may do on platforms with strong ordering, I'm
wondering if the rte_smp_wmb() macro may be needed here to give compiler
barrier or actual memory barrier depending on platform?
/Bruce
Wednesday, August 23, 2017 4:12 PM, Bruce Richardson:
> On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> > On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> >
> > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> >
> While a compiler barrier may do on platforms with strong ordering, I'm
> wondering if the rte_smp_wmb() macro may be needed here to give
> compiler barrier or actual memory barrier depending on platform?
Thanks for the catch!
However, the description of rte_smp_wmb() not seems to fit our case here.
We don't try to sync between different lcores, rather between the device and a single lcore.
Maybe rte_io_wmb fits better?
>
> /Bruce
On Thu, Aug 24, 2017 at 06:56:11AM +0000, Shahaf Shuler wrote:
> Wednesday, August 23, 2017 4:12 PM, Bruce Richardson:
> > On Wed, Aug 23, 2017 at 01:39:08PM +0200, Nélio Laranjeiro wrote:
> > > On Mon, Aug 21, 2017 at 10:47:01AM +0300, Sagi Grimberg wrote:
> > >
> > > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > >
> > While a compiler barrier may do on platforms with strong ordering, I'm
> > wondering if the rte_smp_wmb() macro may be needed here to give
> > compiler barrier or actual memory barrier depending on platform?
>
> Thanks for the catch!
>
> However, the description of rte_smp_wmb() not seems to fit our case here.
> We don't try to sync between different lcores, rather between the device and a single lcore.
>
> Maybe rte_io_wmb fits better?
>
Yep. Looks about right.
/Bruce
@@ -602,7 +602,7 @@ mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
volatile uint64_t *src = ((volatile uint64_t *)wqe);
- rte_wmb();
+ rte_compiler_barrier();
*txq->qp_db = htonl(txq->wqe_ci);
/* Ensure ordering between DB record and BF copy. */
rte_wmb();