[dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions

Nélio Laranjeiro nelio.laranjeiro at 6wind.com
Thu Oct 26 15:44:24 CEST 2017


On Thu, Oct 26, 2017 at 12:30:54PM +0000, Matan Azrad wrote:
> Hi Nelio
> Please see my comments below (3).
> 
> 
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro at 6wind.com]
> > Sent: Thursday, October 26, 2017 3:12 PM
> > To: Matan Azrad <matan at mellanox.com>
> > Cc: Ophir Munk <ophirmu at mellanox.com>; Adrien Mazarguil
> > <adrien.mazarguil at 6wind.com>; dev at dpdk.org; Thomas Monjalon
> > <thomas at monjalon.net>; Olga Shern <olgas at mellanox.com>; Mordechay
> > Haimovsky <motih at mellanox.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
> > 
> > On Thu, Oct 26, 2017 at 10:31:06AM +0000, Matan Azrad wrote:
> > > Hi Nelio
> > >
> > > I think the memory barrier discussion is not relevant for this patch
> > > (if it will be relevant I will create new one).
> > > Please see my comments inline.
> > 
> > It was not my single comment.  There is also useless code like having null
> > segments in the packets which is not allowed on DPDK.
> 
> Sorry, but I can't find comments in the previous mails.

You should search in the series,

> Moreover  this comment(first time I see it) is not relevant to this patch and asking something else.
> All what this patch does is to merge 2 functions to prevent double
> asking about WQ remain space...

Again in the series itself.

The point, this series embed 7 patches for "performance improvement",
whereas the single improvement is avoiding to call an outside function by
copy/pasting it into the PMD.
In fact it will save few cycles, but this improvements could have been
much more if the it was not a bare copy/paste.

The real question is what is the improvement?  If the improvement is
significant, it worse having this series, otherwise it does not as it
may also bring some bugs which may be resolve from its original source
whereas this one will remain.

> Remove memory\compiler barriers or dealing with null segments are not in the scope here. 
> 
> > 
> > > Regarding this specific patch, I didn't see any comment from you, Are
> > > you agree with it?
> > >
> > > > -----Original Message-----
> > > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro at 6wind.com]
> > > > Sent: Wednesday, October 25, 2017 10:50 AM
> > > > To: Ophir Munk <ophirmu at mellanox.com>
> > > > Cc: Adrien Mazarguil <adrien.mazarguil at 6wind.com>; dev at dpdk.org;
> > > > Thomas Monjalon <thomas at monjalon.net>; Olga Shern
> > > > <olgas at mellanox.com>; Matan Azrad <matan at mellanox.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path
> > > > functions
> > > >
> > > > On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote:
> > > > > Hi,
> > > > >
> > > > > On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote:
> > > > > >
> > > > > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote:
> > > > > > > From: Matan Azrad <matan at mellanox.com>
> > > > > > >
> > > > > > > Merge tx_burst and mlx4_post_send functions to prevent double
> > > > > > > asking about WQ remain space.
> > > > > > >
> > > > > > > This should improve performance.
> > > > > > >
> > > > > > > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > > > > > > ---
> > > > > > >  drivers/net/mlx4/mlx4_rxtx.c | 353
> > > > > > > +++++++++++++++++++++----------------------
> > > > > > >  1 file changed, 170 insertions(+), 183 deletions(-)
> > > > > >
> > > > > > What are the real expectation you have on the remaining patches
> > > > > > of the series?
> > > > > >
> > > > > > According to the comment of this commit log "This should improve
> > > > > > performance" there are too many barriers at each packet/segment
> > > > > > level to improve something.
> > > > > >
> > > > > > The point is, mlx4_burst_tx() should write all the WQE without
> > > > > > any barrier as it is processing a burst of packets (whereas
> > > > > > Verbs functions which may only process a single packet).
> > > > >
> > > > > > The lonely barrier which should be present is the one to ensure
> > > > > > that all the host memory is flushed before triggering the Tx doorbell.
> > > > > >
> > > > >
> > > > > There is a known ConnectX-3 HW limitation: the first 4 bytes of
> > > > > every TXWBB (64 bytes chunks) should be written in a reversed
> > > > > order (from last TXWBB to first TXWBB).
> > > >
> > > > This means the first WQE filled by the burst function is the doorbell.
> > > > In such situation, the first four bytes of it can be written before
> > > > leaving the burst function and after a write memory barrier.
> > > >
> > > > Until this first WQE is not complete, the NIC won't start processing
> > > > the packets.  Memory barriers per packets becomes useless.
> > >
> > > I think this is not true, Since mlx4 HW can prefetch advanced TXbbs if
> > > their first 4 bytes are valid in spite of the first WQE is still not valid (please
> > read the spec).
> > 
> > A compiler barrier is enough on x86 to forbid the CPU to re-order the
> > instructions, on arm you need a memory barrier, there is a macro in DPDK for
> > that, rte_io_wmb().
> > 
> We are also using compiler barrier here.
> 
> > Before triggering the doorbell you must flush the case, this is the only place
> > where the rte_wmb() should be used.
> > 
> 
> We are also using memory barrier only for this reason.
> 
> > > > It gives something like:
> > > >
> > > >  uint32_t tx_bb_db = 0;
> > > >  void *first_wqe = NULL;
> > > >
> > > >  /*
> > > >   * Prepare all Packets by writing the WQEs without the 4 first bytes of
> > > >   * the first WQE.
> > > >   */
> > > >  for () {
> > > >  	if (!wqe) {
> > > > 		first_wqe = wqe;
> > > > 		tx_bb_db = foo;
> > > > 	}
> > > >  }
> > > >  /* Leaving. */
> > > >  rte_wmb();
> > > >  *(uin32_t*)wqe = tx_bb_db;
> > > >  return n;
> > > >
> > >
> > > I will take care to check if we can do 2 loops:
> > > Write all  last 60B per TXbb.
> > > Memory barrier.
> > > Write all first 4B per TXbbs.
> > >
> > > > > The last 60 bytes of any TXWBB can be written in any order (before
> > > > > writing the first 4 bytes).
> > > > > Is your last statement (using lonely barrier) is in accordance
> > > > > with this limitation? Please explain.
> > > > >
> > > > > > There is also too many cases handled which are useless in bursts
> > > > situation,
> > > > > > this function needs to be re-written to its minimal use case i.e.
> > > > processing a
> > > > > > valid burst of packets/segments and triggering at the end of the
> > > > > > burst the
> > > > Tx
> > > > > > doorbell.
> > > > > >
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Nélio Laranjeiro
> > > > 6WIND
> > 
> > Regards,
> > 
> > --
> > Nélio Laranjeiro
> > 6WIND

-- 
Nélio Laranjeiro
6WIND


More information about the dev mailing list