[dpdk-dev] [PATCH v2 0/3] enable AVX512 for iavf

Morten Brørup mb at smartsharesystems.com
Thu Sep 17 11:35:33 CEST 2020


> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, September 17, 2020 11:13 AM
> 
> On Thu, Sep 17, 2020 at 09:37:29AM +0200, Morten Brørup wrote:
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > > Sent: Thursday, September 17, 2020 3:40 AM
> > >
> > > AVX512 instructions are supported by more and more platforms. These
> > > instructions can be used in the data path to enhance the per-core
> > > performance of packet processing.
> > > Compared with the existing implementation, this patch set introduces
> > > some AVX512 instructions into the iavf data path, and we get better
> > > per-core throughput.
> > >
> > > v2:
> > > Update meson.build.
> > > Replace the deprecated 'buf_physaddr' with 'buf_iova'.
> > >
> > > Wenzhuo Lu (3):
> > >   net/iavf: enable AVX512 for legacy RX
> > >   net/iavf: enable AVX512 for flexible RX
> > >   net/iavf: enable AVX512 for TX
> > >
> > >  doc/guides/rel_notes/release_20_11.rst  |    3 +
> > >  drivers/net/iavf/iavf_ethdev.c          |    3 +-
> > >  drivers/net/iavf/iavf_rxtx.c            |   69 +-
> > >  drivers/net/iavf/iavf_rxtx.h            |   18 +
> > >  drivers/net/iavf/iavf_rxtx_vec_avx512.c | 1720 +++++++++++++++++++++++++++++++
> > >  drivers/net/iavf/meson.build            |   17 +
> > >  6 files changed, 1818 insertions(+), 12 deletions(-)
> > >  create mode 100644 drivers/net/iavf/iavf_rxtx_vec_avx512.c
> > >
> > > --
> > > 1.9.3
> > >
> >
> > I am not sure I understand the full context here, so please bear with
> > me if I'm completely off...
> >
> > With this patch set, it looks like the driver manipulates the mempool
> > cache directly, bypassing the libraries encapsulating it.
> >
> > Isn't that going deeper into a library than expected... What if the
> > implementation of the mempool library changes radically?
> >
> > And if there are performance gains to be achieved by using vector
> > instructions for manipulating the mempool, perhaps your vector
> > optimizations should go into the mempool library instead?
> >
> 
> Looking specifically at the descriptor re-arm code, the benefit from
> working off the mempool cache directly comes from saving loads by merging
> the code blocks, rather than directly from the vectorization itself -
> though the vectorization doesn't hurt. The original code, with its
> separate mempool function, worked roughly like below:
> 
> 1. mempool code loads mbuf pointers from the cache
> 2. mempool code writes the mbuf pointers to the SW ring for the NIC
> 3. driver code loads the mbuf pointers back from the SW ring
> 4. driver code then does the rest of the descriptor re-arm.
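> 
> As a rough illustration, a simplified scalar sketch of that split flow
> (field and macro names follow the iavf driver, but the real code is
> vectorized, so take this as a sketch rather than the patch itself):
> 
>     volatile union iavf_rx_desc *rxdp = &rxq->rx_ring[rxq->rxrearm_start];
>     struct rte_mbuf **rxep = &rxq->sw_ring[rxq->rxrearm_start];
>     uint16_t i;
> 
>     /* steps 1+2: the mempool library loads mbuf pointers from its
>      * cache and stores them into the SW ring
>      */
>     if (rte_mempool_get_bulk(rxq->mp, (void **)rxep,
>                              IAVF_RXQ_REARM_THRESH) < 0)
>         return;
> 
>     /* steps 3+4: the driver loads the same pointers back from the SW
>      * ring - loads dependent on the stores just made - to fill the
>      * hardware descriptors
>      */
>     for (i = 0; i < IAVF_RXQ_REARM_THRESH; i++) {
>         struct rte_mbuf *mb = rxep[i];
>         rxdp[i].read.hdr_addr = 0;
>         rxdp[i].read.pkt_addr = mb->buf_iova + RTE_PKTMBUF_HEADROOM;
>     }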
> 
> The benefit comes from eliminating step 3, the loads in the driver, which
> are dependent upon the previous stores. By having the driver itself read
> from the mempool cache (the code still uses mempool functions for every
> other part, since everything beyond the cache depends on the
> ring/stack/bucket implementation), we can have the stores go out and,
> while they are completing, reuse the already-loaded data to do the
> descriptor re-arm.
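> 
> In the same sketch form, the merged re-arm reads each pointer out of the
> mempool cache once and reuses it for both the SW-ring store and the
> descriptor write (again simplified and scalar; the actual patch uses
> AVX512 intrinsics and falls back to the regular mempool get when the
> cache can't cover the request):
> 
>     struct rte_mempool_cache *cache =
>         rte_mempool_default_cache(rxq->mp, rte_lcore_id());
>     uint16_t i;
> 
>     if (cache == NULL || cache->len < IAVF_RXQ_REARM_THRESH) {
>         /* refill path omitted; ends up in rte_mempool_get_bulk() */
>         return;
>     }
> 
>     /* the cache is a stack: take the most recently freed mbufs */
>     void **objs = &cache->objs[cache->len - IAVF_RXQ_REARM_THRESH];
>     for (i = 0; i < IAVF_RXQ_REARM_THRESH; i++) {
>         struct rte_mbuf *mb = objs[i];  /* the only load of the pointer */
>         rxep[i] = mb;                   /* store to the SW ring */
>         rxdp[i].read.hdr_addr = 0;
>         rxdp[i].read.pkt_addr = mb->buf_iova + RTE_PKTMBUF_HEADROOM;
>     }
>     cache->len -= IAVF_RXQ_REARM_THRESH;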
> 
> Hope this clarifies things.
> 
> /Bruce
> 

Thank you for the detailed explanation, Bruce.

It makes sense to me now. So,

Acked-by: Morten Brørup <mb at smartsharesystems.com>


Kind regards,
- Morten Brørup