[dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs

Jerin Jacob jerin.jacob at caviumnetworks.com
Thu Dec 3 14:20:34 CET 2015


On Thu, Dec 03, 2015 at 12:42:13PM +0000, Ananyev, Konstantin wrote:
>
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Thursday, December 03, 2015 12:17 PM
> > To: Ananyev, Konstantin
> > Cc: Thomas Monjalon; dev at dpdk.org; viktorin at rehivetech.com
> > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> >
> > On Thu, Dec 03, 2015 at 11:02:07AM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Konstantin,
> >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > > Sent: Thursday, December 03, 2015 9:34 AM
> > > > To: Thomas Monjalon
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> > > >
> > > > On Wed, Dec 02, 2015 at 05:57:10PM +0100, Thomas Monjalon wrote:
> > > > > 2015-12-02 22:23, Jerin Jacob:
> > > > > > On Wed, Dec 02, 2015 at 05:40:13PM +0100, Thomas Monjalon wrote:
> > > > > > > 2015-12-02 20:04, Jerin Jacob:
> > > > > > > > On Wed, Dec 02, 2015 at 09:13:51PM +0800, Jianbo Liu wrote:
> > > > > > > > > On 2 December 2015 at 18:39, Jerin Jacob <jerin.jacob at caviumnetworks.com> wrote:
> > > > > > > > > > AND they include "rte_lpm.h"(it internally includes rte_vect.h)
> > > > > > > > > > that lead to multiple definition and its not good.
> > > > > > > > > >
> > > > > > > > > But you will have similar issue since "typedef int32x4_t __m128i"
> > > > > > > > > appears in both your patch and this header file.
> > > > > > > >
> > > > > > > > I just tested it, it won't break, back to back "typedef int32x4_t __m128i"
> > > > > > > > is fine(unlike inline function).
> > > > > > > >
> > > > > > > > my intention to keep __m128i "as is"  because changing the __m128i to rte_???
> > > > > > > > something would break the ABI.
> > > > > > >
> > > > > > > Isn't it already broken in 2.2?
> > > > > >
> > > > > > Does it mean, You would like to have rte_128i(or similar) kind of
> > > > > > abstraction to represent 128bit SIMD variable in DPDK?
> > > > >
> > > > > If you are convinced that it is the best way to write a generic code, yes.
> > > >
> > > > I grep-ed through DPDK API list to see the dependency with SIMD in API
> > > > definition.I see only rte_lpm_lookupx4 API has SIMD dependency in API
> > > > definition.
> > > >
> > > > I believe that's the root cause of the problem. IMO, The
> > > > better way to fix this would be to remove __m128i from API and have more
> > > > general representation to remove the architecture dependency from API
> > > >
> > > > something like this,
> > > >
> > > > rte_lpm_lookupx4(const struct rte_lpm *lpm, uint32_t *ip, uint16_t
> > > > hop[4], uint16_t defv)
> > > >
> > > > instead of
> > > >
> > > > rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t
> > > > hop[4],  uint16_t defv)
> > >
> > > The idea for that function was that rte_lpm_lookupx4() accepts 4 IPv4 addresses that are:
> > > 1. already in 128bit register
> > > 2. 'prepared' - byte swap is already done for them if needed.
> > >
> > > About ways to fix  __m128i dependency: as I can see x86 and arm DPDK code
> > > already has xmm_t typedef:
> > >
> > > $ find lib -type f | xargs grep xmm_t | grep typedef
> > > lib/librte_eal/common/include/arch/x86/rte_vect.h:typedef __m128i xmm_t;
> > > lib/librte_eal/common/include/arch/arm/rte_vect.h:typedef int32x4_t xmm_t;
> > >
> > > Why not to  change rte_lpm_lookupx4() to accept xmm_t as input parameter.
> > > As I understand it would solve the problem, and wouldn't introduce any API/ABI breakage, right?
> >
> > Yes, If we have API/ABI breakage concerns.
> >
> > IMO, Now this would call for some kind of rte_vect_* abstraction load,
> > store, set kind of SIMD operation on xmm_t in common test code to
> > aviod #ifdef's in app/test/test_lpm.c
>
> Yes, seems so.
>
> >
> > I guess we may not need those abstractions in
> > lib/librte_eal/common/include/arch/ directory.
> > keeping in app/test/xmmt_ops.h should be enough, right?
>
> That sounds ok to me.
> At least for now.
> For future the more generic question - do we like to have some
> generic layer abstraction for similar vector instrincts across different archs?
> From one side it might help people writing/using vector implementation of some stuff,
> from other side - there would be extra hassle creating/supporting it.

There are few such libaries avilable on web. example:

NEON -> SSE
https://software.intel.com/sites/default/files/managed/cf/f6/NEONvsSSE.h

SSE -> NEON
https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h

but coming up with common abstraction will  be difficult as it's not
one to one mapped all the time and performance criteria to choose
the instruction on given architecture to realize a certain logic etc

Jerin

>
> Konstantin
>
> >
> >
> > >
> > > Konstantin
> > >
> > > >
> > > > Now I am not sure why this API was created like this, from l3fwd.c
> > > > example, it looks to accommodate the IPV4 byte swap[1]. If it's true,
> > > > maybe we can have eal byte swap abstraction for optimized byte swap on
> > > > memory for 4 IP address in one shot
> > > >
> > > > or
> > > >
> > > > Have rte_lpm_lookupx4 take an argument for byte swap or not ?
> > > >
> > > > or
> > > >
> > > > something similar?
> > > >
> > > > Thoughts ?
> > > >
> > > > [1]
> > > > const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
> > > >                                                 4, 5, 6, 7, 0, 1, 2, 3);
> > > > /* Byte swap 4 IPV4 addresses. */
> > > > dip = _mm_shuffle_epi8(dip, bswap_mask);
> > > >
> > > > Jerin
> > > >
> > > > > I think the most important question is to know what is the best solution
> > > > > for performance and maintainability. The API/ABI questions will be considered
> > > > > after.
> > > > >
> > > > > Thanks for your involvement guys.


More information about the dev mailing list