[dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Dec 2 11:33:44 CET 2015


Hi everyone,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
> Sent: Wednesday, December 02, 2015 9:50 AM
> To: Jerin Jacob
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs
> 
> On 2 December 2015 at 16:03, Jerin Jacob <jerin.jacob at caviumnetworks.com> wrote:
> > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote:
> >> On 2 December 2015 at 00:41, Jerin Jacob <jerin.jacob at caviumnetworks.com> wrote:
> >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote:
> >> >> Adds ARM NEON support for lpm.
> >> >> And enables table/pipeline libraries which depend on lpm.
> >> >
> >> > I already sent the patch on the same yesterday.
> >> > We can converge the patches after the discussion.
> >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml
> >> >
> >> Yes, I have read your patch. But there are many differences, so I sent
> >> mine for your reviewing :)
> >>
> >> >
> >> >>
> >> >> Signed-off-by: Jianbo Liu <jianbo.liu at linaro.org>
> >> >> ---
> >> >>  config/defconfig_arm-armv7a-linuxapp-gcc          |  3 -
> >> >>  config/defconfig_arm64-armv8a-linuxapp-gcc        |  3 -
> >> >>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++
> >> >>  lib/librte_lpm/rte_lpm.h                          | 68 ++++++++++++++++-------
> >> >>  4 files changed, 77 insertions(+), 25 deletions(-)
> >> >>
> >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> index cbebd64..efffa1f 100644
> >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n
> >> >>  CONFIG_RTE_EAL_IGB_UIO=n
> >> >>
> >> >>  # fails to compile on ARM
> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >>
> >> >>  # cannot use those on ARM
> >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> index 504f3ed..57f7941 100644
> >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n
> >> >>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> >> >>  CONFIG_RTE_LIBRTE_I40E_PMD=n
> >> >>
> >> >> -CONFIG_RTE_LIBRTE_LPM=n
> >> >> -CONFIG_RTE_LIBRTE_TABLE=n
> >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n
> >> >>  CONFIG_RTE_SCHED_VECTOR=n
> >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> index a33c054..7437711 100644
> >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> >> >> @@ -41,6 +41,8 @@ extern "C" {
> >> >>
> >> >>  typedef int32x4_t xmm_t;
> >> >>
> >> >> +typedef int32x4_t __m128i;
> >> >> +
> >> >>  #define      XMM_SIZE        (sizeof(xmm_t))
> >> >>  #define      XMM_MASK        (XMM_SIZE - 1)
> >> >>
> >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm {
> >> >>       double   pd[XMM_SIZE / sizeof(double)];
> >> >>  } __attribute__((aligned(16))) rte_xmm_t;
> >> >>
> >> >> +static __inline __m128i
> >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0)
> >> >> +{
> >> >> +     int32_t r[4] = {i0, i1, i2, i3};
> >> >> +
> >> >> +     return vld1q_s32(r);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_loadu_si128(__m128i *p)
> >> >> +{
> >> >> +     return vld1q_s32((int32_t *)p);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_set1_epi32(int i)
> >> >> +{
> >> >> +     return vdupq_n_s32(i);
> >> >> +}
> >> >> +
> >> >> +static __inline __m128i
> >> >> +_mm_and_si128(__m128i a, __m128i b)
> >> >> +{
> >> >> +     return vandq_s32(a, b);
> >> >> +}
> >> >> +
> >
> > IMO, it's not always good to emulate GCC defined intrinsics of
> > other architecture. What if a legacy DPDK application has such mappings
> > then BOOM, multiple definition, which one is correct? which one
> > to comment it out? Integration pain starts for DPDK library consumer:-(
> >
> They can include rte_vect.h in build/include directly, which is linked correctly
> to the one for that ARCH, so there is no need to worry about.
> 
> 
> >> >
> >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON
> >> > Let's create the rte_vect_* as required. look at the existing patch.
> >> >
> >> I thought of creating a layer of SIMD over all the platforms before.
> >> But can't you see it make things complicated, considering there are
> >> only few simple intrinsic to implement?
> >
> > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON
> > implementation if I were to take this approach and emulation comes with
> > the cost.
> >
> No, I will not re-implement all the intrinsic like that .
> I only do with the simple intrinsic, such as load/store, as you said below.
> 
> > So my take is,
> > lets the each architecture implementation for specific SIMD version of DPDK
> > API in the library should have the freedom to implement the API in
> > NATIVE.
> >
> > And let's create only rte_vect_* abstraction only for using
> > that API/library. Which boils down to have very minimal rte_vect_*
> > abstraction to load, store, set not beyond that.
> >
> > This makes clear "contract" between DPDK library and the applications.
> > and make easy for remaning new architecture  porting effort in DPDK.
> >
> Agree.
> But I reuse existing intrinsic names, and you recreate new ones.
> And I try to do as few changes as possible, and try to avoid any
> mistaken which may cause code un-compiled.
> I think it's design level question, we need to hear what others talk about it.
> 
> > Imagine how your proposed function will look like if new architecture
> > wants to implement "optimized" version of rte_lpm_lookupx4
> >
> There is no optimization for this (simple) rte_lpm_lookupx4, otherwise
> you have done that in your patch.
> If there is for other new platform, defintely they should do like
> yours, as you did for NEON ACL.
> 
> >
> >> If do so, we also need to explain to others how to use these interfaces.
> >> Besides, this patch did the smallest changes to the original code, and
> >> more likely to be accepted by others.
> >
> > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought
> > that make reviewer easy to review the changes in architecture
> > perspective.
> >
> As I know, they don't enable LPM for PPC, and ARM is the first one to
> touch this issue.
> 
> >>
> >> >
> >> >>  #ifdef RTE_ARCH_ARM
> >> >>  /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */
> >> >>  static __inline uint8x16_t
> >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> >> >> index c299ce2..c76c07d 100644
> >> >> --- a/lib/librte_lpm/rte_lpm.h
> >> >> +++ b/lib/librte_lpm/rte_lpm.h
> >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips,
> >> >>  /* Mask four results. */
> >> >>  #define       RTE_LPM_MASKX4_RES     UINT64_C(0x00ff00ff00ff00ff)
> >> >>
> >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> >> >
> >> > Separate out arm implementation to the different header file.
> >> > Too many ifdef looks odd in the header file and difficult to manage.
> >> >
> >> But there are many ifdefs already.
> >> And It seems unreasonable to add a new file only for one small function.
> >>
> >
> > small or big, its matter of each architecture to have
> > the freedom for the optimized version for the implementation.
> >
> > What if  other architecture demands to write this function in assembly
> > or restructure it for performance improvement?
> >
> If there is such demands, should do like that.
> But I don't see any restructure in your patch, and you still follow
> the logic as x86, is it worth adding a new file?
> 

My preference would also be to put architecture dependent implementation
into different files. 
Might be create lib/librte_lpm/arch/(arm|x86)/... here?
Konstantin  



More information about the dev mailing list