[dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Dec 8 10:26:17 CET 2016


Hi Zhiyong,

> 
> Hi, Thomas:
> 	Sorry for the late reply. I have been considering your suggestion.
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Friday, December 2, 2016 6:25 PM
> > To: Yang, Zhiyong <zhiyong.yang at intel.com>
> > Cc: dev at dpdk.org; yuanhan.liu at linux.intel.com; Richardson, Bruce
> > <bruce.richardson at intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev at intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch at intel.com>
> > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> > IA platform
> >
> > 2016-12-05 16:26, Zhiyong Yang:
> > > +#ifndef _RTE_MEMSET_X86_64_H_
> >
> > Is this implementation specific to 64-bit?
> >
> 
> Yes.
> 
> > > +
> > > +#define rte_memset memset
> > > +
> > > +#else
> > > +
> > > +static void *
> > > +rte_memset(void *dst, int a, size_t n);
> > > +
> > > +#endif
> >
> > If I understand correctly, rte_memset (like rte_memcpy) uses the most recent
> > instructions available (and enabled) when compiling.
> > It does not adapt the instructions to the run-time CPU.
> > There is no need to downgrade the instruction set at run-time, as that is
> > obviously not a supported case, but it would be nice to be able to upgrade a
> > "default compilation" at run-time as it is done in rte_acl.
> > Let me explain this case more clearly for reference:
> >
> > We can have AVX512 supported in the compiler but disable it when compiling
> > (CONFIG_RTE_MACHINE=snb) in order to build a binary running almost
> > everywhere.
> > When running this binary on a CPU having AVX512 support, it will not benefit
> > from the AVX512 improvement.
> > However, we can compile an AVX512 version of some functions and use them
> > only if the running CPU is capable.
> > This kind of miracle can be achieved in three ways:
> >
> > 1/ For generic C code compiled with a recent GCC, a function can be built for
> > several CPUs thanks to the attribute target_clones.
> >
> > 2/ For manually optimized functions using CPU-specific intrinsics or asm, it is
> > possible to build them with non-default flags thanks to the attribute target.
> >
> > 3/ For manually optimized files using CPU-specific intrinsics or asm, we use
> > specific flags in the makefile.
> >
> > The function clone in case 1/ is dynamically chosen at run-time through an ifunc
> > resolver.
> > The specific functions in cases 2/ and 3/ must be chosen at run-time by
> > initializing a function pointer based on rte_cpu_get_flag_enabled().
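
For reference, the function-pointer approach described for case 2/ could look roughly like the sketch below. It is only an illustration under stated assumptions: the names memset_avx2, my_memset and my_memset_init are hypothetical, AVX2 is used instead of AVX512 to keep it short, and GCC's __builtin_cpu_supports() stands in for rte_cpu_get_flag_enabled() so that the snippet builds without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/*
 * Case 2/: only this function is built with AVX2 enabled, through the
 * "target" attribute; the rest of the file keeps the default flags.
 * Hypothetical helper, not part of the patch under review.
 */
__attribute__((target("avx2")))
static void *
memset_avx2(void *dst, int a, size_t n)
{
	unsigned char *p = dst;
	const __m256i v = _mm256_set1_epi8((char)a);

	assert(dst != NULL || n == 0);
	/* Fill 32 bytes per iteration with unaligned 256-bit stores. */
	for (; n >= 32; n -= 32, p += 32)
		_mm256_storeu_si256((__m256i *)p, v);
	/* Leave the remaining tail (fewer than 32 bytes) to libc. */
	if (n > 0)
		memset(p, a, n);
	return dst;
}
#endif

typedef void *(*memset_fn_t)(void *, int, size_t);

/* Defaults to libc memset until the run-time check upgrades it. */
static memset_fn_t my_memset = memset;

/*
 * Call once at start-up. __builtin_cpu_supports() is an assumption
 * standing in for rte_cpu_get_flag_enabled().
 */
static void
my_memset_init(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		my_memset = memset_avx2;
#endif
}
```

A caller would run my_memset_init() once and then call my_memset(buf, 0, len) wherever it needs a fill. Every call then goes through the pointer, which is the indirection cost discussed further down in this thread.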
> >
> > Note that rte_hash and software crypto PMDs have a run-time check with
> > rte_cpu_get_flag_enabled() but do not override CFLAGS in the Makefile.
> > Next step for these libraries?
> >
> > Back to rte_memset, I think you should try the solution 2/.
> 
> I have read the ACL code. If I understand correctly, it is a good idea for complex
> algorithm implementations, but choosing functions at run time brings some overhead.
> For a frequently called function that consumes only a few cycles, the overhead may
> outweigh the gains the optimization brings.
> For example, in most applications in DPDK, memset only sets N = 10 or 12 bytes, which consumes few cycles.

But then what is the point of having an rte_memset() using vector instructions at all?
From what you are saying, the most common case is even less than the SSE register size.
Konstantin
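
To make the trade-off above concrete, here is a minimal sketch of one possible compromise: sizes up to one SSE register are filled inline, and only larger sizes pay for the indirect call. Everything in it is an assumption for illustration; sketch_memset, big_memset and the 16-byte SMALL_MAX threshold are hypothetical names and values, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold: the size of one SSE register, in bytes. */
#define SMALL_MAX 16

typedef void *(*memset_fn_t)(void *, int, size_t);

/*
 * Stand-in for a pointer that would be set once at start-up from a
 * run-time CPU check; it simply points at libc memset here.
 */
static memset_fn_t big_memset = memset;

static inline void *
sketch_memset(void *dst, int a, size_t n)
{
	assert(dst != NULL || n == 0);

	if (n <= SMALL_MAX) {
		/* Small fill: plain byte loop, no indirect call involved. */
		unsigned char *p = dst;
		size_t i;

		for (i = 0; i < n; i++)
			p[i] = (unsigned char)a;
		return dst;
	}
	/* Large fill: dispatch through the run-time selected function. */
	return big_memset(dst, a, n);
}
```

With this shape, a 10 or 12 byte fill never reaches the function pointer, so any vectorized version would only matter for the larger sizes.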

> 
> Thanks
> Zhiyong

