mk: using initial-exec model for thread local variable

Message ID 20180705141321.129989-1-yong.liu@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Series mk: using initial-exec model for thread local variable

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Marvin Liu July 5, 2018, 2:13 p.m. UTC
  When building a shared library, the thread-local storage model changes to
global-dynamic, which adds cost to every read of a thread-local variable.
On the other hand, dynamically loading a shared library that uses static
TLS requests an additional DTV slot, which is limited by the loader. For
now only librte_pmd_eal.so contains thread-local variables, so the TLS
model can be switched back to initial-exec, as in the static library, for
better performance.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
  

Comments

Thomas Monjalon July 5, 2018, 9:25 a.m. UTC | #1
05/07/2018 16:13, Marvin Liu:
> When building share library, thread-local storage model will be changed
> to global-dynamic. It will add additional cost for reading thread local
> variable. On the other hand, dynamically load share library with static
> TLS will request additional DTV slot which is limited by loader. By now
> only librte_pmd_eal.so contain thread local variable. So that can make
> TLS model back to initial-exec like static library for better
> performance.
> 
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
> 
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 7e4531bab..19d5e11ef 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk

Is this only for GCC? Not clang?

> +# Initial execution TLS model has better performane compared to dynamic
> +# global. But this model require for addtional slot on DTV when dlopen
> +# object with thread local variable.

Few typos in this comment.

> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
> +endif

We really need more testing or review of this patch.

Cc techboard: do we take the risk of getting it into RC1
without review? It has been waiting for a long time.
  
Sachin Saxena July 5, 2018, 2:46 p.m. UTC | #2
> 
> When building share library, thread-local storage model will be changed to
> global-dynamic. It will add additional cost for reading thread local variable.
> On the other hand, dynamically load share library with static TLS will request
> additional DTV slot which is limited by loader. By now only librte_pmd_eal.so
> contain thread local variable. So that can make TLS model back to initial-exec
> like static library for better performance.
> 
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
> 
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 7e4531bab..19d5e11ef 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
>  endif
>  endif
> 
> +# Initial execution TLS model has better performane compared to dynamic
> +# global. But this model require for addtional slot on DTV when dlopen
> +# object with thread local variable.
> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
> +endif
> +

[Sachin Saxena]   Using the initial-exec model for a shared object is not recommended. If you link a shared object containing IE-model accesses, the object will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen() might refuse to load it if its TLS usage is greater than the available static TLS space.
This is what happened when I tried to validate this change on an ARM64-based NXP platform with the VPP-dpdk solution. VPP initialization fails with the following error:
  "load_one_plugin:145: /usr/lib/vpp_plugins/dpdk_plugin.so: cannot allocate memory in static TLS block"

Note that the dpdk dpaa2 driver and VPP both use TLS variables quite heavily. When the dpdk shared object is forced to the initial-exec model, VPP's static TLS space gets exhausted and dlopen() returns an error while trying to load the DPDK object.
For the same reason, when "-fPIC" is used the default TLS model changes from "initial-exec" to "global-dynamic".

In my opinion, this change should not be merged as it is breaking basic functionality.
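
The failure mode described above can be observed from the application side with a minimal sketch like the following. try_load() is a hypothetical helper, not part of VPP or DPDK; it relies only on the standard dlopen()/dlerror() calls and copies whatever message the loader reports, for example the static TLS error quoted above.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a VPP or DPDK function): try to load a shared
 * object and, on failure, copy the loader's error message into err.
 * Returns 0 on success and -1 on failure.
 */
int try_load(const char *path, char *err, size_t errlen)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL) {
        /* dlerror() returns the text of the most recent loader failure. */
        const char *msg = dlerror();

        snprintf(err, errlen, "%s", msg != NULL ? msg : "unknown dlopen error");
        return -1;
    }

    dlclose(handle);
    return 0;
}
```

On older glibc versions the program may need to be linked with -ldl. The sketch only reports the failure; it does not change how much static TLS space the loader reserves.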

>  WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
>  WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
>  WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
> --
> 2.17.0
  
Marvin Liu July 6, 2018, 2:22 a.m. UTC | #3
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sachin Saxena
> Sent: Thursday, July 05, 2018 10:46 PM
> To: Liu, Yong <yong.liu@intel.com>; Yang, Zhiyong <zhiyong.yang@intel.com>;
> thomas@monjalon.net; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread
> local variable
> 
> [Sachin Saxena]   Using initial-exec model for shared object is not
> recommended. If you link a shared object containing IE-model, the object
> will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen()
> might refuse to load it if TLS usage is greater than static TLS space.
> This is what happening, when I tried to validate this change on ARM64
> based NXP platform with VPP-dpdk solution. VPP initialization fails with
> following error:
>   "load_one_plugin:145: /usr/lib/vpp_plugins/dpdk_plugin.so: cannot
> allocate memory in static TLS block"
> 
> Note that dpdk dpaa2 driver and VPP both uses TLS variables quite
> significantly. When forced to Initial-exec model in dpdk shared object,
> VPP static TLS space is getting exhausted and dlopen() returns error while
> trying to load the DPDK object.
> For same reason, when we use "-fPIC" the default TLS model changed to
> "global-dynamics" from "Initial-exec".
> 
> In my opinion, this change should not be merged as it is breaking basic
> functionality.

Thanks for your opinion, Sachin.
The IE model may cause problems when a shared object is opened with dlopen(). On the other hand, it can benefit performance.
It would be better to keep the current working setting; users can change it themselves.

Regards,
Marvin

  
Bruce Richardson July 6, 2018, 10:02 a.m. UTC | #4
On Fri, Jul 06, 2018 at 02:22:14AM +0000, Liu, Yong wrote:
> Thanks for your opinion, Sachin. 
> The IE model may cause problems when a shared object is opened with dlopen(). On the other hand, it can benefit performance.
> It would be better to keep the current working setting; users can change it themselves.
> 
What is the performance delta, and where is it most seen? I suggest for
future patches like this, that the commit message itself should give a
rough/approx indication of the perf impacts.

/Bruce
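
One rough way to obtain such a number is a micro-benchmark along these lines. This is only a sketch under stated assumptions: tls_counter and ns_per_access() are hypothetical names, the loop measures nothing but a thread-local increment, and a difference between models is expected only when the file is built as position-independent code for a shared object.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical thread-local variable; volatile keeps the loop from being optimized away. */
__thread volatile uint64_t tls_counter;

/*
 * Hypothetical helper: average CPU time in nanoseconds for one increment
 * of tls_counter, measured with the standard clock() function.
 */
double ns_per_access(uint64_t iters)
{
    clock_t start, end;
    uint64_t i;

    if (iters == 0)
        return 0.0;

    start = clock();
    for (i = 0; i < iters; i++)
        tls_counter++;
    end = clock();

    return ((double)(end - start) / CLOCKS_PER_SEC) * 1e9 / (double)iters;
}
```

Building this once with -fPIC alone and once with -fPIC -ftls-model=initial-exec, then comparing the two results, would give an approximate per-access delta. Real datapath impact depends on how often the hot path touches thread-local variables.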
  

Patch

diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index 7e4531bab..19d5e11ef 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -43,6 +43,13 @@  ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
 endif
 endif
 
+# Initial execution TLS model has better performane compared to dynamic
+# global. But this model require for addtional slot on DTV when dlopen
+# object with thread local variable.
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
+TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
+endif
+
 WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
 WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
 WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
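
For illustration, the TLS model can also be selected per variable instead of for the whole build. The sketch below assumes the tls_model variable attribute accepted by GCC and clang; the counters and functions are hypothetical and not taken from DPDK.

```c
#include <stdint.h>

/* Hypothetical per-thread counters, not DPDK code. */

/* The model for this variable follows the compiler flags (for example -fPIC). */
__thread uint64_t pkts_default;

/* This variable always uses initial-exec, whatever the flags say. */
__thread uint64_t pkts_ie __attribute__((tls_model("initial-exec")));

/* Add n to the flag-controlled counter and return the running total. */
uint64_t count_default(uint64_t n)
{
    pkts_default += n;
    return pkts_default;
}

/* Add n to the initial-exec counter and return the running total. */
uint64_t count_ie(uint64_t n)
{
    pkts_ie += n;
    return pkts_ie;
}
```

A shared object containing pkts_ie is still subject to the static TLS limit discussed in the comments, so this narrows the exposure to chosen variables rather than removing it.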