[RFC] eal/x86: disable array bounds checks in rte_memcpy_generic with gcc-12

Message ID 20220608224928.457440-1-stephen@networkplumber.org (mailing list archive)
State Rejected, archived
Delegated to: David Marchand
Series [RFC] eal/x86: disable array bounds checks in rte_memcpy_generic with gcc-12

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK
ci/intel-Testing      fail     Testing issues

Commit Message

Stephen Hemminger June 8, 2022, 10:49 p.m. UTC
GCC 12 adds more array bounds checking (good), but it is not smart
enough to realize that for small fixed sizes, the bigger move options
are not used.

For example, using rte_memcpy() on a 40-byte RSS key may trigger
warnings about rte_mov128() reading past the end of the input.

To keep some of the checks, add a special case so that calls to
rte_memcpy() with a fixed-size argument use the compiler builtin
instead. We don't want to give up all the checking for code that
uses rte_memcpy() everywhere.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/x86/include/rte_memcpy.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
  

Comments

Morten Brørup June 9, 2022, 7:26 a.m. UTC | #1
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, 9 June 2022 00.49
> 
> Gcc 12 adds more array bounds checking (good); but it is not smart
> enough to realize that for small fixed sizes, the bigger move options
> are not used.
> 
> An example is using rte_memcpy() on a RSS key of 40 bytes may trigger
> rte_memcpy complaints from rte_mov128 reading past end of input.
> 
> In order to keep some of the checks add special case for calls
> to rte_memcpy() with fixed size arguments to use the compiler
> builtin instead. Don't want to give all the checking for
> code that uses rte_memcpy() everywhere.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/eal/x86/include/rte_memcpy.h | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 18aa4e43a743..b90cdd8d7326 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -27,6 +27,10 @@ extern "C" {
>  #pragma GCC diagnostic ignored "-Wstringop-overflow"
>  #endif
> 
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
> +#pragma GCC diagnostic ignored "-Warray-bounds"
> +#endif
> +
>  /**
>   * Copy bytes from one location to another. The locations must not overlap.
>   *
> @@ -842,19 +846,21 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>  	return ret;
>  }
> 
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +#pragma GCC diagnostic pop
> +#endif
> +
>  static __rte_always_inline void *
>  rte_memcpy(void *dst, const void *src, size_t n)
>  {
> -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> +	if (__builtin_constant_p(n))
> +		return __builtin_memcpy(dst, src, n);
> +	else if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
>  		return rte_memcpy_aligned(dst, src, n);
>  	else
>  		return rte_memcpy_generic(dst, src, n);
>  }
> 
> -#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> -#pragma GCC diagnostic pop
> -#endif
> -
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.35.1
> 

Very good.

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>

While you are at it, would you consider concealing the definition of ALIGNMENT_MASK too? It seems to be leaking out from this header file.
  
Konstantin Ananyev June 10, 2022, 12:08 a.m. UTC | #2
08/06/2022 23:49, Stephen Hemminger wrote:
> Gcc 12 adds more array bounds checking (good); but it is not smart
> enough to realize that for small fixed sizes, the bigger move options
> are not used.
> 
> An example is using rte_memcpy() on a RSS key of 40 bytes may trigger
> rte_memcpy complaints from rte_mov128 reading past end of input.
> 
> In order to keep some of the checks add special case for calls
> to rte_memcpy() with fixed size arguments to use the compiler
> builtin instead. Don't want to give all the checking for
> code that uses rte_memcpy() everywhere.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   lib/eal/x86/include/rte_memcpy.h | 16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 18aa4e43a743..b90cdd8d7326 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -27,6 +27,10 @@ extern "C" {
>   #pragma GCC diagnostic ignored "-Wstringop-overflow"
>   #endif
>   
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
> +#pragma GCC diagnostic ignored "-Warray-bounds"
> +#endif
> +
>   /**
>    * Copy bytes from one location to another. The locations must not overlap.
>    *
> @@ -842,19 +846,21 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>   	return ret;
>   }
>   
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +#pragma GCC diagnostic pop
> +#endif
> +
>   static __rte_always_inline void *
>   rte_memcpy(void *dst, const void *src, size_t n)
>   {
> -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> +	if (__builtin_constant_p(n))
> +		return __builtin_memcpy(dst, src, n);
> +	else if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
>   		return rte_memcpy_aligned(dst, src, n);
>   	else
>   		return rte_memcpy_generic(dst, src, n);
>   }
>   
> -#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> -#pragma GCC diagnostic pop
> -#endif
> -
>   #ifdef __cplusplus
>   }
>   #endif

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
  
Ferruh Yigit June 10, 2022, 10:12 a.m. UTC | #3
On 6/8/2022 11:49 PM, Stephen Hemminger wrote:
> Gcc 12 adds more array bounds checking (good); but it is not smart
> enough to realize that for small fixed sizes, the bigger move options
> are not used.
> 
> An example is using rte_memcpy() on a RSS key of 40 bytes may trigger
> rte_memcpy complaints from rte_mov128 reading past end of input.
> 
> In order to keep some of the checks add special case for calls
> to rte_memcpy() with fixed size arguments to use the compiler
> builtin instead. Don't want to give all the checking for
> code that uses rte_memcpy() everywhere.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   lib/eal/x86/include/rte_memcpy.h | 16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 18aa4e43a743..b90cdd8d7326 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -27,6 +27,10 @@ extern "C" {
>   #pragma GCC diagnostic ignored "-Wstringop-overflow"
>   #endif
>   
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
> +#pragma GCC diagnostic ignored "-Warray-bounds"
> +#endif
> +
>   /**
>    * Copy bytes from one location to another. The locations must not overlap.
>    *
> @@ -842,19 +846,21 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>   	return ret;
>   }
>   
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> +#pragma GCC diagnostic pop
> +#endif
> +
>   static __rte_always_inline void *
>   rte_memcpy(void *dst, const void *src, size_t n)
>   {
> -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> +	if (__builtin_constant_p(n))
> +		return __builtin_memcpy(dst, src, n);
> +	else if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))

This patch does two things:

1. Disables "-Warray-bounds" with the above pragma to silence compiler warnings.

2. Uses the compiler builtin for some cases.

The second can impact performance and is not really needed to fix the build
error. What do you think about splitting the patch in two? Change 1 is simple,
but change 2 may require more testing before it is accepted.
  
Morten Brørup June 10, 2022, 10:39 a.m. UTC | #4
> From: Ferruh Yigit [mailto:ferruh.yigit@xilinx.com]
> Sent: Friday, 10 June 2022 12.13
> 
> On 6/8/2022 11:49 PM, Stephen Hemminger wrote:
> > Gcc 12 adds more array bounds checking (good); but it is not smart
> > enough to realize that for small fixed sizes, the bigger move options
> > are not used.
> >
> > An example is using rte_memcpy() on a RSS key of 40 bytes may trigger
> > rte_memcpy complaints from rte_mov128 reading past end of input.
> >
> > In order to keep some of the checks add special case for calls
> > to rte_memcpy() with fixed size arguments to use the compiler
> > builtin instead. Don't want to give all the checking for
> > code that uses rte_memcpy() everywhere.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >   lib/eal/x86/include/rte_memcpy.h | 16 +++++++++++-----
> >   1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> > index 18aa4e43a743..b90cdd8d7326 100644
> > --- a/lib/eal/x86/include/rte_memcpy.h
> > +++ b/lib/eal/x86/include/rte_memcpy.h
> > @@ -27,6 +27,10 @@ extern "C" {
> >   #pragma GCC diagnostic ignored "-Wstringop-overflow"
> >   #endif
> >
> > +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
> > +#pragma GCC diagnostic ignored "-Warray-bounds"
> > +#endif
> > +
> >   /**
> >    * Copy bytes from one location to another. The locations must not overlap.
> >    *
> > @@ -842,19 +846,21 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
> >   	return ret;
> >   }
> >
> > +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
> > +#pragma GCC diagnostic pop
> > +#endif
> > +
> >   static __rte_always_inline void *
> >   rte_memcpy(void *dst, const void *src, size_t n)
> >   {
> > -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > +	if (__builtin_constant_p(n))
> > +		return __builtin_memcpy(dst, src, n);
> > +	else if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> 
> This patch does two things,
> 
> 1. Disable "-Warray-bounds" with above pragma to silence compiler
> warnings.
> 
> 2. Use compiler builtin for some cases.
> 
> Second can impact the performance and not really needed for the build
> error, what do you think to split the patch in two, since 1. is simple
> change but 2. may require more testing before accepting.

Any such testing will be highly compiler-dependent.

Do you have any specific compilers in mind where you see a risk of lower performance?
  
Ferruh Yigit June 10, 2022, 12:06 p.m. UTC | #5
On 6/10/2022 11:39 AM, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yigit@xilinx.com]
>> Sent: Friday, 10 June 2022 12.13
>>
>> On 6/8/2022 11:49 PM, Stephen Hemminger wrote:
>>> Gcc 12 adds more array bounds checking (good); but it is not smart
>>> enough to realize that for small fixed sizes, the bigger move options
>>> are not used.
>>>
>>> An example is using rte_memcpy() on a RSS key of 40 bytes may trigger
>>> rte_memcpy complaints from rte_mov128 reading past end of input.
>>>
>>> In order to keep some of the checks add special case for calls
>>> to rte_memcpy() with fixed size arguments to use the compiler
>>> builtin instead. Don't want to give all the checking for
>>> code that uses rte_memcpy() everywhere.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> ---
>>>    lib/eal/x86/include/rte_memcpy.h | 16 +++++++++++-----
>>>    1 file changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
>>> index 18aa4e43a743..b90cdd8d7326 100644
>>> --- a/lib/eal/x86/include/rte_memcpy.h
>>> +++ b/lib/eal/x86/include/rte_memcpy.h
>>> @@ -27,6 +27,10 @@ extern "C" {
>>>    #pragma GCC diagnostic ignored "-Wstringop-overflow"
>>>    #endif
>>>
>>> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
>>> +#pragma GCC diagnostic ignored "-Warray-bounds"
>>> +#endif
>>> +
>>>    /**
>>>     * Copy bytes from one location to another. The locations must not overlap.
>>>     *
>>> @@ -842,19 +846,21 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>>>      return ret;
>>>    }
>>>
>>> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
>>> +#pragma GCC diagnostic pop
>>> +#endif
>>> +
>>>    static __rte_always_inline void *
>>>    rte_memcpy(void *dst, const void *src, size_t n)
>>>    {
>>> -   if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
>>> +   if (__builtin_constant_p(n))
>>> +           return __builtin_memcpy(dst, src, n);
>>> +   else if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
>>
>> This patch does two things,
>>
>> 1. Disable "-Warray-bounds" with above pragma to silence compiler
>> warnings.
>>
>> 2. Use compiler builtin for some cases.
>>
>> Second can impact the performance and not really needed for the build
>> error, what do you think to split the patch in two, since 1. is simple
>> change but 2. may require more testing before accepting.
> 
> Any such testing will be highly compiler dependent.
> 
> Do you have any specific compilers in mind, where you see a risk for lower performance?
> 

Hi Morten,

My point is about the possible performance impact, not about any particular
risk or specific compiler version.
The part with a possible performance impact can be separated into its own
patch and discussed there, independently of the gcc12 build error.
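
For illustration, the warning-only half of the change discussed in this thread could look roughly like the sketch below. It is not code from rte_memcpy.h: the sketch_* names are hypothetical, and plain __GNUC__ checks stand in for DPDK's RTE_TOOLCHAIN_GCC and GCC_VERSION macros. It only shows how a push / ignored / pop region scopes the suppression to the copy helpers, so that code after the pop keeps its diagnostics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: plain compiler macros stand in for DPDK's
 * RTE_TOOLCHAIN_GCC / GCC_VERSION checks. */
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_GCC_MAJOR __GNUC__
#else
#define SKETCH_GCC_MAJOR 0
#endif

/* Open a diagnostic region; everything up to the pop is affected. */
#if SKETCH_GCC_MAJOR >= 10
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif

/* Additionally silence the array bounds warning on GCC 12 and later. */
#if SKETCH_GCC_MAJOR >= 12
#pragma GCC diagnostic ignored "-Warray-bounds"
#endif

/* Hypothetical fixed-width move, loosely modelled on the rte_mov*() helpers. */
static inline void
sketch_mov16(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}

/* Hypothetical generic copy: 16-byte blocks first, then a byte-wise tail. */
static inline void *
sketch_copy_generic(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	assert(dst != NULL && src != NULL);
	while (n >= 16) {
		sketch_mov16(d, s);
		d += 16;
		s += 16;
		n -= 16;
	}
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dst;
}

/* Close the region; code below this point is checked as usual. */
#if SKETCH_GCC_MAJOR >= 10
#pragma GCC diagnostic pop
#endif
```

In this layout, any function defined after the pop is compiled with the warnings enabled again, which is the effect of moving the pop above rte_memcpy() in the patch.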
  

Patch

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 18aa4e43a743..b90cdd8d7326 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -27,6 +27,10 @@  extern "C" {
 #pragma GCC diagnostic ignored "-Wstringop-overflow"
 #endif
 
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 120000)
+#pragma GCC diagnostic ignored "-Warray-bounds"
+#endif
+
 /**
  * Copy bytes from one location to another. The locations must not overlap.
  *
@@ -842,19 +846,21 @@  rte_memcpy_aligned(void *dst, const void *src, size_t n)
 	return ret;
 }
 
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
+#pragma GCC diagnostic pop
+#endif
+
 static __rte_always_inline void *
 rte_memcpy(void *dst, const void *src, size_t n)
 {
-	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
+	if (__builtin_constant_p(n))
+		return __builtin_memcpy(dst, src, n);
+	else if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
 		return rte_memcpy_aligned(dst, src, n);
 	else
 		return rte_memcpy_generic(dst, src, n);
 }
 
-#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 100000)
-#pragma GCC diagnostic pop
-#endif
-
 #ifdef __cplusplus
 }
 #endif
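
The dispatch added to rte_memcpy() can be tried outside DPDK with a small stand-alone sketch. Everything named sketch_* or SKETCH_* below is hypothetical, and the byte-wise loop is only a placeholder for the hand-written aligned and generic routines. Whether __builtin_constant_p(n) evaluates to true inside an inline wrapper depends on inlining and the optimization level, so the sketch is written to copy correctly on either path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RSS_KEY_LEN 40

/* Hypothetical placeholder for the hand-written runtime-size copy paths. */
static inline void *
sketch_copy_runtime(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dst;
}

/* Hypothetical wrapper with the same shape as the patched rte_memcpy():
 * sizes known at compile time go to the compiler builtin, which the
 * compiler can bounds-check; all other sizes take the runtime path. */
static inline __attribute__((always_inline)) void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	if (__builtin_constant_p(n))
		return __builtin_memcpy(dst, src, n);
	else
		return sketch_copy_runtime(dst, src, n);
}

/* Fixed-size caller, mirroring the 40-byte RSS key case from the commit message. */
static inline void
sketch_copy_rss_key(uint8_t *dst, const uint8_t *src)
{
	assert(dst != NULL && src != NULL);
	sketch_memcpy(dst, src, SKETCH_RSS_KEY_LEN);
}
```

A caller that passes a literal or sizeof() length is the case the builtin branch targets; a length read from a variable at run time falls through to the other branch.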