[v2] examples/l3fwd: fix unaligned memory access

Message ID 20190725162903.106262-1-hariprasad.govindharajan@intel.com (mailing list archive)
State Superseded, archived
Headers
Series [v2] examples/l3fwd: fix unaligned memory access |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Hariprasad Govindharajan July 25, 2019, 4:29 p.m. UTC
  Fix unaligned memory access when reading IPv6 header which
leads to segmentation fault by changing aligned memory read
to unaligned memory read.

Bugzilla ID: 279
Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
Cc: maciej.czekaj@caviumnetworks.com
Cc: stable@dpdk.org
Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
---
V2: Added functions which will do unaligned load based on the
underlying architecture
---
---
 examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)
  

Comments

Bruce Richardson July 25, 2019, 4:46 p.m. UTC | #1
On Thu, Jul 25, 2019 at 05:29:03PM +0100, hgovindh wrote:
> Fix unaligned memory access when reading IPv6 header which
> leads to segmentation fault by changing aligned memory read
> to unaligned memory read.
> 
> Bugzilla ID: 279
> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> Cc: maciej.czekaj@caviumnetworks.com
> Cc: stable@dpdk.org
> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
> ---
> V2: Added functions which will do unaligned load based on the
> underlying architecture
> ---
> ---
>  examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> index fa8f82be6..f2641586b 100644
> --- a/examples/l3fwd/l3fwd_em.c
> +++ b/examples/l3fwd/l3fwd_em.c
> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)
>  #error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
>  #endif
>  
> +#if defined(RTE_MACHINE_CPUFLAG_SSE2)
> +static inline xmm_t
> +em_load_key(void *key)
> +{
> +	return _mm_loadu_si128((__m128i *)(key));
> +}
> +#elif defined(RTE_MACHINE_CPUFLAG_NEON)
> +static inline xmm_t
> +em_load_key(void *key)
> +{
> +	return vld1q_s32((int32_t *)key);
> +}
> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> +static inline xmm_t
> +em_load_key(void *key)
> +{
> +	return vec_ld(0, (xmm_t *)(key));
> +}

Two minor nits:

Since you are passing in a void *, no typecasts should be needed in any of
these functions.

Also, is it neater if you just have the ifdefs in the middle of the
function, rather than duplicating the function prototype each time? Third
option is to make the load a single-line macro rather than 5-lines of a
function.

/Bruce
  
Jerin Jacob Kollanukkaran July 25, 2019, 5:14 p.m. UTC | #2
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Bruce Richardson
> Sent: Thursday, July 25, 2019 10:16 PM
> To: hgovindh <hariprasad.govindharajan@intel.com>
> Cc: Remy Horton <remy.horton@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ori Kam <orika@mellanox.com>; Pablo de
> Lara <pablo.de.lara.guarch@intel.com>; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Tomasz
> Kantecki <tomasz.kantecki@intel.com>; dev@dpdk.org;
> maciej.czekaj@caviumnetworks.com; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned memory
> access
> 
> On Thu, Jul 25, 2019 at 05:29:03PM +0100, hgovindh wrote:
> > Fix unaligned memory access when reading IPv6 header which leads to
> > segmentation fault by changing aligned memory read to unaligned memory
> > read.
> >
> > Bugzilla ID: 279
> > Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> > Cc: maciej.czekaj@caviumnetworks.com
> > Cc: stable@dpdk.org
> > Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
> > ---
> > V2: Added functions which will do unaligned load based on the
> > underlying architecture
> > ---
> > ---
> >  examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> > index fa8f82be6..f2641586b 100644
> > --- a/examples/l3fwd/l3fwd_em.c
> > +++ b/examples/l3fwd/l3fwd_em.c
> > @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)  #error No
> > vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> > #endif
> >
> > +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
> > +em_load_key(void *key) {
> > +	return _mm_loadu_si128((__m128i *)(key)); } #elif
> > +defined(RTE_MACHINE_CPUFLAG_NEON)
> > +static inline xmm_t
> > +em_load_key(void *key)
> > +{
> > +	return vld1q_s32((int32_t *)key);
> > +}
> > +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> > +static inline xmm_t
> > +em_load_key(void *key)
> > +{
> > +	return vec_ld(0, (xmm_t *)(key));
> > +}

Added power pc maintainer

Not sure all architecture need SIMD instructions for access to unaligned memory location.

@hgovindh,
Could you provide exact setup details for reproducing this issue, I can test it on arm64.
Like l3fwd command, Traffic generator traffic pattern
  
David Christensen July 25, 2019, 6:56 p.m. UTC | #3
>> On Thu, Jul 25, 2019 at 05:29:03PM +0100, hgovindh wrote:
>>> Fix unaligned memory access when reading IPv6 header which leads to
>>> segmentation fault by changing aligned memory read to unaligned memory
>>> read.
>>>
>>> Bugzilla ID: 279
>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
>>> Cc: maciej.czekaj@caviumnetworks.com
>>> Cc: stable@dpdk.org
>>> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
>>> ---
>>> V2: Added functions which will do unaligned load based on the
>>> underlying architecture
>>> ---
>>> ---
>>>   examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
>>>   1 file changed, 24 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
>>> index fa8f82be6..f2641586b 100644
>>> --- a/examples/l3fwd/l3fwd_em.c
>>> +++ b/examples/l3fwd/l3fwd_em.c
>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)  #error No
>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
>>> #endif
>>>
>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
>>> +em_load_key(void *key) {
>>> +	return _mm_loadu_si128((__m128i *)(key)); } #elif
>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
>>> +static inline xmm_t
>>> +em_load_key(void *key)
>>> +{
>>> +	return vld1q_s32((int32_t *)key);
>>> +}
>>> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
>>> +static inline xmm_t
>>> +em_load_key(void *key)
>>> +{
>>> +	return vec_ld(0, (xmm_t *)(key));
>>> +}
> 
> Added power pc maintainer

> Not sure all architecture need SIMD instructions for access to unaligned memory location.
> 
> @hgovindh,
> Could you provide exact setup details for reproducing this issue, I can test it on arm64.
> Like l3fwd command, Traffic generator traffic pattern

The vec_ld() function requires 16 byte alignment.  (My understanding is 
that GCC code will mask the lower four bits of the address to enforce 
the requirement: 
https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8) 
  Power 8 and later processors support the vec_vsx_ld() function which 
does not have the same memory alignment requirements.

I'll need to try and reproduce the original bug to see what code is 
actually being generated.  Outside of vector instructions I wouldn't 
expect to see errors with unaligned data references.

Dave
  
David Christensen July 25, 2019, 10:06 p.m. UTC | #4
>>>> Fix unaligned memory access when reading IPv6 header which leads to
>>>> segmentation fault by changing aligned memory read to unaligned memory
>>>> read.
>>>>
>>>> Bugzilla ID: 279
>>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
>>>> Cc: maciej.czekaj@caviumnetworks.com
>>>> Cc: stable@dpdk.org
>>>> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
>>>> ---
>>>> V2: Added functions which will do unaligned load based on the
>>>> underlying architecture
>>>> ---
>>>> ---
>>>>   examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
>>>>   1 file changed, 24 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
>>>> index fa8f82be6..f2641586b 100644
>>>> --- a/examples/l3fwd/l3fwd_em.c
>>>> +++ b/examples/l3fwd/l3fwd_em.c
>>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)  #error No
>>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
>>>> #endif
>>>>
>>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
>>>> +em_load_key(void *key) {
>>>> +    return _mm_loadu_si128((__m128i *)(key)); } #elif
>>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
>>>> +static inline xmm_t
>>>> +em_load_key(void *key)
>>>> +{
>>>> +    return vld1q_s32((int32_t *)key);
>>>> +}
>>>> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
>>>> +static inline xmm_t
>>>> +em_load_key(void *key)
>>>> +{
>>>> +    return vec_ld(0, (xmm_t *)(key));
>>>> +}
>>
>> Added power pc maintainer
> 
>> Not sure all architecture need SIMD instructions for access to 
>> unaligned memory location.
>>
>> @hgovindh,
>> Could you provide exact setup details for reproducing this issue, I 
>> can test it on arm64.
>> Like l3fwd command, Traffic generator traffic pattern
> 
> The vec_ld() function requires 16 byte alignment.  (My understanding is 
> that GCC code will mask the lower four bits of the address to enforce 
> the requirement: 
> https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8) 
>   Power 8 and later processors support the vec_vsx_ld() function which 
> does not have the same memory alignment requirements.
> 
> I'll need to try and reproduce the original bug to see what code is 
> actually being generated.  Outside of vector instructions I wouldn't 
> expect to see errors with unaligned data references.

Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc 
4.8.5, no segmentation fault observed after 30 minutes (observed 
segmentation fault on Intel system immediately).

Code dissassembly:
(gdb) info line l3fwd_em.c:290
Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at 
address 0x10146fbc <em_main_loop+1660>
    and ends at 0x10146fc0 <em_main_loop+1664>.
(gdb) disass /m 0x10146fbc,0x10146fc0
Dump of assembler code from 0x10146fbc to 0x10146fc0:
290		key.xmm[1] = *(xmm_t *)data1;
    0x0000000010146fbc <em_main_loop+1660>:	li      r7,20

End of assembler dump.

Since vector element ordering is different on Intel vs Power/ARM, 
suggest only applying vector operation to Intel code at this time 
otherwise additional steps may be required to modify MASK values to 
match the new vector operations.

Dave
  
Jerin Jacob Kollanukkaran July 26, 2019, 10:58 a.m. UTC | #5
> -----Original Message-----
> From: David Christensen <drc@linux.vnet.ibm.com>
> Sent: Friday, July 26, 2019 3:36 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Bruce Richardson
> <bruce.richardson@intel.com>; hgovindh
> <hariprasad.govindharajan@intel.com>
> Cc: Remy Horton <remy.horton@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ori Kam <orika@mellanox.com>; Pablo de
> Lara <pablo.de.lara.guarch@intel.com>; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Tomasz
> Kantecki <tomasz.kantecki@intel.com>; dev@dpdk.org;
> maciej.czekaj@caviumnetworks.com; stable@dpdk.org; Gavin Hu
> <gavin.hu@arm.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned
> memory access
> 
> 
>>>> Fix unaligned memory access when reading IPv6 header which leads to
> >>>> segmentation fault by changing aligned memory read to unaligned
> >>>> memory read.
> >>>>
> >>>> Bugzilla ID: 279
> >>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> >>>> Cc: maciej.czekaj@caviumnetworks.com
> >>>> Cc: stable@dpdk.org
> >>>> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
> >>>> ---
> >>>> V2: Added functions which will do unaligned load based on the
> >>>> underlying architecture
> >>>> ---
> >>>> ---
> >>>>   examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> >>>>   1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/examples/l3fwd/l3fwd_em.c
> b/examples/l3fwd/l3fwd_em.c
> >>>> index fa8f82be6..f2641586b 100644
> >>>> --- a/examples/l3fwd/l3fwd_em.c
> >>>> +++ b/examples/l3fwd/l3fwd_em.c
> >>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t
> mask)  #error No
> >>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> >>>> #endif
> >>>>
> >>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
> >>>> +em_load_key(void *key) {
> >>>> +    return _mm_loadu_si128((__m128i *)(key)); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> +    return vld1q_s32((int32_t *)key); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> +    return vec_ld(0, (xmm_t *)(key)); }
> >>
> >> Added power pc maintainer
> >
> >> Not sure all architecture need SIMD instructions for access to
> >> unaligned memory location.
> >>
> >> @hgovindh,
> >> Could you provide exact setup details for reproducing this issue, I
> >> can test it on arm64.
> >> Like l3fwd command, Traffic generator traffic pattern
> >
> > The vec_ld() function requires 16 byte alignment.  (My understanding
> > is that GCC code will mask the lower four bits of the address to
> > enforce the requirement:
> > https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-p
> > ower8)
> >   Power 8 and later processors support the vec_vsx_ld() function which
> > does not have the same memory alignment requirements.
> >
> > I'll need to try and reproduce the original bug to see what code is
> > actually being generated.  Outside of vector instructions I wouldn't
> > expect to see errors with unaligned data references.
> 
> Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc 4.8.5, no
> segmentation fault observed after 30 minutes (observed segmentation fault
> on Intel system immediately).
> 
> Code dissassembly:
> (gdb) info line l3fwd_em.c:290
> Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at
> address 0x10146fbc <em_main_loop+1660>
>     and ends at 0x10146fc0 <em_main_loop+1664>.
> (gdb) disass /m 0x10146fbc,0x10146fc0
> Dump of assembler code from 0x10146fbc to 0x10146fc0:
> 290		key.xmm[1] = *(xmm_t *)data1;
>     0x0000000010146fbc <em_main_loop+1660>:	li      r7,20
> 
> End of assembler dump.
> 
> Since vector element ordering is different on Intel vs Power/ARM, suggest
> only applying vector operation to Intel code at this time otherwise additional
> steps may be required to modify MASK values to match the new vector
> operations.

On arm64, Generated assembly is following. Where LDUR and STR works
With unaligned memory(i.e no need for special handling).
I would suggest to have eal function to abstract The difference between x86 vs Power/ARM
to avoid ifdef clutter in all the applications.

             key.xmm[1] = *(xmm_t *)data1;
   0x00000000004ebed4 <+1188>:  60 40 c1 3c     ldur    q0, [x3, #20]
   0x00000000004ebedc <+1196>:  a0 73 80 3d     str     q0, [x29, #448]
   0x00000000004ec064 <+1588>:  41 40 c1 3c     ldur    q1, [x2, #20]
   0x00000000004ec06c <+1596>:  a1 73 80 3d     str     q1, [x29, #448]

> 
> Dave
  

Patch

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index fa8f82be6..f2641586b 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -244,6 +244,29 @@  em_mask_key(void *key, xmm_t mask)
 #error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
 #endif
 
+#if defined(RTE_MACHINE_CPUFLAG_SSE2)
+static inline xmm_t
+em_load_key(void *key)
+{
+	return _mm_loadu_si128((__m128i *)(key));
+}
+#elif defined(RTE_MACHINE_CPUFLAG_NEON)
+static inline xmm_t
+em_load_key(void *key)
+{
+	return vld1q_s32((int32_t *)key);
+}
+#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
+static inline xmm_t
+em_load_key(void *key)
+{
+	return vec_ld(0, (xmm_t *)(key));
+}
+#else
+#error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
+#endif
+
+
 static inline uint16_t
 em_get_ipv4_dst_port(void *ipv4_hdr, uint16_t portid, void *lookup_struct)
 {
@@ -285,8 +308,7 @@  em_get_ipv6_dst_port(void *ipv6_hdr, uint16_t portid, void *lookup_struct)
 	 * Get part of 5 tuple: dst IP address lower 96 bits
 	 * and src IP address higher 32 bits.
 	 */
-	key.xmm[1] = *(xmm_t *)data1;
-
+	key.xmm[1] = em_load_key(data1);
 	/*
 	 * Get part of 5 tuple: dst port and src port
 	 * and dst IP address higher 32 bits.