[v2] examples/l3fwd: fix unaligned memory access
Checks
Commit Message
Fix unaligned memory access when reading IPv6 header which
leads to segmentation fault by changing aligned memory read
to unaligned memory read.
Bugzilla ID: 279
Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
Cc: maciej.czekaj@caviumnetworks.com
Cc: stable@dpdk.org
Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
---
V2: Added functions which will do unaligned load based on the
underlying architecture
---
---
examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
Comments
On Thu, Jul 25, 2019 at 05:29:03PM +0100, hgovindh wrote:
> Fix unaligned memory access when reading IPv6 header which
> leads to segmentation fault by changing aligned memory read
> to unaligned memory read.
>
> Bugzilla ID: 279
> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> Cc: maciej.czekaj@caviumnetworks.com
> Cc: stable@dpdk.org
> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
> ---
> V2: Added functions which will do unaligned load based on the
> underlying architecture
> ---
> ---
> examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> 1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> index fa8f82be6..f2641586b 100644
> --- a/examples/l3fwd/l3fwd_em.c
> +++ b/examples/l3fwd/l3fwd_em.c
> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)
> #error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> #endif
>
> +#if defined(RTE_MACHINE_CPUFLAG_SSE2)
> +static inline xmm_t
> +em_load_key(void *key)
> +{
> + return _mm_loadu_si128((__m128i *)(key));
> +}
> +#elif defined(RTE_MACHINE_CPUFLAG_NEON)
> +static inline xmm_t
> +em_load_key(void *key)
> +{
> + return vld1q_s32((int32_t *)key);
> +}
> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> +static inline xmm_t
> +em_load_key(void *key)
> +{
> + return vec_ld(0, (xmm_t *)(key));
> +}
Two minor nits:
Since you are passing in a void *, no typecasts should be needed in any of
these functions.
Also, is it neater if you just have the ifdefs in the middle of the
function, rather than duplicating the function prototype each time? Third
option is to make the load a single-line macro rather than 5-lines of a
function.
/Bruce
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Bruce Richardson
> Sent: Thursday, July 25, 2019 10:16 PM
> To: hgovindh <hariprasad.govindharajan@intel.com>
> Cc: Remy Horton <remy.horton@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ori Kam <orika@mellanox.com>; Pablo de
> Lara <pablo.de.lara.guarch@intel.com>; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Tomasz
> Kantecki <tomasz.kantecki@intel.com>; dev@dpdk.org;
> maciej.czekaj@caviumnetworks.com; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned memory
> access
>
> On Thu, Jul 25, 2019 at 05:29:03PM +0100, hgovindh wrote:
> > Fix unaligned memory access when reading IPv6 header which leads to
> > segmentation fault by changing aligned memory read to unaligned memory
> > read.
> >
> > Bugzilla ID: 279
> > Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> > Cc: maciej.czekaj@caviumnetworks.com
> > Cc: stable@dpdk.org
> > Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
> > ---
> > V2: Added functions which will do unaligned load based on the
> > underlying architecture
> > ---
> > ---
> > examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> > 1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> > index fa8f82be6..f2641586b 100644
> > --- a/examples/l3fwd/l3fwd_em.c
> > +++ b/examples/l3fwd/l3fwd_em.c
> > @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask) #error No
> > vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> > #endif
> >
> > +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
> > +em_load_key(void *key) {
> > + return _mm_loadu_si128((__m128i *)(key)); } #elif
> > +defined(RTE_MACHINE_CPUFLAG_NEON)
> > +static inline xmm_t
> > +em_load_key(void *key)
> > +{
> > + return vld1q_s32((int32_t *)key);
> > +}
> > +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> > +static inline xmm_t
> > +em_load_key(void *key)
> > +{
> > + return vec_ld(0, (xmm_t *)(key));
> > +}
Added power pc maintainer
Not sure all architecture need SIMD instructions for access to unaligned memory location.
@hgovindh,
Could you provide exact setup details for reproducing this issue, I can test it on arm64.
Like l3fwd command, Traffic generator traffic pattern
>> On Thu, Jul 25, 2019 at 05:29:03PM +0100, hgovindh wrote:
>>> Fix unaligned memory access when reading IPv6 header which leads to
>>> segmentation fault by changing aligned memory read to unaligned memory
>>> read.
>>>
>>> Bugzilla ID: 279
>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
>>> Cc: maciej.czekaj@caviumnetworks.com
>>> Cc: stable@dpdk.org
>>> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
>>> ---
>>> V2: Added functions which will do unaligned load based on the
>>> underlying architecture
>>> ---
>>> ---
>>> examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
>>> 1 file changed, 24 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
>>> index fa8f82be6..f2641586b 100644
>>> --- a/examples/l3fwd/l3fwd_em.c
>>> +++ b/examples/l3fwd/l3fwd_em.c
>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask) #error No
>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
>>> #endif
>>>
>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
>>> +em_load_key(void *key) {
>>> + return _mm_loadu_si128((__m128i *)(key)); } #elif
>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
>>> +static inline xmm_t
>>> +em_load_key(void *key)
>>> +{
>>> + return vld1q_s32((int32_t *)key);
>>> +}
>>> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
>>> +static inline xmm_t
>>> +em_load_key(void *key)
>>> +{
>>> + return vec_ld(0, (xmm_t *)(key));
>>> +}
>
> Added power pc maintainer
> Not sure all architecture need SIMD instructions for access to unaligned memory location.
>
> @hgovindh,
> Could you provide exact setup details for reproducing this issue, I can test it on arm64.
> Like l3fwd command, Traffic generator traffic pattern
The vec_ld() function requires 16 byte alignment. (My understanding is
that GCC code will mask the lower four bits of the address to enforce
the requirement:
https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8)
Power 8 and later processors support the vec_vsx_ld() function which
does not have the same memory alignment requirements.
I'll need to try and reproduce the original bug to see what code is
actually being generated. Outside of vector instructions I wouldn't
expect to see errors with unaligned data references.
Dave
>>>> Fix unaligned memory access when reading IPv6 header which leads to
>>>> segmentation fault by changing aligned memory read to unaligned memory
>>>> read.
>>>>
>>>> Bugzilla ID: 279
>>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
>>>> Cc: maciej.czekaj@caviumnetworks.com
>>>> Cc: stable@dpdk.org
>>>> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
>>>> ---
>>>> V2: Added functions which will do unaligned load based on the
>>>> underlying architecture
>>>> ---
>>>> ---
>>>> examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
>>>> 1 file changed, 24 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
>>>> index fa8f82be6..f2641586b 100644
>>>> --- a/examples/l3fwd/l3fwd_em.c
>>>> +++ b/examples/l3fwd/l3fwd_em.c
>>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask) #error No
>>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
>>>> #endif
>>>>
>>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
>>>> +em_load_key(void *key) {
>>>> + return _mm_loadu_si128((__m128i *)(key)); } #elif
>>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
>>>> +static inline xmm_t
>>>> +em_load_key(void *key)
>>>> +{
>>>> + return vld1q_s32((int32_t *)key);
>>>> +}
>>>> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
>>>> +static inline xmm_t
>>>> +em_load_key(void *key)
>>>> +{
>>>> + return vec_ld(0, (xmm_t *)(key));
>>>> +}
>>
>> Added power pc maintainer
>
>> Not sure all architecture need SIMD instructions for access to
>> unaligned memory location.
>>
>> @hgovindh,
>> Could you provide exact setup details for reproducing this issue, I
>> can test it on arm64.
>> Like l3fwd command, Traffic generator traffic pattern
>
> The vec_ld() function requires 16 byte alignment. (My understanding is
> that GCC code will mask the lower four bits of the address to enforce
> the requirement:
> https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8)
> Power 8 and later processors support the vec_vsx_ld() function which
> does not have the same memory alignment requirements.
>
> I'll need to try and reproduce the original bug to see what code is
> actually being generated. Outside of vector instructions I wouldn't
> expect to see errors with unaligned data references.
Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc
4.8.5, no segmentation fault observed after 30 minutes (observed
segmentation fault on Intel system immediately).
Code dissassembly:
(gdb) info line l3fwd_em.c:290
Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at
address 0x10146fbc <em_main_loop+1660>
and ends at 0x10146fc0 <em_main_loop+1664>.
(gdb) disass /m 0x10146fbc,0x10146fc0
Dump of assembler code from 0x10146fbc to 0x10146fc0:
290 key.xmm[1] = *(xmm_t *)data1;
0x0000000010146fbc <em_main_loop+1660>: li r7,20
End of assembler dump.
Since vector element ordering is different on Intel vs Power/ARM,
suggest only applying vector operation to Intel code at this time
otherwise additional steps may be required to modify MASK values to
match the new vector operations.
Dave
> -----Original Message-----
> From: David Christensen <drc@linux.vnet.ibm.com>
> Sent: Friday, July 26, 2019 3:36 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Bruce Richardson
> <bruce.richardson@intel.com>; hgovindh
> <hariprasad.govindharajan@intel.com>
> Cc: Remy Horton <remy.horton@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ori Kam <orika@mellanox.com>; Pablo de
> Lara <pablo.de.lara.guarch@intel.com>; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Tomasz
> Kantecki <tomasz.kantecki@intel.com>; dev@dpdk.org;
> maciej.czekaj@caviumnetworks.com; stable@dpdk.org; Gavin Hu
> <gavin.hu@arm.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned
> memory access
>
>
>>>> Fix unaligned memory access when reading IPv6 header which leads to
> >>>> segmentation fault by changing aligned memory read to unaligned
> >>>> memory read.
> >>>>
> >>>> Bugzilla ID: 279
> >>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> >>>> Cc: maciej.czekaj@caviumnetworks.com
> >>>> Cc: stable@dpdk.org
> >>>> Signed-off-by: hgovindh <hariprasad.govindharajan@intel.com>
> >>>> ---
> >>>> V2: Added functions which will do unaligned load based on the
> >>>> underlying architecture
> >>>> ---
> >>>> ---
> >>>> examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> >>>> 1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/examples/l3fwd/l3fwd_em.c
> b/examples/l3fwd/l3fwd_em.c
> >>>> index fa8f82be6..f2641586b 100644
> >>>> --- a/examples/l3fwd/l3fwd_em.c
> >>>> +++ b/examples/l3fwd/l3fwd_em.c
> >>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t
> mask) #error No
> >>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> >>>> #endif
> >>>>
> >>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
> >>>> +em_load_key(void *key) {
> >>>> + return _mm_loadu_si128((__m128i *)(key)); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return vld1q_s32((int32_t *)key); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return vec_ld(0, (xmm_t *)(key)); }
> >>
> >> Added power pc maintainer
> >
> >> Not sure all architecture need SIMD instructions for access to
> >> unaligned memory location.
> >>
> >> @hgovindh,
> >> Could you provide exact setup details for reproducing this issue, I
> >> can test it on arm64.
> >> Like l3fwd command, Traffic generator traffic pattern
> >
> > The vec_ld() function requires 16 byte alignment. (My understanding
> > is that GCC code will mask the lower four bits of the address to
> > enforce the requirement:
> > https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-p
> > ower8)
> > Power 8 and later processors support the vec_vsx_ld() function which
> > does not have the same memory alignment requirements.
> >
> > I'll need to try and reproduce the original bug to see what code is
> > actually being generated. Outside of vector instructions I wouldn't
> > expect to see errors with unaligned data references.
>
> Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc 4.8.5, no
> segmentation fault observed after 30 minutes (observed segmentation fault
> on Intel system immediately).
>
> Code dissassembly:
> (gdb) info line l3fwd_em.c:290
> Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at
> address 0x10146fbc <em_main_loop+1660>
> and ends at 0x10146fc0 <em_main_loop+1664>.
> (gdb) disass /m 0x10146fbc,0x10146fc0
> Dump of assembler code from 0x10146fbc to 0x10146fc0:
> 290 key.xmm[1] = *(xmm_t *)data1;
> 0x0000000010146fbc <em_main_loop+1660>: li r7,20
>
> End of assembler dump.
>
> Since vector element ordering is different on Intel vs Power/ARM, suggest
> only applying vector operation to Intel code at this time otherwise additional
> steps may be required to modify MASK values to match the new vector
> operations.
On arm64, Generated assembly is following. Where LDUR and STR works
With unaligned memory(i.e no need for special handling).
I would suggest to have eal function to abstract The difference between x86 vs Power/ARM
to avoid ifdef clutter in all the applications.
key.xmm[1] = *(xmm_t *)data1;
0x00000000004ebed4 <+1188>: 60 40 c1 3c ldur q0, [x3, #20]
0x00000000004ebedc <+1196>: a0 73 80 3d str q0, [x29, #448]
0x00000000004ec064 <+1588>: 41 40 c1 3c ldur q1, [x2, #20]
0x00000000004ec06c <+1596>: a1 73 80 3d str q1, [x29, #448]
>
> Dave
@@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)
#error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
#endif
+#if defined(RTE_MACHINE_CPUFLAG_SSE2)
+static inline xmm_t
+em_load_key(void *key)
+{
+ return _mm_loadu_si128((__m128i *)(key));
+}
+#elif defined(RTE_MACHINE_CPUFLAG_NEON)
+static inline xmm_t
+em_load_key(void *key)
+{
+ return vld1q_s32((int32_t *)key);
+}
+#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
+static inline xmm_t
+em_load_key(void *key)
+{
+ return vec_ld(0, (xmm_t *)(key));
+}
+#else
+#error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
+#endif
+
+
static inline uint16_t
em_get_ipv4_dst_port(void *ipv4_hdr, uint16_t portid, void *lookup_struct)
{
@@ -285,8 +308,7 @@ em_get_ipv6_dst_port(void *ipv6_hdr, uint16_t portid, void *lookup_struct)
* Get part of 5 tuple: dst IP address lower 96 bits
* and src IP address higher 32 bits.
*/
- key.xmm[1] = *(xmm_t *)data1;
-
+ key.xmm[1] = em_load_key(data1);
/*
* Get part of 5 tuple: dst port and src port
* and dst IP address higher 32 bits.