ring: fix unaligned memory access on aarch32

Message ID 1583774395-10233-1-git-send-email-phil.yang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series ring: fix unaligned memory access on aarch32 |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/travis-robot success Travis build: passed
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation fail apply issues

Commit Message

Phil Yang March 9, 2020, 5:19 p.m. UTC
  The 32-bit arm machine doesn't support unaligned memory access. It
will cause a bus error on aarch32 with the custom element size ring.

Thread 1 "test" received signal SIGBUS, Bus error.
__rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0, \
r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
177                             ring[idx++] = obj[i++];

Fixes: cc4b218790f6 ("ring: support configurable element size")

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_ring/rte_ring_elem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

David Marchand March 19, 2020, 3:56 p.m. UTC | #1
On Mon, Mar 9, 2020 at 6:20 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The 32-bit arm machine doesn't support unaligned memory access. It
> will cause a bus error on aarch32 with the custom element size ring.
>
> Thread 1 "test" received signal SIGBUS, Bus error.
> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0, \
> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> 177                             ring[idx++] = obj[i++];
>
> Fixes: cc4b218790f6 ("ring: support configurable element size")
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Applied, thanks.
  
Morten Brørup Nov. 4, 2023, 12:04 a.m. UTC | #2
I have for a long time now wondered why the ring functions for enqueue/dequeue of 64-bit objects supports unaligned addresses, and now I finally found the patch introducing it.

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> Sent: Monday, 9 March 2020 18.20
> 
> The 32-bit arm machine doesn't support unaligned memory access. It
> will cause a bus error on aarch32 with the custom element size ring.
> 
> Thread 1 "test" received signal SIGBUS, Bus error.
> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0, \
> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> 177                             ring[idx++] = obj[i++];

Which test is this? Why is it using an unaligned array of 64-bit objects? (Notice that obj_table=0xf5edfe41.)

Nobody in their right mind would use an unaligned array of 64-bit objects. You can only create such an array if you force the compiler to prevent automatic alignment! And all the functions in your application using this array would also need to support unaligned addressing of these objects.

This seems extremely exotic, and not something any real application would do!

I would like to revert this patch for performance reasons.

> 
> Fixes: cc4b218790f6 ("ring: support configurable element size")
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_ring/rte_ring_elem.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_elem.h
> b/lib/librte_ring/rte_ring_elem.h
> index 3976757..663addc 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> uint32_t prod_head,
>  	const uint32_t size = r->size;
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
> -	const uint64_t *obj = (const uint64_t *)obj_table;
> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> *)obj_table;
>  	if (likely(idx + n < size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			ring[idx] = obj[i];
> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> uint32_t prod_head,
>  	const uint32_t size = r->size;
>  	uint32_t idx = prod_head & r->mask;
>  	uint64_t *ring = (uint64_t *)&r[1];
> -	uint64_t *obj = (uint64_t *)obj_table;
> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
>  	if (likely(idx + n < size)) {
>  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>  			obj[i] = ring[idx];
> --
> 2.7.4
> 

References:
https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba51478a3ab3132c33effc8b132641233275b36
https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-git-send-email-phil.yang@arm.com/
  
Honnappa Nagarahalli Nov. 4, 2023, 4:32 p.m. UTC | #3
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Friday, November 3, 2023 7:04 PM
> To: Phil Yang <Phil.Yang@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; dev@dpdk.org
> Cc: david.marchand@redhat.com; olivier.matz@6wind.com; Dharmik Jayesh
> Thakkar <DharmikJayesh.Thakkar@arm.com>; Gavin Hu
> <Gavin.Hu@arm.com>; nd <nd@arm.com>; andrew.rybchenko@oktetlabs.ru
> Subject: RE: [dpdk-dev] [PATCH] ring: fix unaligned memory access on aarch32
> 
> I have for a long time now wondered why the ring functions for
> enqueue/dequeue of 64-bit objects supports unaligned addresses, and now I
> finally found the patch introducing it.
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > Sent: Monday, 9 March 2020 18.20
> >
> > The 32-bit arm machine doesn't support unaligned memory access. It
> > will cause a bus error on aarch32 with the custom element size ring.
> >
> > Thread 1 "test" received signal SIGBUS, Bus error.
> > __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0, \
> > r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > 177                             ring[idx++] = obj[i++];
> 
> Which test is this? Why is it using an unaligned array of 64-bit objects? (Notice
> that obj_table=0xf5edfe41.)
Can't recollect which test it is. I am guessing one of the unit test cases. We might have to reinvestigate, not sure why the obj_table is unaligned.

> 
> Nobody in their right mind would use an unaligned array of 64-bit objects. You
> can only create such an array if you force the compiler to prevent automatic
> alignment! And all the functions in your application using this array would also
> need to support unaligned addressing of these objects.
> 
> This seems extremely exotic, and not something any real application would do!
> 
> I would like to revert this patch for performance reasons.
Can you provide more details? Platform, test, how much is the regression?

> 
> >
> > Fixes: cc4b218790f6 ("ring: support configurable element size")
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  lib/librte_ring/rte_ring_elem.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_elem.h
> > b/lib/librte_ring/rte_ring_elem.h index 3976757..663addc 100644
> > --- a/lib/librte_ring/rte_ring_elem.h
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> > uint32_t prod_head,
> >  	const uint32_t size = r->size;
> >  	uint32_t idx = prod_head & r->mask;
> >  	uint64_t *ring = (uint64_t *)&r[1];
> > -	const uint64_t *obj = (const uint64_t *)obj_table;
> > +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > *)obj_table;
> >  	if (likely(idx + n < size)) {
> >  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> >  			ring[idx] = obj[i];
> > @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> > uint32_t prod_head,
> >  	const uint32_t size = r->size;
> >  	uint32_t idx = prod_head & r->mask;
> >  	uint64_t *ring = (uint64_t *)&r[1];
> > -	uint64_t *obj = (uint64_t *)obj_table;
> > +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> >  	if (likely(idx + n < size)) {
> >  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> >  			obj[i] = ring[idx];
> > --
> > 2.7.4
> >
> 
> References:
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba514
> 78a3ab3132c33effc8b132641233275b36
> https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-git-
> send-email-phil.yang@arm.com/
  
Morten Brørup Nov. 4, 2023, 4:54 p.m. UTC | #4
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Saturday, 4 November 2023 17.32
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Friday, November 3, 2023 7:04 PM
> >
> > I have for a long time now wondered why the ring functions for
> > enqueue/dequeue of 64-bit objects supports unaligned addresses, and
> now I
> > finally found the patch introducing it.
> >
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > > Sent: Monday, 9 March 2020 18.20
> > >
> > > The 32-bit arm machine doesn't support unaligned memory access. It
> > > will cause a bus error on aarch32 with the custom element size
> ring.
> > >
> > > Thread 1 "test" received signal SIGBUS, Bus error.
> > > __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41,
> prod_head=0, \
> > > r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > > 177                             ring[idx++] = obj[i++];
> >
> > Which test is this? Why is it using an unaligned array of 64-bit
> objects? (Notice
> > that obj_table=0xf5edfe41.)
> Can't recollect which test it is. I am guessing one of the unit test
> cases. We might have to reinvestigate, not sure why the obj_table is
> unaligned.

Thank you for picking this up, Honnappa.

> 
> >
> > Nobody in their right mind would use an unaligned array of 64-bit
> objects. You
> > can only create such an array if you force the compiler to prevent
> automatic
> > alignment! And all the functions in your application using this array
> would also
> > need to support unaligned addressing of these objects.
> >
> > This seems extremely exotic, and not something any real application
> would do!
> >
> > I would like to revert this patch for performance reasons.
> Can you provide more details? Platform, test, how much is the
> regression?

I haven't seen a regression, but I speculate some performance cost on low-end CPUs. Maybe it is purely academic.

Maybe not purely academic... I just tested on Godbolt, which shows different code generated:

uint64_t fa(void *p)
{
    return *(uint64_t *)p;
}

uint64_t fu(void *p)
{
    return *(unaligned_uint64_t *)p;
}

Generates different output:

fa:
        ldrd    r0, [r0]
        bx      lr

fu:
        mov     r3, r0
        ldr     r0, [r0]  @ unaligned
        ldr     r1, [r3, #4]      @ unaligned
        bx      lr

> 
> >
> > >
> > > Fixes: cc4b218790f6 ("ring: support configurable element size")
> > >
> > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring_elem.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_ring/rte_ring_elem.h
> > > b/lib/librte_ring/rte_ring_elem.h index 3976757..663addc 100644
> > > --- a/lib/librte_ring/rte_ring_elem.h
> > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> > > uint32_t prod_head,
> > >  	const uint32_t size = r->size;
> > >  	uint32_t idx = prod_head & r->mask;
> > >  	uint64_t *ring = (uint64_t *)&r[1];
> > > -	const uint64_t *obj = (const uint64_t *)obj_table;
> > > +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > > *)obj_table;
> > >  	if (likely(idx + n < size)) {
> > >  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > >  			ring[idx] = obj[i];
> > > @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> > > uint32_t prod_head,
> > >  	const uint32_t size = r->size;
> > >  	uint32_t idx = prod_head & r->mask;
> > >  	uint64_t *ring = (uint64_t *)&r[1];
> > > -	uint64_t *obj = (uint64_t *)obj_table;
> > > +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> > >  	if (likely(idx + n < size)) {
> > >  		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > >  			obj[i] = ring[idx];
> > > --
> > > 2.7.4
> > >
> >
> > References:
> >
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
> 514
> > 78a3ab3132c33effc8b132641233275b36
> > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-git-
> > send-email-phil.yang@arm.com/
  
Ruifeng Wang Nov. 10, 2023, 8:39 a.m. UTC | #5
On 2023/11/4 8:04 AM, Morten Brørup wrote:
> I have for a long time now wondered why the ring functions for enqueue/dequeue of 64-bit objects supports unaligned addresses, and now I finally found the patch introducing it.
> 
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
>> Sent: Monday, 9 March 2020 18.20
>>
>> The 32-bit arm machine doesn't support unaligned memory access. It
>> will cause a bus error on aarch32 with the custom element size ring.
>>
>> Thread 1 "test" received signal SIGBUS, Bus error.
>> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0, \
>> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
>> 177                             ring[idx++] = obj[i++];
> 
> Which test is this? Why is it using an unaligned array of 64-bit objects? (Notice that obj_table=0xf5edfe41.)

The test case is:
https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L1121
This case deliberately use unaligned objects.

> 
> Nobody in their right mind would use an unaligned array of 64-bit objects. You can only create such an array if you force the compiler to prevent automatic alignment! And all the functions in your application using this array would also need to support unaligned addressing of these objects.
> 
> This seems extremely exotic, and not something any real application would do!
> 
> I would like to revert this patch for performance reasons.
> 
>>
>> Fixes: cc4b218790f6 ("ring: support configurable element size")
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> ---
>>   lib/librte_ring/rte_ring_elem.h | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_ring/rte_ring_elem.h
>> b/lib/librte_ring/rte_ring_elem.h
>> index 3976757..663addc 100644
>> --- a/lib/librte_ring/rte_ring_elem.h
>> +++ b/lib/librte_ring/rte_ring_elem.h
>> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
>> uint32_t prod_head,
>>   	const uint32_t size = r->size;
>>   	uint32_t idx = prod_head & r->mask;
>>   	uint64_t *ring = (uint64_t *)&r[1];
>> -	const uint64_t *obj = (const uint64_t *)obj_table;
>> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
>> *)obj_table;
>>   	if (likely(idx + n < size)) {
>>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>>   			ring[idx] = obj[i];
>> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
>> uint32_t prod_head,
>>   	const uint32_t size = r->size;
>>   	uint32_t idx = prod_head & r->mask;
>>   	uint64_t *ring = (uint64_t *)&r[1];
>> -	uint64_t *obj = (uint64_t *)obj_table;
>> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
>>   	if (likely(idx + n < size)) {
>>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>>   			obj[i] = ring[idx];
>> --
>> 2.7.4
>>
> 
> References:
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba51478a3ab3132c33effc8b132641233275b36
> https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-git-send-email-phil.yang@arm.com/
>
  
Morten Brørup Nov. 10, 2023, 9:34 a.m. UTC | #6
+CC Gavin, reviewed the test case

> From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
> Sent: Friday, 10 November 2023 09.40
> 
> On 2023/11/4 8:04 AM, Morten Brørup wrote:
> > I have for a long time now wondered why the ring functions for
> enqueue/dequeue of 64-bit objects supports unaligned addresses, and now
> I finally found the patch introducing it.
> >
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> >> Sent: Monday, 9 March 2020 18.20
> >>
> >> The 32-bit arm machine doesn't support unaligned memory access. It
> >> will cause a bus error on aarch32 with the custom element size ring.
> >>
> >> Thread 1 "test" received signal SIGBUS, Bus error.
> >> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0,
> \
> >> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> >> 177                             ring[idx++] = obj[i++];
> >
> > Which test is this? Why is it using an unaligned array of 64-bit
> objects? (Notice that obj_table=0xf5edfe41.)
> 
> The test case is:
> https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L112
> 1
> This case deliberately use unaligned objects.

Thank you, Ruifeng.

Honnappa, I see you signed off on the patch introducing the test for unaligned objects:
http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283d4c590ab8e8d51396f2ffab9ff

What was the rationale behind testing support for unaligned object pointers? Did any applications/customers use unaligned object pointers, or is it a purely academic test case?

> 
> >
> > Nobody in their right mind would use an unaligned array of 64-bit
> objects. You can only create such an array if you force the compiler to
> prevent automatic alignment! And all the functions in your application
> using this array would also need to support unaligned addressing of
> these objects.
> >
> > This seems extremely exotic, and not something any real application
> would do!
> >
> > I would like to revert this patch for performance reasons.

I could add an RTE_ASSERT() to verify that the pointer is aligned, for debugging purposes.

> >
> >>
> >> Fixes: cc4b218790f6 ("ring: support configurable element size")
> >>
> >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >> ---
> >>   lib/librte_ring/rte_ring_elem.h | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/librte_ring/rte_ring_elem.h
> >> b/lib/librte_ring/rte_ring_elem.h
> >> index 3976757..663addc 100644
> >> --- a/lib/librte_ring/rte_ring_elem.h
> >> +++ b/lib/librte_ring/rte_ring_elem.h
> >> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> >> uint32_t prod_head,
> >>   	const uint32_t size = r->size;
> >>   	uint32_t idx = prod_head & r->mask;
> >>   	uint64_t *ring = (uint64_t *)&r[1];
> >> -	const uint64_t *obj = (const uint64_t *)obj_table;
> >> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> >> *)obj_table;
> >>   	if (likely(idx + n < size)) {
> >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> >>   			ring[idx] = obj[i];
> >> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> >> uint32_t prod_head,
> >>   	const uint32_t size = r->size;
> >>   	uint32_t idx = prod_head & r->mask;
> >>   	uint64_t *ring = (uint64_t *)&r[1];
> >> -	uint64_t *obj = (uint64_t *)obj_table;
> >> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> >>   	if (likely(idx + n < size)) {
> >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> >>   			obj[i] = ring[idx];
> >> --
> >> 2.7.4
> >>
> >
> > References:
> >
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
> 51478a3ab3132c33effc8b132641233275b36
> > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-git-
> send-email-phil.yang@arm.com/
> >
  
Konstantin Ananyev Nov. 10, 2023, 9:44 a.m. UTC | #7
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Friday, November 10, 2023 9:34 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; honnappa.nagarahalli@arm.com
> Cc: dev@dpdk.org; david.marchand@redhat.com; olivier.matz@6wind.com; dharmik.thakkar@arm.com; nd@arm.com;
> andrew.rybchenko@oktetlabs.ru; Gavin Hu <Gavin.Hu@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] ring: fix unaligned memory access on aarch32
> 
> +CC Gavin, reviewed the test case
> 
> > From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
> > Sent: Friday, 10 November 2023 09.40
> >
> > On 2023/11/4 8:04 AM, Morten Brørup wrote:
> > > I have for a long time now wondered why the ring functions for
> > enqueue/dequeue of 64-bit objects supports unaligned addresses, and now
> > I finally found the patch introducing it.
> > >
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > >> Sent: Monday, 9 March 2020 18.20
> > >>
> > >> The 32-bit arm machine doesn't support unaligned memory access. It
> > >> will cause a bus error on aarch32 with the custom element size ring.
> > >>
> > >> Thread 1 "test" received signal SIGBUS, Bus error.
> > >> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41, prod_head=0,
> > \
> > >> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > >> 177                             ring[idx++] = obj[i++];
> > >
> > > Which test is this? Why is it using an unaligned array of 64-bit
> > objects? (Notice that obj_table=0xf5edfe41.)
> >
> > The test case is:
> > https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L112
> > 1
> > This case deliberately use unaligned objects.
> 
> Thank you, Ruifeng.
> 
> Honnappa, I see you signed off on the patch introducing the test for unaligned objects:
> http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283d4c590ab8e8d51396f2ffab9ff
> 
> What was the rationale behind testing support for unaligned object pointers? Did any applications/customers use unaligned object
> pointers, or is it a purely academic test case?
> 
> >
> > >
> > > Nobody in their right mind would use an unaligned array of 64-bit
> > objects. You can only create such an array if you force the compiler to
> > prevent automatic alignment! And all the functions in your application
> > using this array would also need to support unaligned addressing of
> > these objects.

It could be just one elem, not an array.
And the user can use 'packed' struct or so...
Agree, not a common case, but probably still possible.

> > >
> > > This seems extremely exotic, and not something any real application
> > would do!
> > >
> > > I would like to revert this patch for performance reasons.

Morten, could you probably explain first the problems you encountered with this patch?
You mention about 'performance reasons', so did you notice any real slowdown? 
 
> 
> I could add an RTE_ASSERT() to verify that the pointer is aligned, for debugging purposes.
> 
> > >
> > >>
> > >> Fixes: cc4b218790f6 ("ring: support configurable element size")
> > >>
> > >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> > >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > >> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > >> ---
> > >>   lib/librte_ring/rte_ring_elem.h | 4 ++--
> > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/lib/librte_ring/rte_ring_elem.h
> > >> b/lib/librte_ring/rte_ring_elem.h
> > >> index 3976757..663addc 100644
> > >> --- a/lib/librte_ring/rte_ring_elem.h
> > >> +++ b/lib/librte_ring/rte_ring_elem.h
> > >> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r,
> > >> uint32_t prod_head,
> > >>   	const uint32_t size = r->size;
> > >>   	uint32_t idx = prod_head & r->mask;
> > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > >> -	const uint64_t *obj = (const uint64_t *)obj_table;
> > >> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > >> *)obj_table;
> > >>   	if (likely(idx + n < size)) {
> > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > >>   			ring[idx] = obj[i];
> > >> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r,
> > >> uint32_t prod_head,
> > >>   	const uint32_t size = r->size;
> > >>   	uint32_t idx = prod_head & r->mask;
> > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > >> -	uint64_t *obj = (uint64_t *)obj_table;
> > >> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> > >>   	if (likely(idx + n < size)) {
> > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > >>   			obj[i] = ring[idx];
> > >> --
> > >> 2.7.4
> > >>
> > >
> > > References:
> > >
> > https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
> > 51478a3ab3132c33effc8b132641233275b36
> > > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-git-
> > send-email-phil.yang@arm.com/
> > >
  
Morten Brørup Nov. 10, 2023, 10:43 a.m. UTC | #8
> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Friday, 10 November 2023 10.45
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Friday, November 10, 2023 9:34 AM
> >
> > +CC Gavin, reviewed the test case
> >
> > > From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
> > > Sent: Friday, 10 November 2023 09.40
> > >
> > > On 2023/11/4 8:04 AM, Morten Brørup wrote:
> > > > I have for a long time now wondered why the ring functions for
> > > enqueue/dequeue of 64-bit objects supports unaligned addresses, and
> now
> > > I finally found the patch introducing it.
> > > >
> > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > > >> Sent: Monday, 9 March 2020 18.20
> > > >>
> > > >> The 32-bit arm machine doesn't support unaligned memory access.
> It
> > > >> will cause a bus error on aarch32 with the custom element size
> ring.
> > > >>
> > > >> Thread 1 "test" received signal SIGBUS, Bus error.
> > > >> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41,
> prod_head=0,
> > > \
> > > >> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > > >> 177                             ring[idx++] = obj[i++];
> > > >
> > > > Which test is this? Why is it using an unaligned array of 64-bit
> > > objects? (Notice that obj_table=0xf5edfe41.)
> > >
> > > The test case is:
> > >
> https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L112
> > > 1
> > > This case deliberately use unaligned objects.
> >
> > Thank you, Ruifeng.
> >
> > Honnappa, I see you signed off on the patch introducing the test for
> unaligned objects:
> >
> http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283d
> 4c590ab8e8d51396f2ffab9ff
> >
> > What was the rationale behind testing support for unaligned object
> pointers? Did any applications/customers use unaligned object
> > pointers, or is it a purely academic test case?
> >
> > >
> > > >
> > > > Nobody in their right mind would use an unaligned array of 64-bit
> > > objects. You can only create such an array if you force the
> compiler to
> > > prevent automatic alignment! And all the functions in your
> application
> > > using this array would also need to support unaligned addressing of
> > > these objects.
> 
> It could be just one elem, not an array.
> And the user can use 'packed' struct or so...
> Agree, not a common case, but probably still possible.

Very good point, Konstantin. A single unaligned object in a packed structure is not as exotic as an unaligned array of objects (which I consider completely unrealistic).

If anyone is using an architecture which doesn't support unaligned accesses, I would expect them to completely avoid using unaligned objects. But perhaps you are right; however unlikely, it might happen.

If we think this is a real use case, should we add support for unaligned 32 bit objects?
(128 bit objects already support unaligned access; they are type casted to void pointer and accessed using memcpy().)

And how about the stack library, should it support unaligned objects too?

> 
> > > >
> > > > This seems extremely exotic, and not something any real
> application
> > > would do!
> > > >
> > > > I would like to revert this patch for performance reasons.
> 
> Morten, could you probably explain first the problems you encountered
> with this patch?
> You mention about 'performance reasons', so did you notice any real
> slowdown?

Please check my reply to the same question here:
http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9EFD3@smartserver.smartshare.dk/

> 
> >
> > I could add an RTE_ASSERT() to verify that the pointer is aligned,
> for debugging purposes.
> >
> > > >
> > > >>
> > > >> Fixes: cc4b218790f6 ("ring: support configurable element size")
> > > >>
> > > >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > >> ---
> > > >>   lib/librte_ring/rte_ring_elem.h | 4 ++--
> > > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > > >>
> > > >> diff --git a/lib/librte_ring/rte_ring_elem.h
> > > >> b/lib/librte_ring/rte_ring_elem.h
> > > >> index 3976757..663addc 100644
> > > >> --- a/lib/librte_ring/rte_ring_elem.h
> > > >> +++ b/lib/librte_ring/rte_ring_elem.h
> > > >> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring
> *r,
> > > >> uint32_t prod_head,
> > > >>   	const uint32_t size = r->size;
> > > >>   	uint32_t idx = prod_head & r->mask;
> > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > >> -	const uint64_t *obj = (const uint64_t *)obj_table;
> > > >> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > > >> *)obj_table;
> > > >>   	if (likely(idx + n < size)) {
> > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > >>   			ring[idx] = obj[i];
> > > >> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring
> *r,
> > > >> uint32_t prod_head,
> > > >>   	const uint32_t size = r->size;
> > > >>   	uint32_t idx = prod_head & r->mask;
> > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > >> -	uint64_t *obj = (uint64_t *)obj_table;
> > > >> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> > > >>   	if (likely(idx + n < size)) {
> > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > >>   			obj[i] = ring[idx];
> > > >> --
> > > >> 2.7.4
> > > >>
> > > >
> > > > References:
> > > >
> > >
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
> > > 51478a3ab3132c33effc8b132641233275b36
> > > > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-
> git-
> > > send-email-phil.yang@arm.com/
> > > >
  
Morten Brørup Nov. 10, 2023, 1:18 p.m. UTC | #9
Dear Ruifeng,
+CC: all CPU architecture maintainers,

I'm trying to figure out the requirements for supporting unaligned memory access in DPDK (specifically the ring library), and need your expert feedback.

The #define RTE_ARCH_STRICT_ALIGN - which is undocumented, but probably means that CPU memory access must be aligned - is only set by "generic_aarch32".

So we will assume that all other CPU architectures supported by DPDK can access unaligned memory.

As discussed in this thread, "generic_aarch32" has special requirements for performing 64-bit load/store at unaligned addresses.

Now comes the big question: Can "generic_aarch32" perform 32-bit load/store at unaligned addresses without similar special requirements? Then the ring library already supports unaligned 32-bit objects, and doesn't need to be fixed in this regard.


Med venlig hilsen / Kind regards,
-Morten Brørup


> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Friday, 10 November 2023 11.44
> 
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Friday, 10 November 2023 10.45
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Friday, November 10, 2023 9:34 AM
> > >
> > > +CC Gavin, reviewed the test case
> > >
> > > > From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
> > > > Sent: Friday, 10 November 2023 09.40
> > > >
> > > > On 2023/11/4 8:04 AM, Morten Brørup wrote:
> > > > > I have for a long time now wondered why the ring functions for
> > > > enqueue/dequeue of 64-bit objects supports unaligned addresses,
> and
> > now
> > > > I finally found the patch introducing it.
> > > > >
> > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > > > >> Sent: Monday, 9 March 2020 18.20
> > > > >>
> > > > >> The 32-bit arm machine doesn't support unaligned memory
> access.
> > It
> > > > >> will cause a bus error on aarch32 with the custom element size
> > ring.
> > > > >>
> > > > >> Thread 1 "test" received signal SIGBUS, Bus error.
> > > > >> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41,
> > prod_head=0,
> > > > \
> > > > >> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > > > >> 177                             ring[idx++] = obj[i++];
> > > > >
> > > > > Which test is this? Why is it using an unaligned array of 64-
> bit
> > > > objects? (Notice that obj_table=0xf5edfe41.)
> > > >
> > > > The test case is:
> > > >
> >
> https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L112
> > > > 1
> > > > This case deliberately use unaligned objects.
> > >
> > > Thank you, Ruifeng.
> > >
> > > Honnappa, I see you signed off on the patch introducing the test
> for
> > unaligned objects:
> > >
> >
> http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283d
> > 4c590ab8e8d51396f2ffab9ff
> > >
> > > What was the rationale behind testing support for unaligned object
> > pointers? Did any applications/customers use unaligned object
> > > pointers, or is it a purely academic test case?
> > >
> > > >
> > > > >
> > > > > Nobody in their right mind would use an unaligned array of 64-
> bit
> > > > objects. You can only create such an array if you force the
> > compiler to
> > > > prevent automatic alignment! And all the functions in your
> > application
> > > > using this array would also need to support unaligned addressing
> of
> > > > these objects.
> >
> > It could be just one elem, not an array.
> > And the user can use 'packed' struct or so...
> > Agree, not a common case, but probably still possible.
> 
> Very good point, Konstantin. A single unaligned object in a packed
> structure is not as exotic as an unaligned array of objects (which I
> consider completely unrealistic).
> 
> If anyone is using an architecture which doesn't support unaligned
> accesses, I would expect them to completely avoid using unaligned
> objects. But perhaps you are right; however unlikely, it might happen.
> 
> If we think this is a real use case, should we add support for
> unaligned 32 bit objects?
> (128 bit objects already support unaligned access; they are type casted
> to void pointer and accessed using memcpy().)
> 
> And how about the stack library, should it support unaligned objects
> too?
> 
> >
> > > > >
> > > > > This seems extremely exotic, and not something any real
> > application
> > > > would do!
> > > > >
> > > > > I would like to revert this patch for performance reasons.
> >
> > Morten, could you probably explain first the problems you encountered
> > with this patch?
> > You mention about 'performance reasons', so did you notice any real
> > slowdown?
> 
> Please check my reply to the same question here:
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9EFD3@smarts
> erver.smartshare.dk/
> 
> >
> > >
> > > I could add an RTE_ASSERT() to verify that the pointer is aligned,
> > for debugging purposes.
> > >
> > > > >
> > > > >>
> > > > >> Fixes: cc4b218790f6 ("ring: support configurable element
> size")
> > > > >>
> > > > >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > > >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >> Reviewed-by: Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>
> > > > >> ---
> > > > >>   lib/librte_ring/rte_ring_elem.h | 4 ++--
> > > > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >>
> > > > >> diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > >> b/lib/librte_ring/rte_ring_elem.h
> > > > >> index 3976757..663addc 100644
> > > > >> --- a/lib/librte_ring/rte_ring_elem.h
> > > > >> +++ b/lib/librte_ring/rte_ring_elem.h
> > > > >> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct
> rte_ring
> > *r,
> > > > >> uint32_t prod_head,
> > > > >>   	const uint32_t size = r->size;
> > > > >>   	uint32_t idx = prod_head & r->mask;
> > > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > > >> -	const uint64_t *obj = (const uint64_t *)obj_table;
> > > > >> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > > > >> *)obj_table;
> > > > >>   	if (likely(idx + n < size)) {
> > > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > >>   			ring[idx] = obj[i];
> > > > >> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct
> rte_ring
> > *r,
> > > > >> uint32_t prod_head,
> > > > >>   	const uint32_t size = r->size;
> > > > >>   	uint32_t idx = prod_head & r->mask;
> > > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > > >> -	uint64_t *obj = (uint64_t *)obj_table;
> > > > >> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> > > > >>   	if (likely(idx + n < size)) {
> > > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > >>   			obj[i] = ring[idx];
> > > > >> --
> > > > >> 2.7.4
> > > > >>
> > > > >
> > > > > References:
> > > > >
> > > >
> >
> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
> > > > 51478a3ab3132c33effc8b132641233275b36
> > > > > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-
> 1-
> > git-
> > > > send-email-phil.yang@arm.com/
> > > > >
  
Konstantin Ananyev Nov. 10, 2023, 7:05 p.m. UTC | #10
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Friday, 10 November 2023 10.45
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Friday, November 10, 2023 9:34 AM
> > >
> > > +CC Gavin, reviewed the test case
> > >
> > > > From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
> > > > Sent: Friday, 10 November 2023 09.40
> > > >
> > > > On 2023/11/4 8:04 AM, Morten Brørup wrote:
> > > > > I have for a long time now wondered why the ring functions for
> > > > enqueue/dequeue of 64-bit objects supports unaligned addresses, and
> > now
> > > > I finally found the patch introducing it.
> > > > >
> > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > > > >> Sent: Monday, 9 March 2020 18.20
> > > > >>
> > > > >> The 32-bit arm machine doesn't support unaligned memory access.
> > It
> > > > >> will cause a bus error on aarch32 with the custom element size
> > ring.
> > > > >>
> > > > >> Thread 1 "test" received signal SIGBUS, Bus error.
> > > > >> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41,
> > prod_head=0,
> > > > \
> > > > >> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > > > >> 177                             ring[idx++] = obj[i++];
> > > > >
> > > > > Which test is this? Why is it using an unaligned array of 64-bit
> > > > objects? (Notice that obj_table=0xf5edfe41.)
> > > >
> > > > The test case is:
> > > >
> > https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L112
> > > > 1
> > > > This case deliberately use unaligned objects.
> > >
> > > Thank you, Ruifeng.
> > >
> > > Honnappa, I see you signed off on the patch introducing the test for
> > unaligned objects:
> > >
> > http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283d
> > 4c590ab8e8d51396f2ffab9ff
> > >
> > > What was the rationale behind testing support for unaligned object
> > pointers? Did any applications/customers use unaligned object
> > > pointers, or is it a purely academic test case?
> > >
> > > >
> > > > >
> > > > > Nobody in their right mind would use an unaligned array of 64-bit
> > > > objects. You can only create such an array if you force the
> > compiler to
> > > > prevent automatic alignment! And all the functions in your
> > application
> > > > using this array would also need to support unaligned addressing of
> > > > these objects.
> >
> > It could be just one elem, not an array.
> > And the user can use 'packed' struct or so...
> > Agree, not a common case, but probably still possible.
> 
> Very good point, Konstantin. A single unaligned object in a packed structure is not as exotic as an unaligned array of objects (which I
> consider completely unrealistic).
> 
> If anyone is using an architecture which doesn't support unaligned accesses, I would expect them to completely avoid using unaligned
> objects. But perhaps you are right; however unlikely, it might happen.
> 
> If we think this is a real use case, should we add support for unaligned 32 bit objects?
> (128 bit objects already support unaligned access; they are type casted to void pointer and accessed using memcpy().)
> 
> And how about the stack library, should it support unaligned objects too?

 Hmm... good point, from my understanding - yes.
Probably we just been lucky so far here - no-one hit un-aligned case yet.
 
> >
> > > > >
> > > > > This seems extremely exotic, and not something any real
> > application
> > > > would do!
> > > > >
> > > > > I would like to revert this patch for performance reasons.
> >
> > Morten, could you probably explain first the problems you encountered
> > with this patch?
> > You mention about 'performance reasons', so did you notice any real
> > slowdown?
> 
> Please check my reply to the same question here:
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9EFD3@smartserver.smartshare.dk/

As I remember, right now unaligned_uint(64,32)_t are really unaligned only for one specific architecture - aarch32.
All others should be unaffected.
 
> 
> >
> > >
> > > I could add an RTE_ASSERT() to verify that the pointer is aligned,
> > for debugging purposes.
> > >
> > > > >
> > > > >>
> > > > >> Fixes: cc4b218790f6 ("ring: support configurable element size")
> > > > >>
> > > > >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > > >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > >> ---
> > > > >>   lib/librte_ring/rte_ring_elem.h | 4 ++--
> > > > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >>
> > > > >> diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > >> b/lib/librte_ring/rte_ring_elem.h
> > > > >> index 3976757..663addc 100644
> > > > >> --- a/lib/librte_ring/rte_ring_elem.h
> > > > >> +++ b/lib/librte_ring/rte_ring_elem.h
> > > > >> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring
> > *r,
> > > > >> uint32_t prod_head,
> > > > >>   	const uint32_t size = r->size;
> > > > >>   	uint32_t idx = prod_head & r->mask;
> > > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > > >> -	const uint64_t *obj = (const uint64_t *)obj_table;
> > > > >> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > > > >> *)obj_table;
> > > > >>   	if (likely(idx + n < size)) {
> > > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > >>   			ring[idx] = obj[i];
> > > > >> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring
> > *r,
> > > > >> uint32_t prod_head,
> > > > >>   	const uint32_t size = r->size;
> > > > >>   	uint32_t idx = prod_head & r->mask;
> > > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > > >> -	uint64_t *obj = (uint64_t *)obj_table;
> > > > >> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> > > > >>   	if (likely(idx + n < size)) {
> > > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > >>   			obj[i] = ring[idx];
> > > > >> --
> > > > >> 2.7.4
> > > > >>
> > > > >
> > > > > References:
> > > > >
> > > >
> > https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
> > > > 51478a3ab3132c33effc8b132641233275b36
> > > > > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1-
> > git-
> > > > send-email-phil.yang@arm.com/
> > > > >
  
Honnappa Nagarahalli Nov. 13, 2023, 1:53 a.m. UTC | #11
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Friday, November 10, 2023 4:44 AM
> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; david.marchand@redhat.com; olivier.matz@6wind.com;
> Dharmik Jayesh Thakkar <DharmikJayesh.Thakkar@arm.com>; nd
> <nd@arm.com>; andrew.rybchenko@oktetlabs.ru; Gavin Hu
> <Gavin.Hu@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] ring: fix unaligned memory access on aarch32
> 
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Friday, 10 November 2023 10.45
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Friday, November 10, 2023 9:34 AM
> > >
> > > +CC Gavin, reviewed the test case
> > >
> > > > From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
> > > > Sent: Friday, 10 November 2023 09.40
> > > >
> > > > On 2023/11/4 8:04 AM, Morten Brørup wrote:
> > > > > I have for a long time now wondered why the ring functions for
> > > > enqueue/dequeue of 64-bit objects supports unaligned addresses,
> > > > and
> > now
> > > > I finally found the patch introducing it.
> > > > >
> > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
> > > > >> Sent: Monday, 9 March 2020 18.20
> > > > >>
> > > > >> The 32-bit arm machine doesn't support unaligned memory access.
> > It
> > > > >> will cause a bus error on aarch32 with the custom element size
> > ring.
> > > > >>
> > > > >> Thread 1 "test" received signal SIGBUS, Bus error.
> > > > >> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41,
> > prod_head=0,
> > > > \
> > > > >> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
> > > > >> 177                             ring[idx++] = obj[i++];
> > > > >
> > > > > Which test is this? Why is it using an unaligned array of 64-bit
> > > > objects? (Notice that obj_table=0xf5edfe41.)
> > > >
> > > > The test case is:
> > > >
> > https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L11
> > 2
> > > > 1
> > > > This case deliberately use unaligned objects.
> > >
> > > Thank you, Ruifeng.
> > >
> > > Honnappa, I see you signed off on the patch introducing the test for
> > unaligned objects:
> > >
> > http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283
> > d
> > 4c590ab8e8d51396f2ffab9ff
> > >
> > > What was the rationale behind testing support for unaligned object
> > pointers? Did any applications/customers use unaligned object
> > > pointers, or is it a purely academic test case?
> > >
> > > >
> > > > >
> > > > > Nobody in their right mind would use an unaligned array of
> > > > > 64-bit
> > > > objects. You can only create such an array if you force the
> > compiler to
> > > > prevent automatic alignment! And all the functions in your
> > application
> > > > using this array would also need to support unaligned addressing
> > > > of these objects.
> >
> > It could be just one elem, not an array.
> > And the user can use 'packed' struct or so...
> > Agree, not a common case, but probably still possible.
> 
> Very good point, Konstantin. A single unaligned object in a packed structure is
> not as exotic as an unaligned array of objects (which I consider completely
> unrealistic).
The requirement is coming from RCU library, it has a requirement to dequeue 1 element.

> 
> If anyone is using an architecture which doesn't support unaligned accesses, I
> would expect them to completely avoid using unaligned objects. But perhaps
> you are right; however unlikely, it might happen.
> 
> If we think this is a real use case, should we add support for unaligned 32 bit
> objects?
> (128 bit objects already support unaligned access; they are type casted to void
> pointer and accessed using memcpy().)
> 
> And how about the stack library, should it support unaligned objects too?
> 
> >
> > > > >
> > > > > This seems extremely exotic, and not something any real
> > application
> > > > would do!
> > > > >
> > > > > I would like to revert this patch for performance reasons.
> >
> > Morten, could you probably explain first the problems you encountered
> > with this patch?
> > You mention about 'performance reasons', so did you notice any real
> > slowdown?
> 
> Please check my reply to the same question here:
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9EFD3@
> smartserver.smartshare.dk/
Typically, the performance tests are run and made sure to capture any perf differences. In this case, I am sure, there was not any or much difference. Otherwise, the change would have been highly contested.

> 
> >
> > >
> > > I could add an RTE_ASSERT() to verify that the pointer is aligned,
> > for debugging purposes.
> > >
> > > > >
> > > > >>
> > > > >> Fixes: cc4b218790f6 ("ring: support configurable element size")
> > > > >>
> > > > >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > > >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >> Reviewed-by: Honnappa Nagarahalli
> > > > >> <honnappa.nagarahalli@arm.com>
> > > > >> ---
> > > > >>   lib/librte_ring/rte_ring_elem.h | 4 ++--
> > > > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >>
> > > > >> diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > >> b/lib/librte_ring/rte_ring_elem.h index 3976757..663addc 100644
> > > > >> --- a/lib/librte_ring/rte_ring_elem.h
> > > > >> +++ b/lib/librte_ring/rte_ring_elem.h
> > > > >> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring
> > *r,
> > > > >> uint32_t prod_head,
> > > > >>   	const uint32_t size = r->size;
> > > > >>   	uint32_t idx = prod_head & r->mask;
> > > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > > >> -	const uint64_t *obj = (const uint64_t *)obj_table;
> > > > >> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
> > > > >> *)obj_table;
> > > > >>   	if (likely(idx + n < size)) {
> > > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > >>   			ring[idx] = obj[i];
> > > > >> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring
> > *r,
> > > > >> uint32_t prod_head,
> > > > >>   	const uint32_t size = r->size;
> > > > >>   	uint32_t idx = prod_head & r->mask;
> > > > >>   	uint64_t *ring = (uint64_t *)&r[1];
> > > > >> -	uint64_t *obj = (uint64_t *)obj_table;
> > > > >> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> > > > >>   	if (likely(idx + n < size)) {
> > > > >>   		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > >>   			obj[i] = ring[idx];
> > > > >> --
> > > > >> 2.7.4
> > > > >>
> > > > >
> > > > > References:
> > > > >
> > > >
> > https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3b
> > a
> > > > 51478a3ab3132c33effc8b132641233275b36
> > > > > https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-1
> > > > > -
> > git-
> > > > send-email-phil.yang@arm.com/
> > > > >
  
Ruifeng Wang Nov. 13, 2023, 6:39 a.m. UTC | #12
On 2023/11/10 9:18 PM, Morten Brørup wrote:
> Dear Ruifeng,

Hi Morten,

> +CC: all CPU architecture maintainers,
> 
> I'm trying to figure out the requirements for supporting unaligned memory access in DPDK (specifically the ring library), and need your expert feedback.
> 
> The #define RTE_ARCH_STRICT_ALIGN - which is undocumented, but probably means that CPU memory access must be aligned - is only set by "generic_aarch32".
> 
> So we will assume that all other CPU architectures supported by DPDK can access unaligned memory.
> 
> As discussed in this thread, "generic_aarch32" has special requirements for performing 64-bit load/store at unaligned addresses.

Yes, 64-bit load/store at unaligned addresses causes alignment fault.

> 
> Now comes the big question: Can "generic_aarch32" perform 32-bit load/store at unaligned addresses without similar special requirements? Then the ring library already supports unaligned 32-bit objects, and doesn't need to be fixed in this regard.

Yes, 32-bit load/store support unaligned data accesses to normal memory 
(not device memory). This is documented in Arm Architecture Reference 
Manual.

Thanks,
Ruifeng
> 
> 
> Med venlig hilsen / Kind regards,
> -Morten Brørup
> 
> 
>> From: Morten Brørup [mailto:mb@smartsharesystems.com]
>> Sent: Friday, 10 November 2023 11.44
>>
>>> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
>>> Sent: Friday, 10 November 2023 10.45
>>>
>>>> From: Morten Brørup <mb@smartsharesystems.com>
>>>> Sent: Friday, November 10, 2023 9:34 AM
>>>>
>>>> +CC Gavin, reviewed the test case
>>>>
>>>>> From: Ruifeng Wang [mailto:Ruifeng.Wang@arm.com]
>>>>> Sent: Friday, 10 November 2023 09.40
>>>>>
>>>>> On 2023/11/4 8:04 AM, Morten Brørup wrote:
>>>>>> I have for a long time now wondered why the ring functions for
>>>>> enqueue/dequeue of 64-bit objects supports unaligned addresses,
>> and
>>> now
>>>>> I finally found the patch introducing it.
>>>>>>
>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Phil Yang
>>>>>>> Sent: Monday, 9 March 2020 18.20
>>>>>>>
>>>>>>> The 32-bit arm machine doesn't support unaligned memory
>> access.
>>> It
>>>>>>> will cause a bus error on aarch32 with the custom element size
>>> ring.
>>>>>>>
>>>>>>> Thread 1 "test" received signal SIGBUS, Bus error.
>>>>>>> __rte_ring_enqueue_elems_64 (n=1, obj_table=0xf5edfe41,
>>> prod_head=0,
>>>>> \
>>>>>>> r=0xf5edfb80) at /build/dpdk/build/include/rte_ring_elem.h:177
>>>>>>> 177                             ring[idx++] = obj[i++];
>>>>>>
>>>>>> Which test is this? Why is it using an unaligned array of 64-
>> bit
>>>>> objects? (Notice that obj_table=0xf5edfe41.)
>>>>>
>>>>> The test case is:
>>>>>
>>>
>> https://elixir.bootlin.com/dpdk/latest/source/app/test/test_ring.c#L112
>>>>> 1
>>>>> This case deliberately use unaligned objects.
>>>>
>>>> Thank you, Ruifeng.
>>>>
>>>> Honnappa, I see you signed off on the patch introducing the test
>> for
>>> unaligned objects:
>>>>
>>>
>> http://git.dpdk.org/dpdk/commit/app/test/test_ring.c?id=a9fe152363e283d
>>> 4c590ab8e8d51396f2ffab9ff
>>>>
>>>> What was the rationale behind testing support for unaligned object
>>> pointers? Did any applications/customers use unaligned object
>>>> pointers, or is it a purely academic test case?
>>>>
>>>>>
>>>>>>
>>>>>> Nobody in their right mind would use an unaligned array of 64-
>> bit
>>>>> objects. You can only create such an array if you force the
>>> compiler to
>>>>> prevent automatic alignment! And all the functions in your
>>> application
>>>>> using this array would also need to support unaligned addressing
>> of
>>>>> these objects.
>>>
>>> It could be just one elem, not an array.
>>> And the user can use 'packed' struct or so...
>>> Agree, not a common case, but probably still possible.
>>
>> Very good point, Konstantin. A single unaligned object in a packed
>> structure is not as exotic as an unaligned array of objects (which I
>> consider completely unrealistic).
>>
>> If anyone is using an architecture which doesn't support unaligned
>> accesses, I would expect them to completely avoid using unaligned
>> objects. But perhaps you are right; however unlikely, it might happen.
>>
>> If we think this is a real use case, should we add support for
>> unaligned 32 bit objects?
>> (128 bit objects already support unaligned access; they are type casted
>> to void pointer and accessed using memcpy().)
>>
>> And how about the stack library, should it support unaligned objects
>> too?
>>
>>>
>>>>>>
>>>>>> This seems extremely exotic, and not something any real
>>> application
>>>>> would do!
>>>>>>
>>>>>> I would like to revert this patch for performance reasons.
>>>
>>> Morten, could you probably explain first the problems you encountered
>>> with this patch?
>>> You mention about 'performance reasons', so did you notice any real
>>> slowdown?
>>
>> Please check my reply to the same question here:
>> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9EFD3@smarts
>> erver.smartshare.dk/
>>
>>>
>>>>
>>>> I could add an RTE_ASSERT() to verify that the pointer is aligned,
>>> for debugging purposes.
>>>>
>>>>>>
>>>>>>>
>>>>>>> Fixes: cc4b218790f6 ("ring: support configurable element
>> size")
>>>>>>>
>>>>>>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>>>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>>>> Reviewed-by: Honnappa Nagarahalli
>> <honnappa.nagarahalli@arm.com>
>>>>>>> ---
>>>>>>>    lib/librte_ring/rte_ring_elem.h | 4 ++--
>>>>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/librte_ring/rte_ring_elem.h
>>>>>>> b/lib/librte_ring/rte_ring_elem.h
>>>>>>> index 3976757..663addc 100644
>>>>>>> --- a/lib/librte_ring/rte_ring_elem.h
>>>>>>> +++ b/lib/librte_ring/rte_ring_elem.h
>>>>>>> @@ -160,7 +160,7 @@ __rte_ring_enqueue_elems_64(struct
>> rte_ring
>>> *r,
>>>>>>> uint32_t prod_head,
>>>>>>>    	const uint32_t size = r->size;
>>>>>>>    	uint32_t idx = prod_head & r->mask;
>>>>>>>    	uint64_t *ring = (uint64_t *)&r[1];
>>>>>>> -	const uint64_t *obj = (const uint64_t *)obj_table;
>>>>>>> +	const unaligned_uint64_t *obj = (const unaligned_uint64_t
>>>>>>> *)obj_table;
>>>>>>>    	if (likely(idx + n < size)) {
>>>>>>>    		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>>>>>>>    			ring[idx] = obj[i];
>>>>>>> @@ -294,7 +294,7 @@ __rte_ring_dequeue_elems_64(struct
>> rte_ring
>>> *r,
>>>>>>> uint32_t prod_head,
>>>>>>>    	const uint32_t size = r->size;
>>>>>>>    	uint32_t idx = prod_head & r->mask;
>>>>>>>    	uint64_t *ring = (uint64_t *)&r[1];
>>>>>>> -	uint64_t *obj = (uint64_t *)obj_table;
>>>>>>> +	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
>>>>>>>    	if (likely(idx + n < size)) {
>>>>>>>    		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>>>>>>>    			obj[i] = ring[idx];
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>
>>>>>>
>>>>>> References:
>>>>>>
>>>>>
>>>
>> https://git.dpdk.org/dpdk/commit/lib/librte_ring/rte_ring_elem.h?id=3ba
>>>>> 51478a3ab3132c33effc8b132641233275b36
>>>>>> https://patchwork.dpdk.org/project/dpdk/patch/1583774395-10233-
>> 1-
>>> git-
>>>>> send-email-phil.yang@arm.com/
>>>>>>
  

Patch

diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 3976757..663addc 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -160,7 +160,7 @@  __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
 	const uint32_t size = r->size;
 	uint32_t idx = prod_head & r->mask;
 	uint64_t *ring = (uint64_t *)&r[1];
-	const uint64_t *obj = (const uint64_t *)obj_table;
+	const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
 	if (likely(idx + n < size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			ring[idx] = obj[i];
@@ -294,7 +294,7 @@  __rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t prod_head,
 	const uint32_t size = r->size;
 	uint32_t idx = prod_head & r->mask;
 	uint64_t *ring = (uint64_t *)&r[1];
-	uint64_t *obj = (uint64_t *)obj_table;
+	unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
 	if (likely(idx + n < size)) {
 		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
 			obj[i] = ring[idx];