[PATCH v2] dmadev: fix structure alignment

fengchengwen fengchengwen at huawei.com
Wed Mar 20 05:11:42 CET 2024


Hi Wenwu,

On 2024/3/15 17:27, Ma, WenwuX wrote:
> Hi Chengwen
> 
>> -----Original Message-----
>> From: fengchengwen <fengchengwen at huawei.com>
>> Sent: Friday, March 15, 2024 4:32 PM
>> To: Ma, WenwuX <wenwux.ma at intel.com>; dev at dpdk.org
>> Cc: Jiale, SongX <songx.jiale at intel.com>; stable at dpdk.org
>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>
>> Hi Wenwu,
>>
>> On 2024/3/15 15:44, Ma, WenwuX wrote:
>>> Hi Chengwen,
>>>
>>>> -----Original Message-----
>>>> From: Ma, WenwuX
>>>> Sent: Friday, March 15, 2024 2:26 PM
>>>> To: fengchengwen <fengchengwen at huawei.com>; dev at dpdk.org
>>>> Cc: Jiale, SongX <songx.jiale at intel.com>; stable at dpdk.org
>>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>>>>
>>>> Hi Chengwen,
>>>>
>>>>> -----Original Message-----
>>>>> From: fengchengwen <fengchengwen at huawei.com>
>>>>> Sent: Friday, March 15, 2024 2:06 PM
>>>>> To: Ma, WenwuX <wenwux.ma at intel.com>; dev at dpdk.org
>>>>> Cc: Jiale, SongX <songx.jiale at intel.com>; stable at dpdk.org
>>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>>>>
>>>>> Hi Wenwu,
>>>>>
>>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
>>>>>> The structure rte_dma_dev needs only 8 byte alignment.
>>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
>>>>>> __rte_aligned(8).
>>>>>>
>>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
>>>>>> Cc: stable at dpdk.org
>>>>>>
>>>>>> Signed-off-by: Wenwu Ma <wenwux.ma at intel.com>
>>>>>> ---
>>>>>> v2:
>>>>>>  - Because of performance drop, adjust the code to
>>>>>>    no longer demand cache line alignment
>>>>>
>>>>> Which two versions observed performance drop? And which benchmark
>>>>> observed drop?
>>>>> Could you provide more information?
>>>>>
>>>>>>
>>>> V1 patch:
>>>>
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>>>> 1-wenwux.ma at intel.com/
>>>>
>>>> To view detailed results, visit:
>>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>>>>
>>>>>> ---
>>>>>>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
>>>>> b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> index 58729088ff..b569bb3502 100644
>>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
>>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>>>>>>   * @internal
>>>>>>   * The generic data structure associated with each DMA device.
>>>>>>   */
>>>>>> -struct __rte_cache_aligned rte_dma_dev {
>>>>>> +struct __rte_aligned(8) rte_dma_dev {
>>>>>
>>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
>>>>> is not rte_dma_dev? So why is it a problem here?
>>>>>
>>>>> Thanks
>>>>>
>>>> The DMA device object is expected to align cache line, so clang will
>>>> use “vmovaps” assembly instruction,
>>>>
>>>> And the instruction demands 16 bytes alignment or will cause segment
>>>> fault in some environments.
>>>>
>>> Test case:
>>> 1. compile dpdk
>>> rm -rf x86_64-native-linuxapp-clang
>>> CC=clang meson -Denable_kmods=True -Dlibdir=lib
>>> --default-library=static x86_64-native-linuxapp-clang ninja -C
>>> x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
>>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
>>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
>>> If it cannot be reproduced, please try using a different core)
>>> 3. exit dpdk-test
>>> RTE>>quit
>>> Segmentation fault (core dumped)

I reproduce it just with --vdev=dma_skeleton.
When execute quit command, it will invoke rte_dma_close->dma_release, pls see my annotations (//) below:

void
dma_release(struct rte_dma_dev *dev)
{
	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		rte_free(dev->data->dev_private);
		memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
	}

	dma_fp_object_dummy(dev->fp_obj);
	memset(dev, 0, sizeof(struct rte_dma_dev));   // this memset was compiles using vmovaps, its
						//  8c24da:       c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
						//  8c24de:       c5 fc 29 43 20          vmovaps %ymm0,0x20(%rbx)
						//  8c24e3:       c5 fc 29 03             vmovaps %ymm0,(%rbx)
						// but the dev is not align 16B (in my env the rte_dma_devices addr is 0x15d39950)
}

>>
>> I will try to reproduce, but still a question: does above test has already merged
>> your patch [1] or the current main branch code has this problem?
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>> 1-wenwux.ma at intel.com/
>>
>> Thanks
>>
> the current main branch code has this problem.
> 
> Both patch v1 and v2 are able to solve this problem, but v1 has a performance issue.

The performance issue is ethdev benchmark, it will not invoke any dmadev API, I don't think these two has any relations.

So I prefer v1, Plus Pavan also submit a commit [1] to align the struct, but it was not a fix for clang-x86-platform.

[1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/

> 
>>>
>>>>
>>>>>>  	/** Device info which supplied during device initialization. */
>>>>>>  	struct rte_device *device;
>>>>>>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data.
>>>>>> */
>>>>>>

What more, could you please send v3? I hope it will contain the root cause and optional solutions of the segment fault problem.

BTW: dmadev is the first one which dynamic alloc dmadev struct, later maybe more xxxdev will use this type, I think that's typical.
     Maybe we should add a such mem_align() function in eal library, but this could done later.

Thanks



More information about the stable mailing list