[PATCH v2] dmadev: fix structure alignment
fengchengwen
fengchengwen at huawei.com
Wed Mar 20 05:11:42 CET 2024
Hi Wenwu,
On 2024/3/15 17:27, Ma, WenwuX wrote:
> Hi Chengwen
>
>> -----Original Message-----
>> From: fengchengwen <fengchengwen at huawei.com>
>> Sent: Friday, March 15, 2024 4:32 PM
>> To: Ma, WenwuX <wenwux.ma at intel.com>; dev at dpdk.org
>> Cc: Jiale, SongX <songx.jiale at intel.com>; stable at dpdk.org
>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>
>> Hi Wenwu,
>>
>> On 2024/3/15 15:44, Ma, WenwuX wrote:
>>> Hi Chengwen,
>>>
>>>> -----Original Message-----
>>>> From: Ma, WenwuX
>>>> Sent: Friday, March 15, 2024 2:26 PM
>>>> To: fengchengwen <fengchengwen at huawei.com>; dev at dpdk.org
>>>> Cc: Jiale, SongX <songx.jiale at intel.com>; stable at dpdk.org
>>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>>>>
>>>> Hi Chengwen,
>>>>
>>>>> -----Original Message-----
>>>>> From: fengchengwen <fengchengwen at huawei.com>
>>>>> Sent: Friday, March 15, 2024 2:06 PM
>>>>> To: Ma, WenwuX <wenwux.ma at intel.com>; dev at dpdk.org
>>>>> Cc: Jiale, SongX <songx.jiale at intel.com>; stable at dpdk.org
>>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>>>>
>>>>> Hi Wenwu,
>>>>>
>>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
>>>>>> The structure rte_dma_dev needs only 8-byte alignment.
>>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
>>>>>> __rte_aligned(8).
>>>>>>
>>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
>>>>>> Cc: stable at dpdk.org
>>>>>>
>>>>>> Signed-off-by: Wenwu Ma <wenwux.ma at intel.com>
>>>>>> ---
>>>>>> v2:
>>>>>> - Because of a performance drop, adjust the code so that it
>>>>>> no longer demands cache-line alignment
>>>>>
>>>>> Between which two versions was the performance drop observed, and
>>>>> with which benchmark?
>>>>> Could you provide more information?
>>>>>
>>>>>>
>>>> V1 patch:
>>>>
>>>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
>>>>
>>>> To view detailed results, visit:
>>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>>>>
>>>>>> ---
>>>>>> lib/dmadev/rte_dmadev_pmd.h | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> index 58729088ff..b569bb3502 100644
>>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
>>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>>>>>> * @internal
>>>>>> * The generic data structure associated with each DMA device.
>>>>>> */
>>>>>> -struct __rte_cache_aligned rte_dma_dev {
>>>>>> +struct __rte_aligned(8) rte_dma_dev {
>>>>>
>>>>> The DMA fast path is implemented by rte_dma_fp_objs, not by
>>>>> rte_dma_dev, so why is alignment a problem here?
>>>>>
>>>>> Thanks
>>>>>
>>>> The DMA device object is declared cache-line aligned, so clang uses the
>>>> “vmovaps” assembly instruction, which requires an aligned memory operand
>>>> (16 bytes for xmm, 32 bytes for ymm registers). When the object is not
>>>> suitably aligned, which happens in some environments, this causes a
>>>> segmentation fault.
>>>>
>>> Test case:
>>> 1. Compile DPDK:
>>>    rm -rf x86_64-native-linuxapp-clang
>>>    CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
>>>    ninja -C x86_64-native-linuxapp-clang -j 72
>>> 2. Start dpdk-test:
>>>    /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3
>>>    (Note: if it cannot be reproduced, please try using a different core.)
>>> 3. Exit dpdk-test:
>>>    RTE>>quit
>>>    Segmentation fault (core dumped)
I reproduced it with just --vdev=dma_skeleton.
When the quit command is executed, it invokes rte_dma_close() -> dma_release(); please see my annotations (//) below:
void
dma_release(struct rte_dma_dev *dev)
{
	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		rte_free(dev->data->dev_private);
		memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
	}
	dma_fp_object_dummy(dev->fp_obj);
	memset(dev, 0, sizeof(struct rte_dma_dev)); // clang compiles this memset into vmovaps:
	// 8c24da: c5 f8 57 c0       vxorps %xmm0,%xmm0,%xmm0
	// 8c24de: c5 fc 29 43 20    vmovaps %ymm0,0x20(%rbx)
	// 8c24e3: c5 fc 29 03       vmovaps %ymm0,(%rbx)
	// but dev is not 32-byte aligned, as vmovaps with a ymm operand requires
	// (in my env the rte_dma_devices addr is 0x15d39950)
}
>>
>> I will try to reproduce it, but one question remains: does the above test
>> already include your patch [1], or does the current main branch have this problem?
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
>>
>> Thanks
>>
> The current main branch code has this problem.
>
> Both v1 and v2 of the patch solve this problem, but v1 has a performance issue.
The performance issue was reported by an ethdev benchmark, which does not invoke any dmadev API, so I don't think the two are related.
So I prefer v1. In addition, Pavan also submitted a commit [1] that aligns the struct, but it was not a fix for the clang x86 platform.
[1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/
>
>>>
>>>>
>>>>>> /** Device info which supplied during device initialization. */
>>>>>> struct rte_device *device;
>>>>>> struct rte_dma_dev_data *data; /**< Pointer to shared device data.
>>>>>> */
>>>>>>
What's more, could you please send a v3? I hope it will describe the root cause of the segmentation fault and the possible solutions.
BTW: dmadev is the first library that dynamically allocates its device structs; more xxxdev libraries may follow this approach later, so I think this case is typical.
Maybe we should add a mem_align()-like function to the EAL library, but that could be done later.
Thanks