Bug 786 - dynamic memory model may cause silent DMA errors
Summary: dynamic memory model may cause silent DMA errors
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: core
Version: unspecified
Hardware: All / OS: All
Priority: Normal / Severity: normal
Target Milestone: ---
Assignee: Anatoly Burakov
URL:
Depends on:
Blocks:
 
Reported: 2021-08-10 10:00 CEST by Changpeng Liu
Modified: 2022-05-18 11:39 CEST
CC List: 1 user

Description Changpeng Liu 2021-08-10 10:00:58 CEST
We found that in some very rare situations the vfio dynamic memory model has an issue that can result in the DMA engine not putting data into the right IO buffer. Here are the tests we did to identify the issue:

1. Start the application and call rte_zmalloc to allocate IO buffers.
Hotplug one NVMe drive; DPDK will then register the existing memory region with the kernel vfio driver via the dma_map ioctl. We added a trace before this ioctl:
DPDK dma_map vaddr: 0x200000200000, iova: 0x200000200000, size: 0x14200000, ret: 0
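
For reference, the kernel interface behind this trace is the VFIO MAP_DMA ioctl (the type1 IOMMU on x86). A minimal sketch of such a call; map_region() and container_fd are illustrative names, not DPDK's actual code:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Illustrative sketch of the mapping behind the trace above. */
static int map_region(int container_fd, void *vaddr, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map dma_map;

    memset(&dma_map, 0, sizeof(dma_map));
    dma_map.argsz = sizeof(dma_map);
    dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    dma_map.vaddr = (uint64_t)(uintptr_t)vaddr;
    dma_map.iova  = iova;  /* equal to vaddr in IOVA=VA mode */
    dma_map.size  = size;

    /* Pins the pages and installs the IOMMU translation. */
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
}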

2. Then we call rte_free to free some memory buffers; DPDK will issue dma_unmap to the vfio driver and release the related huge page files:
DPDK dma_unmap iova: 0x20000a400000, size: 0x0, ret: 0

Here we saw that the return value is 0, which means success, but the unmapped size is 0: the kernel vfio driver didn't actually unmap anything, because the IOVA range isn't the same as the previously mapped one. Newer DPDK versions now print an error in this case.
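
For context, VFIO reports how much it actually unmapped by writing the byte count back into the size field of the unmap request, which is how ret can be 0 while the unmapped size is 0. A sketch of that check (unmap_region() is an illustrative name, not DPDK's code):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch only: the kernel writes the number of bytes actually unmapped
 * back into dma_unmap.size. ret == 0 with dma_unmap.size == 0 is the
 * silent case above: success reported, but nothing was unmapped. */
static int unmap_region(int container_fd, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap dma_unmap;

    memset(&dma_unmap, 0, sizeof(dma_unmap));
    dma_unmap.argsz = sizeof(dma_unmap);
    dma_unmap.iova = iova;
    dma_unmap.size = size;

    if (ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &dma_unmap) != 0)
        return -1;
    /* The range must match a prior mapping; otherwise nothing is freed. */
    return dma_unmap.size == size ? 0 : -1;
}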

3. Then we call rte_zmalloc again; DPDK will create new huge page files, map them at the previous virtual address, and then call dma_map to register them with the kernel vfio driver:

DPDK dma_map vaddr: 0x20000a400000, iova: 0x20000a400000, size: 0x400000, ret=-1, errno was set to EEXIST

But DPDK ignores this errno, so rte_zmalloc returns success.

If the newly allocated memory is then used as an NVMe IO buffer, the DMA engine may move data to the previously pinned pages, because the kernel vfio driver never updated the memory map, yet nothing in the IO stack prints any warning.
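
As an illustration of how this could fail loudly instead of silently, the EEXIST from the re-map could be treated as fatal, since it means a stale translation to the old pages is still installed. This reuses the illustrative map_region() from the sketch above and is not DPDK's actual code path:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

static int map_region(int container_fd, void *vaddr, uint64_t iova,
                      uint64_t size); /* from the sketch above */

/* Illustrative only: refuse to hand out a buffer whose re-map failed. */
static void *remap_or_fail(int container_fd, void *vaddr, uint64_t iova, uint64_t size)
{
    if (map_region(container_fd, vaddr, iova, size) == 0)
        return vaddr;
    if (errno == EEXIST) {
        /* The old mapping is still live: DMA to this buffer would
         * silently land in the previously pinned pages. */
        return NULL;
    }
    return NULL; /* any other map failure is also fatal */
}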

We can use the static memory model as a workaround.
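
For reference, the static memory model is selected with the `--legacy-mem` EAL flag; a possible invocation (the application name is a placeholder):

./app -l 0-3 --legacy-mem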
Comment 1 Ajit Khaparde 2021-08-10 22:41:36 CEST
Anatoly, can you please take a look? Thanks.
Comment 2 Anatoly Burakov 2021-08-12 11:27:36 CEST
My best guess is that it's related to partial unmaps with VFIO. I'll have a look at the code and see if there's anything that I can spot, because I was under the impression that EAL now maps things page-by-page when using internal memory, and thus partial unmaps should work.
Comment 3 Changpeng Liu 2021-08-12 11:41:05 CEST
(In reply to Anatoly Burakov from comment #2)
> My best guess is that it's related to partial unmaps with VFIO. I'll have a
> look at the code and see if there's anything that I can spot, because I was
> under the impression that EAL now maps things page-by-page when using
> internal memory, and thus partial unmaps should work.

Yes, it's related to partial unmaps. For the dynamic memory model a partial unmap is a normal action, so it's better not to print an error log; right now I see "EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152" in DPDK.

Another question: in IOVA=VA mode, does DPDK reuse the old vaddr with a new huge page file without doing a new dma map with vfio?
Comment 4 Changpeng Liu 2022-05-17 04:19:50 CEST
(In reply to Anatoly Burakov from comment #2)
> My best guess is that it's related to partial unmaps with VFIO. I'll have a
> look at the code and see if there's anything that I can spot, because I was
> under the impression that EAL now maps things page-by-page when using
> internal memory, and thus partial unmaps should work.

Hi Anatoly,

Any progress on this issue?

VFIO doesn't support partial unmaps; it returns an error for them, and in `vfio_mem_event_callback` DPDK doesn't process the error return.
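
For illustration, a rough sketch of where such a check would have to live. The callback signature matches rte_memory.h, but the body and the dma_unmap_range() helper are hypothetical, not the real eal_vfio.c code; since the callback returns void, the failure cannot be propagated and would at least need loud logging:

#include <stdint.h>
#include <rte_memory.h>
#include <rte_log.h>

int dma_unmap_range(uint64_t iova, size_t len); /* hypothetical helper */

/* Hypothetical sketch, not the real eal_vfio.c callback. */
static void
mem_event_cb(enum rte_mem_event type, const void *addr, size_t len, void *arg)
{
    (void)arg;
    if (type != RTE_MEM_EVENT_FREE)
        return;
    if (dma_unmap_range((uint64_t)(uintptr_t)addr, len) < 0)
        /* Stale IOMMU mapping: DMA to this range now hits old pages. */
        RTE_LOG(ERR, EAL, "VFIO partial unmap failed at %p\n", addr);
}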
Comment 5 Anatoly Burakov 2022-05-17 12:02:17 CEST
Didn't we fix this issue? I seem to remember code changes addressing the inability to perform partial unmaps.

I think commit 56259f7fc0104ca73f776b76cfd056ccf0470e4c addressed this issue. Could you please double-check?
Comment 6 Changpeng Liu 2022-05-17 15:55:39 CEST
(In reply to Anatoly Burakov from comment #5)
> Didn't we fix this issue? I seem to remember code changes addressing
> the inability to perform partial unmaps.
> 
> I think commit 56259f7fc0104ca73f776b76cfd056ccf0470e4c addressed this
> issue. Could you please double-check?

No, I'm using DPDK 22.03, which already includes the commit, and the issue still exists.
I think you may have overlooked one usage scenario. In SPDK we start the application without any PCI devices, then hotplug the device after startup; when the PCI device is plugged in, DPDK calls DMA map on a large memory region, so when memory is released later a partial unmap can still happen.
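
A minimal sketch of that sequence (the device BDF and sizes are placeholders, and error handling is elided):

#include <rte_eal.h>
#include <rte_malloc.h>
#include <rte_dev.h>

/* Sketch of the SPDK-style scenario described above. */
int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    void *buf = rte_zmalloc(NULL, 4 << 20, 0); /* 1. allocate IO buffers */

    rte_dev_probe("0000:01:00.0");  /* 2. hotplug NVMe: the whole existing
                                     *    region is DMA-mapped at once */

    rte_free(buf);                  /* 3. triggers a partial unmap, which
                                     *    silently fails */

    buf = rte_zmalloc(NULL, 4 << 20, 0); /* 4. vaddr reused; re-map fails
                                          *    with EEXIST, mapping stale */
    return 0;
}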
Comment 7 Anatoly Burakov 2022-05-17 16:07:10 CEST
Okay, so please tell me if I understand your usage scenario correctly.

1) We start with no PCI devices, but we have some amount of memory allocated
2) We then hotplug a PCI device, and the entire allocated memory is mapped for DMA at that point
3) We then deallocate part of the memory

At that point, since we have mapped the entire memory for DMA in whole chunks, we can't partially unmap the deallocated memory.

If the above is correct, then this would be extremely difficult to solve without making all mappings page-by-page. I will have to think about how to address this in a way that makes it work by default.

However, I also highly suspect that this use case should be worked around by the `--match-allocations` flag. Can you please check and see if this addresses your problem?
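
For reference, the flag is passed to EAL at startup and makes memory be released back in exactly the chunks it was allocated in; a possible invocation (the application name is a placeholder):

./app -l 0-3 --match-allocations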
Comment 8 Changpeng Liu 2022-05-17 16:22:09 CEST
(In reply to Anatoly Burakov from comment #7)
> Okay, so please tell me if I understand your usage scenario correctly.
> 
> 1) We start with no PCI devices, but we have some amount of memory allocated
> 2) We then hotplug a PCI device, and the entire allocated memory is mapped
> for DMA at that point
> 3) We then deallocate part of the memory
> 
> At that point, since we have mapped the entire memory for DMA in whole
> chunks, we can't partially unmap the deallocated memory.
> 
> If the above is correct, then this would be extremely difficult to solve
> without making all mappings page-by-page. I will have to think about how to
> address this in a way that makes it work by default.
> 
> However, I also highly suspect that this use case should be worked around by
> the `--match-allocations` flag. Can you please check and see if this addresses
> your problem?

Correct, that's the use case in SPDK; it's quite a common usage scenario for us.
We already enable `--match-allocations` by default, and from my tests this option doesn't fix the issue. Currently we can use IOVA=PA to work around it, since the issue happens only in IOVA=VA mode.
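
For reference, the workaround forces physical-address IOVAs via the EAL `--iova-mode` flag; a possible invocation (the application name is a placeholder):

./app -l 0-3 --iova-mode=pa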
Comment 9 Anatoly Burakov 2022-05-17 16:25:10 CEST
Thanks, that helps me understand the problem better (although I'm not quite sure why `--match-allocations` doesn't help). I'll make some time to dig into this.
Comment 10 Anatoly Burakov 2022-05-18 11:39:15 CEST
I think I know why this is happening. For IOVA as VA, whenever we initialize a PCI device, we map the entire contiguous segment regardless of whether it has been broken up by matched allocations. If we could do the mappings along the matched-allocations boundaries, that would fix this issue.

Obviously, this wouldn't cover the non-matched-allocations use case, so there would still be an issue there. I'll have to think about how to fix this for everything.
