Bug 786
| Summary: | dynamic memory model may cause potential DMA silent error | | |
|---|---|---|---|
| Product: | DPDK | Reporter: | Changpeng Liu (changpeng.liu) |
| Component: | core | Assignee: | Anatoly Burakov (anatoly.burakov) |
| Status: | UNCONFIRMED --- | | |
| Severity: | normal | CC: | ajit.khaparde |
| Priority: | Normal | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | All | | |
| OS: | All | | |
Description
Changpeng Liu
2021-08-10 10:00:58 CEST
Anatoly, can you please take a look? Thanks.

My best guess is that it's related to partial unmaps with VFIO. I'll have a look at the code and see if there's anything that I can spot, because I was under the impression that EAL now maps things page-by-page when using internal memory, and thus partial unmaps should work.

(In reply to Anatoly Burakov from comment #2)
> My best guess is that it's related to partial unmaps with VFIO. I'll have a
> look at the code and see if there's anything that I can spot, because I was
> under the impression that EAL now maps things page-by-page when using
> internal memory, and thus partial unmaps should work.

Yes, it's related to partial unmaps. For the dynamic memory model, partial unmaps are a normal action, so it would be better not to print an error log. Right now I can see "EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152" in DPDK.

Another question: in IOVA=VA mode, does DPDK reuse the old vaddr with a new huge page file without doing a new DMA map with VFIO?

(In reply to Anatoly Burakov from comment #2)
> My best guess is that it's related to partial unmaps with VFIO. I'll have a
> look at the code and see if there's anything that I can spot, because I was
> under the impression that EAL now maps things page-by-page when using
> internal memory, and thus partial unmaps should work.

Hi Anatoly, any progress on this issue? VFIO doesn't support partial unmaps; it returns an error for them, and `vfio_mem_event_callback` in DPDK doesn't handle the error return.

Didn't we fix this issue? I seem to remember code changes addressing the inability to perform partial unmaps.

I think commit 56259f7fc0104ca73f776b76cfd056ccf0470e4c addressed this issue. Could you please double-check?

(In reply to Anatoly Burakov from comment #5)
> Didn't we fix this issue? I seem to remember code changes addressing
> the inability to perform partial unmaps.
>
> I think commit 56259f7fc0104ca73f776b76cfd056ccf0470e4c addressed this
> issue. Could you please double-check?

No, I'm using DPDK 22.03, which already includes the commit, and the issue still exists. I think you may have overlooked one usage scenario. In SPDK, we start the application without any PCI devices and hotplug the device after startup. When the PCI device is plugged in, DPDK calls DMA map with a large memory region, so a partial unmap can still happen when memory is released.

Okay, so please tell me if I understand your usage scenario correctly.

1) We start with no PCI devices, but we have some amount of memory allocated
2) We then hotplug a PCI device, and the entire allocated memory is mapped for DMA at that point
3) We then deallocate part of the memory

At that point, since we have mapped the entire memory for DMA, we have mapped it in such a way as to map entire chunks of memory for DMA, and thus can't partially unmap that deallocated memory.

If the above is correct, then this would be extremely difficult to solve without making all mapping page-by-page. I will have to think about how to address this in a way that makes it work by default.

However, I also highly suspect that this use case should be worked around by the `--match-allocations` flag. Can you please check and see if this addresses your problem?

(In reply to Anatoly Burakov from comment #7)
> Okay, so please tell me if I understand your usage scenario correctly.
>
> 1) We start with no PCI devices, but we have some amount of memory allocated
> 2) We then hotplug a PCI device, and the entire allocated memory is mapped
> for DMA at that point
> 3) We then deallocate part of the memory
>
> At that point, since we have mapped the entire memory for DMA, we have
> mapped it in such a way as to map entire chunks of memory for DMA, and thus
> can't partially unmap that deallocated memory.
>
> If the above is correct, then this would be extremely difficult to solve
> without making all mapping page-by-page. I will have to think about how to
> address this in a way that makes it work by default.
>
> However, I also highly suspect that this use case should be worked around by
> the `--match-allocations` flag. Can you please check and see if this addresses
> your problem?

Correct, that's the use case in SPDK, and it's quite a common one. We already enable `--match-allocations` by default, and in my tests this option doesn't fix the issue. Currently we can use IOVA=PA to work around this issue; it happens only in IOVA=VA mode.

Thanks, that helps me understand the problem better (although I'm not quite sure why `--match-allocations` doesn't help). I'll make some time to dig into this.

I think I know why this is happening. For IOVA as VA, whenever we initialize a PCI device, we map the entire contiguous segment regardless of whether it's been broken up by matched allocations. Maybe if we do the mappings along the matched-allocation boundaries, this would fix the issue. Obviously, this wouldn't affect the non-matched-allocations use case, so there would still be an issue there. I'll have to think about how to fix this for everything.
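
For readers following the thread, the scenario can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not call VFIO or DPDK, every `toy_*` name is hypothetical, and the rule that an unmap request covering only part of a mapping clears 0 bytes is a simplification chosen to mirror the "Unexpected size 0 ... instead of 2097152" log line quoted above, not a statement of exact kernel behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2 MiB hugepage; equals the 2097152 seen in the log line from the thread. */
#define TOY_PAGE_SZ  (2u * 1024 * 1024)
#define TOY_MAX_MAPS 64

/* Hypothetical stand-in for one DMA mapping tracked by the IOMMU. */
struct toy_map { uint64_t iova; uint64_t len; int used; };
struct toy_iommu { struct toy_map maps[TOY_MAX_MAPS]; };

/* Record one mapping. Returns 0 on success, -1 if the table is full. */
static int toy_dma_map(struct toy_iommu *io, uint64_t iova, uint64_t len)
{
    assert(len > 0 && len % TOY_PAGE_SZ == 0);
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        if (!io->maps[i].used) {
            io->maps[i] = (struct toy_map){ iova, len, 1 };
            return 0;
        }
    }
    return -1;
}

/*
 * Remove every mapping that lies fully inside [iova, iova + len).
 * A mapping that the request would split is left untouched (assumed rule).
 * Returns the number of bytes actually unmapped.
 */
static uint64_t toy_dma_unmap(struct toy_iommu *io, uint64_t iova, uint64_t len)
{
    uint64_t cleared = 0;
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_map *m = &io->maps[i];
        if (m->used && m->iova >= iova && m->iova + m->len <= iova + len) {
            cleared += m->len;
            m->used = 0;
        }
    }
    return cleared;
}

/* Hotplug case from the thread: the whole region becomes one mapping. */
static int toy_map_chunk(struct toy_iommu *io, uint64_t base, size_t pages)
{
    return toy_dma_map(io, base, (uint64_t)pages * TOY_PAGE_SZ);
}

/* Alternative: one mapping per page, so any page can be unmapped alone. */
static int toy_map_pages(struct toy_iommu *io, uint64_t base, size_t pages)
{
    for (size_t i = 0; i < pages; i++) {
        if (toy_dma_map(io, base + (uint64_t)i * TOY_PAGE_SZ, TOY_PAGE_SZ) != 0)
            return -1;
    }
    return 0;
}

/* Free one page and report a short unmap instead of ignoring it. */
static int toy_free_page(struct toy_iommu *io, uint64_t iova)
{
    uint64_t cleared = toy_dma_unmap(io, iova, TOY_PAGE_SZ);
    return cleared == TOY_PAGE_SZ ? 0 : -1;
}
```

With a four-page region mapped through `toy_map_chunk`, `toy_free_page` on the second page returns -1 because nothing was cleared; the same call after `toy_map_pages` returns 0. Propagating that return value, rather than dropping it, is what keeps the failure from being silent in this model.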