[dpdk-dev] Running DPDK as an unprivileged user
Sergio Gonzalez Monroy
sergio.gonzalez.monroy at intel.com
Thu Jan 5 11:09:14 CET 2017
On 04/01/2017 21:34, Walker, Benjamin wrote:
> On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
>> Hi Benjamin,
>> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
>>> DPDK today begins by allocating all of the required
>>> hugepages, then finds all of the physical addresses for
>>> those hugepages using /proc/self/pagemap, sorts the
>>> hugepages by physical address, then remaps the pages to
>>> contiguous virtual addresses. Later on, if vfio is
>>> enabled, it asks vfio to pin the hugepages and to set their
>>> DMA addresses in the IOMMU to be the physical addresses
>>> discovered earlier. Of course, running as an unprivileged
>>> user means all of the physical addresses in
>>> /proc/self/pagemap are just 0, so this doesn't end up
>>> working. Further, there is no real reason to choose the
>>> physical address as the DMA address in the IOMMU - it would
>>> be better to just count up starting at 0.
>> Why not just use the virtual address as the DMA address in this case, to
>> avoid maintaining another kind of address?
> That's a valid choice, although I'm just storing the DMA address in the
> physical address field that already exists. You either have a physical
> address or a DMA address and never both.
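For reference, the pagemap lookup described above can be sketched as
follows. This is only an illustration under stated assumptions, not the
DPDK implementation: the helper names decode_pagemap_entry and
virt_to_phys are hypothetical, and the bit layout (bit 63 = present,
bits 0-54 = page frame number) is the one documented for
/proc/<pid>/pagemap. When the kernel hides the PFN from an unprivileged
process, the decoded PFN is 0 and the lookup reports no usable address.

```python
import os
import struct

PAGE_PRESENT = 1 << 63          # bit 63: page is present in RAM
PFN_MASK = (1 << 55) - 1        # bits 0-54: page frame number (PFN)


def decode_pagemap_entry(raw):
    """Decode one 8-byte pagemap entry into (present, pfn)."""
    (entry,) = struct.unpack("=Q", raw)   # native-endian unsigned 64-bit
    present = bool(entry & PAGE_PRESENT)
    pfn = entry & PFN_MASK if present else 0
    return present, pfn


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Return the physical address backing vaddr, or None if unknown.

    Hypothetical helper. pagemap is indexed by the base page size,
    even when the mapping is backed by a hugepage.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(pagemap_path, "rb", buffering=0) as f:
        f.seek((vaddr // page_size) * 8)
        raw = f.read(8)
    if len(raw) != 8:
        return None
    present, pfn = decode_pagemap_entry(raw)
    if not present or pfn == 0:
        # PFN of 0 is what an unprivileged reader sees, per the thread.
        return None
    return pfn * page_size + (vaddr % page_size)
```

A caller that gets None back for every hugepage has no physical
addresses to sort by, which is the failure mode being discussed.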
>>> Also, because the
>>> pages are pinned after the virtual to physical mapping is
>>> looked up, there is a window where a page could be moved.
>>> Hugepage mappings can be moved on more recent kernels (at
>>> least 4.x), and the reliability of hugepages having static
>>> mappings decreases with every kernel release.
>> Do you mean the kernel might take back a physical page after mapping it to a
>> virtual page (maybe copying the data to another physical page)? Could you
>> please share some links or kernel commits?
> Yes - the kernel can move a physical page to another physical page
> and change the virtual mapping at any time. For a concise example
> see 'man migrate_pages(2)', or for a more serious example, the code
> that performs memory page compaction in the kernel, which was
> recently extended to support hugepages.
> Before we go down the path of me proving that the mapping isn't static,
> let me turn that line of thinking around. Do you have any documentation
> demonstrating that the mapping is static? It's not static for 4k pages, so
> why are we assuming that it is static for 2MB pages? I understand that
> it happened to be static for some versions of the kernel, but my understanding
> is that this was purely by coincidence and never by intention.
It looks to me as if you are talking about Transparent Hugepages, and
not hugetlbfs-managed hugepages (the DPDK use case).
AFAIK memory (hugepages) managed by hugetlbfs is not compacted and/or
moved; it is not part of the kernel memory management.
So again, do you have some references to code/articles where this
"dynamic" behavior of hugepages managed by hugetlbfs is mentioned?
>>> Note that this
>>> probably means that using uio on recent kernels is subtly
>>> broken and cannot be supported going forward because there
>>> is no uio mechanism to pin the memory.
>>> The first open question I have is whether DPDK should allow
>>> uio at all on recent (4.x) kernels. My current understanding
>>> is that there is no way to pin memory and hugepages can now
>>> be moved around, so uio would be unsafe. What does the
>>> community think here?
>>> My second question is whether the user should be allowed to
>>> mix uio and vfio usage simultaneously. For vfio, the
>>> physical addresses are really DMA addresses and are best
>>> when arbitrarily chosen to appear sequential relative to
>>> their virtual addresses.
>> Why "sequential relative to their virtual addresses"? The IOMMU table is for
>> DMA addr -> physical addr mapping. So do we need DMA addresses
>> "sequential relative to their physical addresses"? Based on your above
>> analysis of how hugepages are initialized, is the virtual address a good
>> candidate for the DMA address?
> The code already goes through a separate organizational step on all of
> the pages that remaps the virtual addresses such that they're sequential
> relative to the physical backing pages, so this mostly ends up as the same thing.
> Choosing to use the virtual address is a totally valid choice, but I worry it
> may lead to confusion during debugging or in a multi-process scenario.
> I'm open to making this choice instead of starting from zero, though.
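To make the two options concrete, here is a small sketch of both
DMA-address assignment schemes discussed above: counting up from zero,
and reusing the virtual address. It is not taken from DPDK. The function
names and the (virt_addr, length) segment representation are
assumptions made for illustration. In a real setup each resulting
(virt, iova, length) triple would be what gets handed to vfio for
mapping.

```python
def assign_iova_from_zero(segments):
    """Assign DMA addresses sequentially, starting at 0.

    segments: list of (virt_addr, length) tuples (hypothetical layout).
    Returns a list of (virt_addr, iova, length), ordered by virt_addr,
    so DMA addresses are sequential relative to virtual addresses.
    """
    mappings = []
    next_iova = 0
    for virt, length in sorted(segments):
        mappings.append((virt, next_iova, length))
        next_iova += length
    return mappings


def assign_iova_as_virt(segments):
    """Use each segment's virtual address as its DMA address."""
    return [(virt, virt, length) for virt, length in sorted(segments)]
```

For example, with 2MB hugepages:

```python
HUGE = 2 * 1024 * 1024
segs = [(0x7f0000400000, HUGE), (0x7f0000000000, 2 * HUGE)]
for virt, iova, length in assign_iova_from_zero(segs):
    print(hex(virt), hex(iova), hex(length))
```

With the identity scheme there is no second address space to track,
which is the point raised in the question. The counting scheme keeps
DMA addresses visibly distinct from virtual ones, which may be easier
to reason about when debugging.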