[dpdk-dev] Running DPDK as an unprivileged user

Sergio Gonzalez Monroy sergio.gonzalez.monroy at intel.com
Thu Jan 5 11:16:53 CET 2017

On 05/01/2017 10:09, Sergio Gonzalez Monroy wrote:
> On 04/01/2017 21:34, Walker, Benjamin wrote:
>> On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
>>> Hi Benjamin,
>>> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
>>>> DPDK today begins by allocating all of the required
>>>> hugepages, then finds all of the physical addresses for
>>>> those hugepages using /proc/self/pagemap, sorts the
>>>> hugepages by physical address, then remaps the pages to
>>>> contiguous virtual addresses. Later on and if vfio is
>>>> enabled, it asks vfio to pin the hugepages and to set their
>>>> DMA addresses in the IOMMU to be the physical addresses
>>>> discovered earlier. Of course, running as an unprivileged
>>>> user means all of the physical addresses in
>>>> /proc/self/pagemap are just 0, so this doesn't end up
>>>> working. Further, there is no real reason to choose the
>>>> physical address as the DMA address in the IOMMU - it would
>>>> be better to just count up starting at 0.
>>> Why not just use the virtual address as the DMA address in this case, to
>>> avoid maintaining another kind of address?
>> That's a valid choice, although I'm just storing the DMA address in the
>> physical address field that already exists. You either have a physical
>> address or a DMA address and never both.
>>>> Also, because the
>>>> pages are pinned after the virtual to physical mapping is
>>>> looked up, there is a window where a page could be moved.
>>>> Hugepage mappings can be moved on more recent kernels (at
>>>> least 4.x), and the reliability of hugepages having static
>>>> mappings decreases with every kernel release.
>>> Do you mean the kernel might take back a physical page after mapping it
>>> to a virtual page (maybe copying the data to another physical page)?
>>> Could you please show some links or kernel commits?
>> Yes - the kernel can move a physical page to another physical page
>> and change the virtual mapping at any time. For a concise example
>> see 'man migrate_pages(2)', or for a more serious example the code
>> that performs memory page compaction in the kernel which was
>> recently extended to support hugepages.
>> Before we go down the path of me proving that the mapping isn't static,
>> let me turn that line of thinking around. Do you have any documentation
>> demonstrating that the mapping is static? It's not static for 4k pages,
>> so why are we assuming that it is static for 2MB pages? I understand
>> that it happened to be static for some versions of the kernel, but my
>> understanding is that this was purely by coincidence and never by
>> intention.
> It looks to me as if you are talking about Transparent hugepages, and
> not hugetlbfs-managed hugepages (the DPDK use case).
> AFAIK memory (hugepages) managed by hugetlbfs is not compacted and/or
> moved; they are not part of the kernel memory management.

Please forgive my loose/poor use of words when saying that "they
are not part of the kernel memory management". What I mean is that
they are not part of the kernel memory management processes you were
mentioning, i.e. compaction, page migration, etc.
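For readers following the pagemap argument, here is a minimal C sketch of the virtual-to-physical lookup described at the top of the thread. It is not DPDK's code: lookup_pfn() and virt_to_phys() are hypothetical helper names, and the sketch assumes a 64-bit Linux host and the entry layout from the kernel's pagemap documentation (bits 0-54 hold the page frame number, bit 63 is the "present" flag). Per that documentation, Linux 4.0 and 4.1 refuse the open for callers without CAP_SYS_ADMIN, and from 4.2 on the PFN field is simply zeroed, which is why an unprivileged process ends up with physical addresses of 0.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* One 64-bit /proc/self/pagemap entry: bits 0-54 hold the page frame
 * number (PFN), bit 63 is set when the page is present in RAM. */
#define PAGEMAP_PFN_MASK  ((UINT64_C(1) << 55) - 1)
#define PAGEMAP_PRESENT   (UINT64_C(1) << 63)

/*
 * Hypothetical helper: look up the PFN backing vaddr. Returns 0 on
 * success and -1 if the pagemap file cannot be opened or read. On
 * success *pfn is 0 when the page is not present or when the kernel
 * zeroes PFNs for a caller without CAP_SYS_ADMIN (Linux 4.2 and later).
 */
static int lookup_pfn(const void *vaddr, uint64_t *pfn)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t vpn = (uint64_t)(uintptr_t)vaddr / page_size;
    uint64_t entry = 0;

    /* Linux 4.0 and 4.1 fail this open with EPERM for unprivileged users. */
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file holds one 8-byte entry per virtual page, indexed by the
     * virtual page number. */
    ssize_t n = pread(fd, &entry, sizeof(entry), (off_t)(vpn * sizeof(entry)));
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;

    *pfn = (entry & PAGEMAP_PRESENT) ? (entry & PAGEMAP_PFN_MASK) : 0;
    return 0;
}

/* Hypothetical helper: physical address of vaddr, or 0 if it cannot be
 * determined (I/O error, page not present, or PFN hidden). */
static uint64_t virt_to_phys(const void *vaddr)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t pfn;

    if (lookup_pfn(vaddr, &pfn) != 0 || pfn == 0)
        return 0;
    return pfn * page_size + (uint64_t)(uintptr_t)vaddr % page_size;
}
```

Touch a buffer first (so its page is present) and call virt_to_phys() on it; a result of 0 for a touched page indicates the kernel is hiding PFNs. Even for a privileged caller the answer is only a snapshot: nothing in this lookup pins the page, which is the window Benjamin describes.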


> So again, do you have some references to code/articles where this 
> "dynamic" behavior of hugepages managed by hugetlbfs is mentioned?
> Sergio
>>>> Note that this
>>>> probably means that using uio on recent kernels is subtly
>>>> broken and cannot be supported going forward because there
>>>> is no uio mechanism to pin the memory.
>>>> The first open question I have is whether DPDK should allow
>>>> uio at all on recent (4.x) kernels. My current understanding
>>>> is that there is no way to pin memory and hugepages can now
>>>> be moved around, so uio would be unsafe. What does the
>>>> community think here?
>>>> My second question is whether the user should be allowed to
>>>> mix uio and vfio usage simultaneously. For vfio, the
>>>> physical addresses are really DMA addresses and are best
>>>> when arbitrarily chosen to appear sequential relative to
>>>> their virtual addresses.
>>> Why "sequential relative to their virtual addresses"? The IOMMU table
>>> is for DMA addr -> physical addr mapping. So do we need DMA addresses
>>> "sequential relative to their physical addresses"? Based on your above
>>> analysis of how hugepages are initialized, isn't the virtual address a
>>> good candidate for the DMA address?
>> The code already goes through a separate organizational step on all of
>> the pages that remaps the virtual addresses such that they're sequential
>> relative to the physical backing pages, so this mostly ends up as the
>> same thing.
>> Choosing to use the virtual address is a totally valid choice, but I
>> worry it may lead to confusion during debugging or in a multi-process
>> scenario. I'm open to making this choice instead of starting from zero,
>> though.
>>> Thanks,
>>> Jianfeng
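The two IOVA policies debated above (counting up from 0 versus reusing the virtual address) can be compared with a small C sketch. struct hp_seg and the three functions are hypothetical illustrations, not DPDK or VFIO APIs. Each resulting (va, iova, len) triple is what a caller would then hand to VFIO through the VFIO_IOMMU_MAP_DMA ioctl; that call is omitted here because it needs a real VFIO container.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one pinned hugepage (not a DPDK type). */
struct hp_seg {
    uintptr_t va;   /* virtual address of the page in this process */
    size_t    len;  /* page size in bytes, e.g. 2 MB */
    uint64_t  iova; /* address the device uses for DMA through the IOMMU */
};

/* Policy A: count up from 0 in array order, so the device sees a single
 * dense IOVA range regardless of where the pages physically live. */
static void assign_iova_from_zero(struct hp_seg *segs, size_t n)
{
    uint64_t next = 0;
    for (size_t i = 0; i < n; i++) {
        segs[i].iova = next;
        next += segs[i].len;
    }
}

/* Policy B: reuse the virtual address as the IOVA (identity mapping),
 * so no second address space has to be tracked. */
static void assign_iova_from_va(struct hp_seg *segs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        segs[i].iova = (uint64_t)segs[i].va;
}

/* Translate a device-visible IOVA back to a virtual address, e.g. when
 * debugging a descriptor. Returns 0 if no segment covers the IOVA. */
static uintptr_t iova_to_va(const struct hp_seg *segs, size_t n, uint64_t iova)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= segs[i].iova && iova - segs[i].iova < segs[i].len)
            return segs[i].va + (uintptr_t)(iova - segs[i].iova);
    }
    return 0;
}
```

With policy A the device sees one dense IOVA range, but every IOVA-to-VA translation has to go through a lookup such as iova_to_va(); with policy B that translation is the identity, which is the extra bookkeeping Jianfeng suggests avoiding. In both cases the IOVA is unrelated to the physical address, so devices behind the IOMMU never need the pagemap lookup.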
