[dpdk-dev] Running DPDK as an unprivileged user
jianfeng.tan at intel.com
Thu Jan 5 15:58:22 CET 2017
On 1/5/2017 6:16 PM, Sergio Gonzalez Monroy wrote:
> On 05/01/2017 10:09, Sergio Gonzalez Monroy wrote:
>> On 04/01/2017 21:34, Walker, Benjamin wrote:
>>> On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
>>>> Hi Benjamin,
>>>> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
>>>>> DPDK today begins by allocating all of the required
>>>>> hugepages, then finds all of the physical addresses for
>>>>> those hugepages using /proc/self/pagemap, sorts the
>>>>> hugepages by physical address, then remaps the pages to
>>>>> contiguous virtual addresses. Later on and if vfio is
>>>>> enabled, it asks vfio to pin the hugepages and to set their
>>>>> DMA addresses in the IOMMU to be the physical addresses
>>>>> discovered earlier. Of course, running as an unprivileged
>>>>> user means all of the physical addresses in
>>>>> /proc/self/pagemap are just 0, so this doesn't end up
>>>>> working. Further, there is no real reason to choose the
>>>>> physical address as the DMA address in the IOMMU - it would
>>>>> be better to just count up starting at 0.
>>>> Why not just use the virtual address as the DMA address in this case, to
>>>> avoid maintaining another kind of address?
>>> That's a valid choice, although I'm just storing the DMA address in the
>>> physical address field that already exists. You either have a physical
>>> address or a DMA address and never both.
>>>>> Also, because the
>>>>> pages are pinned after the virtual to physical mapping is
>>>>> looked up, there is a window where a page could be moved.
>>>>> Hugepage mappings can be moved on more recent kernels (at
>>>>> least 4.x), and the reliability of hugepages having static
>>>>> mappings decreases with every kernel release.
>>>> Do you mean the kernel might take back a physical page after mapping it
>>>> to a virtual page (maybe copying the data to another physical page)?
>>>> Could you please share some links or kernel commits?
>>> Yes - the kernel can move a physical page to another physical page
>>> and change the virtual mapping at any time. For a concise example
>>> see 'man migrate_pages(2)', or for a more serious example the code
>>> that performs memory page compaction in the kernel which was
>>> recently extended to support hugepages.
>>> Before we go down the path of me proving that the mapping isn't static,
>>> let me turn that line of thinking around. Do you have any documentation
>>> demonstrating that the mapping is static? It's not static for 4k pages,
>>> so why are we assuming that it is static for 2MB pages? I understand
>>> that it happened to be static for some versions of the kernel, but my
>>> understanding is that this was purely by coincidence and never by
>>> intention.
>> It looks to me as if you are talking about Transparent hugepages, and
>> not hugetlbfs managed hugepages (DPDK usecase).
>> AFAIK memory (hugepages) managed by hugetlbfs is not compacted and/or
>> moved, they are not part of the kernel memory management.
> Please forgive my loose/poor use of words here when saying that "they
> are not part of the kernel memory management". I meant to say that
> they are not part of the kernel memory management process you were
> mentioning, i.e. compacting, moving, etc.
>> So again, do you have some references to code/articles where this
>> "dynamic" behavior of hugepages managed by hugetlbfs is mentioned?
Following up on the information Benjamin provided, I did some homework and
found this macro in the kernel config, CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION,
and also the function hugepage_migration_supported().
It seems there are at least three ways to make this behavior happen
(I'm basing this on Linux 4.8.1):
a) through the migrate_pages() syscall;
b) through the move_pages() syscall;
c) since some kernel version, there is a kthread named kcompactd for
each NUMA node, which performs memory compaction.
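One way to check whether any of these paths actually moved a hugepage would be
to sample the page frame number (PFN) from /proc/self/pagemap before and after
and compare the two values. Below is a minimal sketch of that idea, not DPDK
code: the function names pagemap_read_entry() and pagemap_entry_pfn() are
hypothetical. It relies on the documented pagemap layout (one 64-bit entry per
system page, PFN in bits 0-54, "present" flag in bit 63). As noted earlier in
this thread, an unprivileged process sees a PFN of 0, so the comparison is only
meaningful when run with sufficient privileges.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Layout of a /proc/self/pagemap entry (Linux kernel documentation). */
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1) /* bits 0-54: PFN */
#define PAGEMAP_PRESENT  (UINT64_C(1) << 63)       /* bit 63: page present */

/* Hypothetical helper: extract the PFN from a raw pagemap entry.
 * Returns 0 when the page is not present in RAM. */
uint64_t pagemap_entry_pfn(uint64_t entry)
{
    if (!(entry & PAGEMAP_PRESENT))
        return 0;
    return entry & PAGEMAP_PFN_MASK;
}

/* Hypothetical helper: read the raw pagemap entry covering vaddr.
 * Returns 0 on success, -1 on failure. */
int pagemap_read_entry(const void *vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t n;
    int fd;

    assert(entry != NULL);
    if (page_size <= 0)
        return -1;

    /* pagemap is indexed by system page size, even for hugepages. */
    offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size) *
             (off_t)sizeof(uint64_t);

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, entry, sizeof(*entry), offset);
    close(fd);

    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Reading the entry for the same virtual address at two points in time and
getting different non-zero PFNs would indicate that the backing page was
moved in between.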
>>>>> Note that this
>>>>> probably means that using uio on recent kernels is subtly
>>>>> broken and cannot be supported going forward because there
>>>>> is no uio mechanism to pin the memory.
>>>>> The first open question I have is whether DPDK should allow
>>>>> uio at all on recent (4.x) kernels. My current understanding
>>>>> is that there is no way to pin memory and hugepages can now
>>>>> be moved around, so uio would be unsafe. What does the
>>>>> community think here?
>>>>> My second question is whether the user should be allowed to
>>>>> mix uio and vfio usage simultaneously. For vfio, the
>>>>> physical addresses are really DMA addresses and are best
>>>>> when arbitrarily chosen to appear sequential relative to
>>>>> their virtual addresses.
>>>> Why "sequential relative to their virtual addresses"? The IOMMU table
>>>> is for DMA addr -> physical addr mapping. So do we need DMA addresses
>>>> "sequential relative to their physical addresses"? Based on your
>>>> analysis above of how hugepages are initialized, is the virtual address
>>>> a good candidate for the DMA address?
>>> The code already goes through a separate organizational step on all of
>>> the pages that remaps the virtual addresses such that they're
>>> relative to the physical backing pages, so this mostly ends up as the
>>> same thing. Choosing to use the virtual address is a totally valid
>>> choice, but I worry it may lead to confusion during debugging or in a
>>> multi-process scenario.
>>> I'm open to making this choice instead of starting from zero, though.
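To make the two options discussed above concrete, here is a minimal sketch of
both DMA address assignment policies: counting up from zero, and reusing the
virtual address. The struct hp_seg and both function names are hypothetical
and are not DPDK's actual data structures. In a real setup each resulting
(vaddr, iova, size) triple would then be handed to VFIO through the
VFIO_IOMMU_MAP_DMA ioctl, which also pins the pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one hugepage-backed memory segment. */
struct hp_seg {
    void    *vaddr; /* virtual address of the segment */
    size_t   len;   /* length in bytes */
    uint64_t iova;  /* DMA address to program into the IOMMU */
};

/* Policy 1: DMA addresses count up from zero in segment order. */
void assign_iova_from_zero(struct hp_seg *segs, size_t n)
{
    uint64_t next = 0;
    size_t i;

    assert(segs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        segs[i].iova = next;
        next += segs[i].len;
    }
}

/* Policy 2: the DMA address is simply the virtual address. */
void assign_iova_from_vaddr(struct hp_seg *segs, size_t n)
{
    size_t i;

    assert(segs != NULL || n == 0);
    for (i = 0; i < n; i++)
        segs[i].iova = (uint64_t)(uintptr_t)segs[i].vaddr;
}
```

With either policy no physical address is needed, so neither depends on
/proc/self/pagemap being readable.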