[dpdk-dev] Running DPDK as an unprivileged user

Walker, Benjamin benjamin.walker at intel.com
Wed Jan 4 22:34:26 CET 2017


On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
> Hi Benjamin,
> 
> 
> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
> > DPDK today begins by allocating all of the required
> > hugepages, then finds all of the physical addresses for
> > those hugepages using /proc/self/pagemap, sorts the
> > hugepages by physical address, then remaps the pages to
> > contiguous virtual addresses. Later on, if vfio is
> > enabled, it asks vfio to pin the hugepages and to set their
> > DMA addresses in the IOMMU to be the physical addresses
> > discovered earlier. Of course, running as an unprivileged
> > user means all of the physical addresses in
> > /proc/self/pagemap are just 0, so this doesn't end up
> > working. Further, there is no real reason to choose the
> > physical address as the DMA address in the IOMMU - it would
> > be better to just count up starting at 0.
> 
> Why not just use the virtual address as the DMA address in this case, to
> avoid maintaining another kind of address?

That's a valid choice, although I'm just storing the DMA address in the
physical address field that already exists. You either have a physical
address or a DMA address and never both.
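
To make the two options concrete, here is a minimal sketch of both
assignment policies. The struct and function names (page_desc,
assign_dma_sequential, assign_dma_from_va) are hypothetical and are not
DPDK's actual data structures; the single addr field stands in for the
existing physical address field, which holds either a physical address
or a DMA address depending on the mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page descriptor for illustration; DPDK's real structures differ. */
struct page_desc {
    void    *va;    /* virtual address of the hugepage */
    size_t   len;   /* hugepage size in bytes */
    uint64_t addr;  /* physical address (uio) OR DMA address (vfio), never both */
};

/* Option A: count up from a base so the DMA addresses are contiguous. */
void assign_dma_sequential(struct page_desc *pages, size_t n, uint64_t base)
{
    uint64_t next = base;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        pages[i].addr = next;
        next += pages[i].len;
    }
}

/* Option B: reuse each page's virtual address as its DMA address. */
void assign_dma_from_va(struct page_desc *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        pages[i].addr = (uint64_t)(uintptr_t)pages[i].va;
}
```

Either way the rest of the code only ever reads the one field, so the
choice is confined to the place where the addresses are assigned.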

> 
> >   Also, because the
> > pages are pinned after the virtual to physical mapping is
> > looked up, there is a window where a page could be moved.
> > Hugepage mappings can be moved on more recent kernels (at
> > least 4.x), and the reliability of hugepages having static
> > mappings decreases with every kernel release.
> 
> Do you mean the kernel might take back a physical page after mapping it to a
> virtual page (maybe copying the data to another physical page)? Could you
> please point to some links or kernel commits?

Yes - the kernel can move the contents of one physical page to another
physical page and update the virtual mapping at any time. For a concise
example, see the migrate_pages(2) man page; for a more serious example,
see the memory compaction code in the kernel, which was recently extended
to support hugepages.

Before we go down the path of me proving that the mapping isn't static,
let me turn that line of thinking around. Do you have any documentation
demonstrating that the mapping is static? It's not static for 4k pages, so
why are we assuming that it is static for 2MB pages? I understand that
it happened to be static for some versions of the kernel, but my understanding
is that this was purely by coincidence and never by intention.
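
For reference, the lookup under discussion can be sketched roughly as
follows. This is an illustration and not the DPDK code: the function
names are made up, and the bit layout (bit 63 = page present, bits 0-54
= page frame number) is taken from the kernel's pagemap documentation.
The point is that the result is only a snapshot. Nothing in this
sequence pins the page, so the frame number can be stale by the time it
is used, and for an unprivileged user the frame number field reads back
as 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Entry layout per the kernel's pagemap documentation. */
#define PM_PRESENT  (1ULL << 63)          /* page is present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: page frame number */

int pagemap_entry_present(uint64_t e) { return (e & PM_PRESENT) != 0; }
uint64_t pagemap_entry_pfn(uint64_t e) { return e & PM_PFN_MASK; }

/*
 * Read the raw pagemap entry covering `va`.
 * Returns 0 on success, -1 on failure. The entry is a point-in-time
 * snapshot: the kernel may migrate the page right after the read.
 */
int pagemap_read_entry(const void *va, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int fd;
    off_t off;
    ssize_t n;

    assert(entry != NULL);
    if (page_size <= 0)
        return -1;

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off = (off_t)((uintptr_t)va / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
    n = pread(fd, entry, sizeof(*entry), off);
    close(fd);

    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would check pagemap_entry_present() and then multiply
pagemap_entry_pfn() by the page size to get a physical address, which is
exactly the value that is not guaranteed to stay valid.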

> 
> > Note that this
> > probably means that using uio on recent kernels is subtly
> > broken and cannot be supported going forward because there
> > is no uio mechanism to pin the memory.
> > 
> > The first open question I have is whether DPDK should allow
> > uio at all on recent (4.x) kernels. My current understanding
> > is that there is no way to pin memory and hugepages can now
> > be moved around, so uio would be unsafe. What does the
> > community think here?
> > 
> > My second question is whether the user should be allowed to
> > mix uio and vfio usage simultaneously. For vfio, the
> > physical addresses are really DMA addresses and are best
> > when arbitrarily chosen to appear sequential relative to
> > their virtual addresses.
> 
> Why "sequential relative to their virtual addresses"? The IOMMU table holds
> the DMA addr -> physical addr mapping, so don't we need DMA addresses
> "sequential relative to their physical addresses"? Based on your analysis
> above of how hugepages are initialized, isn't the virtual address a good
> candidate for the DMA address?

The code already goes through a separate organizational step on all of
the pages that remaps the virtual addresses such that they're sequential
relative to the physical backing pages, so this mostly ends up as the same
thing. Choosing to use the virtual address is a totally valid choice, but I
worry it may lead to confusion during debugging or in a multi-process
scenario. I'm open to making this choice instead of starting from zero,
though.
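
Whichever value is chosen, it reaches the IOMMU the same way. As a
rough sketch (not the DPDK implementation; the helper names are
hypothetical, and container_fd is assumed to be an already configured
VFIO container using the type1 IOMMU backend), the mapping request
looks like this, with the iova field being the only thing that changes
between "count up from zero" and "use the virtual address":

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill a type1 DMA map request; the caller picks the iova. */
void vfio_fill_dma_map(struct vfio_iommu_type1_dma_map *m,
                       const void *va, uint64_t iova, uint64_t len)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));
    m->argsz = sizeof(*m);
    m->flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    m->vaddr = (uint64_t)(uintptr_t)va;   /* process virtual address */
    m->iova  = iova;                      /* address the device will use */
    m->size  = len;
}

/*
 * Ask vfio to pin [va, va + len) and map it at `iova` for the device.
 * Returns the ioctl result: 0 on success, -1 with errno set on failure.
 */
int vfio_map_region(int container_fd, const void *va,
                    uint64_t iova, uint64_t len)
{
    struct vfio_iommu_type1_dma_map m;

    vfio_fill_dma_map(&m, va, iova, len);
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &m);
}
```

Passing iova equal to the virtual address gives the scheme Jianfeng
suggests; passing a running counter gives the start-from-zero scheme.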

> 
> Thanks,
> Jianfeng

