[dpdk-dev] Running DPDK as an unprivileged user
jianfeng.tan at intel.com
Wed Jan 4 12:39:18 CET 2017
On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
> Hi all,
> I've been digging into what it would take to run DPDK as an
> unprivileged user and I have some findings that I thought
> were worthy of discussion. The assumptions here are that I'm
> using a very recent Linux kernel (4.8.15 to be specific) and
> I'm using vfio with my IOMMU enabled. I'm only interested in
> making it possible to run as an unprivileged user in this
> type of environment.
> There are a few key things that DPDK needs to do in order to
> run as an unprivileged user:
> 1) Allocate hugepages
> 2) Map device resources
> 3) Map hugepage virtual addresses to DMA addresses.
> For #1 and #2, DPDK works just fine today. You simply chown
> the relevant resources in sysfs to the desired user and
> everything is happy.
> The problem is #3. This currently relies on looking up the
> mappings in /proc/self/pagemap, but the ability to get
> physical addresses in /proc/self/pagemap as an unprivileged
> user was removed from the kernel in the 4.x timeframe due to
> the Rowhammer vulnerability. At this time, it is not
> possible to run DPDK as an unprivileged user on a 4.x Linux
> kernel.
> There is a way to make this work though, which I'll outline
> now. Unfortunately, I think it is going to require some very
> significant changes to the initialization flow in the EAL.
> One bit of background before I go into how to fix this -
> there are three types of memory addresses - virtual
> addresses, physical addresses, and DMA addresses. Sometimes
> DMA addresses are called bus addresses or I/O addresses, but
> I'll call them DMA addresses because I think that's the
> clearest name. In a system without an IOMMU, DMA addresses
> and physical addresses are equivalent, but in a system with
> an IOMMU any arbitrary DMA address can be chosen by the user
> to map to a given physical address. For security reasons
> (rowhammer), it is no longer considered safe to expose
> physical addresses to userspace, but it is perfectly fine to
> expose DMA addresses when an IOMMU is present.
> DPDK today begins by allocating all of the required
> hugepages, then finds all of the physical addresses for
> those hugepages using /proc/self/pagemap, sorts the
> hugepages by physical address, then remaps the pages to
> contiguous virtual addresses. Later on, if vfio is
> enabled, it asks vfio to pin the hugepages and to set their
> DMA addresses in the IOMMU to be the physical addresses
> discovered earlier. Of course, running as an unprivileged
> user means all of the physical addresses in
> /proc/self/pagemap are just 0, so this doesn't end up
> working. Further, there is no real reason to choose the
> physical address as the DMA address in the IOMMU - it would
> be better to just count up starting at 0.
Why not just use the virtual address as the DMA address in this case, to
avoid maintaining another kind of address?
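
To make the two options concrete, here is a rough sketch (Python, for
brevity) of both DMA address assignment policies: counting up from 0, as
proposed above, and reusing the virtual address. The Hugepage tuple and
all function names are made up for illustration; none of this is DPDK or
vfio API, and the actual IOMMU programming is left out.

```python
from collections import namedtuple

# Hypothetical record for one mapped hugepage (not a DPDK structure).
Hugepage = namedtuple("Hugepage", ["vaddr", "size"])


def assign_iova_counting(pages, start=0):
    """Assign DMA addresses by counting up from `start`, in virtual-address order."""
    mapping = {}
    next_iova = start
    for page in sorted(pages, key=lambda p: p.vaddr):
        mapping[page.vaddr] = next_iova
        next_iova += page.size
    return mapping


def assign_iova_identity(pages):
    """Reuse each page's virtual address as its DMA address."""
    return {page.vaddr: page.vaddr for page in pages}


def virt_to_iova(mapping, pages, vaddr):
    """Translate an arbitrary virtual address inside a mapped page."""
    for page in pages:
        if page.vaddr <= vaddr < page.vaddr + page.size:
            return mapping[page.vaddr] + (vaddr - page.vaddr)
    raise ValueError("address is not inside a mapped hugepage")
```

With the identity policy the translation is trivial and no separate
table is needed; with the counting policy the table has to be kept
around for every lookup.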
> Also, because the
> pages are pinned after the virtual to physical mapping is
> looked up, there is a window where a page could be moved.
> Hugepage mappings can be moved on more recent kernels (at
> least 4.x), and the reliability of hugepages having static
> mappings decreases with every kernel release.
Do you mean the kernel might take back a physical page after mapping it
to a virtual page (maybe copying the data to another physical page)?
Could you please point to some links or kernel commits?
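
For reference, the lookup under discussion can be sketched as below
(Python, for brevity). It assumes the documented pagemap entry layout:
one 64-bit entry per base page, PFN in bits 0-54, "present" in bit 63.
The function names are mine, not DPDK's. On kernels that hide PFNs from
unprivileged users the PFN field reads as 0, which the sketch reports as
"unresolved". Any result is only valid at the moment of the read, so an
unpinned page that moves afterwards leaves the caller with a stale
address.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE     # pagemap is indexed by base page size
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number
PRESENT_BIT = 1 << 63         # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into (present, pfn)."""
    return bool(entry & PRESENT_BIT), entry & PFN_MASK


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Return the physical address for vaddr, or None if it cannot be resolved."""
    with open(pagemap_path, "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)
        raw = f.read(8)
    if len(raw) != 8:
        return None
    present, pfn = decode_pagemap_entry(struct.unpack("=Q", raw)[0])
    if not present or pfn == 0:
        # Not resident, or the kernel zeroed the PFN for this user.
        return None
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)
```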
> Note that this
> probably means that using uio on recent kernels is subtly
> broken and cannot be supported going forward because there
> is no uio mechanism to pin the memory.
> The first open question I have is whether DPDK should allow
> uio at all on recent (4.x) kernels. My current understanding
> is that there is no way to pin memory and hugepages can now
> be moved around, so uio would be unsafe. What does the
> community think here?
> My second question is whether the user should be allowed to
> mix uio and vfio usage simultaneously. For vfio, the
> physical addresses are really DMA addresses and are best
> when arbitrarily chosen to appear sequential relative to
> their virtual addresses.
Why "sequential relative to their virtual addresses"? The IOMMU table
holds the DMA address -> physical address mapping, so wouldn't we need
DMA addresses that are "sequential relative to their physical
addresses"? And based on your analysis above of how hugepages are
initialized, isn't the virtual address a good candidate for the DMA
address?
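
To illustrate what is being asked, here is a small sketch (Python, with
made-up addresses and a hypothetical helper name) that checks whether a
virtually contiguous buffer is also contiguous in DMA address space
under a given virtual -> DMA table. It assumes 2 MB hugepages and that
the table is keyed by hugepage-aligned virtual addresses.

```python
HUGEPAGE = 2 * 1024 * 1024  # assumed hugepage size


def iova_contiguous(vaddr, length, virt_to_iova, page_size=HUGEPAGE):
    """True if [vaddr, vaddr + length) maps to one unbroken DMA address range."""
    first = vaddr - vaddr % page_size
    end = vaddr + length - 1
    last = end - end % page_size
    expected = virt_to_iova.get(first)
    if expected is None:
        return False
    for page in range(first, last + page_size, page_size):
        if virt_to_iova.get(page) != expected:
            return False
        expected += page_size
    return True


# Two virtually adjacent hugepages (hypothetical addresses).
V0 = 0x7F0000000000
V1 = V0 + HUGEPAGE

# DMA address = physical address, with the pages scattered in RAM.
phys_as_iova = {V0: 0x140000000, V1: 0x80000000}
# DMA address = virtual address.
virt_as_iova = {V0: V0, V1: V1}

# An 8 KB buffer straddling the boundary between the two hugepages.
buf, buf_len = V1 - 4096, 8192
print(iova_contiguous(buf, buf_len, phys_as_iova))
print(iova_contiguous(buf, buf_len, virt_as_iova))
```

In this toy case the buffer is usable as a single DMA region only when
the DMA addresses follow the virtual layout, whatever the physical
placement of the pages is.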