[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

Michael S. Tsirkin mst at redhat.com
Thu Oct 1 17:22:51 CEST 2015


On Thu, Oct 01, 2015 at 07:50:37AM -0700, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 11:33:06 +0300
> "Michael S. Tsirkin" <mst at redhat.com> wrote:
> 
> > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > This driver allows using PCI device with Message Signalled Interrupt
> > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > file descriptors similar to VFIO.
> > >
> > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > have to work in environments where IOMMU support (real or emulated) is
> > > not available.  All UIO drivers that support DMA are not secure against
> > > rogue userspace applications programming DMA hardware to access
> > > private memory; this driver is no less secure than existing code.
> > > 
> > > Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> > 
> > I don't think copying the igb_uio interface is a good idea.
> > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > is abusing the sysfs BAR access to provide unlimited
> > access to hardware.
> > 
> > MSI messages are memory writes so any generic device capable
> > of MSI is capable of corrupting kernel memory.
> > This means that a bug in userspace will lead to kernel memory corruption
> > and crashes.  This is something distributions can't support.
> > 
> > uio_pci_generic is already abused like that, mostly
> > because when I wrote it, I didn't add enough protections
> > against using it with DMA capable devices,
> > and we can't go back and break working userspace.
> > > But at least it does not bind to VFs, all of which
> > > are capable of DMA.
> > 
> > The result of merging this driver will be userspace abusing the
> > sysfs BAR access with VFs as well, and we do not want that.
> > 
> > 
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
> > 
> > On a more positive note:
> > 
> > What would be a reasonable interface? One that does the following
> > in kernel:
> > 
> > 1. initializes device rings (can be in pinned userspace memory,
> >    but can not be writeable by userspace), brings up interface link
> > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > > 3. gets a request, makes sure it's valid and belongs to
> > >    the correct task, puts it in the ring
> > 4. in the reverse direction, notify userspace when buffers
> >    are available in the ring
> > 5. notify userspace about MSI (what this driver does)
> > 
> > What userspace can be allowed to do:
> > 
> > 	format requests (e.g. transmit, receive) in userspace
> > 	read ring contents
> > 
> > What userspace can't be allowed to do:
> > 
> > 	access BAR
> > 	write rings
> > 
> > 
> > This means that the driver can not be a generic one,
> > and there will be a system call overhead when you
> > write the ring, but that's the price you have to
> > pay for ability to run on systems without an IOMMU.
> 
> I think I understand what you are proposing, but it really doesn't
> fit into the high speed userspace networking model.

I'm aware that the current model does everything in userspace,
including bringing up the link.
But there's really no justification for this.
Only data path operations should be in userspace.

A userspace bug should not be able to do things like overwriting the
on-device EEPROM.


> 1. Device rings are device specific, can't be in a generic driver.

So that's more work, and it is not going to happen if people
can get by with insecure hacks.

> 2. DPDK uses huge memory.

Hugetlbfs? Don't see why this is an issue. Might make things simpler.

> 3. Performance requires all ring requests be done in pure userspace
>    (i.e. no syscalls)

Make only the TX ring writeable then. At least you won't be
able to corrupt the kernel memory.

> 4. Ditto, can't have kernel to userspace notification per packet

RX ring can be read-only, so userspace can read it directly.
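
To illustrate the read-only ring idea, here is a minimal sketch in C. It is
not driver code: a temporary file stands in for the ring memory, the writable
mapping plays the role of the kernel side, and the PROT_READ mapping is what
an application would be handed. The descriptor layout and the names
(struct desc, ring_views_open, ring_post, ring_views_close) are hypothetical
and chosen only for this example; a real ring layout is device specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define RING_SLOTS 8

/* Hypothetical descriptor layout; real rings are device specific. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

struct ring_views {
    struct desc *producer;        /* read-write view (stand-in for the kernel side) */
    const struct desc *consumer;  /* read-only view handed to the application */
    size_t bytes;
};

/* Map one backing object twice: once writable, once read-only. */
static int ring_views_open(struct ring_views *v)
{
    FILE *f = tmpfile();          /* stand-in for the ring memory */
    if (f == NULL)
        return -1;

    int fd = fileno(f);
    v->bytes = RING_SLOTS * sizeof(struct desc);
    if (ftruncate(fd, (off_t)v->bytes) != 0) {
        fclose(f);
        return -1;
    }

    void *rw = mmap(NULL, v->bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *ro = mmap(NULL, v->bytes, PROT_READ, MAP_SHARED, fd, 0);
    fclose(f);                    /* existing mappings stay valid after close */

    if (rw == MAP_FAILED || ro == MAP_FAILED) {
        if (rw != MAP_FAILED)
            munmap(rw, v->bytes);
        if (ro != MAP_FAILED)
            munmap(ro, v->bytes);
        return -1;
    }

    v->producer = rw;
    v->consumer = ro;
    return 0;
}

/* Producer side: publish a descriptor into a slot. */
static void ring_post(struct ring_views *v, unsigned slot,
                      uint64_t addr, uint32_t len)
{
    assert(slot < RING_SLOTS);
    v->producer[slot].addr = addr;
    v->producer[slot].len = len;
    v->producer[slot].flags = 1;  /* hypothetical "descriptor valid" bit */
}

/* Unmap both views; returns 0 on success, -1 if either munmap failed. */
static int ring_views_close(struct ring_views *v)
{
    int rc = 0;
    if (munmap(v->producer, v->bytes) != 0)
        rc = -1;
    if (munmap((void *)v->consumer, v->bytes) != 0)
        rc = -1;
    return rc;
}
```

After ring_views_open() succeeds, anything posted with ring_post() is visible
through v.consumer with plain loads and no system call. A store through the
consumer mapping is rejected by the compiler because of the const qualifier,
and would fault at run time if the qualifier were cast away, since the pages
are mapped PROT_READ.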

-- 
MST

