[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

Avi Kivity avi at scylladb.com
Thu Oct 1 13:20:37 CEST 2015



On 10/01/2015 02:09 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 01:50:10PM +0300, Avi Kivity wrote:
>>>> It's not just the lack of system calls, of course, the architecture is
>>>> completely different.
>>> Absolutely - I'm not saying move all of DPDK into the kernel.
>>> We just need to protect the RX rings so hardware does
>>> not corrupt kernel memory.
>>>
>>>
>>> Thinking about it some more, many devices
>>> have separate rings for DMA: TX (device reads memory)
>>> and RX (device writes memory).
>>> With such devices, a mode where userspace can write TX ring
>>> but not RX ring might make sense.
>> I'm sure you can cause havoc just by reading, if you read from I/O memory.
> Not talking about I/O memory here. These are device rings in RAM.

Right.  But you program them with DMA addresses, so the device can read 
another device's memory.
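A minimal sketch of the split-permission mode discussed above, under stated assumptions. The kernel owns the RX ring and accepts batches of buffer addresses from userspace, posting only those that lie entirely inside memory already set aside for the process's DMA. Userspace keeps direct write access to the TX ring. The names `DmaRegion`, `RxRingProxy` and `post_rx_buffers` are hypothetical and do not correspond to any existing uio, VFIO or DPDK interface, and the code only simulates the check in userspace. As noted above, this guards against device writes only: TX descriptors still carry addresses the device will read from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaRegion:
    """Hypothetical range of bus addresses the process may expose to the device."""
    start: int
    length: int

    def contains(self, addr: int, size: int) -> bool:
        # The whole buffer [addr, addr + size) must lie inside the region.
        return size > 0 and self.start <= addr and addr + size <= self.start + self.length


class RxRingProxy:
    """Hypothetical kernel-side proxy: userspace cannot write RX descriptors
    directly, so every buffer the device will write into is checked first."""

    def __init__(self, regions: list[DmaRegion], ring_size: int) -> None:
        self.regions = regions
        self.ring_size = ring_size
        self.ring: list[tuple[int, int]] = []  # posted (addr, size) descriptors

    def post_rx_buffers(self, batch: list[tuple[int, int]]) -> int:
        """Validate a whole batch (one simulated system call), then post it.

        The entire batch is rejected if any buffer falls outside the allowed
        regions, so a buggy caller cannot aim device writes at other memory.
        Returns the number of descriptors posted."""
        if len(self.ring) + len(batch) > self.ring_size:
            raise OverflowError("RX ring full")
        for addr, size in batch:
            if not any(r.contains(addr, size) for r in self.regions):
                raise PermissionError(f"buffer {addr:#x}+{size} outside DMA regions")
        self.ring.extend(batch)
        return len(batch)
```

For example, with one hypothetical 64 KiB region at `0x100000`, posting `[(0x100000, 2048), (0x100800, 2048)]` succeeds and returns 2, while posting `[(0x0, 2048)]` raises `PermissionError` and leaves the ring unchanged. Validating before posting anything keeps a rejected batch from leaving the ring half-filled.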

>>> This will mean userspace might read kernel memory
>>> through the device, but cannot corrupt it.
>>>
>>> That's already a big win!
>>>
>>> And RX buffers do not have to be added one at a time.
>>> If we assume 0.2usec per system call, batching some 100 buffers per
>>> system call gives you 2 nanoseconds of overhead per buffer.  That seems quite
>>> reasonable.
>> You're ignoring the page table walk
> Some caching strategy might work here.

It may, or it may not.  I'm not against this.  I'm against blocking 
users' access to their hardware, using an existing, established 
interface, for a small subset of setups.  It doesn't help you in any way 
(you can still get reports of oopses due to buggy userspace drivers on 
physical machines, or on virtual machines that don't require 
interrupts), and it harms them.
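The batching estimate and the objection to it can be put side by side in a small cost model. The system-call cost is paid once per batch and is amortized; the page table walk and other per-descriptor processing are paid for every buffer and are not. `per_descriptor_ns` is a hypothetical parameter standing for that unamortized work; the thread gives no measured value for it.

```python
def per_buffer_overhead_ns(syscall_ns: float, batch_size: int,
                           per_descriptor_ns: float = 0.0) -> float:
    """Cost per RX buffer when buffers are posted in batches via a system call.

    syscall_ns is paid once per batch, so it is divided across the batch;
    per_descriptor_ns (page table walk, descriptor checks) is paid per buffer."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return syscall_ns / batch_size + per_descriptor_ns


# Figures from the thread: 0.2 usec (200 ns) per system call, ~100 buffers per call.
print(per_buffer_overhead_ns(200.0, 100))  # 2.0
```

Whatever the per-descriptor cost turns out to be, it is added in full on top of the 2 ns, so the batching figure alone does not settle the question; a cache of validated translations would aim at reducing that second term.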

>> and other per-descriptor processing.
> You probably can let userspace pre-format it all,
> just validate addresses.

You have to figure out if the descriptor contains an address or not 
(many devices have several descriptor formats, some with addresses and 
some without, which are intermixed).  You also have to parse the 
descriptor size and see if it crosses a page boundary or not.
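The per-descriptor work described above can be sketched as follows, assuming a made-up 16-byte descriptor layout that does not belong to any real NIC: a 1-byte type, a pad byte, a 2-byte length, 4 pad bytes and an 8-byte address. In this invented format only `DESC_TYPE_DATA` carries a buffer address, while `DESC_TYPE_CONTEXT` carries none. A 4 KiB page size is also assumed. The validator first decides whether the descriptor holds an address at all, then checks every page the buffer touches, because a buffer that crosses a page boundary needs each page checked, not just the one holding its start address.

```python
import struct

PAGE_SIZE = 4096                        # assumed page size
DESC_FMT = "<BxH4xQ"                    # hypothetical: type, pad, length, pad, address
DESC_SIZE = struct.calcsize(DESC_FMT)   # 16 bytes

DESC_TYPE_DATA = 0x01     # hypothetical format that carries a buffer address
DESC_TYPE_CONTEXT = 0x02  # hypothetical offload-context format, no address


def pages_spanned(addr: int, length: int) -> range:
    """Page frame numbers touched by the buffer [addr, addr + length)."""
    first = addr // PAGE_SIZE
    last = (addr + length - 1) // PAGE_SIZE
    return range(first, last + 1)


def validate_descriptor(raw: bytes, allowed_pages: set[int]) -> bool:
    """Return True if this (hypothetical) descriptor is safe to hand to the device."""
    dtype, length, addr = struct.unpack(DESC_FMT, raw)
    if dtype == DESC_TYPE_CONTEXT:
        return True   # this format carries no address, nothing to check
    if dtype != DESC_TYPE_DATA or length == 0:
        return False  # unknown format or empty buffer: reject
    # Check every page the buffer touches, including any it spills into.
    return all(pfn in allowed_pages for pfn in pages_spanned(addr, length))
```

With `allowed_pages = {0x100, 0x101}`, a 2048-byte data buffer at `0x100C00` crosses into page `0x101` and passes, while the same buffer at `0x101C00` spills into page `0x102` and is rejected.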

>
>> Again^2, maybe this can work.  But it shouldn't block a patch enabling
>> interrupt support for VFs.  After the ring proxy is available and proven for
>> a few years, we can deprecate bus mastering from uio, and after a few more
>> years remove it.
> We are talking about DPDK patches posted in June 2015.  It's not some
> software proven for years.

dpdk has been used for years; it just won't work on VFs if you need 
interrupt support.

>    If Linux keeps enabling hacks, no one will
> bother doing the right thing.  Upstream inclusion is the only carrot
> Linux has to make people do the right thing.

It's not a carrot, it's a stick.  Implementing your scheme will take a 
huge effort, is not guaranteed to provide the performance needed, and 
will not be available for years.  Meanwhile exactly the same thing on 
physical machines is supported.

People will just use out-of-tree drivers (dpdk has several already).  
It's a pain, but nowhere near what you are proposing.


