[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

Avi Kivity avi at scylladb.com
Wed Sep 30 17:36:17 CEST 2015


On 09/30/2015 06:21 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 05:53:54PM +0300, Avi Kivity wrote:
>> On 09/30/2015 05:39 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 04:05:40PM +0300, Avi Kivity wrote:
>>>> On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>>>>>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>>>>>>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>>>>>>>> The whole idea is to bypass the kernel.  Especially for networking...
>>>>>>>>> ... on dumb hardware that doesn't support doing that securely.
>>>>>>>> On a very capable HW that supports whatever security requirements needed
>>>>>>>> (e.g. 82599 Intel's SR-IOV VF devices).
>>>>>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>>>>>> otherwise you would just use e.g. VFIO.
>>>>>> Sorry, but I don't follow your logic here - the Amazon EC2 environment is
>>>>>> an example where there *is* an iommu but it's not virtualized, and thus
>>>>>> VFIO is useless; yet there is an option to use a directly assigned SR-IOV
>>>>>> networking device there, where using the kernel drivers imposes a
>>>>>> performance impact compared to a UIO-based user space kernel-bypass mode
>>>>>> of usage.  How is that irrelevant?  Could you please clarify your point?
>>>>>>
>>>>> So it's not even dumb hardware, it's another piece of software
>>>>> that forces an "all or nothing" approach where the device either
>>>>> has access to all VM memory, or to none.
>>>>> And this, unfortunately, leaves you with no secure way to
>>>>> allow userspace drivers.
>>>> Some setups don't need security (they are single-user, single-application),
>>>> but they do need a lot of performance (like 5X-10X performance).  An
>>>> example is OpenVSwitch: security doesn't help it at all, and if you force
>>>> it to use the kernel drivers you cripple it.
>>> We'd have to see there are actual users that need this.  So far, dpdk
>>> seems like the only one,
>> dpdk is a whole class of users.  It's not a specific application.
>>
>>>   and it wants to use UIO for slow path stuff
>>> like polling link status.  Why this needs kernel bypass support, I don't
>>> know.  I asked, and got no answer.
>> First, it's more than link status.  dpdk also has an interrupt mode, which
>> applications can fall back to when the load is light in order to save
>> power (and in order not to get support calls about 100% cpu when idle).
> Aha, looks like it appeared in June. Interesting, thanks for the info.
>
>> Even for link status, you don't want to poll for that, because accessing
>> device registers is expensive.  An interrupt is the best approach for rare
>> events like link state changes.
> Yea, but you probably can get by with a timer for that, even if it's ugly.

Maybe you can, but (a) why increase link status change detection 
latency, and (b) since June, link status change detection is not the 
only user of the feature - the interrupt-mode fallback sketched below is 
another.
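
To be concrete, here is roughly what that fallback looks like with the 
rx interrupt API that went in in June - a minimal sketch along the lines 
of the l3fwd-power example, with made-up port/queue ids:

#include <rte_ethdev.h>
#include <rte_interrupts.h>

/* Park the lcore on an interrupt instead of spinning at 100% cpu. */
static void wait_when_idle(uint8_t port, uint16_t queue)
{
    struct rte_epoll_event ev;

    /* Register the rx queue interrupt with this lcore's epoll set. */
    rte_eth_dev_rx_intr_ctl_q(port, queue, RTE_EPOLL_PER_THREAD,
                              RTE_INTR_EVENT_ADD, NULL);

    /* Arm the interrupt and sleep until traffic shows up (a real app
     * re-checks the rx queue after arming to close the race) ... */
    rte_eth_dev_rx_intr_enable(port, queue);
    rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, -1);

    /* ... then disarm and go back to busy polling. */
    rte_eth_dev_rx_intr_disable(port, queue);
}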

>>>> Also, I'm root.  I can do anything I like, including loading a patched
>>>> uio_pci_generic.  You're not providing _any_ security, you're simply making
>>>> life harder for users.
>>> Maybe that's true on your system. But I guess you know that's not true
>>> for everyone, not in 2015.
>> Why is it not true?  If I'm root, I can do anything I like to my
>> system, and everyone is root in 2015.  I can access the BARs directly
>> and program DMA, so how am I more secure by uio not allowing me to set
>> up msix?
> That's not the point.  The point always was that using uio for these
> devices (capable of DMA, in particular of msix) isn't possible in a
> secure way.

uio is used today for DMA-capable devices.  Some users are perfectly 
willing to give up security for functionality (that's all users who have 
root access to their machines, not just uio users).  You aren't adding 
any security by disallowing uio, you're just removing functionality.

As it happens, you're removing the functionality from the users who have 
no other option.  They can't use vfio because it doesn't work on 
virtualized setups.

(Note that even on a setup that does support vfio, high-performance 
users will want to avoid it.)

>   And yes, if the same device happens to also do interrupts, UIO
> does not reject it as it probably should, and we can't change this
> without breaking some working setups.  But this doesn't mean we should
> add more setups like this that we'll then be forced to maintain.

uio_pci_generic is maybe the driver with the lowest maintenance burden 
in the entire kernel.  One driver supporting all pci devices, if you 
don't need msi/msix.  And with the patch, it will be one driver 
supporting all pci devices; the bind flow sketched below is all the 
per-device work there is.
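
(A minimal sketch of that bind flow, in C for concreteness - it's just 
sysfs writes, so echoing as root works equally well.  The 8086 10ed id 
is the 82599 VF; the device address here is made up:)

#include <stdio.h>

static void write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");

    if (f) {
        fputs(val, f);
        fclose(f);
    }
}

int main(void)
{
    /* Teach uio_pci_generic the VF's vendor:device id ... */
    write_str("/sys/bus/pci/drivers/uio_pci_generic/new_id", "8086 10ed");
    /* ... detach whatever kernel driver currently owns the VF ... */
    write_str("/sys/bus/pci/devices/0000:00:04.0/driver/unbind",
              "0000:00:04.0");
    /* ... and hand it to uio_pci_generic. */
    write_str("/sys/bus/pci/drivers/uio_pci_generic/bind", "0000:00:04.0");
    return 0;
}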

I don't really understand the tradeoff.  By rejecting the patch you're 
denying users the ability to use their devices, except through the much 
slower kernel drivers.  The patch would not allow a non-root user to do 
ANYTHING.  Root can already do anything.  So what security issue is there?
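
To make "root can already do anything" concrete: root can mmap a BAR 
straight out of sysfs and poke device registers with no uio involved at 
all.  A minimal sketch (made-up device address, and assuming BAR0 is a 
mmapable memory BAR):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource0", O_RDWR);
    volatile uint32_t *bar;

    if (fd < 0)
        return 1;

    /* Map the first 4K of BAR0; writes here can program DMA directly. */
    bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED)
        return 1;

    (void)bar[0];               /* read a device register */

    munmap((void *)bar, 4096);
    close(fd);
    return 0;
}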

>
>
>> Non-root users are already secured by their inability to load the module,
>> and by the device permissions.
>>
>>>>> So it makes even less sense to add insecure work-arounds in the kernel.
>>>>> It seems quite likely that by the time the new kernel reaches
>>>>> production X years from now, EC2 will have a virtual iommu.
>>>> I can adopt a new kernel tomorrow.  I have no influence on EC2.
>>>>
>>>>
>>> Xen grant tables sound like they could be the right interface
>>> for EC2.  A google search for "grant tables iommu" immediately gives me:
>>> http://lists.xenproject.org/archives/html/xen-devel/2014-04/msg00963.html
>>> Maybe latest Xen is already doing the right thing, and it's just a
>>> question of making VFIO use that.
>>>
>> grant tables only work for virtual devices, not physical devices.
> Why not? That's what the patches above seem to do.
>

Oh, I think those are for emulating transient iommu maps (a new map for 
every request) on top of a real iommu.  The dpdk use case is permanently 
mapping a large chunk of guest userspace, and I don't think Xen exposes 
enough grant table entries for that.

In addition, that leaves users of kvm, vmware, older Xen, or bare metal 
machines without iommus out in the cold; and bare metal users who want 
the iommu off for performance are forced to turn it on.  And for what, 
to prevent root from touching memory via DMA that it can access in a 
million other ways?


