[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMU driver mode

Thomas Monjalon thomas.monjalon at 6wind.com
Tue Jan 26 14:00:52 CET 2016


2016-01-26 15:56, Santosh Shukla:
> On Mon, Jan 25, 2016 at 8:59 PM, Thomas Monjalon
> <thomas.monjalon at 6wind.com> wrote:
> > 2016-01-21 22:47, Santosh Shukla:
> >> On Thu, Jan 21, 2016 at 8:16 PM, Thomas Monjalon
> >> <thomas.monjalon at 6wind.com> wrote:
> >> > 2016-01-21 17:34, Santosh Shukla:
> >> >> On Thu, Jan 21, 2016 at 4:58 PM, Thomas Monjalon
> >> >> <thomas.monjalon at 6wind.com> wrote:
> >> >> > 2016-01-21 16:43, Santosh Shukla:
> >> >> >> David Marchand <david.marchand at 6wind.com> wrote:
> >> >> >> > This is a mode (specific to vfio), not a new kernel driver.
> >> >> >> >
> >> >> >> Yes, it is specific to VFIO, and that is why noiommu is appended
> >> >> >> after vfio, i.e. __VFIO and __VFIO_NOIOMMU.
> >> >> >
> >> >> > Woaaa! Your logic is really disappointing :)
> >> >> > Specific to VFIO => append _NOIOMMU
> >> >> > If it's for VFIO, it should be called VFIO (that's my logic).
> >> >> >
> >> >> I am confused by your comment. vfio works with the default IOMMU
> >> >> and now with noiommu. drv->kdrv needs to carry the driver mode for
> >> >> the vfio case, so that the user can simply read the drv->kdrv value
> >> >> in their driver and accordingly use the vfio rd/wr APIs, for example
> >> >> {pread/pwrite}. This is how the rte_eal_pci_vfio_read/write_bar()
> >> >> APIs are implemented.
> >> >
> >> > Sorry, I don't understand. Why should the EAL read/write functions
> >> > differ depending on the VFIO mode?
> >>
> >> No, the EAL rd/wr functions are not different for the two vfio modes
> >> {same for iommu or noiommu}. Please see the pci_eal_read/write_bar()
> >> APIs. Those APIs are currently used for VFIO, irrespective of the vfio
> >> mode. If required, we can add UIO bar rd/wr APIs too. pci_eal_rd/wr_bar()
> >> are abstract APIs; the underlying implementation can be of vfio or uio
> >> type.
> >
> > So you agree the suffix _NOIOMMU is not needed?
> > It seems we go nowhere in this discussion. You said
> > "drv->kdrv need to know driver mode for vfio"
> 
> In my observation, virtio currently works with vfio-noiommu; that is
> why I said drv->kdrv needs to know the vfio mode.

It is your observation. It may change in the near future.

> > and after
> > "Those apis currently used for VFIO, Irrespective of vfio mode"
> > That's why I assume your first assumption was wrong.
> >
> 
> The newly introduced DPDK global API pci_eal_rd/wr_bar() can be used
> for both vfio and uio, and for vfio both with and without an IOMMU.
> 
> >> >> > Why do we care to parse noiommu only?
> >> >>
> >> >> Because PMD drivers such as virtio can work with vfio only in
> >> >> _noiommu_ mode, in particular virtio spec 0.95 / legacy virtio.
> >> >
> >> > Please could you explain the limitation (except IOMMU availability)?
> >>
> >> Ok.
> >>
> >> I believe we both agree that noiommu mode is needed for PMD drivers
> >> like virtio, right? If so, the other reason is implementation driven,
> >
> > No, noiommu is needed for environments having no IOMMU.
> > But in my understanding, virtio could run with a nested IOMMU.
> 
> Interesting; I would like to understand the nested case. I tried in the
> past by passing "iommu=pt intel_iommu=on kvm-intel.nested=1" on the
> cmdline for x86 (for both guest and host), but binding the virtio pci
> device to the vfio-pci driver fails. I tried on a 4.2 kernel (qemu
> version 2.5); does it work for a >4.2 kernel / qemu version?

I haven't tried.

> >> i.e.:
> >>
> >> Please look at the VIRTIO_RD/WR_1/2/4() implementation in
> >> virtio_pci.c in this patch. Those macros are in use and applicable to
> >> virtio spec 0.95, and so far support uio/ioport-style rd/wr. Now, to
> >> support vfio-style rd/wr, we need to check the drv->kdrv value, and
> >> that value should be of the vfio_noiommu type, __not__ the generic
> >> vfio type.
> >
> > I still don't understand why it would not work with VFIO w/IOMMU.
> 
> With vfio+iommu, binding the virtio pci device to the vfio-pci driver
> fails, giving the error below:
> [   53.053464] VFIO - User Level meta-driver version: 0.3
> [   73.077805] vfio-pci: probe of 0000:00:03.0 failed with error -22
> [   73.077852] vfio-pci: probe of 0000:00:03.0 failed with error -22
> 
> vfio_pci_probe() --> vfio_iommu_group_get() --> iommu_group_get()
> fails: the iommu doesn't have a group for the virtio pci device.

Yes it fails when binding.
So the later check in the virtio PMD is useless.

> In the noiommu case, it prepares the group / adds the device to the
> iommu group, so it passes.
> 
> In another thread, Jason mentioned that he is working on a virtio+iommu
> approach [1]. The patches are not merged and I have yet to evaluate
> them for the virtio PMD driver with iommu(+vfio), so I am wondering how
> the virtio pci device could work unless Jason's patches are used.
> 
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg337079.html

I haven't tried a nested IOMMU.
This whole thread was about the kernel module in use, i.e. VFIO.
We are saying that virtio could work in both VFIO modes.
Furthermore, restricting virtio to no-iommu mode doesn't bring
any improvement.
That's why I suggest keeping the initial semantics of kdrv and
not polluting them with VFIO modes.
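[Editor's note: the semantics being defended here can be made concrete. Both VFIO modes bind the same kernel module, vfio-pci, so the driver-name-to-kdrv mapping cannot and need not distinguish them. The enum values below match the DPDK EAL of this era; the mapping helper is a simplified sketch of what the EAL does when it reads the `driver` symlink in sysfs.]

```c
#include <assert.h>
#include <string.h>

/* Kernel driver types as they stood before this patch: one value per
 * kernel module, with no per-mode variants. */
enum rte_kernel_driver {
	RTE_KDRV_UNKNOWN = 0,
	RTE_KDRV_IGB_UIO,
	RTE_KDRV_VFIO,        /* one value, with or without an IOMMU */
	RTE_KDRV_UIO_GENERIC,
	RTE_KDRV_NIC_UIO,
	RTE_KDRV_NONE,
};

/* Map a bound kernel driver name to a kdrv value. Both VFIO modes
 * report the same module name, "vfio-pci", so the mapping stays
 * mode-agnostic by construction. */
static enum rte_kernel_driver kdrv_from_name(const char *drv)
{
	if (strcmp(drv, "vfio-pci") == 0)
		return RTE_KDRV_VFIO;
	if (strcmp(drv, "igb_uio") == 0)
		return RTE_KDRV_IGB_UIO;
	if (strcmp(drv, "uio_pci_generic") == 0)
		return RTE_KDRV_UIO_GENERIC;
	return RTE_KDRV_UNKNOWN;
}
```

A PMD that really needs the mode would then query it separately (e.g. via pci_vfio_is_noiommu()) rather than reading it out of kdrv.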

> >> >> So at the initialization of such a PMD driver (e.g. virtio-net),
> >> >> the PMD driver should know whether vfio-with-noiommu mode is
> >> >> enabled or not; for that, the PMD driver simply checks the
> >> >> drv->kdrv value.
> >> >
> >> > If a check is needed, I would prefer using your function
> >> > pci_vfio_is_noiommu() and remove driver modes from struct rte_kernel_driver.
> >>
> >> I don't think calling pci_vfio_is_noiommu() inside the
> >> virtio_reg_rd/wr_1/2/4() functions would be a good idea.
> >
> > Why? The value may be cached in the priv properties.
> >
> pci_vfio_is_noiommu() parses /sys for:
> - the enable_noiommu param
> - whether the attached driver name is vfio-noiommu or not.
> 
> It does file operations for that; I meant to say that calling this API
> within the register rd/wr functions is not correct. It would be better
> if those low-level register rd/wr APIs only checked driver types.

Yes, that's why I said the return value of pci_vfio_is_noiommu() may be
cached to preserve efficiency.
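[Editor's note: the caching pattern suggested here can be sketched as follows. The sysfs probe is done once and its result reused, so the hot register rd/wr paths never do file I/O. `vfio_noiommu_cached()` and its path parameter are hypothetical names; the real probe is pci_vfio_is_noiommu() as described above.]

```c
#include <assert.h>
#include <stdio.h>

/* Parse a one-character kernel bool parameter ("Y"/"N" or "1"/"0"). */
static int parse_noiommu_param(FILE *f)
{
	int c;

	if (f == NULL || (c = fgetc(f)) == EOF)
		return 0;
	return c == 'Y' || c == '1';
}

/* Probe the sysfs parameter on the first call only; every later call
 * returns the cached value without touching the filesystem. */
static int vfio_noiommu_cached(const char *sysfs_path)
{
	static int cached = -1; /* -1 = not probed yet */

	if (cached < 0) {
		FILE *f = fopen(sysfs_path, "r");

		cached = parse_noiommu_param(f);
		if (f != NULL)
			fclose(f);
	}
	return cached;
}
```

With this shape, a PMD can keep the result in its private data at init time and the low-level register accessors pay only for an integer compare.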

