[dpdk-dev] [PATCH] igb_uio: map dummy dma forcing iommu domain attachment

Alejandro Lucero alejandro.lucero at netronome.com
Mon Feb 13 14:31:05 CET 2017


On Fri, Feb 10, 2017 at 7:03 PM, Ferruh Yigit <ferruh.yigit at intel.com>
wrote:

> On 2/8/2017 11:54 AM, Alejandro Lucero wrote:
> > Hi Ferruh,
> >
> > On Tue, Feb 7, 2017 at 3:59 PM, Ferruh Yigit <ferruh.yigit at intel.com>
> > wrote:
> >
> >     Hi Alejandro,
> >
> >     On 1/18/2017 12:27 PM, Alejandro Lucero wrote:
> >     > Using a DPDK app when the IOMMU is enabled requires adding
> >     > iommu=pt to the kernel command line. But using the igb_uio driver
> >     > causes DMAR errors because the device has no IOMMU domain.
> >
> >     Please help me understand the scope of the problem:
> >
> >
> > After reading your reply, I realize I could have explained it better.
> > First of all, this is related to SR-IOV, specifically to when the VFs
> > are created.
> >
> >
> >     1- How can you reproduce the problem?
> >
> >
> > By using a VF from an Intel card with a DPDK app on the host and a
> > kernel >= 3.15. Although VFs are usually assigned to VMs, using VFs on
> > the host is also an option.
> >
> > BTW, I did not try to reproduce the problem with an Intel card. I
> > triggered this problem with an NFP, but given the underlying cause, I
> > bet that it is going to happen with an Intel one as well.
>
> I am able to reproduce the problem with ixgbe, by using a VF on the host.
>
> And I verified that your patch fixes it; it causes the device to be
> attached to a vfio group.
>
> So I believe it is good to get this patch, but it is already too late for
> the 17.02 release.
> I suggest getting this one in early in 17.05, so there is more time to
> test.
>
>
Ok.


> >
> >
> >
> >     2- What happens when you get DMAR errors? Does it prevent the device
> >     from working, or is it just some annoying error messages?
> >
> >
> > A DMAR error implies the device cannot access the DMA address given
> > by the host. I have experienced several situations where it just means
> > that device is not able to work at all, but it can also have more global
> > implications, and you need to reboot the system because it has become
> > unreliable. I think it depends on how these DMAR errors are handled, but
> > in any case, this is a bad thing.
>
> In my test, the implication was that the device was not working.
>
>
Yes, the device certainly cannot work, as it has no IOMMU domain to work
with.
But sometimes the device is left in such a bad state that creating a domain
and attaching the device to it does not help, and only rebooting the system
does.


> >
> >
> >
> >     3- Can you please share the error messages?
> >
> >
> > With this problem you can expect something like this:
> >
> > [ 559.163874] DMAR: DRHD: handling fault status reg 2
> > [ 559.165427] DMAR: DMAR:[DMA Read] Request device [82:08.0] fault addr e7b73b000
> > [ 559.165427] DMAR:[fault reason 02] Present bit in context entry is clear
> > [ 568.367417] DMAR: DRHD: handling fault status reg 102
> > [ 568.369025] DMAR: DMAR:[DMA Read] Request device [82:08.1] fault addr ebb73b000
> > [ 568.369025] DMAR:[fault reason 02] Present bit in context entry is clear
> > [ 571.773944] DMAR: DRHD: handling fault status reg 202
> > [ 571.775550] DMAR: DMAR:[DMA Read] Request device [82:08.2] fault addr efb73b000
> > [ 571.775550] DMAR:[fault reason 02] Present bit in context entry is clear
> > [ 575.039654] DMAR: DRHD: handling fault status reg 302
> > [ 575.041259] DMAR: DMAR:[DMA Read] Request device [82:08.3] fault addr f3b73b000
> > [ 575.041259] DMAR:[fault reason 02] Present bit in context entry is clear
> >
> > There are different DMAR errors, sometimes referring to a specific
> > address being wrong. In this case it is related to the device not having
> > a context or an IOMMU domain.
> >
> > Also note we got these errors for different devices/VFs. This was with a
> > DPDK app using several VFs.
> >
> >
> >
> >
> >     >
> >     > Since kernel 3.15, iommu=pt requires using the internal kernel
> >     > DMA API to attach the device to the IOMMU 1:1 mapping, aka
> >     > si_domain. Previous versions attached the device to that
> >     > domain when the Intel IOMMU notifier was called.
> >
> >     Again, what is not working since 3.15?
> >
> >
> > This specific case, yes. With older kernels, when VFs are created, IOMMU
> > code is executed (notifier chain callback) and if iommu=pt, the VF is
> > attached to the si_domain, which is the 1:1 mapping. But this has changed
> > with newer kernels, and after VFs are created they have no IOMMU domain
> > at all. The kernel expects the driver to implicitly create such a domain
> > when the kernel DMA API is used.
>
> Thanks again for the clarification.
> What will be the effect of your patch on kernels < 3.15? Should your
> update be protected with a kernel version check, or is it safe for all?
>
>
It is harmless. If the device already has an IOMMU domain, as with iommu=pt
on older kernels, the allocation changes nothing. Note that DPDK apps do not
use the kernel DMA API at all; this allocation is just for forcing the
binding to an IOMMU domain if there is no such binding yet.
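
A minimal user-space sketch for checking the result, not part of the patch.
It assumes the sysfs link /sys/bus/pci/devices/<BDF>/iommu_group is an
acceptable indicator; that link shows IOMMU group membership, which is not
the same thing as domain attachment. The helper names and the 0000 PCI
domain prefix used in the example are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/sys/bus/pci/devices/<bdf>/iommu_group" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 * (Hypothetical helper, for illustration only.) */
static int iommu_group_path(const char *bdf, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen,
                     "/sys/bus/pci/devices/%s/iommu_group", bdf);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* True if the device currently has an iommu_group link in sysfs.
 * This reports group membership only; it is an assumption that this
 * is a useful proxy for the domain attachment discussed here. */
static bool pci_dev_has_iommu_group(const char *bdf)
{
    char path[128];

    if (iommu_group_path(bdf, path, sizeof(path)) != 0)
        return false;
    return access(path, F_OK) == 0;
}
```

For example, pci_dev_has_iommu_group("0000:82:08.0") could be called before
and after binding the VF to igb_uio to compare the two states.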

If the IOMMU is enabled but iommu=pt is not set, the allocation will create
an IOMMU domain specific to the device, but that will not prevent the DPDK
app from failing afterwards, because nobody will configure that domain
properly. So you need iommu=pt with igb_uio, or you can just use VFIO.
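
A small sketch for checking the first option from user space, assuming the
boot parameters have already been read from /proc/cmdline into a string.
cmdline_has_token is a hypothetical helper name, and the check only tells
you the parameter was passed, not that the IOMMU really is in pass-through
mode.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the whitespace-separated kernel command line contains
 * exactly the token `want`, for example "iommu=pt".
 * (Hypothetical helper, for illustration only.) */
static bool cmdline_has_token(const char *cmdline, const char *want)
{
    size_t want_len = strlen(want);
    const char *p = cmdline;

    while (*p != '\0') {
        /* Skip separators between parameters. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Length of the current parameter, up to the next separator. */
        size_t len = strcspn(p, " \t\n");

        /* Whole-token comparison, so "intel_iommu=on" or "iommu=ptx"
         * never match "iommu=pt". */
        if (len > 0 && len == want_len && strncmp(p, want, len) == 0)
            return true;
        p += len;
    }
    return false;
}
```

Matching whole tokens rather than using strstr() avoids a false positive on
a longer parameter that merely contains the same characters.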


> >
> >
> >
> >     >
> >     > This is not a problem if the driver later makes some call to the
> >     > DMA API, because the mapping can be done then. But DPDK apps do
> >     > not use that DMA API at all.
> >
> >     Is this the same as, or similar to:
> >     http://dpdk.org/dev/patchwork/patch/12654/
> >
> >
> > That case was another issue regarding IOMMU and iommu=pt. The problem
> > there was when you detach a VF from a VM, but the VF was initially
> > attached to the si_domain because the kernel did so. The patch helped to
> > attach the VF again to that domain when binding to the UIO.
> >
> > Looking at that patch now (I did comment on it then), it just solved the
> > problem if the VF was detached from the UIO, something that could be
> > easily forgotten or simply not done because, apparently, it is not
> > needed.
>
> I am also able to reproduce this case. When the driver is switched from
> igb_uio -> vfio_pci -> igb_uio, the device stops working, giving similar
> DMAR errors.
>
> Your patch also fixes this, at least in my test. When unbinding from
> vfio_pci, the iommu group is removed, but binding igb_uio adds it back.
>
>
> > What about using VFIO?
> >
> > With that previous patch, it was not enough. I do not remember the
> > details now, and I'm not sure if VFIO created another IOMMU domain if
> > the device had one, but it could leave the device without an IOMMU
> > domain after the first use.
> >
> > In this particular case, VFIO would work, because the device gets its
> > own IOMMU domain. But there are two main problems if this is not fixed
> > when using UIO:
> >
> > 1) UIO is one of the two options for working with IOMMU. We all agree
> > VFIO is the right one for IOMMU, but as long as UIO is still an option,
> > that should be fixed.
> >
> > 2) Some installations need to work with and without IOMMU. Having the
> > same module for both cases makes things simpler, and therefore they use
> > UIO instead of VFIO.
> >
> >
> >
> >     >
> >     > Doing this dma map and unmap is harmless even when iommu is not
> >     > enabled at all.
> >     >
> >     > Signed-off-by: Alejandro Lucero <alejandro.lucero at netronome.com>
>
> Tested-by: Ferruh Yigit <ferruh.yigit at intel.com>
>
> >     <...>
> >
> >     Thanks,
> >     ferruh
> >
> >
> >
>
>

