[dpdk-dev] [PATCH] igb_uio: map dummy dma forcing iommu domain attachment

Ferruh Yigit ferruh.yigit at intel.com
Fri Feb 10 20:03:17 CET 2017


On 2/8/2017 11:54 AM, Alejandro Lucero wrote:
> Hi Ferruh,
> 
> On Tue, Feb 7, 2017 at 3:59 PM, Ferruh Yigit <ferruh.yigit at intel.com> wrote:
> 
>     Hi Alejandro,
> 
>     On 1/18/2017 12:27 PM, Alejandro Lucero wrote:
>     > Using a DPDK app when the IOMMU is enabled requires adding
>     > iommu=pt to the kernel command line. But using the igb_uio driver
>     > then triggers DMAR errors because the device has no IOMMU domain.
> 
>     Please help to understand the scope of the problem,
> 
> 
> After reading your reply, I realize I could have explained it better.
> First of all, this is related to SRIOV, exactly when the VFs are created.
>  
> 
>     1- How can you re-produce the problem?
> 
> 
> Using a VF from an Intel card with a DPDK app in the host and a kernel >=
> 3.15. Although VFs are usually assigned to VMs, using VFs from the host
> is also an option.
> 
> BTW, I did not try to reproduce the problem with an Intel card. I
> triggered this problem with an NFP, but given the underlying problem, I
> bet it is going to happen with an Intel one as well.

I am able to reproduce the problem with ixgbe, by using a VF on the host.

And I verified that your patch fixes it; it causes the device to be
attached to a vfio group.

So I believe it is good to get this patch, but it is already too late for
the 17.02 release.
I suggest getting this one in early for 17.05, which gives more time to test.

> 
>  
> 
>     2- What happens when you get DMAR errors? Does it prevent the device
>     from working, or is it just some annoying error messages?
> 
> 
> A DMAR error implies the device cannot access the DMA address given
> by the host. I have experienced several situations where only that
> device stops working, but it can also have more global implications,
> where you need to reboot the system because it has become unreliable.
> I think it depends on how these DMAR errors are handled, but in any
> case, this is a bad thing.

In my test, the implication was that the device did not work.

>  
> 
> 
>     3- Can you please share the error messages?
> 
> 
> With this problem you can expect something like this:
> 
> [ 559.163874] DMAR: DRHD: handling fault status reg 2
> [ 559.165427] DMAR: DMAR:[DMA Read] Request device [82:08.0] fault addr e7b73b000
> [ 559.165427] DMAR:[fault reason 02] Present bit in context entry is clear
> [ 568.367417] DMAR: DRHD: handling fault status reg 102
> [ 568.369025] DMAR: DMAR:[DMA Read] Request device [82:08.1] fault addr ebb73b000
> [ 568.369025] DMAR:[fault reason 02] Present bit in context entry is clear
> [ 571.773944] DMAR: DRHD: handling fault status reg 202
> [ 571.775550] DMAR: DMAR:[DMA Read] Request device [82:08.2] fault addr efb73b000
> [ 571.775550] DMAR:[fault reason 02] Present bit in context entry is clear
> [ 575.039654] DMAR: DRHD: handling fault status reg 302
> [ 575.041259] DMAR: DMAR:[DMA Read] Request device [82:08.3] fault addr f3b73b000
> [ 575.041259] DMAR:[fault reason 02] Present bit in context entry is clear
> 
> There are different DMAR errors, sometimes referring to a specific
> address being wrong. In this case it is related to the device not having
> a context entry, that is, an IOMMU domain.
> 
> Also note we got these errors for different devices/VFs. This was with a
> DPDK app using several VFs.
>  
> 
> 
> 
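
When several VFs fault like this, it can help to group the faults per
device. A minimal sketch, assuming the input is raw dmesg text (lines not
wrapped by the mail client) in the format quoted above; the function name
is only for illustration:

```python
import re
from collections import defaultdict

# Matches e.g.:
#   DMAR: DMAR:[DMA Read] Request device [82:08.0] fault addr e7b73b000
_REQUEST = re.compile(
    r"\[DMA (?P<op>Read|Write)\] Request device \[(?P<bdf>[0-9a-fA-F:.]+)\]"
    r"\s+fault addr\s+(?P<addr>[0-9a-fA-F]+)"
)
# Matches e.g.:
#   DMAR:[fault reason 02] Present bit in context entry is clear
_REASON = re.compile(r"\[fault reason (?P<code>\d+)\]\s*(?P<text>.*)")


def collect_dmar_faults(log_text):
    """Return {bdf: [(op, fault_addr, reason_code, reason_text), ...]}."""
    faults = defaultdict(list)
    pending = None  # request seen, still waiting for its reason
    for line in log_text.splitlines():
        req = _REQUEST.search(line)
        if req:
            pending = (req.group("bdf"), req.group("op"),
                       int(req.group("addr"), 16))
        # The reason may be on the same line or on the following one.
        reason = _REASON.search(line)
        if reason and pending:
            bdf, op, addr = pending
            faults[bdf].append((op, addr, int(reason.group("code")),
                                reason.group("text").strip()))
            pending = None
    return dict(faults)
```

Every device that reports fault reason 02 here is one that has no context
entry, which is the case this patch is about.
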
>     >
>     > Since kernel 3.15, iommu=pt requires using the internal kernel
>     > DMA API for attaching the device to the IOMMU 1:1 mapping, aka
>     > si_domain. Previous versions attached the device to that
>     > domain when the intel iommu notifier was called.
> 
>     Again, what is not working since 3.15?
> 
> 
> This specific case, yes. With older kernels, when VFs are created, IOMMU
> code is executed (notifier chain callback) and if iommu=pt, the VF is
> attached to the si_domain, this is the 1:1 mapping. But this has changed
> with newer kernels, and after VFs are created they have no IOMMU domain
> at all. The kernel expects the driver to implicitly create such a domain
> when the kernel DMA API is used.

Thanks again for the clarification.
What will be the effect of your patch on kernels < 3.15? Should your
update be protected with a kernel version check, or is it safe for all?
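
For testing on both sides of that boundary, a small helper like the one
below can tell which behaviour to expect on a given host. This is only a
sketch: `kernel_at_least` is a made-up name, it looks only at the
major.minor part of the release string, and distribution kernels may carry
backports, so the number is a hint rather than a guarantee. The driver
itself would need a compile-time check instead.

```python
import platform
import re


def kernel_at_least(major, minor, release=None):
    """Return True if the kernel release string is >= major.minor.

    release defaults to the running kernel, e.g. "4.4.0-62-generic".
    """
    if release is None:
        release = platform.release()
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


if __name__ == "__main__":
    # 3.15 is the version named in the commit message.
    print("kernel >= 3.15:", kernel_at_least(3, 15))
```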

>  
> 
> 
>     >
>     > This is not a problem if the driver later makes some call to the
>     > DMA API, because the mapping can be done then. But DPDK apps do
>     > not use that DMA API at all.
> 
>     Is this the same as, or similar to:
>     http://dpdk.org/dev/patchwork/patch/12654/
> 
>  
> That case was another issue regarding IOMMU and iommu=pt. The problem
> there was when you detach a VF from a VM, but the VF was initially
> attached to the si_domain because the kernel did so. The patch helped to
> attach the VF again to that domain when binding to the UIO.
> 
> Looking at that patch now (I did comment on it then), it just solved the
> problem if the VF was detached from the UIO, something that could be
> easily forgotten or simply not done because, apparently, it is not needed.

I am also able to reproduce this case. When the driver is switched from
igb_uio -> vfio_pci -> igb_uio, the device stops working, giving similar
DMAR errors.

Your patch also fixes this, at least in my test. When unbinding from
vfio_pci, the iommu group is removed, but binding igb_uio adds it back.
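
One way to observe this from user space is to look for the `iommu_group`
link under the device's sysfs directory before and after each bind. A
minimal sketch, assuming the usual
/sys/bus/pci/devices/<domain:bus:dev.fn>/iommu_group layout; the function
name and the `sysfs_root` parameter are only for illustration (the latter
lets it run against a fake tree):

```python
import os


def iommu_group_of(bdf, sysfs_root="/sys"):
    """Return the IOMMU group number of a PCI device, or None if it has none.

    bdf is a full PCI address including the domain, e.g. "0000:82:08.0".
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf,
                        "iommu_group")
    if not os.path.islink(link):
        return None
    # The link target ends in .../iommu_groups/<N>
    return int(os.path.basename(os.readlink(link)))
```

For example, `iommu_group_of("0000:82:08.0")` (the domain part is assumed
here) would return None while the VF has no group, and a group number once
it is attached again.
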

> 
> What about using VFIO?
> 
> With that previous patch, it was not enough. I do not remember the
> details now, and I'm not sure if VFIO created another IOMMU domain if
> the device had one, but it could leave the device without an IOMMU
> domain after the first use.
> 
> In this particular case, VFIO would work, because the device gets its
> own IOMMU domain. But there are two main problems if this is not fixed
> when using UIO:
> 
> 1) UIO is one of the two options for working with IOMMU. We all agree
> VFIO is the right one for IOMMU, but as long as UIO is still an option,
> that should be fixed.
> 
> 2) Some installations need to work with and without IOMMU. Having the same
> module for both cases makes things simpler and therefore they use UIO
> instead of VFIO.
> 
>  
> 
>     >
>     > Doing this dma map and unmap is harmless even when iommu is not
>     > enabled at all.
>     >
>     > Signed-off-by: Alejandro Lucero <alejandro.lucero at netronome.com>

Tested-by: Ferruh Yigit <ferruh.yigit at intel.com>

>     <...>
> 
>     Thanks,
>     ferruh
> 
> 
> 


