[dpdk-dev,v1] igb_uio: fix IOMMU domain issue

Message ID 1462879301-13570-1-git-send-email-zhe.tao@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Ferruh Yigit

Commit Message

Zhe Tao May 10, 2016, 11:21 a.m. UTC
Problem:
The following operations will cause igb_uio based DPDK
operation to fail:
--Any device assignment through the kvm_assign_device interface;
this can be the pci-assign method in QEMU
--VFIO group attachment operation (attach to the container);
this can happen with vfio-pci assignment in QEMU
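
(For example, the two assignment paths above correspond to QEMU device
options along the lines of "-device pci-assign,host=0000:01:00.0" and
"-device vfio-pci,host=0000:01:00.0"; the BDF here is illustrative.)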

Root cause:
Both operations above eventually call intel_iommu_attach_device
(e.g. for VFIO: vfio_group_set_container ->
vfio_iommu_type1_attach_group -> intel_iommu_attach_device).
If we use iommu=pt on the kernel command line, the Intel IOMMU driver
creates a static identity (SI) domain for all PCI devices,
which sets the translation type to passthrough in the context
entry of every PCI device.
But once the QEMU process exits, the VFIO framework invokes the
detach group operation and finally calls intel_iommu_detach_device,
which clears the context entry
(after that, the IOMMU entry for this device is no longer available).
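
For context, passthrough mode as referred to above is typically enabled
with kernel command line parameters such as (illustrative example):

    intel_iommu=on iommu=pt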

The AMD IOMMU driver handles this detach action correctly: it restores
the pt_domain (the equivalent of the Intel static identity domain) to
the corresponding entry.

Solution:
Add a workaround in the igb_uio driver that maps a single page.
Any DMA-related alloc or map action causes the Intel IOMMU driver to
reload the SI domain into the context entry; that is why kernel drivers
never hit this problem. (A sketch of the idea is shown below.)
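
For illustration only, here is a minimal sketch of that workaround with
error checking and a matching unmap added (the actual patch below maps
the page and frees it without unmapping, since the mapping call itself
is the point):

	struct page *page;
	dma_addr_t dma;

	/* Touch the DMA API once so the Intel IOMMU driver reloads the
	 * SI domain into this device's context entry.
	 */
	page = alloc_page(GFP_KERNEL);
	if (page) {
		dma = dma_map_page(&dev->dev, page, 0, PAGE_SIZE,
				   DMA_FROM_DEVICE);
		if (!dma_mapping_error(&dev->dev, dma))
			dma_unmap_page(&dev->dev, dma, PAGE_SIZE,
				       DMA_FROM_DEVICE);
		__free_page(page);
	}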


Signed-off-by: Zhe Tao <zhe.tao@intel.com>
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
  

Comments

Stephen Hemminger May 10, 2016, 3:59 p.m. UTC | #1
On Tue, 10 May 2016 19:21:41 +0800
Zhe Tao <zhe.tao@intel.com> wrote:

> Problem:
> The following operations will cause igb_uio based DPDK
> operation to fail:
> --Any device assignment through the kvm_assign_device interface;
> this can be the pci-assign method in QEMU
> --VFIO group attachment operation (attach to the container);
> this can happen with vfio-pci assignment in QEMU


If you have an IOMMU, why not use VFIO instead? It is better.
  
Alejandro Lucero May 11, 2016, 7:35 a.m. UTC | #2
On Tue, May 10, 2016 at 4:59 PM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Tue, 10 May 2016 19:21:41 +0800
> Zhe Tao <zhe.tao@intel.com> wrote:
>
> > Problem:
> > The following operations will cause igb_uio based DPDK
> > operation to fail:
> > --Any device assignment through the kvm_assign_device interface;
> > this can be the pci-assign method in QEMU
> > --VFIO group attachment operation (attach to the container);
> > this can happen with vfio-pci assignment in QEMU
>
>
> If you have an IOMMU, why not use VFIO instead? It is better.
>

It is not about VFIO versus UIO but about how IOMMU domains are created
and destroyed by (old) kernels when iommu=pt. So even with VFIO you can
have problems.

We have had problems like this and others because our device (NFP) maps
only up to 40 bits of address space. Old kernels used in LTS
distributions like Ubuntu have IOMMU bugs, and you need to do things
like this mapping inside the driver to solve them (see the illustrative
sketch below). By the way, using SR-IOV just adds more problems. It is
not safe to use iommu=pt with 3.13.x Ubuntu kernels.
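
(For illustration only: a kernel driver for a device limited to 40-bit
DMA addressing would typically declare that limit along these lines;
"pdev" is an assumed name for the driver's struct pci_dev.)

	/* Declare a 40-bit DMA limit so the kernel never hands the
	 * device an address it cannot reach.
	 */
	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40)))
		dev_err(&pdev->dev, "40-bit DMA mask not supported\n");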

It would be a good thing for the original patch to identify the kernels
where the problem was detected. Of course, there could be more kernels
with the same problem, but that is more work to do.
  
Ferruh Yigit May 11, 2016, 5:24 p.m. UTC | #3
On 5/11/2016 8:35 AM, Alejandro Lucero wrote:
> On Tue, May 10, 2016 at 4:59 PM, Stephen Hemminger <
> stephen@networkplumber.org> wrote:
> 
>> On Tue, 10 May 2016 19:21:41 +0800
>> Zhe Tao <zhe.tao@intel.com> wrote:
>>
>>> Problem:
>>> The following operations will cause igb_uio based DPDK
>>> operation to fail:
>>> --Any device assignment through the kvm_assign_device interface;
>>> this can be the pci-assign method in QEMU
>>> --VFIO group attachment operation (attach to the container);
>>> this can happen with vfio-pci assignment in QEMU
>>
>>
>> If you have an IOMMU, why not use VFIO instead? It is better.
>>
> 
> It is not about VFIO versus UIO but about how IOMMU domains are created
> and destroyed by (old) kernels when iommu=pt. So even with VFIO you can
> have problems.

The problem is in the IOMMU driver, but we are adding the workaround to
igb_uio; if using VFIO solves the issue, I believe that is the better
workaround.

1) Is there any case where the IOMMU is supported but VFIO is not? Is
there anything that forces the use of igb_uio?

2) Does using VFIO solve the issue described in the problem statement?

> 
> We have had problems like this and others because our device (NFP) maps
> only up to 40 bits of address space. Old kernels used in LTS
> distributions like Ubuntu have IOMMU bugs, and you need to do things
> like this mapping inside the driver to solve them. By the way, using
> SR-IOV just adds more problems. It is not safe to use iommu=pt with
> 3.13.x Ubuntu kernels.
> 
> It would be a good thing for the original patch to identify the kernels
> where the problem was detected. Of course, there could be more kernels
> with the same problem, but that is more work to do.
> 

Thanks,
ferruh
  
Thomas Monjalon July 8, 2016, 5:27 p.m. UTC | #4
Ping, this patch is stalled.

2016-05-11 18:24, Ferruh Yigit:
> The problem is in the IOMMU driver, but we are adding the workaround to
> igb_uio; if using VFIO solves the issue, I believe that is the better
> workaround.
> 
> 1) Is there any case where the IOMMU is supported but VFIO is not? Is
> there anything that forces the use of igb_uio?
> 
> 2) Does using VFIO solve the issue described in the problem statement?
> 
> Thanks,
> ferruh
  
Ferruh Yigit July 9, 2016, 7:09 a.m. UTC | #5
On 7/8/2016 6:27 PM, Thomas Monjalon wrote:

> Ping, this patch is stalled.
> 

I am for rejecting this patch.
The patch is useful for testers and developers who use both vfio and
igb_uio. But if the end user's environment supports vfio, she should
use vfio instead of having a workaround to use both.

Thanks,
ferruh
  

Patch

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 45a5720..3fa88b0 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -327,6 +327,18 @@  igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	struct rte_uio_pci_dev *udev;
 	struct msix_entry msix_entry;
 	int err;
+	struct page *page;
+	/*
+	 * workaround for the Intel IOMMU implementation of the SI domain
+	 */
+
+	page = alloc_page(GFP_ATOMIC);
+	if (!page) {
+		dev_err(&dev->dev, "Cannot alloc page\n");
+	} else {
+		dma_map_page(&dev->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+		__free_page(page);
+	}
 
 	udev = kzalloc(sizeof(struct rte_uio_pci_dev), GFP_KERNEL);
 	if (!udev)