[dpdk-dev] bus/pci: forbid VA as IOVA mode if IOMMU address width too small

Message ID 20180108135127.25869-1-maxime.coquelin@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Maxime Coquelin Jan. 8, 2018, 1:51 p.m. UTC
Intel VT-d supports different address widths for the IOVAs, from
39 bits to 56 bits.

While recent processors support at least 48 bits, VT-d emulation
currently supports only 39 bits. This causes DMA mapping to fail
when using VA as IOVA mode, as user-space virtual addresses use
up to 47 bits (see kernel's Documentation/x86/x86_64/mm.txt).

This patch parses the VT-d CAP register value available in sysfs, and
forbids VA as IOVA mode if the GAW is 39 bits or unknown.

Fixes: f37dfab21c98 ("drivers/net: enable IOVA mode for Intel PMDs")

Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Hi,

I'm not super happy with the patch as it does platform-specific things in
generic code, but there is no placeholder for IOMMU/VT-d at the moment.

As this patch is to be backported to v17.11 LTS, it cannot be a big rework.

If you have any suggestions to improve it, please let me know.

The fix is quite urgent, as guest device assignment with vIOMMU is broken in
mainline & v17.11 LTS.

The advantage of this fix over forbidding VA as IOVA when running in emulation
is that VT-d emulation will soon support 48 bits, so this is future-proof. Also,
the VT-d spec allows 39 bits, so there could be physical CPUs supporting only
that, even if I don't know of any.

Thanks,
Maxime

 drivers/bus/pci/linux/pci.c | 98 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 89 insertions(+), 9 deletions(-)
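For illustration, the SAGAW decoding described in the commit message can be sketched as a standalone C function. This is a hedged sketch, not the patch code: it assumes the VT-d specification's layout, in which SAGAW (CAP register bits 12:8) is a bitmap where bit 1 advertises a 39-bit, bit 2 a 48-bit and bit 3 a 57-bit adjusted guest address width, so it picks the widest advertised width instead of matching exact register values. The helpers `vtd_max_guest_addr_width()` and `vtd_supports_va_as_iova()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* SAGAW occupies bits 12:8 of the VT-d capability register. */
#define VTD_CAP_SAGAW_SHIFT 8
#define VTD_CAP_SAGAW_MASK  (0x1fULL << VTD_CAP_SAGAW_SHIFT)
/* User-space VA width on x86_64, as quoted in the commit message. */
#define X86_VA_WIDTH        47

/*
 * Return the widest adjusted guest address width advertised in SAGAW,
 * or 0 when none of the bits for 39/48/57-bit widths is set.
 * Assumption: bit n (1 <= n <= 3) stands for a (30 + 9 * n)-bit width.
 */
static int vtd_max_guest_addr_width(uint64_t cap_reg)
{
	uint64_t sagaw = (cap_reg & VTD_CAP_SAGAW_MASK) >> VTD_CAP_SAGAW_SHIFT;
	int bit;

	for (bit = 3; bit >= 1; bit--) {
		if (sagaw & (1ULL << bit))
			return 30 + 9 * bit;
	}
	return 0; /* unknown: treated as "no VA as IOVA" below */
}

/* VA as IOVA is only safe if the IOMMU covers every user-space VA. */
static bool vtd_supports_va_as_iova(uint64_t cap_reg)
{
	return vtd_max_guest_addr_width(cap_reg) >= X86_VA_WIDTH;
}
```

For example, a CAP value whose SAGAW field is 0x2 (39-bit only, the emulation case described above) yields 39 and rejects VA as IOVA, while a SAGAW field of 0x6 (39-bit and 48-bit) yields 48 and accepts it.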
  

Comments

Stephen Hemminger Jan. 8, 2018, 3:34 p.m. UTC | #1
On Mon,  8 Jan 2018 14:51:27 +0100
Maxime Coquelin <maxime.coquelin@redhat.com> wrote:

You are assuming that if an IOMMU is present, it is being used (i.e. VFIO).
What about the case of direct access to a PF device via IGB_UIO?

> +static inline bool
> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> +{

This is not in the fast path; there is no reason for it to be inline.
  
Stephen Hemminger Jan. 8, 2018, 3:38 p.m. UTC | #2
On Mon,  8 Jan 2018 14:51:27 +0100
Maxime Coquelin <maxime.coquelin@redhat.com> wrote:

> +static inline bool
> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> +{
> +#if defined(RTE_ARCH_PPC_64)
> +	return false;
> +#elif defined(RTE_ARCH_X86)
> +

The cleaner way to handle this kind of ifdef is:

#ifdef RTE_ARCH_X86
static bool
pci_one_device_iommu_support_va(struct rte_pci_device *dev)
{
....
}
#elif defined(RTE_ARCH_PPC_64) 
static inline bool
pci_one_device_iommu_support_va(struct rte_pci_device *dev)
{
	return false;
}
#endif

What about AMD64?
Do all ARM processors have an IOMMU? I think not.
  
Maxime Coquelin Jan. 8, 2018, 3:48 p.m. UTC | #3
On 01/08/2018 04:34 PM, Stephen Hemminger wrote:
> On Mon,  8 Jan 2018 14:51:27 +0100
> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> 
> You are assuming that if an IOMMU is present, it is being used (i.e. VFIO).
> What about the case of direct access to a PF device via IGB_UIO?

As soon as one device is bound to UIO, or to VFIO in no-IOMMU mode, PA as
IOVA mode is selected.

This is done in rte_pci_get_iommu_class(), by calling
pci_one_device_bound_uio() and rte_vfio_noiommu_is_enabled().
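The selection rule described here, together with the new IOMMU-width check from the patch, can be summarized as a pure function. This is a simplified sketch: `enum iova_mode`, `struct iova_inputs` and `select_iova_mode()` are hypothetical names standing in for `RTE_IOVA_PA`/`RTE_IOVA_VA` and the booleans computed in `rte_pci_get_iommu_class()`, and the sketch does not model the early return taken when no device is bound.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RTE_IOVA_PA / RTE_IOVA_VA. */
enum iova_mode { IOVA_MODE_PA, IOVA_MODE_VA };

/* Inputs mirror the booleans computed in rte_pci_get_iommu_class(). */
struct iova_inputs {
	bool has_iova_va;             /* a bound driver requested IOVA as VA */
	bool is_bound_uio;            /* at least one device bound to UIO */
	bool is_vfio_noiommu_enabled; /* VFIO running in no-IOMMU mode */
	bool iommu_no_va;             /* some IOMMU is too narrow for user VAs */
};

static enum iova_mode select_iova_mode(const struct iova_inputs *in)
{
	/* VA is chosen only when it is requested and nothing rules it out. */
	if (in->has_iova_va && !in->is_bound_uio &&
	    !in->is_vfio_noiommu_enabled && !in->iommu_no_va)
		return IOVA_MODE_VA;

	/* Any single disqualifying condition falls back to PA. */
	return IOVA_MODE_PA;
}
```

A single device bound to UIO, or a single IOMMU that is too narrow, is therefore enough to force PA for the whole process.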

>> +static inline bool
>> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>> +{
> 
> This is not in the fast path; there is no reason for it to be inline.
> 

Ok, I will remove inlining in v2. I added it for consistency with the
other functions declared above.

Thanks,
Maxime
  
Maxime Coquelin Jan. 8, 2018, 3:54 p.m. UTC | #4
On 01/08/2018 04:38 PM, Stephen Hemminger wrote:
> On Mon,  8 Jan 2018 14:51:27 +0100
> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> 
>> +static inline bool
>> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>> +{
>> +#if defined(RTE_ARCH_PPC_64)
>> +	return false;
>> +#elif defined(RTE_ARCH_X86)
>> +
> 
> The cleaner way to handle this kind of ifdef is:
> 
> #ifdef RTE_ARCH_X86
> static bool
> pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> {
> ....
> }
> #elif defined(RTE_ARCH_PPC_64)
> static inline bool
> pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> {
> 	return false;
> }
> #endif

Ok, thanks. I'll do this in v2.

> What about AMD64?

I haven't checked AMD64 spec yet.

> Do all ARM processors have an IOMMU? I think not.

No, not all have an IOMMU, and I don't know whether those which have one
have such limitations.
But if they don't have one, they cannot use VFIO without noiommu enabled.

This patch changes the behavior only for Intel, and could be extended to
other HW if needed.

Regards,
Maxime
  
Maxime Coquelin Jan. 8, 2018, 4:48 p.m. UTC | #5
On 01/08/2018 04:54 PM, Maxime Coquelin wrote:
> 
> 
> On 01/08/2018 04:38 PM, Stephen Hemminger wrote:
>> On Mon,  8 Jan 2018 14:51:27 +0100
>> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>> What about AMD64?
> 
> I haven't checked AMD64 spec yet.

From the AMD IOMMU spec (see [0], page 178), the only supported Guest
Virtual Address size is 48 bits, which is above the 47 bits of user VA on x86.

So in this regard, AMD IOMMU is compatible with using VA as IOVA.
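The width comparison comes down to simple arithmetic: with 47-bit user VAs, the highest user-space address is 2^47 - 1 = 0x7fffffffffff (just under 128 TiB), which needs 47 bits. That fits in a 48-bit IOVA space, but not in a 39-bit one, whose top address is 0x7fffffffff (just under 512 GiB). A minimal sketch of the check, with hypothetical helper names `bits_needed()` and `fits_in_width()`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest user-space address with 47-bit user VAs on x86_64. */
#define X86_USER_VA_MAX ((1ULL << 47) - 1) /* 0x00007fffffffffff */

/* Number of significant bits needed to represent addr. */
static unsigned int bits_needed(uint64_t addr)
{
	unsigned int bits = 0;

	while (addr != 0) {
		bits++;
		addr >>= 1;
	}
	return bits;
}

/* True if addr is representable in an IOVA space of 'width' bits. */
static bool fits_in_width(uint64_t addr, unsigned int width)
{
	if (width >= 64)
		return true; /* avoid the undefined 1ULL << 64 shift */
	return addr <= (1ULL << width) - 1;
}
```

With these, `fits_in_width(X86_USER_VA_MAX, 48)` holds while `fits_in_width(X86_USER_VA_MAX, 39)` does not, which is exactly the 48-bit versus 39-bit distinction discussed in this thread.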

Cheers,
Maxime

[0]: https://support.amd.com/TechDocs/48882_IOMMU.pdf
  

Patch

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5da6728fb..292633ee2 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -576,6 +576,90 @@  pci_one_device_has_iova_va(void)
 	return 0;
 }
 
+static inline bool
+pci_one_device_iommu_support_va(struct rte_pci_device *dev)
+{
+#if defined(RTE_ARCH_PPC_64)
+	return false;
+#elif defined(RTE_ARCH_X86)
+
+#define VTD_CAP_SAGAW_SHIFT         8
+#define VTD_CAP_SAGAW_MASK          (0x1fULL << VTD_CAP_SAGAW_SHIFT)
+#define X86_VA_WIDTH 47 /* From Documentation/x86/x86_64/mm.txt */
+	struct rte_pci_addr *addr = &dev->addr;
+	char filename[PATH_MAX];
+	FILE *fp;
+	uint64_t sagaw, vtd_cap_reg = 0;
+	int guest_addr_width = 0;
+
+	snprintf(filename, sizeof(filename),
+		 "%s/" PCI_PRI_FMT "/iommu/intel-iommu/cap",
+		 rte_pci_get_sysfs_path(), addr->domain, addr->bus, addr->devid,
+		 addr->function);
+	if (access(filename, F_OK) == -1) {
+		/* We don't have an Intel IOMMU, assume VA supported */
+		return true;
+	}
+
+	/* We have an Intel IOMMU */
+	fp = fopen(filename, "r");
+	if (fp == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): can't open %s\n", __func__, filename);
+		return false;
+	}
+
+	if (fscanf(fp, "%" SCNx64, &vtd_cap_reg) != 1) {
+		RTE_LOG(ERR, EAL, "%s(): can't read %s\n", __func__, filename);
+		fclose(fp);
+		return false;
+	}
+
+	fclose(fp);
+
+	sagaw = (vtd_cap_reg & VTD_CAP_SAGAW_MASK) >> VTD_CAP_SAGAW_SHIFT;
+
+	switch (sagaw) {
+	case 2:
+		guest_addr_width = 39;
+		break;
+	case 4:
+		guest_addr_width = 48;
+		break;
+	case 6:
+		guest_addr_width = 56;
+		break;
+	default:
+		RTE_LOG(ERR, EAL, "Unknown Intel IOMMU SAGAW value (%" PRIx64 ")\n",
+				sagaw);
+		break;
+	}
+
+	if (guest_addr_width < X86_VA_WIDTH)
+		return false;
+#endif
+	return true;
+}
+
+/*
+ * Check whether the IOMMUs of all matched devices support VA as IOVA
+ */
+static inline bool
+pci_devices_iommu_support_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		FOREACH_DEVICE_ON_PCIBUS(dev) {
+			if (!rte_pci_match(drv, dev))
+				continue;
+			if (!pci_one_device_iommu_support_va(dev))
+				return false;
+		}
+	}
+	return true;
+}
+
 /*
  * Get iommu class of PCI devices on the bus.
  */
@@ -586,12 +670,7 @@  rte_pci_get_iommu_class(void)
 	bool is_vfio_noiommu_enabled = true;
 	bool has_iova_va;
 	bool is_bound_uio;
-	bool spapr_iommu =
-#if defined(RTE_ARCH_PPC_64)
-		true;
-#else
-		false;
-#endif
+	bool iommu_no_va;
 
 	is_bound = pci_one_device_is_bound();
 	if (!is_bound)
@@ -599,13 +678,14 @@  rte_pci_get_iommu_class(void)
 
 	has_iova_va = pci_one_device_has_iova_va();
 	is_bound_uio = pci_one_device_bound_uio();
+	iommu_no_va = !pci_devices_iommu_support_va();
 #ifdef VFIO_PRESENT
 	is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
 					true : false;
 #endif
 
 	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled &&
-			!spapr_iommu)
+			!iommu_no_va)
 		return RTE_IOVA_VA;
 
 	if (has_iova_va) {
@@ -614,8 +694,8 @@  rte_pci_get_iommu_class(void)
 			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
 		if (is_bound_uio)
 			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
-		if (spapr_iommu)
-			RTE_LOG(WARNING, EAL, "sPAPR IOMMU does not support IOVA as VA\n");
+		if (iommu_no_va)
+			RTE_LOG(WARNING, EAL, "IOMMU does not support IOVA as VA\n");
 	}
 
 	return RTE_IOVA_PA;