[dpdk-dev,2/2] net: enable IOVA mode for PMDs

Message ID 1507718028-12943-3-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Jianfeng Tan Oct. 11, 2017, 10:33 a.m. UTC
  If we want to enable IOVA mode, introduced by
commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
we need PMDs (for PCI devices) to expose this flag.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/e1000/em_ethdev.c     | 3 ++-
 drivers/net/e1000/igb_ethdev.c    | 5 +++--
 drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
 drivers/net/i40e/i40e_ethdev.c    | 3 ++-
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
 6 files changed, 13 insertions(+), 8 deletions(-)
  

Comments

Anatoly Burakov Oct. 11, 2017, 10:43 a.m. UTC | #1
On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Is this the complete list of drivers which need this flag? Do other 
devices (e.g. cryptodev?) need this flag?
  
Jianfeng Tan Oct. 11, 2017, 10:56 a.m. UTC | #2
On 10/11/2017 6:43 PM, Burakov, Anatoly wrote:
> On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>
> Is this the complete list of drivers which need this flag? Do other 
> devices (e.g. cryptodev?) need this flag?

No, these are just NICs from Intel (as an example). If other NICs want 
to enable this, I'm more than happy to cover it in v2.

Thanks,
Jianfeng
  
Anatoly Burakov Oct. 11, 2017, 11:30 a.m. UTC | #3
On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Santosh Shukla Oct. 11, 2017, 11:33 a.m. UTC | #4
On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
  
Maxime Coquelin Jan. 5, 2018, 10:32 a.m. UTC | #5
Hi Jianfeng,

On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
> 
> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
> ---
>   drivers/net/e1000/em_ethdev.c     | 3 ++-
>   drivers/net/e1000/igb_ethdev.c    | 5 +++--
>   drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
>   drivers/net/i40e/i40e_ethdev.c    | 3 ++-
>   drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>   drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
>   6 files changed, 13 insertions(+), 8 deletions(-)

This patch introduces a regression when doing device assignment in
guest, because current VT-d emulation only supports 39bits guest address
width [0].

In the Bz, Peter suggest we could have an IOVA allocator algorithm,
which could start to allocate IOVAs from 0. I think it could solve the
--no-huge case your series address, do you agree?

But it would be a long term solution, we need to fix this in stable.

Is the --no-huge option used in production, or is it only for testing?
If the latter do you think we could revert your patch while we find a
solution that makes all cases to work?

Ferruh, I see you also faced problems with KNI, how did you solved it?

Thanks,
Maxime

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
  
Maxime Coquelin Jan. 5, 2018, 12:04 p.m. UTC | #6
On 01/05/2018 11:32 AM, Maxime Coquelin wrote:
> Hi Jianfeng,
> 
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> ---
>>   drivers/net/e1000/em_ethdev.c     | 3 ++-
>>   drivers/net/e1000/igb_ethdev.c    | 5 +++--
>>   drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
>>   drivers/net/i40e/i40e_ethdev.c    | 3 ++-
>>   drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>>   drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
>>   6 files changed, 13 insertions(+), 8 deletions(-)
> 
> This patch introduces a regression when doing device assignment in
> guest, because current VT-d emulation only supports 39bits guest address
> width [0].
> 
> In the Bz, Peter suggest we could have an IOVA allocator algorithm,
> which could start to allocate IOVAs from 0. I think it could solve the
> --no-huge case your series address, do you agree?
> 
> But it would be a long term solution, we need to fix this in stable.
> 
> Is the --no-huge option used in production, or is it only for testing?
> If the latter do you think we could revert your patch while we find a
> solution that makes all cases to work?

It seems that we can get Intel IOMMU's Guest Address Width from the
sysfs, as the CAP register is exposed.

So we can get the SAGAW value (see [1], page 217):

On Bare Metal:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
4
=> 48bits

In guest:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
2
=> 39bits

Using this, we could or not allow the VA mode when using Intel IOMMU.
Any thoughts?

Regards,
Maxime

[1]: 
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf

> Ferruh, I see you also faced problems with KNI, how did you solved it?
> 
> Thanks,
> Maxime
> 
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
  
Santosh Shukla Jan. 5, 2018, 12:10 p.m. UTC | #7
Hi Maxim,


On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> ---

[...]

> Ferruh, I see you also faced problems with KNI, how did you solved it?
>
By checking lsmod for rte_kni module and if found then set .iova_mode = _pa, refer [1].
You may follow similar approach.. meaning detect emulation mode Or if not then
other-way to introduce --iova-mode=<> eal arg.

[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810

Thanks.

> Thanks,
> Maxime
>
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
  
Maxime Coquelin Jan. 5, 2018, 12:57 p.m. UTC | #8
Hi Santosh

On 01/05/2018 01:10 PM, santosh wrote:
> Hi Maxim,
> 
> 
> On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
>> Hi Jianfeng,
>>
>> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>>> If we want to enable IOVA mode, introduced by
>>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>>> we need PMDs (for PCI devices) to expose this flag.
>>>
>>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>>> ---
> 
> [...]
> 
>> Ferruh, I see you also faced problems with KNI, how did you solved it?
>>
> By checking lsmod for rte_kni module and if found then set .iova_mode = _pa, refer [1].
> You may follow similar approach.. meaning detect emulation mode Or if not then
> other-way to introduce --iova-mode=<> eal arg.

Thanks for the information

Detecting whether we are in host or guest is not that trivial, and as 
Peter pointed me out, the VT-d specifies the 39bits guest address width
so there are certainly some processors in the wild using it.

And I don't think introducing a new EAL arg in -stable is a good idea.
If this is the only solution, then we should keep PA by default.

When using intel IOMMU, I think the best solution is to forbid VA mode
if GAW is 39 bits.

Regards,
Maxime

> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810
> 
> Thanks.
> 
>> Thanks,
>> Maxime
>>
>> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
>
  

Patch

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index a59947d..324f051 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -432,7 +432,8 @@  static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_em_pmd = {
 	.id_table = pci_id_em_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_em_pci_probe,
 	.remove = eth_em_pci_remove,
 };
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 040dd9f..a760011 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1168,7 +1168,8 @@  static int eth_igb_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_igb_pmd = {
 	.id_table = pci_id_igb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_igb_pci_probe,
 	.remove = eth_igb_pci_remove,
 };
@@ -1191,7 +1192,7 @@  static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_igbvf_pmd = {
 	.id_table = pci_id_igbvf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_igbvf_pci_probe,
 	.remove = eth_igbvf_pci_remove,
 };
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 15ea2a5..bf36e71 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -3142,7 +3142,8 @@  static const struct rte_pci_id pci_id_fm10k_map[] = {
 
 static struct rte_pci_driver rte_pmd_fm10k = {
 	.id_table = pci_id_fm10k_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_fm10k_pci_probe,
 	.remove = eth_fm10k_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 536365d..f6330d2 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -654,7 +654,8 @@  static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_i40e_pmd = {
 	.id_table = pci_id_i40e_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_i40e_pci_probe,
 	.remove = eth_i40e_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 111ac39..4cadf83 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1527,7 +1527,7 @@  static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_i40evf_pmd = {
 	.id_table = pci_id_i40evf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_i40evf_pci_probe,
 	.remove = eth_i40evf_pci_remove,
 };
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a7d7acc..6ad28b3 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1781,7 +1781,8 @@  static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
 	.id_table = pci_id_ixgbe_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_ixgbe_pci_probe,
 	.remove = eth_ixgbe_pci_remove,
 };
@@ -1803,7 +1804,7 @@  static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_ixgbevf_pmd = {
 	.id_table = pci_id_ixgbevf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_ixgbevf_pci_probe,
 	.remove = eth_ixgbevf_pci_remove,
 };