[dpdk-dev,2/2] net: enable IOVA mode for PMDs
Checks
Commit Message
If we want to enable IOVA mode, introduced by
commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
we need PMDs (for PCI devices) to expose this flag.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
drivers/net/e1000/em_ethdev.c | 3 ++-
drivers/net/e1000/igb_ethdev.c | 5 +++--
drivers/net/fm10k/fm10k_ethdev.c | 3 ++-
drivers/net/i40e/i40e_ethdev.c | 3 ++-
drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 5 +++--
6 files changed, 13 insertions(+), 8 deletions(-)
Comments
On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
Is this the complete list of drivers which need this flag? Do other
devices (e.g. cryptodev?) need this flag?
On 10/11/2017 6:43 PM, Burakov, Anatoly wrote:
> On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>
> Is this the complete list of drivers which need this flag? Do other
> devices (e.g. cryptodev?) need this flag?
No, these are just NICs from Intel (as an example). If other NICs want
to enable this, I'm more than happy to cover it in v2.
Thanks,
Jianfeng
On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Hi Jianfeng,
On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
> ---
> drivers/net/e1000/em_ethdev.c | 3 ++-
> drivers/net/e1000/igb_ethdev.c | 5 +++--
> drivers/net/fm10k/fm10k_ethdev.c | 3 ++-
> drivers/net/i40e/i40e_ethdev.c | 3 ++-
> drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
> drivers/net/ixgbe/ixgbe_ethdev.c | 5 +++--
> 6 files changed, 13 insertions(+), 8 deletions(-)
This patch introduces a regression when doing device assignment in
guest, because current VT-d emulation only supports 39bits guest address
width [0].
In the Bz, Peter suggest we could have an IOVA allocator algorithm,
which could start to allocate IOVAs from 0. I think it could solve the
--no-huge case your series address, do you agree?
But it would be a long term solution, we need to fix this in stable.
Is the --no-huge option used in production, or is it only for testing?
If the latter do you think we could revert your patch while we find a
solution that makes all cases to work?
Ferruh, I see you also faced problems with KNI, how did you solved it?
Thanks,
Maxime
[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
On 01/05/2018 11:32 AM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> ---
>> drivers/net/e1000/em_ethdev.c | 3 ++-
>> drivers/net/e1000/igb_ethdev.c | 5 +++--
>> drivers/net/fm10k/fm10k_ethdev.c | 3 ++-
>> drivers/net/i40e/i40e_ethdev.c | 3 ++-
>> drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>> drivers/net/ixgbe/ixgbe_ethdev.c | 5 +++--
>> 6 files changed, 13 insertions(+), 8 deletions(-)
>
> This patch introduces a regression when doing device assignment in
> guest, because current VT-d emulation only supports 39bits guest address
> width [0].
>
> In the Bz, Peter suggest we could have an IOVA allocator algorithm,
> which could start to allocate IOVAs from 0. I think it could solve the
> --no-huge case your series address, do you agree?
>
> But it would be a long term solution, we need to fix this in stable.
>
> Is the --no-huge option used in production, or is it only for testing?
> If the latter do you think we could revert your patch while we find a
> solution that makes all cases to work?
It seems that we can get Intel IOMMU's Guest Address Width from the
sysfs, as the CAP register is exposed.
So we can get the SAGAW value (see [1], page 217):
On Bare Metal:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
4
=> 48bits
In guest:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
2
=> 39bits
Using this, we could or not allow the VA mode when using Intel IOMMU.
Any thoughts?
Regards,
Maxime
[1]:
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf
> Ferruh, I see you also faced problems with KNI, how did you solved it?
>
> Thanks,
> Maxime
>
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
Hi Maxim,
On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> ---
[...]
> Ferruh, I see you also faced problems with KNI, how did you solved it?
>
By checking lsmod for rte_kni module and if found then set .iova_mode = _pa, refer [1].
You may follow similar approach.. meaning detect emulation mode Or if not then
other-way to introduce --iova-mode=<> eal arg.
[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810
Thanks.
> Thanks,
> Maxime
>
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
Hi Santosh
On 01/05/2018 01:10 PM, santosh wrote:
> Hi Maxim,
>
>
> On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
>> Hi Jianfeng,
>>
>> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>>> If we want to enable IOVA mode, introduced by
>>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>>> we need PMDs (for PCI devices) to expose this flag.
>>>
>>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>>> ---
>
> [...]
>
>> Ferruh, I see you also faced problems with KNI, how did you solved it?
>>
> By checking lsmod for rte_kni module and if found then set .iova_mode = _pa, refer [1].
> You may follow similar approach.. meaning detect emulation mode Or if not then
> other-way to introduce --iova-mode=<> eal arg.
Thanks for the information
Detecting whether we are in host or guest is not that trivial, and as
Peter pointed me out, the VT-d specifies the 39bits guest address width
so there are certainly some processors in the wild using it.
And I don't think introducing a new EAL arg in -stable is a good idea.
If this is the only solution, then we should keep PA by default.
When using intel IOMMU, I think the best solution is to forbid VA mode
if GAW is 39 bits.
Regards,
Maxime
> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810
>
> Thanks.
>
>> Thanks,
>> Maxime
>>
>> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
>
@@ -432,7 +432,8 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_em_pmd = {
.id_table = pci_id_em_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_em_pci_probe,
.remove = eth_em_pci_remove,
};
@@ -1168,7 +1168,8 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_igb_pmd = {
.id_table = pci_id_igb_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_igb_pci_probe,
.remove = eth_igb_pci_remove,
};
@@ -1191,7 +1192,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev)
*/
static struct rte_pci_driver rte_igbvf_pmd = {
.id_table = pci_id_igbvf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_igbvf_pci_probe,
.remove = eth_igbvf_pci_remove,
};
@@ -3142,7 +3142,8 @@ static const struct rte_pci_id pci_id_fm10k_map[] = {
static struct rte_pci_driver rte_pmd_fm10k = {
.id_table = pci_id_fm10k_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_fm10k_pci_probe,
.remove = eth_fm10k_pci_remove,
};
@@ -654,7 +654,8 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_i40e_pmd = {
.id_table = pci_id_i40e_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_i40e_pci_probe,
.remove = eth_i40e_pci_remove,
};
@@ -1527,7 +1527,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev)
*/
static struct rte_pci_driver rte_i40evf_pmd = {
.id_table = pci_id_i40evf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_i40evf_pci_probe,
.remove = eth_i40evf_pci_remove,
};
@@ -1781,7 +1781,8 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_ixgbe_pmd = {
.id_table = pci_id_ixgbe_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+ RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_ixgbe_pci_probe,
.remove = eth_ixgbe_pci_remove,
};
@@ -1803,7 +1804,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
*/
static struct rte_pci_driver rte_ixgbevf_pmd = {
.id_table = pci_id_ixgbevf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
.probe = eth_ixgbevf_pci_probe,
.remove = eth_ixgbevf_pci_remove,
};