[dpdk-dev,1/2] eal: honor IOVA mode for no-huge case

Message ID 1507718028-12943-2-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Jianfeng Tan Oct. 11, 2017, 10:33 a.m. UTC
  With the introduction of IOVA mode, the only blocker to running
with 4KB pages for NICs bound to vfio-pci is that
RTE_BAD_PHYS_ADDR is not a valid IOVA address.

We can fix this by using the VA as the IOVA when the IOVA mode is VA.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
  

Comments

Anatoly Burakov Oct. 11, 2017, 11:27 a.m. UTC | #1
On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
> 
> We can refine this by using VA as IOVA if it's IOVA mode.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 28bca49..187d338 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>   					strerror(errno));
>   			return -1;
>   		}
> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
> +		else
> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>   		mcfg->memseg[0].addr = addr;
>   		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>   		mcfg->memseg[0].len = internal_config.memory;
> 
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Santosh Shukla Oct. 11, 2017, 11:30 a.m. UTC | #2
On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>
> We can refine this by using VA as IOVA if it's IOVA mode.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
  
Ferruh Yigit Oct. 31, 2017, 9:49 p.m. UTC | #3
On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
> 
> We can refine this by using VA as IOVA if it's IOVA mode.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 28bca49..187d338 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>  					strerror(errno));
>  			return -1;
>  		}
> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
> +		else
> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;

This breaks KNI, which requires physical addresses.

Any idea how to disable RTE_IOVA_VA when KNI is used?

>  		mcfg->memseg[0].addr = addr;
>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>  		mcfg->memseg[0].len = internal_config.memory;
>
  
Ferruh Yigit Oct. 31, 2017, 10:37 p.m. UTC | #4
On 10/31/2017 2:49 PM, Ferruh Yigit wrote:
> On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
>> With the introduction of IOVA mode, the only blocker to run
>> with 4KB pages for NICs binding to vfio-pci, is that
>> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>>
>> We can refine this by using VA as IOVA if it's IOVA mode.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 28bca49..187d338 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>>  					strerror(errno));
>>  			return -1;
>>  		}
>> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
>> +		else
>> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> 
> This breaks KNI which requires physical address.

My bad, this patch is for the no_hugetlbfs case.

The issue is seen starting from the next patch in the set [1], which enables
IOVA mode for Intel PMDs.

With IOVA mode enabled, KNI fails.

Does it make sense to add an API so the application can set the IOVA mode
explicitly? The application could set the IOVA mode to PA and allocate the
memzones it requires.

[1]
http://dpdk.org/commit/f37dfab2

> 
> Any idea how to disable RTE_IOVA_VA when KNI used?
> 
>>  		mcfg->memseg[0].addr = addr;
>>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>>  		mcfg->memseg[0].len = internal_config.memory;
>>
>
  
Ferruh Yigit Nov. 1, 2017, 1:10 a.m. UTC | #5
On 10/31/2017 3:37 PM, Ferruh Yigit wrote:
> On 10/31/2017 2:49 PM, Ferruh Yigit wrote:
>> On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
>>> With the introduction of IOVA mode, the only blocker to run
>>> with 4KB pages for NICs binding to vfio-pci, is that
>>> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>>>
>>> We can refine this by using VA as IOVA if it's IOVA mode.
>>>
>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>> ---
>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> index 28bca49..187d338 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>>>  					strerror(errno));
>>>  			return -1;
>>>  		}
>>> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
>>> +		else
>>> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>>
>> This breaks KNI which requires physical address.
> 
> My bad, this patch is for no_hugetlbfs case.
> 
> Issue seen starting from next patch in the set [1], which enables IOVA mode for
> Intel PMDs.
> 
> With IOVA mode enabled, KNI fails.
> 
> Does it make sense to add an API to set iova mode explicitly by application?
> Application can set iova to PA and allocate memzones it requires.

Added config option to disable IOVA mode detection:
http://dpdk.org/dev/patchwork/patch/31071/

I am still concerned this may hit someone; since the result for KNI is a
kernel crash, it would be nice to have more solid protection here.

Any suggestions are welcome.

Thanks,
ferruh

> 
> [1]
> http://dpdk.org/commit/f37dfab2
> 
>>
>> Any idea how to disable RTE_IOVA_VA when KNI used?
>>
>>>  		mcfg->memseg[0].addr = addr;
>>>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>>>  		mcfg->memseg[0].len = internal_config.memory;
>>>
>>
>
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 28bca49..187d338 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1030,7 +1030,10 @@  rte_eal_hugepage_init(void)
 					strerror(errno));
 			return -1;
 		}
-		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
+		else
+			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
 		mcfg->memseg[0].len = internal_config.memory;