[dpdk-dev] eal: disable IOVA mode detection by default

Message ID 20171101010726.17781-1-ferruh.yigit@intel.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Ferruh Yigit Nov. 1, 2017, 1:07 a.m. UTC
  Fix kernel crash with KNI because KNI requires physical addresses.

A config option introduced to disable IOVA mode detection and to set it
to physical address by default. Disabling config option will enable IOVA
mode detection.

When there is no intension to use KNI, it is safe to enable detection.

Config option disable IOVA mode detection by default to be sure only who
is aware of result enable it.

Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Jianfeng Tan <jianfeng.tan@intel.com>
Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
---
 config/common_base                | 5 +++++
 lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
 lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
 3 files changed, 13 insertions(+)
  

Comments

Jianfeng Tan Nov. 1, 2017, 2:17 a.m. UTC | #1
> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, November 1, 2017 9:07 AM
> To: Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy, Sergio
> Cc: dev@dpdk.org; Yigit, Ferruh; Tan, Jianfeng; Santosh Shukla
> Subject: [PATCH] eal: disable IOVA mode detection by default
> 
> Fix kernel crash with KNI because KNI requires physical addresses.
> 
> A config option introduced to disable IOVA mode detection and to set it
> to physical address by default. Disabling config option will enable IOVA
> mode detection.
> 
> When there is no intension to use KNI, it is safe to enable detection.
> 
> Config option disable IOVA mode detection by default to be sure only who
> is aware of result enable it.
> 
> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>

Refer to how vhost-kernel works, we may leverage a memory region table to do the translation. The bad side is it's less efficient than current phys_to_virt.

Another side, we did not check the result of phys_to_virt, that's why kernel crashes.

Thanks,
Jianfeng
  
Santosh Shukla Nov. 1, 2017, 3:54 a.m. UTC | #2
On Wednesday 01 November 2017 06:37 AM, Ferruh Yigit wrote:
> Fix kernel crash with KNI because KNI requires physical addresses.
>
> A config option introduced to disable IOVA mode detection and to set it
> to physical address by default. Disabling config option will enable IOVA
> mode detection.
>
> When there is no intension to use KNI, it is safe to enable detection.
>
> Config option disable IOVA mode detection by default to be sure only who
> is aware of result enable it.
>
> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---

Disabling _IOVA in config (static-way) defeats the dynamic iova autodetect purpose.

HW facilitates iova=pa/va address space so pmd's should able
to adapt for both the programming mode.. meaning Perhaps phy2virt
style translation might address the KNI crash. For that
KNI could check iova mode by referring to api rte_eal_iova_mode() and then
do the phy2virt translation.
  
Jerin Jacob Nov. 1, 2017, 4:02 a.m. UTC | #3
-----Original Message-----
> Date: Wed, 1 Nov 2017 01:07:26 +0000
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> To: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>  <sergio.gonzalez.monroy@intel.com>
> CC: dev@dpdk.org, Ferruh Yigit <ferruh.yigit@intel.com>, Jianfeng Tan
>  <jianfeng.tan@intel.com>, Santosh Shukla
>  <santosh.shukla@caviumnetworks.com>
> Subject: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
> X-Mailer: git-send-email 2.13.6
> 
> Fix kernel crash with KNI because KNI requires physical addresses.

The actual fix would be to make KNI IOMMU aware based on the DPDK mode.

ie. On slow path,

/* Get iommu domain for iova to physical addr conversion */             
if (rte_eal_iova_mode() == RTE_IOVA_VA)
	kni->iommu_domain = iommu_get_domain_for_dev(dev);
else
	kni->iommu_domain = NULL;

On fast path,

static inline u64 kni_iova_to_phys(struct ... *kni, dma_addr_t dma_addr)    
{                                                                               
        /* Translation is installed only when IOMMU is present */               
        if (kni->iommu_domain)                                                  
                return iommu_iova_to_phys(kni->iommu_domain, dma_addr);         
        return dma_addr;                                                        
} 

> 
> A config option introduced to disable IOVA mode detection and to set it
> to physical address by default. Disabling config option will enable IOVA
> mode detection.
> 
> When there is no intension to use KNI, it is safe to enable detection.
> 
> Config option disable IOVA mode detection by default to be sure only who
> is aware of result enable it.
> 
> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> ---
>  config/common_base                | 5 +++++
>  lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
>  lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/config/common_base b/config/common_base
> index 82ee75456..903e7685b 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -107,6 +107,11 @@ CONFIG_RTE_MALLOC_DEBUG=n
>  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>  
>  #
> +# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
> +#
> +CONFIG_RTE_EAL_USE_PHYS_IOVA=y

Defeat the purpose of all dynamic probing scheme.
Either we can fix the KNI or revert the following patch for this release.

http://dpdk.org/commit/f37dfab2
  
Ferruh Yigit Nov. 1, 2017, 6:21 a.m. UTC | #4
On 10/31/2017 9:02 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Wed, 1 Nov 2017 01:07:26 +0000
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> To: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>>  <sergio.gonzalez.monroy@intel.com>
>> CC: dev@dpdk.org, Ferruh Yigit <ferruh.yigit@intel.com>, Jianfeng Tan
>>  <jianfeng.tan@intel.com>, Santosh Shukla
>>  <santosh.shukla@caviumnetworks.com>
>> Subject: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
>> X-Mailer: git-send-email 2.13.6
>>
>> Fix kernel crash with KNI because KNI requires physical addresses.
> 
> The actual fix would be to make KNI IOMMU aware based on the DPDK mode.
> 
> ie. On slow path,
> 
> /* Get iommu domain for iova to physical addr conversion */             
> if (rte_eal_iova_mode() == RTE_IOVA_VA)
> 	kni->iommu_domain = iommu_get_domain_for_dev(dev);
> else
> 	kni->iommu_domain = NULL;
> 
> On fast path,
> 
> static inline u64 kni_iova_to_phys(struct ... *kni, dma_addr_t dma_addr)    
> {                                                                               
>         /* Translation is installed only when IOMMU is present */               
>         if (kni->iommu_domain)                                                  
>                 return iommu_iova_to_phys(kni->iommu_domain, dma_addr);         
>         return dma_addr;                                                        
> } 
> 
>>
>> A config option introduced to disable IOVA mode detection and to set it
>> to physical address by default. Disabling config option will enable IOVA
>> mode detection.
>>
>> When there is no intension to use KNI, it is safe to enable detection.
>>
>> Config option disable IOVA mode detection by default to be sure only who
>> is aware of result enable it.
>>
>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
>> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Cc: Thomas Monjalon <thomas@monjalon.net>
>> ---
>>  config/common_base                | 5 +++++
>>  lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
>>  lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/config/common_base b/config/common_base
>> index 82ee75456..903e7685b 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -107,6 +107,11 @@ CONFIG_RTE_MALLOC_DEBUG=n
>>  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>>  
>>  #
>> +# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
>> +#
>> +CONFIG_RTE_EAL_USE_PHYS_IOVA=y
> 
> Defeat the purpose of all dynamic probing scheme.
> Either we can fix the KNI or revert the following patch for this release.
> 
> http://dpdk.org/commit/f37dfab2

This commit just enables IOVA VA mode for Intel drivers, that is how I can able
to observe the issue, but it is not the source of the problem. Reverting that
commit will not solve KNI crash with any other PMD that enables IOVA VA mode.

Related to the KNI, iommu is not involved at all, I am not clear with your above
suggestion, physical address is required for kernelspace - userspace communication.

KNI uses physical address for two things:

1- Creates a buffer in userspace, a memzone, and shares its physical address to
the kernel, so that both kernel and app can access same buffer.

The question is, even all devices supports IOVA VA mode, why creating a memzone
that has real physical address info is not possible anymore?
For KNI case this is not related at all with what devices supports.

2- For each mbuf that will be sent to kernel, dpdk app puts physical address of
that mbuf into shared buffer, so that kernel can access it. So that kernel can
access to mbuf data that can be coming from any mempool.
For this the physical address of the mbuf is required.


Overall KNI needs to know physical address of mbufs and memzone.

This patch provides a way to have old behavior with a config option, so that KNI
can work and anyone who needs to create memzone with physical address can a way
to have it, and all PMDs will work fine.

If there is no need for these, it is possible to disable config and dynamic
probing scheme is already there.

Thanks,
ferruh
  
Ferruh Yigit Nov. 1, 2017, 6:32 a.m. UTC | #5
On 10/31/2017 7:17 PM, Tan, Jianfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Wednesday, November 1, 2017 9:07 AM
>> To: Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy, Sergio
>> Cc: dev@dpdk.org; Yigit, Ferruh; Tan, Jianfeng; Santosh Shukla
>> Subject: [PATCH] eal: disable IOVA mode detection by default
>>
>> Fix kernel crash with KNI because KNI requires physical addresses.
>>
>> A config option introduced to disable IOVA mode detection and to set it
>> to physical address by default. Disabling config option will enable IOVA
>> mode detection.
>>
>> When there is no intension to use KNI, it is safe to enable detection.
>>
>> Config option disable IOVA mode detection by default to be sure only who
>> is aware of result enable it.
>>
>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
>> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Cc: Thomas Monjalon <thomas@monjalon.net>
> 
> Refer to how vhost-kernel works, we may leverage a memory region table to do the translation. The bad side is it's less efficient than current phys_to_virt.

Hi Jianfeng,

Can you please elaborate?

Thanks,
ferruh

> Another side, we did not check the result of phys_to_virt, that's why kernel crashes.
> 
> Thanks,
> Jianfeng
>
  
Jianfeng Tan Nov. 1, 2017, 6:37 a.m. UTC | #6
> -----Original Message-----

> From: Yigit, Ferruh

> Sent: Wednesday, November 1, 2017 2:33 PM

> To: Tan, Jianfeng; Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy,

> Sergio

> Cc: dev@dpdk.org; Santosh Shukla

> Subject: Re: [PATCH] eal: disable IOVA mode detection by default

> 

> On 10/31/2017 7:17 PM, Tan, Jianfeng wrote:

> >

> >

> >> -----Original Message-----

> >> From: Yigit, Ferruh

> >> Sent: Wednesday, November 1, 2017 9:07 AM

> >> To: Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy, Sergio

> >> Cc: dev@dpdk.org; Yigit, Ferruh; Tan, Jianfeng; Santosh Shukla

> >> Subject: [PATCH] eal: disable IOVA mode detection by default

> >>

> >> Fix kernel crash with KNI because KNI requires physical addresses.

> >>

> >> A config option introduced to disable IOVA mode detection and to set it

> >> to physical address by default. Disabling config option will enable IOVA

> >> mode detection.

> >>

> >> When there is no intension to use KNI, it is safe to enable detection.

> >>

> >> Config option disable IOVA mode detection by default to be sure only

> who

> >> is aware of result enable it.

> >>

> >> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")

> >>

> >> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

> >> ---

> >> Cc: Jianfeng Tan <jianfeng.tan@intel.com>

> >> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>

> >> Cc: Thomas Monjalon <thomas@monjalon.net>

> >

> > Refer to how vhost-kernel works, we may leverage a memory region table

> to do the translation. The bad side is it's less efficient than current

> phys_to_virt.

> 

> Hi Jianfeng,

> 

> Can you please elaborate?


It is very similar to what Jacob proposed (KNI IOMMU).
1. Share memsegs (memory regions) with KNI, some pairs of (userspace_addr, length).
2. KNI maintains an address translation table, (userspace_addr, length, kernel_addr).
3. Every time we need a kernel va, we search the table to obtain the addr.

Thanks,
Jianfeng


> 

> Thanks,

> ferruh

> 

> > Another side, we did not check the result of phys_to_virt, that's why

> kernel crashes.

> >

> > Thanks,

> > Jianfeng

> >
  
Jerin Jacob Nov. 1, 2017, 6:54 a.m. UTC | #7
-----Original Message-----
> Date: Tue, 31 Oct 2017 23:21:18 -0700
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>  <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org, Jianfeng Tan
>  <jianfeng.tan@intel.com>, Santosh Shukla
>  <santosh.shukla@caviumnetworks.com>
> Subject: Re: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.4.0
> 
> On 10/31/2017 9:02 PM, Jerin Jacob wrote:
> > -----Original Message-----
> >> Date: Wed, 1 Nov 2017 01:07:26 +0000
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> To: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
> >>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
> >>  <sergio.gonzalez.monroy@intel.com>
> >> CC: dev@dpdk.org, Ferruh Yigit <ferruh.yigit@intel.com>, Jianfeng Tan
> >>  <jianfeng.tan@intel.com>, Santosh Shukla
> >>  <santosh.shukla@caviumnetworks.com>
> >> Subject: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
> >> X-Mailer: git-send-email 2.13.6
> >>
> >> Fix kernel crash with KNI because KNI requires physical addresses.
> > 
> > The actual fix would be to make KNI IOMMU aware based on the DPDK mode.
> > 
> > ie. On slow path,
> > 
> > /* Get iommu domain for iova to physical addr conversion */             
> > if (rte_eal_iova_mode() == RTE_IOVA_VA)
> > 	kni->iommu_domain = iommu_get_domain_for_dev(dev);
> > else
> > 	kni->iommu_domain = NULL;
> > 
> > On fast path,
> > 
> > static inline u64 kni_iova_to_phys(struct ... *kni, dma_addr_t dma_addr)    
> > {                                                                               
> >         /* Translation is installed only when IOMMU is present */               
> >         if (kni->iommu_domain)                                                  
> >                 return iommu_iova_to_phys(kni->iommu_domain, dma_addr);         
> >         return dma_addr;                                                        
> > } 
> > 
> >>
> >> A config option introduced to disable IOVA mode detection and to set it
> >> to physical address by default. Disabling config option will enable IOVA
> >> mode detection.
> >>
> >> When there is no intension to use KNI, it is safe to enable detection.
> >>
> >> Config option disable IOVA mode detection by default to be sure only who
> >> is aware of result enable it.
> >>
> >> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
> >>
> >> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >> ---
> >> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> >> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> >> Cc: Thomas Monjalon <thomas@monjalon.net>
> >> ---
> >>  config/common_base                | 5 +++++
> >>  lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
> >>  lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
> >>  3 files changed, 13 insertions(+)
> >>
> >> diff --git a/config/common_base b/config/common_base
> >> index 82ee75456..903e7685b 100644
> >> --- a/config/common_base
> >> +++ b/config/common_base
> >> @@ -107,6 +107,11 @@ CONFIG_RTE_MALLOC_DEBUG=n
> >>  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
> >>  
> >>  #
> >> +# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
> >> +#
> >> +CONFIG_RTE_EAL_USE_PHYS_IOVA=y
> > 
> > Defeat the purpose of all dynamic probing scheme.
> > Either we can fix the KNI or revert the following patch for this release.
> > 
> > http://dpdk.org/commit/f37dfab2
> 
> This commit just enables IOVA VA mode for Intel drivers, that is how I can able
> to observe the issue, but it is not the source of the problem. Reverting that
> commit will not solve KNI crash with any other PMD that enables IOVA VA mode.

I don't understand why a PMD needs to enable IOVA_VA if it can support IOVA_PA.
IMO, IOVA_VA should be enabled only for those device it can WORK ONLY on
IOVA_VA mode. Forget about KNI, If we set CONFIG_RTE_EAL_USE_PHYS_IOVA
as y then the normal stuff wont work for if PMD can operate only in
IOVA_VA mode(like octeontx).

Regarding the KNI crash, it can be avoid by first checking the exiting
mode(rte_eal_iova_mode()). i.e since legacy driver like KNI need real
physical address to work "now", it can grace full exit on the init time if
mode == IOVA_VA;

> 
> Related to the KNI, iommu is not involved at all, I am not clear with your above
> suggestion, physical address is required for kernelspace - userspace communication.

vhost-kernel addressed this case with IOVA as VA. I need to spend cycles
on what takes to remove physical address dependency from KNI.
But someone can add that support if it is required in future.


> KNI uses physical address for two things:
> 
> 1- Creates a buffer in userspace, a memzone, and shares its physical address to
> the kernel, so that both kernel and app can access same buffer.
> 
> The question is, even all devices supports IOVA VA mode, why creating a memzone
> that has real physical address info is not possible anymore?
> For KNI case this is not related at all with what devices supports.
> 
> 2- For each mbuf that will be sent to kernel, dpdk app puts physical address of
> that mbuf into shared buffer, so that kernel can access it. So that kernel can
> access to mbuf data that can be coming from any mempool.
> For this the physical address of the mbuf is required.
> 
> 
> Overall KNI needs to know physical address of mbufs and memzone.
> 
> This patch provides a way to have old behavior with a config option, so that KNI
> can work and anyone who needs to create memzone with physical address can a way
> to have it, and all PMDs will work fine.
> 
> If there is no need for these, it is possible to disable config and dynamic
> probing scheme is already there.
> 
> Thanks,
> ferruh
  
Ferruh Yigit Nov. 1, 2017, 7:29 a.m. UTC | #8
On 10/31/2017 11:37 PM, Tan, Jianfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Wednesday, November 1, 2017 2:33 PM
>> To: Tan, Jianfeng; Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy,
>> Sergio
>> Cc: dev@dpdk.org; Santosh Shukla
>> Subject: Re: [PATCH] eal: disable IOVA mode detection by default
>>
>> On 10/31/2017 7:17 PM, Tan, Jianfeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Yigit, Ferruh
>>>> Sent: Wednesday, November 1, 2017 9:07 AM
>>>> To: Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy, Sergio
>>>> Cc: dev@dpdk.org; Yigit, Ferruh; Tan, Jianfeng; Santosh Shukla
>>>> Subject: [PATCH] eal: disable IOVA mode detection by default
>>>>
>>>> Fix kernel crash with KNI because KNI requires physical addresses.
>>>>
>>>> A config option introduced to disable IOVA mode detection and to set it
>>>> to physical address by default. Disabling config option will enable IOVA
>>>> mode detection.
>>>>
>>>> When there is no intension to use KNI, it is safe to enable detection.
>>>>
>>>> Config option disable IOVA mode detection by default to be sure only
>> who
>>>> is aware of result enable it.
>>>>
>>>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> ---
>>>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
>>>> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Cc: Thomas Monjalon <thomas@monjalon.net>
>>>
>>> Refer to how vhost-kernel works, we may leverage a memory region table
>> to do the translation. The bad side is it's less efficient than current
>> phys_to_virt.
>>
>> Hi Jianfeng,
>>
>> Can you please elaborate?
> 
> It is very similar to what Jacob proposed (KNI IOMMU).
> 1. Share memsegs (memory regions) with KNI, some pairs of (userspace_addr, length).
> 2. KNI maintains an address translation table, (userspace_addr, length, kernel_addr).
> 3. Every time we need a kernel va, we search the table to obtain the addr.

I see, this looks like can work.
But I concern about cost of this lookup, which needs to be done per packet.
Current address translations are basic arithmetic operations.

What do you think doing va -> pa translation in userspace via eal APIs, and keep
KNI kernel code intact?
Overall translation cost should be same, but the code complexity kept in eal
layer which already knows about memsegs.

> 
> Thanks,
> Jianfeng
> 
> 
>>
>> Thanks,
>> ferruh
>>
>>> Another side, we did not check the result of phys_to_virt, that's why
>> kernel crashes.
>>>
>>> Thanks,
>>> Jianfeng
>>>
>
  
Jianfeng Tan Nov. 1, 2017, 7:39 a.m. UTC | #9
On 11/1/2017 2:54 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Tue, 31 Oct 2017 23:21:18 -0700
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>>   <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>>   <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org, Jianfeng Tan
>>   <jianfeng.tan@intel.com>, Santosh Shukla
>>   <santosh.shukla@caviumnetworks.com>
>> Subject: Re: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
>> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>>   Thunderbird/52.4.0
>>
>> On 10/31/2017 9:02 PM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>> Date: Wed, 1 Nov 2017 01:07:26 +0000
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> To: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>>>>   <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>>>>   <sergio.gonzalez.monroy@intel.com>
>>>> CC: dev@dpdk.org, Ferruh Yigit <ferruh.yigit@intel.com>, Jianfeng Tan
>>>>   <jianfeng.tan@intel.com>, Santosh Shukla
>>>>   <santosh.shukla@caviumnetworks.com>
>>>> Subject: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
>>>> X-Mailer: git-send-email 2.13.6
>>>>
>>>> Fix kernel crash with KNI because KNI requires physical addresses.
>>> The actual fix would be to make KNI IOMMU aware based on the DPDK mode.
>>>
>>> ie. On slow path,
>>>
>>> /* Get iommu domain for iova to physical addr conversion */
>>> if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>> 	kni->iommu_domain = iommu_get_domain_for_dev(dev);
>>> else
>>> 	kni->iommu_domain = NULL;
>>>
>>> On fast path,
>>>
>>> static inline u64 kni_iova_to_phys(struct ... *kni, dma_addr_t dma_addr)
>>> {
>>>          /* Translation is installed only when IOMMU is present */
>>>          if (kni->iommu_domain)
>>>                  return iommu_iova_to_phys(kni->iommu_domain, dma_addr);
>>>          return dma_addr;
>>> }
>>>
>>>> A config option introduced to disable IOVA mode detection and to set it
>>>> to physical address by default. Disabling config option will enable IOVA
>>>> mode detection.
>>>>
>>>> When there is no intension to use KNI, it is safe to enable detection.
>>>>
>>>> Config option disable IOVA mode detection by default to be sure only who
>>>> is aware of result enable it.
>>>>
>>>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> ---
>>>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
>>>> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Cc: Thomas Monjalon <thomas@monjalon.net>
>>>> ---
>>>>   config/common_base                | 5 +++++
>>>>   lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
>>>>   lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
>>>>   3 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/config/common_base b/config/common_base
>>>> index 82ee75456..903e7685b 100644
>>>> --- a/config/common_base
>>>> +++ b/config/common_base
>>>> @@ -107,6 +107,11 @@ CONFIG_RTE_MALLOC_DEBUG=n
>>>>   CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>>>>   
>>>>   #
>>>> +# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
>>>> +#
>>>> +CONFIG_RTE_EAL_USE_PHYS_IOVA=y
>>> Defeat the purpose of all dynamic probing scheme.
>>> Either we can fix the KNI or revert the following patch for this release.
>>>
>>> http://dpdk.org/commit/f37dfab2
>> This commit just enables IOVA VA mode for Intel drivers, that is how I can able
>> to observe the issue, but it is not the source of the problem. Reverting that
>> commit will not solve KNI crash with any other PMD that enables IOVA VA mode.
> I don't understand why a PMD needs to enable IOVA_VA if it can support IOVA_PA.
> IMO, IOVA_VA should be enabled only for those device it can WORK ONLY on
> IOVA_VA mode.

IOVA_VA is also necessary if we don't want/have root privilege to run 
DPDK app.

> Forget about KNI, If we set CONFIG_RTE_EAL_USE_PHYS_IOVA
> as y then the normal stuff wont work for if PMD can operate only in
> IOVA_VA mode(like octeontx).

Yes, agreed.

> Regarding the KNI crash, it can be avoid by first checking the exiting
> mode(rte_eal_iova_mode()). i.e since legacy driver like KNI need real
> physical address to work "now", it can grace full exit on the init time if
> mode == IOVA_VA;

For the KNI crash, do we really need to correct the case:  make those 
PMDs (requires IOVA_VA)  also work well with KNI?
- If no, we just need to report  RTE_PCI_DRV_IOVA_AS_VA smartly.
- If yes, we shall make KNI work with VA.

Thanks,
Jianfeng
  
Jianfeng Tan Nov. 1, 2017, 7:55 a.m. UTC | #10
> -----Original Message-----

> From: Yigit, Ferruh

> Sent: Wednesday, November 1, 2017 3:29 PM

> To: Tan, Jianfeng; Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy,

> Sergio

> Cc: dev@dpdk.org; Santosh Shukla

> Subject: Re: [PATCH] eal: disable IOVA mode detection by default

> 

> On 10/31/2017 11:37 PM, Tan, Jianfeng wrote:

> >

> >

> >> -----Original Message-----

> >> From: Yigit, Ferruh

> >> Sent: Wednesday, November 1, 2017 2:33 PM

> >> To: Tan, Jianfeng; Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy,

> >> Sergio

> >> Cc: dev@dpdk.org; Santosh Shukla

> >> Subject: Re: [PATCH] eal: disable IOVA mode detection by default

> >>

> >> On 10/31/2017 7:17 PM, Tan, Jianfeng wrote:

> >>>

> >>>

> >>>> -----Original Message-----

> >>>> From: Yigit, Ferruh

> >>>> Sent: Wednesday, November 1, 2017 9:07 AM

> >>>> To: Thomas Monjalon; Richardson, Bruce; Gonzalez Monroy, Sergio

> >>>> Cc: dev@dpdk.org; Yigit, Ferruh; Tan, Jianfeng; Santosh Shukla

> >>>> Subject: [PATCH] eal: disable IOVA mode detection by default

> >>>>

> >>>> Fix kernel crash with KNI because KNI requires physical addresses.

> >>>>

> >>>> A config option introduced to disable IOVA mode detection and to set it

> >>>> to physical address by default. Disabling config option will enable IOVA

> >>>> mode detection.

> >>>>

> >>>> When there is no intension to use KNI, it is safe to enable detection.

> >>>>

> >>>> Config option disable IOVA mode detection by default to be sure only

> >> who

> >>>> is aware of result enable it.

> >>>>

> >>>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")

> >>>>

> >>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

> >>>> ---

> >>>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>

> >>>> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>

> >>>> Cc: Thomas Monjalon <thomas@monjalon.net>

> >>>

> >>> Refer to how vhost-kernel works, we may leverage a memory region

> table

> >> to do the translation. The bad side is it's less efficient than current

> >> phys_to_virt.

> >>

> >> Hi Jianfeng,

> >>

> >> Can you please elaborate?

> >

> > It is very similar to what Jacob proposed (KNI IOMMU).

> > 1. Share memsegs (memory regions) with KNI, some pairs of

> (userspace_addr, length).

> > 2. KNI maintains an address translation table, (userspace_addr, length,

> kernel_addr).

> > 3. Every time we need a kernel va, we search the table to obtain the addr.

> 

> I see, this looks like can work.

> But I concern about cost of this lookup, which needs to be done per packet.

> Current address translations are basic arithmetic operations.


The cost depends on how memory distribution of DPDK is. It's LOG(N) complexity.

> 

> What do you think doing va -> pa translation in userspace via eal APIs, and

> keep

> KNI kernel code intact?


That works, like a tx callback or tx_prepare for KNI pmd.
But it modifies mbuf, when we switch back this mbuf to use on IOVA_VA PMD, we change back to va again.

> Overall translation cost should be same, but the code complexity kept in eal

> layer which already knows about memsegs.


This is more complex as I mentioned above.

Thanks,
Jianfeng

> 

> >

> > Thanks,

> > Jianfeng

> >

> >

> >>

> >> Thanks,

> >> ferruh

> >>

> >>> Another side, we did not check the result of phys_to_virt, that's why

> >> kernel crashes.

> >>>

> >>> Thanks,

> >>> Jianfeng

> >>>

> >
  
Ferruh Yigit Nov. 1, 2017, 8:12 a.m. UTC | #11
On 10/31/2017 11:54 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Tue, 31 Oct 2017 23:21:18 -0700
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>>  <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org, Jianfeng Tan
>>  <jianfeng.tan@intel.com>, Santosh Shukla
>>  <santosh.shukla@caviumnetworks.com>
>> Subject: Re: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
>> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>>  Thunderbird/52.4.0
>>
>> On 10/31/2017 9:02 PM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>> Date: Wed, 1 Nov 2017 01:07:26 +0000
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> To: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>>>>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>>>>  <sergio.gonzalez.monroy@intel.com>
>>>> CC: dev@dpdk.org, Ferruh Yigit <ferruh.yigit@intel.com>, Jianfeng Tan
>>>>  <jianfeng.tan@intel.com>, Santosh Shukla
>>>>  <santosh.shukla@caviumnetworks.com>
>>>> Subject: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
>>>> X-Mailer: git-send-email 2.13.6
>>>>
>>>> Fix kernel crash with KNI because KNI requires physical addresses.
>>>
>>> The actual fix would be to make KNI IOMMU aware based on the DPDK mode.
>>>
>>> ie. On slow path,
>>>
>>> /* Get iommu domain for iova to physical addr conversion */             
>>> if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>> 	kni->iommu_domain = iommu_get_domain_for_dev(dev);
>>> else
>>> 	kni->iommu_domain = NULL;
>>>
>>> On fast path,
>>>
>>> static inline u64 kni_iova_to_phys(struct ... *kni, dma_addr_t dma_addr)    
>>> {                                                                               
>>>         /* Translation is installed only when IOMMU is present */               
>>>         if (kni->iommu_domain)                                                  
>>>                 return iommu_iova_to_phys(kni->iommu_domain, dma_addr);         
>>>         return dma_addr;                                                        
>>> } 
>>>
>>>>
>>>> A config option introduced to disable IOVA mode detection and to set it
>>>> to physical address by default. Disabling config option will enable IOVA
>>>> mode detection.
>>>>
>>>> When there is no intension to use KNI, it is safe to enable detection.
>>>>
>>>> Config option disable IOVA mode detection by default to be sure only who
>>>> is aware of result enable it.
>>>>
>>>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> ---
>>>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
>>>> Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Cc: Thomas Monjalon <thomas@monjalon.net>
>>>> ---
>>>>  config/common_base                | 5 +++++
>>>>  lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
>>>>  lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
>>>>  3 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/config/common_base b/config/common_base
>>>> index 82ee75456..903e7685b 100644
>>>> --- a/config/common_base
>>>> +++ b/config/common_base
>>>> @@ -107,6 +107,11 @@ CONFIG_RTE_MALLOC_DEBUG=n
>>>>  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>>>>  
>>>>  #
>>>> +# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
>>>> +#
>>>> +CONFIG_RTE_EAL_USE_PHYS_IOVA=y
>>>
>>> Defeat the purpose of all dynamic probing scheme.
>>> Either we can fix the KNI or revert the following patch for this release.
>>>
>>> http://dpdk.org/commit/f37dfab2
>>
>> This commit just enables IOVA VA mode for Intel drivers, that is how I can able
>> to observe the issue, but it is not the source of the problem. Reverting that
>> commit will not solve KNI crash with any other PMD that enables IOVA VA mode.
> 
> I don't understand why a PMD needs to enable IOVA_VA if it can support IOVA_PA.
> IMO, IOVA_VA should be enabled only for those device it can WORK ONLY on
> IOVA_VA mode. Forget about KNI, If we set CONFIG_RTE_EAL_USE_PHYS_IOVA
> as y then the normal stuff wont work for if PMD can operate only in
> IOVA_VA mode(like octeontx).

Hmm, I wasn't aware that octeontx only operate on IOVA VA mode, I got your
concern now.

But still that config option still enables choosing between KNI and "IOVA_VA
mode only" devices until KNI gets fixed. Unfortunately both won't work at same
time for now.

> Regarding the KNI crash, it can be avoid by first checking the exiting
> mode(rte_eal_iova_mode()). i.e since legacy driver like KNI need real
> physical address to work "now", it can grace full exit on the init time if
> mode == IOVA_VA;

Definitely agree to gracefully exit, but I was looking to make KNI work more
than just exit without crash.
>> Related to the KNI, iommu is not involved at all, I am not clear with your above
>> suggestion, physical address is required for kernelspace - userspace communication.
> 
> vhost-kernel addressed this case with IOVA as VA. I need to spend cycles
> on what takes to remove physical address dependency from KNI.
> But someone can add that support if it is required in future.

Since the iova patchset breaks the existing KNI, I would like to see KNI fix
part of the iova work.
It doesn't look right to me that a new feature breaking the existing one.

>> KNI uses physical address for two things:
>>
>> 1- Creates a buffer in userspace, a memzone, and shares its physical address to
>> the kernel, so that both kernel and app can access same buffer.
>>
>> The question is, even all devices supports IOVA VA mode, why creating a memzone
>> that has real physical address info is not possible anymore?
>> For KNI case this is not related at all with what devices supports.
>>
>> 2- For each mbuf that will be sent to kernel, dpdk app puts physical address of
>> that mbuf into shared buffer, so that kernel can access it. So that kernel can
>> access to mbuf data that can be coming from any mempool.
>> For this the physical address of the mbuf is required.
>>
>>
>> Overall KNI needs to know physical address of mbufs and memzone.
>>
>> This patch provides a way to have old behavior with a config option, so that KNI
>> can work and anyone who needs to create memzone with physical address can a way
>> to have it, and all PMDs will work fine.
>>
>> If there is no need for these, it is possible to disable config and dynamic
>> probing scheme is already there.
>>
>> Thanks,
>> ferruh
  
Jerin Jacob Nov. 1, 2017, 8:26 a.m. UTC | #12
-----Original Message-----
> Date: Wed, 1 Nov 2017 15:39:01 +0800
> From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ferruh Yigit
>  <ferruh.yigit@intel.com>
> CC: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
>  <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
>  <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org, Santosh Shukla
>  <santosh.shukla@caviumnetworks.com>
> Subject: Re: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>  Thunderbird/45.8.0
> 
> 
> 
> On 11/1/2017 2:54 PM, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Tue, 31 Oct 2017 23:21:18 -0700
> > > From: Ferruh Yigit <ferruh.yigit@intel.com>
> > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > CC: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
> > >   <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
> > >   <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org, Jianfeng Tan
> > >   <jianfeng.tan@intel.com>, Santosh Shukla
> > >   <santosh.shukla@caviumnetworks.com>
> > > Subject: Re: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
> > > User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
> > >   Thunderbird/52.4.0
> > > 
> > > On 10/31/2017 9:02 PM, Jerin Jacob wrote:
> > > > -----Original Message-----
> > > > > Date: Wed, 1 Nov 2017 01:07:26 +0000
> > > > > From: Ferruh Yigit <ferruh.yigit@intel.com>
> > > > > To: Thomas Monjalon <thomas@monjalon.net>, Bruce Richardson
> > > > >   <bruce.richardson@intel.com>, Sergio Gonzalez Monroy
> > > > >   <sergio.gonzalez.monroy@intel.com>
> > > > > CC: dev@dpdk.org, Ferruh Yigit <ferruh.yigit@intel.com>, Jianfeng Tan
> > > > >   <jianfeng.tan@intel.com>, Santosh Shukla
> > > > >   <santosh.shukla@caviumnetworks.com>
> > > > > Subject: [dpdk-dev] [PATCH] eal: disable IOVA mode detection by default
> > > > > X-Mailer: git-send-email 2.13.6
> > > > > 
> > > > > Fix kernel crash with KNI because KNI requires physical addresses.
> > > > The actual fix would be to make KNI IOMMU aware based on the DPDK mode.
> > > > 
> > > > ie. On slow path,
> > > > 
> > > > /* Get iommu domain for iova to physical addr conversion */
> > > > if (rte_eal_iova_mode() == RTE_IOVA_VA)
> > > > 	kni->iommu_domain = iommu_get_domain_for_dev(dev);
> > > > else
> > > > 	kni->iommu_domain = NULL;
> > > > 
> > > > On fast path,
> > > > 
> > > > static inline u64 kni_iova_to_phys(struct ... *kni, dma_addr_t dma_addr)
> > > > {
> > > >          /* Translation is installed only when IOMMU is present */
> > > >          if (kni->iommu_domain)
> > > >                  return iommu_iova_to_phys(kni->iommu_domain, dma_addr);
> > > >          return dma_addr;
> > > > }
> > > > 
> > > > > A config option introduced to disable IOVA mode detection and to set it
> > > > > to physical address by default. Disabling config option will enable IOVA
> > > > > mode detection.
> > > > > 
> > > > > When there is no intension to use KNI, it is safe to enable detection.
> > > > > 
> > > > > Config option disable IOVA mode detection by default to be sure only who
> > > > > is aware of result enable it.
> > > > > 
> > > > > Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
> > > > > 
> > > > > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > > > ---
> > > > > Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> > > > > Cc: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> > > > > Cc: Thomas Monjalon <thomas@monjalon.net>
> > > > > ---
> > > > >   config/common_base                | 5 +++++
> > > > >   lib/librte_eal/bsdapp/eal/eal.c   | 4 ++++
> > > > >   lib/librte_eal/linuxapp/eal/eal.c | 4 ++++
> > > > >   3 files changed, 13 insertions(+)
> > > > > 
> > > > > diff --git a/config/common_base b/config/common_base
> > > > > index 82ee75456..903e7685b 100644
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -107,6 +107,11 @@ CONFIG_RTE_MALLOC_DEBUG=n
> > > > >   CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
> > > > >   #
> > > > > +# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
> > > > > +#
> > > > > +CONFIG_RTE_EAL_USE_PHYS_IOVA=y
> > > > Defeat the purpose of all dynamic probing scheme.
> > > > Either we can fix the KNI or revert the following patch for this release.
> > > > 
> > > > http://dpdk.org/commit/f37dfab2
> > > This commit just enables IOVA VA mode for Intel drivers, that is how I can able
> > > to observe the issue, but it is not the source of the problem. Reverting that
> > > commit will not solve KNI crash with any other PMD that enables IOVA VA mode.
> > I don't understand why a PMD needs to enable IOVA_VA if it can support IOVA_PA.
> > IMO, IOVA_VA should be enabled only for those device it can WORK ONLY on
> > IOVA_VA mode.
> 
> IOVA_VA is also necessary if we don't want/have root privilege to run DPDK
> app.

OK.

> 
> > Forget about KNI, If we set CONFIG_RTE_EAL_USE_PHYS_IOVA
> > as y then the normal stuff wont work for if PMD can operate only in
> > IOVA_VA mode(like octeontx).
> 
> Yes, agreed.
> 
> > Regarding the KNI crash, it can be avoid by first checking the exiting
> > mode(rte_eal_iova_mode()). i.e since legacy driver like KNI need real
> > physical address to work "now", it can grace full exit on the init time if
> > mode == IOVA_VA;
> 
> For the KNI crash, do we really need to correct the case:  make those PMDs
> (requires IOVA_VA)  also work well with KNI?
> - If no, we just need to report  RTE_PCI_DRV_IOVA_AS_VA smartly.
> - If yes, we shall make KNI work with VA.

I would say "yes". But for this immediate release, I would say "no".

> 
> Thanks,
> Jianfeng
>
  
Jianfeng Tan Nov. 1, 2017, 8:53 a.m. UTC | #13
> -----Original Message-----
...
> >
> > > Regarding the KNI crash, it can be avoid by first checking the exiting
> > > mode(rte_eal_iova_mode()). i.e since legacy driver like KNI need real
> > > physical address to work "now", it can grace full exit on the init time if
> > > mode == IOVA_VA;
> >
> > For the KNI crash, do we really need to correct the case:  make those PMDs
> > (requires IOVA_VA)  also work well with KNI?
> > - If no, we just need to report  RTE_PCI_DRV_IOVA_AS_VA smartly.
> > - If yes, we shall make KNI work with VA.
> 
> I would say "yes". But for this immediate release, I would say "no".

Ferruh, if you also agree on "no" for now, we might fix it like this:

For PMDs like ixgbe/i40e/..., we detect if phys_addrs_available to decide RTE_PCI_DRV_IOVA_AS_VA in a RTE_INIT function.

How do you think of that?

Thanks,
Jianfeng
  
Thomas Monjalon Nov. 1, 2017, 10:31 a.m. UTC | #14
01/11/2017 02:07, Ferruh Yigit:
> Fix kernel crash with KNI because KNI requires physical addresses.
> 
> A config option introduced to disable IOVA mode detection and to set it
> to physical address by default. Disabling config option will enable IOVA
> mode detection.
> 
> When there is no intension to use KNI, it is safe to enable detection.
> 
> Config option disable IOVA mode detection by default to be sure only who
> is aware of result enable it.
> 
> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> +#ifdef RTE_EAL_USE_PHYS_IOVA
> +	rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
> +#else
>  	/* autodetect the iova mapping mode (default is iova_pa) */
>  	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
> +#endif

I don't understand why you are adding a compile-time option.
I think it should be an EAL option --use-phys-addr.
The opposite option may be required to force VA: --use-virt-addr.
And if there is no option given, we fallback to autodetect.
We can improve the autodetect by checking whether rte_kni.ko is loaded.

Opinions?
  
Santosh Shukla Nov. 1, 2017, 11:06 a.m. UTC | #15
On Wednesday 01 November 2017 04:01 PM, Thomas Monjalon wrote:
> 01/11/2017 02:07, Ferruh Yigit:
>> Fix kernel crash with KNI because KNI requires physical addresses.
>>
>> A config option introduced to disable IOVA mode detection and to set it
>> to physical address by default. Disabling config option will enable IOVA
>> mode detection.
>>
>> When there is no intension to use KNI, it is safe to enable detection.
>>
>> Config option disable IOVA mode detection by default to be sure only who
>> is aware of result enable it.
>>
>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> +#ifdef RTE_EAL_USE_PHYS_IOVA
>> +	rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
>> +#else
>>  	/* autodetect the iova mapping mode (default is iova_pa) */
>>  	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
>> +#endif
> I don't understand why you are adding a compile-time option.
> I think it should be an EAL option --use-phys-addr.
> The opposite option may be required to force VA: --use-virt-addr.
> And if there is no option given, we fallback to autodetect.
> We can improve the autodetect by checking whether rte_kni.ko is loaded.

IMO, we could introduce a rule for kni in iova autodetection(/for linux case)
such away that: 

if cat /proc/modules has rte_kni entry 
then use default iova=pa mode.

so +1 to your proposition Thomas.

> Opinions?
  
Ferruh Yigit Nov. 1, 2017, 6:11 p.m. UTC | #16
On 11/1/2017 3:31 AM, Thomas Monjalon wrote:
> 01/11/2017 02:07, Ferruh Yigit:
>> Fix kernel crash with KNI because KNI requires physical addresses.
>>
>> A config option introduced to disable IOVA mode detection and to set it
>> to physical address by default. Disabling config option will enable IOVA
>> mode detection.
>>
>> When there is no intension to use KNI, it is safe to enable detection.
>>
>> Config option disable IOVA mode detection by default to be sure only who
>> is aware of result enable it.
>>
>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> +#ifdef RTE_EAL_USE_PHYS_IOVA
>> +	rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
>> +#else
>>  	/* autodetect the iova mapping mode (default is iova_pa) */
>>  	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
>> +#endif
> 
> I don't understand why you are adding a compile-time option.
> I think it should be an EAL option --use-phys-addr.
> The opposite option may be required to force VA: --use-virt-addr.
> And if there is no option given, we fallback to autodetect.

It was to have old behavior by default without doing anything special, so that
previous KNI applications can continue to run as it is.

> We can improve the autodetect by checking whether rte_kni.ko is loaded.
> 
> Opinions?
>
  
Ferruh Yigit Nov. 1, 2017, 6:22 p.m. UTC | #17
On 11/1/2017 4:06 AM, santosh wrote:
> 
> On Wednesday 01 November 2017 04:01 PM, Thomas Monjalon wrote:
>> 01/11/2017 02:07, Ferruh Yigit:
>>> Fix kernel crash with KNI because KNI requires physical addresses.
>>>
>>> A config option introduced to disable IOVA mode detection and to set it
>>> to physical address by default. Disabling config option will enable IOVA
>>> mode detection.
>>>
>>> When there is no intension to use KNI, it is safe to enable detection.
>>>
>>> Config option disable IOVA mode detection by default to be sure only who
>>> is aware of result enable it.
>>>
>>> Fixes: 72d013644bd6 ("mem: honor IOVA mode in malloc virt2phy")
>>>
>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>> ---
>>> +#ifdef RTE_EAL_USE_PHYS_IOVA
>>> +	rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
>>> +#else
>>>  	/* autodetect the iova mapping mode (default is iova_pa) */
>>>  	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
>>> +#endif
>> I don't understand why you are adding a compile-time option.
>> I think it should be an EAL option --use-phys-addr.
>> The opposite option may be required to force VA: --use-virt-addr.
>> And if there is no option given, we fallback to autodetect.
>> We can improve the autodetect by checking whether rte_kni.ko is loaded.
> 
> IMO, we could introduce a rule for kni in iova autodetection(/for linux case)
> such away that: 
> 
> if cat /proc/modules has rte_kni entry 
> then use default iova=pa mode.

This sounds good as a workaround for this release [1].

For long term I believe KNI should be updated to work with virtual addresses.


[1]
This won't work if both IOVA VA mode only device and KNI used together. But I
assume this is not common use case.


> 
> so +1 to your proposition Thomas.
> 
>> Opinions?
>
  
Ferruh Yigit Nov. 1, 2017, 6:31 p.m. UTC | #18
On 11/1/2017 1:53 AM, Tan, Jianfeng wrote:
> 
> 
>> -----Original Message-----
> ...
>>>
>>>> Regarding the KNI crash, it can be avoid by first checking the exiting
>>>> mode(rte_eal_iova_mode()). i.e since legacy driver like KNI need real
>>>> physical address to work "now", it can grace full exit on the init time if
>>>> mode == IOVA_VA;
>>>
>>> For the KNI crash, do we really need to correct the case:  make those PMDs
>>> (requires IOVA_VA)  also work well with KNI?
>>> - If no, we just need to report  RTE_PCI_DRV_IOVA_AS_VA smartly.
>>> - If yes, we shall make KNI work with VA.
>>
>> I would say "yes". But for this immediate release, I would say "no".
> 
> Ferruh, if you also agree on "no" for now, we might fix it like this:

I agree with Jerin, for this release "no", for long term "yes".

> 
> For PMDs like ixgbe/i40e/..., we detect if phys_addrs_available to decide RTE_PCI_DRV_IOVA_AS_VA in a RTE_INIT function.
> 
> How do you think of that?

Instead Santosh's suggestion looks simpler.
Detect if kni module inserted, if so set IOVA PA mode.

I guess rte_bus_get_iommu_class() can updated for this check as workaround.

> 
> Thanks,
> Jianfeng
>
  

Patch

diff --git a/config/common_base b/config/common_base
index 82ee75456..903e7685b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -107,6 +107,11 @@  CONFIG_RTE_MALLOC_DEBUG=n
 CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
 
 #
+# Disabling PHYS_IOVA may crash kernel for KNI, use with caution
+#
+CONFIG_RTE_EAL_USE_PHYS_IOVA=y
+
+#
 # Recognize/ignore the AVX/AVX512 CPU flags for performance/power testing.
 # AVX512 is marked as experimental for now, will enable it after enough
 # field test and possible optimization.
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 09112b6f9..db485951b 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -570,8 +570,12 @@  rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+#ifdef RTE_EAL_USE_PHYS_IOVA
+	rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
+#else
 	/* autodetect the iova mapping mode (default is iova_pa) */
 	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+#endif
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 017c402ed..1775bb33f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -805,8 +805,12 @@  rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+#ifdef RTE_EAL_USE_PHYS_IOVA
+	rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
+#else
 	/* autodetect the iova mapping mode (default is iova_pa) */
 	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+#endif
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&