[dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping

Maxime Coquelin maxime.coquelin at redhat.com
Thu Jul 6 15:11:04 CEST 2017



On 07/06/2017 03:08 PM, Maxime Coquelin wrote:
> 
> 
> On 07/06/2017 01:19 PM, santosh wrote:
>> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
>>
>>>
>>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>>> -----Original Message-----
>>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>>>>> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>,
>>>>>    thomas at monjalon.net, bruce.richardson at intel.com, dev at dpdk.org,
>>>>>    hemant.agrawal at nxp.com, shreyansh.jain at nxp.com, 
>>>>> gaetan.rivet at 6wind.com
>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova 
>>>>> mode
>>>>>    before mapping
>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>    Thunderbird/52.1.0
>>>>>
>>>>>
>>>>>
>>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>>> -----Original Message-----
>>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>>> To: Santosh Shukla <santosh.shukla at caviumnetworks.com>,
>>>>>>>     thomas at monjalon.net, bruce.richardson at intel.com, dev at dpdk.org
>>>>>>> CC: jerin.jacob at caviumnetworks.com, hemant.agrawal at nxp.com,
>>>>>>>     shreyansh.jain at nxp.com, gaetan.rivet at 6wind.com
>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor 
>>>>>>> iova mode
>>>>>>>     before mapping
>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>     Thunderbird/52.1.0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>>
>>>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla at caviumnetworks.com>
>>>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob at caviumnetworks.com>
>>>>>>>> ---
>>>>>>>>      lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>>      1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
>>>>>>>> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>>              dma_map.argsz = sizeof(struct 
>>>>>>>> vfio_iommu_type1_dma_map);
>>>>>>>>              dma_map.vaddr = ms[i].addr_64;
>>>>>>>>              dma_map.size = ms[i].len;
>>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>>>> +        else
>>>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>>>              dma_map.flags = VFIO_DMA_MAP_FLAG_READ | 
>>>>>>>> VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>>
>>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>>
>>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>>
>>>>>>> Imagine you have two devices in the iommu group, and the two devices
>>>>>>> are used in separate processes. Each process could try to map a
>>>>>>> different physical address at the same virtual address, and so the
>>>>>>> second map would fail.
>>>>>>
>>>>>> IMO, this doesn't look like a problem. Here is the data flow:
>>>>>>
>>>>>> 1) The vfio DMA map function (vfio_type1_dma_map()) is called only
>>>>>> in the primary process
>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359 
>>>>>>
>>>>>>
>>>>>> 2) In a secondary process, DPDK rte_eal_hugepage_attach() makes sure
>>>>>> that the secondary process has the _same_ virtual addresses as the
>>>>>> primary, or exits on attach.
>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452 
>>>>>>
>>>>>>
>>>>>> 3) The secondary process adds the virtual addresses mapped in step (2)
>>>>>> to the page table in the OS. On an SMMU entry miss (when the device
>>>>>> issues an I/O transaction), the OS will load the mapping and update
>>>>>> the SMMU "context" with page tables from the MMU.
>>>>>
>>>>> Ok thanks for the detailed info, but what about the case where the 
>>>>> same
>>>>> iommu group is used by two primary processes?
>>>>
>>>> Does that case exist with DPDK? We always need to blacklist the same BDF
>>>> in the secondary process to make things work with the existing DPDK
>>>> setup, which makes sense as well. Only the primary process configures
>>>> the HW blocks.
>>>
>>> I meant the case when two BDF are in the same IOMMU group (if ACS is not
>>> supported at some point in the hierarchy). And I meant two primary
>>> processes running, like for example two containers running each a DPDK
>>> application.
>>>
>>> Maybe this is not a valid use-case (it is not secure, as it would break
>>> isolation between the two containers), but it seems that it is something
>>> DPDK allows today, if I'm not mistaken.
>>>
>> I'm not sure how two primary processes could run, because the second
>> primary process would try to access /var/run/.rte_config and would fail
>> at this point [1].
>>
>> It's not a valid use case for DPDK (IMO).
>> [1] 
>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
> 
> Yes, this is possible. I had never used it before, but Thomas told me it
> is supported by setting the --file-prefix option. I tried it, and I
> confirm it works:
> session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 
> 0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan 
> -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
> session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 
> 0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan 
> -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
> 
> In the above example, two ports of the same card are used by two
> processes. Note that in this case, ACS is supported and both ports have
> their own iommu group.

# ls -al /var/run/.app*
-rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app1_config
-rw-r--r--. 1 root root  49728 Jul  6 09:08 /var/run/.app1_hugepage_info
srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app1_mp_socket
-rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app2_config
-rw-r--r--. 1 root root  45584 Jul  6 09:08 /var/run/.app2_hugepage_info
srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app2_mp_socket
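
To make the concern discussed in this thread concrete, here is a minimal
sketch of the scenario. It is a toy model, not VFIO or DPDK code: the names
iova_mode, segment, pick_iova, iova_space and iova_space_map are all
hypothetical, and it assumes that every device in one IOMMU group shares a
single IOVA address space in which an overlapping mapping is rejected.
pick_iova() mirrors the choice made in the patch (VA in VA mode, PA
otherwise).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the names in the patch; not DPDK definitions. */
enum iova_mode { IOVA_PA, IOVA_VA };

struct segment {
	uint64_t vaddr;     /* process virtual address of the segment */
	uint64_t phys_addr; /* physical address backing the segment */
	uint64_t len;       /* segment length in bytes */
};

/* Same choice as the patch: use the VA as IOVA in VA mode, else the PA. */
static uint64_t
pick_iova(enum iova_mode mode, const struct segment *seg)
{
	return mode == IOVA_VA ? seg->vaddr : seg->phys_addr;
}

#define MAX_MAPPINGS 8

/* Toy IOVA address space, assumed shared by all devices in one group. */
struct iova_space {
	struct { uint64_t iova, len; } map[MAX_MAPPINGS];
	size_t count;
};

/*
 * Record a mapping. Returns 0 on success, or -1 if the range overlaps an
 * existing mapping or the table is full.
 */
static int
iova_space_map(struct iova_space *sp, uint64_t iova, uint64_t len)
{
	size_t i;

	for (i = 0; i < sp->count; i++) {
		uint64_t start = sp->map[i].iova;
		uint64_t end = start + sp->map[i].len;

		if (iova < end && start < iova + len)
			return -1; /* overlaps an existing mapping */
	}
	if (sp->count == MAX_MAPPINGS)
		return -1;
	sp->map[sp->count].iova = iova;
	sp->map[sp->count].len = len;
	sp->count++;
	return 0;
}
```

In this model, two processes whose segments sit at the same virtual address
but on different physical pages collide in VA mode (the second
iova_space_map() call returns -1) and do not collide in PA mode. Whether two
primary processes can actually end up sharing one IOMMU group is exactly the
open question above; the sketch only shows what would go wrong if they did.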


