[dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping

Jerin Jacob jerin.jacob at caviumnetworks.com
Thu Jul 6 13:12:36 CEST 2017


-----Original Message-----
> Date: Thu, 6 Jul 2017 12:59:04 +0200
> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>,
>  thomas at monjalon.net, bruce.richardson at intel.com, dev at dpdk.org,
>  hemant.agrawal at nxp.com, shreyansh.jain at nxp.com, gaetan.rivet at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>  before mapping
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.1.0
> 
> 
> 
> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Thu, 6 Jul 2017 09:58:41 +0200
> > > From: Maxime Coquelin <maxime.coquelin at redhat.com>
> > > To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > > CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>,
> > >   thomas at monjalon.net, bruce.richardson at intel.com, dev at dpdk.org,
> > >   hemant.agrawal at nxp.com, shreyansh.jain at nxp.com, gaetan.rivet at 6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
> > >   before mapping
> > > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
> > >   Thunderbird/52.1.0
> > > 
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> > > > > > Check iova mode and accordingly map iova to pa or va.
> > > > > > 
> > > > > > Signed-off-by: Santosh Shukla<santosh.shukla at caviumnetworks.com>
> > > > > > Signed-off-by: Jerin Jacob<jerin.jacob at caviumnetworks.com>
> > > > > > ---
> > > > > >     lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
> > > > > >     1 file changed, 8 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > > > index 04914406f..348b7a7f4 100644
> > > > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > > > @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
> > > > > >     		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
> > > > > >     		dma_map.vaddr = ms[i].addr_64;
> > > > > >     		dma_map.size = ms[i].len;
> > > > > > -		dma_map.iova = ms[i].phys_addr;
> > > > > > +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> > > > > > +			dma_map.iova = dma_map.vaddr;
> > > > > > +		else
> > > > > > +			dma_map.iova = ms[i].phys_addr;
> > > > > >     		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> > > > > 
> > > > > IIUC, it is changing default behavior for VFIO devices.
> > > > > 
> > > > > I see a possible problem, but I'm not sure the case is valid.
> > > > > 
> > > > > Imagine you have two devices in the iommu group, and the two devices are
> > > > > used in separate processes. Each process could try two different
> > > > > physical addresses at the same virtual address, and so the second map
> > > > > would fail.
> > > > 
> > > > IMO, Doesn't look like a problem. Here is the data flow
> > > > 
> > > > 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
> > > > on primary process
> > > > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
> > > > 
> > > > 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
> > > > that, the Secondary process has the _same_ virtual address as primary or
> > > > exit from on attach.
> > > > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
> > > > 
> > > > 3) Since secondary process adds the mapped the virtual address in step (2).
> > > > in the page table in OS. On SMMU entry miss(When device
> > > > request from I/O transaction), OS will load the mapping and update the SMMU
> > > > "context" with page tables from MMU.
> > > 
> > > Ok thanks for the detailed info, but what about the case where the same
> > > iommu group is used by two primary processes?
> > 
> > Does that case exist with DPDK? We always need to blacklist same BDF in
> > the secondary process to make things work with existing DPDK setup. Which
> > make sense as well. Only primary process configures the HW blocks.
> 
> I meant the case when two BDF are in the same IOMMU group (if ACS is not
> supported at some point in the hierarchy). And I meant two primary
> processes running, like for example two containers running each a DPDK
> application.
> 
> Maybe this is not a valid use-case (it is not secure, as it would break
> isolation between the two containers), but it seems that it is something
> DPDK allows today, if I'm not mistaken.

Not sure. Doesn't seems to valid case with VFIO as without ACS anyway
it will break security (the all point of IOMMU protection == VFIO)

> 
> > > 
> > > I don't know how frequent it is, but if ACS is not supported by either the
> > > endpoint or the the root port, then you would have to share the same IOMMU
> > > group for all the ports of your card. Right?
> > 
> > ACS is supported in our card(it not in bypass mode) and one mempool PCI BDF
> > comes as a IOMMU group.
> > 
> > If it in bypass mode anyway you use in vfio-noiommu mode as
> > there is no protection anyway.
> > > What concerns me is that we change the default behavior for all other
> > > devices. Having an option to override is fine to me, but the default
> > > mode should remain the same IMHO.
> > 
> > Doesn't seems to be a technical point. But I agree with your concern.
> > we will address it.
> > I think, we have two ways to address it.
> > 
> > option 1:
> > - In existing patch,
> > a) we are currently setting(internal_cfg->iova_mode = RTE_IOVA_PA)
> >    http://dpdk.org/dev/patchwork/patch/25192
> > b) only when with eal argument sets to RTE_IOVA_VA and then bus probed
> > value == RTE_IOVA_VA the final mode will be RTE_IOVA_VA
> > http://dpdk.org/dev/patchwork/patch/25193/
> > check the code after rte_bus_scan()
> > 
> > option 2:
> > On rte_pci_get_iommu_class() in http://dpdk.org/dev/patchwork/patch/25190/
> > we can check the rte_pci_device.id.vendor_id == CAVIUM to select the
> > mode so other type of devices safe.
> > 
> > I think, option 2 makes sense, as it gives foolproof auto detection scheme and
> > without effecting any other devices that not interested in this scheme
> > 
> > Does that address your concern about the patchset?
> 
> Yes it does, or maybe create a new flag in struct rte_pci_driver's
> drv_flags to provide a hint it prefers to use VA as IOVA?
> 
> It, of course, would just be a hint, and should be set only when other
> conditions are met.

Yes. Makes sense. We will roll out v2 with option2 + your rte_pci_driver
suggestion.

Thanks a lot for the review. Appreciated.


More information about the dev mailing list