[dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map

gowrishankar muthukrishnan gowrishankar.m at linux.vnet.ibm.com
Thu Apr 20 21:16:13 CEST 2017


On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>> Alexey Kardashevskiy <aik at ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>
>>>> From: Alexey Kardashevskiy <aik at ozlabs.ru>
>>>> To: dev at dpdk.org
>>>> Cc: Alexey Kardashevskiy <aik at ozlabs.ru>, JPF at zurich.ibm.com,
>>>> Gowrishankar Muthukrishnan <gowrishankar.m at in.ibm.com>
>>>> Date: 20/04/2017 09:24
>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>> addresses for DMA map
>>>>
>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address of
>>>> the just-created DMA window. It happens to start from zero because the
>>>> default window is removed (leaving no windows) and the new window starts
>>>> from zero. However this is not guaranteed and the new window may start
>>>> from another address, so this adds an error check.
>>>>
>>>> Another issue is that the IOVA passed to VFIO_IOMMU_MAP_DMA should be a
>>>> PCI bus address, while in this case a physical address of a user page is
>>>> used. This changes the IOVA to start from zero in the hope that the rest
>>>> of DPDK expects this.
>>> This is not the case. DPDK expects a 1:1 mapping, PA == IOVA. It will use
>>> the phys_addr of the memory segment it got from /proc/self/pagemap, cf.
>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>> actual IOVA, which basically makes the whole virtual-to-physical mapping
>>> via pagemap unnecessary; I believe that should be the case for VFIO
>>> anyway. Pagemap should only be needed when using pci_uio.
>>
>> Ah, ok, makes sense now. But it sure needs a big fat comment there: it is
>> not obvious why the host RAM address is used, given that the DMA window
>> start is not guaranteed to be zero.
> Well, either way there is some bug: ms[i].phys_addr and ms[i].addr_64 both
> have the exact same value; in my setup it is 3fffb33c0000, which is a
> userspace address. At least ms[i].phys_addr must be a physical address.
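For reference, EAL resolves phys_addr by reading /proc/self/pagemap, as noted
above. Below is a minimal sketch of that lookup (hypothetical helper names,
not the actual eal_memory.c code): each virtual page has one 64-bit pagemap
entry whose low 55 bits hold the PFN and whose bit 63 flags the page present.
On recent kernels the PFN is zeroed for readers without CAP_SYS_ADMIN, which
is one way phys_addr can end up without a real physical address.

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Byte offset into /proc/self/pagemap of the entry for a virtual address:
 * one 64-bit entry per virtual page. */
static uint64_t pagemap_offset(uint64_t vaddr)
{
	return (vaddr / PAGE_SIZE) * sizeof(uint64_t);
}

/* Decode a pagemap entry into a physical address for the given vaddr.
 * Bit 63 = page present; bits 0-54 = PFN. Returns 0 if not present
 * (or if the kernel zeroed the PFN for an unprivileged reader). */
static uint64_t pagemap_entry_to_pa(uint64_t entry, uint64_t vaddr)
{
	if (!(entry >> 63))		/* page not present */
		return 0;
	uint64_t pfn = entry & ((1ULL << 55) - 1);
	return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE);
}
```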

This patch breaks i40e_dev_init() on my server.

EAL: PCI device 0004:01:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1583 net_i40e
EAL:   using IOMMU type 7 (sPAPR)
eth_i40e_dev_init(): Failed to init adminq: -32
EAL: Releasing pci mapped resource for 0004:01:00.0
EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
EAL: Requested device 0004:01:00.0 cannot be used
EAL: PCI device 0004:01:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1583 net_i40e
EAL:   using IOMMU type 7 (sPAPR)
eth_i40e_dev_init(): Failed to init adminq: -32
EAL: Releasing pci mapped resource for 0004:01:00.1
EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
EAL: Requested device 0004:01:00.1 cannot be used
EAL: No probed ethernet devices

I have two memsegs, each 1G in size. Their mapped PA and VA also differ.

(gdb) p /x ms[0]
$3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 = 0x3effaf000000},
   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel = 0x0, nrank = 0x0}
(gdb) p /x ms[1]
$4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 = 0x3efbaf000000},
   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel = 0x0, nrank = 0x0}

Could you please recheck this? Maybe reset dma_map.iova by this offset only
when the new DMA window does not start at bus address 0?
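To illustrate that suggestion, here is a minimal sketch (hypothetical helper,
not the actual eal_vfio.c code): keep the 1:1 PA == IOVA mapping that DPDK
expects whenever the new window starts at bus address 0, and only rebase the
IOVA against the window base when it does not.

```c
#include <stdint.h>

/* Pick the IOVA for a memseg: window_start comes from
 * VFIO_IOMMU_SPAPR_TCE_CREATE's start_addr, phys_addr from the memseg,
 * running_offset is the cumulative size of segments already mapped. */
static uint64_t choose_iova(uint64_t window_start, uint64_t phys_addr,
			    uint64_t running_offset)
{
	if (window_start == 0)
		return phys_addr;	/* 1:1 PA == IOVA, as DPDK expects */
	/* Window starts elsewhere: place segments back to back from the
	 * window base instead. */
	return window_start + running_offset;
}
```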


Thanks,
Gowrishankar

>
>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>>> ---
>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>>> librte_eal/linuxapp/eal/eal_vfio.c
>>>> index 46f951f4d..8b8e75c4f 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>   {
>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>      int i, ret;
>>>> -
>>>> +   phys_addr_t io_offset;
>>>>      struct vfio_iommu_spapr_register_memory reg = {
>>>>         .argsz = sizeof(reg),
>>>>         .flags = 0
>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>         return -1;
>>>>      }
>>>>   
>>>> +   io_offset = create.start_addr;
>>>> +   if (io_offset) {
>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
>>>> +            "new window is created at %lx\n", io_offset);
>>>> +      return -1;
>>>> +   }
>>>> +
>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>         struct vfio_iommu_type1_dma_map dma_map;
>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>         dma_map.vaddr = ms[i].addr_64;
>>>>         dma_map.size = ms[i].len;
>>>> -      dma_map.iova = ms[i].phys_addr;
>>>> +      dma_map.iova = io_offset;
>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>>>>   
>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>            return -1;
>>>>         }
>>>>   
>>>> +      io_offset += dma_map.size;
>>>>      }
>>>>   
>>>>      return 0;
>>>> --
>>>> 2.11.0
>>>>
>>
>



