[RFC 0/4] Support VFIO sparse mmap in PCI bus

David Marchand david.marchand at redhat.com
Tue Apr 18 09:46:39 CEST 2023


Hello Chenbo,

On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia at intel.com> wrote:
>
> This series introduces support for a standard VFIO capability,
> sparse mmap, in the PCI bus. In the Linux kernel, it is defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. With sparse mmap, instead of
> mmap-ing the whole BAR region into the DPDK process, we mmap only
> the parts of the BAR region that the kernel reports as mmap-able
> in the sparse mmap information. For the rest of the BAR region,
> which is not mmap-ed, the DPDK process can use the pread/pwrite
> system calls. Sparse mmap is useful when the kernel does not want
> userspace to mmap the whole BAR region, or when the kernel wants
> to control access to specific parts of the BAR region. Vendors can
> choose whether to enable this feature for their devices in their
> specific kernel modules.

Sorry, I did not take the time to look into the details.
Could you summarize what would be the benefit of this series?


>
> In this patchset:
>
> Patches 1-3 mainly introduce BAR access APIs so that drivers
> can use them to access a specific BAR through pread/pwrite
> system calls when part of the BAR is not mmap-able.
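
To illustrate the pread/pwrite access path described above, here is a
minimal sketch. It is not the API from patches 1-3: the names
bar_read32/bar_write32 and their parameters are hypothetical. It assumes
'fd' is the device file descriptor and 'region_off' is the start of the
BAR region within that fd (with VFIO, this would come from the 'offset'
field of struct vfio_region_info).

```c
#define _DEFAULT_SOURCE /* expose pread/pwrite under strict -std= modes on glibc */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a 32-bit register at 'bar_off' through the
 * device fd instead of through an mmap-ed pointer.
 * Returns 0 on success, -1 on error or short read.
 */
static int
bar_read32(int fd, off_t region_off, uint64_t bar_off, uint32_t *val)
{
	ssize_t n;

	assert(val != NULL);
	n = pread(fd, val, sizeof(*val), region_off + (off_t)bar_off);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/*
 * Hypothetical helper: write a 32-bit register at 'bar_off' through the
 * device fd. Returns 0 on success, -1 on error or short write.
 */
static int
bar_write32(int fd, off_t region_off, uint64_t bar_off, uint32_t val)
{
	ssize_t n;

	n = pwrite(fd, &val, sizeof(val), region_off + (off_t)bar_off);
	return n == (ssize_t)sizeof(val) ? 0 : -1;
}
```

Each access costs a system call, so a driver would presumably keep this
path for the parts of a BAR that cannot be mmap-ed and use plain loads
and stores everywhere else.
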
>
> Patch 4 finally adds the VFIO sparse mmap support. One open
> question is whether all sparse mmap regions should be mapped
> into one contiguous virtual address range that follows the
> device-specific BAR layout. In theory, there are three options
> for supporting this feature.
>
> Option 1: Map sparse mmap regions independently
> ===============================================
> In this approach, we mmap each sparse mmap region one by one,
> and each region can be located anywhere in the process address
> space. But accessing the mmap-ed BAR is no longer as easy as
> 'bar_base_address + bar_offset': the driver needs to check the
> sparse mmap information to access a specific BAR register.
>
> Patch 4 in this patchset adopts this option. A driver API change
> is introduced in bus_pci_driver.h. The corresponding changes in
> all drivers are also done. For now I assume that drivers do not
> support this feature, so they do not check the 'is_sparse' flag
> and assume it is false. Note that this does not break any driver,
> and each vendor can add the related logic when they start to
> support this feature. I did it this way only because I don't
> want to introduce complexity into drivers that do not want to
> support this feature.
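
The extra lookup that Option 1 requires from a driver can be sketched as
follows. This is only an illustration, not the code from patch 4:
struct sparse_area and sparse_bar_addr() are hypothetical names, and the
sketch assumes the bus layer keeps, for each mapped region, its offset in
the BAR, its size and the address where it was mmap-ed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one independently mapped sparse region. */
struct sparse_area {
	uint64_t offset; /* start of the region within the BAR */
	uint64_t size;   /* length of the region in bytes */
	void *addr;      /* where this region was mmap-ed */
};

/*
 * Translate a BAR offset into a virtual address.
 * Returns NULL when [bar_off, bar_off + len) does not fit entirely
 * inside one mapped region; the caller would then fall back to
 * pread/pwrite for that access.
 */
static void *
sparse_bar_addr(const struct sparse_area *areas, size_t nr_areas,
		uint64_t bar_off, uint64_t len)
{
	size_t i;

	assert(areas != NULL || nr_areas == 0);

	for (i = 0; i < nr_areas; i++) {
		const struct sparse_area *a = &areas[i];

		/* Written without additions so the checks cannot overflow. */
		if (bar_off >= a->offset && len <= a->size &&
		    bar_off - a->offset <= a->size - len)
			return (uint8_t *)a->addr + (bar_off - a->offset);
	}
	return NULL;
}
```

With the layout used in the Option 2 example (regions at 0-0x1000 and
0x3000-0x4000), a lookup for offset 0x3004 resolves into the second
region, while a lookup for offset 0x2000 returns NULL.
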
>
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped into a
> contiguous virtual address range that follows the device-specific
> BAR layout. For example, say the BAR size is 0x4000 and only
> 0-0x1000 (sparse mmap region #1) and 0x3000-0x4000 (sparse mmap
> region #2) can be mmap-ed. Region #1 will be mapped at 'base_addr'
> and region #2 will be mapped at 'base_addr + 0x3000'. The good
> thing is that, with this implementation, drivers can still access
> all BAR registers as 'bar_base_address + bar_offset' and we don't
> need to introduce any driver API change. But the address range
> 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to be
> reserved, which could waste address space or memory (when we
> use the MAP_ANONYMOUS and MAP_PRIVATE flags to reserve this
> range). Meanwhile, the driver needs to know which parts of the BAR
> are mmap-ed (this is possible since the ranges are defined by the
> vendor's specific kernel module).
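
A minimal sketch of how Option 2 could be laid out with plain mmap calls
is shown below. The names struct sparse_window and map_bar_contiguous()
are hypothetical, and reserving the range with PROT_NONE is an assumption
of this sketch, not something stated in the series: it reserves address
space for the gaps and makes any access to them fault.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one mmap-able window inside a BAR. */
struct sparse_window {
	uint64_t offset; /* page-aligned start within the BAR */
	uint64_t size;   /* page-aligned length in bytes */
};

/*
 * Map every window so that window i lands at base + win[i].offset.
 * 'region_off' is the (page-aligned) start of the BAR region in 'fd'.
 * Returns the base address, or NULL on failure.
 */
static void *
map_bar_contiguous(int fd, off_t region_off, size_t bar_size,
		const struct sparse_window *win, size_t nr_win)
{
	uint64_t pg = (uint64_t)sysconf(_SC_PAGESIZE);
	uint8_t *base;
	size_t i;

	assert(win != NULL || nr_win == 0);

	/* Reserve the whole BAR range so the gaps stay owned by us. */
	base = mmap(NULL, bar_size, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	for (i = 0; i < nr_win; i++) {
		void *p = MAP_FAILED;

		/* MAP_FIXED needs page alignment; also stay inside the BAR. */
		if (win[i].offset % pg == 0 && win[i].size % pg == 0 &&
		    win[i].offset <= bar_size &&
		    win[i].size <= bar_size - win[i].offset)
			/* Replace the reserved pages with the real mapping. */
			p = mmap(base + win[i].offset, win[i].size,
					PROT_READ | PROT_WRITE,
					MAP_SHARED | MAP_FIXED, fd,
					region_off + (off_t)win[i].offset);
		if (p == MAP_FAILED) {
			munmap(base, bar_size);
			return NULL;
		}
	}
	return base;
}
```

For the example above (BAR size 0x4000, windows at 0 and 0x3000, each
0x1000 long, on a system with 4 KiB pages), a register at BAR offset
0x3004 is then reachable at 'base + 0x3004', and the range
'base + 0x1000' to 'base + 0x3000' stays reserved but inaccessible.
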
>
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let each driver choose which
> way it prefers, since each option has its own pros and cons.
>
> Please share your comments, Thanks!
>
>
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device

I think the main motivation for this first patch was to avoid ABI issues.
Since v22.11, the rte_pci_device object is opaque to applications.

So, do we still need this patch?


>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
>
>  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
>  drivers/bus/pci/bsd/pci.c                     |  43 +-
>  drivers/bus/pci/bus_pci_driver.h              |  24 +-
>  drivers/bus/pci/linux/pci.c                   |  91 +++-
>  drivers/bus/pci/linux/pci_init.h              |  14 +-
>  drivers/bus/pci/linux/pci_uio.c               |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                  |  57 ++-
>  drivers/bus/pci/pci_common_uio.c              |  12 +-
>  drivers/bus/pci/private.h                     |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
>  drivers/bus/pci/version.map                   |   3 +
>  drivers/common/cnxk/roc_dev.c                 |   4 +-
>  drivers/common/cnxk/roc_dpi.c                 |   2 +-
>  drivers/common/cnxk/roc_ml.c                  |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
>  drivers/compress/octeontx/otx_zip.c           |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
>  drivers/crypto/virtio/virtio_pci.c            |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
>  drivers/dma/idxd/idxd_pci.c                   |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
>  drivers/event/octeontx/ssovf_probe.c          |  38 +-
>  drivers/event/octeontx/timvf_probe.c          |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
>  drivers/net/ark/ark_ethdev.c                  |   4 +-
>  drivers/net/atlantic/atl_ethdev.c             |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
>  drivers/net/e1000/em_ethdev.c                 |   4 +-
>  drivers/net/e1000/igb_ethdev.c                |   4 +-
>  drivers/net/ena/ena_ethdev.c                  |   4 +-
>  drivers/net/enetc/enetc_ethdev.c              |   2 +-
>  drivers/net/enic/enic_main.c                  |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
>  drivers/net/gve/gve_ethdev.c                  |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
>  drivers/net/hns3/hns3_ethdev.c                |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                |   2 +-
>  drivers/net/iavf/iavf_ethdev.c                |   2 +-
>  drivers/net/ice/ice_dcf.c                     |   2 +-
>  drivers/net/ice/ice_ethdev.c                  |   2 +-
>  drivers/net/idpf/idpf_ethdev.c                |   4 +-
>  drivers/net/igc/igc_ethdev.c                  |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
>  drivers/net/liquidio/lio_ethdev.c             |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
>  drivers/net/qede/qede_main.c                  |   6 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
>  drivers/net/virtio/virtio_pci.c               |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
>  lib/eal/include/rte_vfio.h                    |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)


-- 
David Marchand


