[v3,1/1] bus/pci: optimise scanning with whitelist/blacklist

Message ID 20200420065554.20138-1-skori@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series: [v3,1/1] bus/pci: optimise scanning with whitelist/blacklist

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-nxp-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-testing fail Testing issues
ci/travis-robot success Travis build: passed
ci/Intel-compilation success Compilation OK

Commit Message

Sunil Kumar Kori April 20, 2020, 6:55 a.m. UTC
  The rte_bus_scan API scans all available PCI devices irrespective of the
white or black listing parameters; devices are only filtered afterwards,
during probe, based on those parameters. So unnecessary CPU cycles are
wasted during rte_pci_scan.

For the Octeontx2 platform with a core frequency of 2.4 GHz, rte_bus_scan
consumes around 26 ms to scan around 90 PCI devices, but not all of them may
be used by the application. So for an application which uses 2 NICs,
rte_bus_scan consumes only a few microseconds and the rest of the time is
saved with this patch.

The patch restricts which devices are scanned, according to the conditions
below:
 - All devices will be scanned if no parameters are passed.
 - Only white-listed devices will be scanned if a white list is given.
 - All devices, except black-listed ones, will be scanned if a black list
   is given.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
---
v3:
 - remove __rte_experimental from private function.
 - remove entry from map file too.
v2:
 - Added function to validate ignorance of device based on PCI address.
 - Marked device validation function as experimental.

 drivers/bus/pci/bsd/pci.c    | 13 ++++++++++++-
 drivers/bus/pci/linux/pci.c  |  3 +++
 drivers/bus/pci/pci_common.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h    | 11 +++++++++++
 4 files changed, 60 insertions(+), 1 deletion(-)
  

Comments

Gaëtan Rivet April 21, 2020, 3:18 p.m. UTC | #1
On 20/04/20 12:25 +0530, Sunil Kumar Kori wrote:
> rte_bus_scan API scans all the available PCI devices irrespective of white
> or black listing parameters then further devices are probed based on white
> or black listing parameters. So unnecessary CPU cycles are wasted during
> rte_pci_scan.
> 
> For Octeontx2 platform with core frequency 2.4 Ghz, rte_bus_scan consumes
> around 26ms to scan around 90 PCI devices but all may not be used by the
> application. So for the application which uses 2 NICs, rte_bus_scan
> consumes few microseconds and rest time is saved with this patch.
> 

Hi Sunil,

The PCI bus was written at first with the understanding that all PCI
devices were scanned and made available on the bus -- the probe will filter
afterward.

Device hotplug and iteration were written with this in mind. Changing
this principle might have unintended consequences in other EAL parts.
I'm not fundamentally against it, but it is not how buses are currently
designed in DPDK.

To me, a one-time 26ms gain is not enough justification to change this
principle. How problematic is this for you? Do you encounter specific
issues due to this delay?

Thanks,
  
Sunil Kumar Kori April 22, 2020, 6:17 a.m. UTC | #2
>-----Original Message-----
>From: Gaëtan Rivet <grive@u256.net>
>Sent: Tuesday, April 21, 2020 8:48 PM
>To: Sunil Kumar Kori <skori@marvell.com>
>Cc: stephen@networkplumber.org; david.marchand@redhat.com; Jerin Jacob
>Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
>Subject: [EXT] Re: [dpdk-dev] [PATCH v3 1/1] bus/pci: optimise scanning with
>whitelist/blacklist
>
>External Email
>
>----------------------------------------------------------------------
>On 20/04/20 12:25 +0530, Sunil Kumar Kori wrote:
>> rte_bus_scan API scans all the available PCI devices irrespective of
>> white or black listing parameters then further devices are probed
>> based on white or black listing parameters. So unnecessary CPU cycles
>> are wasted during rte_pci_scan.
>>
>> For Octeontx2 platform with core frequency 2.4 Ghz, rte_bus_scan
>> consumes around 26ms to scan around 90 PCI devices but all may not be
>> used by the application. So for the application which uses 2 NICs,
>> rte_bus_scan consumes few microseconds and rest time is saved with this
>patch.
>>
>
>Hi Sunil,
>
>The PCI bus was written at first with the understanding that all PCI devices
>were scanned and made available on the bus -- the probe will filter afterward.
>
>Device hotplug and iteration were written with this in mind. Changing this
>principle might have unintended consequences in other EAL parts.
>I'm not fundamentally against it, but it is not how buses are currently
>designed in DPDK.
>
I am also not sure about this. I would request suggestions on how to ensure
that there are no negative consequences, so that I can handle those too.

>To me, a one-time 26ms gain is not enough justification to change this
>principle. How problematic is this for you? Do you encounter specific issues
>due to this delay?
>
>Thanks,

Recently we observed this requirement while catering to a use case that needs
the lowest possible boot-up time for a DPDK application.
One of the use cases is to reduce downtime during a DPDK SW upgrade in the
field, i.e. after the SW update, the time to close the application and
restart it for packet processing.
With this solution the application will be up sooner, so there will be less
traffic impact in a deployed system.

>--
>Gaëtan
  
Gaëtan Rivet April 22, 2020, 9:38 a.m. UTC | #3
On 22/04/20 06:17 +0000, Sunil Kumar Kori wrote:
> >-----Original Message-----
> >From: Gaëtan Rivet <grive@u256.net>
> >Sent: Tuesday, April 21, 2020 8:48 PM
> >To: Sunil Kumar Kori <skori@marvell.com>
> >Cc: stephen@networkplumber.org; david.marchand@redhat.com; Jerin Jacob
> >Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> >Subject: [EXT] Re: [dpdk-dev] [PATCH v3 1/1] bus/pci: optimise scanning with
> >whitelist/blacklist
> >
> >External Email
> >
> >----------------------------------------------------------------------
> >On 20/04/20 12:25 +0530, Sunil Kumar Kori wrote:
> >> rte_bus_scan API scans all the available PCI devices irrespective of
> >> white or black listing parameters then further devices are probed
> >> based on white or black listing parameters. So unnecessary CPU cycles
> >> are wasted during rte_pci_scan.
> >>
> >> For Octeontx2 platform with core frequency 2.4 Ghz, rte_bus_scan
> >> consumes around 26ms to scan around 90 PCI devices but all may not be
> >> used by the application. So for the application which uses 2 NICs,
> >> rte_bus_scan consumes few microseconds and rest time is saved with this
> >patch.
> >>
> >
> >Hi Sunil,
> >
> >The PCI bus was written at first with the understanding that all PCI devices
> >were scanned and made available on the bus -- the probe will filter afterward.
> >
> >Device hotplug and iteration were written with this in mind. Changing this
> >principle might have unintended consequences in other EAL parts.
> >I'm not fundamentally against it, but it is not how buses are currently
> >designed in DPDK.
> >
> I am also not sure about this. I would request you provide suggestion to ensure that there won't be
> any negative consequences if any.  So that I can handle those too.
> 

I would like also to hear from other stakeholders for the PCI bus.

Generally, as long as the blacklist mode is the default, the behavior should
not change, but the devil is in the details.

I would have some comments on the patch itself if everyone agrees with
this direction.

If the principle of the patch is accepted, it would be great for you to
test hotplug and device listing with testpmd:

   hotplug:
    * You can spawn VMs with virtual e1000 ports on PCI using QEMU for this,
      and within testpmd `port attach <pci-id>` -- of course, the
      port(s) should not be attached when starting testpmd. You might
      have to either blacklist them, or you could hotplug them in QEMU using the
      monitor. I don't recall the QEMU commands to do that, sorry.

   device list:
    * `show device info all` in testpmd. I thought I had added a command to
      test the device iterator, taking an arbitrary device string, but
      the patch has been dropped it seems.

If you have no segfault and no surprise, it is a good start.

> >To me, a one-time 26ms gain is not enough justification to change this
> >principle. How problematic is this for you? Do you encounter specific issues
> >due to this delay?
> >
> >Thanks,
> 
> Recently we observed this requirement to cater a use of having lowest bootup time for DPDK application.
> One of the use-case for this to reduce the downtime as part of DPDK SW upgrade in the field. i.e
> after the SW update, time to close the application and restart it again for packet processing.
> Having this solution application will be up soon and lesser traffic impact will be there in a deployed system.

DPDK startup was not written with low latency in mind. You will find here
and there minute improvements, but I think it is a pipedream to reduce
service disruption on binary upgrade.

People in the field would be better served with HA, not relying on a
critical apps restarting as fast as possible.

Cheers,
  
Sunil Kumar Kori April 23, 2020, 7:47 a.m. UTC | #4
>-----Original Message-----
>From: Gaëtan Rivet <grive@u256.net>
>Sent: Wednesday, April 22, 2020 3:09 PM
>To: Sunil Kumar Kori <skori@marvell.com>
>Cc: stephen@networkplumber.org; david.marchand@redhat.com; Jerin Jacob
>Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
>Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v3 1/1] bus/pci: optimise scanning
>with whitelist/blacklist
>
>On 22/04/20 06:17 +0000, Sunil Kumar Kori wrote:
>> >-----Original Message-----
>> >From: Gaëtan Rivet <grive@u256.net>
>> >Sent: Tuesday, April 21, 2020 8:48 PM
>> >To: Sunil Kumar Kori <skori@marvell.com>
>> >Cc: stephen@networkplumber.org; david.marchand@redhat.com; Jerin
>> >Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
>> >Subject: [EXT] Re: [dpdk-dev] [PATCH v3 1/1] bus/pci: optimise
>> >scanning with whitelist/blacklist
>> >
>> >External Email
>> >
>> >---------------------------------------------------------------------
>> >- On 20/04/20 12:25 +0530, Sunil Kumar Kori wrote:
>> >> rte_bus_scan API scans all the available PCI devices irrespective
>> >> of white or black listing parameters then further devices are
>> >> probed based on white or black listing parameters. So unnecessary
>> >> CPU cycles are wasted during rte_pci_scan.
>> >>
>> >> For Octeontx2 platform with core frequency 2.4 Ghz, rte_bus_scan
>> >> consumes around 26ms to scan around 90 PCI devices but all may not
>> >> be used by the application. So for the application which uses 2
>> >> NICs, rte_bus_scan consumes few microseconds and rest time is saved
>> >> with this
>> >patch.
>> >>
>> >
>> >Hi Sunil,
>> >
>> >The PCI bus was written at first with the understanding that all PCI
>> >devices were scanned and made available on the bus -- the probe will filter
>afterward.
>> >
>> >Device hotplug and iteration were written with this in mind. Changing
>> >this principle might have unintended consequences in other EAL parts.
>> >I'm not fundamentally against it, but it is not how buses are
>> >currently designed in DPDK.
>> >
>> I am also not sure about this. I would request you provide suggestion
>> to ensure that there won't be any negative consequences if any.  So that I
>can handle those too.
>>
>
>I would like also to hear from other stakeholders for the PCI bus.
>
>Generally, as long as the blacklist mode is the default, behavior should not
>change, but devil is in the details.
>
>I would have some comments on the patch itself if everyone agrees with this
>direction.
>
>If the principle of the patch is accepted, it would be great for you to test
>hotplug and device listing with testpmd:
>
>   hotplug:
>    * You can spawn VMs with virtual e1000 ports on PCI using QEMU for this,
>      and within testpmd `port attach <pci-id>` -- of course, the
>      port(s) should not be attached when starting testpmd. You might
>      have to either blacklist them, or you could hotplug them in QEMU using the
>      monitor. I don't recall the QEMU commands to do that, sorry.
>
>   device list:
>    * `show device info all` in testpmd. I thought I had added a command to
>      test the device iterator, taking an arbitrary device string, but
>      the patch has been dropped it seems.
>
>If you have no segfault and no surprise, it is a good start.
>
Okay, but before verification I would appreciate more comments and closure
on the principle from other PCI stakeholders. If there is no objection on
the principle, then I will invest the effort in testing.
I will wait for inputs until next week; if there are none, I will assume
the approach is fundamentally correct and start verification of the above
test cases.

Also, if anyone has already validated the above test cases, suggestions
about the impact of this patch on the PCI bus design would be very helpful
for understanding the real issues.

>> >To me, a one-time 26ms gain is not enough justification to change
>> >this principle. How problematic is this for you? Do you encounter
>> >specific issues due to this delay?
>> >
>> >Thanks,
>>
>> Recently we observed this requirement to cater a use of having lowest
>bootup time for DPDK application.
>> One of the use-case for this to reduce the downtime as part of DPDK SW
>> upgrade in the field. i.e after the SW update, time to close the application
>and restart it again for packet processing.
>> Having this solution application will be up soon and lesser traffic impact will
>be there in a deployed system.
>
>DPDK startup was not written with low latency in mind. You will find here and
>there minute improvements, but I think it is a pipedream to reduce service
>disruption on binary upgrade.
>
>People in the field would be better served with HA, not relying on a critical
>apps restarting as fast as possible.
>
Recently we had a requirement to achieve a boot-up time <= 500 ms and found
this to be one of the candidates for improvement, so we thought of
upstreaming it. Also, having a mechanism to improve boot-up time is good;
I think there is no harm in this.

>Cheers,
>--
>Gaëtan
  
Gaëtan Rivet April 27, 2020, 6:43 p.m. UTC | #5
Hello Sunil,

As it seems that this patch does not overly alarm people using the PCI
bus, I have a few comments on this version. Sending those a little
earlier will allow you to change as needed before proceeding with your
tests.

On 20/04/20 12:25 +0530, Sunil Kumar Kori wrote:
> rte_bus_scan API scans all the available PCI devices irrespective of white
> or black listing parameters then further devices are probed based on white
> or black listing parameters. So unnecessary CPU cycles are wasted during
> rte_pci_scan.
> 
> For Octeontx2 platform with core frequency 2.4 Ghz, rte_bus_scan consumes
> around 26ms to scan around 90 PCI devices but all may not be used by the
> application. So for the application which uses 2 NICs, rte_bus_scan
> consumes few microseconds and rest time is saved with this patch.
> 
> Patch restricts devices to be scanned as per below mentioned conditions:
>  - All devices will be scanned if no parameters are passed.
>  - Only white listed devices will be scanned if white list is available.
>  - All devices, except black listed, will be scanned if black list is
>    available.
> 
> Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
> ---
> v3:
>  - remove __rte_experimental from private function.
>  - remove entry from map file too.
> v2:
>  - Added function to validate ignorance of device based on PCI address.
>  - Marked device validation function as experimental.
> 
>  drivers/bus/pci/bsd/pci.c    | 13 ++++++++++++-
>  drivers/bus/pci/linux/pci.c  |  3 +++
>  drivers/bus/pci/pci_common.c | 34 ++++++++++++++++++++++++++++++++++
>  drivers/bus/pci/private.h    | 11 +++++++++++
>  4 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index ebbfeb13a..c8d954751 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c
> @@ -338,6 +338,7 @@ rte_pci_scan(void)
>  			.match_buf_len = sizeof(matches),
>  			.matches = &matches[0],
>  	};
> +	struct rte_pci_addr pci_addr;
>  
>  	/* for debug purposes, PCI can be disabled */
>  	if (!rte_eal_has_pci())
> @@ -357,9 +358,19 @@ rte_pci_scan(void)
>  			goto error;
>  		}
>  
> -		for (i = 0; i < conf_io.num_matches; i++)
> +		for (i = 0; i < conf_io.num_matches; i++) {
> +			pci_addr.domain = matches[i].pc_sel.pc_domain;
> +			pci_addr.bus = matches[i].pc_sel.pc_bus;
> +			pci_addr.devid = matches[i].pc_sel.pc_dev;
> +			pci_addr.function = matches[i].pc_sel.pc_func;
> +
> +			/* Check that device should be ignored or not  */

This comment is unnecessary, the function name should be sufficient to
describe the check done.

> +			if (pci_addr_ignore_device(&pci_addr))
> +				continue;
> +

As this function is almost a copy of pci_ignore_device(), with a twist
on the addr, I think the name pci_ignore_device_addr() would be more
helpful.

I think in general however, that exposed symbols, even internals,
should be prefixed with rte_. It was (almost) ok for
pci_ignore_device() to forego the namespace as it is a static function
that will be mangled on export, but that won't be the case for your
function.

Please add rte_ prefix.

>  			if (pci_scan_one(fd, &matches[i]) < 0)
>  				goto error;
> +		}
>  
>  		dev_count += conf_io.num_matches;
>  	} while(conf_io.status == PCI_GETCONF_MORE_DEVS);
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 71b0a3053..92bdad826 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -487,6 +487,9 @@ rte_pci_scan(void)
>  		if (parse_pci_addr_format(e->d_name, sizeof(e->d_name), &addr) != 0)
>  			continue;
>  
> +		if (pci_addr_ignore_device(&addr))
> +			continue;
> +
>  		snprintf(dirname, sizeof(dirname), "%s/%s",
>  				rte_pci_get_sysfs_path(), e->d_name);
>  
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 3f5542076..a350a1993 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -589,6 +589,40 @@ pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
>  	return -1;
>  }
>  
> +static struct rte_devargs *
> +pci_addr_devargs_lookup(const struct rte_pci_addr *pci_addr)
> +{
> +	struct rte_devargs *devargs;
> +	struct rte_pci_addr addr;
> +
> +	RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
> +		devargs->bus->parse(devargs->name, &addr);

Why not use rte_pci_addr_parse directly there? The bus->parse() API,
while stable, is one-level of indirection removed from what's done,
it's simpler for the reader to see the intent by using the proper function.

Return value should be checked. If the devargs name is not parseable,
there are other issues at hand (memory corruption), we should skip the
device as well or crash, but not proceed with comparison.

> +		if (!rte_pci_addr_cmp(pci_addr, &addr))
> +			return devargs;
> +	}
> +	return NULL;
> +}
> +
> +bool
> +pci_addr_ignore_device(const struct rte_pci_addr *pci_addr)
> +{
> +	struct rte_devargs *devargs = pci_addr_devargs_lookup(pci_addr);
> +
> +	switch (rte_pci_bus.bus.conf.scan_mode) {
> +	case RTE_BUS_SCAN_WHITELIST:
> +		if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
> +			return false;
> +		break;
> +	case RTE_BUS_SCAN_UNDEFINED:
> +	case RTE_BUS_SCAN_BLACKLIST:
> +		if (devargs == NULL ||
> +		    devargs->policy != RTE_DEV_BLACKLISTED)
> +			return false;
> +		break;
> +	}
> +	return true;
> +}
> +
>  static bool
>  pci_ignore_device(const struct rte_pci_device *dev)
>  {

The logic seems ok to me.

However, the logic is the same as the one in rte_pci_probe(). During
probe, any device on the bus would have already been vetted during scan.
It should be ok to probe all existing rte_pci_device.

The same argument can be made for rte_pci_get_iommu_class() then, no
need to use pci_ignore_device(). It is done after the scan() so it
should be ok.

And if pci_ignore_device() can be completely removed, then you should
rename your function from rte_pci_ignore_device_addr() to
rte_pci_ignore_device() altogether.

> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index a205d4d9f..75af786f7 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -42,6 +42,17 @@ int rte_pci_scan(void);
>  void
>  pci_name_set(struct rte_pci_device *dev);
>  
> +/**
> + * Validate whether a device with given pci address should be ignored or not.
> + *
> + * @param pci_addr
> + *	PCI address of device to be validated
> + * @return
> + *	1: if device is to be ignored,
> + *	0: if device is to be scanned,
> + */
> +bool pci_addr_ignore_device(const struct rte_pci_addr *pci_addr);
> +
>  /**
>   * Add a PCI device to the PCI Bus (append to PCI Device list). This function
>   * also updates the bus references of the PCI Device (and the generic device
> -- 
> 2.17.1

Best regards,
  
Sunil Kumar Kori April 28, 2020, 1:52 p.m. UTC | #6
>-----Original Message-----
>From: Gaëtan Rivet <grive@u256.net>
>Sent: Tuesday, April 28, 2020 12:14 AM
>To: Sunil Kumar Kori <skori@marvell.com>
>Cc: stephen@networkplumber.org; david.marchand@redhat.com; Jerin Jacob
>Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
>Subject: [EXT] Re: [dpdk-dev] [PATCH v3 1/1] bus/pci: optimise scanning with
>whitelist/blacklist
>
>External Email
>
>----------------------------------------------------------------------
>Hello Sunil,
>
>As it seems that this patch does not overly alarm people using the PCI
>bus, I have a few comments on this version. Sending those a little
>earlier will allow you to change as needed before proceeding with your
>tests.
>
>On 20/04/20 12:25 +0530, Sunil Kumar Kori wrote:
>> rte_bus_scan API scans all the available PCI devices irrespective of white
>> or black listing parameters then further devices are probed based on white
>> or black listing parameters. So unnecessary CPU cycles are wasted during
>> rte_pci_scan.
>>
>> For Octeontx2 platform with core frequency 2.4 Ghz, rte_bus_scan
>consumes
>> around 26ms to scan around 90 PCI devices but all may not be used by the
>> application. So for the application which uses 2 NICs, rte_bus_scan
>> consumes few microseconds and rest time is saved with this patch.
>>
>> Patch restricts devices to be scanned as per below mentioned conditions:
>>  - All devices will be scanned if no parameters are passed.
>>  - Only white listed devices will be scanned if white list is available.
>>  - All devices, except black listed, will be scanned if black list is
>>    available.
>>
>> Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
>> ---
>> v3:
>>  - remove __rte_experimental from private function.
>>  - remove entry from map file too.
>> v2:
>>  - Added function to validate ignorance of device based on PCI address.
>>  - Marked device validation function as experimental.
>>
>>  drivers/bus/pci/bsd/pci.c    | 13 ++++++++++++-
>>  drivers/bus/pci/linux/pci.c  |  3 +++
>>  drivers/bus/pci/pci_common.c | 34
>++++++++++++++++++++++++++++++++++
>>  drivers/bus/pci/private.h    | 11 +++++++++++
>>  4 files changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
>> index ebbfeb13a..c8d954751 100644
>> --- a/drivers/bus/pci/bsd/pci.c
>> +++ b/drivers/bus/pci/bsd/pci.c
>> @@ -338,6 +338,7 @@ rte_pci_scan(void)
>>  			.match_buf_len = sizeof(matches),
>>  			.matches = &matches[0],
>>  	};
>> +	struct rte_pci_addr pci_addr;
>>
>>  	/* for debug purposes, PCI can be disabled */
>>  	if (!rte_eal_has_pci())
>> @@ -357,9 +358,19 @@ rte_pci_scan(void)
>>  			goto error;
>>  		}
>>
>> -		for (i = 0; i < conf_io.num_matches; i++)
>> +		for (i = 0; i < conf_io.num_matches; i++) {
>> +			pci_addr.domain = matches[i].pc_sel.pc_domain;
>> +			pci_addr.bus = matches[i].pc_sel.pc_bus;
>> +			pci_addr.devid = matches[i].pc_sel.pc_dev;
>> +			pci_addr.function = matches[i].pc_sel.pc_func;
>> +
>> +			/* Check that device should be ignored or not  */
>
>This comment is unnecessary, the function name should be sufficient to
>describe the check done.
>
Ack

>> +			if (pci_addr_ignore_device(&pci_addr))
>> +				continue;
>> +
>
>As this function is almost a copy of pci_ignore_device(), with a twist
>on the addr, I think the name pci_ignore_device_addr() would be more
>helpful.
>
>I think in general however, that exposed symbols, even internals,
>should be prefixed with rte_. It was (almost) ok for
>pci_ignore_device() to forego the namespace as it is a static function
>that will be mangled on export, but that won't be the case for your
>function.
>
>Please add rte_ prefix.
>
Ack

>>  			if (pci_scan_one(fd, &matches[i]) < 0)
>>  				goto error;
>> +		}
>>
>>  		dev_count += conf_io.num_matches;
>>  	} while(conf_io.status == PCI_GETCONF_MORE_DEVS);
>> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
>> index 71b0a3053..92bdad826 100644
>> --- a/drivers/bus/pci/linux/pci.c
>> +++ b/drivers/bus/pci/linux/pci.c
>> @@ -487,6 +487,9 @@ rte_pci_scan(void)
>>  		if (parse_pci_addr_format(e->d_name, sizeof(e->d_name),
>&addr) != 0)
>>  			continue;
>>
>> +		if (pci_addr_ignore_device(&addr))
>> +			continue;
>> +
>>  		snprintf(dirname, sizeof(dirname), "%s/%s",
>>  				rte_pci_get_sysfs_path(), e->d_name);
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 3f5542076..a350a1993 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -589,6 +589,40 @@ pci_dma_unmap(struct rte_device *dev, void
>*addr, uint64_t iova, size_t len)
>>  	return -1;
>>  }
>>
>> +static struct rte_devargs *
>> +pci_addr_devargs_lookup(const struct rte_pci_addr *pci_addr)
>> +{
>> +	struct rte_devargs *devargs;
>> +	struct rte_pci_addr addr;
>> +
>> +	RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
>> +		devargs->bus->parse(devargs->name, &addr);
>
>Why not use rte_pci_addr_parse directly there? The bus->parse() API,
>while stable, is one-level of indirection removed from what's done,
>it's simpler for the reader to see the intent by using the proper function.
>
>Return value should be checked. If the devargs name is not parseable,
>there are other issues at hand (memory corruption), we should skip the
>device as well or crash, but not proceed with comparison.
>
Ack

>> +		if (!rte_pci_addr_cmp(pci_addr, &addr))
>> +			return devargs;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +bool
>> +pci_addr_ignore_device(const struct rte_pci_addr *pci_addr)
>> +{
>> +	struct rte_devargs *devargs = pci_addr_devargs_lookup(pci_addr);
>> +
>> +	switch (rte_pci_bus.bus.conf.scan_mode) {
>> +	case RTE_BUS_SCAN_WHITELIST:
>> +		if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
>> +			return false;
>> +		break;
>> +	case RTE_BUS_SCAN_UNDEFINED:
>> +	case RTE_BUS_SCAN_BLACKLIST:
>> +		if (devargs == NULL ||
>> +		    devargs->policy != RTE_DEV_BLACKLISTED)
>> +			return false;
>> +		break;
>> +	}
>> +	return true;
>> +}
>> +
>>  static bool
>>  pci_ignore_device(const struct rte_pci_device *dev)
>>  {
>
>The logic seems ok to me.
>
>However, the logic is the same as the one in rte_pci_probe(). During
>probe, any device on the bus would have already been vetted during scan.
>It should be ok to probe all existing rte_pci_device.
>
>The same argument can be made for rte_pci_get_iommu_class() then, no
>need to use pci_ignore_device(). It is done after the scan() so it
>should be ok.
>
>And if pci_ignore_device() can be completely removed, then you should
>rename your function from rte_pci_ignore_device_addr() to
>rte_pci_ignore_device() altogether.
>
Ack

>> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
>> index a205d4d9f..75af786f7 100644
>> --- a/drivers/bus/pci/private.h
>> +++ b/drivers/bus/pci/private.h
>> @@ -42,6 +42,17 @@ int rte_pci_scan(void);
>>  void
>>  pci_name_set(struct rte_pci_device *dev);
>>
>> +/**
>> + * Validate whether a device with given pci address should be ignored or
>not.
>> + *
>> + * @param pci_addr
>> + *	PCI address of device to be validated
>> + * @return
>> + *	1: if device is to be ignored,
>> + *	0: if device is to be scanned,
>> + */
>> +bool pci_addr_ignore_device(const struct rte_pci_addr *pci_addr);
>> +
>>  /**
>>   * Add a PCI device to the PCI Bus (append to PCI Device list). This function
>>   * also updates the bus references of the PCI Device (and the generic device
>> --
>> 2.17.1
>
>Best regards,
>--
>Gaëtan
  

Patch

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index ebbfeb13a..c8d954751 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -338,6 +338,7 @@  rte_pci_scan(void)
 			.match_buf_len = sizeof(matches),
 			.matches = &matches[0],
 	};
+	struct rte_pci_addr pci_addr;
 
 	/* for debug purposes, PCI can be disabled */
 	if (!rte_eal_has_pci())
@@ -357,9 +358,19 @@  rte_pci_scan(void)
 			goto error;
 		}
 
-		for (i = 0; i < conf_io.num_matches; i++)
+		for (i = 0; i < conf_io.num_matches; i++) {
+			pci_addr.domain = matches[i].pc_sel.pc_domain;
+			pci_addr.bus = matches[i].pc_sel.pc_bus;
+			pci_addr.devid = matches[i].pc_sel.pc_dev;
+			pci_addr.function = matches[i].pc_sel.pc_func;
+
+			/* Check that device should be ignored or not  */
+			if (pci_addr_ignore_device(&pci_addr))
+				continue;
+
 			if (pci_scan_one(fd, &matches[i]) < 0)
 				goto error;
+		}
 
 		dev_count += conf_io.num_matches;
 	} while(conf_io.status == PCI_GETCONF_MORE_DEVS);
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 71b0a3053..92bdad826 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -487,6 +487,9 @@  rte_pci_scan(void)
 		if (parse_pci_addr_format(e->d_name, sizeof(e->d_name), &addr) != 0)
 			continue;
 
+		if (pci_addr_ignore_device(&addr))
+			continue;
+
 		snprintf(dirname, sizeof(dirname), "%s/%s",
 				rte_pci_get_sysfs_path(), e->d_name);
 
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 3f5542076..a350a1993 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -589,6 +589,40 @@  pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
 	return -1;
 }
 
+static struct rte_devargs *
+pci_addr_devargs_lookup(const struct rte_pci_addr *pci_addr)
+{
+	struct rte_devargs *devargs;
+	struct rte_pci_addr addr;
+
+	RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
+		devargs->bus->parse(devargs->name, &addr);
+		if (!rte_pci_addr_cmp(pci_addr, &addr))
+			return devargs;
+	}
+	return NULL;
+}
+
+bool
+pci_addr_ignore_device(const struct rte_pci_addr *pci_addr)
+{
+	struct rte_devargs *devargs = pci_addr_devargs_lookup(pci_addr);
+
+	switch (rte_pci_bus.bus.conf.scan_mode) {
+	case RTE_BUS_SCAN_WHITELIST:
+		if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+			return false;
+		break;
+	case RTE_BUS_SCAN_UNDEFINED:
+	case RTE_BUS_SCAN_BLACKLIST:
+		if (devargs == NULL ||
+		    devargs->policy != RTE_DEV_BLACKLISTED)
+			return false;
+		break;
+	}
+	return true;
+}
+
 static bool
 pci_ignore_device(const struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index a205d4d9f..75af786f7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -42,6 +42,17 @@  int rte_pci_scan(void);
 void
 pci_name_set(struct rte_pci_device *dev);
 
+/**
+ * Validate whether a device with given pci address should be ignored or not.
+ *
+ * @param pci_addr
+ *	PCI address of device to be validated
+ * @return
+ *	1: if device is to be ignored,
+ *	0: if device is to be scanned,
+ */
+bool pci_addr_ignore_device(const struct rte_pci_addr *pci_addr);
+
 /**
  * Add a PCI device to the PCI Bus (append to PCI Device list). This function
  * also updates the bus references of the PCI Device (and the generic device