[dpdk-dev] pci: limit default numa node to used devices

Message ID 20170721091119.15701-1-sergio.gonzalez.monroy@intel.com (mailing list archive)
State Accepted, archived

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Sergio Gonzalez Monroy July 21, 2017, 9:11 a.m. UTC
  Commit 8a04cb612589 ("pci: set default numa node for broken systems")
added logic to default to NUMA node 0 when sysfs numa_node information
was wrong or not available.

Unfortunately there are many devices with wrong NUMA node information
that DPDK does not care about but still show warnings for them.

Instead, only check for invalid NUMA node information for devices
managed by the DPDK.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_pci.c |  5 +++++
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 11 +++--------
 2 files changed, 8 insertions(+), 8 deletions(-)
  

Comments

Thomas Monjalon July 21, 2017, 2:53 p.m. UTC | #1
The title and the text below should explain that you move
the warning log from scan to probe, thanks to a temporary
negative value.

21/07/2017 12:11, Sergio Gonzalez Monroy:
> Commit 8a04cb612589 ("pci: set default numa node for broken systems")
> added logic to default to NUMA node 0 when sysfs numa_node information
> was wrong or not available.
> 
> Unfortunately there are many devices with wrong NUMA node information
> that DPDK does not care about but still show warnings for them.
> 
> Instead, only check for invalid NUMA node information for devices
> managed by the DPDK.
> 
> Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
[...]
> -	if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
> -		tmp < RTE_MAX_NUMA_NODES)
> +	if (eal_parse_sysfs_value(filename, &tmp) == 0)
>  		dev->device.numa_node = tmp;

Why are you removing the check of the value?
Are you going to accept invalid high values?
This check was introduced on purpose by this commit:
	http://dpdk.org/commit/8a04cb6125
  
Sergio Gonzalez Monroy July 21, 2017, 3:03 p.m. UTC | #2
On 21/07/2017 15:53, Thomas Monjalon wrote:
> The title and the text below should explain that you move
> the warning log from scan to probe, thanks to a temporary
> negative value.

I thought that saying that I only check for devices managed by dpdk 
explains the purpose,
and the patch itself shows the change from one file to another.

> 21/07/2017 12:11, Sergio Gonzalez Monroy:
> [...]
>> -	if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
>> -		tmp < RTE_MAX_NUMA_NODES)
>> +	if (eal_parse_sysfs_value(filename, &tmp) == 0)
>>   		dev->device.numa_node = tmp;
> Why are you removing the check of the value?
> Are you going to accept invalid high values?
> This check was introduced on purpose by this commit:
> 	http://dpdk.org/commit/8a04cb6125

tmp is unsigned long type, so -1 is going to be a large number.
My understanding was that it was basically checking for -1 as numa_node.

If we have a valid numa_node greater than RTE_MAX_NUMA_NODES, defaulting
to 0 is not a good idea, is it?

What I try to achieve with the patch is:
- if no numa_node is available, then the parse fails and we set -1.
- if numa_node is present but wrong, my understanding was that it would 
be -1.

Thanks,
Sergio
  
Thomas Monjalon July 21, 2017, 3:37 p.m. UTC | #3
21/07/2017 18:03, Sergio Gonzalez Monroy:
> On 21/07/2017 15:53, Thomas Monjalon wrote:
> > The title and the text below should explain that you move
> > the warning log from scan to probe, thanks to a temporary
> > negative value.
> 
> I thought that saying that I only check for devices managed by dpdk 
> explains the purpose,
> and the patch itself shows the change from one file to another.

It is obvious when you look carefully at the code, yes.
I was just giving my help to better explain :)
> 
> > 21/07/2017 12:11, Sergio Gonzalez Monroy:
> > [...]
> >> -	if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
> >> -		tmp < RTE_MAX_NUMA_NODES)
> >> +	if (eal_parse_sysfs_value(filename, &tmp) == 0)
> >>   		dev->device.numa_node = tmp;
> > 
> > Why are you removing the check of the value?
> > Are you going to accept invalid high values?
> > This check was introduced on purpose by this commit:
> > 	http://dpdk.org/commit/8a04cb6125
> 
> tmp is unsigned long type, so -1 is going to be a large number.

Oh yes, I missed it was unsigned!

> My understanding was that it was basically checking for -1 as numa_node.
> 
> If we have a valid numa_node greater than RTE_MAX_NUMA_NODES, defaulting
> to 0 is not a good idea, is it?
>
> What I try to achieve with the patch is:
> - if no numa_node is available, then the parse fails and we set -1.
> - if numa_node is present but wrong, my understanding was that it would 
> be -1.

All your explanations make sense when you realize that it is unsigned.

I have one more question,
Does it work to check for a negative value like this?
	if (dev->device.numa_node < 0)
  
Sergio Gonzalez Monroy July 21, 2017, 3:47 p.m. UTC | #4
On 21/07/2017 16:37, Thomas Monjalon wrote:
> 21/07/2017 18:03, Sergio Gonzalez Monroy:
>> On 21/07/2017 15:53, Thomas Monjalon wrote:
>>> The title and the text below should explain that you move
>>> the warning log from scan to probe, thanks to a temporary
>>> negative value.
>> I thought that saying that I only check for devices managed by dpdk
>> explains the purpose,
>> and the patch itself shows the change from one file to another.
> It is obvious when you look carefully at the code, yes.
> I was just giving my help to better explain :)

Just giving my view of the commit message.
If you think it can be improved, by all means feel free to change it :)

>>> 21/07/2017 12:11, Sergio Gonzalez Monroy:
>>> [...]
>>>> -	if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
>>>> -		tmp < RTE_MAX_NUMA_NODES)
>>>> +	if (eal_parse_sysfs_value(filename, &tmp) == 0)
>>>>    		dev->device.numa_node = tmp;
>>> Why are you removing the check of the value?
>>> Are you going to accept invalid high values?
>>> This check was introduced on purpose by this commit:
>>> 	http://dpdk.org/commit/8a04cb6125
>> tmp is unsigned long type, so -1 is going to be a large number.
> Oh yes, I missed it was unsigned!
>
>> My understanding was that it was basically checking for -1 as numa_node.
>>
>> If we have a valid numa_node greater than RTE_MAX_NUMA_NODES, defaulting
>> to 0 is not a good idea, is it?
>>
>> What I try to achieve with the patch is:
>> - if no numa_node is available, then the parse fails and we set -1.
>> - if numa_node is present but wrong, my understanding was that it would
>> be -1.
> All your explanations make sense when you realize that it is unsigned.
>
> I have one more question,
> Does it work to check for a negative value like this?
> 	if (dev->device.numa_node < 0)

numa_node is signed int type in struct rte_device, so it should work.

Regards,
Sergio
  
Thomas Monjalon July 21, 2017, 4:26 p.m. UTC | #5
21/07/2017 18:47, Sergio Gonzalez Monroy:
> On 21/07/2017 16:37, Thomas Monjalon wrote:
> > 21/07/2017 18:03, Sergio Gonzalez Monroy:
> >> On 21/07/2017 15:53, Thomas Monjalon wrote:
> >>> The title and the text below should explain that you move
> >>> the warning log from scan to probe, thanks to a temporary
> >>> negative value.
> >> I thought that saying that I only check for devices managed by dpdk
> >> explains the purpose,
> >> and the patch itself shows the change from one file to another.
> > It is obvious when you look carefully at the code, yes.
> > I was just giving my help to better explain :)
> 
> Just giving my view of the commit message
> If you think it can be improved, by all means feel free to change it :)
> 
> >>> 21/07/2017 12:11, Sergio Gonzalez Monroy:
> >>> [...]
> >>>> -	if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
> >>>> -		tmp < RTE_MAX_NUMA_NODES)
> >>>> +	if (eal_parse_sysfs_value(filename, &tmp) == 0)
> >>>>    		dev->device.numa_node = tmp;
> >>> Why are you removing the check of the value?
> >>> Are you going to accept invalid high values?
> >>> This check was introduced on purpose by this commit:
> >>> 	http://dpdk.org/commit/8a04cb6125
> >> tmp is unsigned long type, so -1 is going to be a large number.
> > Oh yes, I missed it was unsigned!
> >
> >> My understanding was that it was basically checking for -1 as numa_node.
> >>
> >> If we have a valid numa_node greater than RTE_MAX_NUMA_NODES, defaulting
> >> to 0 is not a good idea, is it?
> >>
> >> What I try to achieve with the patch is:
> >> - if no numa_node is available, then the parse fails and we set -1.
> >> - if numa_node is present but wrong, my understanding was that it would
> >> be -1.
> > All your explanations make sense when you realize that it is unsigned.
> >
> > I have one more question,
> > Does it work to check for a negative value like this?
> > 	if (dev->device.numa_node < 0)
> 
> numa_node is signed int type in struct rte_device, so it should work.

OK, sorry, I overlooked that :)
  
Thomas Monjalon July 21, 2017, 4:29 p.m. UTC | #6
21/07/2017 12:11, Sergio Gonzalez Monroy:
> Commit 8a04cb612589 ("pci: set default numa node for broken systems")
> added logic to default to NUMA node 0 when sysfs numa_node information
> was wrong or not available.
> 
> Unfortunately there are many devices with wrong NUMA node information
> that DPDK does not care about but still show warnings for them.
> 
> Instead, only check for invalid NUMA node information for devices
> managed by the DPDK.
> 
> Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>

Applied with the title below, thanks
	pci: move NUMA node check from scan to probe
  

Patch

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index eaa041e..52fd38c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -226,6 +226,11 @@  rte_pci_probe_one_driver(struct rte_pci_driver *dr,
 		return 1;
 	}
 
+	if (dev->device.numa_node < 0) {
+		RTE_LOG(WARNING, EAL, "  Invalid NUMA socket, default to 0\n");
+		dev->device.numa_node = 0;
+	}
+
 	RTE_LOG(INFO, EAL, "  probe driver: %x:%x %s\n", dev->id.vendor_id,
 		dev->id.device_id, dr->driver.name);
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 556ae2c..2041d5f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -314,15 +314,10 @@  pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/numa_node",
 		 dirname);
 
-	if (eal_parse_sysfs_value(filename, &tmp) == 0 &&
-		tmp < RTE_MAX_NUMA_NODES)
+	if (eal_parse_sysfs_value(filename, &tmp) == 0)
 		dev->device.numa_node = tmp;
-	else {
-		RTE_LOG(WARNING, EAL,
-			"numa_node is invalid or not present. "
-			"Set it 0 as default\n");
-		dev->device.numa_node = 0;
-	}
+	else
+		dev->device.numa_node = -1;
 
 	pci_name_set(dev);