[dpdk-dev,18.05,v4] eal: add function to return number of detected sockets

Message ID eb142226e1e62abe5a95bb135468732131580b71.1517925320.git.anatoly.burakov@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Anatoly Burakov Feb. 7, 2018, 9:58 a.m. UTC
  During lcore scan, find maximum socket ID and store it. This will
break the ABI, so bump ABI version.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
 lib/librte_eal/common/include/rte_eal.h   |  1 +
 lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/rte_eal_version.map        |  9 +++++++-
 6 files changed, 44 insertions(+), 15 deletions(-)
  

Comments

Bruce Richardson March 8, 2018, 12:12 p.m. UTC | #1
On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it. This will
> break the ABI, so bump ABI version.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>     v4:
>     - Remove backwards ABI compatibility, bump ABI instead
>     
>     v3:
>     - Added ABI compatibility
>     
>     v2:
>     - checkpatch changes
>     - check socket before deciding if the core is not to be used
> 
>  lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>  lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>  lib/librte_eal/common/include/rte_eal.h   |  1 +
>  lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>  lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>  lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>  6 files changed, 44 insertions(+), 15 deletions(-)
> 
Breaking the ABI is the best way to implement this change, and given the
deprecation was previously announced I'm ok with that.

Question: we are ok assuming that the socket numbers are sequential, or
nearly so, and knowing the maximum socket number seen is a good
approximation of the actual physical sockets? I know in terms of cores
on a system, the core id's often jump - are there systems where the
socket numbers do too?

/Bruce
  
Anatoly Burakov March 8, 2018, 2:38 p.m. UTC | #2
On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      
>>      v3:
>>      - Added ABI compatibility
>>      
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
> Breaking the ABI is the best way to implement this change, and given the
> deprecation was previously announced I'm ok with that.
> 
> Question: we are ok assuming that the socket numbers are sequential, or
> nearly so, and knowing the maximum socket number seen is a good
> approximation of the actual physical sockets? I know in terms of cores
> on a system, the core id's often jump - are there systems where the
> socket numbers do too?
> 
> /Bruce
> 

I am not aware of any system that would jump sockets like that. I'm open 
to corrections, however :)
  
Bruce Richardson March 9, 2018, 4:32 p.m. UTC | #3
On Thu, Mar 08, 2018 at 02:38:37PM +0000, Burakov, Anatoly wrote:
> On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> > On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
> > > During lcore scan, find maximum socket ID and store it. This will
> > > break the ABI, so bump ABI version.
> > > 
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > > 
> > > Notes:
> > >      v4:
> > >      - Remove backwards ABI compatibility, bump ABI instead
> > >      v3:
> > >      - Added ABI compatibility
> > >      v2:
> > >      - checkpatch changes
> > >      - check socket before deciding if the core is not to be used
> > > 
> > >   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
> > >   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
> > >   lib/librte_eal/common/include/rte_eal.h   |  1 +
> > >   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
> > >   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
> > >   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
> > >   6 files changed, 44 insertions(+), 15 deletions(-)
> > > 
> > Breaking the ABI is the best way to implement this change, and given the
> > deprecation was previously announced I'm ok with that.
> > 
> > Question: we are ok assuming that the socket numbers are sequential, or
> > nearly so, and knowing the maximum socket number seen is a good
> > approximation of the actual physical sockets? I know in terms of cores
> > on a system, the core id's often jump - are there systems where the
> > socket numbers do too?
> > 
> > /Bruce
> > 
> 
> I am not aware of any system that would jump sockets like that. I'm open to
> corrections, however :)
> 
> -- 
In the absense of any corrections, I think this is fine to have.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Thomas Monjalon March 20, 2018, 10:43 p.m. UTC | #4
08/03/2018 15:38, Burakov, Anatoly:
> On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> > Question: we are ok assuming that the socket numbers are sequential, or
> > nearly so, and knowing the maximum socket number seen is a good
> > approximation of the actual physical sockets? I know in terms of cores
> > on a system, the core id's often jump - are there systems where the
> > socket numbers do too?
> 
> I am not aware of any system that would jump sockets like that. I'm open 
> to corrections, however :)

I think some IBM CPUs had this kind of jump in socket numbering.
Chao?
  
Gowrishankar March 21, 2018, 4:59 a.m. UTC | #5
On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it. This will
> break the ABI, so bump ABI version.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
>      v4:
>      - Remove backwards ABI compatibility, bump ABI instead
>      
>      v3:
>      - Added ABI compatibility
>      
>      v2:
>      - checkpatch changes
>      - check socket before deciding if the core is not to be used
>
>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>   6 files changed, 44 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
> index dd455e6..ed1d17b 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>
>   EXPORT_MAP := ../../rte_eal_version.map
>
> -LIBABIVER := 6
> +LIBABIVER := 7
>
>   # specific to bsdapp exec-env
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
> diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
> index 7724fa4..827ddeb 100644
> --- a/lib/librte_eal/common/eal_common_lcore.c
> +++ b/lib/librte_eal/common/eal_common_lcore.c
> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>   	struct rte_config *config = rte_eal_get_configuration();
>   	unsigned lcore_id;
>   	unsigned count = 0;
> +	unsigned int socket_id, max_socket_id = 0;
>
>   	/*
>   	 * Parse the maximum set of logical cores, detect the subset of running
> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>   		/* init cpuset for per lcore config */
>   		CPU_ZERO(&lcore_config[lcore_id].cpuset);
>
> +		/* find socket first */
> +		socket_id = eal_cpu_socket_id(lcore_id);
> +		if (socket_id >= RTE_MAX_NUMA_NODES) {
> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
> +			socket_id = 0;
> +#else
> +			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
> +					socket_id, RTE_MAX_NUMA_NODES);
> +			return -1;
> +#endif
> +		}
> +		max_socket_id = RTE_MAX(max_socket_id, socket_id);
> +
>   		/* in 1:1 mapping, record related cpu detected state */
>   		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>   		if (lcore_config[lcore_id].detected == 0) {
> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>   		config->lcore_role[lcore_id] = ROLE_RTE;
>   		lcore_config[lcore_id].core_role = ROLE_RTE;
>   		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
> -		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
> -		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
> -			lcore_config[lcore_id].socket_id = 0;
> -#else
> -			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
> -				"RTE_MAX_NUMA_NODES (%d)\n",
> -				lcore_config[lcore_id].socket_id,
> -				RTE_MAX_NUMA_NODES);
> -			return -1;
> -#endif
> -		}
> +		lcore_config[lcore_id].socket_id = socket_id;
>   		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>   				"core %u on socket %u\n",
>   				lcore_id, lcore_config[lcore_id].core_id,
> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>   		RTE_MAX_LCORE);
>   	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>
> +	config->numa_node_count = max_socket_id + 1;

In some IBM servers, socket ID number does not seem to be in sequence. 
For an instance, 0 and 8 for a 2 node server.

In this case, numa_node_count would mislead users if wrongly understood 
by its variable name IMO (see below)
> +	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);

For an instance, reading above message would tell 'EAL detected 8 nodes' 
in my server, but actually there are only two nodes.

Could its name better be 'numa_node_id_max' ?. Also, we store in actual 
count of numa nodes in _count variable.

Also, there could be a case when there is no local memory available to a 
numa node too.

Thanks,
Gowrishankar
> +
>   	return 0;
>   }
> +
> +unsigned int
> +rte_num_sockets(void)
> +{
> +	const struct rte_config *config = rte_eal_get_configuration();
> +	return config->numa_node_count;
> +}
> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index 08c6637..63fcc2e 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -57,6 +57,7 @@ enum rte_proc_type_t {
>   struct rte_config {
>   	uint32_t master_lcore;       /**< Id of the master lcore */
>   	uint32_t lcore_count;        /**< Number of available logical cores. */
> +	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
>   	uint32_t service_lcore_count;/**< Number of available service cores. */
>   	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
>
> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> index d84bcff..ddf4c64 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -120,6 +120,14 @@ rte_lcore_index(int lcore_id)
>   unsigned rte_socket_id(void);
>
>   /**
> + * Return number of physical sockets on the system.
> + * @return
> + *   the number of physical sockets as recognized by EAL
> + *
> + */
> +unsigned int rte_num_sockets(void);
> +
> +/**
>    * Get the ID of the physical socket of the specified lcore
>    *
>    * @param lcore_id
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index 7e5bbe8..b9c7727 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
>   EXPORT_MAP := ../../rte_eal_version.map
>   VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
>
> -LIBABIVER := 6
> +LIBABIVER := 7
>
>   VPATH += $(RTE_SDK)/lib/librte_eal/common
>
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 4146907..fc83e74 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -211,6 +211,13 @@ DPDK_18.02 {
>
>   }  DPDK_17.11;
>
> +DPDK_18.05 {
> +	global:
> +
> +	rte_num_sockets;
> +
> +} DPDK_18.02;
> +
>   EXPERIMENTAL {
>   	global:
>
> @@ -255,4 +262,4 @@ EXPERIMENTAL {
>   	rte_service_set_stats_enable;
>   	rte_service_start_with_defaults;
>
> -} DPDK_18.02;
> +} DPDK_18.05;
  
Anatoly Burakov March 21, 2018, 10:24 a.m. UTC | #6
On 21-Mar-18 4:59 AM, gowrishankar muthukrishnan wrote:
> On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      v3:
>>      - Added ABI compatibility
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 
>> +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
>> b/lib/librte_eal/bsdapp/eal/Makefile
>> index dd455e6..ed1d17b 100644
>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>>
>>   EXPORT_MAP := ../../rte_eal_version.map
>>
>> -LIBABIVER := 6
>> +LIBABIVER := 7
>>
>>   # specific to bsdapp exec-env
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
>> diff --git a/lib/librte_eal/common/eal_common_lcore.c 
>> b/lib/librte_eal/common/eal_common_lcore.c
>> index 7724fa4..827ddeb 100644
>> --- a/lib/librte_eal/common/eal_common_lcore.c
>> +++ b/lib/librte_eal/common/eal_common_lcore.c
>> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>>       struct rte_config *config = rte_eal_get_configuration();
>>       unsigned lcore_id;
>>       unsigned count = 0;
>> +    unsigned int socket_id, max_socket_id = 0;
>>
>>       /*
>>        * Parse the maximum set of logical cores, detect the subset of 
>> running
>> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>>           /* init cpuset for per lcore config */
>>           CPU_ZERO(&lcore_config[lcore_id].cpuset);
>>
>> +        /* find socket first */
>> +        socket_id = eal_cpu_socket_id(lcore_id);
>> +        if (socket_id >= RTE_MAX_NUMA_NODES) {
>> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> +            socket_id = 0;
>> +#else
>> +            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than 
>> RTE_MAX_NUMA_NODES (%d)\n",
>> +                    socket_id, RTE_MAX_NUMA_NODES);
>> +            return -1;
>> +#endif
>> +        }
>> +        max_socket_id = RTE_MAX(max_socket_id, socket_id);
>> +
>>           /* in 1:1 mapping, record related cpu detected state */
>>           lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>>           if (lcore_config[lcore_id].detected == 0) {
>> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>>           config->lcore_role[lcore_id] = ROLE_RTE;
>>           lcore_config[lcore_id].core_role = ROLE_RTE;
>>           lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
>> -        lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
>> -        if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
>> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> -            lcore_config[lcore_id].socket_id = 0;
>> -#else
>> -            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
>> -                "RTE_MAX_NUMA_NODES (%d)\n",
>> -                lcore_config[lcore_id].socket_id,
>> -                RTE_MAX_NUMA_NODES);
>> -            return -1;
>> -#endif
>> -        }
>> +        lcore_config[lcore_id].socket_id = socket_id;
>>           RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>>                   "core %u on socket %u\n",
>>                   lcore_id, lcore_config[lcore_id].core_id,
>> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>>           RTE_MAX_LCORE);
>>       RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>>
>> +    config->numa_node_count = max_socket_id + 1;
> 
> In some IBM servers, socket ID number does not seem to be in sequence. 
> For an instance, 0 and 8 for a 2 node server.
> 
> In this case, numa_node_count would mislead users if wrongly understood 
> by its variable name IMO (see below)
>> +    RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", 
>> config->numa_node_count);
> 
> For an instance, reading above message would tell 'EAL detected 8 nodes' 
> in my server, but actually there are only two nodes.
> 
> Could its name better be 'numa_node_id_max' ?. Also, we store in actual 
> count of numa nodes in _count variable.
> 
> Also, there could be a case when there is no local memory available to a 
> numa node too.
> 
> Thanks,
> Gowrishankar

The point of this patchset is to (pre)allocate memory only on existing 
sockets.

If we don't know how many sockets there are, we are forced to 
preallocate VA space per each *possible* NUMA node - that is, reserve 
e.g. 8x128G of memory, 6 of which will go unused on a 2-socket system. 
We can't know if there is no memory on socket in advance, but we can at 
least avoid preallocating VA space for sockets that don't exist in the 
first place.

How about we store all possible socket id's instead? e.g. something like:

static int numa_node_ids[MAX_NUMA_NODES];
<...>
int rte_eal_cpu_init() {
	int sockets[RTE_MAX_LCORE];
	<...>
	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
		core_to_socket[lcore_id] = socket;
	}
	<...>
	qsort(sockets);
	<...>
	// store all unique sockets in numa_node_ids in ascending order
}
<...>

on a 2 socket system we then get:

rte_num_sockets() => return 2
rte_get_socket_id(int idx) => return numa_node_ids[idx]

Would that be suitable?
  
Gowrishankar March 22, 2018, 5:16 a.m. UTC | #7
On Wednesday 21 March 2018 03:54 PM, Burakov, Anatoly wrote:
>
>>> +    config->numa_node_count = max_socket_id + 1;
>>
>> In some IBM servers, socket ID number does not seem to be in 
>> sequence. For an instance, 0 and 8 for a 2 node server.
>>
>> In this case, numa_node_count would mislead users if wrongly 
>> understood by its variable name IMO (see below)
>>> +    RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", 
>>> config->numa_node_count);
>>
>> For an instance, reading above message would tell 'EAL detected 8 
>> nodes' in my server, but actually there are only two nodes.
>>
>> Could its name better be 'numa_node_id_max' ?. Also, we store in 
>> actual count of numa nodes in _count variable.
>>
>> Also, there could be a case when there is no local memory available 
>> to a numa node too.
>>
>> Thanks,
>> Gowrishankar
>
> The point of this patchset is to (pre)allocate memory only on existing 
> sockets.
>
> If we don't know how many sockets there are, we are forced to 
> preallocate VA space per each *possible* NUMA node - that is, reserve 
> e.g. 8x128G of memory, 6 of which will go unused on a 2-socket system. 
> We can't know if there is no memory on socket in advance, but we can 
> at least avoid preallocating VA space for sockets that don't exist in 
> the first place.
>

Sounds good Anatoly.
May be, sysfs/ might help to confirm if a numa node has local memory ?.

Anyway, for the context of this particular patch (return numa nodes), 
below approach you mentioned is good.

> How about we store all possible socket id's instead? e.g. something like:
>
> static int numa_node_ids[MAX_NUMA_NODES];
> <...>
> int rte_eal_cpu_init() {
>     int sockets[RTE_MAX_LCORE];
>     <...>
>     for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
>         core_to_socket[lcore_id] = socket;
sockets[lcore_id] = eal_cpu_socket_id(lcore_id);
>     }
>     <...>
>     qsort(sockets);
>     <...>
>     // store all unique sockets in numa_node_ids in ascending order

Just thinking that, is there a purpose of retaining a numa ID which does 
not have local memory attached ?
but sockets[] is suppose to reflect all available nodes though (and 
assuming, its calling place to ensure
for the existence of numa local memory).


> }
> <...>
>
> on a 2 socket system we then get:
>
> rte_num_sockets() => return 2
> rte_get_socket_id(int idx) => return numa_node_ids[idx]
rte_get_socket_mem(idx) might help to validate for local memory existence ?

>
> Would that be suitable?
>

Thanks,
Gowrishankar
  
Anatoly Burakov March 22, 2018, 9:04 a.m. UTC | #8
On 22-Mar-18 5:16 AM, gowrishankar muthukrishnan wrote:
> On Wednesday 21 March 2018 03:54 PM, Burakov, Anatoly wrote:
>>
>>>> +    config->numa_node_count = max_socket_id + 1;
>>>
>>> In some IBM servers, socket ID number does not seem to be in 
>>> sequence. For an instance, 0 and 8 for a 2 node server.
>>>
>>> In this case, numa_node_count would mislead users if wrongly 
>>> understood by its variable name IMO (see below)
>>>> +    RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", 
>>>> config->numa_node_count);
>>>
>>> For an instance, reading above message would tell 'EAL detected 8 
>>> nodes' in my server, but actually there are only two nodes.
>>>
>>> Could its name better be 'numa_node_id_max' ?. Also, we store in 
>>> actual count of numa nodes in _count variable.
>>>
>>> Also, there could be a case when there is no local memory available 
>>> to a numa node too.
>>>
>>> Thanks,
>>> Gowrishankar
>>
>> The point of this patchset is to (pre)allocate memory only on existing 
>> sockets.
>>
>> If we don't know how many sockets there are, we are forced to 
>> preallocate VA space per each *possible* NUMA node - that is, reserve 
>> e.g. 8x128G of memory, 6 of which will go unused on a 2-socket system. 
>> We can't know if there is no memory on socket in advance, but we can 
>> at least avoid preallocating VA space for sockets that don't exist in 
>> the first place.
>>
> 
> Sounds good Anatoly.
> May be, sysfs/ might help to confirm if a numa node has local memory ?.

We can't go to sysfs every time we want to allocate memory, and we can't 
really depend on what sysfs tells us about availability of hugepages on 
a particular socket (assuming that's what you meant by "confirm if a 
numa node has local memory"). User may modify hugepage numbers for each 
socket at runtime, and suddenly we do (or don't) have memory on local 
socket.

Therefore i think a better approach would be - if a socket exists (that 
is, if we can find lcores on that socket, even if they're not active), 
assume it has/had/will have memory, and store it as a valid socket id.

I'll respin a v5 with changes outlined below then. Thanks!

> 
> Anyway, for the context of this particular patch (return numa nodes), 
> below approach you mentioned is good.
> 
>> How about we store all possible socket id's instead? e.g. something like:
>>
>> static int numa_node_ids[MAX_NUMA_NODES];
>> <...>
>> int rte_eal_cpu_init() {
>>     int sockets[RTE_MAX_LCORE];
>>     <...>
>>     for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
>>         core_to_socket[lcore_id] = socket;
> sockets[lcore_id] = eal_cpu_socket_id(lcore_id);
>>     }
>>     <...>
>>     qsort(sockets);
>>     <...>
>>     // store all unique sockets in numa_node_ids in ascending order
> 
> Just thinking that, is there a purpose of retaining a numa ID which does 
> not have local memory attached ?
> but sockets[] is suppose to reflect all available nodes though (and 
> assuming, its calling place to ensure
> for the existence of numa local memory).
> 
> 
>> }
>> <...>
>>
>> on a 2 socket system we then get:
>>
>> rte_num_sockets() => return 2
>> rte_get_socket_id(int idx) => return numa_node_ids[idx]
> rte_get_socket_mem(idx) might help to validate for local memory existence ?
> 
>>
>> Would that be suitable?
>>
> 
> Thanks,
> Gowrishankar
> 
>
  

Patch

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@  LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..827ddeb 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -28,6 +28,7 @@  rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, max_socket_id = 0;
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +40,19 @@  rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		max_socket_id = RTE_MAX(max_socket_id, socket_id);
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +68,7 @@  rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +82,15 @@  rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	config->numa_node_count = max_socket_id + 1;
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int
+rte_num_sockets(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 08c6637..63fcc2e 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,7 @@  enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index d84bcff..ddf4c64 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -120,6 +120,14 @@  rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets on the system.
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ *
+ */
+unsigned int rte_num_sockets(void);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@  ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 4146907..fc83e74 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -211,6 +211,13 @@  DPDK_18.02 {
 
 }  DPDK_17.11;
 
+DPDK_18.05 {
+	global:
+
+	rte_num_sockets;
+
+} DPDK_18.02;
+
 EXPERIMENTAL {
 	global:
 
@@ -255,4 +262,4 @@  EXPERIMENTAL {
 	rte_service_set_stats_enable;
 	rte_service_start_with_defaults;
 
-} DPDK_18.02;
+} DPDK_18.05;