[dpdk-stable] patch 'mem: improve segment list preallocation' has been queued to stable release 18.08.1

Burakov, Anatoly anatoly.burakov at intel.com
Mon Nov 26 12:16:11 CET 2018


Hi Kevin,

FYI

http://patches.dpdk.org/patch/48338/

Thanks,
Anatoly


> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor at redhat.com]
> Sent: Thursday, November 22, 2018 4:49 PM
> To: Burakov, Anatoly <anatoly.burakov at intel.com>
> Cc: dpdk stable <stable at dpdk.org>
> Subject: patch 'mem: improve segment list preallocation' has been queued
> to stable release 18.08.1
> 
> Hi,
> 
> FYI, your patch has been queued to stable release 18.08.1
> 
> Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
> It will be pushed if I get no objections before 11/28/18. So please shout if
> anyone has objections.
> 
> Also note that after the patch there's a diff of the upstream commit vs the
> patch applied to the branch. If the code is different (i.e., not only metadata
> diffs), due for example to a change in context or macro names, please
> double-check it.
> 
> Thanks.
> 
> Kevin Traynor
> 
> ---
> From 6d552c83eacba7be4b4f2efbafd58724e07e2330 Mon Sep 17 00:00:00 2001
> From: Anatoly Burakov <anatoly.burakov at intel.com>
> Date: Fri, 5 Oct 2018 09:29:44 +0100
> Subject: [PATCH] mem: improve segment list preallocation
> 
> [ upstream commit 1dd342d0fdc4f72102f0b48c89b6a39f029004fe ]
> 
> Current code to preallocate segment lists is trying to do everything in one go,
> and thus ends up being convoluted, hard to understand, and, most
> importantly, does not scale beyond initial assumptions about number of
> NUMA nodes and number of page sizes, and therefore has issues on some
> configurations.
> 
> Instead of fixing these issues in the existing code, simply rewrite it to be
> slightly less clever but much more logical, and provide ample comments to
> explain exactly what is going on.
> 
> We cannot use the same approach for 32-bit code because the limitations of
> the target dictate current socket-centric approach rather than type-centric
> approach we use on 64-bit target, so 32-bit code is left unmodified. FreeBSD
> doesn't support NUMA so there's no complexity involved there, and thus its
> code is much more readable and not worth changing.
> 
> Fixes: 1d406458db47 ("mem: make segment preallocation OS-specific")
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 179 +++++++++++++++++------
>  1 file changed, 137 insertions(+), 42 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 6131bfde2..cc2d3fb69 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -2096,7 +2096,13 @@ memseg_primary_init(void)
>  {
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> -	int i, socket_id, hpi_idx, msl_idx = 0;
> +	struct memtype {
> +		uint64_t page_sz;
> +		int socket_id;
> +	} *memtypes = NULL;
> +	int i, hpi_idx, msl_idx;
>  	struct rte_memseg_list *msl;
> -	uint64_t max_mem, total_mem;
> +	uint64_t max_mem, max_mem_per_type;
> +	unsigned int max_seglists_per_type;
> +	unsigned int n_memtypes, cur_type;
> 
>  	/* no-huge does not need this at all */
> @@ -2104,8 +2110,49 @@ memseg_primary_init(void)
>  		return 0;
> 
> -	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
> -	total_mem = 0;
> +	/*
> +	 * figuring out amount of memory we're going to have is a long and very
> +	 * involved process. the basic element we're operating with is a memory
> +	 * type, defined as a combination of NUMA node ID and page size (so that
> +	 * e.g. 2 sockets with 2 page sizes yield 4 memory types in total).
> +	 *
> +	 * deciding amount of memory going towards each memory type is a
> +	 * balancing act between maximum segments per type, maximum memory per
> +	 * type, and number of detected NUMA nodes. the goal is to make sure
> +	 * each memory type gets at least one memseg list.
> +	 *
> +	 * the total amount of memory is limited by RTE_MAX_MEM_MB value.
> +	 *
> +	 * the total amount of memory per type is limited by either
> +	 * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number
> +	 * of detected NUMA nodes. additionally, maximum number of segments per
> +	 * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for
> +	 * smaller page sizes, it can take hundreds of thousands of segments to
> +	 * reach the above specified per-type memory limits.
> +	 *
> +	 * additionally, each type may have multiple memseg lists associated
> +	 * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger
> +	 * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones.
> +	 *
> +	 * the number of memseg lists per type is decided based on the above
> +	 * limits, and also taking number of detected NUMA nodes, to make sure
> +	 * that we don't run out of memseg lists before we populate all NUMA
> +	 * nodes with memory.
> +	 *
> +	 * we do this in three stages. first, we collect the number of types.
> +	 * then, we figure out memory constraints and populate the list of
> +	 * would-be memseg lists. then, we go ahead and allocate the memseg
> +	 * lists.
> +	 */
> 
> -	/* create memseg lists */
> +	/* create space for mem types */
> +	n_memtypes = internal_config.num_hugepage_sizes * rte_socket_count();
> +	memtypes = calloc(n_memtypes, sizeof(*memtypes));
> +	if (memtypes == NULL) {
> +		RTE_LOG(ERR, EAL, "Cannot allocate space for memory types\n");
> +		return -1;
> +	}
> +
> +	/* populate mem types */
> +	cur_type = 0;
>  	for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
>  			hpi_idx++) {
> @@ -2116,9 +2163,6 @@ memseg_primary_init(void)
>  		hugepage_sz = hpi->hugepage_sz;
> 
> -		for (i = 0; i < (int) rte_socket_count(); i++) {
> -			uint64_t max_type_mem, total_type_mem = 0;
> -			int type_msl_idx, max_segs, total_segs = 0;
> -
> -			socket_id = rte_socket_id_by_idx(i);
> +		for (i = 0; i < (int) rte_socket_count(); i++, cur_type++) {
> +			int socket_id = rte_socket_id_by_idx(i);
> 
>  #ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
> @@ -2126,47 +2170,98 @@ memseg_primary_init(void)
>  				break;
>  #endif
> +			memtypes[cur_type].page_sz = hugepage_sz;
> +			memtypes[cur_type].socket_id = socket_id;
> 
> -			if (total_mem >= max_mem)
> -				break;
> +			RTE_LOG(DEBUG, EAL, "Detected memory type: "
> +				"socket_id:%u hugepage_sz:%" PRIu64 "\n",
> +				socket_id, hugepage_sz);
> +		}
> +	}
> 
> -			max_type_mem = RTE_MIN(max_mem - total_mem,
> -				(uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
> -			max_segs = RTE_MAX_MEMSEG_PER_TYPE;
> +	/* set up limits for types */
> +	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
> +	max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20,
> +			max_mem / n_memtypes);
> +	/*
> +	 * limit maximum number of segment lists per type to ensure there's
> +	 * space for memseg lists for all NUMA nodes with all page sizes
> +	 */
> +	max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes;
> 
> -			type_msl_idx = 0;
> -			while (total_type_mem < max_type_mem &&
> -					total_segs < max_segs) {
> -				uint64_t cur_max_mem, cur_mem;
> -				unsigned int n_segs;
> +	if (max_seglists_per_type == 0) {
> +		RTE_LOG(ERR, EAL, "Cannot accommodate all memory types, please increase %s\n",
> +			RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
> +		return -1;
> +	}
> 
> -				if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
> -					RTE_LOG(ERR, EAL,
> -						"No more space in memseg lists, please increase %s\n",
> -						RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
> -					return -1;
> -				}
> +	/* go through all mem types and create segment lists */
> +	msl_idx = 0;
> +	for (cur_type = 0; cur_type < n_memtypes; cur_type++) {
> +		unsigned int cur_seglist, n_seglists, n_segs;
> +		unsigned int max_segs_per_type, max_segs_per_list;
> +		struct memtype *type = &memtypes[cur_type];
> +		uint64_t max_mem_per_list, pagesz;
> +		int socket_id;
> 
> -				msl = &mcfg->memsegs[msl_idx++];
> +		pagesz = type->page_sz;
> +		socket_id = type->socket_id;
> 
> -				cur_max_mem = max_type_mem - total_type_mem;
> +		/*
> +		 * we need to create segment lists for this type. we must take
> +		 * into account the following things:
> +		 *
> +		 * 1. total amount of memory we can use for this memory type
> +		 * 2. total amount of memory per memseg list allowed
> +		 * 3. number of segments needed to fit the amount of memory
> +		 * 4. number of segments allowed per type
> +		 * 5. number of segments allowed per memseg list
> +		 * 6. number of memseg lists we are allowed to take up
> +		 */
> 
> -				cur_mem = get_mem_amount(hugepage_sz,
> -						cur_max_mem);
> -				n_segs = cur_mem / hugepage_sz;
> +		/* calculate how much segments we will need in total */
> +		max_segs_per_type = max_mem_per_type / pagesz;
> +		/* limit number of segments to maximum allowed per type */
> +		max_segs_per_type = RTE_MIN(max_segs_per_type,
> +				(unsigned int)RTE_MAX_MEMSEG_PER_TYPE);
> +		/* limit number of segments to maximum allowed per list */
> +		max_segs_per_list = RTE_MIN(max_segs_per_type,
> +				(unsigned int)RTE_MAX_MEMSEG_PER_LIST);
> 
> -				if (alloc_memseg_list(msl, hugepage_sz, n_segs,
> -						socket_id, type_msl_idx))
> -					return -1;
> +		/* calculate how much memory we can have per segment list */
> +		max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz,
> +				(uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20);
> 
> -				total_segs += msl->memseg_arr.len;
> -				total_type_mem = total_segs * hugepage_sz;
> -				type_msl_idx++;
> +		/* calculate how many segments each segment list will have */
> +		n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz);
> 
> -				if (alloc_va_space(msl)) {
> -					RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
> -					return -1;
> -				}
> +		/* calculate how many segment lists we can have */
> +		n_seglists = RTE_MIN(max_segs_per_type / n_segs,
> +				max_mem_per_type / max_mem_per_list);
> +
> +		/* limit number of segment lists according to our maximum */
> +		n_seglists = RTE_MIN(n_seglists, max_seglists_per_type);
> +
> +		RTE_LOG(DEBUG, EAL, "Creating %i segment lists: "
> +				"n_segs:%i socket_id:%i hugepage_sz:%" PRIu64 "\n",
> +			n_seglists, n_segs, socket_id, pagesz);
> +
> +		/* create all segment lists */
> +		for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) {
> +			if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
> +				RTE_LOG(ERR, EAL,
> +					"No more space in memseg lists, please increase %s\n",
> +					RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
> +				return -1;
> +			}
> +			msl = &mcfg->memsegs[msl_idx++];
> +
> +			if (alloc_memseg_list(msl, pagesz, n_segs,
> +					socket_id, cur_seglist))
> +				return -1;
> +
> +			if (alloc_va_space(msl)) {
> +				RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
> +				return -1;
>  			}
> -			total_mem += total_type_mem;
>  		}
>  	}
> --
> 2.19.0
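
For reviewers who want to sanity-check the new flow outside of diff context,
here is a minimal standalone sketch of the first stage (memory type
enumeration). The hugepage sizes and socket count are hard-coded stand-ins
for what the EAL detects at runtime, not actual DPDK calls:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical stand-ins for values the EAL detects at runtime */
#define NUM_HUGEPAGE_SIZES 2
#define NUM_SOCKETS 2

struct memtype {
	uint64_t page_sz;
	int socket_id;
};

int main(void)
{
	const uint64_t hugepage_szs[NUM_HUGEPAGE_SIZES] = {
		2ULL << 20, /* 2M */
		1ULL << 30  /* 1G */
	};
	unsigned int n_memtypes = NUM_HUGEPAGE_SIZES * NUM_SOCKETS;
	struct memtype *memtypes = calloc(n_memtypes, sizeof(*memtypes));
	unsigned int cur_type = 0;
	int hpi_idx, i;

	if (memtypes == NULL)
		return 1;

	/* stage 1: one memory type per (page size, NUMA node) combination */
	for (hpi_idx = 0; hpi_idx < NUM_HUGEPAGE_SIZES; hpi_idx++) {
		for (i = 0; i < NUM_SOCKETS; i++, cur_type++) {
			memtypes[cur_type].page_sz = hugepage_szs[hpi_idx];
			memtypes[cur_type].socket_id = i;
			printf("type %u: socket_id:%d hugepage_sz:%" PRIu64 "\n",
				cur_type, i, memtypes[cur_type].page_sz);
		}
	}
	/* 2 sizes x 2 sockets -> 4 types, as in the commit's example */
	free(memtypes);
	return 0;
}

The point of the type-centric layout is that every later limit (per-type
memory, per-type segment count, lists per type) can be computed from
n_memtypes alone.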
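The stage-two arithmetic can likewise be checked standalone. The sketch below
walks the per-type limit math for a 2M page type on a 4-type system, assuming
what I believe were the default config values of that era (treat the
constants as illustrative and double-check config/common_base):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* assumed 18.08-era defaults from config/common_base -- verify locally */
#define MAX_MEMSEG_LISTS     64
#define MAX_MEMSEG_PER_LIST  8192
#define MAX_MEM_MB_PER_LIST  32768
#define MAX_MEMSEG_PER_TYPE  32768
#define MAX_MEM_MB_PER_TYPE  131072
#define MAX_MEM_MB           524288

int main(void)
{
	uint64_t n_memtypes = 4;      /* e.g. 2 sockets x 2 page sizes */
	uint64_t pagesz = 2ULL << 20; /* walk the math for the 2M type */

	/* per-type budget: global cap split across types, then clamped */
	uint64_t max_mem = (uint64_t)MAX_MEM_MB << 20;
	uint64_t max_mem_per_type = MIN((uint64_t)MAX_MEM_MB_PER_TYPE << 20,
			max_mem / n_memtypes);                          /* 128G */
	uint64_t max_seglists_per_type = MAX_MEMSEG_LISTS / n_memtypes; /* 16 */

	/* segment budget: memory budget in pages, clamped per type/list */
	uint64_t max_segs_per_type = MIN(max_mem_per_type / pagesz,
			(uint64_t)MAX_MEMSEG_PER_TYPE);                 /* 32768 */
	uint64_t max_segs_per_list = MIN(max_segs_per_type,
			(uint64_t)MAX_MEMSEG_PER_LIST);                 /* 8192 */
	uint64_t max_mem_per_list = MIN(max_segs_per_list * pagesz,
			(uint64_t)MAX_MEM_MB_PER_LIST << 20);           /* 16G */

	/* final shape: how many lists, and how many segments in each */
	uint64_t n_segs = MIN(max_segs_per_list, max_mem_per_list / pagesz);
	uint64_t n_seglists = MIN(max_segs_per_type / n_segs,
			max_mem_per_type / max_mem_per_list);
	n_seglists = MIN(n_seglists, max_seglists_per_type);

	printf("2M type: %" PRIu64 " lists x %" PRIu64 " segs (%" PRIu64 "M per list)\n",
		n_seglists, n_segs, max_mem_per_list >> 20);
	return 0;
}

With these constants the 2M type gets 4 lists of 8192 segments (64G in
total, capped by the per-type segment limit), while the same walk with
pagesz = 1G gives 4 lists of 32 segments, i.e. the full 128G per-type
budget, and the four types together use only 16 of the 64 memseg lists.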
> 
> ---
>   Diff of the applied patch vs upstream commit (please double-check if non-empty):
> ---
> --- -	2018-11-22 16:47:32.741839268 +0000
> +++ 0018-mem-improve-segment-list-preallocation.patch	2018-11-22 16:47:32.000000000 +0000
> @@ -1,8 +1,10 @@
> -From 1dd342d0fdc4f72102f0b48c89b6a39f029004fe Mon Sep 17 00:00:00 2001
> +From 6d552c83eacba7be4b4f2efbafd58724e07e2330 Mon Sep 17 00:00:00 2001
>  From: Anatoly Burakov <anatoly.burakov at intel.com>
>  Date: Fri, 5 Oct 2018 09:29:44 +0100
>  Subject: [PATCH] mem: improve segment list preallocation
> 
> +[ upstream commit 1dd342d0fdc4f72102f0b48c89b6a39f029004fe ]
> +
>  Current code to preallocate segment lists is trying to do
>  everything in one go, and thus ends up being convoluted,
>  hard to understand, and, most importantly, does not scale beyond
> @@ -21,7 +23,6 @@
>  its code is much more readable and not worth changing.
> 
>  Fixes: 1d406458db47 ("mem: make segment preallocation OS-specific")
> -Cc: stable at dpdk.org
> 
>  Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>  ---
> @@ -29,10 +30,10 @@
>   1 file changed, 137 insertions(+), 42 deletions(-)
> 
>  diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> -index 04f264818..19e686eb6 100644
> +index 6131bfde2..cc2d3fb69 100644
>  --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>  +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> -@@ -2132,7 +2132,13 @@ memseg_primary_init(void)
> +@@ -2096,7 +2096,13 @@ memseg_primary_init(void)
>   {
>   	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  -	int i, socket_id, hpi_idx, msl_idx = 0;
> @@ -48,7 +49,7 @@
>  +	unsigned int n_memtypes, cur_type;
> 
>   	/* no-huge does not need this at all */
> -@@ -2140,8 +2146,49 @@ memseg_primary_init(void)
> +@@ -2104,8 +2110,49 @@ memseg_primary_init(void)
>   		return 0;
> 
>  -	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
> @@ -101,7 +102,7 @@
>  +	cur_type = 0;
>   	for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
>   			hpi_idx++) {
> -@@ -2152,9 +2199,6 @@ memseg_primary_init(void)
> +@@ -2116,9 +2163,6 @@ memseg_primary_init(void)
>   		hugepage_sz = hpi->hugepage_sz;
> 
>  -		for (i = 0; i < (int) rte_socket_count(); i++) {
> @@ -113,7 +114,7 @@
>  +			int socket_id = rte_socket_id_by_idx(i);
> 
>   #ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
> -@@ -2162,47 +2206,98 @@ memseg_primary_init(void)
> +@@ -2126,47 +2170,98 @@ memseg_primary_init(void)
>   				break;
>   #endif
>  +			memtypes[cur_type].page_sz = hugepage_sz;

