[dpdk-dev,v3,3/3] memzone: improve zero-length memzone reserve

Message ID d0ac8c0cd12fc64b57ef01032228b67d8f0c1d7b.1525366407.git.anatoly.burakov@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Anatoly Burakov May 3, 2018, 5:18 p.m. UTC
Currently, reserving zero-length memzones is done by looking at
malloc statistics and reserving the biggest-sized element found in those
statistics. This has two issues.

First, there is a race condition. The heap is unlocked between the
time we check the stats and the time we reserve the malloc element for the
memzone. This may make it impossible to reserve the memzone we wanted,
because another allocation might have taken place in the meantime and the
biggest-sized element may no longer be available.

Second, the size returned by malloc statistics does not include any
alignment information. This is worked around by being conservative and
subtracting the alignment length from the final result, which leads to
fragmentation and to reserving memzones that could have been bigger but
aren't.
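
For reference, the sizing the old code performs can be paraphrased as
follows (a sketch based on the find_heap_max_free_elem() hunk removed
below; the helper name old_conservative_len() is made up here):

#include <stddef.h>
#include <rte_malloc.h>
#include "malloc_elem.h" /* internal EAL header, provides MALLOC_ELEM_OVERHEAD */

/* Old behaviour, paraphrased: derive the usable length from malloc
 * statistics, then shrink it by the worst-case element header and the
 * full alignment, even when the free element is already well aligned. */
static size_t
old_conservative_len(int socket, size_t align)
{
	struct rte_malloc_socket_stats stats;

	rte_malloc_get_socket_stats(socket, &stats);
	if (stats.greatest_free_size < MALLOC_ELEM_OVERHEAD + align)
		return 0;
	return stats.greatest_free_size - MALLOC_ELEM_OVERHEAD - align;
}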

Fix all of this by using the earlier-introduced operation to reserve the
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try to
allocate, and it is not a good idea to lock all of the heaps at
the same time, so we just document this limitation and encourage
users to reserve memzones with the socket id properly set.
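
Under the new behaviour, the recommended usage pattern looks roughly like
this (an illustrative sketch only; the function and memzone names are
made up):

#include <rte_lcore.h>
#include <rte_memzone.h>

/* Sketch: reserve the biggest available memzone by passing len == 0 and
 * an explicit socket id (here, the caller's own socket) rather than
 * SOCKET_ID_ANY, so the "biggest element" lookup runs on the heap we
 * actually care about. */
static const struct rte_memzone *
reserve_biggest_local(void)
{
	return rte_memzone_reserve("example_biggest_mz", 0,
			rte_socket_id(), 0);
}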

Also, fix up the unit tests to account for the new behavior.

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")
Cc: sergio.gonzalez.monroy@intel.com

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c  |  62 ++---------
 lib/librte_eal/common/include/rte_memzone.h |  18 ++++
 test/test/test_memzone.c                    | 157 +++++++++++++++-------------
 3 files changed, 115 insertions(+), 122 deletions(-)
  

Comments

Remy Horton May 11, 2018, 10:25 a.m. UTC | #1
On 03/05/2018 18:18, Anatoly Burakov wrote:
[..]
> Also, fixup unit tests to account for the new behavior.

Tried running the tests but it fails on a boundary check:

RTE>>memzone_autotest
test basic memzone API
[...]
1GB Huge pages available
test alignment for memzone_reserve
check alignments and lengths
check overlapping
test boundary alignment for memzone_reserve
check_memzone_bounded(MZ_TEST_bounded_128): invalid memzone boundary 128 
crossed
Test Failed
RTE>>
  
Anatoly Burakov May 14, 2018, 8:21 a.m. UTC | #2
On 11-May-18 11:25 AM, Remy Horton wrote:
> 
> On 03/05/2018 18:18, Anatoly Burakov wrote:
> [..]
>> Also, fixup unit tests to account for the new behavior.
> 
> Tried running the tests but it fails on a boundary check:
> 
> RTE>>memzone_autotest
> test basic memzone API
> [...]
> 1GB Huge pages available
> test alignment for memzone_reserve
> check alignments and lengths
> check overlapping
> test boundary alignment for memzone_reserve
> check_memzone_bounded(MZ_TEST_bounded_128): invalid memzone boundary 128 
> crossed
> Test Failed
> RTE>>
> 

Hi Remy,

Passes here, but i'll look into it, thanks for the report.
  
Anatoly Burakov May 14, 2018, 11:29 a.m. UTC | #3
On 14-May-18 9:21 AM, Burakov, Anatoly wrote:
> On 11-May-18 11:25 AM, Remy Horton wrote:
>>
>> On 03/05/2018 18:18, Anatoly Burakov wrote:
>> [..]
>>> Also, fixup unit tests to account for the new behavior.
>>
>> Tried running the tests but it fails on a boundary check:
>>
>> RTE>>memzone_autotest
>> test basic memzone API
>> [...]
>> 1GB Huge pages available
>> test alignment for memzone_reserve
>> check alignments and lengths
>> check overlapping
>> test boundary alignment for memzone_reserve
>> check_memzone_bounded(MZ_TEST_bounded_128): invalid memzone boundary 
>> 128 crossed
>> Test Failed
>> RTE>>
>>
> 
> Hi Remy,
> 
> Passes here, but i'll look into it, thanks for the report.
> 

This failure is not caused by this patchset, and you should get similar 
failures on master if you get these while testing my patchset. I am not 
able to reproduce this issue, but i'll double-check the bounded reserve 
code with a fine-toothed comb anyway.
  
Anatoly Burakov May 14, 2018, 12:23 p.m. UTC | #4
On 14-May-18 12:29 PM, Burakov, Anatoly wrote:
> On 14-May-18 9:21 AM, Burakov, Anatoly wrote:
>> On 11-May-18 11:25 AM, Remy Horton wrote:
>>>
>>> On 03/05/2018 18:18, Anatoly Burakov wrote:
>>> [..]
>>>> Also, fixup unit tests to account for the new behavior.
>>>
>>> Tried running the tests but it fails on a boundary check:
>>>
>>> RTE>>memzone_autotest
>>> test basic memzone API
>>> [...]
>>> 1GB Huge pages available
>>> test alignment for memzone_reserve
>>> check alignments and lengths
>>> check overlapping
>>> test boundary alignment for memzone_reserve
>>> check_memzone_bounded(MZ_TEST_bounded_128): invalid memzone boundary 
>>> 128 crossed
>>> Test Failed
>>> RTE>>
>>>
>>
>> Hi Remy,
>>
>> Passes here, but i'll look into it, thanks for the report.
>>
> 
> This failure is not caused by this patchset, and you should get similar 
> failures on master if you get these while testing my patchset. I am not 
> able to reproduce this issue, but i'll double-check the bounded reserve 
> code with a fine-toothed comb anyway.
> 
Further investigation showed that this is indeed related to the new 
code. So, v5 has a bug too, i'll fix it for v6.
  
Remy Horton May 15, 2018, 6:24 a.m. UTC | #5
On 14/05/2018 12:29, Burakov, Anatoly wrote:
[..]
> This failure is not caused by this patchset, and you should get similar
> failures on master if you get these while testing my patchset. I am not
> able to reproduce this issue, but i'll double-check the bounded reserve
> code with a fine-toothed comb anyway.

I reliably get the failure with V3 applied to RC2, but so far haven't 
been able to replicate with any other combination (clean RC2, RC2+V5, 
master+V4, etc). Odd..
  
Anatoly Burakov May 15, 2018, 8:37 a.m. UTC | #6
On 15-May-18 7:24 AM, Remy Horton wrote:
> 
> On 14/05/2018 12:29, Burakov, Anatoly wrote:
> [..]
>> This failure is not caused by this patchset, and you should get similar
>> failures on master if you get these while testing my patchset. I am not
>> able to reproduce this issue, but i'll double-check the bounded reserve
>> code with a fine-toothed comb anyway.
> 
> I reliably get the failure with V3 applied to RC2, but so far haven't 
> been able to replicate with any other combination (clean RC2, RC2+V5, 
> master+V4, etc). Odd..
> 

I've retested on plain v3+rc2 (i was previously testing on master), and 
i can reproduce the failure as well. The bug that was causing this issue 
was fixed on rebase in v4. So, rc2+v4/v5 no longer has this issue.
  

Patch

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index bce3321..18e2349 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -52,38 +52,6 @@  memzone_lookup_thread_unsafe(const char *name)
 	return NULL;
 }
 
-
-/* This function will return the greatest free block if a heap has been
- * specified. If no heap has been specified, it will return the heap and
- * length of the greatest free block available in all heaps */
-static size_t
-find_heap_max_free_elem(int *s, unsigned align)
-{
-	struct rte_mem_config *mcfg;
-	struct rte_malloc_socket_stats stats;
-	int i, socket = *s;
-	size_t len = 0;
-
-	/* get pointer to global configuration */
-	mcfg = rte_eal_get_configuration()->mem_config;
-
-	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-		if ((socket != SOCKET_ID_ANY) && (socket != i))
-			continue;
-
-		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
-		if (stats.greatest_free_size > len) {
-			len = stats.greatest_free_size;
-			*s = i;
-		}
-	}
-
-	if (len < MALLOC_ELEM_OVERHEAD + align)
-		return 0;
-
-	return len - MALLOC_ELEM_OVERHEAD - align;
-}
-
 static const struct rte_memzone *
 memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned int flags, unsigned int align,
@@ -92,6 +60,7 @@  memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	struct rte_memzone *mz;
 	struct rte_mem_config *mcfg;
 	struct rte_fbarray *arr;
+	void *mz_addr;
 	size_t requested_len;
 	int mz_idx;
 	bool contig;
@@ -165,27 +134,16 @@  memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	/* malloc only cares about size flags, remove contig flag from flags */
 	flags &= ~RTE_MEMZONE_IOVA_CONTIG;
 
-	if (len == 0) {
-		/* len == 0 is only allowed for non-contiguous zones */
-		if (contig) {
-			RTE_LOG(DEBUG, EAL, "Reserving zero-length contiguous memzones is not supported\n");
-			rte_errno = EINVAL;
-			return NULL;
-		}
-		if (bound != 0)
+	if (len == 0 && bound == 0) {
+		mz_addr = malloc_heap_alloc_biggest(NULL, socket_id, flags,
+				align, contig);
+	} else {
+		if (len == 0)
 			requested_len = bound;
-		else {
-			requested_len = find_heap_max_free_elem(&socket_id, align);
-			if (requested_len == 0) {
-				rte_errno = ENOMEM;
-				return NULL;
-			}
-		}
+		/* allocate memory on heap */
+		mz_addr = malloc_heap_alloc(NULL, requested_len, socket_id,
+				flags, align, bound, contig);
 	}
-
-	/* allocate memory on heap */
-	void *mz_addr = malloc_heap_alloc(NULL, requested_len, socket_id, flags,
-			align, bound, contig);
 	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
@@ -213,7 +171,7 @@  memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
 	mz->iova = rte_malloc_virt2iova(mz_addr);
 	mz->addr = mz_addr;
-	mz->len = (requested_len == 0 ? elem->size : requested_len);
+	mz->len = elem->size - MALLOC_ELEM_OVERHEAD;
 	mz->hugepage_sz = elem->msl->page_sz;
 	mz->socket_id = elem->msl->socket_id;
 	mz->flags = 0;
diff --git a/lib/librte_eal/common/include/rte_memzone.h b/lib/librte_eal/common/include/rte_memzone.h
index 0eeb94f..beb591e 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ -77,6 +77,12 @@  struct rte_memzone {
  * correctly filled memzone descriptor. If the allocation cannot be
  * done, return NULL.
  *
+ * @note: It is preferable to reserve memzones with len set to 0 and a valid
+ *   socket_id. Setting socket_id to SOCKET_ID_ANY is supported, but will likely
+ *   not yield expected results. Specifically, the resulting memzone may not
+ *   necessarily be the biggest memzone, but rather biggest memzone on socket id
+ *   corresponding to an lcore from which reservation was called.
+ *
  * @param name
  *   The name of the memzone. If it already exists, the function will
  *   fail and return NULL.
@@ -130,6 +136,12 @@  const struct rte_memzone *rte_memzone_reserve(const char *name,
  * descriptor. If the allocation cannot be done or if the alignment
  * is not a power of 2, returns NULL.
  *
+ * @note: It is preferable to reserve memzones with len set to 0 and a valid
+ *   socket_id. Setting socket_id to SOCKET_ID_ANY is supported, but will likely
+ *   not yield expected results. Specifically, the resulting memzone may not
+ *   necessarily be the biggest memzone, but rather biggest memzone on socket id
+ *   corresponding to an lcore from which reservation was called.
+ *
  * @param name
  *   The name of the memzone. If it already exists, the function will
  *   fail and return NULL.
@@ -188,6 +200,12 @@  const struct rte_memzone *rte_memzone_reserve_aligned(const char *name,
  * boundary. That implies that requested length should be less or equal
  * then boundary.
  *
+ * @note: It is preferable to reserve memzones with len set to 0 and a valid
+ *   socket_id. Setting socket_id to SOCKET_ID_ANY is supported, but will likely
+ *   not yield expected results. Specifically, the resulting memzone may not
+ *   necessarily be the biggest memzone, but rather biggest memzone on socket id
+ *   corresponding to an lcore from which reservation was called.
+ *
  * @param name
  *   The name of the memzone. If it already exists, the function will
  *   fail and return NULL.
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index efcf732..696a7e9 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -467,23 +467,16 @@  test_memzone_reserve_flags(void)
 
 /* Find the heap with the greatest free block size */
 static size_t
-find_max_block_free_size(const unsigned _align)
+find_max_block_free_size(unsigned align, unsigned int socket_id)
 {
 	struct rte_malloc_socket_stats stats;
-	unsigned i, align = _align;
-	size_t len = 0;
+	size_t len;
 
-	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-		rte_malloc_get_socket_stats(i, &stats);
-		if (stats.greatest_free_size > len)
-			len = stats.greatest_free_size;
-	}
+	rte_malloc_get_socket_stats(socket_id, &stats);
 
-	if (align < RTE_CACHE_LINE_SIZE)
-		align = RTE_CACHE_LINE_ROUNDUP(align+1);
+	len = stats.greatest_free_size;
 
-	if (len <= MALLOC_ELEM_OVERHEAD + align)
-		return 0;
+	align = RTE_CACHE_LINE_ROUNDUP(align);
 
 	return len - MALLOC_ELEM_OVERHEAD - align;
 }
@@ -491,37 +484,44 @@  find_max_block_free_size(const unsigned _align)
 static int
 test_memzone_reserve_max(void)
 {
-	const struct rte_memzone *mz;
-	size_t maxlen;
+	unsigned int i;
 
-	maxlen = find_max_block_free_size(0);
+	for (i = 0; i < rte_socket_count(); i++) {
+		const struct rte_memzone *mz;
+		size_t maxlen;
+		int socket;
 
-	if (maxlen == 0) {
-		printf("There is no space left!\n");
-		return 0;
-	}
+		socket = rte_socket_id_by_idx(i);
+		maxlen = find_max_block_free_size(0, socket);
 
-	mz = rte_memzone_reserve(TEST_MEMZONE_NAME("max_zone"), 0,
-			SOCKET_ID_ANY, 0);
-	if (mz == NULL){
-		printf("Failed to reserve a big chunk of memory - %s\n",
-				rte_strerror(rte_errno));
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		if (maxlen == 0) {
+			printf("There is no space left!\n");
+			return 0;
+		}
 
-	if (mz->len != maxlen) {
-		printf("Memzone reserve with 0 size did not return bigest block\n");
-		printf("Expected size = %zu, actual size = %zu\n", maxlen, mz->len);
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		mz = rte_memzone_reserve(TEST_MEMZONE_NAME("max_zone"), 0,
+				socket, 0);
+		if (mz == NULL) {
+			printf("Failed to reserve a big chunk of memory - %s\n",
+					rte_strerror(rte_errno));
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
 
-	if (rte_memzone_free(mz)) {
-		printf("Fail memzone free\n");
-		return -1;
+		if (mz->len != maxlen) {
+			printf("Memzone reserve with 0 size did not return bigest block\n");
+			printf("Expected size = %zu, actual size = %zu\n",
+					maxlen, mz->len);
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
+
+		if (rte_memzone_free(mz)) {
+			printf("Fail memzone free\n");
+			return -1;
+		}
 	}
 
 	return 0;
@@ -530,45 +530,62 @@  test_memzone_reserve_max(void)
 static int
 test_memzone_reserve_max_aligned(void)
 {
-	const struct rte_memzone *mz;
-	size_t maxlen = 0;
+	unsigned int i;
 
-	/* random alignment */
-	rte_srand((unsigned)rte_rdtsc());
-	const unsigned align = 1 << ((rte_rand() % 8) + 5); /* from 128 up to 4k alignment */
+	for (i = 0; i < rte_socket_count(); i++) {
+		const struct rte_memzone *mz;
+		size_t max_len, min_len = 0;
+		int socket;
 
-	maxlen = find_max_block_free_size(align);
+		socket = rte_socket_id_by_idx(i);
 
-	if (maxlen == 0) {
-		printf("There is no space left for biggest %u-aligned memzone!\n", align);
-		return 0;
-	}
+		/* random alignment */
+		rte_srand((unsigned)rte_rdtsc());
+		const unsigned int align = 1 << ((rte_rand() % 8) + 5); /* from 128 up to 4k alignment */
 
-	mz = rte_memzone_reserve_aligned(TEST_MEMZONE_NAME("max_zone_aligned"),
-			0, SOCKET_ID_ANY, 0, align);
-	if (mz == NULL){
-		printf("Failed to reserve a big chunk of memory - %s\n",
-				rte_strerror(rte_errno));
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		/* memzone size may be between size and size - align */
+		min_len = find_max_block_free_size(align, socket);
+		max_len = find_max_block_free_size(0, socket);
 
-	if (mz->len != maxlen) {
-		printf("Memzone reserve with 0 size and alignment %u did not return"
-				" bigest block\n", align);
-		printf("Expected size = %zu, actual size = %zu\n",
-				maxlen, mz->len);
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		if (min_len == 0 || max_len == 0) {
+			printf("There is no space left for biggest %u-aligned memzone!\n",
+					align);
+			return 0;
+		}
 
-	if (rte_memzone_free(mz)) {
-		printf("Fail memzone free\n");
-		return -1;
-	}
+		mz = rte_memzone_reserve_aligned(
+				TEST_MEMZONE_NAME("max_zone_aligned"),
+				0, socket, 0, align);
+		if (mz == NULL) {
+			printf("Failed to reserve a big chunk of memory - %s\n",
+					rte_strerror(rte_errno));
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
+		if (mz->addr != RTE_PTR_ALIGN(mz->addr, align)) {
+			printf("Memzone reserve with 0 size and alignment %u did not return aligned block\n",
+					align);
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
+
+		if (mz->len < min_len || mz->len > max_len) {
+			printf("Memzone reserve with 0 size and alignment %u did not return"
+					" bigest block\n", align);
+			printf("Expected size = %zu-%zu, actual size = %zu\n",
+					min_len, max_len, mz->len);
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
 
+		if (rte_memzone_free(mz)) {
+			printf("Fail memzone free\n");
+			return -1;
+		}
+	}
 	return 0;
 }