[dpdk-dev,1/2] eal/malloc: merge malloc_elems in heap if they are contiguous

Message ID ebc0cbca46c5df7ebc1a04717b9bb83f2c2d0204.1525341819.git.gowrishankar.m@linux.vnet.ibm.com (mailing list archive)
State Not Applicable, archived
Delegated to: Thomas Monjalon
Checks

Context               Check     Description
ci/checkpatch         warning   coding style issues
ci/Intel-compilation  fail      apply issues

Commit Message

Gowrishankar May 3, 2018, 10:11 a.m. UTC
  From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>

During malloc heap init, if there are malloc_elems that are contiguous
in virtual addresses, they can be merged so that the merged malloc_elem
guarantees a larger free memory size than the individual hugepage size
it was originally created for.

Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
Cc: stable@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
---
 lib/librte_eal/common/malloc_heap.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)
  

Comments

Anatoly Burakov May 4, 2018, 9:29 a.m. UTC | #1
On 03-May-18 11:11 AM, Gowrishankar wrote:
> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> 
> During malloc heap init, if there are malloc_elems contiguous in
> virt addresses, they could be merged so that, merged malloc_elem
> would guarantee larger free memory size than its actual hugepage
> size, it was created for.
> 
> Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> ---

Hi Gowrishankar,

I haven't looked at the patchset in detail yet, however i have a general 
question: how do we end up with VA-contiguous memsegs that are not part 
of the same memseg in the first place? Is there something wrong with 
memseg sorting code? Alternatively, if they were broken up, presumably 
they were broken up for a reason, namely while they may be VA 
contiguous, they weren't IOVA-contiguous.

Can you provide a dump of physmem layout where memory would have been VA 
and IOVA-contiguous while belonging to different memsegs?
  
Gowrishankar May 4, 2018, 10:41 a.m. UTC | #2
Hi Anatoly,

On Friday 04 May 2018 02:59 PM, Burakov, Anatoly wrote:
> On 03-May-18 11:11 AM, Gowrishankar wrote:
>> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>
>> During malloc heap init, if there are malloc_elems contiguous in
>> virt addresses, they could be merged so that, merged malloc_elem
>> would guarantee larger free memory size than its actual hugepage
>> size, it was created for.
>>
>> Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Gowrishankar Muthukrishnan 
>> <gowrishankar.m@linux.vnet.ibm.com>
>> ---
>
> Hi Gowrishankar,
>
> I haven't looked at the patchset in detail yet, however i have a 
> general question: how do we end up with VA-contiguous memsegs that are 
> not part of the same memseg in the first place? Is there something 
> wrong with memseg sorting code? Alternatively, if 
> they were broken up, presumably they were broken up for a reason, 
> namely while they may be VA contiguous, they weren't IOVA-contiguous.

On powerpc, when *nr_overcommit_hugepages* is set (to respect the
address hint in get_virtual_area() as requested by the secondary
process), mmap() does not allocate one big VA chunk for all the
available hugepages. In order for the secondary process to be in the
same VA range, we need to add the anonymous and hugetlb flags in the
mmap() calls while remapping. As mmap() can then only create a VA chunk
of at most one hugepage in size (MAP_HUGETLB) while also respecting the
address hint (MAP_ANONYMOUS), multiple VA chunks are created, even
though both VA and IOVA are contiguous in most of the cases.
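
A minimal sketch of the per-hugepage remapping described above, assuming
a hinted base address and a known hugepage size (the helper name and
parameters are illustrative only, not the actual EAL remap code):

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Illustrative only: remap one hugepage-sized chunk at a hinted address.
 * Each call maps exactly one hugepage (MAP_HUGETLB), so remapping N
 * hugepages yields N separate VA chunks that may turn out to be
 * VA-contiguous while still being tracked as separate memsegs.
 */
static void *
remap_hugepage_at_hint(void *addr_hint, size_t hugepage_sz)
{
	void *va = mmap(addr_hint, hugepage_sz, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	return va == MAP_FAILED ? NULL : va;
}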

>
> Can you provide a dump of physmem layout where memory would have been 
> VA and IOVA-contiguous while belonging to different memsegs?

Please find here: https://pastebin.com/tDNEaxdU

As you can notice in malloc_heaps, the index used for the heap size is
8, whereas it should supposedly be 11.

To note, these are not problems with the memory rework done in the
latest code base. So, I referred to the code up to v18.02.
  
Anatoly Burakov May 4, 2018, 11:02 a.m. UTC | #3
On 04-May-18 11:41 AM, gowrishankar muthukrishnan wrote:
> Hi Anatoly,
> 
> On Friday 04 May 2018 02:59 PM, Burakov, Anatoly wrote:
>> On 03-May-18 11:11 AM, Gowrishankar wrote:
>>> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>>
>>> During malloc heap init, if there are malloc_elems contiguous in
>>> virt addresses, they could be merged so that, merged malloc_elem
>>> would guarantee larger free memory size than its actual hugepage
>>> size, it was created for.
>>>
>>> Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Gowrishankar Muthukrishnan 
>>> <gowrishankar.m@linux.vnet.ibm.com>
>>> ---
>>
>> Hi Gowrishankar,
>>
>> I haven't looked at the patchset in detail yet, however i have a 
>> general question: how do we end up with VA-contiguous memsegs that are 
>> not part of the same memseg in the first place? Is there something 
>> wrong with memseg sorting code? Alternatively, if they were broken up, 
>> presumably they were broken up for a reason, namely while they may be 
>> VA contiguous, they weren't IOVA-contiguous.
> 
> In powerpc, when *nr_overcommit_hugepages set* (to respect address hint 
> in get_virtual_area() as requested by secondary process), mmap() would 
> not be allocate one big VA chunk for all the available hugepages. In 
> order to support secondary process be in same VA
> range, we need to add anonymous and hugetlb flags in mmap calls while 
> remapping. As mmap can only create max VA at the size of hugepage 
> (MAP_HUGETLB) and also to respect address hint (MAP_ANONYMOUS), multiple 
> VA chunks are created, even though both VA and IOVA are contiguous in 
> most of the cases.

OK, suppose on PPC64, that may happen. Still (and please correct me if 
i'm misunderstanding the patchset - as i said, i haven't looked at it in 
detail, and have only taken a cursory look), there are two issues i see 
here:

1) there's no check for IOVA-contiguousness, only VA-contiguousness,
which means you are risking accidentally concatenating segments that
aren't IOVA-contiguous (see the sketch after point 2). Prior to 18.05,
the rest of DPDK expects all segments to be VA- and IOVA-contiguous.

2) i don't think this problem should be solved in malloc. Malloc 
elements have memseg pointers in them, and if you concatenate multiple 
segments, you will end up having malloc elements which point to wrong 
segments. Instead, you should fix the memseg allocation code to
concatenate seemingly disparate segments, and avoid the problem with
malloc elements in the first place. Maybe do another sorting pass, or 
something. In any case, memseg allocation code is the correct place to 
fix this, IMO.
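
For illustration only (not part of this patch), the kind of check that
point 1) asks for could look roughly like the sketch below, using the
struct rte_memseg fields available in v18.02; the helper name is made
up:

#include <rte_common.h>
#include <rte_memory.h>

/* Illustrative helper: two memsegs may only be merged if they are on
 * the same socket and contiguous in both virtual and IO address space.
 */
static int
memsegs_mergeable(const struct rte_memseg *prev, const struct rte_memseg *cur)
{
	if (prev->socket_id != cur->socket_id)
		return 0;
	if (RTE_PTR_ADD(prev->addr, prev->len) != cur->addr)
		return 0;	/* not VA-contiguous */
	if (prev->iova + prev->len != cur->iova)
		return 0;	/* not IOVA-contiguous */
	return 1;
}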

> 
>>
>> Can you provide a dump of physmem layout where memory would have been 
>> VA and IOVA-contiguous while belonging to different memsegs?
> 
> Please find here: https://pastebin.com/tDNEaxdU
> 
> As you notice malloc_heaps, its index for heap size is 8 which is 
> supposedly 11.

That's a bit hard to read. There's a rte_eal_dump_physmem_layout() 
function that should help display this in a more user-friendly manner :)
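
For reference, a minimal usage sketch of that function, assuming EAL has
already been initialised:

#include <stdio.h>
#include <rte_eal.h>
#include <rte_memory.h>

int
main(int argc, char **argv)
{
	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* Print each memseg's virt/iova address, length, hugepage size
	 * and socket in a readable form.
	 */
	rte_eal_dump_physmem_layout(stdout);
	return 0;
}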

> 
> To note, these are not problems with memory rework done in latest code 
> base. So, I refered code until v18.02.
>
  

Patch

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 267a4c6..1cacf7f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -213,7 +213,9 @@ 
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned ms_cnt;
-	struct rte_memseg *ms;
+	struct rte_memseg *ms, *prev_ms = NULL;
+	struct malloc_elem *elem, *prev_elem;
+	int ret;
 
 	if (mcfg == NULL)
 		return -1;
@@ -222,6 +224,32 @@ 
 			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
 			ms_cnt++, ms++) {
 		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+		elem = (struct malloc_elem *)ms->addr;
+		if (prev_ms != NULL &&
+			(ms->socket_id == prev_ms->socket_id)) {
+			prev_elem = (struct malloc_elem *)prev_ms->addr;
+
+		/* prev_elem and elem must be contiguous for the resize;
+		 * otherwise look for prev_elem in later iterations. */
+			if (elem != RTE_PTR_ADD(prev_elem,
+				prev_elem->size + MALLOC_ELEM_OVERHEAD)) {
+				prev_ms = ms;
+				continue;
+			}
+		/* The end BUSY elem pointed to by prev_elem can be merged
+		 * with prev_elem itself, as prev_elem expands its size now.
+		 */
+			prev_elem->size += MALLOC_ELEM_OVERHEAD;
+
+		/* Preserve the end BUSY elem that points to the current
+		 * elem, or else the free_list will be broken. */
+			ret = malloc_elem_resize(prev_elem,
+				prev_elem->size + elem->size - MALLOC_ELEM_OVERHEAD);
+			if (ret < 0)
+				prev_elem = elem;
+		} else {
+			prev_ms = ms;
+		}
 	}
 
 	return 0;