eal: unmap unneed dpdk VA spaces for legacy mem

Message ID 20190308053859.19980-1-jerry.lilijun@huawei.com (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Headers
Series eal: unmap unneed dpdk VA spaces for legacy mem |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/intel-Performance-Testing success Performance Testing PASS
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Lilijun (Jerry) March 8, 2019, 5:38 a.m. UTC
  Comparing dpdk VA spaces to dpdk 16.11, the dpdk app process's VA spaces increase to above 30G.
Here we can unmap the unneed VA spaces in rte_memseg_list.

Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
  

Comments

Burakov, Anatoly March 8, 2019, 9:37 a.m. UTC | #1
On 08-Mar-19 5:38 AM, Lilijun wrote:
> Comparing dpdk VA spaces to dpdk 16.11, the dpdk app process's VA spaces increase to above 30G.
> Here we can unmap the unneed VA spaces in rte_memseg_list.
> 
> Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 32feb41..56abdd2 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1626,8 +1626,19 @@ void numa_error(char *where)
>   		if (msl->base_va == NULL)
>   			continue;
>   		/* skip lists where there is at least one page allocated */
> -		if (msl->memseg_arr.count > 0)
> +		if (msl->memseg_arr.count > 0) {
> +			if (internal_config.legacy_mem) {
> +				struct rte_fbarray *arr = &msl->memseg_arr;
> +				int idx = rte_fbarray_find_next_free(arr, 0);
> +
> +				while (idx >= 0) {
> +					void *va = (void*)((char*)msl->base_va + idx * msl->page_sz);
> +					munmap(va, msl->page_sz);
> +					idx = rte_fbarray_find_next_free(arr, idx + 1);
> +				}

I am not entirely convinced this change is safe to do. Technically, this 
space is marked as free, so correctly written code should not attempt to 
access it, however it is still potentially dangerous to have memory area 
that is supposed to be allocated (according to data structures' 
parameters), but isn't.

If you are deallocating the VA space, ideally you should also resize the 
memseg list (as in, change its length), because that leftover memory 
area is no longer valid. However, this then presents us with a mismatch 
between (va_start + len) and (va_start + page_sz * memseg_arr.len), 
which may break things further.

May i ask what is the purpose of this change? I mean, i understand the 
part about unused VA space sitting there, but what is the consequence of 
that? This isn't 32-bit codepath, and in 64-bit there's plenty of 
address space to go around, and this memory doesn't take up any system 
resources anyway because it is read-only anonymous memory, and is 
therefore backed by zero page instead of real pages. So, what's wrong 
with just leaving it there?

I don't see any advantage of this change, and i see plenty of 
disadvantages, so for now i'm inclined to NACK this particular patch.

_However_, i should note that if you feel this is very important feature 
to have and would still like to implement it, my advise would be to look 
at how 32-bit code works, and model the 64-bit implementation after 
that, because 32-bit codepath does exactly what you propose, and doesn't 
leave unused address space.

> +			} >   			continue;
> +		}
>   		/* this is an unused list, deallocate it */
>   		mem_sz = msl->len;
>   		munmap(msl->base_va, mem_sz);
>
  
Lilijun (Jerry) March 12, 2019, 1:47 a.m. UTC | #2
Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
> Sent: Friday, March 08, 2019 5:38 PM
> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>;
> dev@dpdk.org
> Cc: jerry.zhang@intel.com; ian.stokes@intel.com
> Subject: Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for
> legacy mem
> 
> On 08-Mar-19 5:38 AM, Lilijun wrote:
> > Comparing dpdk VA spaces to dpdk 16.11, the dpdk app process's VA
> spaces increase to above 30G.
> > Here we can unmap the unneed VA spaces in rte_memseg_list.
> >
> > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> > ---
> >   lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > index 32feb41..56abdd2 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > @@ -1626,8 +1626,19 @@ void numa_error(char *where)
> >   		if (msl->base_va == NULL)
> >   			continue;
> >   		/* skip lists where there is at least one page allocated */
> > -		if (msl->memseg_arr.count > 0)
> > +		if (msl->memseg_arr.count > 0) {
> > +			if (internal_config.legacy_mem) {
> > +				struct rte_fbarray *arr = &msl->memseg_arr;
> > +				int idx = rte_fbarray_find_next_free(arr, 0);
> > +
> > +				while (idx >= 0) {
> > +					void *va = (void*)((char*)msl-
> >base_va + idx * msl->page_sz);
> > +					munmap(va, msl->page_sz);
> > +					idx = rte_fbarray_find_next_free(arr,
> idx + 1);
> > +				}
> 
> I am not entirely convinced this change is safe to do. Technically, this space is
> marked as free, so correctly written code should not attempt to access it,
> however it is still potentially dangerous to have memory area that is
> supposed to be allocated (according to data structures'
> parameters), but isn't.
> 
> If you are deallocating the VA space, ideally you should also resize the
> memseg list (as in, change its length), because that leftover memory area is
> no longer valid. However, this then presents us with a mismatch between
> (va_start + len) and (va_start + page_sz * memseg_arr.len), which may
> break things further.

Yes, you're right, here we need resize the memseg length. I will update it if this patch is needed.
> 
> May i ask what is the purpose of this change? I mean, i understand the part
> about unused VA space sitting there, but what is the consequence of that?
> This isn't 32-bit codepath, and in 64-bit there's plenty of address space to go
> around, and this memory doesn't take up any system resources anyway
> because it is read-only anonymous memory, and is therefore backed by zero
> page instead of real pages. So, what's wrong with just leaving it there?

This change will cause a issues:  when dpdk apps crashed, the coredump file will become too large.

Thanks.

> 
> I don't see any advantage of this change, and i see plenty of disadvantages,
> so for now i'm inclined to NACK this particular patch.
> 
> _However_, i should note that if you feel this is very important feature to
> have and would still like to implement it, my advise would be to look at how
> 32-bit code works, and model the 64-bit implementation after that, because
> 32-bit codepath does exactly what you propose, and doesn't leave unused
> address space.
> 
> > +			} >   			continue;
> > +		}
> >   		/* this is an unused list, deallocate it */
> >   		mem_sz = msl->len;
> >   		munmap(msl->base_va, mem_sz);
> >
> 
> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly March 12, 2019, 11:02 a.m. UTC | #3
On 12-Mar-19 1:47 AM, Lilijun (Jerry, Cloud Networking) wrote:
> Hi Anatoly,
> 
>> -----Original Message-----
>> From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
>> Sent: Friday, March 08, 2019 5:38 PM
>> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>;
>> dev@dpdk.org
>> Cc: jerry.zhang@intel.com; ian.stokes@intel.com
>> Subject: Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for
>> legacy mem
>>
>> On 08-Mar-19 5:38 AM, Lilijun wrote:
>>> Comparing dpdk VA spaces to dpdk 16.11, the dpdk app process's VA
>> spaces increase to above 30G.
>>> Here we can unmap the unneed VA spaces in rte_memseg_list.
>>>
>>> Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
>>> ---
>>>    lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> index 32feb41..56abdd2 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> @@ -1626,8 +1626,19 @@ void numa_error(char *where)
>>>    		if (msl->base_va == NULL)
>>>    			continue;
>>>    		/* skip lists where there is at least one page allocated */
>>> -		if (msl->memseg_arr.count > 0)
>>> +		if (msl->memseg_arr.count > 0) {
>>> +			if (internal_config.legacy_mem) {
>>> +				struct rte_fbarray *arr = &msl->memseg_arr;
>>> +				int idx = rte_fbarray_find_next_free(arr, 0);
>>> +
>>> +				while (idx >= 0) {
>>> +					void *va = (void*)((char*)msl-
>>> base_va + idx * msl->page_sz);
>>> +					munmap(va, msl->page_sz);
>>> +					idx = rte_fbarray_find_next_free(arr,
>> idx + 1);
>>> +				}
>>
>> I am not entirely convinced this change is safe to do. Technically, this space is
>> marked as free, so correctly written code should not attempt to access it,
>> however it is still potentially dangerous to have memory area that is
>> supposed to be allocated (according to data structures'
>> parameters), but isn't.
>>
>> If you are deallocating the VA space, ideally you should also resize the
>> memseg list (as in, change its length), because that leftover memory area is
>> no longer valid. However, this then presents us with a mismatch between
>> (va_start + len) and (va_start + page_sz * memseg_arr.len), which may
>> break things further.
> 
> Yes, you're right, here we need resize the memseg length. I will update it if this patch is needed.

Resizing memseg list is not the best course of action because fbarray 
itself doesn't support resizing, so you'll end up with a mismatch 
between length of memory and length of fbarray backing the memseg list.

See below suggestion for implementation.

>>
>> May i ask what is the purpose of this change? I mean, i understand the part
>> about unused VA space sitting there, but what is the consequence of that?
>> This isn't 32-bit codepath, and in 64-bit there's plenty of address space to go
>> around, and this memory doesn't take up any system resources anyway
>> because it is read-only anonymous memory, and is therefore backed by zero
>> page instead of real pages. So, what's wrong with just leaving it there?
> 
> This change will cause a issues:  when dpdk apps crashed, the coredump file will become too large.
> 
> Thanks.

You must have different default coredump settings than i do, because i 
haven't seen Linux attempting to dump the entire address space before (i 
have seen FreeBSD do that, mind you...).

> 
>>
>> I don't see any advantage of this change, and i see plenty of disadvantages,
>> so for now i'm inclined to NACK this particular patch.
>>
>> _However_, i should note that if you feel this is very important feature to
>> have and would still like to implement it, my advise would be to look at how
>> 32-bit code works, and model the 64-bit implementation after that, because
>> 32-bit codepath does exactly what you propose, and doesn't leave unused
>> address space.

The above is the way to go as far as implementing this particular 
feature goes: this has to be done at memseg list allocation time, not 
post-factum, when memseg lists are already allocated.

>>
>>> +			} >   			continue;
>>> +		}
>>>    		/* this is an unused list, deallocate it */
>>>    		mem_sz = msl->len;
>>>    		munmap(msl->base_va, mem_sz);
>>>
>>
>>
>> --
>> Thanks,
>> Anatoly
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 32feb41..56abdd2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1626,8 +1626,19 @@  void numa_error(char *where)
 		if (msl->base_va == NULL)
 			continue;
 		/* skip lists where there is at least one page allocated */
-		if (msl->memseg_arr.count > 0)
+		if (msl->memseg_arr.count > 0) {
+			if (internal_config.legacy_mem) {
+				struct rte_fbarray *arr = &msl->memseg_arr;
+				int idx = rte_fbarray_find_next_free(arr, 0);
+
+				while (idx >= 0) {
+					void *va = (void*)((char*)msl->base_va + idx * msl->page_sz);
+					munmap(va, msl->page_sz);
+					idx = rte_fbarray_find_next_free(arr, idx + 1);
+				}
+			}
 			continue;
+		}
 		/* this is an unused list, deallocate it */
 		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);