config: reduce memory requirements for DPDK

Message ID 40cf48703f5fae8af8c31dcc8a1a1ecb0b151d27.1532426170.git.anatoly.burakov@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Series: config: reduce memory requirements for DPDK

Checks

Context               Check     Description
ci/checkpatch         success   coding style OK
ci/Intel-compilation  success   Compilation OK

Commit Message

Anatoly Burakov July 24, 2018, 10:03 a.m. UTC
It has been reported that the current memory limits do not work
well on 8-socket machines in the default configuration when big
page sizes are used [1].

Fix it by reducing the amount of memory reserved by DPDK by
default to 32G per page size per NUMA node. This way, the global
memory limit can still accommodate 8 NUMA nodes with 2 page sizes.

[1] https://mails.dpdk.org/archives/dev/2018-July/108071.html
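
As a sanity check on the arithmetic: 8 nodes x 2 page sizes x
32768 MB per type = 524288 MB, which exactly matches the global
CONFIG_RTE_MAX_MEM_MB limit of 524288 that this patch leaves
unchanged.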

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    We could have increased CONFIG_RTE_MAX_MEM_MB but this would've
    brought other potential problems due to increased memory
    preallocation, and secondary process initialization is flaky
    enough as it is. I am willing to bet that 32G per page size is
    more than enough for the majority of use cases, and any
    application with bigger requirements could adjust config options
    itself.

 config/common_base | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Thomas Monjalon July 24, 2018, 10:23 a.m. UTC | #1
24/07/2018 12:03, Anatoly Burakov:
> It has been reported that the current memory limits do not work
> well on 8-socket machines in the default configuration when big
> page sizes are used [1].
> 
> Fix it by reducing the amount of memory reserved by DPDK by
> default to 32G per page size per NUMA node. This way, the global
> memory limit can still accommodate 8 NUMA nodes with 2 page sizes.
> 
> [1] https://mails.dpdk.org/archives/dev/2018-July/108071.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>     We could have increased CONFIG_RTE_MAX_MEM_MB but this would've
>     brought other potential problems due to increased memory
>     preallocation, and secondary process initialization is flaky
>     enough as it is. I am willing to bet that 32G per page size is
>     more than enough for the majority of use cases, and any
>     application with bigger requirements could adjust config options
>     itself.
[...]
> -CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
> -CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=16384
> +CONFIG_RTE_MAX_MEM_MB_PER_TYPE=32768

Ideally, it should be a run-time option.
  
Anatoly Burakov July 24, 2018, 11:04 a.m. UTC | #2
On 24-Jul-18 11:23 AM, Thomas Monjalon wrote:
> 24/07/2018 12:03, Anatoly Burakov:
>> It has been reported that the current memory limits do not work
>> well on 8-socket machines in the default configuration when big
>> page sizes are used [1].
>>
>> Fix it by reducing the amount of memory reserved by DPDK by
>> default to 32G per page size per NUMA node. This way, the global
>> memory limit can still accommodate 8 NUMA nodes with 2 page sizes.
>>
>> [1] https://mails.dpdk.org/archives/dev/2018-July/108071.html
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      We could have increased CONFIG_RTE_MAX_MEM_MB but this would've
>>      brought other potential problems due to increased memory
>>      preallocation, and secondary process initialization is flaky
>>      enough as it is. I am willing to bet that 32G per page size is
>>      more than enough for the majority of use cases, and any
>>      application with bigger requirements could adjust config options
>>      itself.
> [...]
>> -CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
>> -CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
>> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=16384
>> +CONFIG_RTE_MAX_MEM_MB_PER_TYPE=32768
> 
> Ideally, it should be a run-time option.
> 

It can be, yes, and this can be worked on for the next release. However, 
we also need to have good default values that work across all supported 
platforms.
  
Thomas Monjalon July 24, 2018, 12:03 p.m. UTC | #3
24/07/2018 13:04, Burakov, Anatoly:
> On 24-Jul-18 11:23 AM, Thomas Monjalon wrote:
> > 24/07/2018 12:03, Anatoly Burakov:
> >> It has been reported that the current memory limits do not work
> >> well on 8-socket machines in the default configuration when big
> >> page sizes are used [1].
> >>
> >> Fix it by reducing the amount of memory reserved by DPDK by
> >> default to 32G per page size per NUMA node. This way, the global
> >> memory limit can still accommodate 8 NUMA nodes with 2 page sizes.
> >>
> >> [1] https://mails.dpdk.org/archives/dev/2018-July/108071.html
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>
> >> Notes:
> >>      We could have increased CONFIG_RTE_MAX_MEM_MB but this would've
> >>      brought other potential problems due to increased memory
> >>      preallocation, and secondary process initialization is flaky
> >>      enough as it is. I am willing to bet that 32G per page size is
> >>      more than enough for the majority of use cases, and any
> >>      application with bigger requirements could adjust config options
> >>      itself.
> > [...]
> >> -CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
> >> -CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
> >> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=16384
> >> +CONFIG_RTE_MAX_MEM_MB_PER_TYPE=32768
> > 
> > Ideally, it should be a run-time option.
> > 
> 
> It can be, yes, and this can be worked on for the next release. However, we 
> also need to have good default values that work across all supported 
> platforms.

Yes sure, we can wait until the next release for a run-time option.

How can we be sure these default values are good enough?
It would be good to have several acks from various projects or companies.
  
Kevin Traynor July 25, 2018, 5:43 p.m. UTC | #4
On 07/24/2018 01:03 PM, Thomas Monjalon wrote:
> 24/07/2018 13:04, Burakov, Anatoly:
>> On 24-Jul-18 11:23 AM, Thomas Monjalon wrote:
>>> 24/07/2018 12:03, Anatoly Burakov:
>>>> It has been reported that the current memory limits do not work
>>>> well on 8-socket machines in the default configuration when big
>>>> page sizes are used [1].
>>>>
>>>> Fix it by reducing the amount of memory reserved by DPDK by
>>>> default to 32G per page size per NUMA node. This way, the global
>>>> memory limit can still accommodate 8 NUMA nodes with 2 page sizes.
>>>>
>>>> [1] https://mails.dpdk.org/archives/dev/2018-July/108071.html
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> ---
>>>>
>>>> Notes:
>>>>      We could have increased CONFIG_RTE_MAX_MEM_MB but this would've
>>>>      brought other potential problems due to increased memory
>>>>      preallocation, and secondary process initialization is flaky
>>>>      enough as it is. I am willing to bet that 32G per page size is
>>>>      more than enough for the majority of use cases, and any
>>>>      application with bigger requirements could adjust config options
>>>>      itself.
>>> [...]
>>>> -CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
>>>> -CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
>>>> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=16384
>>>> +CONFIG_RTE_MAX_MEM_MB_PER_TYPE=32768
>>>
>>> Ideally, it should be a run-time option.
>>>
>>
>> It can be, yes, and this can be worked on for the next release. However, we
>> also need to have good default values that work across all supported
>> platforms.
> 
> Yes sure, we can wait until the next release for a run-time option.
> 
> How can we be sure these default values are good enough?

Why add a new limitation? Why not take the other approach that was
suggested, of increasing the max possible memory?

If there are new limitations or backwards-compatibility issues with the
default settings compared with before the large memory management
rework, then it would be good to have that made clear in the docs at a
high level for the users who want to update.

It would also help a lot to document what the implications and limits
of changing the most important defines are: will it be slower? Will it
stop working above X? etc.

> It would be good to have several acks from various projects or companies.
  
Anatoly Burakov July 26, 2018, 9:51 a.m. UTC | #5
On 25-Jul-18 6:43 PM, Kevin Traynor wrote:
> On 07/24/2018 01:03 PM, Thomas Monjalon wrote:
>> 24/07/2018 13:04, Burakov, Anatoly:
>>> On 24-Jul-18 11:23 AM, Thomas Monjalon wrote:
>>>> 24/07/2018 12:03, Anatoly Burakov:
>>>>> It has been reported that the current memory limits do not work
>>>>> well on 8-socket machines in the default configuration when big
>>>>> page sizes are used [1].
>>>>>
>>>>> Fix it by reducing the amount of memory reserved by DPDK by
>>>>> default to 32G per page size per NUMA node. This way, the global
>>>>> memory limit can still accommodate 8 NUMA nodes with 2 page sizes.
>>>>>
>>>>> [1] https://mails.dpdk.org/archives/dev/2018-July/108071.html
>>>>>
>>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>> ---
>>>>>
>>>>> Notes:
>>>>>       We could have increased CONFIG_RTE_MAX_MEM_MB but this would've
>>>>>       brought other potential problems due to increased memory
>>>>>       preallocation, and secondary process initialization is flaky
>>>>>       enough as it is. I am willing to bet that 32G per page size is
>>>>>       more than enough for the majority of use cases, and any
>>>>>       application with bigger requirements could adjust config options
>>>>>       itself.
>>>> [...]
>>>>> -CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
>>>>> -CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
>>>>> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=16384
>>>>> +CONFIG_RTE_MAX_MEM_MB_PER_TYPE=32768
>>>>
>>>> Ideally, it should be a run-time option.
>>>>
>>>
>>> It can be, yes, and this can be worked on for the next release. However, we
>>> also need to have good default values that work across all supported
>>> platforms.
>>
>> Yes sure, we can wait until the next release for a run-time option.
>>
>> How can we be sure these default values are good enough?
> 
> Why add a new limitation? Why not take the other approach that was
> suggested, of increasing the max possible memory?

The commit notes explain that :) Basically, increasing the total amount 
of allocatable memory increases the risk of secondary processes failing 
to initialize due to an inability to map segments at the same addresses. 
Granted, the "usual" case of running DPDK on a 1- or 2-socket machine 
with 1 or 2 page sizes will not be affected by an increase in the total 
amount of memory, so things will stay as they are.

However, reducing the memory requirements will reduce VA space 
consumption for what I perceive to be the most common case (under 32G 
per page size per NUMA node), thereby improving the secondary process 
experience, while still enabling 8 NUMA nodes with two page sizes to 
work with default settings.
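
To make that concrete: each (page size, NUMA node) combination is a 
memory type, and a type is capped at RTE_MAX_MEMSEG_PER_TYPE pages or 
RTE_MAX_MEM_MB_PER_TYPE megabytes, whichever is smaller (per the 
comment in config/common_base). With the new defaults and the two 
common x86 hugepage sizes:

  2M pages: min(16384 segs * 2 MB,    32768 MB) = 32768 MB (32G)
  1G pages: min(16384 segs * 1024 MB, 32768 MB) = 32768 MB (32G)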

> 
> If there are new limitations or backwards-compatibility issues with the
> default settings compared with before the large memory management
> rework, then it would be good to have that made clear in the docs at a
> high level for the users who want to update.

Agreed, I will follow this patch up with doc updates.

> 
> It would also help a lot to document what the implications and limits
> of changing the most important defines are: will it be slower? Will it
> stop working above X? etc.

The main impact is on the amount of VA-contiguous memory you can have 
in DPDK, and on the total amount of memory you can have in DPDK. How 
that memory performs (slower, faster, etc.) is not affected.

So, for example, if you really needed a single VA-contiguous memzone of 
20 gigabytes - yes, this change would affect you. However, I suspect 
this is not a common case, and given that you've gone that far, 
increasing the memory limits would not be such a big deal anyway.
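
As an illustration, a minimal probe along these lines (a sketch; the 
memzone name and the 20G figure are only illustrative) would tell an 
application whether the build-time limits still permit such a 
reservation:

    #include <stdio.h>

    #include <rte_eal.h>
    #include <rte_errno.h>
    #include <rte_memzone.h>

    int
    main(int argc, char **argv)
    {
        const struct rte_memzone *mz;

        if (rte_eal_init(argc, argv) < 0)
            return 1;

        /* memzones are VA-contiguous, so this asks the allocator
         * for a single 20G stretch of contiguous virtual memory */
        mz = rte_memzone_reserve("probe_20g", 20ULL << 30,
                SOCKET_ID_ANY, 0);
        if (mz == NULL)
            printf("20G reservation failed: %s\n",
                    rte_strerror(rte_errno));
        else
            rte_memzone_free(mz);

        rte_eal_cleanup();
        return 0;
    }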

> 
>> It would be good to have several acks from various projects or companies.
  

Patch

diff --git a/config/common_base b/config/common_base
index 201cdf698..78a644fb2 100644
--- a/config/common_base
+++ b/config/common_base
@@ -71,8 +71,8 @@  CONFIG_RTE_MAX_MEM_MB_PER_LIST=32768
 # over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or
 # RTE_MAX_MEM_MB_PER_TYPE megabytes of memory (split over multiple lists of
 # RTE_MAX_MEM_MB_PER_LIST), whichever is smaller
-CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
-CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
+CONFIG_RTE_MAX_MEMSEG_PER_TYPE=16384
+CONFIG_RTE_MAX_MEM_MB_PER_TYPE=32768
 # global maximum usable amount of VA, in megabytes
 CONFIG_RTE_MAX_MEM_MB=524288
 CONFIG_RTE_MAX_MEMZONE=2560
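
For applications whose requirements exceed the new defaults, the Notes 
above suggest adjusting the config options; a sketch of doing that in a 
make-based build would be to restore the previous values in 
config/common_base:

    CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
    CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072

and then regenerate the build config and rebuild (e.g. make config 
T=x86_64-native-linuxapp-gcc, where the template name is illustrative 
and depends on the target platform).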