Bug 725 - hugepages.py script broken
Summary: hugepages.py script broken
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: other (show other bugs)
Version: unspecified
Hardware: All All
: Normal normal
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2021-06-03 17:08 CEST by Harry van Haaren (Intel)
Modified: 2021-06-08 17:59 CEST (History)
1 user (show)



Attachments

Description Harry van Haaren (Intel) 2021-06-03 17:08:52 CEST
Hi Folks,

It seems like the usertools/dpdk-hugepages.py script has changed/broken for certain systems. I've applied the below patch to see why it was calling "sys.exit()", which provides the following output;

$ usertools/dpdk-hugepages.py -p1G --setup 2G
num pages 0 and pages 0
num pages 1 and pages 0
Unable to reserve required pages.
num pages 0 and pages 0
num pages 2 and pages 0
Unable to reserve required pages.
num pages 4 and pages 2
Unable to reserve required pages.
num pages 2 and pages 2

As we can see, it attempts multiple things, and then succeeds (DPDK runs and allocs hugepages as required after this).

Today the behavior will "fail" on the first "unable to reserve required pages" as 1 != 0, and does *not* continue attempting other pages sizes/things.

This check and "sys.exit()" was introduce in this commit: b25f0a7df80b620bab09dcb34bf4547d31ddede1

I'm not familiar with this script/hugepage reservation, so don't know what's a good fix. No owner of this script in MAINTAINERS either.


diff --git a/usertools/dpdk-hugepages.py b/usertools/dpdk-hugepages.py
index fb368b6933..0ef667c5f9 100755
--- a/usertools/dpdk-hugepages.py
+++ b/usertools/dpdk-hugepages.py
@@ -68,8 +68,10 @@ def set_hugepages(path, pages):
     except FileNotFoundError:
         sys.exit("Invalid page size. Valid page sizes: {}".format(
                  get_valid_page_sizes(path)))
-    if get_hugepages(path) != pages:
-        sys.exit('Unable to reserve required pages.')
+    num_pages = get_hugepages(path)
+    print("num pages {0} and pages {1}".format(num_pages, pages))
+    if num_pages != pages:
+        print('Unable to reserve required pages.')
Comment 1 Thomas Monjalon 2021-06-03 18:06:02 CEST
Your patch does not match what is in the latest version.
Which version are you testing?

What are the multiple trials?
I see a loop to allocate on all nodes,
but any failure should be fatal in this case.

Please describe your setup, the command line and the issue.
Comment 2 Harry van Haaren (Intel) 2021-06-03 18:20:55 CEST
Testing main branch at latest commit "version: 21.08-rc0" now.  (Output above was from older branch - apologies for mixing up, but same issue persists here).
Setup is a 2x NUMA system, with lots of memory so 2G per socket isn't the issue :)

I'm not sure what the "trials" are either, but it seems to attempt node0 first, then node 1, and then node0 again (succeeding the 3rd time).
As before, without this patch DPDK fails on no hugepages, with the patch it succeeds.

Now with output from latest main branch (with below diff applied);

BEFORE:
$ ./usertools/dpdk-hugepages.py -p1G --setup 2G
Unable to set pages (1 instead of 0 in /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages).
$ # dpdk fails to launch after this

AFTER (with patch):

$ ./usertools/dpdk-hugepages.py -p1G --setup 2G
Unable to set pages (1 instead of 0 in /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages).
Unable to set pages (2 instead of 0 in /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages).
Unable to set pages (3 instead of 2 in /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages).
$ # dpdk happy to launch now

Patch:

diff --git a/usertools/dpdk-hugepages.py b/usertools/dpdk-hugepages.py
index 4fdb199744..71aa11e7d4 100755
--- a/usertools/dpdk-hugepages.py
+++ b/usertools/dpdk-hugepages.py
@@ -70,7 +70,7 @@ def set_hugepages(path, reqpages):
                  get_valid_page_sizes(path)))
     gotpages = get_hugepages(path)
     if gotpages != reqpages:
-        sys.exit('Unable to set pages ({} instead of {} in {}).'.format(
+        print('Unable to set pages ({} instead of {} in {}).'.format(
                  gotpages, reqpages, filename))
Comment 3 Thomas Monjalon 2021-06-03 18:42:39 CEST
As you can see, the failures are because of pages not being cleared (0 requested).
It would be interesting to understand why some pages refuse to be cleared,
The script can be improved with 2 checks:
1/ If the request is 0, we can continue on failure (no exit)
2/ If the result is more than the request, we should warn but not exit.
Comment 4 Stephen Hemminger 2021-06-03 21:36:12 CEST
On Thu, 03 Jun 2021 15:08:52 +0000
bugzilla@dpdk.org wrote:

> https://bugs.dpdk.org/show_bug.cgi?id=725
> 
>             Bug ID: 725
>            Summary: hugepages.py script broken
>            Product: DPDK
>            Version: unspecified
>           Hardware: All
>                 OS: All
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: Normal
>          Component: other
>           Assignee: dev@dpdk.org
>           Reporter: harry.van.haaren@intel.com
>   Target Milestone: ---
> 
> Hi Folks,
> 
> It seems like the usertools/dpdk-hugepages.py script has changed/broken for
> certain systems. I've applied the below patch to see why it was calling
> "sys.exit()", which provides the following output;
> 
> $ usertools/dpdk-hugepages.py -p1G --setup 2G
> num pages 0 and pages 0
> num pages 1 and pages 0
> Unable to reserve required pages.
> num pages 0 and pages 0
> num pages 2 and pages 0
> Unable to reserve required pages.
> num pages 4 and pages 2
> Unable to reserve required pages.
> num pages 2 and pages 2
> 
> As we can see, it attempts multiple things, and then succeeds (DPDK runs and
> allocs hugepages as required after this).
> 
> Today the behavior will "fail" on the first "unable to reserve required
> pages"
> as 1 != 0, and does *not* continue attempting other pages sizes/things.
> 
> This check and "sys.exit()" was introduce in this commit:
> b25f0a7df80b620bab09dcb34bf4547d31ddede1
> 
> I'm not familiar with this script/hugepage reservation, so don't know what's
> a
> good fix. No owner of this script in MAINTAINERS either.
> 
> 
> diff --git a/usertools/dpdk-hugepages.py b/usertools/dpdk-hugepages.py
> index fb368b6933..0ef667c5f9 100755
> --- a/usertools/dpdk-hugepages.py
> +++ b/usertools/dpdk-hugepages.py
> @@ -68,8 +68,10 @@ def set_hugepages(path, pages):
>      except FileNotFoundError:
>          sys.exit("Invalid page size. Valid page sizes: {}".format(
>                   get_valid_page_sizes(path)))
> -    if get_hugepages(path) != pages:
> -        sys.exit('Unable to reserve required pages.')
> +    num_pages = get_hugepages(path)
> +    print("num pages {0} and pages {1}".format(num_pages, pages))
> +    if num_pages != pages:
> +        print('Unable to reserve required pages.')


If the number of hugepages is incorrect, please keep the exit call.
It is not correct to return success if there is an error in setup.
Comment 5 Harry van Haaren (Intel) 2021-06-08 17:59:06 CEST
Thomas; I'm not sure its failing either, and I'm not sure what next steps are to debug. I can try find somebody to look specifically at this machine, but I suspect its not just this machine that has the issue.

Stephen, agreed, of course we shouldn't return OK if there was an actual error. (I'm not proposing to actually merge the above diff - it merely helps/shows some debug info.)

Note You need to log in before you can comment on or make changes to this bug.