[dpdk-dev,v2] test: fix debug autotest with eal cleanup addition

Message ID 1517336769-18052-1-git-send-email-harry.van.haaren@intel.com (mailing list archive)
State Accepted, archived
Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Van Haaren, Harry Jan. 30, 2018, 6:26 p.m. UTC
Before this patch, the debug_autotest would call fork(),
call rte_panic() or rte_exit() in the child process, and
examine the return code to verify that rte_panic() and
rte_exit() were correctly reporting failures.
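
The fork-and-check pattern described above can be sketched in plain C.
This is only an illustration, not the code in test_debug.c:
fake_exit() and child_exit_status() are hypothetical names, and
fake_exit() merely stands in for the call under test.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the call under test: terminates the process. */
static void
fake_exit(int code)
{
	exit(code);
}

/*
 * Fork, let the child terminate with exit_val, and return the exit
 * status observed by the parent. Returns -1 if fork() or waitpid()
 * fails, or if the child did not terminate normally.
 */
static int
child_exit_status(int exit_val)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		fake_exit(exit_val);	/* child: never returns */

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A caller would compare child_exit_status(n) against n. A child that
dies from abort() instead fails the WIFEXITED() check, so a panic-style
test would inspect WIFSIGNALED() rather than the exit code.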

With the inclusion of the rte_eal_cleanup() patch, rte_exit()
was modified to cleanly tear down EAL allocations. Currently
only one library (service cores) is allocated by EAL at startup
and should be cleaned up. This library has a check on a normal
(non-hugepage) variable to protect against double cleanup. The
service cores finalize() function itself frees hugepage memory.

Because the unit test fork()s, and the double-free check is on an
ordinary variable, multiple child processes (fork()-ed from the
unit-test runner) attempt to free the same hugepage memory. The
variable meant to protect against double cleanup was not effective,
as each fork() copies the parent's value, so it again shows
"initialized" in the next child.

The solution is to call rte_service_finalize() *before* calling
fork(), so that the service cores double-cleanup variable is
already zero at fork() time, and hence the child processes never
free the hugepage service-cores memory (correct behavior, as the
unit-test suite is still running, and owns the hugepages).
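
The failure mode and the fix can be modelled without DPDK. The sketch
below is a simplified model under stated assumptions, not the service
cores implementation: a MAP_SHARED counter stands in for hugepage
memory, and initialized, model_finalize() and count_frees() are
hypothetical names.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ordinary per-process guard: fork() gives every child its own copy. */
static int initialized;
/* Points into MAP_SHARED memory, standing in for hugepage memory. */
static int *free_count;

/* Hypothetical finalize(): "frees" the shared memory if the guard is set. */
static void
model_finalize(void)
{
	if (!initialized)
		return;
	(*free_count)++;	/* one more free of the same shared memory */
	initialized = 0;	/* only visible in the calling process */
}

/*
 * Fork nb_children children one after another; each child runs the
 * finalize step and exits. Returns the total number of frees seen in
 * shared memory, or -1 on error.
 */
static int
count_frees(int nb_children, int finalize_before_fork)
{
	int i, total;

	free_count = mmap(NULL, sizeof(*free_count), PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (free_count == MAP_FAILED)
		return -1;
	*free_count = 0;
	initialized = 1;

	if (finalize_before_fork)
		model_finalize();	/* guard is 0 in every later child */

	for (i = 0; i < nb_children; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			munmap(free_count, sizeof(*free_count));
			return -1;
		}
		if (pid == 0) {
			model_finalize();	/* exit-time cleanup in the child */
			_exit(0);
		}
		waitpid(pid, NULL, 0);
	}

	total = *free_count;
	munmap(free_count, sizeof(*free_count));
	return total;
}
```

In this model, three children without the early finalize free the
shared memory three times, because each child inherits a set guard.
With the early finalize, the only free is the one done by the parent.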

Fixes: aec9c13c5257 ("eal: add function to release internal resources")

Reported-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---

v2:
- Fix 2 typo/spello mistakes in commit message

Cc: thomas@monjalon.net

Please consider including this in RC2, as it fixes the
currently failing debug_autotest.

---
 test/test/test_debug.c | 6 ++++++
 1 file changed, 6 insertions(+)
  

Comments

Thomas Monjalon Jan. 30, 2018, 11:53 p.m. UTC | #1
Applied, thanks
  
Ananyev, Konstantin Jan. 31, 2018, 1:53 p.m. UTC | #2
Hi Harry,

OK, you fixed it in the UT, but what about other apps that use fork()?
For example, our examples/multi_process/l2fwd_fork uses fork() to
spawn child processes instead of threads.
Maybe some generic way is needed: say, at fork time set some global
to indicate that it is a child process and shouldn't call rte_finalize() or so.
Konstantin
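
One way to read the suggestion above is a fork wrapper that marks the
child. This is a sketch under assumptions only: is_forked_child,
marked_fork(), model_cleanup() and child_would_cleanup() are
hypothetical names, and nothing like this is part of the patch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global: set only in processes created by marked_fork(). */
static int is_forked_child;

/* fork() wrapper that records, in the child, that it is a forked child. */
static pid_t
marked_fork(void)
{
	pid_t pid = fork();

	if (pid == 0)
		is_forked_child = 1;
	return pid;
}

/*
 * Hypothetical cleanup step: returns 1 if it would release the shared
 * resources, 0 if it skips them because the caller is a forked child.
 */
static int
model_cleanup(void)
{
	if (is_forked_child)
		return 0;
	return 1;
}

/*
 * Fork a child and report, through its exit code, whether the cleanup
 * step would have released shared resources there. Returns -1 on error.
 */
static int
child_would_cleanup(void)
{
	int status;
	pid_t pid = marked_fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(model_cleanup());
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The wrapper only helps callers that go through it. Registering a child
handler with pthread_atfork() would be one way to cover direct fork()
calls as well.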


  
Thomas Monjalon Jan. 31, 2018, 2:31 p.m. UTC | #3
31/01/2018 14:53, Ananyev, Konstantin:
> OK, you fixed it in the UT, but what about other apps that use fork()?
> For example, our examples/multi_process/l2fwd_fork uses fork() to
> spawn child processes instead of threads.
> Maybe some generic way is needed: say, at fork time set some global
> to indicate that it is a child process and shouldn't call rte_finalize() or so.
> Konstantin

First, we should discuss whether it is a good idea to support fork,
given that we have the "secondary process solution".

Then, if an improvement is needed, it should go in 18.05.
I think the fix in UT is good enough for 18.02.
  
Van Haaren, Harry Jan. 31, 2018, 2:54 p.m. UTC | #4
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, January 31, 2018 2:32 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Van Haaren, Harry
> <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org; pbhagavatula@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2] test: fix debug autotest with eal cleanup
> addition
> 
> 31/01/2018 14:53, Ananyev, Konstantin:
> > OK, you fixed it in the UT, but what about other apps that use fork()?
> > For example, our examples/multi_process/l2fwd_fork uses fork() to
> > spawn child processes instead of threads.
> > Maybe some generic way is needed: say, at fork time set some global
> > to indicate that it is a child process and shouldn't call rte_finalize() or so.
> > Konstantin

Valid concerns. The issue gets complex when we mix shared resources
with multiple fork()-ed processes, threads, etc.


> First, we should discuss whether it is a good idea to support fork,
> given that we have the "secondary process solution".
> 
> Then, if an improvement is needed, it should go in 18.05.
> I think the fix in UT is good enough for 18.02.

Agreed, and I'd prefer not to rush changes here given
the complexity and multitude of use-cases.
  

Patch

diff --git a/test/test/test_debug.c b/test/test/test_debug.c
index dd0de44..faf2cf5 100644
--- a/test/test/test_debug.c
+++ b/test/test/test_debug.c
@@ -10,6 +10,7 @@ 
 #include <rte_debug.h>
 #include <rte_common.h>
 #include <rte_eal.h>
+#include <rte_service_component.h>
 
 #include "test.h"
 
@@ -50,6 +51,11 @@  test_exit_val(int exit_val)
 	int pid;
 	int status;
 
+	/* manually cleanup EAL memory, as the fork() below would otherwise
+	 * cause the same hugepages to be free()-ed multiple times.
+	 */
+	rte_service_finalize();
+
 	pid = fork();
 
 	if (pid == 0)