Bug 611 - deadlock in example vdpa app
Summary: deadlock in example vdpa app
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: examples (show other bugs)
Version: 20.08
Hardware: x86 Linux
: Normal normal
Target Milestone: ---
Assignee: Maxime Coquelin
URL:
Depends on:
Blocks: 628
  Show dependency tree
 
Reported: 2021-01-12 13:41 CET by xianghao
Modified: 2021-03-09 05:00 CET (History)
2 users (show)



Attachments

Description xianghao 2021-01-12 13:41:52 CET
Recently we see a deadlock in vdpa example app if SIGTERM signal is sent at early start stage.After debuging, we suspect it's due to vhost_user.mutex is requested twice,once by start sequence start_vdpa(), the other by SIGTERM signal handler which calling close_vdpa().

Also checked other example apps, seems they all have same issue.
Comment 1 Maxime Coquelin 2021-01-28 15:38:13 CET
Hi, thanks for the report.

Any chance that once the deadlock happens, you attach with GDB
and dump the backatraces of the threads?

Maxime
Comment 2 Maxime Coquelin 2021-01-28 17:18:43 CET
A backtrace is not needed, I confirm the issue.

This is problematic that we detach the vDPA device and unregister the Vhost port in
the signal handler, as several mutexes and spinlocks could potentially be held by 
the interrupted thread.

It seems also that some other examples are impacted, and more importantly, testpmd 
seems also impacted. Indeed, pmd_test_exit() gets called, which turns out to call 
dev_stop() and dev_close() ethdev ops that almost all take spinlock or mutexes.
Comment 3 xianghao 2021-02-20 11:17:59 CET
(In reply to Maxime Coquelin from comment #2)
> A backtrace is not needed, I confirm the issue.
> 
> This is problematic that we detach the vDPA device and unregister the Vhost
> port in
> the signal handler, as several mutexes and spinlocks could potentially be
> held by 
> the interrupted thread.
> 
> It seems also that some other examples are impacted, and more importantly,
> testpmd 
> seems also impacted. Indeed, pmd_test_exit() gets called, which turns out to
> call 
> dev_stop() and dev_close() ethdev ops that almost all take spinlock or
> mutexes.

When the signal is received, it is not processed immediately, we just record the signal value. In the main function, after start_vdpa, we process this signal according to the recorded signal value. This may solve the deadlock problem.
--- main.c	2021-02-21 16:21:45.041985623 +0800
+++ main.c	2021-02-21 18:02:14.500870588 +0800
@@ -41,6 +41,7 @@
 static int devcnt;
 static int interactive;
 static int client_mode;
+static int signal;
 
 /* display usage */
 static void
@@ -236,9 +237,8 @@
 signal_handler(int signum)
 {
 	if (signum == SIGINT || signum == SIGTERM) {
-		printf("\nSignal %d received, preparing to exit...\n", signum);
-		vdpa_sample_quit();
-		exit(0);
+            printf("\nSignal %d received, preparing to exit...\n", signum);
+            signal = 1;
 	}
 }
 
@@ -572,6 +572,10 @@
 					printf("%c", ch);
 			}
 			printf("enter \'q\' to quit\n");
+                        if (signal == 1) {
+                            vdpa_sample_quit();
+                            exit(0);
+                        }
 		}
 		vdpa_sample_quit();
 	}
Comment 4 xianghao 2021-02-20 11:24:24 CET
(In reply to Maxime Coquelin from comment #2)
> A backtrace is not needed, I confirm the issue.
> 
> This is problematic that we detach the vDPA device and unregister the Vhost
> port in
> the signal handler, as several mutexes and spinlocks could potentially be
> held by 
> the interrupted thread.
> 
> It seems also that some other examples are impacted, and more importantly,
> testpmd 
> seems also impacted. Indeed, pmd_test_exit() gets called, which turns out to
> call 
> dev_stop() and dev_close() ethdev ops that almost all take spinlock or
> mutexes.

When the signal is received, it is not processed immediately, we just record the "signal" flag. In the main function, after start_vdpa, we handle this signal according to the recorded "signal" flag. This may solve the deadlock problem.
I think that testpmd can also use this method to solve the problem of lock competition.

examples/vdpa/main.c
--- main.c	2021-02-21 16:21:45.041985623 +0800
+++ main.c	2021-02-21 18:02:14.500870588 +0800
@@ -41,6 +41,7 @@
 static int devcnt;
 static int interactive;
 static int client_mode;
+static int signal;
 
 /* display usage */
 static void
@@ -236,9 +237,8 @@
 signal_handler(int signum)
 {
 	if (signum == SIGINT || signum == SIGTERM) {
-		printf("\nSignal %d received, preparing to exit...\n", signum);
-		vdpa_sample_quit();
-		exit(0);
+            printf("\nSignal %d received, preparing to exit...\n", signum);
+            signal = 1;
 	}
 }
 
@@ -572,6 +572,10 @@
 					printf("%c", ch);
 			}
 			printf("enter \'q\' to quit\n");
+                        if (signal == 1) {
+                            vdpa_sample_quit();
+                            exit(0);
+                        }
 		}
 		vdpa_sample_quit();
 	}

Note You need to log in before you can comment on or make changes to this bug.