Recently we see a deadlock in vdpa example app if SIGTERM signal is sent at early start stage.After debuging, we suspect it's due to vhost_user.mutex is requested twice,once by start sequence start_vdpa(), the other by SIGTERM signal handler which calling close_vdpa(). Also checked other example apps, seems they all have same issue.
Hi, thanks for the report. Any chance that once the deadlock happens, you attach with GDB and dump the backatraces of the threads? Maxime
A backtrace is not needed, I confirm the issue. This is problematic that we detach the vDPA device and unregister the Vhost port in the signal handler, as several mutexes and spinlocks could potentially be held by the interrupted thread. It seems also that some other examples are impacted, and more importantly, testpmd seems also impacted. Indeed, pmd_test_exit() gets called, which turns out to call dev_stop() and dev_close() ethdev ops that almost all take spinlock or mutexes.
(In reply to Maxime Coquelin from comment #2) > A backtrace is not needed, I confirm the issue. > > This is problematic that we detach the vDPA device and unregister the Vhost > port in > the signal handler, as several mutexes and spinlocks could potentially be > held by > the interrupted thread. > > It seems also that some other examples are impacted, and more importantly, > testpmd > seems also impacted. Indeed, pmd_test_exit() gets called, which turns out to > call > dev_stop() and dev_close() ethdev ops that almost all take spinlock or > mutexes. When the signal is received, it is not processed immediately, we just record the signal value. In the main function, after start_vdpa, we process this signal according to the recorded signal value. This may solve the deadlock problem. --- main.c 2021-02-21 16:21:45.041985623 +0800 +++ main.c 2021-02-21 18:02:14.500870588 +0800 @@ -41,6 +41,7 @@ static int devcnt; static int interactive; static int client_mode; +static int signal; /* display usage */ static void @@ -236,9 +237,8 @@ signal_handler(int signum) { if (signum == SIGINT || signum == SIGTERM) { - printf("\nSignal %d received, preparing to exit...\n", signum); - vdpa_sample_quit(); - exit(0); + printf("\nSignal %d received, preparing to exit...\n", signum); + signal = 1; } } @@ -572,6 +572,10 @@ printf("%c", ch); } printf("enter \'q\' to quit\n"); + if (signal == 1) { + vdpa_sample_quit(); + exit(0); + } } vdpa_sample_quit(); }
(In reply to Maxime Coquelin from comment #2) > A backtrace is not needed, I confirm the issue. > > This is problematic that we detach the vDPA device and unregister the Vhost > port in > the signal handler, as several mutexes and spinlocks could potentially be > held by > the interrupted thread. > > It seems also that some other examples are impacted, and more importantly, > testpmd > seems also impacted. Indeed, pmd_test_exit() gets called, which turns out to > call > dev_stop() and dev_close() ethdev ops that almost all take spinlock or > mutexes. When the signal is received, it is not processed immediately, we just record the "signal" flag. In the main function, after start_vdpa, we handle this signal according to the recorded "signal" flag. This may solve the deadlock problem. I think that testpmd can also use this method to solve the problem of lock competition. examples/vdpa/main.c --- main.c 2021-02-21 16:21:45.041985623 +0800 +++ main.c 2021-02-21 18:02:14.500870588 +0800 @@ -41,6 +41,7 @@ static int devcnt; static int interactive; static int client_mode; +static int signal; /* display usage */ static void @@ -236,9 +237,8 @@ signal_handler(int signum) { if (signum == SIGINT || signum == SIGTERM) { - printf("\nSignal %d received, preparing to exit...\n", signum); - vdpa_sample_quit(); - exit(0); + printf("\nSignal %d received, preparing to exit...\n", signum); + signal = 1; } } @@ -572,6 +572,10 @@ printf("%c", ch); } printf("enter \'q\' to quit\n"); + if (signal == 1) { + vdpa_sample_quit(); + exit(0); + } } vdpa_sample_quit(); }