[v2] vhost: fix vid allocation race

Message ID 20210201084844.2434-1-hepeng.0320@bytedance.com (mailing list archive)
State Accepted, archived
Delegated to: Maxime Coquelin
Series [v2] vhost: fix vid allocation race

Checks

Context                      Check    Description
ci/checkpatch                warning  coding style issues
ci/Intel-compilation         success  Compilation OK
ci/intel-Testing             success  Testing PASS
ci/iol-broadcom-Functional   success  Functional Testing PASS
ci/iol-broadcom-Performance  success  Performance Testing PASS
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-testing               warning  Testing issues

Commit Message

贺鹏 Feb. 1, 2021, 8:48 a.m. UTC
vhost_new_device might be called in different threads at the same time:

thread 1 (config thread)
    rte_vhost_driver_start
      -> vhost_user_start_client
        -> vhost_user_add_connection
          -> vhost_new_device

thread 2 (vhost-events)
    vhost_user_read_cb
      -> vhost_user_msg_handler (return value < 0)
        -> vhost_user_start_client
          -> vhost_new_device

So the same vid could be allocated twice, or a vid could be lost inside
the DPDK library while still being held by the upper-layer application.

Another place where a race could happen is the function
*vhost_destroy_device*, but after a detailed investigation, the race does
not exist as long as no two devices have the same vid: calling
vhost_destroy_device in different threads with different vids is
actually safe.

Fixes: a277c715987 ("vhost: refactor code structure")
Reported-by: Peng He <hepeng.0320@bytedance.com>
Signed-off-by: Fei Chen <chenwei.0515@bytedance.com>
Reviewed-by: Zhihong Wang <wangzhihong.wzh@bytedance.com>
---
 lib/librte_vhost/vhost.c | 6 ++++++
 1 file changed, 6 insertions(+)
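
The allocation pattern described in the commit message can be tried outside DPDK. The following is a minimal standalone sketch, not the library code: `fake_dev`, `slots`, `slots_lock`, `slot_alloc` and `run_demo` are hypothetical names, and `MAX_SLOTS` stands in for `MAX_VHOST_DEVICE`. It assumes the fix works as the commit message states, by holding one mutex across both the free-slot search and the store into the table, so that two threads cannot pick the same index.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_SLOTS 64           /* stand-in for MAX_VHOST_DEVICE */
#define NUM_THREADS 8
#define ALLOCS_PER_THREAD 8    /* 8 * 8 == MAX_SLOTS: every slot gets used */

struct fake_dev { int vid; };  /* hypothetical, much smaller than virtio_net */

static struct fake_dev *slots[MAX_SLOTS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Search for a free slot and publish the new device under one lock. */
static int slot_alloc(void)
{
	struct fake_dev *dev;
	int i;

	pthread_mutex_lock(&slots_lock);
	for (i = 0; i < MAX_SLOTS; i++) {
		if (slots[i] == NULL)
			break;
	}
	if (i == MAX_SLOTS) {
		pthread_mutex_unlock(&slots_lock);
		return -1;
	}
	dev = calloc(1, sizeof(*dev));
	if (dev == NULL) {
		pthread_mutex_unlock(&slots_lock);
		return -1;
	}
	slots[i] = dev;
	pthread_mutex_unlock(&slots_lock);

	dev->vid = i;  /* only this thread knows the new id so far */
	return i;
}

static void *worker(void *arg)
{
	int *out = arg;

	for (int k = 0; k < ALLOCS_PER_THREAD; k++)
		out[k] = slot_alloc();
	return NULL;
}

/* Allocate from several threads; return how many distinct ids came back. */
static int run_demo(void)
{
	static int ids[NUM_THREADS][ALLOCS_PER_THREAD];
	pthread_t tids[NUM_THREADS];
	int seen[MAX_SLOTS] = {0};
	int distinct = 0;

	for (int t = 0; t < NUM_THREADS; t++) {
		int rc = pthread_create(&tids[t], NULL, worker, ids[t]);
		assert(rc == 0);
	}
	for (int t = 0; t < NUM_THREADS; t++)
		pthread_join(tids[t], NULL);

	for (int t = 0; t < NUM_THREADS; t++) {
		for (int k = 0; k < ALLOCS_PER_THREAD; k++) {
			int id = ids[t][k];

			if (id >= 0 && !seen[id]) {
				seen[id] = 1;
				distinct++;
			}
		}
	}
	return distinct;
}
```

Calling `run_demo()` once from a `main` (built with `-pthread`) should hand out `MAX_SLOTS` distinct ids. Without the two lock calls the search and the store are no longer atomic, so two threads may return the same index.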
  

Comments

Chenbo Xia Feb. 3, 2021, 2:44 a.m. UTC | #1
Hi Peng,

> -----Original Message-----
> From: Peng He <xnhp0320@gmail.com>
> Sent: Monday, February 1, 2021 4:49 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: stable@dpdk.org
> Subject: [PATCH v2] vhost: fix vid allocation race
> 
> vhost_new_device might be called in different threads at the same time.
> thread 1(config thread)
>             rte_vhost_driver_start
>                ->vhost_user_start_client
>                    ->vhost_user_add_connection
>                      -> vhost_new_device
> 
> thread 2(vhost-events)
> 	vhost_user_read_cb
>            ->vhost_user_msg_handler (return value < 0)
>              -> vhost_user_start_client
>                  -> vhost_new_device
> 
> So there could be a case that a same vid has been allocated twice, or
> some vid might be lost in DPDK lib however still held by the upper
> applications.
> 
> Another place where race would happen is at the func *vhost_destroy_device*,
> but after a detailed investigation, the race does not exist as long as
> no two devices have the same vid: Calling vhost_destroy_devices in
> different threads with different vids is actually safe.

I want to clarify another thing: vhost_destroy_device and get_device may have
a thread-safety problem. That is, vhost_user_read_cb() may destroy the device
while an app thread is calling a vhost API (with get_device in it) to use that
device.

The good thing is that before vhost_user_read_cb() destroys the device, it
notifies the app thread, so the vhost app should make sure it avoids this kind
of problem. Otherwise we may need to lock all places that use the global
vhost_devices, which is not good since that would affect data path performance
a lot.
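
One way an application might act on that notification is sketched below. This is an illustration under assumptions, not DPDK code: `app_dev`, `app_devs`, `app_new_device`, `app_destroy_device`, `app_dev_enter` and `app_dev_exit` are hypothetical application-side names. The sketch assumes the application calls the first two from its new-device and destroy-device callbacks, and brackets each data-path burst on a vid with enter/exit.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define APP_MAX_DEV 8  /* hypothetical application-side limit */

struct app_dev {
	atomic_bool ready;  /* set on new-device, cleared on destroy */
	atomic_int users;   /* data-path threads currently using the vid */
};

/* Static storage: all flags start false and all counters start at 0. */
static struct app_dev app_devs[APP_MAX_DEV];

/* Assumed to run from the application's new-device callback. */
static void app_new_device(int vid)
{
	atomic_store(&app_devs[vid].ready, true);
}

/*
 * Data path: returns true if the vid may be used. Every successful
 * call must be paired with app_dev_exit().
 */
static bool app_dev_enter(int vid)
{
	struct app_dev *d = &app_devs[vid];

	/* Announce ourselves first, then check that the device is alive. */
	atomic_fetch_add(&d->users, 1);
	if (!atomic_load(&d->ready)) {
		atomic_fetch_sub(&d->users, 1);
		return false;
	}
	return true;
}

static void app_dev_exit(int vid)
{
	atomic_fetch_sub(&app_devs[vid].users, 1);
}

/*
 * Assumed to run from the application's destroy-device callback:
 * refuse new users, then wait until in-flight users have left.
 */
static void app_destroy_device(int vid)
{
	struct app_dev *d = &app_devs[vid];

	atomic_store(&d->ready, false);
	while (atomic_load(&d->users) != 0)
		sched_yield();
}
```

The cost is two atomic read-modify-write operations per burst rather than a lock around every access to the global table, which is the trade-off the paragraph above is concerned with. Whether that cost is acceptable depends on the application.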

Anyway, your patch fixes the specific problem you mentioned.

For this patch:

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

  
Maxime Coquelin Feb. 3, 2021, 5:21 p.m. UTC | #2
Applied to dpdk-next-virtio/main.

Thanks,
Maxime
  

Patch

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index efb136edd1..52ab93d1ec 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -26,6 +26,7 @@ 
 #include "vhost_user.h"
 
 struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
+pthread_mutex_t vhost_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
 /* Called with iotlb_lock read-locked */
 uint64_t
@@ -645,6 +646,7 @@  vhost_new_device(void)
 	struct virtio_net *dev;
 	int i;
 
+	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
 			break;
@@ -653,6 +655,7 @@  vhost_new_device(void)
 	if (i == MAX_VHOST_DEVICE) {
 		VHOST_LOG_CONFIG(ERR,
 			"Failed to find a free slot for new device.\n");
+		pthread_mutex_unlock(&vhost_dev_lock);
 		return -1;
 	}
 
@@ -660,10 +663,13 @@  vhost_new_device(void)
 	if (dev == NULL) {
 		VHOST_LOG_CONFIG(ERR,
 			"Failed to allocate memory for new dev.\n");
+		pthread_mutex_unlock(&vhost_dev_lock);
 		return -1;
 	}
 
 	vhost_devices[i] = dev;
+	pthread_mutex_unlock(&vhost_dev_lock);
+
 	dev->vid = i;
 	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 	dev->slave_req_fd = -1;