[2/3] vfio: fix to add handler lock for hot-unplug

Message ID 1541484436-91320-3-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series fix vfio hot-unplug issue |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Guo, Jia Nov. 6, 2018, 6:07 a.m. UTC
  This patch add hot-unplug handler lock and unlock in device request
handler when process bus and device resource, in order to avoid the
synchronization issue when device be hot-unplugged.

Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 drivers/bus/pci/linux/pci_vfio.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
  

Comments

Matan Azrad Nov. 6, 2018, 6:23 a.m. UTC | #1
Hi Jeff

Can you detail more in the commit log that we can understand the synchronization problematic scenario. And how does this commit fix it? 


> -----Original Message-----
> From: Jeff Guo <jia.guo@intel.com>
> Sent: Tuesday, November 6, 2018 8:07 AM
> To: konstantin.ananyev@intel.com; anatoly.burakov@intel.com; Thomas
> Monjalon <thomas@monjalon.net>; bernard.iremonger@intel.com;
> jingjing.wu@intel.com; wenzhuo.lu@intel.com
> Cc: ferruh.yigit@intel.com; dev@dpdk.org; jia.guo@intel.com;
> helin.zhang@intel.com; Matan Azrad <matan@mellanox.com>;
> shaopeng.he@intel.com
> Subject: [PATCH 2/3] vfio: fix to add handler lock for hot-unplug
> 
> This patch add hot-unplug handler lock and unlock in device request handler
> when process bus and device resource, in order to avoid the synchronization
> issue when device be hot-unplugged.
> 
> Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index 305cc06..d2c8410 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -19,6 +19,7 @@
>  #include <rte_vfio.h>
>  #include <rte_eal.h>
>  #include <rte_bus.h>
> +#include <rte_spinlock.h>
> 
>  #include "eal_filesystem.h"
> 
> @@ -35,6 +36,14 @@
>   * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
>   */
> 
> +/*
> + * spinlock for device hot-unplug failure handling. If it try to access
> +bus or
> + * device, such as handle sigbus on bus or handle memory failure for
> +device
> + * just need to use this lock. It could protect the bus and the device
> +to avoid
> + * race condition.
> + */
> +static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
> +
>  #ifdef VFIO_PRESENT
> 
>  #ifndef PAGE_SIZE
> @@ -289,11 +298,12 @@ pci_vfio_req_handler(void *param)
>  	int ret;
>  	struct rte_device *device = (struct rte_device *)param;
> 
> +	rte_spinlock_lock(&failure_handle_lock);
>  	bus = rte_bus_find_by_device(device);
>  	if (bus == NULL) {
>  		RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
>  			device->name);
> -		return;
> +		goto handle_end;
>  	}
> 
>  	/*
> @@ -306,6 +316,8 @@ pci_vfio_req_handler(void *param)
>  		RTE_LOG(ERR, EAL,
>  			"Can not handle hot-unplug for device (%s)\n",
>  			device->name);
> +handle_end:
> +	rte_spinlock_unlock(&failure_handle_lock);
>  }
> 
>  /* enable notifier (only enable req now) */
> --
> 2.7.4
  
Guo, Jia Nov. 7, 2018, 6:15 a.m. UTC | #2
On 11/6/2018 2:23 PM, Matan Azrad wrote:
> Hi Jeff
>
> Can you detail more in the commit log that we can understand the synchronization problematic scenario. And how does this commit fix it?
>

Please check my reply in the 1/3 mail. And explain more here is that, 
when device be hot-unplugged in vfio, the req notifier will invoked, 
then user space could release device resource in user space side,

then vfio check that the device be released out from the device group, 
it will take the device control again and trigger the device kernel 
release processing, at the mean time it will sent remove uevent to

user space. Here although the req handler seems will always process 
before uevent handler, but even for fast path and slow path protection 
of device accessing when device is removing , it should also be need.

what do you think about that?


>> -----Original Message-----
>> From: Jeff Guo <jia.guo@intel.com>
>> Sent: Tuesday, November 6, 2018 8:07 AM
>> To: konstantin.ananyev@intel.com; anatoly.burakov@intel.com; Thomas
>> Monjalon <thomas@monjalon.net>; bernard.iremonger@intel.com;
>> jingjing.wu@intel.com; wenzhuo.lu@intel.com
>> Cc: ferruh.yigit@intel.com; dev@dpdk.org; jia.guo@intel.com;
>> helin.zhang@intel.com; Matan Azrad <matan@mellanox.com>;
>> shaopeng.he@intel.com
>> Subject: [PATCH 2/3] vfio: fix to add handler lock for hot-unplug
>>
>> This patch add hot-unplug handler lock and unlock in device request handler
>> when process bus and device resource, in order to avoid the synchronization
>> issue when device be hot-unplugged.
>>
>> Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>>   drivers/bus/pci/linux/pci_vfio.c | 14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
>> index 305cc06..d2c8410 100644
>> --- a/drivers/bus/pci/linux/pci_vfio.c
>> +++ b/drivers/bus/pci/linux/pci_vfio.c
>> @@ -19,6 +19,7 @@
>>   #include <rte_vfio.h>
>>   #include <rte_eal.h>
>>   #include <rte_bus.h>
>> +#include <rte_spinlock.h>
>>
>>   #include "eal_filesystem.h"
>>
>> @@ -35,6 +36,14 @@
>>    * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
>>    */
>>
>> +/*
>> + * spinlock for device hot-unplug failure handling. If it try to access
>> +bus or
>> + * device, such as handle sigbus on bus or handle memory failure for
>> +device
>> + * just need to use this lock. It could protect the bus and the device
>> +to avoid
>> + * race condition.
>> + */
>> +static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>>   #ifdef VFIO_PRESENT
>>
>>   #ifndef PAGE_SIZE
>> @@ -289,11 +298,12 @@ pci_vfio_req_handler(void *param)
>>   	int ret;
>>   	struct rte_device *device = (struct rte_device *)param;
>>
>> +	rte_spinlock_lock(&failure_handle_lock);
>>   	bus = rte_bus_find_by_device(device);
>>   	if (bus == NULL) {
>>   		RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
>>   			device->name);
>> -		return;
>> +		goto handle_end;
>>   	}
>>
>>   	/*
>> @@ -306,6 +316,8 @@ pci_vfio_req_handler(void *param)
>>   		RTE_LOG(ERR, EAL,
>>   			"Can not handle hot-unplug for device (%s)\n",
>>   			device->name);
>> +handle_end:
>> +	rte_spinlock_unlock(&failure_handle_lock);
>>   }
>>
>>   /* enable notifier (only enable req now) */
>> --
>> 2.7.4
  
Matan Azrad Nov. 8, 2018, 7:20 a.m. UTC | #3
From: Jeff Guo
> On 11/6/2018 2:23 PM, Matan Azrad wrote:
> > Hi Jeff
> >
> > Can you detail more in the commit log that we can understand the
> synchronization problematic scenario. And how does this commit fix it?
> >
> 
> Please check my reply in the 1/3 mail. And explain more here is that, when
> device be hot-unplugged in vfio, the req notifier will invoked, then user
> space could release device resource in user space side,
> 
> then vfio check that the device be released out from the device group, it will
> take the device control again and trigger the device kernel release
> processing, at the mean time it will sent remove uevent to
> 
> user space. Here although the req handler seems will always process before
> uevent handler, but even for fast path and slow path protection of device
> accessing when device is removing , it should also be need.
> 
> what do you think about that?

Just don't understanf hoe the fast\control-path can access to pci_vfio_req_handler...

> 
> 
> >> -----Original Message-----
> >> From: Jeff Guo <jia.guo@intel.com>
> >> Sent: Tuesday, November 6, 2018 8:07 AM
> >> To: konstantin.ananyev@intel.com; anatoly.burakov@intel.com; Thomas
> >> Monjalon <thomas@monjalon.net>; bernard.iremonger@intel.com;
> >> jingjing.wu@intel.com; wenzhuo.lu@intel.com
> >> Cc: ferruh.yigit@intel.com; dev@dpdk.org; jia.guo@intel.com;
> >> helin.zhang@intel.com; Matan Azrad <matan@mellanox.com>;
> >> shaopeng.he@intel.com
> >> Subject: [PATCH 2/3] vfio: fix to add handler lock for hot-unplug
> >>
> >> This patch add hot-unplug handler lock and unlock in device request
> >> handler when process bus and device resource, in order to avoid the
> >> synchronization issue when device be hot-unplugged.
> >>
> >> Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >> ---
> >>   drivers/bus/pci/linux/pci_vfio.c | 14 +++++++++++++-
> >>   1 file changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/bus/pci/linux/pci_vfio.c
> >> b/drivers/bus/pci/linux/pci_vfio.c
> >> index 305cc06..d2c8410 100644
> >> --- a/drivers/bus/pci/linux/pci_vfio.c
> >> +++ b/drivers/bus/pci/linux/pci_vfio.c
> >> @@ -19,6 +19,7 @@
> >>   #include <rte_vfio.h>
> >>   #include <rte_eal.h>
> >>   #include <rte_bus.h>
> >> +#include <rte_spinlock.h>
> >>
> >>   #include "eal_filesystem.h"
> >>
> >> @@ -35,6 +36,14 @@
> >>    * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
> >>    */
> >>
> >> +/*
> >> + * spinlock for device hot-unplug failure handling. If it try to
> >> +access bus or
> >> + * device, such as handle sigbus on bus or handle memory failure for
> >> +device
> >> + * just need to use this lock. It could protect the bus and the
> >> +device to avoid
> >> + * race condition.
> >> + */
> >> +static rte_spinlock_t failure_handle_lock =
> >> +RTE_SPINLOCK_INITIALIZER;
> >> +
> >>   #ifdef VFIO_PRESENT
> >>
> >>   #ifndef PAGE_SIZE
> >> @@ -289,11 +298,12 @@ pci_vfio_req_handler(void *param)
> >>   	int ret;
> >>   	struct rte_device *device = (struct rte_device *)param;
> >>
> >> +	rte_spinlock_lock(&failure_handle_lock);
> >>   	bus = rte_bus_find_by_device(device);
> >>   	if (bus == NULL) {
> >>   		RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
> >>   			device->name);
> >> -		return;
> >> +		goto handle_end;
> >>   	}
> >>
> >>   	/*
> >> @@ -306,6 +316,8 @@ pci_vfio_req_handler(void *param)
> >>   		RTE_LOG(ERR, EAL,
> >>   			"Can not handle hot-unplug for device (%s)\n",
> >>   			device->name);
> >> +handle_end:
> >> +	rte_spinlock_unlock(&failure_handle_lock);
> >>   }
> >>
> >>   /* enable notifier (only enable req now) */
> >> --
> >> 2.7.4
  
Guo, Jia Nov. 8, 2018, 8:30 a.m. UTC | #4
On 11/8/2018 3:20 PM, Matan Azrad wrote:
>
> From: Jeff Guo
>> On 11/6/2018 2:23 PM, Matan Azrad wrote:
>>> Hi Jeff
>>>
>>> Can you detail more in the commit log that we can understand the
>> synchronization problematic scenario. And how does this commit fix it?
>> Please check my reply in the 1/3 mail. And explain more here is that, when
>> device be hot-unplugged in vfio, the req notifier will invoked, then user
>> space could release device resource in user space side,
>>
>> then vfio check that the device be released out from the device group, it will
>> take the device control again and trigger the device kernel release
>> processing, at the mean time it will sent remove uevent to
>>
>> user space. Here although the req handler seems will always process before
>> uevent handler, but even for fast path and slow path protection of device
>> accessing when device is removing , it should also be need.
>>
>> what do you think about that?
> Just don't understanf hoe the fast\control-path can access to pci_vfio_req_handler...


The pci_vfio_req_handler is register in eal interrupt thread when pci 
probe driver, so it should be access in the control path.

Since the sigbus handler will be enable when enable hotplug , if origin 
sigbus occur the sigbus handler be invoked,  in the handler it will 
process the bus and device.  So i think protection of the bus and device 
so that it will not cause race condition that should be need.


>>
>>>> -----Original Message-----
>>>> From: Jeff Guo <jia.guo@intel.com>
>>>> Sent: Tuesday, November 6, 2018 8:07 AM
>>>> To: konstantin.ananyev@intel.com; anatoly.burakov@intel.com; Thomas
>>>> Monjalon <thomas@monjalon.net>; bernard.iremonger@intel.com;
>>>> jingjing.wu@intel.com; wenzhuo.lu@intel.com
>>>> Cc: ferruh.yigit@intel.com; dev@dpdk.org; jia.guo@intel.com;
>>>> helin.zhang@intel.com; Matan Azrad <matan@mellanox.com>;
>>>> shaopeng.he@intel.com
>>>> Subject: [PATCH 2/3] vfio: fix to add handler lock for hot-unplug
>>>>
>>>> This patch add hot-unplug handler lock and unlock in device request
>>>> handler when process bus and device resource, in order to avoid the
>>>> synchronization issue when device be hot-unplugged.
>>>>
>>>> Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")
>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>> ---
>>>>    drivers/bus/pci/linux/pci_vfio.c | 14 +++++++++++++-
>>>>    1 file changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/bus/pci/linux/pci_vfio.c
>>>> b/drivers/bus/pci/linux/pci_vfio.c
>>>> index 305cc06..d2c8410 100644
>>>> --- a/drivers/bus/pci/linux/pci_vfio.c
>>>> +++ b/drivers/bus/pci/linux/pci_vfio.c
>>>> @@ -19,6 +19,7 @@
>>>>    #include <rte_vfio.h>
>>>>    #include <rte_eal.h>
>>>>    #include <rte_bus.h>
>>>> +#include <rte_spinlock.h>
>>>>
>>>>    #include "eal_filesystem.h"
>>>>
>>>> @@ -35,6 +36,14 @@
>>>>     * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
>>>>     */
>>>>
>>>> +/*
>>>> + * spinlock for device hot-unplug failure handling. If it try to
>>>> +access bus or
>>>> + * device, such as handle sigbus on bus or handle memory failure for
>>>> +device
>>>> + * just need to use this lock. It could protect the bus and the
>>>> +device to avoid
>>>> + * race condition.
>>>> + */
>>>> +static rte_spinlock_t failure_handle_lock =
>>>> +RTE_SPINLOCK_INITIALIZER;
>>>> +
>>>>    #ifdef VFIO_PRESENT
>>>>
>>>>    #ifndef PAGE_SIZE
>>>> @@ -289,11 +298,12 @@ pci_vfio_req_handler(void *param)
>>>>    	int ret;
>>>>    	struct rte_device *device = (struct rte_device *)param;
>>>>
>>>> +	rte_spinlock_lock(&failure_handle_lock);
>>>>    	bus = rte_bus_find_by_device(device);
>>>>    	if (bus == NULL) {
>>>>    		RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
>>>>    			device->name);
>>>> -		return;
>>>> +		goto handle_end;
>>>>    	}
>>>>
>>>>    	/*
>>>> @@ -306,6 +316,8 @@ pci_vfio_req_handler(void *param)
>>>>    		RTE_LOG(ERR, EAL,
>>>>    			"Can not handle hot-unplug for device (%s)\n",
>>>>    			device->name);
>>>> +handle_end:
>>>> +	rte_spinlock_unlock(&failure_handle_lock);
>>>>    }
>>>>
>>>>    /* enable notifier (only enable req now) */
>>>> --
>>>> 2.7.4
  

Patch

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 305cc06..d2c8410 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -19,6 +19,7 @@ 
 #include <rte_vfio.h>
 #include <rte_eal.h>
 #include <rte_bus.h>
+#include <rte_spinlock.h>
 
 #include "eal_filesystem.h"
 
@@ -35,6 +36,14 @@ 
  * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
  */
 
+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to avoid
+ * race condition.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
 #ifdef VFIO_PRESENT
 
 #ifndef PAGE_SIZE
@@ -289,11 +298,12 @@  pci_vfio_req_handler(void *param)
 	int ret;
 	struct rte_device *device = (struct rte_device *)param;
 
+	rte_spinlock_lock(&failure_handle_lock);
 	bus = rte_bus_find_by_device(device);
 	if (bus == NULL) {
 		RTE_LOG(ERR, EAL, "Cannot find bus for device (%s)\n",
 			device->name);
-		return;
+		goto handle_end;
 	}
 
 	/*
@@ -306,6 +316,8 @@  pci_vfio_req_handler(void *param)
 		RTE_LOG(ERR, EAL,
 			"Can not handle hot-unplug for device (%s)\n",
 			device->name);
+handle_end:
+	rte_spinlock_unlock(&failure_handle_lock);
 }
 
 /* enable notifier (only enable req now) */