[v8,7/7] igb_uio: fix uio release issue for hotplug

Message ID 1531220607-2977-8-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series hotplug failure handle mechanism |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Guo, Jia July 10, 2018, 11:03 a.m. UTC
  When hotplug out device, the device resource will be released in kernel.
The fd sys file will disappear, and the irq will be released. At this time,
if igb uio driver still try to release this resource, it will cause kernel
crash. On the other hand, interrupt disabling do not automatically be
processed in kernel. If not handle it, this redundancy and dirty thing will
affect the interrupt resource be used by other device. So the igb_uio
driver have to check the hotplug status, and the corresponding process
should be taken in igb uio driver.

This patch propose to add enum rte_udev_state into struct rte_uio_pci_dev
of igb uio driver, which will record the state of uio device, such as
probed/opened/released/removed. When detect the unexpected removal which
cause of hotplug out behavior, it will corresponding disable interrupt
resource. For the part of releasement which kernel have already handle,
just skip it to avoid double free or null pointer crash issue.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v8->v7:
change enum of udev state, refine code to release udev resource
---
 kernel/linux/igb_uio/igb_uio.c | 69 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)
  

Comments

Stephen Hemminger July 10, 2018, 9:48 p.m. UTC | #1
On Tue, 10 Jul 2018 19:03:27 +0800
Jeff Guo <jia.guo@intel.com> wrote:

>  
> +/* uio pci device state */
> +enum rte_udev_state {
> +	RTE_UDEV_PROBED,
> +	RTE_UDEV_OPENNED,
> +	RTE_UDEV_RELEASED,
> +	RTE_UDEV_REMOVED,
> +};
> +

The states here are a little confusing. especially since pci_release
seems to take different actions based on the state. And there is nothing
preventing races between unexpected removal (PCI), and removing the
device from being used by igb_uio.

Would it be possible to only use state variable from the kernel PCI
layer where the value is consistent?

Also there is refcounting in PCI layer (and locking). Could that
be used instead?
  
Stephen Hemminger July 10, 2018, 9:52 p.m. UTC | #2
On Tue, 10 Jul 2018 19:03:27 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> When hotplug out device, the device resource will be released in kernel.
> The fd sys file will disappear, and the irq will be released. At this time,
> if igb uio driver still try to release this resource, it will cause kernel
> crash. On the other hand, interrupt disabling do not automatically be
> processed in kernel. If not handle it, this redundancy and dirty thing will
> affect the interrupt resource be used by other device. So the igb_uio
> driver have to check the hotplug status, and the corresponding process
> should be taken in igb uio driver.
> 
> This patch propose to add enum rte_udev_state into struct rte_uio_pci_dev
> of igb uio driver, which will record the state of uio device, such as
> probed/opened/released/removed. When detect the unexpected removal which
> cause of hotplug out behavior, it will corresponding disable interrupt
> resource. For the part of releasement which kernel have already handle,
> just skip it to avoid double free or null pointer crash issue.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

The PCI hotplug management is an important and potentially error prone
error of DPDK.

I realize that English is not your native language, but the commit messages
for this are hard to read. Perhaps you can get a volunteer or other person
in the community to reword them. The commit logs and comments contain
important information about the documentation of the code.

How does VFIO handle hotplug? We should direct all users to use VFIO
since it is supported and secure. Igb uio has always been a slightly
dangerous (as they say "running with scissors") way of accessing devices.
  
Guo, Jia July 11, 2018, 2:46 a.m. UTC | #3
On 7/11/2018 5:52 AM, Stephen Hemminger wrote:
> On Tue, 10 Jul 2018 19:03:27 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> When hotplug out device, the device resource will be released in kernel.
>> The fd sys file will disappear, and the irq will be released. At this time,
>> if igb uio driver still try to release this resource, it will cause kernel
>> crash. On the other hand, interrupt disabling do not automatically be
>> processed in kernel. If not handle it, this redundancy and dirty thing will
>> affect the interrupt resource be used by other device. So the igb_uio
>> driver have to check the hotplug status, and the corresponding process
>> should be taken in igb uio driver.
>>
>> This patch propose to add enum rte_udev_state into struct rte_uio_pci_dev
>> of igb uio driver, which will record the state of uio device, such as
>> probed/opened/released/removed. When detect the unexpected removal which
>> cause of hotplug out behavior, it will corresponding disable interrupt
>> resource. For the part of releasement which kernel have already handle,
>> just skip it to avoid double free or null pointer crash issue.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> The PCI hotplug management is an important and potentially error prone
> error of DPDK.
>
> I realize that English is not your native language, but the commit messages
> for this are hard to read. Perhaps you can get a volunteer or other person
> in the community to reword them. The commit logs and comments contain
> important information about the documentation of the code.

yes, i think that it might not be the whole thing to let you confused 
except something specific. But definitely it is my task to let reviewer 
most easily know what i want to propose before they will ack it,
especial for some complex case. let's try my best for check my word. I 
am also planning to go to more native English country and great 
conference study, anyway to improve my word : )

> How does VFIO handle hotplug? We should direct all users to use VFIO
> since it is supported and secure. Igb uio has always been a slightly
> dangerous (as they say "running with scissors") way of accessing devices.

You exposure VFIO here to replace igo uio, maybe we should check if it 
is optional or not.
If we can fix all igb_uio issue it should be optional, if only vfio show 
stable we should go to vfio only.
What other else guy comment here?
  
Guo, Jia July 11, 2018, 3:10 a.m. UTC | #4
On 7/11/2018 5:48 AM, Stephen Hemminger wrote:
> On Tue, 10 Jul 2018 19:03:27 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>>   
>> +/* uio pci device state */
>> +enum rte_udev_state {
>> +	RTE_UDEV_PROBED,
>> +	RTE_UDEV_OPENNED,
>> +	RTE_UDEV_RELEASED,
>> +	RTE_UDEV_REMOVED,
>> +};
>> +
> The states here are a little confusing. especially since pci_release
> seems to take different actions based on the state. And there is nothing
> preventing races between unexpected removal (PCI), and removing the
> device from being used by igb_uio.

The states here just manage in igb uio, and only restore the status of 
igb uio.
And only the RTE_UDEV_REMOVED that the key status might be a highlight 
and process, it is means pci have been removed, then directly come to 
igb uio remove without go igb uio release at first,
the status is for hotplug out, need to do specific process. It will no 
affect the normal process, and for normal igb uio remove, udev be 
release, no any status need to restore.

> Would it be possible to only use state variable from the kernel PCI
> layer where the value is consistent?

The state which i only care here is hot remove state, i check that 
kobj->state_remove_uevent_sent could be use in igb uio,
except that still can not find a specific state of kernel pci to 
identify it. If we can find it, it should be best.
what's other comment from guys?

> Also there is refcounting in PCI layer (and locking). Could that
> be used instead?

It might be, but if state is enough here we might not considerate to use 
it. If i am missing anything, let me know.
  
Guo, Jia July 11, 2018, 10:01 a.m. UTC | #5
On 7/11/2018 10:46 AM, Jeff Guo wrote:
>
>
> On 7/11/2018 5:52 AM, Stephen Hemminger wrote:
>> On Tue, 10 Jul 2018 19:03:27 +0800
>> Jeff Guo <jia.guo@intel.com> wrote:
>>
>>> When hotplug out device, the device resource will be released in 
>>> kernel.
>>> The fd sys file will disappear, and the irq will be released. At 
>>> this time,
>>> if igb uio driver still try to release this resource, it will cause 
>>> kernel
>>> crash. On the other hand, interrupt disabling do not automatically be
>>> processed in kernel. If not handle it, this redundancy and dirty 
>>> thing will
>>> affect the interrupt resource be used by other device. So the igb_uio
>>> driver have to check the hotplug status, and the corresponding process
>>> should be taken in igb uio driver.
>>>
>>> This patch propose to add enum rte_udev_state into struct 
>>> rte_uio_pci_dev
>>> of igb uio driver, which will record the state of uio device, such as
>>> probed/opened/released/removed. When detect the unexpected removal 
>>> which
>>> cause of hotplug out behavior, it will corresponding disable interrupt
>>> resource. For the part of releasement which kernel have already handle,
>>> just skip it to avoid double free or null pointer crash issue.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> The PCI hotplug management is an important and potentially error prone
>> error of DPDK.
>>
>> I realize that English is not your native language, but the commit 
>> messages
>> for this are hard to read. Perhaps you can get a volunteer or other 
>> person
>> in the community to reword them. The commit logs and comments contain
>> important information about the documentation of the code.
>
> yes, i think that it might not be the whole thing to let you confused 
> except something specific. But definitely it is my task to let 
> reviewer most easily know what i want to propose before they will ack it,
> especial for some complex case. let's try my best for check my word. I 
> am also planning to go to more native English country and great 
> conference study, anyway to improve my word : )
>
>> How does VFIO handle hotplug? We should direct all users to use VFIO
>> since it is supported and secure. Igb uio has always been a slightly
>> dangerous (as they say "running with scissors") way of accessing 
>> devices.
>
> You exposure VFIO here to replace igo uio, maybe we should check if it 
> is optional or not.
> If we can fix all igb_uio issue it should be optional, if only vfio 
> show stable we should go to vfio only.
> What other else guy comment here?

Plus, the pci vfio hotplug enabling is still on the next plan, since the 
different framework between vifo and uio for uevent process.
Per vfio, when user space control  it will holding the resource so the 
uevent will be blocked to sent out from kernel.
Anyway that should be another story. We hope this patch set just fix 
hotplug failure issue for igb uio case.
  

Patch

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..d126371 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,14 @@ 
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +36,7 @@  struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -309,6 +318,17 @@  igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
+/* Unmap previously ioremap'd resources */
+static void
+igbuio_pci_release_iomem(struct uio_info *info)
+{
+	int i;
+
+	for (i = 0; i < MAX_UIO_MAPS; i++) {
+		if (info->mem[i].internal_addr)
+			iounmap(info->mem[i].internal_addr);
+	}
+}
 
 /**
  * This gets called while opening uio device file.
@@ -331,20 +351,35 @@  igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED) {
+		mutex_destroy(&udev->lock);
+		igbuio_pci_release_iomem(&udev->info);
+		pci_disable_device(dev);
+		pci_set_drvdata(dev, NULL);
+		kfree(udev);
+		return 0;
+	}
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +391,7 @@  igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -414,18 +449,6 @@  igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
 	return 0;
 }
 
-/* Unmap previously ioremap'd resources */
-static void
-igbuio_pci_release_iomem(struct uio_info *info)
-{
-	int i;
-
-	for (i = 0; i < MAX_UIO_MAPS; i++) {
-		if (info->mem[i].internal_addr)
-			iounmap(info->mem[i].internal_addr);
-	}
-}
-
 static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
@@ -562,6 +585,9 @@  igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +605,21 @@  static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	struct pci_dev *pdev = udev->pdev;
+	int ret;
+
+	/* handle unexpected removal */
+	if (udev->state == RTE_UDEV_OPENNED ||
+	    (&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);