bus/pci: fix unexpected resource mapping override

Message ID 20180903084005.29706-1-qi.z.zhang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series bus/pci: fix unexpected resource mapping override |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Qi Zhang Sept. 3, 2018, 8:40 a.m. UTC
  When scanning an already plugged device, the virtual address
of mapped PCI resource in rte_pci_device will be overridden
with 0, that may cause driver does not work correctly.
The fix is not to update any rte_pci_device's field if the being
scanned device's driver is already probed.

Bugzilla ID: 85
Fixes: c752998b5e2e ("pci: introduce library and driver")
Cc: stable@dpdk.org

Reported-by: Lv Geoffrey <geoffrey.lv@gmail.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/bus/pci/linux/pci.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)
  

Comments

Thomas Monjalon Sept. 26, 2018, 1:50 p.m. UTC | #1
Hi,

03/09/2018 10:40, Qi Zhang:
> When scanning an already plugged device, the virtual address
> of mapped PCI resource in rte_pci_device will be overridden
> with 0, that may cause driver does not work correctly.

Why is it overridden with 0?
Can we try to fix the root cause?

> The fix is not to update any rte_pci_device's field if the being
> scanned device's driver is already probed.

I am going to send a patch to have a safer way of checking
whether a device is probed or not.
If this change is really needed, I suggest to rebase it on top my patch.
  
Qi Zhang Sept. 29, 2018, 6:43 a.m. UTC | #2
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, September 26, 2018 9:51 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; geoffrey.lv@gmail.com;
> ajit.khaparde@broadcom.com; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] bus/pci: fix unexpected resource mapping
> override
> 
> Hi,
> 
> 03/09/2018 10:40, Qi Zhang:
> > When scanning an already plugged device, the virtual address of mapped
> > PCI resource in rte_pci_device will be overridden with 0, that may
> > cause driver does not work correctly.
> 
> Why is it overridden with 0?
> Can we try to fix the root cause?

From my view this is place to fix the issue: "scan an already probed device will corrupt the PCI resource map"
Another option is "to prevent scan an already probed device", this can be implemented by adding some check before bus->scan in rte_dev_hotplug_add but I'm not prefer for this solution, because it's better to keep bus->scan's independency.

> 
> > The fix is not to update any rte_pci_device's field if the being
> > scanned device's driver is already probed.
> 
> I am going to send a patch to have a safer way of checking whether a device is
> probed or not.
> If this change is really needed, I suggest to rebase it on top my patch.

OK, I will rebase on it by using rte_dev_is_probed.

Regards
Qi
>
  
Thomas Monjalon Sept. 30, 2018, 8:52 a.m. UTC | #3
29/09/2018 08:43, Zhang, Qi Z:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 
> > Hi,
> > 
> > 03/09/2018 10:40, Qi Zhang:
> > > When scanning an already plugged device, the virtual address of mapped
> > > PCI resource in rte_pci_device will be overridden with 0, that may
> > > cause driver does not work correctly.
> > 
> > Why is it overridden with 0?
> > Can we try to fix the root cause?
> 
> From my view this is place to fix the issue: "scan an already probed device will corrupt the PCI resource map"
> Another option is "to prevent scan an already probed device", this can be implemented by adding some check before bus->scan in rte_dev_hotplug_add but I'm not prefer for this solution, because it's better to keep bus->scan's independency.

I don't understand why we are currently changing an already scanned device
in pci_scan_one.
We could check the PCI address is known at the beginning and stop here,
even before allocating a new rte_pci_device.
Why trying to override with this memmove?
  
Qi Zhang Oct. 3, 2018, 1:03 p.m. UTC | #4
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Sunday, September 30, 2018 4:53 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; geoffrey.lv@gmail.com;
> ajit.khaparde@broadcom.com; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] bus/pci: fix unexpected resource mapping
> override
> 
> 29/09/2018 08:43, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > >
> > > Hi,
> > >
> > > 03/09/2018 10:40, Qi Zhang:
> > > > When scanning an already plugged device, the virtual address of
> > > > mapped PCI resource in rte_pci_device will be overridden with 0,
> > > > that may cause driver does not work correctly.
> > >
> > > Why is it overridden with 0?
> > > Can we try to fix the root cause?
> >
> > From my view this is place to fix the issue: "scan an already probed device
> will corrupt the PCI resource map"
> > Another option is "to prevent scan an already probed device", this can be
> implemented by adding some check before bus->scan in rte_dev_hotplug_add
> but I'm not prefer for this solution, because it's better to keep bus->scan's
> independency.
> 
> I don't understand why we are currently changing an already scanned device in
> pci_scan_one.

OK, this need to be figured out, due to hotplug, bus scan is unpredictable, so between two scans, something could happen
device maybe be unbound then bound with a different driver, or vf number is changed, so device information need to be updated.
but I'm not sure if resource mapping address ( read from /sys/bus/pci/devices/<pci addr>/resource) is possible be changed or not. 
Though I don't have evidence that it is possible, but the patch respect the original assumption that it is possible, so I keep memmove here.
But I will not be surprise if It should be removed and the assumption is not correct.  

> We could check the PCI address is known at the beginning and stop here, even
> before allocating a new rte_pci_device.

Yes we could check this at beginning, which means we need to figure out the rte_device by a pci address , then call rte_dev_is_probed(dev), I think that require another iterate on rte_pci_bus->device_list.
So, the benefit of a lazy check is we could merge this iterate with the iterate for device information update, and I don't have strong option for both options

> Why trying to override with this memmove?

comment in my first segment.
>
  

Patch

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..b94eb7401 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -348,11 +348,35 @@  pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 			if (ret < 0) {
 				rte_pci_insert_device(dev2, dev);
 			} else { /* already registered */
-				dev2->kdrv = dev->kdrv;
-				dev2->max_vfs = dev->max_vfs;
-				pci_name_set(dev2);
-				memmove(dev2->mem_resource, dev->mem_resource,
-					sizeof(dev->mem_resource));
+				if (dev2->driver == NULL) {
+					dev2->kdrv = dev->kdrv;
+					dev2->max_vfs = dev->max_vfs;
+					pci_name_set(dev2);
+					memmove(dev2->mem_resource,
+						dev->mem_resource,
+						sizeof(dev->mem_resource));
+				} else {
+					/**
+					 * If device is plugged and driver is
+					 * probed already, we don't need to do
+					 * anything here. (This happens when we
+					 * call rte_eal_hotplug_add)
+					 */
+					if (dev2->kdrv != dev->kdrv ||
+						dev2->max_vfs != dev->max_vfs)
+						/*
+						 * This should not happens.
+						 * But it is still possible if
+						 * we unbind a device from
+						 * vfio or uio before hotplug
+						 * remove and rebind it with
+						 * a different configure.
+						 * So we just print out the
+						 * error as an alarm.
+						 */
+						RTE_LOG(ERR, EAL, "Unexpected device scan at %s!\n",
+							filename);
+				}
 				free(dev);
 			}
 			return 0;