[dpdk-dev] igb_uio: revert open and release operations

Message ID 20171017201436.65270-1-ferruh.yigit@intel.com (mailing list archive)
State Rejected, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Ferruh Yigit Oct. 17, 2017, 8:14 p.m. UTC
  This reverts commit 6b9ed026a8704b9e5ee5da7997617ef7cc82e114.
This reverts commit 5f6ff30dc5075c49069d684bab229aef7ff0fdc3.
This reverts commit b58eedfc7dd57eef6d12e2c654a52c834f36084a.

There were bug reports about terminated application may leave device in
undesired state:
http://dpdk.org/ml/archives/dev/2016-November/049745.html
http://dpdk.org/ml/archives/dev/2016-November/050932.html

And a proposal to fix:
http://dpdk.org/ml/archives/dev/2016-December/051844.html

Later another proposal triggered the discussion:
http://dpdk.org/ml/archives/dev/2017-May/066317.html

Finally a fix patch pushed into v17.08:
Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")

Later a regression report sent related to the pushed patch:
http://dpdk.org/ml/archives/dev/2017-September/075236.html

And a fix for regression integrated into v17.11-rc1:
http://dpdk.org/ml/archives/dev/2017-October/079166.html
Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")

Even after the fix qede PMD reported to be broken:
http://dpdk.org/ml/archives/dev/2017-October/079359.html

So this patch reverts original fix and related commits. The related
igb_uio code part turns back to v17.05 base.

Cc: Jianfeng Tan <jianfeng.tan@intel.com>
Cc: Jingjing Wu <jingjing.wu@intel.com>
Cc: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Cc: Gregory Etelson <gregory@weka.io>
Cc: Harish Patil <harish.patil@cavium.com>
Cc: George Prekas <george.prekas@epfl.ch>

Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
It would be nice to solve this issue in LTS release, but being close to
the release and the error report without details makes it hard to work
more on this issue.

Thanks everyone who spent effort for this, hopefully we can continue to
work on next release cycle.

Jingjing, there is a i40e commit, was part of igb_uio fix patchset, is
it generic, or needs to be reverted with this patch?
Commit: 8cacf78469a7 ("net/i40e: fix VF initialization error")
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 201 +++++++++++-------------------
 1 file changed, 76 insertions(+), 125 deletions(-)
  

Comments

Thomas Monjalon Oct. 17, 2017, 8:33 p.m. UTC | #1
17/10/2017 22:14, Ferruh Yigit:
> There were bug reports about terminated application may leave device in
> undesired state:
> http://dpdk.org/ml/archives/dev/2016-November/049745.html
> http://dpdk.org/ml/archives/dev/2016-November/050932.html
> 
> And a proposal to fix:
> http://dpdk.org/ml/archives/dev/2016-December/051844.html
> 
> Later another proposal triggered the discussion:
> http://dpdk.org/ml/archives/dev/2017-May/066317.html
> 
> Finally a fix patch pushed into v17.08:
> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> 
> Later a regression report sent related to the pushed patch:
> http://dpdk.org/ml/archives/dev/2017-September/075236.html
> 
> And a fix for regression integrated into v17.11-rc1:
> http://dpdk.org/ml/archives/dev/2017-October/079166.html
> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
> 
> Even after the fix qede PMD reported to be broken:
> http://dpdk.org/ml/archives/dev/2017-October/079359.html
> 
> So this patch reverts original fix and related commits. The related
> igb_uio code part turns back to v17.05 base.
[...]
> ---
> It would be nice to solve this issue in LTS release, but being close to
> the release and the error report without details makes it hard to work
> more on this issue.

With this revert, we are back to the original issue.
We must really try the proposed solution.
Harish, please describe your issue and think how it could be fixed.
Jingjing made it work for i40e.
I know it is less effort to request a simple revert.
Please let's try to fix it once for all.
  
Jingjing Wu Oct. 18, 2017, 12:14 a.m. UTC | #2
Hi, Ferruh

This one is generic, we can keep it.

Commit: 8cacf78469a7 ("net/i40e: fix VF initialization error")

Thanks
Jingjing

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, October 18, 2017 4:15 AM
> To: Thomas Monjalon <thomas@monjalon.net>; Yigit, Ferruh
> <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Shijith Thotton
> <shijith.thotton@caviumnetworks.com>; Gregory Etelson <gregory@weka.io>;
> Harish Patil <harish.patil@cavium.com>; George Prekas
> <george.prekas@epfl.ch>; stable@dpdk.org
> Subject: [PATCH] igb_uio: revert open and release operations
> 
> This reverts commit 6b9ed026a8704b9e5ee5da7997617ef7cc82e114.
> This reverts commit 5f6ff30dc5075c49069d684bab229aef7ff0fdc3.
> This reverts commit b58eedfc7dd57eef6d12e2c654a52c834f36084a.
> 
> There were bug reports about terminated application may leave device in
> undesired state:
> http://dpdk.org/ml/archives/dev/2016-November/049745.html
> http://dpdk.org/ml/archives/dev/2016-November/050932.html
> 
> And a proposal to fix:
> http://dpdk.org/ml/archives/dev/2016-December/051844.html
> 
> Later another proposal triggered the discussion:
> http://dpdk.org/ml/archives/dev/2017-May/066317.html
> 
> Finally a fix patch pushed into v17.08:
> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device
> file")
> 
> Later a regression report sent related to the pushed patch:
> http://dpdk.org/ml/archives/dev/2017-September/075236.html
> 
> And a fix for regression integrated into v17.11-rc1:
> http://dpdk.org/ml/archives/dev/2017-October/079166.html
> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
> 
> Even after the fix qede PMD reported to be broken:
> http://dpdk.org/ml/archives/dev/2017-October/079359.html
> 
> So this patch reverts original fix and related commits. The related igb_uio code
> part turns back to v17.05 base.
> 
> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> Cc: Jingjing Wu <jingjing.wu@intel.com>
> Cc: Shijith Thotton <shijith.thotton@caviumnetworks.com>
> Cc: Gregory Etelson <gregory@weka.io>
> Cc: Harish Patil <harish.patil@cavium.com>
> Cc: George Prekas <george.prekas@epfl.ch>
> 
> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> It would be nice to solve this issue in LTS release, but being close to the release
> and the error report without details makes it hard to work more on this issue.
> 
> Thanks everyone who spent effort for this, hopefully we can continue to work
> on next release cycle.
> 
> Jingjing, there is a i40e commit, was part of igb_uio fix patchset, is it generic,
> or needs to be reverted with this patch?
> Commit: 8cacf78469a7 ("net/i40e: fix VF initialization error")
> ---
>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 201 +++++++++++-------------------
>  1 file changed, 76 insertions(+), 125 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index f7ef82554..786df68a2 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -49,6 +49,7 @@ struct rte_uio_pci_dev {
> 
>  static char *intr_mode;
>  static enum rte_intr_mode igbuio_intr_mode_preferred =
> RTE_INTR_MODE_MSIX;
> +
>  /* sriov sysfs */
>  static ssize_t
>  show_max_vfs(struct device *dev, struct device_attribute *attr, @@ -207,22
> +208,80 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
>   * If yes, disable it here and will be enable later.
>   */
>  static irqreturn_t
> -igbuio_pci_irqhandler(int irq, void *dev_id)
> +igbuio_pci_irqhandler(int irq, struct uio_info *info)
>  {
> -	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
> -	struct uio_info *info = &udev->info;
> +	struct rte_uio_pci_dev *udev = info->priv;
> 
>  	/* Legacy mode need to mask in hardware */
>  	if (udev->mode == RTE_INTR_MODE_LEGACY &&
>  	    !pci_check_and_mask_intx(udev->pdev))
>  		return IRQ_NONE;
> 
> -	uio_event_notify(info);
> -
>  	/* Message signal mode, no share IRQ and automasked */
>  	return IRQ_HANDLED;
>  }
> 
> +/* Remap pci resources described by bar #pci_bar in uio resource n. */
> +static int igbuio_pci_setup_iomem(struct pci_dev *dev, struct uio_info
> +*info,
> +		       int n, int pci_bar, const char *name) {
> +	unsigned long addr, len;
> +	void *internal_addr;
> +
> +	if (n >= ARRAY_SIZE(info->mem))
> +		return -EINVAL;
> +
> +	addr = pci_resource_start(dev, pci_bar);
> +	len = pci_resource_len(dev, pci_bar);
> +	if (addr == 0 || len == 0)
> +		return -1;
> +	internal_addr = ioremap(addr, len);
> +	if (internal_addr == NULL)
> +		return -1;
> +	info->mem[n].name = name;
> +	info->mem[n].addr = addr;
> +	info->mem[n].internal_addr = internal_addr;
> +	info->mem[n].size = len;
> +	info->mem[n].memtype = UIO_MEM_PHYS;
> +	return 0;
> +}
> +
> +/* Get pci port io resources described by bar #pci_bar in uio resource
> +n. */ static int igbuio_pci_setup_ioport(struct pci_dev *dev, struct
> +uio_info *info,
> +		int n, int pci_bar, const char *name) {
> +	unsigned long addr, len;
> +
> +	if (n >= ARRAY_SIZE(info->port))
> +		return -EINVAL;
> +
> +	addr = pci_resource_start(dev, pci_bar);
> +	len = pci_resource_len(dev, pci_bar);
> +	if (addr == 0 || len == 0)
> +		return -EINVAL;
> +
> +	info->port[n].name = name;
> +	info->port[n].start = addr;
> +	info->port[n].size = len;
> +	info->port[n].porttype = UIO_PORT_X86;
> +
> +	return 0;
> +}
> +
> +/* Unmap previously ioremap'd resources */ static void
> +igbuio_pci_release_iomem(struct uio_info *info) {
> +	int i;
> +
> +	for (i = 0; i < MAX_UIO_MAPS; i++) {
> +		if (info->mem[i].internal_addr)
> +			iounmap(info->mem[i].internal_addr);
> +	}
> +}
> +
>  static int
>  igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)  { @@ -252,7
> +311,6 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
>  			break;
>  		}
>  #endif
> -
>  	/* fall back to MSI */
>  	case RTE_INTR_MODE_MSI:
>  #ifndef HAVE_ALLOC_IRQ_VECTORS
> @@ -291,28 +349,15 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev
> *udev)
>  	default:
>  		dev_err(&udev->pdev->dev, "invalid IRQ mode %u",
>  			igbuio_intr_mode_preferred);
> -		udev->info.irq = UIO_IRQ_NONE;
>  		err = -EINVAL;
>  	}
> 
> -	if (udev->info.irq != UIO_IRQ_NONE)
> -		err = request_irq(udev->info.irq, igbuio_pci_irqhandler,
> -				  udev->info.irq_flags, udev->info.name,
> -				  udev);
> -	dev_info(&udev->pdev->dev, "uio device registered with irq %lx\n",
> -		 udev->info.irq);
> -
>  	return err;
>  }
> 
>  static void
>  igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)  {
> -	if (udev->info.irq) {
> -		free_irq(udev->info.irq, udev);
> -		udev->info.irq = 0;
> -	}
> -
>  #ifndef HAVE_ALLOC_IRQ_VECTORS
>  	if (udev->mode == RTE_INTR_MODE_MSIX)
>  		pci_disable_msix(udev->pdev);
> @@ -325,109 +370,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev
> *udev)  #endif  }
> 
> -
> -/**
> - * This gets called while opening uio device file.
> - */
> -static int
> -igbuio_pci_open(struct uio_info *info, struct inode *inode) -{
> -	struct rte_uio_pci_dev *udev = info->priv;
> -	struct pci_dev *dev = udev->pdev;
> -	int err;
> -
> -	pci_reset_function(dev);
> -
> -	/* set bus master, which was cleared by the reset function */
> -	pci_set_master(dev);
> -
> -	/* enable interrupts */
> -	err = igbuio_pci_enable_interrupts(udev);
> -	if (err) {
> -		dev_err(&dev->dev, "Enable interrupt fails\n");
> -		return err;
> -	}
> -	return 0;
> -}
> -
> -static int
> -igbuio_pci_release(struct uio_info *info, struct inode *inode) -{
> -	struct rte_uio_pci_dev *udev = info->priv;
> -	struct pci_dev *dev = udev->pdev;
> -
> -	/* disable interrupts */
> -	igbuio_pci_disable_interrupts(udev);
> -
> -	/* stop the device from further DMA */
> -	pci_clear_master(dev);
> -
> -	pci_reset_function(dev);
> -
> -	return 0;
> -}
> -
> -/* Remap pci resources described by bar #pci_bar in uio resource n. */ -static
> int -igbuio_pci_setup_iomem(struct pci_dev *dev, struct uio_info *info,
> -		       int n, int pci_bar, const char *name)
> -{
> -	unsigned long addr, len;
> -	void *internal_addr;
> -
> -	if (n >= ARRAY_SIZE(info->mem))
> -		return -EINVAL;
> -
> -	addr = pci_resource_start(dev, pci_bar);
> -	len = pci_resource_len(dev, pci_bar);
> -	if (addr == 0 || len == 0)
> -		return -1;
> -	internal_addr = ioremap(addr, len);
> -	if (internal_addr == NULL)
> -		return -1;
> -	info->mem[n].name = name;
> -	info->mem[n].addr = addr;
> -	info->mem[n].internal_addr = internal_addr;
> -	info->mem[n].size = len;
> -	info->mem[n].memtype = UIO_MEM_PHYS;
> -	return 0;
> -}
> -
> -/* Get pci port io resources described by bar #pci_bar in uio resource n. */ -
> static int -igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
> -		int n, int pci_bar, const char *name)
> -{
> -	unsigned long addr, len;
> -
> -	if (n >= ARRAY_SIZE(info->port))
> -		return -EINVAL;
> -
> -	addr = pci_resource_start(dev, pci_bar);
> -	len = pci_resource_len(dev, pci_bar);
> -	if (addr == 0 || len == 0)
> -		return -EINVAL;
> -
> -	info->port[n].name = name;
> -	info->port[n].start = addr;
> -	info->port[n].size = len;
> -	info->port[n].porttype = UIO_PORT_X86;
> -
> -	return 0;
> -}
> -
> -/* Unmap previously ioremap'd resources */ -static void -
> igbuio_pci_release_iomem(struct uio_info *info) -{
> -	int i;
> -
> -	for (i = 0; i < MAX_UIO_MAPS; i++) {
> -		if (info->mem[i].internal_addr)
> -			iounmap(info->mem[i].internal_addr);
> -	}
> -}
> -
>  static int
>  igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)  { @@ -518,16
> +460,19 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id
> *id)
>  	/* fill uio infos */
>  	udev->info.name = "igb_uio";
>  	udev->info.version = "0.1";
> +	udev->info.handler = igbuio_pci_irqhandler;
>  	udev->info.irqcontrol = igbuio_pci_irqcontrol;
> -	udev->info.open = igbuio_pci_open;
> -	udev->info.release = igbuio_pci_release;
>  	udev->info.priv = udev;
>  	udev->pdev = dev;
> 
> -	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
> +	err = igbuio_pci_enable_interrupts(udev);
>  	if (err != 0)
>  		goto fail_release_iomem;
> 
> +	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
> +	if (err != 0)
> +		goto fail_disable_interrupts;
> +
>  	/* register uio driver */
>  	err = uio_register_device(&dev->dev, &udev->info);
>  	if (err != 0)
> @@ -535,6 +480,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct
> pci_device_id *id)
> 
>  	pci_set_drvdata(dev, udev);
> 
> +	dev_info(&dev->dev, "uio device registered with irq %lx\n",
> +		 udev->info.irq);
> +
>  	/*
>  	 * Doing a harmless dma mapping for attaching the device to
>  	 * the iommu identity mapping if kernel boots with iommu=pt.
> @@ -560,6 +508,8 @@ igbuio_pci_probe(struct pci_dev *dev, const struct
> pci_device_id *id)
> 
>  fail_remove_group:
>  	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
> +fail_disable_interrupts:
> +	igbuio_pci_disable_interrupts(udev);
>  fail_release_iomem:
>  	igbuio_pci_release_iomem(&udev->info);
>  	pci_disable_device(dev);
> @@ -576,6 +526,7 @@ igbuio_pci_remove(struct pci_dev *dev)
> 
>  	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
>  	uio_unregister_device(&udev->info);
> +	igbuio_pci_disable_interrupts(udev);
>  	igbuio_pci_release_iomem(&udev->info);
>  	pci_disable_device(dev);
>  	pci_set_drvdata(dev, NULL);
> --
> 2.13.6
  
Patil, Harish Oct. 18, 2017, 4:50 a.m. UTC | #3
-----Original Message-----
From: Thomas Monjalon <thomas@monjalon.net>

Date: Tuesday, October 17, 2017 at 1:33 PM
To: Ferruh Yigit <ferruh.yigit@intel.com>, Harish Patil
<Harish.Patil@cavium.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
<Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [PATCH] igb_uio: revert open and release operations

>17/10/2017 22:14, Ferruh Yigit:

>> There were bug reports about terminated application may leave device in

>> undesired state:

>> http://dpdk.org/ml/archives/dev/2016-November/049745.html

>> http://dpdk.org/ml/archives/dev/2016-November/050932.html

>> 

>> And a proposal to fix:

>> http://dpdk.org/ml/archives/dev/2016-December/051844.html

>> 

>> Later another proposal triggered the discussion:

>> http://dpdk.org/ml/archives/dev/2017-May/066317.html

>> 

>> Finally a fix patch pushed into v17.08:

>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of

>>device file")

>> 

>> Later a regression report sent related to the pushed patch:

>> http://dpdk.org/ml/archives/dev/2017-September/075236.html

>> 

>> And a fix for regression integrated into v17.11-rc1:

>> http://dpdk.org/ml/archives/dev/2017-October/079166.html

>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in

>>VM")

>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")

>> 

>> Even after the fix qede PMD reported to be broken:

>> http://dpdk.org/ml/archives/dev/2017-October/079359.html

>> 

>> So this patch reverts original fix and related commits. The related

>> igb_uio code part turns back to v17.05 base.

>[...]

>> ---

>> It would be nice to solve this issue in LTS release, but being close to

>> the release and the error report without details makes it hard to work

>> more on this issue.

>

>With this revert, we are back to the original issue.

>We must really try the proposed solution.

>Harish, please describe your issue and think how it could be fixed.

>Jingjing made it work for i40e.

>I know it is less effort to request a simple revert.

>Please let's try to fix it once for all.


Hi Ferruh/Thomas,
I’m discussing it internally, so please hold on committing this patch till
I revert back to you.

Thanks.


>
  
Shijith Thotton Oct. 18, 2017, 6:27 a.m. UTC | #4
On Tue, Oct 17, 2017 at 09:14:36PM +0100, Ferruh Yigit wrote:
> This reverts commit 6b9ed026a8704b9e5ee5da7997617ef7cc82e114.
> This reverts commit 5f6ff30dc5075c49069d684bab229aef7ff0fdc3.
> This reverts commit b58eedfc7dd57eef6d12e2c654a52c834f36084a.
> 
> There were bug reports about terminated application may leave device in
> undesired state:
> http://dpdk.org/ml/archives/dev/2016-November/049745.html
> http://dpdk.org/ml/archives/dev/2016-November/050932.html
> 
> And a proposal to fix:
> http://dpdk.org/ml/archives/dev/2016-December/051844.html
> 
> Later another proposal triggered the discussion:
> http://dpdk.org/ml/archives/dev/2017-May/066317.html
> 
> Finally a fix patch pushed into v17.08:
> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> 
> Later a regression report sent related to the pushed patch:
> http://dpdk.org/ml/archives/dev/2017-September/075236.html
> 
> And a fix for regression integrated into v17.11-rc1:
> http://dpdk.org/ml/archives/dev/2017-October/079166.html
> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
> 
> Even after the fix qede PMD reported to be broken:
> http://dpdk.org/ml/archives/dev/2017-October/079359.html
> 
> So this patch reverts original fix and related commits. The related
> igb_uio code part turns back to v17.05 base.
> 
> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> Cc: Jingjing Wu <jingjing.wu@intel.com>
> Cc: Shijith Thotton <shijith.thotton@caviumnetworks.com>
> Cc: Gregory Etelson <gregory@weka.io>
> Cc: Harish Patil <harish.patil@cavium.com>
> Cc: George Prekas <george.prekas@epfl.ch>
> 
> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> It would be nice to solve this issue in LTS release, but being close to
> the release and the error report without details makes it hard to work
> more on this issue.
> 
> Thanks everyone who spent effort for this, hopefully we can continue to
> work on next release cycle.
> 
> Jingjing, there is a i40e commit, was part of igb_uio fix patchset, is
> it generic, or needs to be reverted with this patch?
> Commit: 8cacf78469a7 ("net/i40e: fix VF initialization error")

Hi Ferruh,

Please consider this patch as part of revert.
Commit: 9ed3f38770c6 ("net/liquidio: remove FLR request to PF driver")

Here I have removed extra FLR requests inside driver during init and close.
They are required now, as we remove resets in igb_uio.

Thanks,
Shijith

[...]
  
Ferruh Yigit Oct. 18, 2017, 8:47 p.m. UTC | #5
On 10/17/2017 11:27 PM, Shijith Thotton wrote:
> On Tue, Oct 17, 2017 at 09:14:36PM +0100, Ferruh Yigit wrote:
>> This reverts commit 6b9ed026a8704b9e5ee5da7997617ef7cc82e114.
>> This reverts commit 5f6ff30dc5075c49069d684bab229aef7ff0fdc3.
>> This reverts commit b58eedfc7dd57eef6d12e2c654a52c834f36084a.
>>
>> There were bug reports about terminated application may leave device in
>> undesired state:
>> http://dpdk.org/ml/archives/dev/2016-November/049745.html
>> http://dpdk.org/ml/archives/dev/2016-November/050932.html
>>
>> And a proposal to fix:
>> http://dpdk.org/ml/archives/dev/2016-December/051844.html
>>
>> Later another proposal triggered the discussion:
>> http://dpdk.org/ml/archives/dev/2017-May/066317.html
>>
>> Finally a fix patch pushed into v17.08:
>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
>>
>> Later a regression report sent related to the pushed patch:
>> http://dpdk.org/ml/archives/dev/2017-September/075236.html
>>
>> And a fix for regression integrated into v17.11-rc1:
>> http://dpdk.org/ml/archives/dev/2017-October/079166.html
>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
>>
>> Even after the fix qede PMD reported to be broken:
>> http://dpdk.org/ml/archives/dev/2017-October/079359.html
>>
>> So this patch reverts original fix and related commits. The related
>> igb_uio code part turns back to v17.05 base.
>>
>> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
>> Cc: Jingjing Wu <jingjing.wu@intel.com>
>> Cc: Shijith Thotton <shijith.thotton@caviumnetworks.com>
>> Cc: Gregory Etelson <gregory@weka.io>
>> Cc: Harish Patil <harish.patil@cavium.com>
>> Cc: George Prekas <george.prekas@epfl.ch>
>>
>> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> It would be nice to solve this issue in LTS release, but being close to
>> the release and the error report without details makes it hard to work
>> more on this issue.
>>
>> Thanks everyone who spent effort for this, hopefully we can continue to
>> work on next release cycle.
>>
>> Jingjing, there is a i40e commit, was part of igb_uio fix patchset, is
>> it generic, or needs to be reverted with this patch?
>> Commit: 8cacf78469a7 ("net/i40e: fix VF initialization error")
> 
> Hi Ferruh,
> 
> Please consider this patch as part of revert.
> Commit: 9ed3f38770c6 ("net/liquidio: remove FLR request to PF driver")

Hi Shijith,

Sure, I will add this on next version of the patch.

> 
> Here I have removed extra FLR requests inside driver during init and close.
> They are required now, as we remove resets in igb_uio.
> 
> Thanks,
> Shijith
> 
> [...]
>
  
Patil, Harish Oct. 19, 2017, 10:43 p.m. UTC | #6
-----Original Message-----
From: Harish Patil <Harish.Patil@cavium.com>

Date: Tuesday, October 17, 2017 at 9:50 PM
To: Thomas Monjalon <thomas@monjalon.net>, Ferruh Yigit
<ferruh.yigit@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
<Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [PATCH] igb_uio: revert open and release operations
>

>

>-----Original Message-----

>From: Thomas Monjalon <thomas@monjalon.net>

>Date: Tuesday, October 17, 2017 at 1:33 PM

>To: Ferruh Yigit <ferruh.yigit@intel.com>, Harish Patil

><Harish.Patil@cavium.com>

>Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,

>Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"

><Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George

>Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>

>Subject: Re: [PATCH] igb_uio: revert open and release operations

>

>>17/10/2017 22:14, Ferruh Yigit:

>>> There were bug reports about terminated application may leave device in

>>> undesired state:

>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html

>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html

>>> 

>>> And a proposal to fix:

>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html

>>> 

>>> Later another proposal triggered the discussion:

>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html

>>> 

>>> Finally a fix patch pushed into v17.08:

>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of

>>>device file")

>>> 

>>> Later a regression report sent related to the pushed patch:

>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html

>>> 

>>> And a fix for regression integrated into v17.11-rc1:

>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html

>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in

>>>VM")

>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")

>>> 

>>> Even after the fix qede PMD reported to be broken:

>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html

>>> 

>>> So this patch reverts original fix and related commits. The related

>>> igb_uio code part turns back to v17.05 base.

>>[...]

>>> ---

>>> It would be nice to solve this issue in LTS release, but being close to

>>> the release and the error report without details makes it hard to work

>>> more on this issue.

>>

>>With this revert, we are back to the original issue.

>>We must really try the proposed solution.

>>Harish, please describe your issue and think how it could be fixed.

>>Jingjing made it work for i40e.

>>I know it is less effort to request a simple revert.

>>Please let's try to fix it once for all.

>

>Hi Ferruh/Thomas,

>I’m discussing it internally, so please hold on committing this patch till

>I revert back to you.

>

>Thanks.


>[Harish]


> 1) With the introduction of:

>Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of

>device file”)

   We saw failures with qede PF & SR-IOV VF initialization in PCI passthru
mode.

> 

>PF PCI passthru mode initialization failure was resolved by:

>“Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in

>VM")


>SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and

>related device cleanup is not completed by the time VF driver starts

>loading. It results in the mbox command failure sent over the HW channel

>between VF and PF.

>

>Even though pci_reset_function() waits for the stipulated amount of time

>per standards, VF FLR takes longer than that and pci_reset_function() &

>igb_uio_open() call returns before FLR completes and VF PMD driver tries

>to load before FLR completes leading to VF PMD initialization failure.

> 

>

>We can work around this problem by adding driver delay/retry logic since

>there is no deterministic way of detecting FLR completion. But this is

>going to increase the driver load time.

> 


>2) With the above patch ("igb_uio: issue FLR during open and release of

>device file), FLR is going to be issued unconditionally on all devices

>during igb_uio_open. We think it’s an over kill. FLR is required only for

>previous abnormal app termination. We already handle the abnormal app

>termination by doing necessary cleanup in the driver during load. This

>cleanup is more efficient as it is done only when required. So we feel

>that the drivers/devices needing such cleanup (the two cases listed

>below)  should do it conditionally when required rather than

>igb_uio_open() unconditionally performing FLR all the time.

> 

>e.g.,

   - cdb166963cae ("net/liquidio: add API for VF FLR”)
> 

> 

>- http://dpdk.org/ml/archives/dev/2017-May/066317.html


Thanks,
Harish
>
  
Ferruh Yigit Oct. 20, 2017, 1:15 a.m. UTC | #7
On 10/19/2017 3:43 PM, Patil, Harish wrote:
> -----Original Message-----
> From: Harish Patil <Harish.Patil@cavium.com>
> Date: Tuesday, October 17, 2017 at 9:50 PM
> To: Thomas Monjalon <thomas@monjalon.net>, Ferruh Yigit
> <ferruh.yigit@intel.com>
> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
> Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
> <Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
> Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>
>>
>> -----Original Message-----
>> From: Thomas Monjalon <thomas@monjalon.net>
>> Date: Tuesday, October 17, 2017 at 1:33 PM
>> To: Ferruh Yigit <ferruh.yigit@intel.com>, Harish Patil
>> <Harish.Patil@cavium.com>
>> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
>> Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
>> <Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
>> Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
>> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>
>>> 17/10/2017 22:14, Ferruh Yigit:
>>>> There were bug reports about terminated application may leave device in
>>>> undesired state:
>>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html
>>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html
>>>>
>>>> And a proposal to fix:
>>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html
>>>>
>>>> Later another proposal triggered the discussion:
>>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html
>>>>
>>>> Finally a fix patch pushed into v17.08:
>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>>>> device file")
>>>>
>>>> Later a regression report sent related to the pushed patch:
>>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html
>>>>
>>>> And a fix for regression integrated into v17.11-rc1:
>>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html
>>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>>>> VM")
>>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
>>>>
>>>> Even after the fix qede PMD reported to be broken:
>>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html
>>>>
>>>> So this patch reverts original fix and related commits. The related
>>>> igb_uio code part turns back to v17.05 base.
>>> [...]
>>>> ---
>>>> It would be nice to solve this issue in LTS release, but being close to
>>>> the release and the error report without details makes it hard to work
>>>> more on this issue.
>>>
>>> With this revert, we are back to the original issue.
>>> We must really try the proposed solution.
>>> Harish, please describe your issue and think how it could be fixed.
>>> Jingjing made it work for i40e.
>>> I know it is less effort to request a simple revert.
>>> Please let's try to fix it once for all.
>>
>> Hi Ferruh/Thomas,
>> I’m discussing it internally, so please hold on committing this patch till
>> I revert back to you.
>>
>> Thanks.
> 
>> [Harish]
> 
>> 1) With the introduction of:
>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>> device file”)
>    We saw failures with qede PF & SR-IOV VF initialization in PCI passthru
> mode.
> 
>>
>> PF PCI passthru mode initialization failure was resolved by:
>> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>> VM")

Thank you for the update.

> 
>> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and
>> related device cleanup is not completed by the time VF driver starts
>> loading. It results in the mbox command failure sent over the HW channel
>> between VF and PF.

This seems same reason why i40e added a check and wait loop.

>>
>> Even though pci_reset_function() waits for the stipulated amount of time
>> per standards, VF FLR takes longer than that and pci_reset_function() &
>> igb_uio_open() call returns before FLR completes and VF PMD driver tries
>> to load before FLR completes leading to VF PMD initialization failure.
>>
>>
>> We can work around this problem by adding driver delay/retry logic since
>> there is no deterministic way of detecting FLR completion. But this is
>> going to increase the driver load time.
>>
> 
>> 2) With the above patch ("igb_uio: issue FLR during open and release of
>> device file), FLR is going to be issued unconditionally on all devices
>> during igb_uio_open. We think it’s an over kill. FLR is required only for
>> previous abnormal app termination. We already handle the abnormal app
>> termination by doing necessary cleanup in the driver during load. This
>> cleanup is more efficient as it is done only when required. So we feel
>> that the drivers/devices needing such cleanup (the two cases listed
>> below)  should do it conditionally when required rather than
>> igb_uio_open() unconditionally performing FLR all the time.
>>
>> e.g.,
>    - cdb166963cae ("net/liquidio: add API for VF FLR”)

Both 1) and 2) related to the pci_reset during open().
But the main functionality we are looking for is the pci_reset in release().
So we can remove reset during open() [1].

Will disabling pci_reset in open() [2] solve your problems?

Jingjing, Jianfeng,
What do you think?


[1]
Perhaps will need to revert or partially revert liquidio patch after this.

[2]
disable line "pci_reset_function(dev);" in the igbuio_pci_open()
http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339

>>
>>
>> - http://dpdk.org/ml/archives/dev/2017-May/066317.html
> 
> Thanks,
> Harish
>>
>
  
Jianfeng Tan Oct. 20, 2017, 3:26 p.m. UTC | #8
Hi Ferruh & Harish,


On 10/20/2017 9:15 AM, Ferruh Yigit wrote:
> On 10/19/2017 3:43 PM, Patil, Harish wrote:
>> -----Original Message-----
>> From: Harish Patil <Harish.Patil@cavium.com>
>> Date: Tuesday, October 17, 2017 at 9:50 PM
>> To: Thomas Monjalon <thomas@monjalon.net>, Ferruh Yigit
>> <ferruh.yigit@intel.com>
>> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
>> Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
>> <Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
>> Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
>> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>>
>>> -----Original Message-----
>>> From: Thomas Monjalon <thomas@monjalon.net>
>>> Date: Tuesday, October 17, 2017 at 1:33 PM
>>> To: Ferruh Yigit <ferruh.yigit@intel.com>, Harish Patil
>>> <Harish.Patil@cavium.com>
>>> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
>>> Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
>>> <Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
>>> Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
>>> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>>
>>>> 17/10/2017 22:14, Ferruh Yigit:
>>>>> There were bug reports about terminated application may leave device in
>>>>> undesired state:
>>>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html
>>>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html
>>>>>
>>>>> And a proposal to fix:
>>>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html
>>>>>
>>>>> Later another proposal triggered the discussion:
>>>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html
>>>>>
>>>>> Finally a fix patch pushed into v17.08:
>>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>>>>> device file")
>>>>>
>>>>> Later a regression report sent related to the pushed patch:
>>>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html
>>>>>
>>>>> And a fix for regression integrated into v17.11-rc1:
>>>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html
>>>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>>>>> VM")
>>>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
>>>>>
>>>>> Even after the fix qede PMD reported to be broken:
>>>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html
>>>>>
>>>>> So this patch reverts original fix and related commits. The related
>>>>> igb_uio code part turns back to v17.05 base.
>>>> [...]
>>>>> ---
>>>>> It would be nice to solve this issue in LTS release, but being close to
>>>>> the release and the error report without details makes it hard to work
>>>>> more on this issue.
>>>> With this revert, we are back to the original issue.
>>>> We must really try the proposed solution.
>>>> Harish, please describe your issue and think how it could be fixed.
>>>> Jingjing made it work for i40e.
>>>> I know it is less effort to request a simple revert.
>>>> Please let's try to fix it once for all.
>>> Hi Ferruh/Thomas,
>>> I’m discussing it internally, so please hold on committing this patch till
>>> I revert back to you.
>>>
>>> Thanks.
>>> [Harish]
>>> 1) With the introduction of:
>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>>> device file”)
>>     We saw failures with qede PF & SR-IOV VF initialization in PCI passthru
>> mode.
>>
>>> PF PCI passthru mode initialization failure was resolved by:
>>> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>>> VM")
> Thank you for the update.
>
>>> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and
>>> related device cleanup is not completed by the time VF driver starts
>>> loading. It results in the mbox command failure sent over the HW channel
>>> between VF and PF.
> This seems same reason why i40e added a check and wait loop.
>
>>> Even though pci_reset_function() waits for the stipulated amount of time
>>> per standards, VF FLR takes longer than that and pci_reset_function() &
>>> igb_uio_open() call returns before FLR completes and VF PMD driver tries
>>> to load before FLR completes leading to VF PMD initialization failure.
>>>
>>>
>>> We can work around this problem by adding driver delay/retry logic since
>>> there is no deterministic way of detecting FLR completion. But this is
>>> going to increase the driver load time.
>>>
>>> 2) With the above patch ("igb_uio: issue FLR during open and release of
>>> device file), FLR is going to be issued unconditionally on all devices
>>> during igb_uio_open. We think it’s an over kill. FLR is required only for
>>> previous abnormal app termination. We already handle the abnormal app
>>> termination by doing necessary cleanup in the driver during load. This
>>> cleanup is more efficient as it is done only when required. So we feel
>>> that the drivers/devices needing such cleanup (the two cases listed
>>> below)  should do it conditionally when required rather than
>>> igb_uio_open() unconditionally performing FLR all the time.
>>>
>>> e.g.,
>>     - cdb166963cae ("net/liquidio: add API for VF FLR”)
> Both 1) and 2) related to the pci_reset during open().
> But the main functionality we are looking for is the pci_reset in release().
> So we can remove reset during open() [1].
>
> Will disabling pci_reset in open() [2] solve your problems?
>
> Jingjing, Jianfeng,
> What do you think?

If [2] can work for all drivers, I agree with this option.

Thanks,
Jianfeng

>
>
> [1]
> Perhaps will need to revert or partially revert liquidio patch after this.
>
> [2]
> disable line "pci_reset_function(dev);" in the igbuio_pci_open()
> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339
>
  
Ferruh Yigit Oct. 20, 2017, 4:32 p.m. UTC | #9
On 10/20/2017 8:26 AM, Tan, Jianfeng wrote:
> Hi Ferruh & Harish,
> 
> 
> On 10/20/2017 9:15 AM, Ferruh Yigit wrote:
>> On 10/19/2017 3:43 PM, Patil, Harish wrote:
>>> -----Original Message-----
>>> From: Harish Patil <Harish.Patil@cavium.com>
>>> Date: Tuesday, October 17, 2017 at 9:50 PM
>>> To: Thomas Monjalon <thomas@monjalon.net>, Ferruh Yigit
>>> <ferruh.yigit@intel.com>
>>> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
>>> Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
>>> <Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
>>> Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
>>> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>>>
>>>> -----Original Message-----
>>>> From: Thomas Monjalon <thomas@monjalon.net>
>>>> Date: Tuesday, October 17, 2017 at 1:33 PM
>>>> To: Ferruh Yigit <ferruh.yigit@intel.com>, Harish Patil
>>>> <Harish.Patil@cavium.com>
>>>> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng.tan@intel.com>,
>>>> Jingjing Wu <jingjing.wu@intel.com>, "Thotton, Shijith"
>>>> <Shijith.Thotton@cavium.com>, Gregory Etelson <gregory@weka.io>, George
>>>> Prekas <george.prekas@epfl.ch>, "stable@dpdk.org" <stable@dpdk.org>
>>>> Subject: Re: [PATCH] igb_uio: revert open and release operations
>>>>
>>>>> 17/10/2017 22:14, Ferruh Yigit:
>>>>>> There were bug reports about terminated application may leave device in
>>>>>> undesired state:
>>>>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html
>>>>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html
>>>>>>
>>>>>> And a proposal to fix:
>>>>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html
>>>>>>
>>>>>> Later another proposal triggered the discussion:
>>>>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html
>>>>>>
>>>>>> Finally a fix patch pushed into v17.08:
>>>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>>>>>> device file")
>>>>>>
>>>>>> Later a regression report sent related to the pushed patch:
>>>>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html
>>>>>>
>>>>>> And a fix for regression integrated into v17.11-rc1:
>>>>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html
>>>>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>>>>>> VM")
>>>>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
>>>>>>
>>>>>> Even after the fix qede PMD reported to be broken:
>>>>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html
>>>>>>
>>>>>> So this patch reverts original fix and related commits. The related
>>>>>> igb_uio code part turns back to v17.05 base.
>>>>> [...]
>>>>>> ---
>>>>>> It would be nice to solve this issue in LTS release, but being close to
>>>>>> the release and the error report without details makes it hard to work
>>>>>> more on this issue.
>>>>> With this revert, we are back to the original issue.
>>>>> We must really try the proposed solution.
>>>>> Harish, please describe your issue and think how it could be fixed.
>>>>> Jingjing made it work for i40e.
>>>>> I know it is less effort to request a simple revert.
>>>>> Please let's try to fix it once for all.
>>>> Hi Ferruh/Thomas,
>>>> I’m discussing it internally, so please hold on committing this patch till
>>>> I revert back to you.
>>>>
>>>> Thanks.
>>>> [Harish]
>>>> 1) With the introduction of:
>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
>>>> device file”)
>>>     We saw failures with qede PF & SR-IOV VF initialization in PCI passthru
>>> mode.
>>>
>>>> PF PCI passthru mode initialization failure was resolved by:
>>>> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
>>>> VM")
>> Thank you for the update.
>>
>>>> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and
>>>> related device cleanup is not completed by the time VF driver starts
>>>> loading. It results in the mbox command failure sent over the HW channel
>>>> between VF and PF.
>> This seems same reason why i40e added a check and wait loop.
>>
>>>> Even though pci_reset_function() waits for the stipulated amount of time
>>>> per standards, VF FLR takes longer than that and pci_reset_function() &
>>>> igb_uio_open() call returns before FLR completes and VF PMD driver tries
>>>> to load before FLR completes leading to VF PMD initialization failure.
>>>>
>>>>
>>>> We can work around this problem by adding driver delay/retry logic since
>>>> there is no deterministic way of detecting FLR completion. But this is
>>>> going to increase the driver load time.
>>>>
>>>> 2) With the above patch ("igb_uio: issue FLR during open and release of
>>>> device file), FLR is going to be issued unconditionally on all devices
>>>> during igb_uio_open. We think it’s an over kill. FLR is required only for
>>>> previous abnormal app termination. We already handle the abnormal app
>>>> termination by doing necessary cleanup in the driver during load. This
>>>> cleanup is more efficient as it is done only when required. So we feel
>>>> that the drivers/devices needing such cleanup (the two cases listed
>>>> below)  should do it conditionally when required rather than
>>>> igb_uio_open() unconditionally performing FLR all the time.
>>>>
>>>> e.g.,
>>>     - cdb166963cae ("net/liquidio: add API for VF FLR”)
>> Both 1) and 2) related to the pci_reset during open().
>> But the main functionality we are looking for is the pci_reset in release().
>> So we can remove reset during open() [1].
>>
>> Will disabling pci_reset in open() [2] solve your problems?
>>
>> Jingjing, Jianfeng,
>> What do you think?
> 
> If [2] can work for all drivers, I agree with this option.

OK, I will send this as a patch to make it easy to test and discuss.

> 
> Thanks,
> Jianfeng
> 
>>
>>
>> [1]
>> Perhaps will need to revert or partially revert liquidio patch after this.
>>
>> [2]
>> disable line "pci_reset_function(dev);" in the igbuio_pci_open()
>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339
>>
>
  
Ferruh Yigit Oct. 24, 2017, 9:32 p.m. UTC | #10
On 10/17/2017 1:14 PM, Ferruh Yigit wrote:
> This reverts commit 6b9ed026a8704b9e5ee5da7997617ef7cc82e114.
> This reverts commit 5f6ff30dc5075c49069d684bab229aef7ff0fdc3.
> This reverts commit b58eedfc7dd57eef6d12e2c654a52c834f36084a.
> 
> There were bug reports about terminated application may leave device in
> undesired state:
> http://dpdk.org/ml/archives/dev/2016-November/049745.html
> http://dpdk.org/ml/archives/dev/2016-November/050932.html
> 
> And a proposal to fix:
> http://dpdk.org/ml/archives/dev/2016-December/051844.html
> 
> Later another proposal triggered the discussion:
> http://dpdk.org/ml/archives/dev/2017-May/066317.html
> 
> Finally a fix patch pushed into v17.08:
> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> 
> Later a regression report sent related to the pushed patch:
> http://dpdk.org/ml/archives/dev/2017-September/075236.html
> 
> And a fix for regression integrated into v17.11-rc1:
> http://dpdk.org/ml/archives/dev/2017-October/079166.html
> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")
> 
> Even after the fix qede PMD reported to be broken:
> http://dpdk.org/ml/archives/dev/2017-October/079359.html
> 
> So this patch reverts original fix and related commits. The related
> igb_uio code part turns back to v17.05 base.
> 
> Cc: Jianfeng Tan <jianfeng.tan@intel.com>
> Cc: Jingjing Wu <jingjing.wu@intel.com>
> Cc: Shijith Thotton <shijith.thotton@caviumnetworks.com>
> Cc: Gregory Etelson <gregory@weka.io>
> Cc: Harish Patil <harish.patil@cavium.com>
> Cc: George Prekas <george.prekas@epfl.ch>
> 
> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> It would be nice to solve this issue in LTS release, but being close to
> the release and the error report without details makes it hard to work
> more on this issue.
> 
> Thanks everyone who spent effort for this, hopefully we can continue to
> work on next release cycle.
> 
> Jingjing, there is a i40e commit, was part of igb_uio fix patchset, is
> it generic, or needs to be reverted with this patch?
> Commit: 8cacf78469a7 ("net/i40e: fix VF initialization error")

This patch is no more valid and will be marked as rejected, since instead of
revert, agreed on a solution [1].


[1]
http://dpdk.org/ml/archives/dev/2017-October/thread.html#79815
http://dpdk.org/dev/patchwork/patch/30654/
  

Patch

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index f7ef82554..786df68a2 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -49,6 +49,7 @@  struct rte_uio_pci_dev {
 
 static char *intr_mode;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
+
 /* sriov sysfs */
 static ssize_t
 show_max_vfs(struct device *dev, struct device_attribute *attr,
@@ -207,22 +208,80 @@  igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
  * If yes, disable it here and will be enable later.
  */
 static irqreturn_t
-igbuio_pci_irqhandler(int irq, void *dev_id)
+igbuio_pci_irqhandler(int irq, struct uio_info *info)
 {
-	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
-	struct uio_info *info = &udev->info;
+	struct rte_uio_pci_dev *udev = info->priv;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
-	uio_event_notify(info);
-
 	/* Message signal mode, no share IRQ and automasked */
 	return IRQ_HANDLED;
 }
 
+/* Remap pci resources described by bar #pci_bar in uio resource n. */
+static int
+igbuio_pci_setup_iomem(struct pci_dev *dev, struct uio_info *info,
+		       int n, int pci_bar, const char *name)
+{
+	unsigned long addr, len;
+	void *internal_addr;
+
+	if (n >= ARRAY_SIZE(info->mem))
+		return -EINVAL;
+
+	addr = pci_resource_start(dev, pci_bar);
+	len = pci_resource_len(dev, pci_bar);
+	if (addr == 0 || len == 0)
+		return -1;
+	internal_addr = ioremap(addr, len);
+	if (internal_addr == NULL)
+		return -1;
+	info->mem[n].name = name;
+	info->mem[n].addr = addr;
+	info->mem[n].internal_addr = internal_addr;
+	info->mem[n].size = len;
+	info->mem[n].memtype = UIO_MEM_PHYS;
+	return 0;
+}
+
+/* Get pci port io resources described by bar #pci_bar in uio resource n. */
+static int
+igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
+		int n, int pci_bar, const char *name)
+{
+	unsigned long addr, len;
+
+	if (n >= ARRAY_SIZE(info->port))
+		return -EINVAL;
+
+	addr = pci_resource_start(dev, pci_bar);
+	len = pci_resource_len(dev, pci_bar);
+	if (addr == 0 || len == 0)
+		return -EINVAL;
+
+	info->port[n].name = name;
+	info->port[n].start = addr;
+	info->port[n].size = len;
+	info->port[n].porttype = UIO_PORT_X86;
+
+	return 0;
+}
+
+/* Unmap previously ioremap'd resources */
+static void
+igbuio_pci_release_iomem(struct uio_info *info)
+{
+	int i;
+
+	for (i = 0; i < MAX_UIO_MAPS; i++) {
+		if (info->mem[i].internal_addr)
+			iounmap(info->mem[i].internal_addr);
+	}
+}
+
 static int
 igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
 {
@@ -252,7 +311,6 @@  igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
 			break;
 		}
 #endif
-
 	/* fall back to MSI */
 	case RTE_INTR_MODE_MSI:
 #ifndef HAVE_ALLOC_IRQ_VECTORS
@@ -291,28 +349,15 @@  igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
 	default:
 		dev_err(&udev->pdev->dev, "invalid IRQ mode %u",
 			igbuio_intr_mode_preferred);
-		udev->info.irq = UIO_IRQ_NONE;
 		err = -EINVAL;
 	}
 
-	if (udev->info.irq != UIO_IRQ_NONE)
-		err = request_irq(udev->info.irq, igbuio_pci_irqhandler,
-				  udev->info.irq_flags, udev->info.name,
-				  udev);
-	dev_info(&udev->pdev->dev, "uio device registered with irq %lx\n",
-		 udev->info.irq);
-
 	return err;
 }
 
 static void
 igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 {
-	if (udev->info.irq) {
-		free_irq(udev->info.irq, udev);
-		udev->info.irq = 0;
-	}
-
 #ifndef HAVE_ALLOC_IRQ_VECTORS
 	if (udev->mode == RTE_INTR_MODE_MSIX)
 		pci_disable_msix(udev->pdev);
@@ -325,109 +370,6 @@  igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
-/**
- * This gets called while opening uio device file.
- */
-static int
-igbuio_pci_open(struct uio_info *info, struct inode *inode)
-{
-	struct rte_uio_pci_dev *udev = info->priv;
-	struct pci_dev *dev = udev->pdev;
-	int err;
-
-	pci_reset_function(dev);
-
-	/* set bus master, which was cleared by the reset function */
-	pci_set_master(dev);
-
-	/* enable interrupts */
-	err = igbuio_pci_enable_interrupts(udev);
-	if (err) {
-		dev_err(&dev->dev, "Enable interrupt fails\n");
-		return err;
-	}
-	return 0;
-}
-
-static int
-igbuio_pci_release(struct uio_info *info, struct inode *inode)
-{
-	struct rte_uio_pci_dev *udev = info->priv;
-	struct pci_dev *dev = udev->pdev;
-
-	/* disable interrupts */
-	igbuio_pci_disable_interrupts(udev);
-
-	/* stop the device from further DMA */
-	pci_clear_master(dev);
-
-	pci_reset_function(dev);
-
-	return 0;
-}
-
-/* Remap pci resources described by bar #pci_bar in uio resource n. */
-static int
-igbuio_pci_setup_iomem(struct pci_dev *dev, struct uio_info *info,
-		       int n, int pci_bar, const char *name)
-{
-	unsigned long addr, len;
-	void *internal_addr;
-
-	if (n >= ARRAY_SIZE(info->mem))
-		return -EINVAL;
-
-	addr = pci_resource_start(dev, pci_bar);
-	len = pci_resource_len(dev, pci_bar);
-	if (addr == 0 || len == 0)
-		return -1;
-	internal_addr = ioremap(addr, len);
-	if (internal_addr == NULL)
-		return -1;
-	info->mem[n].name = name;
-	info->mem[n].addr = addr;
-	info->mem[n].internal_addr = internal_addr;
-	info->mem[n].size = len;
-	info->mem[n].memtype = UIO_MEM_PHYS;
-	return 0;
-}
-
-/* Get pci port io resources described by bar #pci_bar in uio resource n. */
-static int
-igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
-		int n, int pci_bar, const char *name)
-{
-	unsigned long addr, len;
-
-	if (n >= ARRAY_SIZE(info->port))
-		return -EINVAL;
-
-	addr = pci_resource_start(dev, pci_bar);
-	len = pci_resource_len(dev, pci_bar);
-	if (addr == 0 || len == 0)
-		return -EINVAL;
-
-	info->port[n].name = name;
-	info->port[n].start = addr;
-	info->port[n].size = len;
-	info->port[n].porttype = UIO_PORT_X86;
-
-	return 0;
-}
-
-/* Unmap previously ioremap'd resources */
-static void
-igbuio_pci_release_iomem(struct uio_info *info)
-{
-	int i;
-
-	for (i = 0; i < MAX_UIO_MAPS; i++) {
-		if (info->mem[i].internal_addr)
-			iounmap(info->mem[i].internal_addr);
-	}
-}
-
 static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
@@ -518,16 +460,19 @@  igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* fill uio infos */
 	udev->info.name = "igb_uio";
 	udev->info.version = "0.1";
+	udev->info.handler = igbuio_pci_irqhandler;
 	udev->info.irqcontrol = igbuio_pci_irqcontrol;
-	udev->info.open = igbuio_pci_open;
-	udev->info.release = igbuio_pci_release;
 	udev->info.priv = udev;
 	udev->pdev = dev;
 
-	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
+	err = igbuio_pci_enable_interrupts(udev);
 	if (err != 0)
 		goto fail_release_iomem;
 
+	err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
+	if (err != 0)
+		goto fail_disable_interrupts;
+
 	/* register uio driver */
 	err = uio_register_device(&dev->dev, &udev->info);
 	if (err != 0)
@@ -535,6 +480,9 @@  igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 
 	pci_set_drvdata(dev, udev);
 
+	dev_info(&dev->dev, "uio device registered with irq %lx\n",
+		 udev->info.irq);
+
 	/*
 	 * Doing a harmless dma mapping for attaching the device to
 	 * the iommu identity mapping if kernel boots with iommu=pt.
@@ -560,6 +508,8 @@  igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 
 fail_remove_group:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
+fail_disable_interrupts:
+	igbuio_pci_disable_interrupts(udev);
 fail_release_iomem:
 	igbuio_pci_release_iomem(&udev->info);
 	pci_disable_device(dev);
@@ -576,6 +526,7 @@  igbuio_pci_remove(struct pci_dev *dev)
 
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	uio_unregister_device(&udev->info);
+	igbuio_pci_disable_interrupts(udev);
 	igbuio_pci_release_iomem(&udev->info);
 	pci_disable_device(dev);
 	pci_set_drvdata(dev, NULL);