[dpdk-dev,v6,1/6] ethdev: add devop to check removal status

Message ID: 1516274834-19755-2-git-send-email-matan@mellanox.com (mailing list archive)
State: Superseded, archived
Delegated to: Ferruh Yigit
Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  fail     Compilation issues

Commit Message

Matan Azrad Jan. 18, 2018, 11:27 a.m. UTC
  There is time between the physical removal of the device until PMDs get
a RMV interrupt. At this time DPDK PMDs and applications still don't
know about the removal.

Current removal detection is achieved only by registration to device RMV
event and the notification comes asynchronously. So, there is no option
to detect a device removal synchronously.
Applications and other DPDK entities may want to check a device removal
synchronously and to take an immediate decision accordingly.

Add new dev op called is_removed to allow DPDK entities to check an
Ethernet device removal status immediately.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/librte_ether/rte_ethdev.c           | 28 +++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h           | 20 ++++++++++++++++++++
 lib/librte_ether/rte_ethdev_version.map |  1 +
 3 files changed, 46 insertions(+), 3 deletions(-)
  

Comments

Ferruh Yigit Jan. 18, 2018, 5:18 p.m. UTC | #1
On 1/18/2018 11:27 AM, Matan Azrad wrote:
> There is time between the physical removal of the device until PMDs get
> a RMV interrupt. At this time DPDK PMDs and applications still don't
> know about the removal.
> 
> Current removal detection is achieved only by registration to device RMV
> event and the notification comes asynchronously. So, there is no option
> to detect a device removal synchronously.
> Applications and other DPDK entities may want to check a device removal
> synchronously and to take an immediate decision accordingly.

So we will have two methods to detect device removal, one is asynchronous as you
mentioned.
Device removal will cause an interrupt which trigger to run user callback.

New method is synchronous, but still triggered from application. I mean
application should do a rte_eth_dev_is_removed() to learn about status, what is
the use case here, polling continuously? Won't this also cause some latency
unless you dedicate a core just polling device status?

> 
> Add new dev op called is_removed to allow DPDK entities to check an
> Ethernet device removal status immediately.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  lib/librte_ether/rte_ethdev.c           | 28 +++++++++++++++++++++++++---
>  lib/librte_ether/rte_ethdev.h           | 20 ++++++++++++++++++++
>  lib/librte_ether/rte_ethdev_version.map |  1 +
>  3 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index b349599..c93cec1 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -114,7 +114,8 @@ enum {
>  rte_eth_find_next(uint16_t port_id)
>  {
>  	while (port_id < RTE_MAX_ETHPORTS &&
> -	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
> +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_REMOVED)

If device is removed, why we are not allowed to re-use port_id assigned to it?
Overall I am not clear with RTE_ETH_DEV_REMOVED state, why we are not directly
setting RTE_ETH_DEV_UNUSED?

And state RTE_ETH_DEV_REMOVED set in ethdev layer, and ethdev layer won't let
reusing it, so what changes the state of dev? Will it stay as it is during
lifetime of the application?

>  		port_id++;
>  
>  	if (port_id >= RTE_MAX_ETHPORTS)
> @@ -262,8 +263,7 @@ struct rte_eth_dev *
>  rte_eth_dev_is_valid_port(uint16_t port_id)
>  {
>  	if (port_id >= RTE_MAX_ETHPORTS ||
> -	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> -	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
> +	    (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
>  		return 0;
>  	else
>  		return 1;
> @@ -1094,6 +1094,28 @@ struct rte_eth_dev *
>  }
>  
>  int
> +rte_eth_dev_is_removed(uint16_t port_id)
> +{
> +	struct rte_eth_dev *dev;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> +
> +	if (dev->state == RTE_ETH_DEV_REMOVED)
> +		return 1;

Isn't this conflict with below API documentation:

"
 * @return
 *   - 0 when the Ethernet device is removed, otherwise 1.
"

> +
> +	ret = dev->dev_ops->is_removed(dev);
> +	if (ret != 0)
> +		dev->state = RTE_ETH_DEV_REMOVED;

It isn't clear what "dev_ops->is_removed(dev)" should return, and this causing
incompatible usages in PMDs by time.
Please add some documentation about expected return values for dev_ops.


And this not real remove, PMD signals us and we stop using that device, but
device can be there, right?
If there is a real removal, can be possible to use eal hotplug?

<...>
  
Adrien Mazarguil Jan. 18, 2018, 5:57 p.m. UTC | #2
On Thu, Jan 18, 2018 at 05:18:22PM +0000, Ferruh Yigit wrote:
> On 1/18/2018 11:27 AM, Matan Azrad wrote:
> > There is time between the physical removal of the device until PMDs get
> > a RMV interrupt. At this time DPDK PMDs and applications still don't
> > know about the removal.
> > 
> > Current removal detection is achieved only by registration to device RMV
> > event and the notification comes asynchronously. So, there is no option
> > to detect a device removal synchronously.
> > Applications and other DPDK entities may want to check a device removal
> > synchronously and to take an immediate decision accordingly.
> 
> So we will have two methods to detect device removal, one is asynchronous as you
> mentioned.
> Device removal will cause an interrupt which trigger to run user callback.
> 
> New method is synchronous, but still triggered from application. I mean
> application should do a rte_eth_dev_is_removed() to learn about status, what is
> the use case here, polling continuously? Won't this also cause some latency
> unless you dedicate a core just polling device status?

They are complementary. The use case is when devices get suddenly pulled
out, either physically from their chassis (you need to picture a raging
sysadmin for that) or logically, as when a hypervisor removes a SR-IOV
device from a VM. Either way, this happens without prior notice.

It takes time for the PCI unplug notification to travel from the kernel to
DPDK, up to several seconds, during which the DPDK application may execute
control path operations on it. These may fail due to the now non-existent
device (e.g. no ACK will be returned by the device after adding a new MAC),
and these failures may be misinterpreted (e.g. permission denied, invalid
argument and so on).

To address this problem, PMDs that support physical hotplug must have all
their devops internally check for device removal before returning any other
error, in order to possibly convert the original error code to EIO.

Since patching each and every devop in each PMD with basically the same
code would be counterproductive, this series puts this check at a higher
level, inside rte_ethdev. Since this results in a new devop, it can be
exposed to applications for free, as these may find a use for it as well.
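
To make the error-conversion idea above concrete, here is a minimal
standalone sketch. It is not code from this series: sketch_port_gone,
sketch_is_removed() and sketch_eth_err() are hypothetical names, and the
flag array merely stands in for whatever a PMD's is_removed devop would
report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4 /* hypothetical; stands in for RTE_MAX_ETHPORTS */

/* Hypothetical per-port flag standing in for what a PMD's is_removed
 * devop would report after probing the hardware. */
static bool sketch_port_gone[SKETCH_MAX_PORTS];

/* Stand-in for rte_eth_dev_is_removed(): nonzero means "removed". */
static int
sketch_is_removed(uint16_t port_id)
{
	assert(port_id < SKETCH_MAX_PORTS);
	return sketch_port_gone[port_id];
}

/* Hypothetical helper: filter a devop return value in the ethdev layer.
 * Success passes through; a failure on a removed device becomes -EIO;
 * any other failure is returned unchanged. */
static int
sketch_eth_err(uint16_t port_id, int ret)
{
	if (ret == 0)
		return 0;
	if (sketch_is_removed(port_id))
		return -EIO;
	return ret;
}
```

With such a helper, each ethdev entry point would only need to wrap the
devop return value once, instead of every PMD repeating the check. The
removal check runs only on the failure path, so successful calls pay
nothing extra.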

> > Add new dev op called is_removed to allow DPDK entities to check an
> > Ethernet device removal status immediately.
> > 
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> >  lib/librte_ether/rte_ethdev.c           | 28 +++++++++++++++++++++++++---
> >  lib/librte_ether/rte_ethdev.h           | 20 ++++++++++++++++++++
> >  lib/librte_ether/rte_ethdev_version.map |  1 +
> >  3 files changed, 46 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > index b349599..c93cec1 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -114,7 +114,8 @@ enum {
> >  rte_eth_find_next(uint16_t port_id)
> >  {
> >  	while (port_id < RTE_MAX_ETHPORTS &&
> > -	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
> > +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> > +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_REMOVED)
> 
> If device is removed, why we are not allowed to re-use port_id assigned to it?
> Overall I am not clear with RTE_ETH_DEV_REMOVED state, why we are not directly
> setting RTE_ETH_DEV_UNUSED?
> 
> And state RTE_ETH_DEV_REMOVED set in ethdev layer, and ethdev layer won't let
> reusing it, so what changes the state of dev? Will it stay as it is during
> lifetime of the application?

Although the entry has switched to the REMOVED state, the underlying PMD
still holds it at this point; data is still allocated and so on. It will
switch to UNUSED after the PMD instance is fully de-initialized. In the
meantime the entry still needs to be skipped.
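
As an illustration of that lifecycle, here is a simplified standalone
model. It is only a sketch under assumptions: the sketch_* names are
hypothetical, the enum mirrors the states in the patch, and the free-slot
search is my reading of why a REMOVED entry cannot be reused until the PMD
has released it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4 /* hypothetical; stands in for RTE_MAX_ETHPORTS */

/* Mirrors enum rte_eth_dev_state as extended by the patch. */
enum sketch_state {
	SKETCH_UNUSED = 0,
	SKETCH_ATTACHED,
	SKETCH_DEFERRED,
	SKETCH_REMOVED,
};

static enum sketch_state sketch_ports[SKETCH_MAX_PORTS];

/* Same rule as the patched rte_eth_dev_is_valid_port(): every state
 * except UNUSED counts as a valid port. */
static int
sketch_is_valid_port(uint16_t port_id)
{
	return port_id < SKETCH_MAX_PORTS &&
	       sketch_ports[port_id] != SKETCH_UNUSED;
}

/* Only UNUSED entries may be handed out again; REMOVED entries still
 * belong to a PMD and are skipped. Returns -1 when no slot is free. */
static int
sketch_find_free_port(void)
{
	int i;

	for (i = 0; i < SKETCH_MAX_PORTS; i++)
		if (sketch_ports[i] == SKETCH_UNUSED)
			return i;
	return -1;
}

/* Hypothetical end of PMD de-initialization: the entry becomes
 * reusable only at this point. */
static void
sketch_release_port(uint16_t port_id)
{
	assert(sketch_is_valid_port(port_id));
	sketch_ports[port_id] = SKETCH_UNUSED;
}
```

In this model a port in the REMOVED state stays valid for lookups but is
never returned by the free-slot search until sketch_release_port() runs.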

> >  		port_id++;
> >  
> >  	if (port_id >= RTE_MAX_ETHPORTS)
> > @@ -262,8 +263,7 @@ struct rte_eth_dev *
> >  rte_eth_dev_is_valid_port(uint16_t port_id)
> >  {
> >  	if (port_id >= RTE_MAX_ETHPORTS ||
> > -	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> > -	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
> > +	    (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
> >  		return 0;
> >  	else
> >  		return 1;
> > @@ -1094,6 +1094,28 @@ struct rte_eth_dev *
> >  }
> >  
> >  int
> > +rte_eth_dev_is_removed(uint16_t port_id)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	int ret;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> > +
> > +	if (dev->state == RTE_ETH_DEV_REMOVED)
> > +		return 1;
> 
> Isn't this conflict with below API documentation:
> 
> "
>  * @return
>  *   - 0 when the Ethernet device is removed, otherwise 1.
> "

Documentation is indeed wrong here. Matan?

> 
> > +
> > +	ret = dev->dev_ops->is_removed(dev);
> > +	if (ret != 0)
> > +		dev->state = RTE_ETH_DEV_REMOVED;
> 
> It isn't clear what "dev_ops->is_removed(dev)" should return, and this causing
> incompatible usages in PMDs by time.
> Please add some documentation about expected return values for dev_ops.

It should be clarified as a boolean value (yes = nonzero, no = zero), like
most is*() functions (isalpha(), isblank() and so on).
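
A minimal sketch of that boolean contract follows. Everything in it is
illustrative: struct sketch_dev, its fake_reg field and both functions are
hypothetical, and detecting removal through an all-ones register read is
an assumption about how a PCI-based PMD might implement the check, not
something this series specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device: fake_reg models a register read, removed is the
 * state cached by the upper layer. */
struct sketch_dev {
	uint32_t fake_reg;
	int removed;
};

/* Devop contract: nonzero = removed, zero = still present.
 * Assumption: reads from a vanished PCI device return all ones. */
static int
sketch_pmd_is_removed(const struct sketch_dev *dev)
{
	return dev->fake_reg == UINT32_MAX;
}

/* Upper-layer check: treats any nonzero devop result as "removed",
 * caches it, and returns a normalized 0 or 1. */
static int
sketch_dev_is_removed(struct sketch_dev *dev)
{
	assert(dev != NULL);

	if (dev->removed)
		return 1;
	if (sketch_pmd_is_removed(dev) != 0)
		dev->removed = 1;
	return dev->removed;
}
```

Because the caller only tests for nonzero, a PMD may return any nonzero
value without breaking users of the public function. The result is also
sticky in this sketch: once a device is seen as removed, it stays removed.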

> And this not real remove, PMD signals us and we stop using that device, but
> device can be there, right?

"Removal" in the sense of "device removal" not "PMD removal" which is
usually described as "unbinding". This was chosen based on the
similarly-named "removal" (RMV) event for consistency.

> If there is a real removal, can be possible to use eal hotplug?

Possibly, although I think it doesn't remove the case for this devop, right?
  
Matan Azrad Jan. 18, 2018, 6:02 p.m. UTC | #3
Hi Ferruh

From: Ferruh Yigit, Thursday, January 18, 2018 7:18 PM
> On 1/18/2018 11:27 AM, Matan Azrad wrote:
> > There is time between the physical removal of the device until PMDs
> > get a RMV interrupt. At this time DPDK PMDs and applications still
> > don't know about the removal.
> >
> > Current removal detection is achieved only by registration to device
> > RMV event and the notification comes asynchronously. So, there is no
> > option to detect a device removal synchronously.
> > Applications and other DPDK entities may want to check a device
> > removal synchronously and to take an immediate decision accordingly.
>
> So we will have two methods to detect device removal, one is asynchronous
> as you mentioned.
> Device removal will cause an interrupt which trigger to run user callback.

Yes.

> New method is synchronous, but still triggered from application. I mean
> application should do a rte_eth_dev_is_removed() to learn about status,
> what is the use case here, polling continuously? Won't this also cause some
> latency unless you dedicate a core just polling device status?
>

It is for applications and for other DPDK entities such as PMDs; see the fail-safe example in this series.
Once hotplug is in play, I think applications can use it too.

> > Add new dev op called is_removed to allow DPDK entities to check an
> > Ethernet device removal status immediately.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> >  lib/librte_ether/rte_ethdev.c           | 28 +++++++++++++++++++++++++---
> >  lib/librte_ether/rte_ethdev.h           | 20 ++++++++++++++++++++
> >  lib/librte_ether/rte_ethdev_version.map |  1 +
> >  3 files changed, 46 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > index b349599..c93cec1 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -114,7 +114,8 @@ enum {
> >  rte_eth_find_next(uint16_t port_id)
> >  {
> >  	while (port_id < RTE_MAX_ETHPORTS &&
> > -	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
> > +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> > +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_REMOVED)
>
> If device is removed, why we are not allowed to re-use port_id assigned to
> it?

Sorry, I don't understand.
We still allow iterating over it here.

> Overall I am not clear with RTE_ETH_DEV_REMOVED state, why we are not
> directly setting RTE_ETH_DEV_UNUSED?

Someone should release the SW port resources before setting it to UNUSED.

> And state RTE_ETH_DEV_REMOVED set in ethdev layer, and ethdev layer
> won't let reusing it, so what changes the state of dev? Will it stay as it is
> during lifetime of the application?
>
> >  		port_id++;
> >
> >  	if (port_id >= RTE_MAX_ETHPORTS)
> > @@ -262,8 +263,7 @@ struct rte_eth_dev *
> >  rte_eth_dev_is_valid_port(uint16_t port_id)
> >  {
> >  	if (port_id >= RTE_MAX_ETHPORTS ||
> > -	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> > -	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
> > +	    (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
> >  		return 0;
> >  	else
> >  		return 1;
> > @@ -1094,6 +1094,28 @@ struct rte_eth_dev *
> >  }
> >
> >  int
> > +rte_eth_dev_is_removed(uint16_t port_id)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	int ret;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> > +
> > +	if (dev->state == RTE_ETH_DEV_REMOVED)
> > +		return 1;
>
> Isn't this conflict with below API documentation:
>

Yes, you are absolutely right; we need to fix this documentation.

> "
>  * @return
>  *   - 0 when the Ethernet device is removed, otherwise 1.
> "
>
> > +
> > +	ret = dev->dev_ops->is_removed(dev);
> > +	if (ret != 0)
> > +		dev->state = RTE_ETH_DEV_REMOVED;
>
> It isn't clear what "dev_ops->is_removed(dev)" should return, and this
> causing incompatible usages in PMDs by time.
> Please add some documentation about expected return values for dev_ops.
>

OK
 
> 

> And this not real remove, PMD signals us and we stop using that device, but
> device can be there, right?

It means that the device has been physically removed, but some software resources have not been released yet.

> If there is a real removal, can be possible to use eal hotplug?

I think EAL hotplug is asynchronous, like the current RMV event, so the EAL hotplug event can be used instead of the RMV event.

> <...>
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b349599..c93cec1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -114,7 +114,8 @@  enum {
 rte_eth_find_next(uint16_t port_id)
 {
 	while (port_id < RTE_MAX_ETHPORTS &&
-	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
+	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
+	       rte_eth_devices[port_id].state != RTE_ETH_DEV_REMOVED)
 		port_id++;
 
 	if (port_id >= RTE_MAX_ETHPORTS)
@@ -262,8 +263,7 @@  struct rte_eth_dev *
 rte_eth_dev_is_valid_port(uint16_t port_id)
 {
 	if (port_id >= RTE_MAX_ETHPORTS ||
-	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
-	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
+	    (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
 		return 0;
 	else
 		return 1;
@@ -1094,6 +1094,28 @@  struct rte_eth_dev *
 }
 
 int
+rte_eth_dev_is_removed(uint16_t port_id)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
+
+	if (dev->state == RTE_ETH_DEV_REMOVED)
+		return 1;
+
+	ret = dev->dev_ops->is_removed(dev);
+	if (ret != 0)
+		dev->state = RTE_ETH_DEV_REMOVED;
+
+	return ret;
+}
+
+int
 rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		       uint16_t nb_rx_desc, unsigned int socket_id,
 		       const struct rte_eth_rxconf *rx_conf,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f0eeefe..18c14e9 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1169,6 +1169,9 @@  struct rte_eth_dcb_info {
 typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
 /** <@internal Function used to reset a configured Ethernet device. */
 
+typedef int (*eth_is_removed_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to detect an Ethernet device removal. */
+
 typedef void (*eth_promiscuous_enable_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to enable the RX promiscuous mode of an Ethernet device. */
 
@@ -1498,6 +1501,8 @@  struct eth_dev_ops {
 	eth_dev_close_t            dev_close;     /**< Close device. */
 	eth_dev_reset_t		   dev_reset;	  /**< Reset device. */
 	eth_link_update_t          link_update;   /**< Get device link state. */
+	eth_is_removed_t           is_removed;
+	/**< Check if the device was physically removed. */
 
 	eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
 	eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
@@ -1684,6 +1689,7 @@  enum rte_eth_dev_state {
 	RTE_ETH_DEV_UNUSED = 0,
 	RTE_ETH_DEV_ATTACHED,
 	RTE_ETH_DEV_DEFERRED,
+	RTE_ETH_DEV_REMOVED,
 };
 
 /**
@@ -1970,6 +1976,20 @@  int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
 void _rte_eth_dev_reset(struct rte_eth_dev *dev);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if an Ethernet device was physically removed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - 0 when the Ethernet device is removed, otherwise 1.
+ */
+int
+rte_eth_dev_is_removed(uint16_t port_id);
+
+/**
  * Allocate and set up a receive queue for an Ethernet device.
  *
  * The function allocates a contiguous block of memory for *nb_rx_desc*
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index e9681ac..88b7908 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -201,6 +201,7 @@  DPDK_17.11 {
 EXPERIMENTAL {
 	global:
 
+	rte_eth_dev_is_removed;
 	rte_mtr_capabilities_get;
 	rte_mtr_create;
 	rte_mtr_destroy;