[dpdk-dev,v2,4/4] net/failsafe: fix removed device handling

Message ID 1513175370-16583-5-git-send-email-matan@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Matan Azrad Dec. 13, 2017, 2:29 p.m. UTC
  There is time between the physical removal of the device until
sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
applications still don't know about the removal and may call sub-device
control operation which should return an error.

In previous code this error is reported to the application contrary to
fail-safe principle that the app should not be aware of device removal.

Add an removal check in each relevant control command error flow and
prevent an error report to application when the sub-device is removed.

Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
Fixes: b737a1e ("net/failsafe: support flow API")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_flow.c    | 18 ++++++++++-------
 drivers/net/failsafe/failsafe_ops.c     | 34 ++++++++++++++++++++++-----------
 drivers/net/failsafe/failsafe_private.h | 10 ++++++++++
 3 files changed, 44 insertions(+), 18 deletions(-)
  

Comments

Gaëtan Rivet Dec. 13, 2017, 3:16 p.m. UTC | #1
Hi Matan,

On Wed, Dec 13, 2017 at 02:29:30PM +0000, Matan Azrad wrote:
> There is time between the physical removal of the device until
> sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
> applications still don't know about the removal and may call sub-device
> control operation which should return an error.
> 
> In previous code this error is reported to the application contrary to
> fail-safe principle that the app should not be aware of device removal.
> 
> Add an removal check in each relevant control command error flow and
> prevent an error report to application when the sub-device is removed.
> 
> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Fixes: b737a1e ("net/failsafe: support flow API")
> Cc: stable@dpdk.org
> 

This patch is not a fix.
It relies on an eth_dev API evolution. Without this evolution,
this patch is meaningless and would break compilation if backported in
stable branch.

Please remove those tags.

> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_flow.c    | 18 ++++++++++-------
>  drivers/net/failsafe/failsafe_ops.c     | 34 ++++++++++++++++++++++-----------
>  drivers/net/failsafe/failsafe_private.h | 10 ++++++++++
>  3 files changed, 44 insertions(+), 18 deletions(-)

< ... >

> +/*
> + * Check if sub device was removed.
> + */
> +static inline int
> +fs_is_removed(struct sub_device *sdev)
> +{
> +	if (sdev->remove == 1 || rte_eth_dev_is_removed(PORT_ID(sdev)) != 0)
> +		return 1;
> +	return 0;
> +}

Have you considered adding this check within the subdev iterator itself?
I think it would prevent you from having to add it to each return value
checks.

It is still MT-unsafe anyway.
  
Matan Azrad Dec. 13, 2017, 3:48 p.m. UTC | #2
Hi Gaetan
Thanks for the review.
Some comments..

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Wednesday, December 13, 2017 5:17 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Thomas Monjalon
> <thomas@monjalon.net>; dev@dpdk.org; stable@dpdk.org
> Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
> 
> Hi Matan,
> 
> On Wed, Dec 13, 2017 at 02:29:30PM +0000, Matan Azrad wrote:
> > There is time between the physical removal of the device until
> > sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
> > applications still don't know about the removal and may call
> > sub-device control operation which should return an error.
> >
> > In previous code this error is reported to the application contrary to
> > fail-safe principle that the app should not be aware of device removal.
> >
> > Add an removal check in each relevant control command error flow and
> > prevent an error report to application when the sub-device is removed.
> >
> > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > Fixes: b737a1e ("net/failsafe: support flow API")
> > Cc: stable@dpdk.org
> >
> 
> This patch is not a fix.
> It relies on an eth_dev API evolution. Without this evolution, this patch is
> meaningless and would break compilation if backported in stable branch.
> 

It is a fix because the bug is finally solved by this patch.
I agree that it cannot be backported itself, but maybe all the series should be backported.
Other idea:
Add new patch which documents the bug and backport it.
Remove it in this patch and remove cc stable from it.
What do you think?

> Please remove those tags.
> 
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_flow.c    | 18 ++++++++++-------
> >  drivers/net/failsafe/failsafe_ops.c     | 34 ++++++++++++++++++++++-----
> ------
> >  drivers/net/failsafe/failsafe_private.h | 10 ++++++++++
> >  3 files changed, 44 insertions(+), 18 deletions(-)
> 
> < ... >
> 
> > +/*
> > + * Check if sub device was removed.
> > + */
> > +static inline int
> > +fs_is_removed(struct sub_device *sdev) {
> > +	if (sdev->remove == 1 || rte_eth_dev_is_removed(PORT_ID(sdev))
> != 0)
> > +		return 1;
> > +	return 0;
> > +}
> 
> Have you considered adding this check within the subdev iterator itself?
> I think it would prevent you from having to add it to each return value
> checks.
> 
> It is still MT-unsafe anyway.
>

This fix doesn't come to solve the MT issue, It comes to solve the error report to application because of removal.
Adding the check in subdev iterator doesn't make sense for this issue.

Matan. 
> --
> Gaëtan Rivet
> 6WIND
  

Patch

diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index 153ceee..1e39d66 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -87,7 +87,7 @@ 
 		DEBUG("Calling rte_flow_validate on sub_device %d", i);
 		ret = rte_flow_validate(PORT_ID(sdev),
 				attr, patterns, actions, error);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_flow_validate failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
@@ -111,7 +111,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		flow->flows[i] = rte_flow_create(PORT_ID(sdev),
 				attr, patterns, actions, error);
-		if (flow->flows[i] == NULL) {
+		if (flow->flows[i] == NULL && fs_is_removed(sdev) == 0) {
 			ERROR("Failed to create flow on sub_device %d",
 				i);
 			goto err;
@@ -150,7 +150,7 @@ 
 			continue;
 		local_ret = rte_flow_destroy(PORT_ID(sdev),
 				flow->flows[i], error);
-		if (local_ret) {
+		if (local_ret && fs_is_removed(sdev) == 0) {
 			ERROR("Failed to destroy flow on sub_device %d: %d",
 					i, local_ret);
 			if (ret == 0)
@@ -175,7 +175,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_flow_flush on sub_device %d", i);
 		ret = rte_flow_flush(PORT_ID(sdev), error);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_flow_flush failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
@@ -199,8 +199,12 @@ 
 
 	sdev = TX_SUBDEV(dev);
 	if (sdev != NULL) {
-		return rte_flow_query(PORT_ID(sdev),
-				flow->flows[SUB_ID(sdev)], type, arg, error);
+		int ret = rte_flow_query(PORT_ID(sdev),
+					 flow->flows[SUB_ID(sdev)],
+					 type, arg, error);
+
+		if (ret && fs_is_removed(sdev) == 0)
+			return ret;
 	}
 	WARN("No active sub_device to query about its flow");
 	return -1;
@@ -223,7 +227,7 @@ 
 			WARN("flow isolation mode of sub_device %d in incoherent state.",
 				i);
 		ret = rte_flow_isolate(PORT_ID(sdev), set, error);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_flow_isolate failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index e16a590..6799b55 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -121,6 +121,8 @@ 
 					dev->data->nb_tx_queues,
 					&dev->data->dev_conf);
 		if (ret) {
+			if (fs_is_removed(sdev) != 0)
+				continue;
 			ERROR("Could not configure sub_device %d", i);
 			return ret;
 		}
@@ -163,8 +165,11 @@ 
 			continue;
 		DEBUG("Starting sub_device %d", i);
 		ret = rte_eth_dev_start(PORT_ID(sdev));
-		if (ret)
+		if (ret) {
+			if (fs_is_removed(sdev) != 0)
+				continue;
 			return ret;
+		}
 		sdev->state = DEV_STARTED;
 	}
 	if (PRIV(dev)->state < DEV_STARTED)
@@ -196,7 +201,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
 		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_set_link_up failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
@@ -215,7 +220,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
 		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_set_link_down failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
@@ -300,7 +305,7 @@ 
 				rx_queue_id,
 				nb_rx_desc, socket_id,
 				rx_conf, mb_pool);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("RX queue setup failed for sub_device %d", i);
 			goto free_rxq;
 		}
@@ -366,7 +371,7 @@ 
 				tx_queue_id,
 				nb_tx_desc, socket_id,
 				tx_conf);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("TX queue setup failed for sub_device %d", i);
 			goto free_txq;
 		}
@@ -445,7 +450,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling link_update on sub_device %d", i);
 		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
-		if (ret && ret != -1) {
+		if (ret && ret != -1 && fs_is_removed(sdev) == 0) {
 			ERROR("Link update failed for sub_device %d with error %d",
 			      i, ret);
 			return ret;
@@ -469,6 +474,7 @@ 
 fs_stats_get(struct rte_eth_dev *dev,
 	     struct rte_eth_stats *stats)
 {
+	struct rte_eth_stats backup;
 	struct sub_device *sdev;
 	uint8_t i;
 	int ret;
@@ -478,14 +484,20 @@ 
 		struct rte_eth_stats *snapshot = &sdev->stats_snapshot.stats;
 		uint64_t *timestamp = &sdev->stats_snapshot.timestamp;
 
+		rte_memcpy(&backup, snapshot, sizeof(backup));
 		ret = rte_eth_stats_get(PORT_ID(sdev), snapshot);
 		if (ret) {
+			if (fs_is_removed(sdev) != 0) {
+				rte_memcpy(snapshot, &backup, sizeof(backup));
+				goto inc;
+			}
 			ERROR("Operation rte_eth_stats_get failed for sub_device %d with error %d",
 				  i, ret);
 			*timestamp = 0;
 			return ret;
 		}
 		*timestamp = rte_rdtsc();
+inc:
 		failsafe_stats_increment(stats, snapshot);
 	}
 	return 0;
@@ -598,7 +610,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
 		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d with error %d",
 			      i, ret);
 			return ret;
@@ -617,7 +629,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
 		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_vlan_filter failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
@@ -651,7 +663,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
 		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_flow_ctrl_set failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
@@ -688,7 +700,7 @@ 
 	RTE_ASSERT(index < FAILSAFE_MAX_ETHADDR);
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_mac_addr_add failed for sub_device %"
 			      PRIu8 " with error %d", i, ret);
 			return ret;
@@ -730,7 +742,7 @@ 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Calling rte_eth_dev_filter_ctrl on sub_device %d", i);
 		ret = rte_eth_dev_filter_ctrl(PORT_ID(sdev), type, op, arg);
-		if (ret) {
+		if (ret && fs_is_removed(sdev) == 0) {
 			ERROR("Operation rte_eth_dev_filter_ctrl failed for sub_device %d"
 			      " with error %d", i, ret);
 			return ret;
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d81cc3c..0539782 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -375,4 +375,14 @@  int failsafe_eth_lsc_event_callback(uint16_t port_id,
 	rte_wmb();
 }
 
+/*
+ * Check if sub device was removed.
+ */
+static inline int
+fs_is_removed(struct sub_device *sdev)
+{
+	if (sdev->remove == 1 || rte_eth_dev_is_removed(PORT_ID(sdev)) != 0)
+		return 1;
+	return 0;
+}
 #endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */