[dpdk-dev,v2,1/2] net/failsafe: fix removed sub-device cleanup
Commit Message
The fail-safe PMD registers for the RMV event of each removable sub-device
port in order to clean up the sub-device resources and switch the Tx
sub-device as soon as it is plugged out.

During removal, the fail-safe PMD stops and closes the sub-device
but does not unregister the LSC and RMV callbacks of the sub-device
port.

This can cause the callbacks to be called for a port that is no longer
associated with the fail-safe sub-device, because there is no
guarantee that a sub-device gets the same port ID on each plug-in.
That port may, for example, belong to another sub-device of a
different fail-safe device.

Unregister the LSC and RMV callbacks of sub-devices that are no longer
in use.

Fixes: 598fb8aec6f6 ("net/failsafe: support device removal")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/failsafe/failsafe_ether.c | 22 ++++++++++++++++++++++
drivers/net/failsafe/failsafe_ops.c | 5 +++++
drivers/net/failsafe/failsafe_private.h | 5 +++++
3 files changed, 32 insertions(+)
V2:
Improve the commit log and add code comments for the new sub-device fields (suggested by Ophir).
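The failure mode described in the commit message can be reproduced with a small, self-contained model. This is a sketch under stated assumptions, not DPDK code: `model_cbs`, `model_plug_in()`, `model_remove()` and `model_raise_event()` are hypothetical stand-ins for the ethdev per-port callback list, callback registration, sub-device removal and event delivery. The model shows a callback left registered on port 0 firing for a new owner of that port ID. It then runs the same sequence with the callback unregistered during removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not DPDK code: a flat callback table keyed by
 * port ID, standing in for the per-port callback lists in ethdev. */
#define MODEL_MAX_CBS 8

typedef void (*model_cb_fn)(int port, void *arg);

struct model_cb {
	int used;
	int port;
	model_cb_fn fn;
	void *arg;
};

static struct model_cb model_cbs[MODEL_MAX_CBS];

struct model_sdev {
	int port;                    /* port ID assigned at plug-in time */
	int events_seen;             /* events delivered to this sub-device */
	unsigned int rmv_callback:1; /* registration state, as in the patch */
};

static void model_reset(void)
{
	for (int i = 0; i < MODEL_MAX_CBS; i++)
		model_cbs[i].used = 0;
}

static void model_on_event(int port, void *arg)
{
	struct model_sdev *sdev = arg;

	(void)port;
	sdev->events_seen++;
}

/* Register the sub-device's callback on its port and remember it. */
static int model_plug_in(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	for (int i = 0; i < MODEL_MAX_CBS; i++) {
		if (!model_cbs[i].used) {
			model_cbs[i] = (struct model_cb){ 1, sdev->port,
							  model_on_event, sdev };
			sdev->rmv_callback = 1;
			return 0;
		}
	}
	return -1; /* table full */
}

/* With fixed == 0 only the port is "closed", as before the patch;
 * with fixed != 0 the registered callback is dropped first. */
static void model_remove(struct model_sdev *sdev, int fixed)
{
	if (fixed && sdev->rmv_callback) {
		for (int i = 0; i < MODEL_MAX_CBS; i++)
			if (model_cbs[i].used &&
			    model_cbs[i].port == sdev->port &&
			    model_cbs[i].fn == model_on_event &&
			    model_cbs[i].arg == sdev)
				model_cbs[i].used = 0;
		sdev->rmv_callback = 0;
	}
	/* The port ID is now free and may be handed to another device. */
}

/* Deliver an event on a port to every callback registered for it. */
static void model_raise_event(int port)
{
	for (int i = 0; i < MODEL_MAX_CBS; i++)
		if (model_cbs[i].used && model_cbs[i].port == port)
			model_cbs[i].fn(port, model_cbs[i].arg);
}
```

With `fixed == 0`, an event raised on port 0 after the port ID has been reused reaches both the new sub-device and the removed one. With `fixed != 0`, only the new owner sees it.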
Comments
Hello Matan,
On Mon, May 21, 2018 at 07:48:03PM +0000, Matan Azrad wrote:
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index 733e95d..2bbee82 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -260,6 +260,7 @@
>  		sdev->state = DEV_ACTIVE;
>  		/* fallthrough */
>  	case DEV_ACTIVE:
> +		failsafe_eth_dev_unregister_callbacks(sdev);
>  		rte_eth_dev_close(PORT_ID(sdev));
>  		sdev->state = DEV_PROBED;
>  		/* fallthrough */
> @@ -321,6 +322,27 @@
>  }
>  
>  void
> +failsafe_eth_dev_unregister_callbacks(struct sub_device *sdev)
> +{
> +	if (sdev == NULL)
> +		return;
> +	if (sdev->rmv_callback) {
> +		rte_eth_dev_callback_unregister(PORT_ID(sdev),
> +						RTE_ETH_EVENT_INTR_RMV,
> +						failsafe_eth_rmv_event_callback,
> +						sdev);
> +		sdev->rmv_callback = 0;
I agree with Ophir here: either the return value should not be ignored,
and rmv_callback should only be set to 0 on success, or a proper
justification (and an accompanying comment) should be given.

The issue I see is that, on error, nothing would ever retry
unregistering the callback.

Maybe this could be added in failsafe_dev_remove()? Something like

FOREACH_SUBDEV(sdev, i, dev) {
	if (sdev->rmv_callback && sdev->state <= DEV_PROBED)
		if (rte_eth_dev_callback_unregister(...) == 0)
			sdev->rmv_callback = 0;
	/* same for lsc_callback */
}

Does it make sense to you? Do you think this is necessary, or should we
ignore this?
Thanks,
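The retry idea above can be sketched as follows. The sketch assumes that a failed unregistration leaves the flag set, and that a later pass (for example, from the removal routine) walks all sub-devices. `model_unregister()` is a hypothetical stub for `rte_eth_dev_callback_unregister()` that returns `-EAGAIN` a configurable number of times. The `MODEL_*` state names are also hypothetical; they only mirror the ordering implied by `sdev->state <= DEV_PROBED`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the retry idea; not fail-safe PMD code.
 * States are ordered so that "<= MODEL_PROBED" means "not active". */
enum model_state { MODEL_UNDEFINED, MODEL_PARSED, MODEL_PROBED, MODEL_ACTIVE };

struct model_sdev {
	enum model_state state;
	int eagain_left;             /* attempts that should still fail */
	unsigned int rmv_callback:1; /* set while the callback is registered */
};

/* Stand-in for rte_eth_dev_callback_unregister(): fails with -EAGAIN
 * a configurable number of times, then succeeds. */
static int model_unregister(struct model_sdev *sdev)
{
	if (sdev->eagain_left > 0) {
		sdev->eagain_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Close path: clear the flag only when unregistration succeeded. */
static void model_close(struct model_sdev *sdev)
{
	if (sdev->rmv_callback && model_unregister(sdev) == 0)
		sdev->rmv_callback = 0;
	sdev->state = MODEL_PROBED;
}

/* Later pass: retry for every sub-device that is no longer active
 * but still has a callback registered. */
static void model_retry_unregister(struct model_sdev *sdevs, int n)
{
	for (int i = 0; i < n; i++) {
		struct model_sdev *sdev = &sdevs[i];

		if (sdev->rmv_callback && sdev->state <= MODEL_PROBED &&
		    model_unregister(sdev) == 0)
			sdev->rmv_callback = 0;
	}
}
```

A sub-device whose first unregistration attempt fails keeps `rmv_callback` set through the close path. The later pass then clears the flag once the stub stops returning `-EAGAIN`.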
Hi Gaetan
From: Gaëtan Rivet
> Hello Matan,
>
> On Mon, May 21, 2018 at 07:48:03PM +0000, Matan Azrad wrote:
> >  void
> > +failsafe_eth_dev_unregister_callbacks(struct sub_device *sdev)
> > +{
> > +	if (sdev == NULL)
> > +		return;
> > +	if (sdev->rmv_callback) {
> > +		rte_eth_dev_callback_unregister(PORT_ID(sdev),
> > +						RTE_ETH_EVENT_INTR_RMV,
> > +						failsafe_eth_rmv_event_callback,
> > +						sdev);
> > +		sdev->rmv_callback = 0;
>
> I agree with Ophir here, either the return value should not be ignored, and
> rmv_callback should only be set to 0 on success, or a proper justification (and
> an accompanying comment) should be given.
>
> The issue I could see is that even on error, there won't be a process to try again
> unregistering the callback.
>
> Maybe this could be added in failsafe_dev_remove()? Something like
>
> FOREACH_SUBDEV(sdev, i, dev) {
> 	if (sdev->rmv_callback && sdev->state <= DEV_PROBED)
> 		if (rte_eth_dev_callback_unregister(...) == 0)
> 			sdev->rmv_callback = 0;
> 	/* same for lsc_callback */
> }
>
> Does it make sense to you? Do you think this is necessary, or should we ignore
> this?
The RMV/LSC event callbacks are called from the host thread, and the removal process also runs in the host thread, so I don't expect EAGAIN at removal time.
Any other error (EINVAL) would be returned again on every attempt and probably points to a different, critical issue.

Is a code comment explaining the above enough, or do you think we still need to check it?
> Thanks,
> --
> Gaëtan Rivet
> 6WIND
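The argument that `EAGAIN` cannot occur at removal time depends on how unregistration interacts with a running callback. The model below assumes, based on our reading of the ethdev implementation, that unregistering returns `-EAGAIN` only while the matching callback is executing and `-EINVAL` for invalid arguments. All names are hypothetical. The model shows that unregistering from the dispatching thread between dispatches succeeds, while unregistering from inside the callback itself hits `-EAGAIN`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical single-slot model of the assumed -EAGAIN rule; not DPDK
 * code. Unregistering fails only while that callback is running. */
typedef void (*model_cb_fn)(void *arg);

static struct {
	model_cb_fn fn;
	void *arg;
	int active; /* non-zero only while the dispatcher runs fn */
} model_slot;

static void model_register(model_cb_fn fn, void *arg)
{
	model_slot.fn = fn;
	model_slot.arg = arg;
	model_slot.active = 0;
}

static int model_unregister(model_cb_fn fn, void *arg)
{
	if (fn == NULL)
		return -EINVAL; /* bad argument: every retry fails the same way */
	if (model_slot.fn != fn || model_slot.arg != arg)
		return 0;       /* nothing matching is registered */
	if (model_slot.active)
		return -EAGAIN; /* callback is executing right now */
	model_slot.fn = NULL;
	model_slot.arg = NULL;
	return 0;
}

/* Runs the callback on the calling thread, flagging it active meanwhile. */
static void model_dispatch(void)
{
	if (model_slot.fn == NULL)
		return;
	model_slot.active = 1;
	model_slot.fn(model_slot.arg);
	model_slot.active = 0;
}

static void model_noop_cb(void *arg)
{
	(void)arg;
}

/* Records the result when a callback tries to unregister itself. */
static int model_self_result = 1;

static void model_self_unregister_cb(void *arg)
{
	model_self_result = model_unregister(model_self_unregister_cb, arg);
}
```

Under this model, the same-thread reasoning holds as long as the unregistration is not performed from inside the RMV or LSC callback. An `-EINVAL` result, by contrast, repeats on every retry, consistent with the point that it signals a different problem.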
On Tue, May 22, 2018 at 10:19:14AM +0000, Matan Azrad wrote:
> Hi Gaetan
>
> The RMV/LSC event callbacks are called from the host thread, and the removal process also runs in the host thread, so I don't expect EAGAIN at removal time.
> Any other error (EINVAL) would be returned again on every attempt and probably points to a different, critical issue.
>
> Is a code comment explaining the above enough, or do you think we still need to check it?
>
>
OK, that makes sense.

If EINVAL is possible, however, I think a warning would help make the
user aware of the issue. The callback flag would then be meaningless
anyway.
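The approach agreed here can be sketched as follows: warn on failure, then clear the flag, because a retry would not help. This is a sketch, not the V3 code. `MODEL_WARN()` stands in for the PMD's `WARN()` macro, and `model_unregister()` is a hypothetical stub for `rte_eth_dev_callback_unregister()`. The returned error is injected through a test field.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical sketch of "warn, then clear"; MODEL_WARN stands in for
 * the PMD's WARN() macro and model_unregister() for the ethdev call. */
#define MODEL_WARN(...) (fprintf(stderr, __VA_ARGS__), fputc('\n', stderr))

struct model_sdev {
	int sub_id;
	int unregister_ret;          /* what the stub returns, for testing */
	unsigned int rmv_callback:1;
	unsigned int lsc_callback:1;
};

static int model_warnings; /* number of warnings emitted so far */

static int model_unregister(const struct model_sdev *sdev)
{
	return sdev->unregister_ret;
}

static void model_unregister_callbacks(struct model_sdev *sdev)
{
	if (sdev == NULL)
		return;
	if (sdev->rmv_callback) {
		if (model_unregister(sdev) != 0) {
			MODEL_WARN("Failed to unregister RMV callback for sub_device %d",
				   sdev->sub_id);
			model_warnings++;
		}
		/* A retry would hit the same error, so clear the flag anyway. */
		sdev->rmv_callback = 0;
	}
	if (sdev->lsc_callback) {
		if (model_unregister(sdev) != 0) {
			MODEL_WARN("Failed to unregister LSC callback for sub_device %d",
				   sdev->sub_id);
			model_warnings++;
		}
		sdev->lsc_callback = 0;
	}
}
```

Clearing the flag unconditionally keeps the function simple and avoids a retry loop that could never succeed. The warning still leaves a trace for the user.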
From: Gaëtan Rivet
> On Tue, May 22, 2018 at 10:19:14AM +0000, Matan Azrad wrote:
> OK, that makes sense.
>
> If EINVAL is possible, however, I think a warning would help make the
> user aware of the issue. The callback flag would then be meaningless
> anyway.
OK, thanks. V3 is coming.
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 733e95d..2bbee82 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -260,6 +260,7 @@
 		sdev->state = DEV_ACTIVE;
 		/* fallthrough */
 	case DEV_ACTIVE:
+		failsafe_eth_dev_unregister_callbacks(sdev);
 		rte_eth_dev_close(PORT_ID(sdev));
 		sdev->state = DEV_PROBED;
 		/* fallthrough */
@@ -321,6 +322,27 @@
 }
 
 void
+failsafe_eth_dev_unregister_callbacks(struct sub_device *sdev)
+{
+	if (sdev == NULL)
+		return;
+	if (sdev->rmv_callback) {
+		rte_eth_dev_callback_unregister(PORT_ID(sdev),
+						RTE_ETH_EVENT_INTR_RMV,
+						failsafe_eth_rmv_event_callback,
+						sdev);
+		sdev->rmv_callback = 0;
+	}
+	if (sdev->lsc_callback) {
+		rte_eth_dev_callback_unregister(PORT_ID(sdev),
+						RTE_ETH_EVENT_INTR_LSC,
+						failsafe_eth_lsc_event_callback,
+						sdev);
+		sdev->lsc_callback = 0;
+	}
+}
+
+void
 failsafe_dev_remove(struct rte_eth_dev *dev)
 {
 	struct sub_device *sdev;
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -146,6 +146,8 @@
 			if (ret)
 				WARN("Failed to register RMV callback for sub_device %d",
 				     SUB_ID(sdev));
+			else
+				sdev->rmv_callback = 1;
 		}
 		dev->data->dev_conf.intr_conf.rmv = 0;
 		if (lsc_interrupt) {
@@ -156,6 +158,8 @@
 			if (ret)
 				WARN("Failed to register LSC callback for sub_device %d",
 				     SUB_ID(sdev));
+			else
+				sdev->lsc_callback = 1;
 		}
 		dev->data->dev_conf.intr_conf.lsc = lsc_enabled;
 		sdev->state = DEV_ACTIVE;
@@ -282,6 +286,7 @@
 	PRIV(dev)->state = DEV_ACTIVE - 1;
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Closing sub_device %d", i);
+		failsafe_eth_dev_unregister_callbacks(sdev);
 		rte_eth_dev_close(PORT_ID(sdev));
 		sdev->state = DEV_ACTIVE - 1;
 	}
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -119,6 +119,10 @@ struct sub_device {
 	volatile unsigned int remove:1;
 	/* flow isolation state */
 	int flow_isolated:1;
+	/* RMV callback registration state */
+	unsigned int rmv_callback:1;
+	/* LSC callback registration state */
+	unsigned int lsc_callback:1;
 };
 
 struct fs_priv {
@@ -211,6 +215,7 @@ uint16_t failsafe_tx_burst_fast(void *txq,
 /* ETH_DEV */
 
 int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
+void failsafe_eth_dev_unregister_callbacks(struct sub_device *sdev);
 void failsafe_dev_remove(struct rte_eth_dev *dev);
 void failsafe_stats_increment(struct rte_eth_stats *to,
 				struct rte_eth_stats *from);
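The helper added by the patch is reachable from both the close path and the removal path, so it matters that the registration flags make it safe to call more than once. The following hypothetical model mirrors the patch's flag handling. A flag is set only when registration succeeds, and the unregister helper acts only on set flags and clears them. The `model_*` names and the stub counters are illustrative and do not correspond to any DPDK API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the patch's registration flags. The counters
 * track registered callbacks and unregister calls reaching the stub. */
static int model_registered;
static int model_unregister_calls;

struct model_sdev {
	unsigned int rmv_callback:1;
	unsigned int lsc_callback:1;
};

/* Stub register: fails when asked to, otherwise records one callback. */
static int model_register(int should_fail)
{
	if (should_fail)
		return -1;
	model_registered++;
	return 0;
}

static void model_unregister(void)
{
	model_unregister_calls++;
	model_registered--;
}

/* Like the configure hunks: set each flag only if registration worked. */
static void model_configure(struct model_sdev *sdev, int rmv_fails,
			    int lsc_fails)
{
	if (model_register(rmv_fails) == 0)
		sdev->rmv_callback = 1;
	if (model_register(lsc_fails) == 0)
		sdev->lsc_callback = 1;
}

/* Like failsafe_eth_dev_unregister_callbacks(): act only on set flags
 * and clear them, so a second call is a no-op. */
static void model_unregister_callbacks(struct model_sdev *sdev)
{
	if (sdev == NULL)
		return;
	if (sdev->rmv_callback) {
		model_unregister();
		sdev->rmv_callback = 0;
	}
	if (sdev->lsc_callback) {
		model_unregister();
		sdev->lsc_callback = 0;
	}
}
```

If LSC registration fails, only the RMV callback is later unregistered. A second call to the helper issues no further unregister calls.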