[dpdk-dev] net/failsafe: improve stats accuracy

Message ID 1508139710-12798-1-git-send-email-matan@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Matan Azrad Oct. 16, 2017, 7:41 a.m. UTC
  The stats_get API was changed to return error in case of failure at
stats getting process time.
By this way, failsafe can get stats snapshot in removal process for
each PMD which can give stats after removal event.

This patch implements ultimate stats snapshot on removal time by
trying to get the removed device stats before calling to dev_close.

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_ether.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
  

Comments

Gaëtan Rivet Oct. 16, 2017, 8:27 a.m. UTC | #1
Hey Matan,

Thanks for the patch. Here are some remarks:

On Mon, Oct 16, 2017 at 07:41:50AM +0000, Matan Azrad wrote:
> The stats_get API was changed to return error in case of failure at
> stats getting process time.
> By this way, failsafe can get stats snapshot in removal process for
> each PMD which can give stats after removal event.
> 
> This patch implements ultimate stats snapshot on removal time by
> trying to get the removed device stats before calling to dev_close.
> 

I would reformulate the commit log as such:

The stats_get API has changed to signal a potential failure to read
stats. Furthermore, some PMDs are able to provide statistics even
after a removal event occured.

Considering this, the fail-safe can try to access the latest statistics
of a PMD to improve statistics readings.

Attempt an ultimate statistics read on removal time;
if that fails, use the latest recorded snapshot.

> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index f4db423..2758d4c 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -312,8 +312,12 @@
>  static void
>  fs_dev_stats_save(struct sub_device *sdev)
>  {
> +	struct rte_eth_stats stats;
> +
> +	/* Get stats now or take it from last snapshot. */
>  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> -			&sdev->stats_snapshot);
> +		rte_eth_stats_get(PORT_ID(sdev), &stats) ?
> +		&sdev->stats_snapshot : &stats);

The conditional is awkward within the parameter list here.
Something like:

    struct rte_eth_stats stats;
    int err;

    /* Attempt to read current stats. */
    err = rte_eth_stats_get(PORT_ID(sdev), &stats);
    if (err)
        WARN("Could not access latest statistics from sub-device %d, using latest snapshot",
             SUB_ID(sdev));
    /* Use either current stats or latest snapshot. */
    failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
                             err ? &sdev->stats_snapshot : &stats);

Seems easier to read and allows warning the user of potential inaccuracies.
On that matter, it may be interesting to measure the time since the last snapshot
and display it as well.

>  	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
>  }
>  
> -- 
> 1.8.3.1
>
  
Matan Azrad Oct. 16, 2017, 8:51 a.m. UTC | #2
Hi Geatan
> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Monday, October 16, 2017 11:28 AM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] net/failsafe: improve stats accuracy
> 
> Hey Matan,
> 
> Thanks for the patch. Here are some remarks:
> 
> On Mon, Oct 16, 2017 at 07:41:50AM +0000, Matan Azrad wrote:
> > The stats_get API was changed to return error in case of failure at
> > stats getting process time.
> > By this way, failsafe can get stats snapshot in removal process for
> > each PMD which can give stats after removal event.
> >
> > This patch implements ultimate stats snapshot on removal time by
> > trying to get the removed device stats before calling to dev_close.
> >
> 
> I would reformulate the commit log as such:
> 
> The stats_get API has changed to signal a potential failure to read stats.
> Furthermore, some PMDs are able to provide statistics even after a removal
> event occured.
> 
> Considering this, the fail-safe can try to access the latest statistics of a PMD to
> improve statistics readings.
> 
> Attempt an ultimate statistics read on removal time; if that fails, use the
> latest recorded snapshot.
> 
Looks ok for me, thanks.

> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index f4db423..2758d4c 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -312,8 +312,12 @@
> >  static void
> >  fs_dev_stats_save(struct sub_device *sdev)  {
> > +	struct rte_eth_stats stats;
> > +
> > +	/* Get stats now or take it from last snapshot. */
> >  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> > -			&sdev->stats_snapshot);
> > +		rte_eth_stats_get(PORT_ID(sdev), &stats) ?
> > +		&sdev->stats_snapshot : &stats);
> 
> The conditional is awkward within the parameter list here.
> Something like:
> 
>     struct rte_eth_stats stats;
>     int err;
> 
>     /* Attempt to read current stats. */
>     err = rte_eth_stats_get(PORT_ID(sdev), &stats);
>     if (err)
>         WARN("Could not access latest statistics from sub-device %d, using latest
> snapshot",
>              SUB_ID(sdev));
>     /* Use either current stats or latest snapshot. */
>     failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
>                              err ? &sdev->stats_snapshot : &stats);
> 
> Seems easier to read and allows warning the user of potential inaccuracies.
> On that matter, it may be interesting to measure the time since the last
> snapshot and display it as well.
>
OK, I think it is good idea, I will try to add this here.
 
> >  	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));  }
> >
> > --
> > 1.8.3.1
> >
> 
> --
> Gaëtan Rivet
> 6WIND
  

Patch

diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index f4db423..2758d4c 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -312,8 +312,12 @@ 
 static void
 fs_dev_stats_save(struct sub_device *sdev)
 {
+	struct rte_eth_stats stats;
+
+	/* Get stats now or take it from last snapshot. */
 	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
-			&sdev->stats_snapshot);
+		rte_eth_stats_get(PORT_ID(sdev), &stats) ?
+		&sdev->stats_snapshot : &stats);
 	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
 }