net/bonding: ensure fairness among slaves

Message ID 20180919154825.5183-1-3chas3@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series net/bonding: ensure fairness among slaves

Checks

Context               Check    Description
ci/checkpatch         warning  coding style issues
ci/Intel-compilation  success  Compilation OK

Commit Message

Chas Williams Sept. 19, 2018, 3:48 p.m. UTC
  From: Chas Williams <chas3@att.com>

Some PMDs, especially ones with vector receives, require a minimum number
of receive buffers in order to receive any packets.  If the first slave
read leaves fewer than this number available, a read from the next slave
may return 0, implying that the slave doesn't have any packets, which
results in skipping over that slave as the next active slave.

To fix this, implement round robin for the slaves during receive that
is only advanced to the next slave at the end of each receive burst.
This should also provide some additional fairness in processing in
bond_ethdev_rx_burst.

Fixes: 2efb58cbab6e ("bond: new link bonding library")
Cc: stable@dpdk.org

Signed-off-by: Chas Williams <chas3@att.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 50 ++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 18 deletions(-)
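
The round robin described in the commit message can be sketched outside
DPDK roughly as follows.  This is a minimal simulation under stated
assumptions, not the driver code: sim_slave, sim_bond, sim_rx_burst and
sim_bond_rx_burst are hypothetical stand-ins for the slave ports,
struct bond_dev_private, rte_eth_rx_burst and bond_ethdev_rx_burst, and
each slave is reduced to a counter of pending packets.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLAVES 8	/* arbitrary limit for this sketch */

/* Hypothetical stand-in for a slave port: only a count of queued packets. */
struct sim_slave {
	uint16_t pending;
};

/* Hypothetical stand-in for the bonded device's private state. */
struct sim_bond {
	struct sim_slave slaves[MAX_SLAVES];
	uint16_t slave_count;
	uint16_t active_slave;	/* slave that starts the next burst */
};

/* Stand-in for rte_eth_rx_burst(): take up to nb_pkts from one slave. */
static uint16_t
sim_rx_burst(struct sim_slave *s, uint16_t nb_pkts)
{
	uint16_t n = s->pending < nb_pkts ? s->pending : nb_pkts;

	s->pending -= n;
	return n;
}

/*
 * Poll every slave at most once, starting at b->active_slave.  The stored
 * starting slave advances by exactly one per burst, regardless of how many
 * packets each slave returned.  per_slave[] accumulates packets per slave.
 */
static uint16_t
sim_bond_rx_burst(struct sim_bond *b, uint16_t nb_pkts, uint16_t *per_slave)
{
	uint16_t num_rx_total = 0;
	uint16_t slave_count = b->slave_count;
	uint16_t active_slave = b->active_slave;
	uint16_t i;

	/* The sketch assumes at least one slave and a valid start index. */
	assert(slave_count > 0 && slave_count <= MAX_SLAVES);
	assert(active_slave < slave_count);

	for (i = 0; i < slave_count && nb_pkts; i++) {
		uint16_t n = sim_rx_burst(&b->slaves[active_slave], nb_pkts);

		per_slave[active_slave] += n;
		num_rx_total += n;
		nb_pkts -= n;
		if (++active_slave == slave_count)
			active_slave = 0;
	}

	/* Advance the starting slave only once, at the end of the burst. */
	if (++b->active_slave == slave_count)
		b->active_slave = 0;
	return num_rx_total;
}
```

With three busy slaves and a burst size of 32, three consecutive calls
start at slaves 0, 1 and 2 in turn, so each slave gets the first (and
largest) share of one burst.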
  

Comments

Luca Boccassi Sept. 19, 2018, 4:06 p.m. UTC | #1
On Wed, 2018-09-19 at 11:48 -0400, Chas Williams wrote:
> From: Chas Williams <chas3@att.com>
> 
> Some PMDs, especially ones with vector receives, require a minimum number
> of receive buffers in order to receive any packets.  If the first slave
> read leaves less than this number available, a read from the next slave
> may return 0 implying that the slave doesn't have any packets which
> results in skipping over that slave as the next active slave.
> 
> To fix this, implement round robin for the slaves during receive that
> is only advanced to the next slave at the end of each receive burst.
> This should also provide some additional fairness in processing in
> bond_ethdev_rx_burst as well.
> 
> Fixes: 2efb58cbab6e ("bond: new link bonding library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chas Williams <chas3@att.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 50 ++++++++++++++++++++++------------
>  1 file changed, 32 insertions(+), 18 deletions(-)

Acked-by: Luca Boccassi <bluca@debian.org>
  
Matan Azrad Sept. 20, 2018, 6:28 a.m. UTC | #2
Hi Chas,
Please see a few small comments below.

> From: Chas Williams 
> Some PMDs, especially ones with vector receives, require a minimum
> number of receive buffers in order to receive any packets.  If the first slave
> read leaves less than this number available, a read from the next slave may
> return 0 implying that the slave doesn't have any packets which results in
> skipping over that slave as the next active slave.

This is true not only when the read returns 0.
It makes sense that the first polled slave gets the majority of the burst,
while the others get only a smaller part.
I suggest rephrasing this to describe the general issue.

> 
> To fix this, implement round robin for the slaves during receive that is only
> advanced to the next slave at the end of each receive burst.
> This should also provide some additional fairness in processing in
> bond_ethdev_rx_burst as well.
> 
> Fixes: 2efb58cbab6e ("bond: new link bonding library")

If it is a fix, why not use a fix title?
Maybe:
net/bonding: fix the slaves Rx fairness

> Cc: stable@dpdk.org
> 
> Signed-off-by: Chas Williams <chas3@att.com>
Besides that:
Acked-by: Matan Azrad <matan@mellanox.com>

  
Chas Williams Sept. 20, 2018, 12:47 p.m. UTC | #3
On Thu, Sep 20, 2018 at 2:28 AM Matan Azrad <matan@mellanox.com> wrote:
>
> Hi Chas
> Please see small comments.
>
> > From: Chas Williams
> > Some PMDs, especially ones with vector receives, require a minimum
> > number of receive buffers in order to receive any packets.  If the first slave
> > read leaves less than this number available, a read from the next slave may
> > return 0 implying that the slave doesn't have any packets which results in
> > skipping over that slave as the next active slave.
>
> It is true not only in case of 0.
> It makes sense that the first polling slave gets the majority part of the burst while the others just get smaller part
> I suggest to rephrase to the general issue .

It doesn't happen for the 802.3ad burst routines in general.  If you run out
of buffers, then you don't advance to the next slave, and that slave picks up
where you left off during the next rx burst.  If some slave is attempting to
do this, it will consume all the buffers, and you will be at the next slave
for the next rx and all is well.  There are just some odd corner cases where
you read just slow (or fast?) enough that the first slave leaves just a few
buffers.  Reading the next slave then results in a 0 (because of the vector
RX), and you don't loop back around to the first slave.  So next time around
you start back at the troublesome slave.

The fix for the other RX burst routines is just completeness.
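
The corner case above can be reproduced in a small simulation.  This is
only a sketch under assumptions: sim_port, sim_vec_rx, sim_rx_old,
sim_rx_new and sim_port1_share are hypothetical names, the minimum burst
of 4 is an assumed value for a vector receive, and port 0 is refilled to
30 packets before every burst of 32 so that it always leaves 2 buffers.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_MIN_BURST 4	/* assumed minimum burst a vector receive accepts */

/* Hypothetical slave port: only a count of queued packets. */
struct sim_port {
	uint16_t pending;
};

/* Vector-style receive: returns 0 when offered too few buffers. */
static uint16_t
sim_vec_rx(struct sim_port *p, uint16_t nb_pkts)
{
	uint16_t n;

	if (nb_pkts < SIM_MIN_BURST)
		return 0;
	n = p->pending < nb_pkts ? p->pending : nb_pkts;
	p->pending -= n;
	return n;
}

/* Old policy: the next burst starts wherever the loop index stopped. */
static uint16_t
sim_rx_old(struct sim_port *ports, uint16_t count, uint16_t *start,
		uint16_t nb_pkts, uint16_t *per_port)
{
	uint16_t total = 0, i, idx;

	for (i = 0, idx = *start; i < count && total < nb_pkts; i++, idx++) {
		uint16_t n;

		idx = idx % count;
		n = sim_vec_rx(&ports[idx], nb_pkts - total);
		per_port[idx] += n;
		total += n;
	}
	*start = idx;
	return total;
}

/* New policy: the starting port advances by exactly one per burst. */
static uint16_t
sim_rx_new(struct sim_port *ports, uint16_t count, uint16_t *start,
		uint16_t nb_pkts, uint16_t *per_port)
{
	uint16_t total = 0, i, idx = *start;

	for (i = 0; i < count && nb_pkts; i++) {
		uint16_t n = sim_vec_rx(&ports[idx], nb_pkts);

		per_port[idx] += n;
		total += n;
		nb_pkts -= n;
		if (++idx == count)
			idx = 0;
	}
	if (++*start == count)
		*start = 0;
	return total;
}

/* Run `rounds` bursts of 32; return how many packets came from port 1. */
static uint16_t
sim_port1_share(int use_new, int rounds)
{
	struct sim_port ports[2] = { {0}, {1000} };
	uint16_t per_port[2] = {0, 0};
	uint16_t start = 0;
	int r;

	for (r = 0; r < rounds; r++) {
		ports[0].pending = 30;	/* port 0 always leaves 2 buffers */
		if (use_new)
			sim_rx_new(ports, 2, &start, 32, per_port);
		else
			sim_rx_old(ports, 2, &start, 32, per_port);
	}
	return per_port[1];
}
```

Under these assumptions, sim_port1_share(0, 10) never reads a packet
from port 1, because every burst starts back at port 0, while
sim_port1_share(1, 10) lets port 1 start every other burst.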

>
> >
> > To fix this, implement round robin for the slaves during receive that is only
> > advanced to the next slave at the end of each receive burst.
> > This should also provide some additional fairness in processing in
> > bond_ethdev_rx_burst as well.
> >
> > Fixes: 2efb58cbab6e ("bond: new link bonding library")
>
> If it is a fix, why not to use a fix title?
> Maybe
> net/bonding: fix the slaves Rx fairness

I can use the word fix I suppose.

>
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Chas Williams <chas3@att.com>
> Besides that:
> Acked-by: Matan Azrad <matan@mellanox.com>
>
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index b84f32263..f25faa75c 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -58,28 +58,33 @@  bond_ethdev_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
 	struct bond_dev_private *internals;
 
-	uint16_t num_rx_slave = 0;
 	uint16_t num_rx_total = 0;
-
+	uint16_t slave_count;
+	uint16_t active_slave;
 	int i;
 
 	/* Cast to structure, containing bonded device's port id and queue id */
 	struct bond_rx_queue *bd_rx_q = (struct bond_rx_queue *)queue;
-
 	internals = bd_rx_q->dev_private;
+	slave_count = internals->active_slave_count;
+	active_slave = internals->active_slave;
 
+	for (i = 0; i < slave_count && nb_pkts; i++) {
+		uint16_t num_rx_slave;
 
-	for (i = 0; i < internals->active_slave_count && nb_pkts; i++) {
 		/* Offset of pointer to *bufs increases as packets are received
 		 * from other slaves */
-		num_rx_slave = rte_eth_rx_burst(internals->active_slaves[i],
+		num_rx_slave = rte_eth_rx_burst(
+				internals->active_slaves[active_slave],
 				bd_rx_q->queue_id, bufs + num_rx_total, nb_pkts);
-		if (num_rx_slave) {
-			num_rx_total += num_rx_slave;
-			nb_pkts -= num_rx_slave;
-		}
+		num_rx_total += num_rx_slave;
+		nb_pkts -= num_rx_slave;
+		if (++active_slave == slave_count)
+			active_slave = 0;
 	}
 
+	if (++internals->active_slave == slave_count)
+		internals->active_slave = 0;
 	return num_rx_total;
 }
 
@@ -258,25 +263,32 @@  bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs,
 	uint16_t num_rx_total = 0;	/* Total number of received packets */
 	uint16_t slaves[RTE_MAX_ETHPORTS];
 	uint16_t slave_count;
-
-	uint16_t i, idx;
+	uint16_t active_slave;
+	uint16_t i;
 
 	/* Copy slave list to protect against slave up/down changes during tx
 	 * bursting */
 	slave_count = internals->active_slave_count;
+	active_slave = internals->active_slave;
 	memcpy(slaves, internals->active_slaves,
 			sizeof(internals->active_slaves[0]) * slave_count);
 
-	for (i = 0, idx = internals->active_slave;
-			i < slave_count && num_rx_total < nb_pkts; i++, idx++) {
-		idx = idx % slave_count;
+	for (i = 0; i < slave_count && nb_pkts; i++) {
+		uint16_t num_rx_slave;
 
 		/* Read packets from this slave */
-		num_rx_total += rte_eth_rx_burst(slaves[idx], bd_rx_q->queue_id,
-				&bufs[num_rx_total], nb_pkts - num_rx_total);
+		num_rx_slave = rte_eth_rx_burst(slaves[active_slave],
+						bd_rx_q->queue_id,
+						bufs + num_rx_total, nb_pkts);
+		num_rx_total += num_rx_slave;
+		nb_pkts -= num_rx_slave;
+
+		if (++active_slave == slave_count)
+			active_slave = 0;
 	}
 
-	internals->active_slave = idx;
+	if (++internals->active_slave == slave_count)
+		internals->active_slave = 0;
 
 	return num_rx_total;
 }
@@ -459,7 +471,9 @@  bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 			idx = 0;
 	}
 
-	internals->active_slave = idx;
+	if (++internals->active_slave == slave_count)
+		internals->active_slave = 0;
+
 	return num_rx_total;
 }