[dpdk-dev,1/2] net/mlx5: fix return value of start operation

Message ID 20180118130043.31773-1-olivier.matz@6wind.com (mailing list archive)
State Superseded, archived
Delegated to: Shahaf Shuler
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Olivier Matz Jan. 18, 2018, 1 p.m. UTC
  On error, mlx5_dev_start() does not return a negative value
as it is supposed to do. The consequence is that the application
(ex: testpmd) does not notice that the port is not started
and begins the rxtx on an uninitialized port, which crashes.

Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/mlx5/mlx5_trigger.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

Nélio Laranjeiro Jan. 18, 2018, 4:04 p.m. UTC | #1
On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> On error, mlx5_dev_start() does not return a negative value
> as it is supposed to do. The consequence is that the application
> (ex: testpmd) does not notice that the port is not started
> and begins the rxtx on an uninitialized port, which crashes.
> 
> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 1a20967a2..44f702daa 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  		ERROR("%p: an error occurred while configuring control flows:"
>  		      " %s",
>  		      (void *)priv, strerror(err));
> +		err = -err;
>  		goto error;
>  	}
>  	err = priv_flow_start(priv, &priv->flows);
> @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  		ERROR("%p: an error occurred while configuring flows:"
>  		      " %s",
>  		      (void *)priv, strerror(err));
> +		err = -err;
>  		goto error;
>  	}
>  	err = priv_rx_intr_vec_enable(priv);
> @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	priv_rxq_stop(priv);
>  	priv_flow_delete_drop_queue(priv);
>  	priv_unlock(priv);
> -	return -err;
> +	return err;
>  }
>  
>  /**

err in the function is handled with positives errno's, adding only those
two and returning err will make the other positive.

Regards,
  
Olivier Matz Jan. 18, 2018, 4:13 p.m. UTC | #2
On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > On error, mlx5_dev_start() does not return a negative value
> > as it is supposed to do. The consequence is that the application
> > (ex: testpmd) does not notice that the port is not started
> > and begins the rxtx on an uninitialized port, which crashes.
> > 
> > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > index 1a20967a2..44f702daa 100644
> > --- a/drivers/net/mlx5/mlx5_trigger.c
> > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  		ERROR("%p: an error occurred while configuring control flows:"
> >  		      " %s",
> >  		      (void *)priv, strerror(err));
> > +		err = -err;
> >  		goto error;
> >  	}
> >  	err = priv_flow_start(priv, &priv->flows);
> > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  		ERROR("%p: an error occurred while configuring flows:"
> >  		      " %s",
> >  		      (void *)priv, strerror(err));
> > +		err = -err;
> >  		goto error;
> >  	}
> >  	err = priv_rx_intr_vec_enable(priv);
> > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  	priv_rxq_stop(priv);
> >  	priv_flow_delete_drop_queue(priv);
> >  	priv_unlock(priv);
> > -	return -err;
> > +	return err;
> >  }
> >  
> >  /**
> 
> err in the function is handled with positives errno's, adding only those
> two and returning err will make the other positive.

I tried to check the return value of all functions called by mlx5_dev_start()
(negative or positive). Do you see something wrong?
  
Yongseok Koh Jan. 19, 2018, 6:28 a.m. UTC | #3
> On Jan 18, 2018, at 8:13 AM, Olivier Matz <olivier.matz@6wind.com> wrote:

> 

> On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:

>> On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:

>>> On error, mlx5_dev_start() does not return a negative value

>>> as it is supposed to do. The consequence is that the application

>>> (ex: testpmd) does not notice that the port is not started

>>> and begins the rxtx on an uninitialized port, which crashes.

>>> 

>>> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")

>>> Cc: stable@dpdk.org

>>> 

>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

>>> ---

>>> drivers/net/mlx5/mlx5_trigger.c | 4 +++-

>>> 1 file changed, 3 insertions(+), 1 deletion(-)

>>> 

>>> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c

>>> index 1a20967a2..44f702daa 100644

>>> --- a/drivers/net/mlx5/mlx5_trigger.c

>>> +++ b/drivers/net/mlx5/mlx5_trigger.c

>>> @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)

>>> 		ERROR("%p: an error occurred while configuring control flows:"

>>> 		      " %s",

>>> 		      (void *)priv, strerror(err));

>>> +		err = -err;

>>> 		goto error;

>>> 	}

>>> 	err = priv_flow_start(priv, &priv->flows);

>>> @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)

>>> 		ERROR("%p: an error occurred while configuring flows:"

>>> 		      " %s",

>>> 		      (void *)priv, strerror(err));

>>> +		err = -err;

>>> 		goto error;

>>> 	}

>>> 	err = priv_rx_intr_vec_enable(priv);

>>> @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)

>>> 	priv_rxq_stop(priv);

>>> 	priv_flow_delete_drop_queue(priv);

>>> 	priv_unlock(priv);

>>> -	return -err;

>>> +	return err;

>>> }

>>> 

>>> /**

>> 

>> err in the function is handled with positives errno's, adding only those

>> two and returning err will make the other positive.

> 

> I tried to check the return value of all functions called by mlx5_dev_start()

> (negative or positive). Do you see something wrong?


Those two func calls have been moved recently. [1]
Please rebase it on top of dpdk-next-net-mlx/for-next-net

Then, the last change is okay to return negative values.

Nelio, we should make all the return values consistent someday, shouldn't we?

[1] http://dpdk.org/browse/next/dpdk-next-net-mlx/commit/?h=for-next-net&id=ed3d6afc9295bc16ab9ed2cad26af0c8cd9bd14e

Thanks,
Yongseok
  
Nélio Laranjeiro Jan. 19, 2018, 8:35 a.m. UTC | #4
On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > On error, mlx5_dev_start() does not return a negative value
> > > as it is supposed to do. The consequence is that the application
> > > (ex: testpmd) does not notice that the port is not started
> > > and begins the rxtx on an uninitialized port, which crashes.
> > > 
> > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > Cc: stable@dpdk.org
> > > 
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > ---
> > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > index 1a20967a2..44f702daa 100644
> > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  		ERROR("%p: an error occurred while configuring control flows:"
> > >  		      " %s",
> > >  		      (void *)priv, strerror(err));
> > > +		err = -err;
> > >  		goto error;
> > >  	}
> > >  	err = priv_flow_start(priv, &priv->flows);
> > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  		ERROR("%p: an error occurred while configuring flows:"
> > >  		      " %s",
> > >  		      (void *)priv, strerror(err));
> > > +		err = -err;
> > >  		goto error;
> > >  	}
> > >  	err = priv_rx_intr_vec_enable(priv);
> > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  	priv_rxq_stop(priv);
> > >  	priv_flow_delete_drop_queue(priv);
> > >  	priv_unlock(priv);
> > > -	return -err;
> > > +	return err;
> > >  }
> > >  
> > >  /**
> > 
> > err in the function is handled with positives errno's, adding only those
> > two and returning err will make the other positive.
> 
> I tried to check the return value of all functions called by mlx5_dev_start()
> (negative or positive). Do you see something wrong?

What I mean is priv_flow_start() is returning a positive errno as all
other functions priv_*() that's the main reason for the final rteurn -err;

Internally MLX5 driver only works with positives errnos as lot of them
are retuning the values from ioctl directly.  Only the public ones are
returning negatives.
  
Olivier Matz Jan. 19, 2018, 8:43 a.m. UTC | #5
On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > On error, mlx5_dev_start() does not return a negative value
> > > > as it is supposed to do. The consequence is that the application
> > > > (ex: testpmd) does not notice that the port is not started
> > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > 
> > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > Cc: stable@dpdk.org
> > > > 
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > index 1a20967a2..44f702daa 100644
> > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > >  		      " %s",
> > > >  		      (void *)priv, strerror(err));
> > > > +		err = -err;
> > > >  		goto error;
> > > >  	}
> > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > >  		      " %s",
> > > >  		      (void *)priv, strerror(err));
> > > > +		err = -err;
> > > >  		goto error;
> > > >  	}
> > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  	priv_rxq_stop(priv);
> > > >  	priv_flow_delete_drop_queue(priv);
> > > >  	priv_unlock(priv);
> > > > -	return -err;
> > > > +	return err;
> > > >  }
> > > >  
> > > >  /**
> > > 
> > > err in the function is handled with positives errno's, adding only those
> > > two and returning err will make the other positive.
> > 
> > I tried to check the return value of all functions called by mlx5_dev_start()
> > (negative or positive). Do you see something wrong?
> 
> What I mean is priv_flow_start() is returning a positive errno as all
> other functions priv_*() that's the main reason for the final rteurn -err;
> 
> Internally MLX5 driver only works with positives errnos as lot of them
> are retuning the values from ioctl directly.  Only the public ones are
> returning negatives.

So, what should I modify in this patch for v2?
Do you agree that there is a bug regarding the return value of mlx5_dev_start()?

It can be reproduced easily by starting testpmd on a dual-socket machine,
use the core and memory from socket 0, and have the mlx device on socket 1.
Then start traffic forwarding, it will crash.
  
Olivier Matz Jan. 19, 2018, 1:30 p.m. UTC | #6
On Fri, Jan 19, 2018 at 09:43:14AM +0100, Olivier Matz wrote:
> On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> > On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > > On error, mlx5_dev_start() does not return a negative value
> > > > > as it is supposed to do. The consequence is that the application
> > > > > (ex: testpmd) does not notice that the port is not started
> > > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > > 
> > > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > > Cc: stable@dpdk.org
> > > > > 
> > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > ---
> > > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > > index 1a20967a2..44f702daa 100644
> > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > > >  		      " %s",
> > > > >  		      (void *)priv, strerror(err));
> > > > > +		err = -err;
> > > > >  		goto error;
> > > > >  	}
> > > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > > >  		      " %s",
> > > > >  		      (void *)priv, strerror(err));
> > > > > +		err = -err;
> > > > >  		goto error;
> > > > >  	}
> > > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  	priv_rxq_stop(priv);
> > > > >  	priv_flow_delete_drop_queue(priv);
> > > > >  	priv_unlock(priv);
> > > > > -	return -err;
> > > > > +	return err;
> > > > >  }
> > > > >  
> > > > >  /**
> > > > 
> > > > err in the function is handled with positives errno's, adding only those
> > > > two and returning err will make the other positive.
> > > 
> > > I tried to check the return value of all functions called by mlx5_dev_start()
> > > (negative or positive). Do you see something wrong?
> > 
> > What I mean is priv_flow_start() is returning a positive errno as all
> > other functions priv_*() that's the main reason for the final rteurn -err;

priv_txq_start(priv);
  -> negative on error
priv_rxq_start(priv);
  -> negative on error
priv_dev_traffic_enable(priv, dev);
  -> positive (+errno) on error
priv_flow_start(priv, &priv->flows);
  -> positive (+errno) on error
priv_rx_intr_vec_enable(priv);
  -> negative (-1 or -errno) on error


> > Internally MLX5 driver only works with positives errnos as lot of them
> > are retuning the values from ioctl directly.  Only the public ones are
> > returning negatives.
> 
> So, what should I modify in this patch for v2?
> Do you agree that there is a bug regarding the return value of mlx5_dev_start()?
> 
> It can be reproduced easily by starting testpmd on a dual-socket machine,
> use the core and memory from socket 0, and have the mlx device on socket 1.
> Then start traffic forwarding, it will crash.
  
Olivier Matz Jan. 19, 2018, 1:42 p.m. UTC | #7
On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> On error, mlx5_dev_start() does not return a negative value
> as it is supposed to do. The consequence is that the application
> (ex: testpmd) does not notice that the port is not started
> and begins the rxtx on an uninitialized port, which crashes.
> 
> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

For reference, here is how to reproduce the problem.


The topology of the target:

   socket 0                 socket 1
  +---------------------+  +---------------------+
  |        c0        c1 |  |        c0        c1 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  0| 16| |  1| 17| |  | |  8| 24| |  9| 25| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  |        c2        c3 |  |        c2        c3 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  2| 18| |  3| 19| |  | | 10| 26| | 11| 27| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  |        c4        c5 |  |        c4        c5 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  4| 20| |  5| 21| |  | | 12| 28| | 13| 29| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  |        c6        c7 |  |        c6        c7 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  6| 22| |  7| 23| |  | | 14| 30| | 15| 31| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  +---------------------+  +---------------------+

The cx4 devices are on socket 1, but I use cores and memory from
socket 0. I know it is not optimal, but it should work.



root@dut-cx4:~# cd dpdk.org
root@dut-cx4:~/dpdk.org# make config T=x86_64-native-linuxapp-gcc
root@dut-cx4:~/dpdk.org# make -j32
root@dut-cx4:~/dpdk.org# mkdir -p /mnt/huge
root@dut-cx4:~/dpdk.org# mount -t hugetlbfs nodev /mnt/huge
root@dut-cx4:~/dpdk.org# echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
root@dut-cx4:~/dpdk.org# testpmd -l 0,2 -- --total-num-mbufs=16384 -i --port-topology=chained --no-numa
EAL: Detected 32 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:02:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:02:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:02:00.2 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:02:00.3 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 14e4:16d7 net_bnxt
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 14e4:16d7 net_bnxt
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:83:00.0 on NUMA socket 1
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:83:00.1 on NUMA socket 1
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:86:00.0 on NUMA socket 1
EAL:   probe driver: 15b3:1013 net_mlx5
PMD: net_mlx5: PCI information matches, using device "mlx5_0" (SR-IOV: false)
PMD: net_mlx5: 1 port(s) detected
PMD: net_mlx5: MPS is disabled
PMD: net_mlx5: port 1 MAC address is e4:1d:2d:e7:0d:06
EAL: PCI device 0000:86:00.1 on NUMA socket 1
EAL:   probe driver: 15b3:1013 net_mlx5
PMD: net_mlx5: PCI information matches, using device "mlx5_1" (SR-IOV: false)
PMD: net_mlx5: 1 port(s) detected
PMD: net_mlx5: MPS is disabled
PMD: net_mlx5: port 1 MAC address is e4:1d:2d:e7:0d:07
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=16384, size=2176, socket=0
Configuring Port 0 (socket 0)
PMD: net_mlx5: 0x1459e40: TX queues number update: 0 -> 1
PMD: net_mlx5: 0x1459e40: RX queues number update: 0 -> 1
PMD: net_mlx5: cannot allocate CQ for drop queue
PMD: net_mlx5: 0x1459e40: Drop queue allocation failed: Unknown error -1
Port 0: E4:1D:2D:E7:0D:06
Configuring Port 1 (socket 0)
PMD: net_mlx5: 0x145dec0: TX queues number update: 0 -> 1
PMD: net_mlx5: 0x145dec0: RX queues number update: 0 -> 1
PMD: net_mlx5: cannot allocate CQ for drop queue
PMD: net_mlx5: 0x145dec0: Drop queue allocation failed: Unknown error -1
Port 1: E4:1D:2D:E7:0D:07
Checking link statuses...
Done
testpmd> start
Segmentation fault (core dumped)
  
Nélio Laranjeiro Jan. 19, 2018, 1:43 p.m. UTC | #8
On Fri, Jan 19, 2018 at 02:30:45PM +0100, Olivier Matz wrote:
> On Fri, Jan 19, 2018 at 09:43:14AM +0100, Olivier Matz wrote:
> > On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> > > On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > > > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > > > On error, mlx5_dev_start() does not return a negative value
> > > > > > as it is supposed to do. The consequence is that the application
> > > > > > (ex: testpmd) does not notice that the port is not started
> > > > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > > > 
> > > > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > > > Cc: stable@dpdk.org
> > > > > > 
> > > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > ---
> > > > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > index 1a20967a2..44f702daa 100644
> > > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  	priv_rxq_stop(priv);
> > > > > >  	priv_flow_delete_drop_queue(priv);
> > > > > >  	priv_unlock(priv);
> > > > > > -	return -err;
> > > > > > +	return err;
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > 
> > > > > err in the function is handled with positives errno's, adding only those
> > > > > two and returning err will make the other positive.
> > > > 
> > > > I tried to check the return value of all functions called by mlx5_dev_start()
> > > > (negative or positive). Do you see something wrong?
> > > 
> > > What I mean is priv_flow_start() is returning a positive errno as all
> > > other functions priv_*() that's the main reason for the final rteurn -err;
> 
> priv_txq_start(priv);
>   -> negative on error
> priv_rxq_start(priv);
>   -> negative on error
> priv_dev_traffic_enable(priv, dev);
>   -> positive (+errno) on error
> priv_flow_start(priv, &priv->flows);
>   -> positive (+errno) on error
> priv_rx_intr_vec_enable(priv);
>   -> negative (-1 or -errno) on error

What a mess, it should be all positives...
 
> > > Internally MLX5 driver only works with positives errnos as lot of them
> > > are retuning the values from ioctl directly.  Only the public ones are
> > > returning negatives.
> > 
> > So, what should I modify in this patch for v2?
> > Do you agree that there is a bug regarding the return value of mlx5_dev_start()?
> > 
> > It can be reproduced easily by starting testpmd on a dual-socket machine,
> > use the core and memory from socket 0, and have the mlx device on socket 1.
> > Then start traffic forwarding, it will crash.
  
Nélio Laranjeiro Jan. 19, 2018, 2:18 p.m. UTC | #9
On Fri, Jan 19, 2018 at 02:30:45PM +0100, Olivier Matz wrote:
> On Fri, Jan 19, 2018 at 09:43:14AM +0100, Olivier Matz wrote:
> > On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> > > On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > > > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > > > On error, mlx5_dev_start() does not return a negative value
> > > > > > as it is supposed to do. The consequence is that the application
> > > > > > (ex: testpmd) does not notice that the port is not started
> > > > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > > > 
> > > > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > > > Cc: stable@dpdk.org
> > > > > > 
> > > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > ---
> > > > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > index 1a20967a2..44f702daa 100644
> > > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  	priv_rxq_stop(priv);
> > > > > >  	priv_flow_delete_drop_queue(priv);
> > > > > >  	priv_unlock(priv);
> > > > > > -	return -err;
> > > > > > +	return err;
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > 
> > > > > err in the function is handled with positives errno's, adding only those
> > > > > two and returning err will make the other positive.
> > > > 
> > > > I tried to check the return value of all functions called by mlx5_dev_start()
> > > > (negative or positive). Do you see something wrong?
> > > 
> > > What I mean is priv_flow_start() is returning a positive errno as all
> > > other functions priv_*() that's the main reason for the final rteurn -err;
> 
> priv_txq_start(priv);
>   -> negative on error
> priv_rxq_start(priv);
>   -> negative on error
> priv_dev_traffic_enable(priv, dev);
>   -> positive (+errno) on error
> priv_flow_start(priv, &priv->flows);
>   -> positive (+errno) on error
> priv_rx_intr_vec_enable(priv);
>   -> negative (-1 or -errno) on error

Indeed,

All this needs to be re-worked to only return negatives values, in the
mean time,

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

> > > Internally MLX5 driver only works with positives errnos as lot of them
> > > are retuning the values from ioctl directly.  Only the public ones are
> > > returning negatives.
> > 
> > So, what should I modify in this patch for v2?
> > Do you agree that there is a bug regarding the return value of mlx5_dev_start()?
> > 
> > It can be reproduced easily by starting testpmd on a dual-socket machine,
> > use the core and memory from socket 0, and have the mlx device on socket 1.
> > Then start traffic forwarding, it will crash.
  

Patch

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 1a20967a2..44f702daa 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -166,6 +166,7 @@  mlx5_dev_start(struct rte_eth_dev *dev)
 		ERROR("%p: an error occurred while configuring control flows:"
 		      " %s",
 		      (void *)priv, strerror(err));
+		err = -err;
 		goto error;
 	}
 	err = priv_flow_start(priv, &priv->flows);
@@ -173,6 +174,7 @@  mlx5_dev_start(struct rte_eth_dev *dev)
 		ERROR("%p: an error occurred while configuring flows:"
 		      " %s",
 		      (void *)priv, strerror(err));
+		err = -err;
 		goto error;
 	}
 	err = priv_rx_intr_vec_enable(priv);
@@ -196,7 +198,7 @@  mlx5_dev_start(struct rte_eth_dev *dev)
 	priv_rxq_stop(priv);
 	priv_flow_delete_drop_queue(priv);
 	priv_unlock(priv);
-	return -err;
+	return err;
 }
 
 /**