[PATCH v2 2/2] app/testpmd:fix testpmd quit failure

Wu, WenxuanX wenxuanx.wu at intel.com
Thu Mar 10 08:02:53 CET 2022



> -----Original Message-----
> From: Wu, WenxuanX
> Sent: 2022年3月9日 11:07
> To: Yigit, Ferruh <ferruh.yigit at intel.com>; Li, Xiaoyun <xiaoyun.li at intel.com>;
> dev at dpdk.org
> Cc: stable at dpdk.org
> Subject: RE: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> 
> 
> 
> > -----Original Message-----
> > From: Yigit, Ferruh <ferruh.yigit at intel.com>
> > Sent: 2022年3月5日 0:16
> > To: Wu, WenxuanX <wenxuanx.wu at intel.com>; Li, Xiaoyun
> > <xiaoyun.li at intel.com>; dev at dpdk.org
> > Cc: stable at dpdk.org
> > Subject: Re: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> >
> > On 3/3/2022 1:22 PM, Wu, WenxuanX wrote:
> >
> > moved down, please don't top post
> >
> > >> -----Original Message-----
> > >> From: Wu, WenxuanX <wenxuanx.wu at intel.com>
> > >> Sent: 2022年2月23日 19:33
> > >> To: Li, Xiaoyun <xiaoyun.li at intel.com>; Yigit, Ferruh
> > >> <ferruh.yigit at intel.com>; dev at dpdk.org
> > >> Cc: Wu, WenxuanX <wenxuanx.wu at intel.com>; stable at dpdk.org
> > >> Subject: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> > >>
> > >> From: wenxuan wu <wenxuanx.wu at intel.com>
> > >>
> > >> When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs
> > >> were still alive would result in failure. Root cause is that pf had
> > >> been released already but vfs were still accessing by func
> > >> rte_eth_dev_info_get, which would result in heap-free-after-use error.
> > >>
> > >> By quitting our ports in reverse order to avoid this.And the order
> > >> is guaranteed that vf are created after pfs.
> > >>
> > >> Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
> > >> Cc: stable at dpdk.org
> > >>
> > >> Signed-off-by: wenxuan wu <wenxuanx.wu at intel.com>
> > >> ---
> > >>   app/test-pmd/testpmd.c | 4 ++--
> > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > >> e1da961311..698b6d8cc4 100644
> > >> --- a/app/test-pmd/testpmd.c
> > >> +++ b/app/test-pmd/testpmd.c
> > >> @@ -3384,12 +3384,12 @@ pmd_test_exit(void)  #endif
> > >>   	if (ports != NULL) {
> > >>   		no_link_check = 1;
> > >> -		RTE_ETH_FOREACH_DEV(pt_id) {
> > >> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
> > >>   			printf("\nStopping port %d...\n", pt_id);
> > >>   			fflush(stdout);
> > >>   			stop_port(pt_id);
> > >>   		}
> > >> -		RTE_ETH_FOREACH_DEV(pt_id) {
> > >> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
> > >>   			printf("\nShutting down port %d...\n", pt_id);
> > >>   			fflush(stdout);
> > >>   			close_port(pt_id);
> > >> --
> > >> 2.25.1
> > >
> > >
> > > I found this meaning in DPDK testplan.
> > > Note that currently hot-plugging of representor ports is not
> > > supported so all
> > the required representors must be specified on the creation of the PF
> > or the trusted VF.
> > > When testpmd is started with pf and vf representors, the order of
> > representor is determined on creation. So it is guaranteed that ,pf is
> > beneath the vf representors, we implemented in a reverse way is
> > acceptable just at present,  depends on when the hot-plugging of
> representor is supported.
> > >
> >
> > The patch mentions from PF and VFs, and now you are referring to port
> > representor.
> > Is the problem related to VF or port representor.
> >
> > For both, VF and port reporesentor should be closed before PF, that
> > part is OK. My comment is if reversing port id traverse will fix the
> > issue or do we need more complex solution.
> > Like have APIs to get VF and representor ports from a given port id,
> > and free them first.
The problem is that when I attempted to use a func like get_representor_info(pid,info); I didn't found this func implemented by driver ,
so I can not get the type of port(VF or PF ) directly by get_representor_info(pid,info),especially when the connected driver is i40e, 
representor_info_get occurred below.

./drivers/net/mlx5/mlx5.c:      .representor_info_get = mlx5_representor_info_get,
./drivers/net/mlx5/mlx5.c:      .representor_info_get = mlx5_representor_info_get,
./drivers/net/mlx5/mlx5.c:      .representor_info_get = mlx5_representor_info_get,
./drivers/net/mlx5/mlx5.h:int mlx5_representor_info_get(struct rte_eth_dev *dev,
./drivers/net/mlx5/mlx5_ethdev.c:mlx5_representor_info_get(struct rte_eth_dev *dev,
./drivers/net/sfc/sfc_ethdev.c:sfc_representor_info_get(struct rte_eth_dev *dev,
./drivers/net/sfc/sfc_ethdev.c: .representor_info_get           = sfc_representor_info_get,
./app/test-pmd/cmdline.c:       ret = rte_eth_representor_info_get(res->cmd_pid, NULL);
./app/test-pmd/cmdline.c:       ret = rte_eth_representor_info_get(res->cmd_pid, info);
./app/test-pmd/testpmd.c:       ret = rte_eth_representor_info_get(pi,&info);
./lib/ethdev/version.map:       rte_eth_representor_info_get;
./lib/ethdev/rte_ethdev.h:int rte_eth_representor_info_get(uint16_t port_id,
./lib/ethdev/ethdev_driver.h:typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
./lib/ethdev/ethdev_driver.h:   eth_representor_info_get_t representor_info_get;
./lib/ethdev/ethdev_driver.h: * The mapping is retrieved from rte_eth_representor_info_get().
./lib/ethdev/rte_ethdev.c:      ret = rte_eth_representor_info_get(port_id, NULL);
./lib/ethdev/rte_ethdev.c:      ret = rte_eth_representor_info_get(port_id, info);
./lib/ethdev/rte_ethdev.c:rte_eth_representor_info_get(uint16_t port_id,
./lib/ethdev/rte_ethdev.c:      RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->representor_info_get, -ENOTSUP);
./lib/ethdev/rte_ethdev.c:      return eth_err(port_id, (*dev->dev_ops->representor_info_get)(dev, info));
We should focus on changing the logic of is_bonding_slave to avoid this bug ,right ? The procedure of port_close is like this :
FOR_EACH_DEV(pi):
	If Is_bonding_slave(pi)
		Continue;
	.... 
	Etc.
In is_bonding_slave(pi): 

This func get_dev_info(pid,dev_info) would be called to get dev.dev_flag &SLAVE ,this would 
be result in error ,when port sequence is like port 0  PF ,port 1 VF_REPR, port 2 VF_REPR, there is no 
obstacles in closing port 0 ,then pid iterate to port 1.

But port 1 is a VF_REPR which based on port 0 ,when in func is_bonding_slave(port 1), it would
 call get_dev_info(1,dev_info) ,error occurred. Because we use get_dev_info(especially when PF is released )
 not ports[1] which had been pre allocated . would result in this error.
 Two solutions :
                        1. Reverse order to close like I mentioned before (PF and VF_repr  order is determined at creation time with no representor hot_plug).
                        2.change func get_dev_info(pid,info) to  ports[pid)  to get dev_flag
Both can resolve this bug ,which one do you prefer?
             


More information about the stable mailing list