[PATCH 2/3] net/nfp: fix free resource problem

Chaoyong He chaoyong.he at corigine.com
Fri Jan 12 02:19:05 CET 2024


> On 1/11/2024 2:02 AM, Chaoyong He wrote:
> >> On 1/9/2024 7:56 AM, Chaoyong He wrote:
> >>>> On 12/18/2023 1:50 AM, Chaoyong He wrote:
> >>>>>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
> >>>>>>> From: Long Wu <long.wu at corigine.com>
> >>>>>>>
> >>>>>>> Set the representor array entries to NULL on uninit; otherwise the
> >>>>>>> close interface does not free some resources.
> >>>>>>>
> >>>>>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
> >>>>>>> firmware")
> >>>>>>> Cc: chaoyong.he at corigine.com
> >>>>>>> Cc: stable at dpdk.org
> >>>>>>>
> >>>>>>> Signed-off-by: Long Wu <long.wu at corigine.com>
> >>>>>>> Reviewed-by: Chaoyong He <chaoyong.he at corigine.com>
> >>>>>>> Reviewed-by: Peng Zhang <peng.zhang at corigine.com>
> >>>>>>> ---
> >>>>>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
> >>>>>>> ++++++++++++++-
> >>>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> index 27ea3891bd..5f7c1fa737 100644
> >>>>>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,
> >>>>>>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)
> >>>>>>> {
> >>>>>>> +	uint16_t index;
> >>>>>>>  	struct nfp_flower_representor *repr;
> >>>>>>>
> >>>>>>>  	repr = eth_dev->data->dev_private;
> >>>>>>>  	rte_ring_free(repr->ring);
> >>>>>>>
> >>>>>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
> >>>>>>> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr->port_id);
> >>>>>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
> >>>>>>> +	} else {
> >>>>>>> +		index = repr->vf_id;
> >>>>>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  static int
> >>>>>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
> >>>>>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
> >>>>>>>  {
> >>>>>>> +	struct nfp_flower_representor *repr =
> >>>>>>> +eth_dev->data->dev_private;
> >>>>>>> +
> >>>>>>> +	repr->app_fw_flower->pf_repr = NULL;
> >>>>>>>
> >>>>>>
> >>>>>> Here it is assigned to NULL but is it freed? If freed, why not
> >>>>>> set to NULL where it is freed?
> >>>>>>
> >>>>>> Same for above phy_reprs & vf_reprs.
> >>>>>
> >>>>> The whole call chain:
> >>>>> rte_eth_dev_close()
> >>>>>     --> nfp_flower_repr_dev_close()
> >>>>>         --> nfp_flower_repr_free()
> >>>>>             --> nfp_flower_pf_repr_uninit()
> >>>>>             --> nfp_flower_repr_uninit()
> >>>>>            // In these two functions, we just assign NULL but do not free yet.
> >>>>>            // It is still referenced by `eth_dev->data->dev_private`.
> >>>>>     --> rte_eth_dev_release_port()
> >>>>>         --> rte_free(eth_dev->data->dev_private);
> >>>>>         // And here it is really freed (by the rte framework).
> >>>>>
> >>>>
> >>>> 'rte_eth_dev_release_port()' frees the device private data, but not
> >>>> every pointer in the struct, like 'repr->app_fw_flower->pf_repr', is
> >>>> freed by it; that is the responsibility of the dev_close() or
> >>>> uninit() functions.
> >>>>
> >>>> Can you please double check whether
> >>>> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' is freed or not?
> >>>
> >>> (gdb) b nfp_flower_repr_dev_close
> >>> Breakpoint 1 at 0x7f839a4ad37f: file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
> >>> (gdb) c
> >>> Continuing.
> >>>
> >>> Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close (dev=0x7f839aed2340 <rte_eth_devices>)
> >>>     at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
> >>> 356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> >>> (gdb) n
> >>> 359             repr = dev->data->dev_private;
> >>> (gdb)
> >>> 360             app_fw_flower = repr->app_fw_flower;
> >>> (gdb)
> >>> 361             hw = app_fw_flower->pf_hw;
> >>> (gdb)
> >>> 362             pf_dev = hw->pf_dev;
> >>> (gdb)
> >>> 368             nfp_net_disable_queues(dev);
> >>> (gdb) p repr
> >>> $1 = (struct nfp_flower_representor *) 0x17c49c800
> >>> (gdb) p dev->data->dev_private
> >>> $2 = (void *) 0x17c49c800
> >>> (gdb) p repr->app_fw_flower->pf_repr
> >>> $3 = (struct nfp_flower_representor *) 0x17c49c800
> >>>
> >>> As we can see, these three pointers point to the same block of memory.
> >>>
> >>
> >> Ahh, I missed that 'repr->app_fw_flower->pf_repr' points to
> >> 'dev_private', so your code makes sense.
> >>
> >> But if it is 'dev_private', why free it in 'nfp_pf_uninit()' as it
> >> will be freed by 'rte_eth_dev_release_port()'?
> >
> > Sorry, I don't understand this.
> > The 'dev_private' is a 'struct nfp_flower_representor *', and it will be freed in
> > 'rte_eth_dev_release_port()'.
> > What I free in 'nfp_pf_uninit()' is a 'struct nfp_pf_dev *', so I don't see
> > your point here.
> >
> >> Won't removing 'rte_free(pf_dev);' from 'nfp_pf_uninit()' have
> >> the same effect, instead of setting it NULL in advance?
> >>
> >
> > If I remove the 'rte_free(pf_dev);' from 'nfp_pf_uninit()', there will be a
> > memory leak as no one will free it, and actually I'm not 'setting it NULL in
> > advance'.
> >
> > 359             repr = dev->data->dev_private;
> > 360             app_fw_flower = repr->app_fw_flower;
> > 361             hw = app_fw_flower->pf_hw;
> > 362             pf_dev = hw->pf_dev;
> >
> > Maybe you just confused 'pf_repr' with 'pf_dev'? Just a guess.
> >
> 
> Yes I did confuse those two, sorry about that.
> 
> 'repr->app_fw_flower->pf_repr' is 'dev_private', and I assumed you were setting
> it to NULL to avoid a double free (and was checking where that double free
> happens), but I guess that is not the case.
> 
> 'rte_eth_dev_destroy()' calls 'rte_eth_dev_release_port()' and frees
> 'dev_private', but 'repr->app_fw_flower->pf_repr' remains as a dangling pointer
> and perhaps prevents 'nfp_flower_repr_dev_close()' from moving forward (because
> of the "if (app_fw_flower->pf_repr != NULL)" check), and you are fixing that. Is
> that the case?

Correct. That is where the problem is and what this patch is meant to fix. Your description is very clear and concise.
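
To make the failure mode concrete, here is a minimal standalone sketch. It is not the driver code: all 'demo_*' names are hypothetical, and it assumes the close path only releases shared resources once every representor slot reads NULL, as the "if (app_fw_flower->pf_repr != NULL)" check suggests. The real ordering in the driver is simplified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_REPR_NUM 2	/* hypothetical number of representors */

/* Hypothetical stand-in for a representor's private data. */
struct demo_repr {
	int id;
};

/* Hypothetical stand-in for the app-level structure holding the slots. */
struct demo_app {
	struct demo_repr *reprs[DEMO_REPR_NUM];
	bool shared_freed;	/* set once app-level cleanup has run */
};

/*
 * Models closing one representor. With clear_slot set, the slot is reset
 * to NULL (what the patch adds); without it, the stale pointer stays.
 * App-level cleanup runs only when no slot is left non-NULL.
 */
static void
demo_repr_close(struct demo_app *app, size_t index, bool clear_slot)
{
	size_t i;

	assert(index < DEMO_REPR_NUM);

	if (clear_slot)
		app->reprs[index] = NULL;

	for (i = 0; i < DEMO_REPR_NUM; i++) {
		if (app->reprs[i] != NULL)
			return;	/* a remaining (or stale) slot stops cleanup */
	}

	app->shared_freed = true;
}

/* Closes every representor and reports whether app cleanup happened. */
static bool
demo_close_all(bool clear_slot)
{
	struct demo_repr r0 = { .id = 0 };
	struct demo_repr r1 = { .id = 1 };
	struct demo_app app = { .reprs = { &r0, &r1 }, .shared_freed = false };
	size_t i;

	for (i = 0; i < DEMO_REPR_NUM; i++)
		demo_repr_close(&app, i, clear_slot);

	return app.shared_freed;
}
```

In this model, 'demo_close_all(true)' reaches the shared cleanup after the last representor is closed, while 'demo_close_all(false)' never does, because the slots are never cleared.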
