[PATCH] net/iavf:fix slow memory allocation

You, KaisenX kaisenx.you at intel.com
Tue Dec 20 11:11:58 CET 2022



> -----Original Message-----
> From: David Marchand <david.marchand at redhat.com>
> Sent: December 20, 2022 17:34
> To: You, KaisenX <kaisenx.you at intel.com>
> Cc: Ferruh Yigit <ferruh.yigit at amd.com>; dev at dpdk.org; Burakov, Anatoly
> <anatoly.burakov at intel.com>; stable at dpdk.org; Yang, Qiming
> <qiming.yang at intel.com>; Zhou, YidingX <yidingx.zhou at intel.com>; Wu,
> Jingjing <jingjing.wu at intel.com>; Xing, Beilei <beilei.xing at intel.com>; Zhang,
> Qi Z <qi.z.zhang at intel.com>; Luca Boccassi <bluca at debian.org>; Mcnamara,
> John <john.mcnamara at intel.com>; Kevin Traynor <ktraynor at redhat.com>
> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
> 
> On Tue, Dec 20, 2022 at 7:52 AM You, KaisenX <kaisenx.you at intel.com>
> wrote:
> > > >> As to the reason for not using rte_malloc_socket. I thought
> > > >> rte_malloc_socket() could solve the problem too. And the
> > > >> appropriate parameter should be the socket_id that created the
> > > > >> memory pool for DPDK initialization. Assuming that the socket_id
> > > > >> of the initially allocated memory = 1, first let the eal_intr_thread
> > > > >> determine if it is on the socket_id, then record this socket_id
> > > > >> in the eal_intr_thread and pass it to the iavf_event_thread.  But
> > > > >> there seems no way to link this parameter to the
> > > > >> iavf_dev_event_post() function. That is why rte_malloc_socket is
> > > > >> not used.
> > > >>
> > > >
> > > > I was thinking socket id of device can be used, but that won't
> > > > help if the core that interrupt handler runs is in different socket.
> > > > And I also don't know if there is a way to get socket that
> > > > interrupt thread is on. @David may help perhaps.
> > > >
> > > > So question is why interrupt thread is not running on main lcore.
> > > >
> > >
> > > > OK after some talk with David, what I am missing is
> > > > 'rte_ctrl_thread_create()' does NOT run on main lcore, it can run on
> > > > any core except data plane cores.
> > >
> > > Driver "iavf-event-thread" thread (iavf_dev_event_handle()) and
> > > interrupt thread (so driver interrupt callback
> > > iavf_dev_event_post()) can run on any core, making it hard to manage.
> > > And it seems it is not possible to control where interrupt thread to run.
> > >
> > > One option can be allocating hugepages for all sockets, but this
> > > requires user involvement, and can't happen transparently.
> > >
> > > Other option can be to control where "iavf-event-thread" run, like
> > > using 'rte_thread_create()' to create thread and provide attribute
> > > to run it on main lcore (rte_lcore_cpuset(rte_get_main_lcore()))?
> > >
> > > Can you please test above option?
> > >
> > >
> > > The first option can solve this issue, but to borrow from your
> > > previous saying, "in a dual socket system, if all used cores are in
> > > socket 1 and the NIC is in socket 1, no memory is allocated for socket 0.
> > > This is to optimize memory consumption."
> > > I think it's unreasonable to do so.
> > >
> > > About the other option: in the "rte_eal_intr_init" function, after the
> > > thread is created, I set the thread affinity for eal-intr-thread, but it
> > > does not solve this issue.
> 
> Jumping in this thread.
> 
> I tried to play a bit with an E810 NIC on a dual-NUMA system and I can't
> see anything wrong for now.
> Can you provide a simple and small reproducer of your issue?
> 
> Thanks.
> 
This is my environment, as reported by "lscpu":
NUMA:
	NUMA node(s): 2
	NUMA node0 CPU(s): 0-27,56-83
	NUMA node1 CPU(s): 28-55,84-111

Steps to reproduce the issue:

1. Create a VF and bind it to DPDK:
echo 1 > /sys/bus/pci/devices/0000\:ca\:00.0/sriov_numvfs
./usertools/dpdk-devbind.py -b vfio-pci 0000:ca:01.0
2. Launch testpmd:
./x86_64-native-linuxapp-clang/app/dpdk-testpmd -l 28-48 -n 4 -a 0000:ca:01.0 \
--file-prefix=dpdk_525342_20221104042659 -- -i --rxq=256 --txq=256 \
--total-num-mbufs=500000

Parameter description:
 "-l 28-48": the core range given after "-l" must lie entirely within the "NUMA node1 CPU(s)" list.
 "0000:ca:01.0": the device is attached to NUMA node1.
> --
> David Marchand
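
For anyone reproducing this, the precondition in the parameter description (every core passed to "-l" sits on NUMA node1) can be checked before launching testpmd. The following is only a minimal sketch: the helper names expand_cpulist and cpulist_within are hypothetical, the CPU lists are hard-coded from the lscpu output above, and it assumes a shell environment where "seq", "tr" and "grep" are available. On a live Linux system the node list can usually be read from /sys/devices/system/node/node1/cpulist instead of being typed in.

```shell
#!/bin/sh
# Expand a Linux-style cpulist such as "28-55,84-111" into one CPU id per line.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single id like "7" has no upper bound, so reuse the lower bound.
        seq "$first" "${last:-$first}"
    done
}

# Succeed only if every CPU in list $1 also appears in list $2.
cpulist_within() {
    for cpu in $(expand_cpulist "$1"); do
        expand_cpulist "$2" | grep -qx "$cpu" || return 1
    done
    return 0
}

# Values copied from the report above (hard-coded for illustration).
node1_cpus="28-55,84-111"
lcores="28-48"

if cpulist_within "$lcores" "$node1_cpus"; then
    echo "all lcores are on node1"
else
    echo "some lcores are outside node1"
fi
```

With the values from the report, the range 28-48 falls inside 28-55, so the sketch takes the first branch. The device side of the precondition can typically be checked by reading /sys/bus/pci/devices/0000:ca:01.0/numa_node.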


