[dpdk-users] Slow DPDK startup with many 1G hugepages

Imre Pinter imre.pinter at ericsson.com
Tue Jun 6 14:39:47 CEST 2017


Hi guys,

Thanks for the replies. See my comments inline.


-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan at intel.com] 
Sent: 2017. június 2. 3:40
To: Marco Varlese <marco.varlese at suse.com>; Imre Pinter <imre.pinter at ericsson.com>; users at dpdk.org
Cc: Gabor Halász <gabor.halasz at ericsson.com>; Péter Suskovics <peter.suskovics at ericsson.com>
Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages



> -----Original Message-----
> From: Marco Varlese [mailto:marco.varlese at suse.com]
> Sent: Thursday, June 1, 2017 6:12 PM
> To: Tan, Jianfeng; Imre Pinter; users at dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >
> > >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces at dpdk.org] On Behalf Of Imre 
> > > Pinter
> > > Sent: Thursday, June 1, 2017 3:55 PM
> > > To: users at dpdk.org
> > > Cc: Gabor Halász; Péter Suskovics
> > > Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> > >
> > > Hi,
> > >
> > > We experience slow startup time in DPDK-OVS, when backing memory with
> > > 1G hugepages instead of 2M hugepages.
> > > Currently we're mapping 2M hugepages as memory backend for DPDK OVS.
> > > In the future we would like to allocate this memory from the 1G hugepage
> > > pool. Currently in our deployments we have a significant amount of
> > > 1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
> > > hugepages.
> > >
> > > Typical setup for 2M hugepages:
> > >                 GRUB:
> > > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 
> > > default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > Typical setup for 1GB hugepages:
> > > GRUB:
> > > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> > >
> > >   *   2M (2G memory allocated) - startup time ~3 sec:
> > >
> > > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> > > --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > >
> > > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev at ovs-netdev:
> > > Datapath supports recirculation
> > >
> > >   *   1G (56G memory allocated) - startup time ~13 sec:
> > > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> > > --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev at ovs-netdev:
> > > Datapath supports recirculation
> > > I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04
> > > with kernels 3.13.0-117-generic and 4.4.0-78-generic.
> >
> >
> > You can shorten the time by this:
> >
> > (1) Mount 1 GB hugepages into two directories.
> > nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much
> > you want to use in OVS> 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> I understood (reading Imre) that this does not really work because of
> non-deterministic allocation of hugepages in a NUMA architecture.
> e.g. we would end up (potentially) using hugepages allocated on 
> different nodes even when accessing the OVS directory.
> Did I understand this correctly?

Did you try step 2? And Sergio also gives more options in another email in this thread for your reference.

Thanks,
Jianfeng

@Jianfeng: Step (1) will not help in our case, because 'mount' will not allocate hugepages from NUMA1 as long as the system still has free hugepages on NUMA0.
I have 56 hugepages of 1G size allocated, which means 28G of hugepages are available on each NUMA node. If the mounts are set up via fstab, we end up in one of the following scenarios at random.
First mount for OVS, then for VMs:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
| OVS(2G) |           VMs(26G)          |               VMs (28G)               |
+---------------------------------------+---------------------------------------+

First mount for VMs, then OVS:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
|               VMs (28G)               |           VMs(26G)          | OVS(2G) |
+---------------------------------------+---------------------------------------+
@Marco: After the hugepages were allocated, the ones in the OVS directory were either all from NUMA0 or all from NUMA1, but never from both (the outcome can change after a reboot). This caused an error during DPDK startup, because one 1G hugepage was requested from each NUMA node, while no hugepages were available on one of them.
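For illustration, this is roughly how the per-node distribution can be checked once the mounts are populated (standard sysfs counters for 1G pages on a two-node system; the hugepages-1048576kB directory corresponds to the 1G page size):

# free 1G hugepages left on each NUMA node
$ cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages
# total 1G hugepages reserved on each NUMA node
$ cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages

If free_hugepages drops only on one node once the OVS mount is in use, all of its pages came from that node.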

> 
> >
> > (2) Force to use memory interleave policy:
> > $ numactl --interleave=all ovs-vswitchd ...
> >
> > Note: keep the huge-dir and socket-mem option, "--huge-dir
> > /mnt/huge_ovs_1G --socket-mem 1024,1024".
> >
@Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all ovs-vswitchd ...' cannot help, because all the hugepages mounted in the OVS directory will come from only one of the NUMA nodes. The DPDK application requires a 1G hugepage from each NUMA node, so DPDK returns with an error.
I have also tried it without Step (1), and we still see the slower startup.
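For reference, the combination of Step (2) with the EAL options from the note above looks roughly like this (remaining ovs-vswitchd options omitted):

# run ovs-vswitchd under an interleaved NUMA memory policy, keeping the
# same huge-dir and socket-mem EAL options as in the logs above
$ numactl --interleave=all ovs-vswitchd -c 0x1 \
      --huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024 ...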
Currently I'm looking into Sergio's mail.

Br,
Imre

