[root@localhost bin]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping:              9
CPU MHz:               3700.073
CPU max MHz:           3900.0000
CPU min MHz:           1600.0000
BogoMIPS:              6784.24
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
[root@localhost bin]#

pdpe1gb is not supported.

There are many free 2M HugePages:
HugePages_Total:    6656
HugePages_Free:     5682
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      236476 kB
DirectMap2M:    33228800 kB

Test code:

    char * t_mem1;
    char * t_mem2;
    int t_size = 1024*1024*1024;

    t_mem1 = rte_malloc(NULL,t_size,RTE_CACHE_LINE_SIZE);
    t_mem2 = rte_malloc(NULL,t_size,RTE_CACHE_LINE_SIZE);
    printf("rte_malloc1 t_mem1=%p \n",t_mem1);
    printf("rte_malloc1 t_mem2=%p \n",t_mem2);

    memset(t_mem1,0,t_size);
    memset(t_mem2,1,t_size);

    int t_i;
    for(t_i=0;t_i<t_size;t_i++)
    {
        if (t_mem1[t_i] ==1)
        {
            printf("rte_malloc find t_mem1=%p error t_i=%d %p=%d\n",t_mem1, t_i,&t_mem1[t_i],t_mem1[t_i] );
            t_mem1[t_i] = 2;
            break;
        }
    }
    for(t_i=0;t_i<t_size;t_i++)
    {
        if (t_mem2[t_i] ==2)
        {
            printf("rte_malloc find t_mem2=%p error t_i=%d %p=%d\n",t_mem2, t_i,&t_mem2[t_i],t_mem2[t_i] );
            break;
        }
    }

Run output:

    rte_malloc1 t_mem1=0x107c00000
    rte_malloc1 t_mem2=0x140c00000
    rte_malloc find t_mem1=0x107c00000 error t_i=956301312 0x140c00000=1
    rte_malloc find t_mem2=0x140c00000 error t_i=0 0x140c00000=2

The two allocated blocks of memory overlap partially.
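The overlap can also be checked from the two printed addresses alone, without scanning the buffers. The following is a minimal sketch in plain C with no DPDK dependency; `ranges_overlap` and `overlap_bytes` are hypothetical helper names introduced here for illustration, not DPDK APIs, and the code assumes 64-bit addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the half-open byte ranges
 * [a, a + len_a) and [b, b + len_b) share at least one byte, else 0. */
static int ranges_overlap(uint64_t a, uint64_t len_a,
                          uint64_t b, uint64_t len_b)
{
    /* Guard against address arithmetic wrapping around. */
    assert(len_a <= UINT64_MAX - a);
    assert(len_b <= UINT64_MAX - b);
    return a < b + len_b && b < a + len_a;
}

/* Hypothetical helper: number of bytes shared by the two ranges. */
static uint64_t overlap_bytes(uint64_t a, uint64_t len_a,
                              uint64_t b, uint64_t len_b)
{
    assert(len_a <= UINT64_MAX - a);
    assert(len_b <= UINT64_MAX - b);
    uint64_t lo = a > b ? a : b;                      /* later start  */
    uint64_t end_a = a + len_a;
    uint64_t end_b = b + len_b;
    uint64_t hi = end_a < end_b ? end_a : end_b;      /* earlier end  */
    return hi > lo ? hi - lo : 0;
}
```

With the addresses printed above (0x107c00000 and 0x140c00000, each 1 GiB long), the two blocks start 0x39000000 bytes (912 MiB) apart, so the last 112 MiB of t_mem1 coincides with the first 112 MiB of t_mem2. This matches t_i=956301312 in the output, since 956301312 = 0x39000000.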
Unable to reproduce on 21.11 or main. It's odd that (t_mem2 - t_mem1) is exactly 912 MB on your system, which is less than the 1024 MB requested. It should be 1024 MB + 192 B (requested size + overhead), and that is what I get on my system. I tried both with and without -Dc_args=-DRTE_MALLOC_DEBUG=1. What are the distribution, compiler version, and build flags?
The problem occurs only on low-end CPUs that lack hardware support for pdpe1gb, and it shows up when 2M hugepages are used. DPDK 20 is also affected.
With 2M hugepages, the issue is triggered by allocating large chunks of memory, on the order of several hundred megabytes.
I ran my tests on a system with pdpe1gb support, but without 1G hugepages. For the sample code, DPDK allocated memory in two chunks of 513 x 2M hugepages.

On v23.03-23-gd034467249:

    rte_malloc1 t_mem1=0x11805fffc0
    rte_malloc1 t_mem2=0x11c07fffc0

Computing (t_mem2 - t_mem1) = 0x11c07fffc0 - 0x11805fffc0 = 1073741824 + 2097152 = 1G + 2M exactly, which is OK.

Please provide the distribution (uname -a), compiler version, and build flags.
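The figure of 513 hugepages per chunk follows from rounding the request up to whole 2M pages. A minimal sketch of that arithmetic, assuming the 192 B per-allocation overhead mentioned earlier in this thread (the real overhead depends on the DPDK version and build options); `pages_needed` is a hypothetical helper, not a DPDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one 2M hugepage in bytes. */
#define HUGEPAGE_2M (UINT64_C(2) * 1024 * 1024)

/* Hypothetical helper: number of pages of page_size bytes needed to
 * hold `bytes` bytes, rounding up to a whole page. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    assert(bytes <= UINT64_MAX - (page_size - 1)); /* no wrap-around */
    return (bytes + page_size - 1) / page_size;
}
```

A bare 1 GiB fits in exactly 512 pages, so any nonzero overhead pushes the allocation to 513 pages, that is 1G + 2M, which is the distance between the two addresses above.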