[dpdk-dev] NUMA CPU Sockets and DPDK

Prashant Upadhyaya prashant.upadhyaya at aricent.com
Wed Feb 12 13:03:02 CET 2014


Hi Etai,

Of course all DPDK threads consume 100% of their core (unless some waits are introduced for power saving etc.; typical DPDK threads are while(1) poll loops).
When I said core 1 is unusually busy, I meant that it is not able to read beyond 2 Gbps or so, and packets are dropping at the NIC.
(I have my own custom way of calculating the CPU utilization of core 1, based on how many polls came back empty and how many returned data that I then process.)
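
For concreteness, a loop of that shape with the empty-poll accounting might look roughly like the sketch below. This is not Prashant's actual code; PORT_ID, QUEUE_ID and BURST_SIZE are placeholder assumptions.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define PORT_ID    0      /* assumed port */
#define QUEUE_ID   0      /* assumed queue */
#define BURST_SIZE 32

static void rx_poll_loop(void)
{
    struct rte_mbuf *bufs[BURST_SIZE];
    uint64_t total_polls = 0, useful_polls = 0;

    for (;;) {      /* typical DPDK while(1) loop, so the core shows 100% */
        uint16_t i, nb_rx = rte_eth_rx_burst(PORT_ID, QUEUE_ID,
                                             bufs, BURST_SIZE);
        total_polls++;
        if (nb_rx > 0) {
            useful_polls++;
            for (i = 0; i < nb_rx; i++) {
                /* ... process the packet ... */
                rte_pktmbuf_free(bufs[i]);
            }
        }
        /* "effective" utilization ~ useful_polls / total_polls,
         * sampled over a time window */
    }
}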
On the 8-core, single-socket machine, core 1 was able to lift much higher data rates successfully, hence the question.

Regards
-Prashant


-----Original Message-----
From: Etai Lev Ran [mailto:elevran at gmail.com]
Sent: Wednesday, February 12, 2014 5:18 PM
To: Prashant Upadhyaya
Cc: dev at dpdk.org
Subject: RE: [dpdk-dev] NUMA CPU Sockets and DPDK

Hi Prashant,

Based on our experience, using DPDK across CPU sockets may indeed result in some performance degradation (~10% for our application vs. staying within a socket; YMMV based on HW, application structure, etc.).
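
One cheap way to spot the cross-socket case is to compare the NUMA node of the NIC with that of the polling lcore. A minimal sketch, assuming port_id refers to the RX port in question:

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_log.h>

static void check_numa_affinity(uint16_t port_id)
{
    int nic_socket   = rte_eth_dev_socket_id(port_id);   /* -1 if unknown */
    int lcore_socket = (int)rte_lcore_to_socket_id(rte_lcore_id());

    if (nic_socket >= 0 && nic_socket != lcore_socket)
        RTE_LOG(WARNING, USER1,
                "lcore %u (socket %d) polls port %u on socket %d: "
                "cross-socket RX path\n",
                rte_lcore_id(), lcore_socket,
                (unsigned)port_id, nic_socket);
}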

Regarding CPU utilization on core 1, the one picking up traffic: perhaps I misunderstood your comment, but I would expect it to always be close to 100% since it is polling the device via the PMD rather than being driven by interrupts.

Regards,
Etai

-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Prashant Upadhyaya
Sent: Wednesday, February 12, 2014 1:28 PM
To: dev at dpdk.org
Subject: [dpdk-dev] NUMA CPU Sockets and DPDK

Hi guys,

What has been your experience with DPDK-based apps in NUMA mode on multi-socket machines, where some cores are on one socket and the rest on another?

I am migrating my application from an Intel machine with 8 cores, all on one socket, to a 32-core machine where 16 cores are on one socket and the other 16 on the second socket.
My core 0 does all the initialization for mbufs, NIC ports, queues etc. and uses SOCKET_ID_ANY for the socket-related parameters.
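
For reference, a NUMA-aware variant of that initialization might look like the sketch below, replacing SOCKET_ID_ANY with the socket the NIC is attached to when creating the mbuf pool and RX queue. NB_MBUF, MBUF_SIZE and NB_RXD are assumed values, not taken from the application.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define NB_MBUF   8192
#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
#define NB_RXD    512

static struct rte_mempool *
setup_rx_on_nic_socket(uint8_t port_id, uint16_t queue_id)
{
    int socket_id = rte_eth_dev_socket_id(port_id);   /* NIC's NUMA node */
    struct rte_mempool *pool;

    pool = rte_mempool_create("rx_pool", NB_MBUF, MBUF_SIZE, 256,
                              sizeof(struct rte_pktmbuf_pool_private),
                              rte_pktmbuf_pool_init, NULL,
                              rte_pktmbuf_init, NULL,
                              socket_id, 0);   /* instead of SOCKET_ID_ANY */
    if (pool == NULL)
        return NULL;

    /* put the RX descriptor ring on the same socket as the NIC;
     * NULL rx_conf uses driver defaults on newer DPDK releases,
     * older ones need an explicit struct rte_eth_rxconf */
    if (rte_eth_rx_queue_setup(port_id, queue_id, NB_RXD,
                               socket_id, NULL, pool) < 0)
        return NULL;

    return pool;
}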

The use case works, but I think I am running into performance issues on the 32-core machine.
The lscpu output on my 32-core machine shows the following:
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
I am using core 1 to lift all the data from a single queue of an 82599EB port, and I see that the CPU utilization of core 1 is way too high even for lifting 1 Gbps of traffic with a packet size of 650 bytes.
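
Given the alternating core numbering above, core 1 sits on node1; if the 82599EB happens to be attached to node0, every RX burst crosses sockets. A sketch of picking the polling lcore from the NIC's own socket instead (rx_loop() and port_id are placeholders):

#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_launch.h>
#include <rte_lcore.h>

/* placeholder for the actual RX polling loop (see the earlier sketch) */
static int rx_loop(void *arg)
{
    (void)arg;
    return 0;
}

static int
launch_rx_on_nic_socket(uint8_t port_id)
{
    int nic_socket = rte_eth_dev_socket_id(port_id);
    unsigned int lcore;

    for (lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
        if (!rte_lcore_is_enabled(lcore) || lcore == rte_lcore_id())
            continue;
        /* prefer an lcore on the same socket as the NIC */
        if (nic_socket < 0 ||
            (int)rte_lcore_to_socket_id(lcore) == nic_socket)
            return rte_eal_remote_launch(rx_loop, NULL, lcore);
    }
    return -1;   /* no suitable lcore found */
}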

In general, does one need to be careful when working with multiple sockets? Any comments would be helpful.

Regards
-Prashant





===============================================================================
Please refer to http://www.aricent.com/legal/email_disclaimer.html
for important disclosures regarding this electronic communication.
===============================================================================

