[dpdk-dev] Question regarding throughput number with DPDK l2fwd with Wind River System's pktgen

Wiles, Roger Keith keith.wiles at windriver.com
Sun Sep 22 19:57:15 CEST 2013


Hi Chris,

Here is the email I sent to Jun Han via pktgen at gmail.com<mailto:pktgen at gmail.com>, which happens to match Venky's statements as well :-) Let me know if you see anything else that may be wrong with Pktgen, but the numbers are correct :-)

--------------------------------------------------------
Hi Jun,

That does make more sense with 12x10G ports. Regarding the other papers, do you have some links, or are you able to share those papers?

From what I can tell you now have 12x10G ports, which means you have 6x20Gbits or 120Gbits of bi-directional bandwidth in the system. My previous email still holds true, as Pktgen can send and receive traffic at 10Gbits/s for 64-byte packets, which for a full-duplex port is 20Gbits of bandwidth. Using (PortCnt/2) * 20Gbit = 120Gbit is the way I calculate the performance. You can check with Intel, but the performance looks correct to me.
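
In case it helps, here is a tiny C sketch of that (PortCnt/2) * 20Gbit arithmetic. This is a sketch only, not Pktgen code; the helper name aggregate_gbits is made up for illustration.

#include <stdio.h>

/* Each forwarding pair of full-duplex 10G ports carries 10 Gbit/s in and
 * 10 Gbit/s out, i.e. 20 Gbit/s of traffic per pair. */
static double aggregate_gbits(int port_cnt)
{
    const double per_pair_gbits = 20.0;
    return (port_cnt / 2) * per_pair_gbits;
}

int main(void)
{
    printf("12 ports -> %.0f Gbit/s\n", aggregate_gbits(12)); /* 120 */
    printf(" 6 ports -> %.0f Gbit/s\n", aggregate_gbits(6));  /*  60 */
    return 0;
}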

Only getting 80Gbits of performance for 64-byte packets seems low to me, as I would have expected 120Gbits, the same as for 1500-byte packets. It is possible that your system has hit some bottleneck around the total number of packets per second. Normally this is memory bandwidth, PCI bandwidth, or transactions per second on the PCI bus.

Run the system with 10 ports, 8 ports, 6 ports, ... and see if the 64-byte packet rate changes, as this will tell you something about the system's total bandwidth. For 10 ports you should get (10/2) * 20 = 100Gbits, ...
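
To make that sweep concrete, here is a small C sketch (illustrative only, not Pktgen code) that prints the expected aggregate throughput and the total packet rate it implies at each port count, assuming every port forwards 64-byte packets at wire rate.

#include <stdio.h>

/* A 64-byte frame occupies 84 bytes of wire time (frame + preamble + SFD
 * + inter-frame gap). If the measured 64-byte throughput plateaus while
 * the port count grows, the system has hit a packets-per-second ceiling
 * rather than a link-speed limit. */
int main(void)
{
    const double wire_rate_pps = 10e9 / (84.0 * 8.0);  /* ~14.88 Mpps per 10G port */

    for (int ports = 12; ports >= 2; ports -= 2)
        printf("%2d ports: %3d Gbit/s expected, %6.1f Mpps total\n",
               ports, (ports / 2) * 20, ports * wire_rate_pps / 1e6);

    /* For reference, the observed 80 Gbit/s at 64 bytes is roughly
     * 80e9 / (84 * 8) = ~119 Mpps of aggregate forwarding load. */
    return 0;
}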

Thank you, ++Keith
-------------------------------
Keith Wiles
pktgen.dpdk at gmail.com<mailto:pktgen.dpdk at gmail.com>
Principal Technologist for Networking
Wind River Systems

On Sep 21, 2013, at 9:05 AM, Jun Han <junhanece at gmail.com<mailto:junhanece at gmail.com>> wrote:

Hi Keith,

I think you misunderstood my setup. As mentioned in my previous email, I have 6 dual-port 10Gbps NICs, meaning a total of 12 10G ports per machine. They are connected back to back to another machine with an identical setup. Hence, we get a total of 120Gbps for 1500-byte packets, and 80Gbps for 64-byte packets. We did our theoretical calculation and found that it should theoretically be possible, as it does not hit the PCIe bandwidth of our machine, nor does it exceed the QPI bandwidth when packets are forwarded across NUMA nodes. Our machine block diagram is as shown below, with three NICs per riser slot. We were careful to pin the NIC ports appropriately to the cores of the CPU sockets that are directly connected to their riser slots.

Do these numbers make sense to you? As stated in the previous email, we find that these numbers are much higher than other papers in this domain, so I wanted to ask for your input or thoughts on this.

[Attachment: machine block diagram (image.png)]

Thank you very much,

Jun


On Sat, Sep 21, 2013 at 4:58 AM, Pktgen DPDK <pktgen.dpdk at gmail.com<mailto:pktgen.dpdk at gmail.com>> wrote:
Hi Jun,

I do not have any numbers with that many ports, as I have a very limited number of machines and 10G NICs. I can tell you that Pktgen, if set up correctly, can send 14.885 Mpps (million packets per second), or wire rate, for 64-byte packets. The DPDK L2FWD code is able to forward wire rate for 64-byte packets. If each port is sending and receiving wire-rate traffic, then you could have 10Gbits in each direction, or 20Gbits per port pair. You have 6 ports, or 3 port pairs, doing 20Gbits x 3 = 60Gbits of traffic at 64-byte packets, assuming you do not hit a limit on the PCIe bus or NIC.
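
For reference, that wire-rate figure falls directly out of the Ethernet framing overhead. Here is a minimal C sketch of the calculation (the helper wire_rate_pps is made up for illustration; it assumes the standard 7-byte preamble, 1-byte SFD and 12-byte inter-frame gap, with the CRC counted inside the frame size), which also shows why larger frames need far fewer packets per second to fill the link.

#include <stdio.h>

/* Wire-rate packets per second for a given frame size on a link.
 * Each frame occupies frame_bytes + 20 bytes of wire time
 * (7-byte preamble + 1-byte SFD + 12-byte inter-frame gap). */
static double wire_rate_pps(double frame_bytes, double link_gbits)
{
    return link_gbits * 1e9 / ((frame_bytes + 20.0) * 8.0);
}

int main(void)
{
    printf("  64-byte frames: %.3f Mpps\n", wire_rate_pps(64.0, 10.0) / 1e6);   /* ~14.88 */
    printf("1500-byte frames: %.3f Mpps\n", wire_rate_pps(1500.0, 10.0) / 1e6); /* ~0.82  */
    return 0;
}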

On my Westmere machine, with a total of 4 10G ports on two NIC cards, I cannot get 40Gbits of data because I hit a PCIe bug and can only get about 32Gbits, if I remember correctly. The newer systems do not have this bug.

Sending frames larger than 64 bytes means you send fewer packets per second to obtain 10Gbits of throughput. You cannot get more than 10Gbits, or 20Gbits of bi-directional traffic, per port.

If Pktgen is reporting more than 60Gbits per second of throughput for 6 ports, then Pktgen has a bug. If Pktgen is reporting more than 10Gbits of traffic Rx or Tx on a single port, then Pktgen has a bug. I have never seen Pktgen report more than 10Gbits Rx or Tx.

The most throughput for 6 ports in this forwarding configuration would be 60Gbits (3 x 20Gbits). If you had each port sending and receiving traffic, and not in a forwarding configuration, then you could get 20Gbits per port, or 120Gbits. Does this make sense?

Let's say that on a single machine you loop back the Tx/Rx on each port so the packet sent is received by the same port; then you would have 20Gbits of bi-directional traffic per port. The problem is that this is not how your system is configured: you are consuming two ports per 20Gbits of traffic.

I hope I have the above correct as it is late for me :-) If you see something wrong with my statements please let me know what I did wrong in my logic.

Thank you, ++Keith
-------------------------------
Keith Wiles
pktgen.dpdk at gmail.com<mailto:pktgen.dpdk at gmail.com>
Principal Technologist for Networking
Wind River Systems

On Sep 20, 2013, at 2:11 PM, Jun Han <junhanece at gmail.com<mailto:junhanece at gmail.com>> wrote:

Hi Keith,

Thanks so much for all your prompt replies. Thanks to you, we are now utilizing your packet gen code.

We have a question about the performance numbers we are measuring with your packet gen program. The current setup is the following:

We have two machines, each equipped with 6 dual-port 10 GbE NICs. Machine 0 runs the DPDK L2FWD code, and Machine 1 runs your packet gen. L2FWD is modified to forward the incoming packets to another statically assigned output port. With this setup, we are getting 120 Gbps throughput measured by your packet gen with 1500-byte packets. For 64-byte packets, we are getting around 80 Gbps.

Do these performance numbers make sense? We are reading related papers in this domain, and it seems like our numbers are unusually high. Could you please give us your thoughts on this, or share the performance numbers from your setup?

Thank you so much,

Jun

Keith Wiles, Principal Technologist for Networking, member of the CTO office, Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Sep 22, 2013, at 1:41 PM, Venkatesan, Venky <venky.venkatesan at intel.com<mailto:venky.venkatesan at intel.com>> wrote:

Chris,

The numbers you are getting are correct. :)

Practically speaking, most motherboards pin out between 4 and 5 x8 slots to every CPU socket. At PCI-E Gen 2 speeds (5 GT/s), each slot is capable of carrying 20 Gb/s of traffic (limited to ~16 Gb/s of 64B packets). I would have expected the 64-byte traffic capacity to be a bit higher than 80 Gb/s, but either way the numbers you are achieving are well within the capability of the system if you are careful about pinning cores to ports, which you seem to be doing. QPI is not a limiter either for the amount of traffic you are generating currently.
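
For what it's worth, here is a back-of-the-envelope C sketch of the per-slot PCIe arithmetic. The ~24 bytes of TLP/DLLP overhead per packet is an assumed round number, and descriptor/doorbell traffic is ignored entirely, which is why the result comes out above the ~16 Gb/s figure cited above; it is meant only to show why small packets become PCIe-transaction-limited rather than link-limited.

#include <stdio.h>

/* Back-of-the-envelope PCIe Gen2 x8 capacity for 64-byte packets.
 * Assumptions (illustrative only): 8 lanes * 5 GT/s with 8b/10b encoding
 * = ~32 Gbit/s of data per direction, and roughly 24 bytes of TLP/DLLP
 * framing per packet; descriptor and doorbell traffic is not counted. */
int main(void)
{
    const double raw_gbits  = 8 * 5.0 * (8.0 / 10.0);   /* ~32 Gbit/s per direction */
    const double pkt_bytes  = 64.0;
    const double tlp_ovhd   = 24.0;                      /* assumed per-packet overhead */
    const double efficiency = pkt_bytes / (pkt_bytes + tlp_ovhd);

    printf("raw per-direction bandwidth : %.0f Gbit/s\n", raw_gbits);
    printf("effective 64B payload rate  : %.1f Gbit/s (before descriptor traffic)\n",
           raw_gbits * efficiency);
    return 0;
}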

Regards,
-Venky

-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chris Pappas
Sent: Sunday, September 22, 2013 7:32 AM
To: dev at dpdk.org<mailto:dev at dpdk.org>
Subject: [dpdk-dev] Question regarding throughput number with DPDK l2fwd with Wind River System's pktgen

Hi,

We have a question about the performance numbers we are measuring with the pktgen application provided by Wind River Systems. The current setup is the following:

We have two machines, each equipped with 6 dual-port 10 GbE NICs (a total of 12 ports). Machine 0 runs the DPDK L2FWD code, and Machine 1 runs Wind River System's pktgen. L2FWD is modified to forward the incoming packets to another statically assigned output port.

Our machines have two Intel Xeon E5-2600 CPUs connected via QPI, and each has two riser slots, each holding three 10Gbps NICs. Two NICs in riser slot 1 (NIC0 and NIC1) are connected to CPU1 via PCIe Gen3, while the remaining NIC2 is connected to CPU2, also via PCIe Gen3. In riser slot 2, all NICs (NICs 3, 4, and 5) are connected to CPU2 via PCIe Gen3. We were careful to assign the NIC ports to cores of the CPU sockets that have a direct physical connection, to achieve maximum performance.


With this setup, we are getting 120 Gbps throughput measured by pktgen with 1500-byte packets. For 64-byte packets, we are getting around 80 Gbps.
Do these performance numbers make sense? We are reading related papers in this domain, and it seems like our numbers are unusually high. We did our theoretical calculation and found that it should theoretically be possible, because it does not hit the PCIe bandwidth of our machine, nor does it exceed the QPI bandwidth when packets are forwarded across NUMA nodes. Can you share your thoughts / experience with this?

Thank you,

Chris


