[dpdk-dev] ways to generate 40Gbps with two NICs x two ports?

Wiles, Roger Keith keith.wiles at windriver.com
Tue Nov 19 22:18:55 CET 2013


Give this a try; if it does not work, then something else is going on here. I am trying to make sure we do not cross the QPI for any reason by keeping the RX/TX queues related to a port on the same socket.

sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m
"[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua

cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]

        Socket 1       Socket 0
        --------       --------
Core 0  [0, 12]        [1, 13]
Core 1  [2, 14]        [3, 15]
Core 2  [4, 16]        [5, 17]
Core 8  [6, 18]        [7, 19]
Core 9  [8, 20]        [9, 21]
Core 10 [10, 22]       [11, 23]
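
As a sanity check, here is a small sketch (illustration only, not a pktgen or DPDK tool; it only parses the simple "[rx:tx].port" form used above) that looks up each port's RX and TX lcores in the layout above and flags any pair that would cross the QPI:

import re

# lcore -> socket table transcribed from the cpu_layout.py output above:
# on this box even lcores sit on socket 1 and odd lcores on socket 0.
LCORE_SOCKET = {lc: 1 if lc % 2 == 0 else 0 for lc in range(24)}

def check_mapping(mapping):
    for rx, tx, port in re.findall(r"\[(\d+):(\d+)\]\.(\d+)", mapping):
        s_rx, s_tx = LCORE_SOCKET[int(rx)], LCORE_SOCKET[int(tx)]
        verdict = "ok" if s_rx == s_tx else "CROSSES QPI"
        print("port %s: rx lcore %s on socket %d, tx lcore %s on socket %d -> %s"
              % (port, rx, s_rx, tx, s_tx, verdict))

check_mapping("[2:4].0, [6:8].1, [3:5].2, [7:9].3")  # first map: same socket per port
check_mapping("[1:2].0, [3:4].1, [5:6].2, [7:8].3")  # second map: RX/TX split across sockets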

Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533

On Nov 19, 2013, at 11:35 AM, jinho hwang <hwang.jinho at gmail.com> wrote:

On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
<keith.wiles at windriver.com> wrote:
Normally when I see this problem it means the lcores are not mapped
correctly. What can happen is you have an RX and a TX on the same physical
core, or two RX/TX pairs on the same physical core.

Make sure you have only a single RX or TX running on each physical core;
look at the cpu_layout.py output and verify the configuration is correct.
If you have 8 physical cores in the machine, then you need to make sure
only one of the lcores on each physical core is being used.
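
Here is a rough sketch of that check (the sibling lists are transcribed from the cpu_layout.py output in this thread; the helpers are illustration only, not a DPDK tool):

# Each inner list holds the hyperthread lcore siblings of one physical core.
SIBLINGS = [[0, 12], [1, 13], [2, 14], [3, 15], [4, 16], [5, 17],
            [6, 18], [7, 19], [8, 20], [9, 21], [10, 22], [11, 23]]

def physical_core(lcore):
    # Map an lcore id back to the physical core it lives on.
    for core, sibs in enumerate(SIBLINGS):
        if lcore in sibs:
            return core
    raise ValueError("unknown lcore %d" % lcore)

def check_one_task_per_core(lcores):
    # Warn if two of the lcores in use are siblings on one physical core.
    seen = {}
    for lc in lcores:
        core = physical_core(lc)
        if core in seen:
            print("lcores %d and %d share physical core %d" % (seen[core], lc, core))
        seen[core] = lc

check_one_task_per_core([1, 2, 3, 4, 5, 6, 7, 8])  # worker lcores from -c 1ff (lcore 0 does display)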

Let me know what happens.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533

On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho at gmail.com> wrote:

On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
<keith.wiles at windriver.com> wrote:


BTW, the configuration looks fine, but you need to make sure the lcores are
not split between two different CPU sockets. You can use the
dpdk/tools/cpu_layout.py to dump out the system configuration.


Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533


On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho at gmail.com> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles at windriver.com> wrote:

How do you have Pktgen configured in this case?

On my Westmere dual-socket 3.4GHz machine I can send 20G on a single 82599
NIC with two ports. My machine has a PCIe bug that does not allow me to send
on more than 3 ports at wire rate. I get close to 40G with 64-byte packets,
but the fourth port runs at about 70% of wire rate because of the PCIe
hardware bottleneck problem.
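
For scale, the back-of-envelope arithmetic for what wire rate at 64-byte frames demands (standard Ethernet framing overheads; this is arithmetic, not a measurement):

FRAME = 64          # bytes, minimum Ethernet frame
PREAMBLE_SFD = 8    # bytes of preamble + start-of-frame delimiter
IFG = 12            # bytes of inter-frame gap
LINE_RATE = 10e9    # bits/s on one 10GbE port

bits_on_wire = (FRAME + PREAMBLE_SFD + IFG) * 8   # 672 bits per 64B frame
pps = LINE_RATE / bits_on_wire                    # ~14.88 Mpps per port
print("%.2f Mpps per port, %.2f Mpps across four ports" % (pps / 1e6, 4 * pps / 1e6))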

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho at gmail.com> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two ports in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho



Hi Keith,

Thank you for the e-mail. I am not sure how to figure out whether my
PCIe also has any problems that prevent me from sending at full line rate.
I use an Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for
me to figure out where the bottleneck is.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===
lcore:     0     1     2     3     4     5     6     7     8
port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
  Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

 1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): ( 0: 0) , TX (pid:qid):
 2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): , TX (pid:qid): ( 0: 0)
 3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): ( 1: 0) , TX (pid:qid):
 4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): , TX (pid:qid): ( 1: 0)
 5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): ( 2: 0) , TX (pid:qid):
 6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): , TX (pid:qid): ( 2: 0)
 7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): ( 3: 0) , TX (pid:qid):
 8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): , TX (pid:qid): ( 3: 0)


Port :
 0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2
 1, nb_lcores  2, private 0x700208, lcores:  3  4
 2, nb_lcores  2, private 0x702e70, lcores:  5  6
 3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4
 Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =    515 KB

Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5
 Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =    515 KB

Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c
 Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =    515 KB

Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d
 Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 =   2435 KB
 Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =    515 KB

Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0
=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0
=== TX processing on lcore  2, txcnt 1, port/qid, 0/0
=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0
=== TX processing on lcore  4, txcnt 1, port/qid, 1/0
=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0
=== TX processing on lcore  6, txcnt 1, port/qid, 2/0
=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0
=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please advise me if you have time.

Thank you always for your help!

Jinho



The phenomenon is that when I start one port in one NIC, it reaches
10Gbps. Also, when I start one port per NIC, they achieve 10Gbps
each = 20Gbps. But when I start two ports in one NIC, each drops to
5.8Gbps. This persists however the cores are assigned, whether across
sockets or within the same socket. Since the size of the huge pages is
fixed, that should not be the problem. Should we say this is a
limitation of the NIC or the bus? The reason I think this may be a
hardware limitation is that regardless of packet size, two ports in
one NIC can only send 5.8Gbps each at most.

Do you know of any way that I can calculate the hardware limitation?
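
One rough way to bound it on paper (every figure below is an assumption for a back-of-envelope, not an 82599 datasheet number; the card is assumed to sit in a PCIe Gen2 x8 slot):

LANES = 8
LANE_GBPS = 5.0 * 8 / 10      # Gen2: 5 GT/s per lane with 8b/10b encoding -> 4 Gb/s
raw_gbps = LANES * LANE_GBPS  # 32 Gb/s per direction before protocol overhead

PKT = 64            # bytes of packet data per transfer (one packet per TLP assumed)
TLP_OVERHEAD = 24   # bytes of PCIe framing/header/CRC per TLP (assumed)
DESC = 16           # bytes of descriptor traffic per packet (assumed)

efficiency = float(PKT) / (PKT + TLP_OVERHEAD + DESC)
print("~%.1f Gb/s of 64B payload over a %.0f Gb/s raw link"
      % (raw_gbps * efficiency, raw_gbps))
# For comparison, two ports at 64B wire rate move about
# 2 x 14.88 Mpps x 64 B x 8 = ~15.2 Gb/s of packet data.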

Jinho



My CPU configuration is as follows:

============================================================

Core and Socket Information (as reported by '/proc/cpuinfo')

============================================================
cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]

        Socket 1       Socket 0
        --------       --------
Core 0  [0, 12]        [1, 13]
Core 1  [2, 14]        [3, 15]
Core 2  [4, 16]        [5, 17]
Core 8  [6, 18]        [7, 19]
Core 9  [8, 20]        [9, 21]
Core 10 [10, 22]       [11, 23]

When I use just two ports for testing, I use this configuration.

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2:4].0, [6:8].1" -f test/forward.lua

As you can see from the core numbers, lcores 2, 4, 6, and 8 are all on
different physical cores and are assigned separately. I am not sure the
problem comes from the core configuration. Do you have any other thoughts
we may try?
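
Running the check_mapping sketch from the top of this thread over this map agrees (lcores 2, 4, 6, and 8 are all even, hence all on socket 1 in the layout above):

check_mapping("[2:4].0, [6:8].1")
# port 0: rx lcore 2 on socket 1, tx lcore 4 on socket 1 -> ok
# port 1: rx lcore 6 on socket 1, tx lcore 8 on socket 1 -> ok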

Thanks,

Jinho


