[dpdk-dev] Performance issues with Mellanox Connectx-3 EN

Gilad Berman giladb at mellanox.com
Sun Aug 16 08:38:09 CEST 2015


Xiaozhou,

I will take this thread offline and mail you directly, so as not to spam everyone. I promise to post the solution back to the list for future reference.

Thx!

From: Xiaozhou Li [mailto:xl at CS.Princeton.EDU]
Sent: Friday, August 14, 2015 7:11 AM
To: Gilad Berman <giladb at mellanox.com>
Cc: Xu, Qian Q <qian.q.xu at intel.com>; dev at dpdk.org
Subject: Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN

Hi Qian and Gilad,

Thanks for your replies. We are using dpdk-2.0.0 and mlnx-en-2.4-1.0.0.1 on a Mellanox ConnectX-3 EN with a single 40G port.

I ran testpmd on the server with the following command:

    sudo ./testpmd -c 0xff -n 4 -- -i --portmask=0x1 --port-topology=chained --rxq=4 --txq=4 --nb-cores=4

and then, at the testpmd prompt:

    set fwd macswap
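
For reference, the -c and --portmask arguments above are bit masks: bit N set selects lcore N (or port N). A minimal sketch of how the two masks decode is below; the helper name mask_to_ids is made up for illustration and is not part of DPDK.

```python
def mask_to_ids(mask: int) -> list:
    """Return the indices of the bits that are set in a bit mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Values taken from the testpmd command line in this email.
    print("lcores:", mask_to_ids(0xFF))  # -c 0xff
    print("ports: ", mask_to_ids(0x1))   # --portmask=0x1
```

The 0xff mask selects lcores 0 through 7, which matches the "lcore N is ready" lines in the log at the end of this email; with --nb-cores=4, four of them (lcores 1 to 4 in the "show config fwd" output) do the forwarding.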

Multiple clients send packets to the server and receive replies. The server throughput is still only about 2 Mpps. testpmd reports no RX-dropped packets, but ifconfig on the port's interface shows many dropped packets.
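
One way to put a number on the drops that ifconfig reports is to sample the kernel's per-interface counters twice and take the difference. The sketch below assumes a Linux host where the port is visible as a network interface under /sys/class/net; the interface name "eth2" is a placeholder, and the helper names are made up for illustration.

```python
import time
from pathlib import Path


def read_counter(iface: str, name: str, root: str = "/sys/class/net") -> int:
    """Read one kernel counter (for example rx_dropped) for an interface."""
    return int(Path(root, iface, "statistics", name).read_text())


def rate_per_second(before: int, after: int, seconds: float) -> float:
    """Counter increase per second between two samples."""
    return (after - before) / seconds


if __name__ == "__main__":
    iface = "eth2"  # placeholder; substitute the ConnectX-3 interface name
    interval = 1.0  # seconds between the two samples
    first = read_counter(iface, "rx_dropped")
    time.sleep(interval)
    second = read_counter(iface, "rx_dropped")
    print(f"{iface}: {rate_per_second(first, second, interval):.0f} rx drops/s")
```

Comparing this rate with the offered load from the clients would show roughly how much of the traffic is lost before it reaches testpmd.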

Please let me know if I am doing anything wrong and what else I should check. I have also copied the testpmd startup output at the end of this email, in case it contains any useful information.

Thanks!
Xiaozhou


EAL: Detected lcore 0 as core 0 on socket 0
         ... (omit) ...
EAL: Detected 32 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
         ... (omit) ...
EAL: Ask a virtual area of 0xa00000 bytes
EAL: Virtual area found at 0x7f2d2fe00000 (size = 0xa00000)
EAL: Requesting 8192 pages of size 2MB from socket 0
EAL: Requesting 8192 pages of size 2MB from socket 1
EAL: TSC frequency is ~2199994 KHz
EAL: Master lcore 0 is ready (tid=39add900;cpuset=[0])
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: lcore 4 is ready (tid=3676b700;cpuset=[4])
EAL: lcore 6 is ready (tid=35769700;cpuset=[6])
EAL: lcore 5 is ready (tid=35f6a700;cpuset=[5])
EAL: lcore 2 is ready (tid=3776d700;cpuset=[2])
EAL: lcore 1 is ready (tid=37f6e700;cpuset=[1])
EAL: lcore 3 is ready (tid=36f6c700;cpuset=[3])
EAL: lcore 7 is ready (tid=34f68700;cpuset=[7])
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is f4:52:14:5a:8f:70
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:81:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
Interactive-mode selected
Configuring Port 0 (socket 0)
PMD: librte_pmd_mlx4: 0x884360: TX queues number update: 0 -> 4
PMD: librte_pmd_mlx4: 0x884360: RX queues number update: 0 -> 4
Port 0: F4:52:14:5A:8F:70
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Done

testpmd> show config rxtx
  macswap packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=4 - nb forwarding ports=1
  RX queues=4 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=4 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0
testpmd> show config fwd
macswap packet forwarding - ports=1 - cores=4 - streams=4 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 4 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00



On Thu, Aug 13, 2015 at 6:13 AM, Gilad Berman <giladb at mellanox.com> wrote:
Xiaozhou,
Following up on Qian's answer: 2 Mpps is VERY (VERY) low and far below what we see even with a single core.
Which version of DPDK and the PMD are you using? Are you using the MLNX-optimized libs for the PMD? Can you provide more details on the exact setup?
Can you run a simple test with testpmd and see if you are getting the same results?

Just to be clear: whichever version you are using, 2 Mpps is very far from what you should get :)
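
To put 2 Mpps in perspective, the theoretical Ethernet line rate of a 40G port can be worked out from the frame size. The sketch below is only that arithmetic, not a measurement; the packet size used in the test is not stated in this thread, so the frame sizes listed are assumptions, and the function name is made up for illustration.

```python
def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum Ethernet frame rate in Mpps for a frame size that includes the CRC."""
    # Per-frame wire overhead: preamble (7) + start-of-frame delimiter (1)
    # + minimum inter-frame gap (12).
    overhead_bytes = 7 + 1 + 12
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


if __name__ == "__main__":
    # Assumed frame sizes; the thread does not say which size was used.
    for size in (64, 128, 256, 512, 1024, 1518):
        print(f"{size:5d} B: {line_rate_mpps(40, size):6.2f} Mpps")
```

With 64-byte frames a 40G link can carry about 59.5 Mpps, and even with 1518-byte frames about 3.25 Mpps, so 2 Mpps is below line rate for any standard frame size.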

-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xu, Qian Q
Sent: Thursday, August 13, 2015 6:25 AM
To: Xiaozhou Li <xl at CS.Princeton.EDU>; dev at dpdk.org
Subject: Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN

Xiaozhou,
So it seems the performance bottleneck is not at the core. Have you checked the Mellanox NIC's configuration? How many queues per port are you using? Could you try the l3fwd example with the Mellanox NIC to check whether the performance is good enough? I'm not familiar with Mellanox NICs, but if you have tried the Intel Fortville 40G NIC, I can give more suggestions about the NIC's configuration.

Thanks
Qian


-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xiaozhou Li
Sent: Thursday, August 13, 2015 7:20 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN

Hi folks,

I am seeing performance scalability issues with DPDK on the Mellanox ConnectX-3.

Each of our machines has 16 cores and a single-port 40G Mellanox ConnectX-3 EN. We find that the server throughput *does not scale* with the number of cores. With a single thread on one core, we get about 2 Mpps with a simple echo server implementation. However, the throughput does not increase as we use more cores. Our implementation is based on the l2fwd example.

I'd greatly appreciate it if anyone could provide some insight into what the problem might be and how we can improve the performance with the Mellanox ConnectX-3 EN. Thanks!

Best,
Xiaozhou


