[dpdk-dev] DPDK: Inter-VM iperf3 TCP throughput on the same host is very low compared to non-DPDK throughput

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Wed Dec 28 14:16:05 CET 2016


>-----Original Message-----
>From: Rajalakshmi Prabhakar [mailto:krajalakshmi at tataelxsi.co.in]
>Sent: Tuesday, December 27, 2016 9:52 AM
>To: dev at dpdk.org; users at dpdk.org
>Cc: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>
>Subject: DPDK: Inter-VM iperf3 TCP throughput on the same host is very low
>compared to non-DPDK throughput
>
>Hello,
>Kindly support me in getting high throughput for inter-VM iperf3 TCP
>communication on an OpenStack DPDK host. I am not sure that I am mailing
>the right list; sorry for the inconvenience.
The OVS mailing list would be the appropriate one for the problem you reported here.
Use ovs-discuss at openvswitch.org or dev at openvswitch.org.

>
>Host - ubuntu16.04
>devstack - stable/newton
>which installs DPDK 16.07 and OVS 2.6,
>with the DPDK plugin and the following DPDK configuration:
>Grub changes
>GRUB_CMDLINE_LINUX_DEFAULT="quiet splash default_hugepagesz=1G
>hugepagesz=1G hugepages=8 iommu=pt intel_iommu=on"
>local.conf - changes for DPDK
>enable_plugin networking-ovs-dpdk
>https://git.openstack.org/openstack/networking-ovs-dpdk master
>OVS_DPDK_MODE=controller_ovs_dpdk
>OVS_NUM_HUGEPAGES=8
>OVS_CORE_MASK=2
>OVS_PMD_CORE_MASK=4
Only one PMD core is used in your case; scaling out the PMD threads is one option for higher throughput, as sketched below.
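For example (assuming cores 2 and 3 are free for OVS on this host; the mask value is only an illustration), the PMD cores can be scaled and the Rx queue distribution checked with:

$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC   # run PMD threads on cores 2 and 3
$ ovs-appctl dpif-netdev/pmd-rxq-show                          # confirm which PMD polls which Rx queue

In the devstack setup this corresponds to widening OVS_PMD_CORE_MASK (i.e. a wider mask than the current value of 4).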

>OVS_DPDK_BIND_PORT=False
>OVS_SOCKET_MEM=2048
>OVS_DPDK_VHOST_USER_DEBUG=n
>OVS_ALLOCATE_HUGEPAGES=True
>OVS_HUGEPAGE_MOUNT_PAGESIZE=1G
>MULTI_HOST=1
>OVS_DATAPATH_TYPE=netdev
>before VM creation
>#nova flavor-key m1.small set hw:mem_page_size=1048576
>Able to create two Ubuntu instances with flavor m1.small.
How many cores are assigned to the VMs, and have you tried the CPU pinning options instead of allowing the threads to float across the cores? A sketch is below.
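A minimal sketch of dedicated vCPU pinning via flavor extra specs (the flavor name is taken from your mail; this assumes the host cores used by OVS/PMD are excluded from vcpu_pin_set in nova.conf):

# nova flavor-key m1.small set hw:cpu_policy=dedicated
# nova flavor-key m1.small set hw:cpu_thread_policy=prefer

With hw:cpu_policy=dedicated each vCPU is pinned to its own host CPU instead of floating, which usually gives more stable iperf3 numbers.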

>Achieved an iperf3 TCP throughput of ~7.5Gbps.
Are you seeing high drops at the vHost ports and retransmissions? Do you see the same throughput difference with UDP traffic?
I can't explain the throughput gap you are observing right now, but a couple of things are worth checking:

- Check for thread starvation (use htop to see thread activity on the cores).
- I see that you have a single-socket setup, so no QPI is involved. As you have HT enabled, check that the appropriate thread siblings are used.
- Check the PMD thread/port statistics for anomalies; see the commands below.
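For the last two checks, something along these lines should work (the CPU number and bridge name are only examples):

$ cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list   # logical CPU sharing a core with CPU 2
$ ovs-appctl dpif-netdev/pmd-stats-show                            # per-PMD cycles, hits/misses, packets processed
$ ovs-ofctl dump-ports br-int                                      # per-port rx/tx drops and errors

A miss or drop counter that climbs between two iperf3 runs usually points at where packets are being lost.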

BTW, responses may be slow at this time due to the year-end vacation.

Regards,
Bhanuprakash. 

>Ensured that the vhost port is created and hugepages are consumed after the
>2 VMs are created: 2GB each, i.e. 4GB for the VMs plus 2GB for the socket memory, 6GB in total.
>$ sudo cat /proc/meminfo |grep Huge
>AnonHugePages: 0 kB
>HugePages_Total: 8
>HugePages_Free: 2
>HugePages_Rsvd: 0
>HugePages_Surp: 0
>Hugepagesize: 1048576 kB
>The same scenario was carried out for the non-DPDK OpenStack case and achieved a
>higher throughput of ~19Gbps, which contradicts the expected results.
>Kindly suggest what additional DPDK configuration should be done for high
>throughput. I also tried CPU pinning and multiqueue for OpenStack DPDK, but
>there was no improvement in the result.
>The test PC has a single NUMA node only. I am not doing NIC binding, as I am only
>trying to validate inter-VM communication on the same host. PFB my PC configuration.
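On the multiqueue point: vhost-user multiqueue only helps if it is enabled end to end, and it needs more than one PMD thread to poll the extra queues. A rough sketch (the image name and queue count are placeholders):

$ openstack image set --property hw_vif_multiqueue_enabled=true <image>   # expose multiple queues to the guest vNIC
(inside the guest)
$ ethtool -L eth0 combined 2                                              # actually enable 2 queues on the vNIC

With a single PMD core, as in your OVS_PMD_CORE_MASK above, the extra queues will not buy you much.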
>$ lscpu
>Architecture: x86_64
>CPU op-mode(s): 32-bit, 64-bit
>Byte Order: Little Endian
>CPU(s): 12
>On-line CPU(s) list: 0-11
>Thread(s) per core: 2
>Core(s) per socket: 6
>Socket(s): 1
>NUMA node(s): 1
>Vendor ID: GenuineIntel
>CPU family: 6
>Model: 63
>Model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>Stepping: 2
>CPU MHz: 1212.000
>CPU max MHz: 2400.0000
>CPU min MHz: 1200.0000
>BogoMIPS: 4794.08
>Virtualization: VT-x
>L1d cache: 32K
>L1i cache: 32K
>L2 cache: 256K
>L3 cache: 15360K
>NUMA node0 CPU(s): 0-11
>Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
>rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx
>smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
>movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
>epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2
>smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat
>pln pts
>I am following INSTALL.DPDK.ADVANCED.md but have no clue about the low throughput.
>
>
>Best Regards
>Rajalakshmi Prabhakar
>
>Specialist - Communication BU | Wireless Division
>TATA ELXSI
>IITM Research Park, Kanagam Road, Taramani, Chennai 600 113, India
>Tel +91 44 66775031   Cell +91 9789832957
>www.tataelxsi.com

