[dpdk-users] Intel NIC Flow director?

Dave Myer dmyer705 at gmail.com
Sat Dec 3 16:56:48 CET 2016


G'day,

Thanks to the DPDK community for all the interesting questions and answers
on this list.


This isn't entirely a DPDK question, but I've been trying to use flow
bifurcation with the Intel "Flow Director" feature. I'm a little confused
about exactly how multicast is supposed to work, and I'm also having
difficulty defining the flow-type rules (l4proto isn't working).

My general objective is to use DPDK for manipulation of multicast data plane
traffic. Ideally, I'd like to forward non-link-local multicast to a DPDK VF
device, and everything else, including ICMP, IGMP and PIM, to the main Linux
kernel PF device.  The idea is to let the multicast control plane (e.g. IGMP)
be handled by smcroute running in Linux, and let DPDK handle the data plane.
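
For context, the control-plane piece would be something like the smcroute
configuration below (purely a hypothetical sketch; the interface name and
group are just examples that match the test further down):
#---------------
# /etc/smcroute.conf: have the kernel join the group on the PF so IGMP
# membership reports keep flowing while DPDK consumes the data plane
mgroup from enp5s0f1 group 226.1.1.1
#---------------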


Following the various guides, the VFs are created as shown below. I'm
including lots of output to be clear and so others can follow in my
footsteps.  Most guides didn't highlight the dmesg output, so I hope
including it helps others understand what's happening:

I found that the latest 4.4.6 ixgbe driver is required, as the Ubuntu 16.04
driver is too old.

These are the steps to create the VF NIC:
#---------------------------------------------------------------
# Kernel version
uname -a
Linux dpdkhost 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

# Ubuntu release
cat /etc/*release | grep -i description
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

# Kernel boot parameters
# Note the "iommu=pt" allows the VFs
cat /etc/default/grub | grep huge
GRUB_CMDLINE_LINUX_DEFAULT="hugepages=8192 isolcpus=2-15 iommu=pt"
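# Optional sanity check (output omitted): confirm the running kernel actually
# picked up these parameters
cat /proc/cmdline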

# IXGBE configuration, where FdirPballoc=3 selects the 256k buffer size
#root@dpdkhost:/home/das/ixgbe-4.4.6# cat README | grep -A 6 -e '^FdirPballoc'
#FdirPballoc
#-----------
#Valid Range: 1-3
#Specifies the Flow Director allocated packet buffer size.
#1 = 64k
#2 = 128k
#3 = 256k
# See also:
https://github.com/torvalds/linux/blob/master/Documentation/networking/ixgbe.txt
cat /etc/modprobe.d/ixgbe.conf
options ixgbe FdirPballoc=3
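
# NOTE (not run in this session): module options only take effect when the
# driver loads, so if ixgbe is already loaded it needs a reload, which drops
# link on all ixgbe ports:
#   modprobe -r ixgbe && modprobe ixgbe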

# DPDK NIC status BEFORE binding to ixgbe
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
unused=ixgbe
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
unused=ixgbe
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
unused=igb_uio

# Bind target NIC to Linux IXGBE
/root/dpdk/tools/dpdk-devbind.py -b ixgbe 0000:05:00.1

# Dmesg when binding NIC to ixgbe, noting the "Enabled Features: RxQ: 16
TxQ: 16 FdirHash DCA"
[  668.609718] ixgbe: 0000:05:00.1: ixgbe_check_options: FCoE Offload
feature enabled
[  668.766451] ixgbe 0000:05:00.1 enp5s0f1: renamed from eth0
[  668.766527] ixgbe 0000:05:00.1: PCI Express bandwidth of 32GT/s available
[  668.766530] ixgbe 0000:05:00.1: (Speed:5.0GT/s, Width: x8, Encoding
Loss:20%)
[  668.766615] ixgbe 0000:05:00.1 enp5s0f1: MAC: 2, PHY: 18, SFP+: 6, PBA
No: E66560-002
[  668.766617] ixgbe 0000:05:00.1: 00:1b:21:66:a9:81
[  668.766619] ixgbe 0000:05:00.1 enp5s0f1: Enabled Features: RxQ: 16 TxQ:
16 FdirHash DCA
[  668.777948] ixgbe 0000:05:00.1 enp5s0f1: Intel(R) 10 Gigabit Network
Connection

# DPDK NIC status AFTER binding to ixgbe
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
unused=ixgbe
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
unused=igb_uio
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1
drv=ixgbe unused=igb_uio

# ixgbe driver version
ethtool -i enp5s0f1
driver: ixgbe
version: 4.4.6
firmware-version: 0x18b30001
expansion-rom-version:
bus-info: 0000:05:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

# Create the VF
echo 1 > /sys/bus/pci/devices/0000\:05\:00.1/sriov_numvfs

# DPDK NIC status AFTER creating the new VF
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
unused=ixgbe
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
unused=igb_uio
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1
drv=ixgbe unused=igb_uio
0000:05:10.1 '82599 Ethernet Controller Virtual Function' if=eth0
drv=ixgbevf unused=igb_uio

# Highlighting the new VF device that just got created
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000 | grep Virt
0000:05:10.1 '82599 Ethernet Controller Virtual Function' if=eth0
drv=ixgbevf unused=igb_uio

# Dmesg when creating the VF
[  736.643865] ixgbe 0000:05:00.1: SR-IOV enabled with 1 VFs
[  736.643870] ixgbe 0000:05:00.1: configure port vlans to keep your VFs
secure
[  736.744382] pci 0000:05:10.1: [8086:10ed] type 00 class 0x020000
[  736.744436] pci 0000:05:10.1: can't set Max Payload Size to 256; if
necessary, use "pci=pcie_bus_safe" and report a bug
[  736.762714] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function
Network Driver - version 2.12.1-k
[  736.762717] ixgbevf: Copyright (c) 2009 - 2015 Intel Corporation.
[  736.762771] ixgbevf 0000:05:10.1: enabling device (0000 -> 0002)
[  736.763967] ixgbevf 0000:05:10.1: PF still in reset state.  Is the PF
interface up?
[  736.763968] ixgbevf 0000:05:10.1: Assigning random MAC address
[  736.806083] ixgbevf 0000:05:10.1: 0a:ae:33:3c:2c:79
[  736.806087] ixgbevf 0000:05:10.1: MAC: 1
[  736.806090] ixgbevf 0000:05:10.1: Intel(R) 82599 Virtual Function
#---------------------------------------------------------------
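
The dmesg above shows the VF getting a random MAC address.  It may be worth
pinning a fixed one from the PF side before handing the VF to DPDK; a minimal
sketch (the MAC below is just an example of a locally administered address):
#---------------
# Assign a fixed MAC to VF 0 via the PF
ip link set dev enp5s0f1 vf 0 mac 02:00:00:00:05:01
# Verify: the PF's link output includes a "vf 0" line with the MAC
ip link show enp5s0f1
#---------------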

I'm not sure whether that "can't set Max Payload Size to 256" message is a
problem.  Could it be the reason the flow-director l4proto rules shown below
don't work?
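
In case it matters, the negotiated payload size can be inspected with lspci
(assuming the VF exposes its PCIe capability; output omitted here):
#---------------
# DevCap shows what the device supports, DevCtl what is currently configured
lspci -vv -s 0000:05:10.1 | grep -i maxpayload
#---------------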

#---------------------------------------------------------------
# Ethtool shows the new VF interface is using the ixgbevf driver
ethtool -i eth0
driver: ixgbevf
version: 2.12.1-k
firmware-version:
expansion-rom-version:
bus-info: 0000:05:10.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

# Bind the new VF to DPDK
/root/dpdk/tools/dpdk-devbind.py --bind=igb_uio 0000:05:10.1

/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
unused=ixgbe
0000:05:10.1 '82599 Ethernet Controller Virtual Function' drv=igb_uio
unused=ixgbevf
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
unused=igb_uio
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1
drv=ixgbe unused=igb_uio
#---------------------------------------------------------------

The VF is now set up, but we need to direct traffic to it before the DPDK
process can receive anything.

Enable flow director:
#---------------------------------------------------------------
# Enable the flow-director feature
ethtool -K enp5s0f1 ntuple on

ethtool --show-ntuple enp5s0f1
4 RX rings available
Total 0 rules
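
# Optional sanity check (output omitted): the offload list should now report
# "ntuple-filters: on"
ethtool -k enp5s0f1 | grep ntuple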

#---------------------------------------------------------------

Now it's time for rules. First, I'd like to explicitly direct some traffic to
the Linux kernel:
#---------------------------------------------------------------
# ICMP to main queue, so ping will be via Linux kernel
ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 1 action 0 loc 1
rmgr: Cannot insert RX class rule: Operation not supported

# Try again with latest ethtool 4.8
root@dpdkhost:/home/das/ethtool-4.8# ./ethtool --config-ntuple enp5s0f1
flow-type ip4 l4proto 1 action 0 loc 1
rmgr: Cannot insert RX class rule: Operation not supported
#---------------------------------------------------------------

Any thoughts on why "l4proto" doesn't seem to work?  Am I being silly?
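
For comparison, the flow bifurcation guide's examples use the tcp4/udp4 flow
types rather than ip4 with l4proto, so a throwaway rule like the one below
(the port and location are arbitrary) might at least show whether those flow
types are accepted here:
#---------------
ethtool --config-ntuple enp5s0f1 flow-type udp4 dst-port 4000 action 1 loc 10
#---------------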

I'd also like to direct IGMP and PIM to Linux via:
#---------------
# IGMP
ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 2 action 0
# PIM
ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 103 action 0
#---------------

Setting the l4proto filters aside, let's continue trying to direct multicast
to the VF:
#---------------------------------------------------------------
# Link-local multicast to the main queue (224.0.0.0/24)
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
255.255.255.0 action 0 loc 2

# Great, the rule went in, as shown:
ethtool --show-ntuple enp5s0f1
4 RX rings available
Total 1 rules

Filter: 2
    Rule Type: Raw IPv4
    Src IP addr: 0.0.0.0 mask: 255.255.255.255
    Dest IP addr: 0.0.0.0 mask: 255.255.255.0
    TOS: 0x0 mask: 0xff
    Protocol: 0 mask: 0xff
    L4 bytes: 0x0 mask: 0xffffffff
    VLAN EtherType: 0x0 mask: 0xffff
    VLAN: 0x0 mask: 0xffff
    User-defined: 0x0 mask: 0xffffffffffffffff
    Action: Direct to queue 0

# Now direct all other multicast to the DPDK VF queue 1 (224.0.0.0/4)
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 240.0.0.0
action 1 loc 3
rmgr: Cannot insert RX class rule: Invalid argument
#---------------------------------------------------------------
That's weird.  What's wrong?  Why won't the second rule go in?
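
ethtool only reports a generic "Invalid argument" here; the ixgbe driver
sometimes logs a more specific reason, so checking the kernel log right after
a failed insert may help narrow it down (I haven't captured that output here):
#---------------
dmesg | tail -n 20
#---------------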

Let's try removing the link-local rule and adding only the broader multicast rule:
#---------------------------------------------------------------
# Remove the link local multicast rule
ethtool --config-ntuple enp5s0f1 delete 2

# Direct all multicast to queue 1 (224.0.0.0/4)
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 240.0.0.0
action 1 loc 3

# Great, the rule went in that time, as shown:
ethtool --show-ntuple enp5s0f1
4 RX rings available
Total 1 rules

Filter: 3
    Rule Type: Raw IPv4
    Src IP addr: 0.0.0.0 mask: 255.255.255.255
    Dest IP addr: 0.0.0.0 mask: 240.0.0.0
    TOS: 0x0 mask: 0xff
    Protocol: 0 mask: 0xff
    L4 bytes: 0x0 mask: 0xffffffff
    VLAN EtherType: 0x0 mask: 0xffff
    VLAN: 0x0 mask: 0xffff
    User-defined: 0x0 mask: 0xffffffffffffffff
    Action: Direct to queue 1

# Now I can't add the other rule that worked before, either.  Weird.
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 0.0.0.255
action 0 loc 2
rmgr: Cannot insert RX class rule: Invalid argument

# What about not specifying the "location" or rule number?
ethtool --config-ntuple enp5s0f1 delete 3
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
255.255.255.0 action 0
Added rule with ID 2045

# Fingers crossed...
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 240.0.0.0
action 1
rmgr: Cannot insert RX class rule: Invalid argument

# Doh!
#---------------------------------------------------------------

OK, so only one multicast rule seems to go in, but I wonder whether that rule
is actually working.

For this test, a Cisco switch/router is sending a multicast ping from
172.16.1.1 to 226.1.1.1, and a unicast ping is also running from enp5s0f1
(172.16.1.20) to 172.16.1.1.

#---------------------------------------------------------------
# Bind the VF back to Linux to allow tcpdump
/root/dpdk/tools/dpdk-devbind.py --bind=ixgbevf 0000:05:10.1

# Bring up the PF interface
ifconfig enp5s0f1 up

# Bring up VF
ifconfig eth0 up

# Check for the traffic on the PF enp5s0f1
tcpdump -c 10 -nei enp5s0f1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp5s0f1, link-type EN10MB (Ethernet), capture size 262144
bytes
15:33:22.472673 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
(0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074,
seq 841, length 64
15:33:22.473482 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
(0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
seq 841, length 64
15:33:23.009617 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
476, length 180
15:33:23.471987 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
(0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074,
seq 842, length 64
15:33:23.472414 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
(0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
seq 842, length 64
15:33:24.471968 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
(0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074,
seq 843, length 64
15:33:24.523256 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
(0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
seq 843, length 64
15:33:25.009659 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
477, length 180
15:33:25.473568 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
(0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074,
seq 844, length 64
15:33:25.478962 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
(0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
seq 844, length 64
10 packets captured
10 packets received by filter
0 packets dropped by kernel


# That's weird.  I didn't expect any multicast traffic here, but we got a mix
# of the unicast (172.16.1.1 > 172.16.1.20) and the multicast (172.16.1.1 >
# 226.1.1.1).  Why is the multicast traffic to 226.1.1.1 still hitting this
# interface?

# Check the VF eth0
tcpdump -c 10 -nei eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:34:48.435652 00:21:55:84:a3:3f > 01:00:5e:00:00:01, ethertype IPv4
(0x0800), length 60: 172.16.1.1 > 224.0.0.1: igmp query v3
15:34:49.019941 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
519, length 180
15:34:51.056169 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
520, length 180
15:34:53.056236 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
521, length 180
15:34:55.056390 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
522, length 180
15:34:57.056578 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
523, length 180
15:34:58.743276 00:21:55:84:a3:3f > 01:00:5e:00:00:01, ethertype IPv4
(0x0800), length 60: 172.16.1.1 > 224.0.0.1: igmp query v3
15:34:59.056686 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
524, length 180
15:35:01.056869 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
525, length 180
15:35:03.056988 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
(0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq
526, length 180
10 packets captured
10 packets received by filter
0 packets dropped by kernel


# OK, so that looks good: the VF is only seeing multicast traffic.
#---------------------------------------------------------------
This is kind of working as I would expect, since the VF is getting the
multicast traffic, but why is the PF enp5s0f1 still getting the multicast as
well?

I thought flow director would have sent all of the multicast ONLY to the VF
eth0?


Just for future searchers: it seems that only the PF exposes the flow director
(fdir) counters:
#---------------------------------------------------------------
root@smtwin1:/home/das/ethtool-4.8# ethtool -S enp5s0f1 | grep fdir
     fdir_match: 1040
     fdir_miss: 2653
     fdir_overflow: 0
root@smtwin1:/home/das/ethtool-4.8# ethtool -S eth0
NIC statistics:
     rx_packets: 934
     tx_packets: 8
     rx_bytes: 191520
     tx_bytes: 648
     tx_busy: 0
     tx_restart_queue: 0
     tx_timeout_count: 0
     multicast: 933
     rx_csum_offload_errors: 0
     tx_queue_0_packets: 8
     tx_queue_0_bytes: 648
     tx_queue_0_bp_napi_yield: 0
     tx_queue_0_bp_misses: 0
     tx_queue_0_bp_cleaned: 0
     rx_queue_0_packets: 934
     rx_queue_0_bytes: 191520
     rx_queue_0_bp_poll_yield: 0
     rx_queue_0_bp_misses: 0
     rx_queue_0_bp_cleaned: 0
#---------------------------------------------------------------
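
To see whether a particular stream is being matched, watching those counters
while the multicast ping runs should show fdir_match incrementing (assuming
the counters behave as their names suggest):
#---------------
watch -n 1 'ethtool -S enp5s0f1 | grep fdir'
#---------------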

Thanks in advance for your help!

regards,
Dave

Helpful reference pages:
http://dpdk.org/doc/guides/howto/flow_bifurcation.html

https://dpdksummit.com/Archive/pdf/2016Userspace/Day02-Session05-JingjingWu-Userspace2016.pdf

http://rhelblog.redhat.com/2015/10/02/getting-the-best-of-both-worlds-with-queue-splitting-bifurcated-driver/

https://github.com/pavel-odintsov/fastnetmon/wiki/Traffic-filtration-using-NIC-capabilities-on-wire-speed-(10GE,-14Mpps)

