[dpdk-users] Intel NIC Flow director?

Dave Myer dmyer705 at gmail.com
Thu Dec 8 20:19:28 CET 2016


No response is a bit of a surprise, as I thought bifurcation was something
commonly used.

Maybe this is the wrong place to ask about these Intel NICs?  Where else
can I seek help please?

On Sat, Dec 3, 2016 at 7:56 AM, Dave Myer <dmyer705 at gmail.com> wrote:

> G'day,
>
> Thanks to the DPDK community for all the interesting questions and answers
> on this list.
>
>
> This isn't entirely a DPDK question, but I've been trying to use flow
> bifurcation with the Intel "Flow Director" feature.  I'm a little confused
> about exactly how multicast is supposed to work, and I'm also having
> difficulty defining the flow-type rules (l4proto not working).
>
> My general objective is to use DPDK to manipulate multicast data-plane
> traffic, so ideally I'd like to forward non-link-local multicast to a DPDK
> VF device, and everything else, including ICMP, IGMP and PIM, to the main
> Linux kernel PF device.  The idea is to let the multicast control plane
> (e.g. IGMP) be handled by smcroute running in Linux, while DPDK handles
> the data plane.
>
>
> Following the various guides, the VFs are created as follows.  I'm
> including lots of output to be clear, and so others can follow in my
> footsteps.  Most guides didn't highlight the dmesg output, so I hope that
> helps others understand better what's happening:
>
> I found that the latest 4.4.6 ixgbe driver is required, as the Ubuntu
> 16.04 driver is too old.
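>
> For reference, the out-of-tree driver gets built and installed roughly
> like this (a sketch from memory; the tarball name and paths are
> assumptions):
> #---------------------------------------------------------------
> # Build and install the out-of-tree ixgbe driver (assumed version 4.4.6)
> tar xzf ixgbe-4.4.6.tar.gz
> cd ixgbe-4.4.6/src
> make
> make install
> # Reload the driver so the new version takes over
> rmmod ixgbe
> modprobe ixgbe
> #---------------------------------------------------------------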
>
> These are the steps to create the VF NIC:
> #---------------------------------------------------------------
> # Kernel version
> uname -a
> Linux dpdkhost 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> # Ubuntu release
> cat /etc/*release | grep -i description
> DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
>
> # Kernel boot parameters
> # Note the "iommu=pt" allows the VFs
> cat /etc/default/grub | grep huge
> GRUB_CMDLINE_LINUX_DEFAULT="hugepages=8192 isolcpus=2-15 iommu=pt"
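>
> # To apply changes to GRUB_CMDLINE_LINUX_DEFAULT, regenerate the grub
> # config and reboot, then verify (sketch, assuming Ubuntu's grub2 tools):
> #   update-grub
> #   reboot
> #   cat /proc/cmdline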
>
> # IXGBE configuration, where FdirPballoc=3 selects the 256k buffer size
> #root@dpdkhost:/home/das/ixgbe-4.4.6# cat README | grep -A 6 -e '^FdirPballoc'
> #FdirPballoc
> #-----------
> #Valid Range: 1-3
> #Specifies the Flow Director allocated packet buffer size.
> #1 = 64k
> #2 = 128k
> #3 = 256k
> # See also: https://github.com/torvalds/linux/blob/master/Documentation/networking/ixgbe.txt
> cat /etc/modprobe.d/ixgbe.conf
> options ixgbe FdirPballoc=3
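>
> # Note: options in /etc/modprobe.d only take effect when the module is
> # (re)loaded.  A sketch of how to check and reload:
> #   modinfo ixgbe | grep -i fdir     # confirm the parameter exists
> #   modprobe -r ixgbe && modprobe ixgbe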
>
> # DPDK NIC status BEFORE binding to ixgbe
> /root/dpdk/tools/dpdk-devbind.py --status | grep 0000
> 0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=ixgbe
> 0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=ixgbe
> 0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
> unused=igb_uio
> 0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
> unused=igb_uio
>
> # Bind target NIC to Linux IXGBE
> /root/dpdk/tools/dpdk-devbind.py -b ixgbe 0000:05:00.1
>
> # Dmesg when binding NIC to ixgbe, noting the "Enabled Features: RxQ: 16
> TxQ: 16 FdirHash DCA"
> [  668.609718] ixgbe: 0000:05:00.1: ixgbe_check_options: FCoE Offload
> feature enabled
> [  668.766451] ixgbe 0000:05:00.1 enp5s0f1: renamed from eth0
> [  668.766527] ixgbe 0000:05:00.1: PCI Express bandwidth of 32GT/s
> available
> [  668.766530] ixgbe 0000:05:00.1: (Speed:5.0GT/s, Width: x8, Encoding
> Loss:20%)
> [  668.766615] ixgbe 0000:05:00.1 enp5s0f1: MAC: 2, PHY: 18, SFP+: 6, PBA
> No: E66560-002
> [  668.766617] ixgbe 0000:05:00.1: 00:1b:21:66:a9:81
> [  668.766619] ixgbe 0000:05:00.1 enp5s0f1: Enabled Features: RxQ: 16 TxQ:
> 16 FdirHash DCA
> [  668.777948] ixgbe 0000:05:00.1 enp5s0f1: Intel(R) 10 Gigabit Network
> Connection
>
> # DPDK NIC status AFTER binding to ixgbe
> /root/dpdk/tools/dpdk-devbind.py --status | grep 0000
> 0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=ixgbe
> 0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
> unused=igb_uio
> 0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
> unused=igb_uio
> 0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1
> drv=ixgbe unused=igb_uio
>
> # ixgbe driver version
> ethtool -i enp5s0f1
> driver: ixgbe
> version: 4.4.6
> firmware-version: 0x18b30001
> expansion-rom-version:
> bus-info: 0000:05:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> # Create the VF
> echo 1 > /sys/bus/pci/devices/0000\:05\:00.1/sriov_numvfs
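>
> # Optional (sketch): check how many VFs the PF supports, and give the VF a
> # fixed MAC instead of the random one ixgbevf assigns (MAC is a placeholder):
> #   cat /sys/bus/pci/devices/0000\:05\:00.1/sriov_totalvfs
> #   ip link set enp5s0f1 vf 0 mac 02:00:00:00:00:01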
>
> # DPDK NIC status AFTER creating the new VF
> /root/dpdk/tools/dpdk-devbind.py --status | grep 0000
> 0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=ixgbe
> 0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
> unused=igb_uio
> 0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
> unused=igb_uio
> 0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1
> drv=ixgbe unused=igb_uio
> 0000:05:10.1 '82599 Ethernet Controller Virtual Function' if=eth0
> drv=ixgbevf unused=igb_uio
>
> # Highlighting the new VF device that just got created
> /root/dpdk/tools/dpdk-devbind.py --status | grep 0000 | grep Virt
> 0000:05:10.1 '82599 Ethernet Controller Virtual Function' if=eth0
> drv=ixgbevf unused=igb_uio
>
> # Dmesg when creating the VF
> [  736.643865] ixgbe 0000:05:00.1: SR-IOV enabled with 1 VFs
> [  736.643870] ixgbe 0000:05:00.1: configure port vlans to keep your VFs
> secure
> [  736.744382] pci 0000:05:10.1: [8086:10ed] type 00 class 0x020000
> [  736.744436] pci 0000:05:10.1: can't set Max Payload Size to 256; if
> necessary, use "pci=pcie_bus_safe" and report a bug
> [  736.762714] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function
> Network Driver - version 2.12.1-k
> [  736.762717] ixgbevf: Copyright (c) 2009 - 2015 Intel Corporation.
> [  736.762771] ixgbevf 0000:05:10.1: enabling device (0000 -> 0002)
> [  736.763967] ixgbevf 0000:05:10.1: PF still in reset state.  Is the PF
> interface up?
> [  736.763968] ixgbevf 0000:05:10.1: Assigning random MAC address
> [  736.806083] ixgbevf 0000:05:10.1: 0a:ae:33:3c:2c:79
> [  736.806087] ixgbevf 0000:05:10.1: MAC: 1
> [  736.806090] ixgbevf 0000:05:10.1: Intel(R) 82599 Virtual Function
> #---------------------------------------------------------------
>
> I'm not sure whether that "can't set Max Payload Size to 256" message is
> a problem.  Maybe it's why the flow-director l4proto rules shown below
> don't work?
>
> #---------------------------------------------------------------
> # ethtool shows the new VF interface is using the ixgbevf driver
> ethtool -i eth0
> driver: ixgbevf
> version: 2.12.1-k
> firmware-version:
> expansion-rom-version:
> bus-info: 0000:05:10.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no
>
> # Bind the new VF to DPDK
> /root/dpdk/tools/dpdk-devbind.py --bind=igb_uio 0000:05:10.1
>
> /root/dpdk/tools/dpdk-devbind.py --status | grep 0000
> 0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=ixgbe
> 0000:05:10.1 '82599 Ethernet Controller Virtual Function' drv=igb_uio
> unused=ixgbevf
> 0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb
> unused=igb_uio
> 0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb
> unused=igb_uio
> 0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1
> drv=ixgbe unused=igb_uio
> #---------------------------------------------------------------
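>
> As a quick sanity check that the VF is usable from DPDK, something like the
> following testpmd run should work (a sketch; the core list, memory channel
> count and build path are assumptions for a make-built DPDK tree):
> #---------------------------------------------------------------
> # Start testpmd on the VF only, in interactive mode
> ./x86_64-native-linuxapp-gcc/app/testpmd -l 2-3 -n 4 -w 0000:05:10.1 -- -i
> # Then, at the testpmd prompt, receive-only forwarding and stats:
> #   testpmd> set fwd rxonly
> #   testpmd> start
> #   testpmd> show port stats 0
> #---------------------------------------------------------------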
>
> The VF is now set up, but we need to direct traffic to it before the DPDK
> process can receive anything.
>
> Enable flow director:
> #---------------------------------------------------------------
> # Enable the flow-director feature
> ethtool -K enp5s0f1 ntuple on
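>
> # (Sketch) the ntuple feature can also be confirmed in the feature list:
> #   ethtool -k enp5s0f1 | grep ntuple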
>
> ethtool --show-ntuple enp5s0f1
> 4 RX rings available
> Total 0 rules
>
> #---------------------------------------------------------------
>
> Now it's time for rules.  First, I'd like to explicitly direct traffic to
> the Linux kernel:
> #---------------------------------------------------------------
> # ICMP to main queue, so ping will be via Linux kernel
> ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 1 action 0 loc 1
> rmgr: Cannot insert RX class rule: Operation not supported
>
> # Try again with latest ethtool 4.8
> root@dpdkhost:/home/das/ethtool-4.8# ./ethtool --config-ntuple enp5s0f1
> flow-type ip4 l4proto 1 action 0 loc 1
> rmgr: Cannot insert RX class rule: Operation not supported
> #---------------------------------------------------------------
>
> Any thoughts on why "l4proto" doesn't seem to work?  Am I being silly?
>
> I'd also like to direct IGMP and PIM to Linux via:
> #---------------
> # IGMP
> ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 2 action 0
> # PIM
> ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 103 action 0
> #---------------
>
> Regardless of the l4proto filters, I continued trying to direct multicast
> to the VF:
> #---------------------------------------------------------------
> # Link-local multicast to the main queue (224.0.0.0/24)
> ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
> 255.255.255.0 action 0 loc 2
>
> # Great, the rule went in, as shown:
> ethtool --show-ntuple enp5s0f1
> 4 RX rings available
> Total 1 rules
>
> Filter: 2
>     Rule Type: Raw IPv4
>     Src IP addr: 0.0.0.0 mask: 255.255.255.255
>     Dest IP addr: 0.0.0.0 mask: 255.255.255.0
>     TOS: 0x0 mask: 0xff
>     Protocol: 0 mask: 0xff
>     L4 bytes: 0x0 mask: 0xffffffff
>     VLAN EtherType: 0x0 mask: 0xffff
>     VLAN: 0x0 mask: 0xffff
>     User-defined: 0x0 mask: 0xffffffffffffffff
>     Action: Direct to queue 0
>
> # Now direct all other multicast to the DPDK VF queue 1 (224.0.0.0/4)
> ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
> 240.0.0.0 action 1 loc 3
> rmgr: Cannot insert RX class rule: Invalid argument
> #---------------------------------------------------------------
> That's weird; what's wrong?  Why won't the second rule go in?
>
> Try removing the link-local rule and adding only the non-link-local
> multicast rule:
> #---------------------------------------------------------------
> # Remove the link local multicast rule
> ethtool --config-ntuple enp5s0f1 delete 2
>
> # Direct all multicast to queue 1 (224.0.0.0/4)
> ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
> 240.0.0.0 action 1 loc 3
>
> # Great, the rule went in that time, as shown:
> ethtool --show-ntuple enp5s0f1
> 4 RX rings available
> Total 1 rules
>
> Filter: 3
>     Rule Type: Raw IPv4
>     Src IP addr: 0.0.0.0 mask: 255.255.255.255
>     Dest IP addr: 0.0.0.0 mask: 240.0.0.0
>     TOS: 0x0 mask: 0xff
>     Protocol: 0 mask: 0xff
>     L4 bytes: 0x0 mask: 0xffffffff
>     VLAN EtherType: 0x0 mask: 0xffff
>     VLAN: 0x0 mask: 0xffff
>     User-defined: 0x0 mask: 0xffffffffffffffff
>     Action: Direct to queue 1
>
> # Now I can't seem to add the other rule that worked before, either.  Weird.
> ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
> 0.0.0.255 action 0 loc 2
> rmgr: Cannot insert RX class rule: Invalid argument
>
> # What about not specifying the "location" or rule number?
> ethtool --config-ntuple enp5s0f1 delete 3
> ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
> 255.255.255.0 action 0
> Added rule with ID 2045
>
> # Fingers crossed...
> ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m
> 240.0.0.0 action 1
> rmgr: Cannot insert RX class rule: Invalid argument
>
> # Doh!
> #---------------------------------------------------------------
>
> OK, so only one multicast rule seems to go in, but I wonder whether that
> one is actually working?
>
> For this test, there is a Cisco switch/router sending a multicast ping
> from 172.16.1.1 to 226.1.1.1, and a unicast ping is also running between
> enp5s0f1 (172.16.1.20) and the router.
>
> #---------------------------------------------------------------
> # Bind the VF back to the Linux ixgbevf driver to allow tcpdump
> /root/dpdk/tools/dpdk-devbind.py --bind=ixgbevf 0000:05:10.1
>
> # Bring up the PF interface
> ifconfig enp5s0f1 up
>
> # Bring up VF
> ifconfig eth0 up
>
> # Check for the traffic on the PF enp5s0f1
> tcpdump -c 10 -nei enp5s0f1
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on enp5s0f1, link-type EN10MB (Ethernet), capture size 262144
> bytes
> 15:33:22.472673 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
> (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id
> 2074, seq 841, length 64
> 15:33:22.473482 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
> (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
> seq 841, length 64
> 15:33:23.009617 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 476, length 180
> 15:33:23.471987 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
> (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id
> 2074, seq 842, length 64
> 15:33:23.472414 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
> (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
> seq 842, length 64
> 15:33:24.471968 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
> (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id
> 2074, seq 843, length 64
> 15:33:24.523256 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
> (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
> seq 843, length 64
> 15:33:25.009659 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 477, length 180
> 15:33:25.473568 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4
> (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id
> 2074, seq 844, length 64
> 15:33:25.478962 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4
> (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074,
> seq 844, length 64
> 10 packets captured
> 10 packets received by filter
> 0 packets dropped by kernel
>
>
> # That's weird.  I didn't expect any multicast traffic here, but we got a
> mix of the unicast (172.16.1.1 > 172.16.1.20) and multicast (172.16.1.1 >
> 226.1.1.1) flows.  Why is the multicast traffic to 226.1.1.1 still
> hitting this interface?
>
> # Check the VF eth0
> tcpdump -c 10 -nei eth0
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 15:34:48.435652 00:21:55:84:a3:3f > 01:00:5e:00:00:01, ethertype IPv4
> (0x0800), length 60: 172.16.1.1 > 224.0.0.1: igmp query v3
> 15:34:49.019941 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 519, length 180
> 15:34:51.056169 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 520, length 180
> 15:34:53.056236 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 521, length 180
> 15:34:55.056390 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 522, length 180
> 15:34:57.056578 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 523, length 180
> 15:34:58.743276 00:21:55:84:a3:3f > 01:00:5e:00:00:01, ethertype IPv4
> (0x0800), length 60: 172.16.1.1 > 224.0.0.1: igmp query v3
> 15:34:59.056686 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 524, length 180
> 15:35:01.056869 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 525, length 180
> 15:35:03.056988 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4
> (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5,
> seq 526, length 180
> 10 packets captured
> 10 packets received by filter
> 0 packets dropped by kernel
>
>
> # OK, so that's good.  Only the multicast traffic is going to the VF.
> #---------------------------------------------------------------
> This is kind of working as I would expect, as the VF is getting the
> multicast traffic, but why is the PF enp5s0f1 still getting multicast?
>
> I thought flow director would have sent all the multicast ONLY to the VF
> eth0?
>
>
> Just for future searching, it seems only the PF exposes the flow director
> (fdir) counters:
> #---------------------------------------------------------------
> root@smtwin1:/home/das/ethtool-4.8# ethtool -S enp5s0f1 | grep fdir
>      fdir_match: 1040
>      fdir_miss: 2653
>      fdir_overflow: 0
> root@smtwin1:/home/das/ethtool-4.8# ethtool -S eth0
> NIC statistics:
>      rx_packets: 934
>      tx_packets: 8
>      rx_bytes: 191520
>      tx_bytes: 648
>      tx_busy: 0
>      tx_restart_queue: 0
>      tx_timeout_count: 0
>      multicast: 933
>      rx_csum_offload_errors: 0
>      tx_queue_0_packets: 8
>      tx_queue_0_bytes: 648
>      tx_queue_0_bp_napi_yield: 0
>      tx_queue_0_bp_misses: 0
>      tx_queue_0_bp_cleaned: 0
>      rx_queue_0_packets: 934
>      rx_queue_0_bytes: 191520
>      rx_queue_0_bp_poll_yield: 0
>      rx_queue_0_bp_misses: 0
>      rx_queue_0_bp_cleaned: 0
> #---------------------------------------------------------------
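>
> If it's useful to anyone, the fdir counters can be watched live while the
> pings are running (sketch):
> #---------------------------------------------------------------
> watch -n 1 "ethtool -S enp5s0f1 | grep fdir"
> #---------------------------------------------------------------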
>
> Thanks in advance for your help!
>
> regards,
> Dave
>
> Helpful reference pages:
> http://dpdk.org/doc/guides/howto/flow_bifurcation.html
>
> https://dpdksummit.com/Archive/pdf/2016Userspace/Day02-Session05-JingjingWu-Userspace2016.pdf
>
> http://rhelblog.redhat.com/2015/10/02/getting-the-best-of-both-worlds-with-queue-splitting-bifurcated-driver/
>
> https://github.com/pavel-odintsov/fastnetmon/wiki/Traffic-filtration-using-NIC-capabilities-on-wire-speed-(10GE,-14Mpps)
>