Bug 399 - ixgbe X540 PMD RSS is zero for NFSv3 NULL reply
Summary: ixgbe X540 PMD RSS is zero for NFSv3 NULL reply
Status: IN_PROGRESS
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: 18.11
Hardware: x86 Linux
Importance: Normal normal
Target Milestone: ---
Assignee: Konstantin Ananyev
URL:
Depends on:
Blocks:
 
Reported: 2020-02-13 23:34 CET by Dave Burton
Modified: 2021-02-02 07:58 CET
CC: 7 users



Attachments
tcpdump -i eth3 -w /tmp/client80.mount.pcap host 10.240.160.1 (52.00 KB, application/vnd.tcpdump.pcap)
2020-02-13 23:34 CET, Dave Burton
Details

Description Dave Burton 2020-02-13 23:34:42 CET
Created attachment 84 [details]
tcpdump -i eth3 -w /tmp/client80.mount.pcap host 10.240.160.1

We see packets with rte_mbuf.hash.rss == 0 for NFSv3 NULL reply packets, and NFSv3 NFSACL reply packets, but all other packets appear correct.

Attached is a pcap which, when replayed through DPDK 18.11.3 on a copper X540-based 10GbE NIC, should show zero RSS hashes for a restricted set of packets.  All other packets are fine, but an NFSv3 mount command will have its frames directed to the wrong core because rte_mbuf.hash.rss == 0.

This problem does NOT occur with the 82599 ixgbe PMD.

If you open this pcap in Wireshark, this filter expression will limit the view to only those packets that get a zero RSS hash.  All other packets in this pcap had correct RSS values in rte_mbuf.hash.rss.

      rpc.procedure == 0 and rpc.msgtyp == 1 and (tcp.port == 2049)

This will select packets: 20, 44, 71, 91, 118, 138, 165, 185, 211, 230, 256, 275, 301, 320, 346, 365, 391, 410, 436, 455.  Each of these packets had rte_mbuf.hash.rss == 0.  All other packets had valid hash.rss values.

The attached pcap holds the traffic as seen by the client, generated by this command:

seq 10 | while read a; do sudo time mount -t nfs -o vers=3 10.240.160.1:/jot /mnt/daveb; sudo umount /mnt/daveb; done

Our application is a bump-on-the-wire between client and server.  All packets flow through our box (and DPDK 18.11.3) in both directions.  The only packets with a bad RSS hash are those indicated above.  The NIC in our box is (from kern.log):

Feb  7 16:43:06 firefly kernel: PE310G4BPi40T found, firmware version: 0xac
Comment 1 David Hendel 2020-02-20 10:04:27 CET
Hi DPDK team,
I am working with Dave on this.  It looks like an issue with the DPDK driver; we would appreciate someone looking into it.
Comment 2 Ajit Khaparde 2020-02-21 20:28:21 CET
Can you please take a look? Thanks
Comment 3 Dave Burton 2020-03-09 19:19:26 CET
This is still showing as unconfirmed.  Has there been an attempt to reproduce this, perhaps by replaying the packets from the tcpdump through an X540?  This is a blocking problem for us.
Comment 4 Ajit Khaparde 2020-03-09 20:17:46 CET
+Ferruh
Comment 5 Dave Burton 2020-04-14 21:35:17 CEST
Asking for an update, please.
Comment 6 Steve Yang 2020-12-30 07:10:27 CET
It looks like a hardware bug and is being tracked by the hardware team.
Comment 7 dapengx.yu@intel.com 2021-01-18 09:44:01 CET
I extracted one packet matching (rpc.procedure == 0 and rpc.msgtyp == 1 and tcp.port == 2049) from the attached pcap file and used it for the following tests.

1. Cannot reproduce with the ixgbe kernel driver on an X540-AT2.

       With the kernel driver's default RSS config, the extracted packet and other TCP packets with the same tuple (src ip, dst ip, src port, dst port) are distributed to queue 10.

2. Easily reproduced with testpmd + the DPDK PMD on an X540-AT2.

       With testpmd's default RSS config, the extracted packet is distributed to queue 0 with no RSS hash, and its hw ptype is also wrong (only L2_ETHER).  But other TCP packets with the same tuple (src ip, dst ip, src port, dst port) are distributed to queue 13 with an RSS hash.

       gdb --args ./build/app/dpdk-testpmd -c 1ffff -n 4 -- -i --nb-cores=16 --rxq=64 --txq=64 --port-topology=chained

       testpmd>set fwd rxonly

       testpmd>set verbose 1

       testpmd>start

port 0/queue 0: received 1 packets
src=A0:36:9F:54:A6:58 - dst=A0:36:9F:68:FD:B4 - type=0x0800 - length=82 - nb_segs=1 - hw ptype: L2_ETHER - sw ptype: L2_ETHER L3_IPV4 L4_TCP - l2_len=14 - l3_len=20 - l4_len=20 - Receive queue=0x0
ol_flags: PKT_RX_L4_CKSUM_GOOD PKT_RX_IP_CKSUM_GOOD PKT_RX_OUTER_L4_CKSUM_UNKNOWN

port 0/queue 13: received 1 packets
src=A0:36:9F:54:A6:58 - dst=A0:36:9F:68:FD:B4 - type=0x0800 - length=60 - nb_segs=1 - RSS hash=0xd83121bd - RSS queue=0xd - hw ptype: L2_ETHER L3_IPV4 L4_TCP - sw ptype: L2_ETHER L3_IPV4 L4_TCP - l2_len=14 - l3_len=20 - l4_len=20 - Receive queue=0xd
ol_flags: PKT_RX_RSS_HASH PKT_RX_L4_CKSUM_GOOD PKT_RX_IP_CKSUM_GOOD PKT_RX_OUTER_L4_CKSUM_UNKNOWN
Comment 8 dapengx.yu@intel.com 2021-01-20 02:45:53 CET
The X540 looks inside the TCP/UDP payload and recognizes the NFS protocol.  The testpmd app sets some registers that cause NFS packets to flow into queue 0.

I am trying to identify these registers; once testpmd sets them the same way the kernel driver does, the issue should be resolved.
Comment 9 dapengx.yu@intel.com 2021-01-20 10:47:45 CET
Root cause found: the ixgbe PMD enables NFS filtering whenever receive-side coalescing (RSC) is not required.

When NFS filtering is enabled, NFS packets are sent to queue 0 and RSS does not apply to them.

Fix patch: http://patchwork.dpdk.org/patch/86957/
Comment 10 dapengx.yu@intel.com 2021-01-25 09:42:14 CET
Patch v2 submitted: https://patches.dpdk.org/patch/87167/
Comment 11 dapengx.yu@intel.com 2021-01-26 06:35:38 CET
Patch v3 submitted: https://patches.dpdk.org/patch/87270/ (only the commit message was updated).
Comment 12 jiang,yu 2021-02-02 07:58:51 CET
The v3 patch has been merged in DPDK 21.02-rc2.
Commit info:
68643843e net/ixgbe: disable NFS filtering
