[dpdk-dev] [dpdk-users] RSS Hash not working for XL710/X710 NICs for some RX mbuf sizes

Take Ceara dumitru.ceara at gmail.com
Mon Jul 18 18:14:11 CEST 2016


Hi Helin,

On Mon, Jul 18, 2016 at 5:15 PM, Zhang, Helin <helin.zhang at intel.com> wrote:
> Hi Ceara
>
> Could you let me know your firmware version?

# ethtool -i p7p1 | grep firmware
firmware-version: f4.40.35115 a1.4 n4.53 e2021

> And could you try a standard DPDK example application, such as testpmd, to see if the same issue occurs there?
> We usually set the same size for both the RX and TX buffers, for example the default of 2048 that many applications use.

I'm a bit lost in the testpmd CLI. I enabled RSS, configured 2 RX
queues per port and started sending traffic with single-segment
packets of size 2K, but I couldn't figure out how to actually verify
that the RSS hash is correctly set. Please let me know if I should do
it in a different way.

testpmd -c 0x331 -w 0000:82:00.0 -w 0000:83:00.0 -- --mbuf-size 2048 -i
[...]

testpmd> port stop all
Stopping ports...
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Port 1 Link Up - speed 40000 Mbps - full-duplex
Done

testpmd> port config all txq 2

testpmd> port config all rss all

testpmd> port config all max-pkt-len 2048
testpmd> port start all
Configuring Port 0 (socket 0)
PMD: i40e_set_tx_function_flag(): Vector tx can be enabled on this txq.
PMD: i40e_set_tx_function_flag(): Vector tx can be enabled on this txq.
PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
satisfied. Rx Burst Bulk Alloc function will be used on port=0,
queue=0.
PMD: i40e_set_tx_function(): Vector tx finally be used.
PMD: i40e_set_rx_function(): Using Vector Scattered Rx callback (port=0).
Port 0: 3C:FD:FE:9D:BE:F0
Configuring Port 1 (socket 0)
PMD: i40e_set_tx_function_flag(): Vector tx can be enabled on this txq.
PMD: i40e_set_tx_function_flag(): Vector tx can be enabled on this txq.
PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
satisfied. Rx Burst Bulk Alloc function will be used on port=1,
queue=0.
PMD: i40e_set_tx_function(): Vector tx finally be used.
PMD: i40e_set_rx_function(): Using Vector Scattered Rx callback (port=1).
Port 1: 3C:FD:FE:9D:BF:30
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Port 1 Link Up - speed 40000 Mbps - full-duplex
Done

testpmd> set txpkts 2048
testpmd> show config txpkts
Number of segments: 1
Segment sizes: 2048
Split packet: off


testpmd> start tx_first
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=32
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=2 - TX desc=512 - TX free threshold=32
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=32 - TXQ flags=0xf01
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 32             RX-dropped: 0             RX-total: 32
  TX-packets: 32             TX-dropped: 0             TX-total: 32
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 32             RX-dropped: 0             RX-total: 32
  TX-packets: 32             TX-dropped: 0             TX-total: 32
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 64             RX-dropped: 0             RX-total: 64
  TX-packets: 64             TX-dropped: 0             TX-total: 64
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
testpmd>
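
For reference, the per-packet check that the test program quoted at the
end of this message performs can be sketched without the DPDK headers.
The struct and the flag constant below are stand-ins introduced only for
illustration (the real ones are struct rte_mbuf and PKT_RX_RSS_HASH from
rte_mbuf.h). The sketch separates "hash not reported" from "hash reported
but zero", the latter being the symptom described in this thread.

```c
#include <stdint.h>

/* Stand-in for PKT_RX_RSS_HASH; the real value comes from rte_mbuf.h. */
#define SKETCH_RX_RSS_HASH (1ULL << 1)

/* Hypothetical type holding only the two mbuf fields the check reads. */
struct sketch_mbuf {
    uint64_t ol_flags; /* RX offload flags reported by the driver */
    uint32_t rss_hash; /* corresponds to hash.rss in struct rte_mbuf */
};

enum rss_state {
    RSS_NOT_REPORTED, /* flag not set: the hash field is not meaningful */
    RSS_ZERO_HASH,    /* flag set but hash is 0: the case reported here */
    RSS_OK            /* flag set and hash is non-zero */
};

/* Test the flag with a bit mask, since other flags may be set as well. */
static inline enum rss_state rss_check(const struct sketch_mbuf *m)
{
    if ((m->ol_flags & SKETCH_RX_RSS_HASH) == 0)
        return RSS_NOT_REPORTED;
    return m->rss_hash == 0 ? RSS_ZERO_HASH : RSS_OK;
}
```

In a real receive loop the same test would be applied to ol_flags and
hash.rss of every mbuf returned by rte_eth_rx_burst().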


>
> We will definitely try to reproduce the issue with testpmd, using 2K mbufs. Hopefully we can find the root cause, or confirm that it is not an issue.
>

I forgot to mention that in my test code the TX/RX_MBUF_SIZE macros
also include the mbuf headroom and the size of the mbuf structure.
Therefore testing with 2K mbufs in my scenario actually creates
mempools of objects of size 2K + sizeof(struct rte_mbuf) +
RTE_PKTMBUF_HEADROOM.
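
To make the arithmetic concrete, here is a minimal sketch of how a
fragment size maps to the mempool element size and back to the bytes
usable for packet data. It assumes that sizeof(struct rte_mbuf) and
RTE_PKTMBUF_HEADROOM are both 128 bytes; these are stand-in values and
should be checked against the actual build. The helper names are
hypothetical. It also assumes that the data room is the element size
minus the mbuf structure, which is what rte_pktmbuf_pool_init() appears
to do when it is called with a NULL argument.

```c
#include <stdint.h>

/* Stand-ins for sizeof(struct rte_mbuf) and RTE_PKTMBUF_HEADROOM.
 * Both values are assumptions; verify them in your DPDK build. */
#define ASSUMED_MBUF_STRUCT_SIZE 128u
#define ASSUMED_PKTMBUF_HEADROOM 128u

/* Mempool element size, mirroring the MBUF_SIZE() macro of the test. */
static inline uint32_t mbuf_elt_size(uint32_t frag_size)
{
    return frag_size + ASSUMED_MBUF_STRUCT_SIZE + ASSUMED_PKTMBUF_HEADROOM;
}

/* Data room left once the mbuf structure is taken out of the element. */
static inline uint32_t mbuf_data_room(uint32_t elt_size)
{
    return elt_size > ASSUMED_MBUF_STRUCT_SIZE ?
           elt_size - ASSUMED_MBUF_STRUCT_SIZE : 0;
}

/* Bytes available for packet data after the headroom is reserved. */
static inline uint32_t mbuf_usable_len(uint32_t data_room)
{
    return data_room > ASSUMED_PKTMBUF_HEADROOM ?
           data_room - ASSUMED_PKTMBUF_HEADROOM : 0;
}
```

With these assumptions a 2K fragment size gives 2304-byte pool elements,
a 2176-byte data room and 2048 bytes of usable buffer per mbuf.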

> Thank you very much for reporting this!
>
> BTW, dev at dpdk.org, rather than users at dpdk.org, is the right list for questions/issues like this.

Thanks, I'll keep that in mind.

>
> Regards,
> Helin

Regards,
Dumitru

>
>> -----Original Message-----
>> From: Take Ceara [mailto:dumitru.ceara at gmail.com]
>> Sent: Monday, July 18, 2016 4:03 PM
>> To: users at dpdk.org
>> Cc: Zhang, Helin <helin.zhang at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
>> Subject: [dpdk-users] RSS Hash not working for XL710/X710 NICs for some RX
>> mbuf sizes
>>
>> Hi,
>>
>> Is there any known issue regarding the i40e DPDK driver when having RSS
>> hashing enabled in DPDK 16.04?
>> I've noticed that for some specific receive mbuf sizes the RSS hash is always set
>> to 0 for incoming packets.
>>
>> I have a setup with two XL710 ports connected back to back. The simple test
>> program below sends fixed TCP packets from port 0 to port 1. The
>> L5 payload is added in the packet in such a way that the packet consumes exactly
>> one TX mbuf. For some values of the RX mbuf size the incoming mbuf has the
>> hash.rss == 0 even though the PKT_RX_RSS_HASH flag is set in ol_flags. In my
>> code the TX/RX mbuf sizes are controlled by the RX_MBUF_SIZE and
>> TX_MBUF_SIZE macros.
>>
>> As an example, with some of the following TX/RX sizes the assert that checks if
>> the RSS hash is non-zero fails and with the others it passes:
>>
>> RX_MBUF_SIZE  TX_MBUF_SIZE assert
>> =================================
>> 1024          1024         fail
>> 1025          1024         ok
>> 1024          2048         fail
>> 2048          2048         fail
>> 2048          2047         fail
>> 2049          2048         ok
>>
>> On the same setup I have another loopback connection between two 82599ES
>> 10G NICs and when I run exactly the same test the RSS hash is always correct in
>> all cases.
>>
>> $ $RTE_SDK/tools/dpdk_nic_bind.py -s
>>
>> Network devices using DPDK-compatible driver
>> ============================================
>> 0000:02:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
>> drv=igb_uio unused=
>> 0000:03:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
>> drv=igb_uio unused=
>> 0000:82:00.0 'Ethernet Controller XL710 for 40GbE QSFP+' drv=igb_uio unused=
>> 0000:83:00.0 'Ethernet Controller XL710 for 40GbE QSFP+' drv=igb_uio unused=
>>
>> The command line I use for running the test on the 40G NICs is:
>>
>> ./build/test -c 0x1 -n 4 -m 1024 -w 0000:82:00.0 -w 0000:83:00.0
>>
>> Thanks,
>> Dumitru Ceara
>>
>> #include <stdbool.h>
>> #include <stdint.h>
>> #include <assert.h>
>> #include <unistd.h>
>>
>> #include <rte_ethdev.h>
>> #include <rte_timer.h>
>> #include <rte_ip.h>
>> #include <rte_tcp.h>
>> #include <rte_udp.h>
>> #include <rte_errno.h>
>> #include <rte_arp.h>
>>
>> #define MBUF_SIZE(frag_size) \
>>     ((frag_size) + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
>>
>> #define RX_MBUF_SIZE MBUF_SIZE(RTE_MBUF_DEFAULT_DATAROOM)
>> #define TX_MBUF_SIZE MBUF_SIZE(RTE_MBUF_DEFAULT_DATAROOM)
>>
>> #define MBUF_CACHE 512
>> #define MBUF_COUNT 1024
>>
>> static struct rte_mempool *rx_mpool;
>> static struct rte_mempool *tx_mpool;
>>
>> #define PORT_MAX_MTU 9198
>>
>> #define L5_GET_LEN(pkt) (rte_pktmbuf_tailroom((pkt)))
>>
>> #define PORT0  0
>> #define PORT1  1
>> #define QUEUE0 0
>> #define Q_CNT  1
>>
>>
>> struct rte_eth_conf default_port_config = {
>>     .rxmode = {
>>         .mq_mode        = ETH_MQ_RX_RSS,
>>         .max_rx_pkt_len = PORT_MAX_MTU,
>>         .split_hdr_size = 0,
>>         .header_split   = 0, /**< Header Split disabled */
>>         .hw_ip_checksum = 1, /**< IP checksum offload enabled */
>>         .hw_vlan_filter = 0, /**< VLAN filtering disabled */
>>         .jumbo_frame    = 1, /**< Jumbo Frame Support enabled */
>>         .hw_strip_crc   = 0, /**< CRC not stripped by hardware */
>>     },
>>     .rx_adv_conf = {
>>         .rss_conf = {
>>             .rss_key = NULL,
>>             .rss_key_len = 0,
>>             .rss_hf = ETH_RSS_IPV4 | ETH_RSS_NONFRAG_IPV4_TCP |
>>                       ETH_RSS_NONFRAG_IPV4_UDP,
>>         },
>>     },
>>     .txmode = {
>>         .mq_mode = ETH_MQ_TX_NONE,
>>     }
>> };
>>
>> struct rte_eth_rxconf rx_conf = {
>>     .rx_thresh = {
>>         .pthresh = 8,
>>         .hthresh = 8,
>>         .wthresh = 4,
>>     },
>>     .rx_free_thresh = 64,
>>     .rx_drop_en = 0
>> };
>>
>> struct rte_eth_txconf tx_conf = {
>>     .tx_thresh = {
>>         .pthresh = 36,
>>         .hthresh = 0,
>>         .wthresh = 0,
>>     },
>>     .tx_free_thresh = 64,
>>     .tx_rs_thresh = 32,
>> };
>>
>> static void port_setup(uint32_t port)
>> {
>>     uint32_t queue;
>>     int ret;
>>
>>     /* Keep the call outside assert() so it still runs with NDEBUG. */
>>     ret = rte_eth_dev_configure(port, Q_CNT, Q_CNT, &default_port_config);
>>     assert(ret == 0);
>>     for (queue = 0; queue < Q_CNT; queue++) {
>>         ret = rte_eth_rx_queue_setup(port, queue, 128, SOCKET_ID_ANY,
>>                                      &rx_conf,
>>                                      rx_mpool);
>>         assert(ret == 0);
>>         ret = rte_eth_tx_queue_setup(port, queue, 128, SOCKET_ID_ANY,
>>                                      &tx_conf);
>>         assert(ret == 0);
>>     }
>>
>>     ret = rte_eth_dev_start(port);
>>     assert(ret == 0);
>> }
>>
>> #define HDRS_SIZE                   \
>>         (sizeof(struct ether_hdr) + \
>>          sizeof(struct ipv4_hdr) +  \
>>          sizeof(struct tcp_hdr))
>>
>> static struct rte_mbuf *get_tcp_pkt(uint16_t eth_port)
>> {
>>     struct rte_mbuf  *pkt;
>>     struct ether_hdr *eth_hdr;
>>     struct ipv4_hdr  *ip_hdr;
>>     struct tcp_hdr   *tcp_hdr;
>>     uint32_t          ip_hdr_len = sizeof(*ip_hdr);
>>     uint32_t          tcp_hdr_len = sizeof(*tcp_hdr);
>>     uint32_t          l5_len;
>>
>>     pkt = rte_pktmbuf_alloc(tx_mpool);
>>     assert(pkt != NULL);
>>
>>     pkt->port = eth_port;
>>     pkt->l2_len = sizeof(*eth_hdr);
>>
>>     RTE_LOG(ERR, USER1, "1:head = %d, tail = %d, len = %d\n",
>>             rte_pktmbuf_headroom(pkt), rte_pktmbuf_tailroom(pkt),
>>             rte_pktmbuf_pkt_len(pkt));
>>
>>     /* Reserve space for ETH + IP + TCP Headers.
>>      * Store how much tailroom we have.
>>      */
>>     eth_hdr = (struct ether_hdr *)rte_pktmbuf_append(pkt, HDRS_SIZE);
>>     assert(eth_hdr);
>>     l5_len = L5_GET_LEN(pkt);
>>
>>     /* ETH Header. */
>>     rte_eth_macaddr_get(PORT0, &eth_hdr->s_addr);
>>     rte_eth_macaddr_get(PORT1, &eth_hdr->d_addr);
>>     eth_hdr->ether_type = rte_cpu_to_be_16(ETHER_TYPE_IPv4);
>>
>>     /* IP Header. */
>>     ip_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>     ip_hdr->version_ihl = (4 << 4) | (ip_hdr_len >> 2);
>>     ip_hdr->type_of_service = 0;
>>     ip_hdr->total_length = rte_cpu_to_be_16(ip_hdr_len + tcp_hdr_len +
>>                                             l5_len);
>>     ip_hdr->packet_id = 0;
>>     ip_hdr->fragment_offset = rte_cpu_to_be_16(0);
>>     ip_hdr->time_to_live = 60;
>>     ip_hdr->next_proto_id = IPPROTO_TCP;
>>     ip_hdr->src_addr = rte_cpu_to_be_32(0x01010101);
>>     ip_hdr->dst_addr = rte_cpu_to_be_32(0x01010101);
>>     ip_hdr->hdr_checksum = rte_cpu_to_be_16(0);
>>
>>     pkt->l3_len = ip_hdr_len;
>>     pkt->ol_flags |= PKT_TX_IP_CKSUM;
>>
>>     /* TCP Header. */
>>     tcp_hdr = (struct tcp_hdr *)(ip_hdr + 1);
>>     tcp_hdr->src_port = rte_cpu_to_be_16(0x42);
>>     tcp_hdr->dst_port = rte_cpu_to_be_16(0x24);
>>     tcp_hdr->sent_seq = rte_cpu_to_be_32(0x1234);
>>     tcp_hdr->recv_ack = rte_cpu_to_be_32(0x1234);
>>     tcp_hdr->data_off = tcp_hdr_len >> 2 << 4;
>>     tcp_hdr->tcp_flags = TCP_FIN_FLAG;
>>     tcp_hdr->rx_win = rte_cpu_to_be_16(0xffff);
>>     tcp_hdr->tcp_urp = rte_cpu_to_be_16(0);
>>
>>     pkt->ol_flags |= PKT_TX_TCP_CKSUM | PKT_TX_IPV4;
>>     pkt->l4_len = tcp_hdr_len;
>>
>>     tcp_hdr->cksum = 0;
>>     tcp_hdr->cksum = rte_ipv4_phdr_cksum(ip_hdr, pkt->ol_flags);
>>
>>     /* Add Payload. */
>>     assert(rte_pktmbuf_append(pkt, l5_len));
>>
>>     RTE_LOG(ERR, USER1, "2:head = %d, tail = %d, len = %d\n",
>>             rte_pktmbuf_headroom(pkt), rte_pktmbuf_tailroom(pkt),
>>             rte_pktmbuf_pkt_len(pkt));
>>
>>     return pkt;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>     struct rte_mbuf *tx_mbuf[3];
>>
>>     rte_eal_init(argc, argv);
>>
>>     rx_mpool = rte_mempool_create("rx_mpool", MBUF_COUNT,
>>                                   RX_MBUF_SIZE,
>>                                   0,
>>                                   sizeof(struct rte_pktmbuf_pool_private),
>>                                   rte_pktmbuf_pool_init, NULL,
>>                                   rte_pktmbuf_init, NULL,
>>                                   SOCKET_ID_ANY,
>>                                   0);
>>
>>     tx_mpool = rte_mempool_create("tx_mpool", MBUF_COUNT,
>>                                   TX_MBUF_SIZE,
>>                                   0,
>>                                   sizeof(struct rte_pktmbuf_pool_private),
>>                                   rte_pktmbuf_pool_init, NULL,
>>                                   rte_pktmbuf_init, NULL,
>>                                   SOCKET_ID_ANY,
>>                                   0);
>>
>>     assert(rx_mpool && tx_mpool);
>>
>>     port_setup(PORT0);
>>     port_setup(PORT1);
>>
>>     for (;;) {
>>         uint16_t no_rx_buffers;
>>         uint16_t i;
>>         struct rte_mbuf *rx_pkts[16];
>>
>>         tx_mbuf[0] = get_tcp_pkt(PORT0);
>>         assert(rte_eth_tx_burst(PORT0, QUEUE0, tx_mbuf, 1) == 1);
>>
>>         no_rx_buffers = rte_eth_rx_burst(PORT1, QUEUE0, rx_pkts, 16);
>>         for (i = 0; i < no_rx_buffers; i++) {
>>             RTE_LOG(ERR, USER1, "RX RSS HASH: %8lX %4X\n",
>>                     rx_pkts[i]->ol_flags,
>>                     rx_pkts[i]->hash.rss);
>>
>>             /* Bit-mask test: other RX flags may be set alongside it. */
>>             assert(rx_pkts[i]->ol_flags & PKT_RX_RSS_HASH);
>>             assert(rx_pkts[i]->hash.rss != 0);
>>
>>             rte_pktmbuf_free(rx_pkts[i]);
>>         }
>>     }
>>
>>     return 0;
>> }



-- 
Dumitru Ceara

