[dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode

Roger B. Melton rmelton at cisco.com
Fri Oct 6 14:38:01 CEST 2017

On 10/6/17 4:54 AM, Bruce Richardson wrote:
> On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
>> ---
>> i40evf tx vector logic frees mbufs, but it does not remove the
>> mbufs from software rings which leads to double frees.  This change
>>   corrects that oversight.  We've validated this fix within our application.
> Hi Roger,
> I'm a little concerned here by this driver fix, since if we are getting
> double frees of mbufs there must be another bug somewhere in tracking
> the free ring elements. Clearing the entries to NULL introduces extra
> writes to the vector code path which will likely have a performance
> impact, but also should be unnecessary, given proper tracking of the
> ring status.
> Can you provide us with some details as to how to reproduce the issue,
> especially with a public sample app? I'd really like to look more into
> why this is happening, and if other fixes may work.
> Thanks,
> /Bruce
> .
Hey Bruce,

I've not attempted to reproduce the issue using sample apps.  It was 
initially difficult for us to reproduce with our application until we 
stumbled on a recipe.  The symptoms of the double free are two fold:

  * Application crashes: Corrupted packet headers, walking a chain of rx
    segments and loosing one in the middle,...
  * i40e adminq lockup - Meaning VF sends an OP and PF does not process it

The former has been directly correlated to tx vector mode.  We still 
don't understand how this could lead to adminq lockup, but we have not 
observed the issue applied the patch I submitted.  We have questions out 
to Intel i40e team on this.

Here's a high level view of the scenario under which the issues are 
observed and how we concluded that there were issues with tx vector 
free. We provided this information to the Intel i40e DPDK team, they 
reviewed the tx vector logic and suggested changes.  With the changes 
suggested by Intel (the patch I submitted) we have re-enabled tx vector 
when MTU is < MBUF size and observed no crashes.

Below that you will find additional detail on the procedure within our 
application for changing MTU, including DPDK API calls.

Let me know if you have additional questions.


      o Scenario:

          + Intel i40e-da4 (4x10G) or Cisco branded equivalent
          + KVM or ESXi 6.5 host
          + In a single VM we have at least 1 VF from all 4 PFs.
          + We are running traffic on all interfaces.
          + We stop traffic
          + We stop a device (VF)
          + We start the device
          + We start traffic
          + At any point after we start the device we can crash

      o Experiment (Jumping over some of the work to get to the point
        where we believed that i40e driver was doing double frees):

          + Our application attaches userdata to mbufs and we put that
            userdata on a linked list.
          + We observed that when we processed the userdata it had been
            corrupted which lead to crashes.
          + This gave us a hint that the mbuf was on multiple lists.
          + We reviewed our application code and could not find a cause.
          + We began to suspect a double free in the i40evf PMD.

              # We disabled rx free logic and observed crashes
                (intentionally leaking mbufs in search of the double free).
              # We disabled tx free logic and observed no crashes
              # This gave us a hint that the double frees were coming
                from the i40evf PMD tx logic.

          + We had also observed that if we forced MTU to large always
            that there were no crashes

              # A side effect of forcing large MTU is that multi-segment
                is enabled.
              # This gave us a hint that enabling multi-segment was
                somehow avoiding the double free.

          + We forced multi-segments regardless of MTU and permitted MTU
            changes and observed no crashes.
          + We reviewed the i40evf mbuf free logic to see the effect of
            enabling multi-segment and observed that when multi-segment
            is enabled, rx vector was enabled but tx vector was not.
          + This lead us to examine RX vector mode free logic vs TX
            vector mode free logic.

              # RX free logic has special handling for vector mode free
              # TX free logic does not have any special handling for
                vector free

      o By enabling multiple segments always (even if MTU does not
        require multiple segments), we effectively disabled tx vector
        mode and this avoided the double free.
      o Our application no longer crashed, but it could not take
        advantage of tx vector optimizations.

CP == Control Plane
DP == Data Plane

  * CP sends admin down to DP
  * DP disables RX/TX
      o Block all future tx burst calls
      o rte_eth_dev_set_link_down() invoked
      o Block all future rx burst calls
  * DP notifies CP admin down action is complete
  * CP sends MTU change
  * DP processes MTU change
      o For each rxq:
          + rte_eth_rx_queue_info_get()
          + if not rxq_deferred_start rte_eth_dev_rx_queue_stop()
      o For each txq:
          + rte_eth_tx_queue_info_get()
          + if not txq_deferred_start rte_eth_dev_tx_queue_stop()
      o rte_eth_dev_stop()
      o Re-configure the port: (Note this is original code, not new code
        which is forcing multisegs always)
          + Set rx_mode.jumbo_frame if MTU > 1518
          + Set rx_mode.enable_scatter if MTU > 2048
          + txq_flags = ETH_TXQ_NOOFLOADS
          + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS
          + rte_eth_promiscuous_get()
          + rte_eth_dev_info_get()
          + rte_eth_dev_configure()
              # Init tx_vec_allowed and rx_vec_allowed to TRUE.
          + rte_eth_dev_info_get()
          + For each txq: rte_eth_tx_queue_setup()
              # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in
                txq_flags & tx_vec_allowed will be cleared.
          + For each rxq: rte_eth_rx_queue_setup()
      o rte_eth_dev_set_mtu()
      o rte_eth_dev_start()
      o rte_eth_dev_info_get()
      o For each rxq:
          + if not rxq_deferred_start rte_eth_dev_rx_queue_start()
      o For each txq:
          + if not txq_deferred_start rte_eth_dev_tx_queue_start()
  * DP notifies CP MTU change applied.
  * CP sends admin up to DP
  * DP enables RX/TX
      o Enable all future tx burst calls
      o rte_eth_dev_set_link_up() invoked
      o Enable all future rx burst calls
  * DP notifies CP admin up action is complete

|Roger B. Melton                |          |      Cisco Systems      |
|CPP Software                  :|:        :|:     7100 Kit Creek Rd  |
|+1.919.476.2332 phone        :|||:      :|||:    RTP, NC 27709-4987 |
|+1.919.392.1094 fax       .:|||||||:..:|||||||:. rmelton at cisco.com  |
|                                                                    |
| This email may contain confidential and privileged material for the|
| sole use of the intended recipient. Any review, use, distribution  |
| or disclosure by others is strictly prohibited. If you are not the |
| intended recipient (or authorized to receive for the recipient),   |
| please contact the sender by reply email and delete all copies of  |
| this message.                                                      |
|                                                                    |
| For corporate legal information go to:                             |
| http://www.cisco.com/web/about/doing_business/legal/cri/index.html |
|__________________________ http://www.cisco.com ____________________|

More information about the dev mailing list