[dpdk-dev] [PATCH 0/9] mbuf: structure reorganization

Olivier Matz olivier.matz at 6wind.com
Fri Mar 31 10:41:07 CEST 2017


On Thu, 30 Mar 2017 18:06:35 +0000, "Ananyev, Konstantin" <konstantin.ananyev at intel.com> wrote:
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, March 30, 2017 5:48 PM
> > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Richardson, Bruce <bruce.richardson at intel.com>; Olivier Matz
> > <olivier.matz at 6wind.com>
> > Cc: dev at dpdk.org; mb at smartsharesystems.com; Chilikin, Andrey <andrey.chilikin at intel.com>; jblunck at infradead.org;
> > nelio.laranjeiro at 6wind.com; arybchenko at solarflare.com
> > Subject: RE: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > 
> > 
> >   
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> > > Sent: Thursday, March 30, 2017 5:45 PM
> > > To: Richardson, Bruce <bruce.richardson at intel.com>; Olivier Matz <olivier.matz at 6wind.com>
> > > Cc: dev at dpdk.org; mb at smartsharesystems.com; Chilikin, Andrey <andrey.chilikin at intel.com>; jblunck at infradead.org;
> > > nelio.laranjeiro at 6wind.com; arybchenko at solarflare.com
> > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > >
> > >
> > >  
> > > > -----Original Message-----
> > > > From: Richardson, Bruce
> > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > To: Olivier Matz <olivier.matz at 6wind.com>
> > > > Cc: dev at dpdk.org; Ananyev, Konstantin <konstantin.ananyev at intel.com>; mb at smartsharesystems.com; Chilikin, Andrey
> > > > <andrey.chilikin at intel.com>; jblunck at infradead.org; nelio.laranjeiro at 6wind.com; arybchenko at solarflare.com
> > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > >
> > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:  
> > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson at intel.com> wrote:  
> > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:  
> > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:  
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > Can it be applied?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Olivier
> > > > > > > >  
> > > > > > >
> > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > >
> > > > > > > 	/Bruce  
> > > > > > > >  
> > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > parts may be causing the problem.
> > > > > >
> > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > impact of this set may be?  
> > > > >
> > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > current version.
> > > > >  
> > > > I had no doubt you did some perf testing! :-)
> > > >
> > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > and localise what change in the patchset is causing it.
> > > >
> > > > Ideally, though, I think we should see acks or other comments from
> > > > driver maintainers at least confirming that they have tested. You cannot
> > > > be held responsible for testing every DPDK driver before you submit work
> > > > like this.
> > > >  
> > >
> > > Unfortunately  I also see a regression.
> > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.  
> > 
> > Sorry, forgot to mention - it is on ixgbe.
> > So it doesn't look like i40e specific.
> >   
> > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > from 50.8 Mpps down to 47.8 Mpps.
> > > From what I am seeing the particular patch that causing it:
> > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > >
> > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > cmdline:
> > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd  --lcores='7,8'  -n 4 --socket-mem='1024,0'  -w 04:00.1 -w 07:00.1 -w
> > > 0b:00.1 -w 0e:00.1 -- -i
> > >  
> 
> Actually one more question regarding:
> [dpdk-dev,9/9] mbuf: reorder VLAN tci and buffer len fields
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index fd97bd3..ada98d5 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -449,8 +449,7 @@  struct rte_mbuf {
>  
>  	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
>  	uint16_t data_len;        /**< Amount of data in segment buffer. */
> -	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
> -	uint16_t vlan_tci;
> +	uint16_t buf_len;         /**< Size of segment buffer. */
>  
>  	union {
>  		uint32_t rss;     /**< RSS hash result if RSS enabled */
> @@ -475,11 +474,11 @@  struct rte_mbuf {
>  		uint32_t usr;	  /**< User defined tags. See rte_distributor_process() */
>  	} hash;                   /**< hash information */
>  
> +	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
> +	uint16_t vlan_tci;
>  	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
>  	uint16_t vlan_tci_outer;
>  
> -	uint16_t buf_len;         /**< Length of segment buffer. */
> -
>  	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
>  	 * are not normalized but are always the same for a given port.
>  	 */
> 
> How ixgbe and i40e SSE version supposed to work correctly after  that change?
> As I remember both of them sets vlan_tci as part of 16B shuffle operation.
> Something like that:
> pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
> ...
> mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
>                                  pkt_mb4);
> 
> But now vlan_tci is swapped with buf_len.
> Which means 2 things to me:
> It is more than 16B away from rx_descriptor_fields1 and can't be updated in one go anymore,
> and instead of vlan_tci we are updating buf_len.


Sorry, I missed it. But this shows something problematic: changing the
order of fields in a structure breaks code without notification. I think
that drivers expecting a field at a specific position should have some
BUG_ON() to check that the condition is still valid. We can't expect anyone
to know all the constraints of all vectors PMDs in DPDK.

The original idea of this patch was to group vlan_tci and vlan_outer_tci,
which looked to be a good idea at first glance. If it requires to change
all vector code, let's drop it.

Just for the exercice, let's imagine we need that patch. What would be
the procedure to have it integrated? How can we detect there is an issue?
Who would be in charge of modifying all the vector code in PMDs?


Regards,
Olivier


More information about the dev mailing list