[dpdk-dev] [PATCH] mbuf: make rearm_data address naturally aligned

Bruce Richardson bruce.richardson at intel.com
Thu May 19 10:50:48 CEST 2016


On Thu, May 19, 2016 at 12:20:16AM +0530, Jerin Jacob wrote:
> On Wed, May 18, 2016 at 05:43:00PM +0100, Bruce Richardson wrote:
> > On Wed, May 18, 2016 at 07:27:43PM +0530, Jerin Jacob wrote:
> > > To avoid multiple stores on fast path, Ethernet drivers
> > > aggregate the writes to data_off, refcnt, nb_segs and port
> > > to an uint64_t data and write the data in one shot
> > > with uint64_t* at &mbuf->rearm_data address.
> > > 
> > > Some of the non-IA platforms have store operation overhead
> > > if the store address is not naturally aligned. This patch
> > > fixes the performance issue on those targets.
> > > 
> > > Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > > ---
> > > 
> > > Tested this patch on IA and non-IA (ThunderX) platforms.
> > > This patch shows a 400Kpps/core improvement in a ThunderX + ixgbe + vector environment,
> > > and it does not have any overhead on the IA platform.
> > > 
> > > I have tried another similar approach, replacing "buf_len" with "pad"
> > > (in this patch context).
> > > Since it adds the overhead of a read and then a mask to keep "buf_len" intact,
> > > it does not show much improvement.
> > > ref: http://dpdk.org/ml/archives/dev/2016-May/038914.html
> > > 
> > > ---
> > While this will work and from your tests doesn't seem to have a performance
> > impact, I'm not sure I particularly like it. It's extending out the end of
> > cacheline0 of the mbuf by 16 bytes, though I suppose it's not technically using
> > up any more space of it.
> 
> Extending by 2 bytes, right? Yes, I guess. Now we are using only 56 out of 64 bytes
> in the first 64-byte cache line.
> 
> > 
> > What I'm wondering about though, is do we have any usecases where we need a
> > variable buf_len for packets for RX. These mbufs come directly from a mempool,
> > which is generally understood to be a set of fixed-sized buffers. I realise that
> > this change was made in the past after some discussion, but one of the key points
> > there [at least to my reading] was that - even though nobody actually made a
> > concrete case where they had variable-sized buffers - having support for them
> > made no performance difference.
> > 
> > The latter part of that has now changed, and supporting variable-sized mbufs
> > from an mbuf pool has a perf impact. Do we definitely need that functionality,
> > because the easiest fix here is just to move the rxrearm marker back above
> > buf_len as it was originally in releases like 1.8?
> 
> And initialize the buf_len with mp->elt_size - sizeof(struct rte_mbuf).
> Right?
> 
> I don't have a strong opinion on this. I can do it if there is no
> objection. Let me know.
> 
> However, I do see that in future "buf_len" may belong at the end of the first 64-byte
> cache line, as "port" is currently defined as uint8_t, which IMO is too small.
> We may need to increase it to uint16_t. The reason I think so is that
> ThunderX HW currently has 128 VFs per socket for the
> built-in NIC, so a two-node configuration plus one external PCIe NW card
> can easily go beyond 256 ports.
> 
Ok, good point. If you think it's needed, and if we are changing the mbuf
structure, it might be a good time to extend that field while you are at it, to save
a second ABI break later on.

/Bruce
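
The port-count concern quoted above is simple arithmetic, and a rough sketch
in C may make it concrete. The 128-VFs-per-socket figure comes from the quoted
message; the helper names (toy_total_ports, toy_port_ids_fit) and the choice
of one external port are assumptions made only for illustration, not part of
any DPDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total number of ports for a given topology. */
static unsigned toy_total_ports(unsigned sockets, unsigned vfs_per_socket,
                                unsigned external_ports)
{
    return sockets * vfs_per_socket + external_ports;
}

/*
 * Returns 1 if every port id in 0..nb_ports-1 can be stored in an
 * unsigned field of field_bits bits (field_bits is assumed to be < 64).
 */
static int toy_port_ids_fit(unsigned nb_ports, unsigned field_bits)
{
    if (nb_ports == 0)
        return 1;
    return (uint64_t)(nb_ports - 1) < ((uint64_t)1 << field_bits);
}
```

With two sockets of 128 VFs each plus a single external port,
toy_total_ports(2, 128, 1) gives 257 ports, so the highest port id is 256.
That no longer fits in an 8-bit field, while a 16-bit field has ample room.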

> > 
> > Regards,
> > /Bruce
> > 
> > Ref: http://dpdk.org/ml/archives/dev/2014-December/009432.html
> > 
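
For readers following the alignment argument in this thread, here is a
minimal sketch in C of the idea under discussion. The layout is a toy
stand-in and NOT the real struct rte_mbuf; the names toy_mbuf, toy_rearm,
toy_is_naturally_aligned and toy_rearm_mbuf are hypothetical. It assumes the
rearm region currently starts at data_off, just after a 2-byte buf_len, and
that moving the marker above buf_len (as suggested in the thread) makes the
8-byte store start on an 8-byte boundary, at the cost of rewriting buf_len
on every rearm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy layout for illustration only; not the real struct rte_mbuf. */
struct toy_mbuf {
    uint64_t buf_addr;      /* offset 0  */
    uint64_t buf_physaddr;  /* offset 8  */
    uint16_t buf_len;       /* offset 16 */
    uint16_t data_off;      /* offset 18 */
    uint16_t refcnt;        /* offset 20 */
    uint8_t  nb_segs;       /* offset 22 */
    uint8_t  port;          /* offset 23 */
    uint64_t ol_flags;      /* offset 24 */
};

/* The eight bytes rewritten on rearm when the marker sits above buf_len. */
struct toy_rearm {
    uint16_t buf_len;
    uint16_t data_off;
    uint16_t refcnt;
    uint8_t  nb_segs;
    uint8_t  port;
};

_Static_assert(sizeof(struct toy_rearm) == sizeof(uint64_t),
               "rearm template must be exactly one 64-bit word");

/* Returns 1 if a uint64_t stored at this struct offset is naturally aligned. */
static int toy_is_naturally_aligned(size_t offset)
{
    return offset % sizeof(uint64_t) == 0;
}

/*
 * Rewrite all rearm fields with one 8-byte copy starting at buf_len.
 * A fixed-size 8-byte memcpy is usually compiled to a single 64-bit
 * store, and it avoids endianness and aliasing assumptions.
 */
static void toy_rearm_mbuf(struct toy_mbuf *m, const struct toy_rearm *tmpl)
{
    uint64_t word;

    memcpy(&word, tmpl, sizeof(word));
    memcpy((char *)m + offsetof(struct toy_mbuf, buf_len), &word, sizeof(word));
}
```

In this toy layout, offsetof(struct toy_mbuf, data_off) is 18, which is not
a multiple of 8, whereas offsetof(struct toy_mbuf, buf_len) is 16, which is.
Because the aligned store also covers buf_len, the template would need to
carry a buf_len that is fixed per pool, which is the trade-off the thread
is debating.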

