[dpdk-dev] [RFC 0/8] mbuf: structure reorganization

Olivier Matz olivier.matz at 6wind.com
Fri Feb 17 15:17:08 CET 2017


Hi Jan,

On Fri, 17 Feb 2017 14:38:32 +0100, Jan Blunck <jblunck at infradead.org>
wrote:
> On Fri, Feb 17, 2017 at 11:51 AM, Olivier Matz
> <olivier.matz at 6wind.com> wrote:
> > Hi Jan,
> >
> > On Thu, 16 Feb 2017 18:26:39 +0100, Jan Blunck
> > <jblunck at infradead.org> wrote:  
> >> On Thu, Feb 16, 2017 at 2:48 PM, Olivier Matz
> >> <olivier.matz at 6wind.com> wrote:  
> >> > On Mon, 6 Feb 2017 18:41:27 +0000, "Ananyev, Konstantin"
> >> > <konstantin.ananyev at intel.com> wrote:  
> >> >> >
> >> >> > The main changes are:
> >> >> > - reorder structure to increase vector performance on some
> >> >> > non-ia platforms.
> >> >> > - add a 64bits timestamp field in the 1st cache line  
> >> >>
> >> >> Wonder why it deserves to be in first cache line?
> >> >> How it differs from seqn below (pure SW stuff right now).  
> >> >
> >> > In case the timestamp is set from a NIC value, it is set in the
> >> > Rx path. So that's why I think it deserve to be located in the
> >> > 1st cache line.
> >> >
> >> > As you said, the seqn is a pure sw stuff right: it is set in a
> >> > lib, not in a PMD rx path.
> >> >  
> >>
> >> If we talk about setting the timestamp value in the RX path this
> >> implicitly means software timestamps. Hardware timestamping usually
> >> works by letting the hardware inject sync events for coarse time
> >> tracking and additionally injecting fine granular per-packet ticks
> >> at a specific offset in the packet. Out of performance reasons I
> >> don't think it makes sense to extract this during the burst and
> >> write it into the mbuf again.  
> >
> > From what I've understand, at least it does not work like this for
> > mellanox NICs: timestamp is a metadata attached to a rx packet. But
> > maybe they (and other NIC vendors interrested in the feature) can
> > confirm or not.
> >  
> 
> Mellanox NICs use a 48bit cycle counter split into a high and low
> part. To convert the cycle values into a timestamp you need to
> initialize and maintainer a timecounter that shifts the cycle count
> e.g. nanosecs. IIRC Mellanox doesn't generate explicit clock events
> but the cycle counter is large enough so that the user can easily
> maintain the timecounter by manually updating it.
> 
> >>
> >> The problem with timestamps is to get the abstraction right wrt the
> >> correction factors and the size of the tick vs. the timestamp in
> >> the events injected. From my perspective it would be better to
> >> extract the handling of timestamp data into a library with PMD
> >> specific implementation of the conversions. That way the
> >> normalized timestamp values can get extracted if they are present.
> >> The mbuf itself would only indicate the presence of timestamp
> >> metadata in that case.  
> >
> > I agree however that we need to properly define the meaning of this
> > field. My idea is:
> >
> > - the timestamp is in nanosecond
> > - the reference is always the same for a given path: if the
> > timestamp is set in a PMD, all the packets for this PMD will have
> > the same reference, but for 2 different PMDs (or a sw lib), the
> > reference would not be the same.
> >
> > I think it's enough for many use cases.
> > We can later add helpers to compare timestamps with different
> > references.  
> 
> My point is that I still doubt that it belongs into the first
> cacheline. It requires accessing other structures for converting into
> nanoseconds anyway. Optimally I would like to see this happening on
> access instead but if that isn't achievable at least in a second step.

Sorry, I don't really get your point. My comprehension of the timestamp
usage in a PMD is as following:

rx_burst(struct rxq *rxq, ...)
{
	unsigned long factor = rxq->timestamp_factor;
	unsigned port = rxq->port;

	for each hw_desc {
		m = rte_pktmbuf_alloc(rxq->pool);
		m->len = hw_desc->len;
		m->port = port;
		m->ol_flags = 
		...
		m->timestamp = hw_desc->timestamp * factor;
	}
	...
}

In that case, I think it deserves to be in the 1st cache line.


Olivier


More information about the dev mailing list