[dpdk-dev] [PATCH 2/4] distributor: new packet distributor library

Richardson, Bruce bruce.richardson at intel.com
Wed May 21 12:21:26 CEST 2014


> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, May 20, 2014 7:19 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
> 
> On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> > This adds the code for a new Intel DPDK library for packet distribution.
> > The distributor is a component which is designed to pass packets
> > one-at-a-time to workers, with dynamic load balancing. Using the RSS
> > field in the mbuf as a tag, the distributor tracks what packet tag is
> > being processed by what worker and then ensures that no two packets with
> > the same tag are in-flight simultaneously. Once a tag is not in-flight,
> > then the next packet with that tag will be sent to the next available
> > core.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
> ><snip>
> 
> > +#define RTE_DISTRIB_GET_BUF (1)
> > +#define RTE_DISTRIB_RETURN_BUF (2)
> > +
> Can you document the meaning of these bits please, the code makes it
> somewhat
> confusing to differentiate them.  As I read the code, GET_BUF is used as a flag
> to indicate that rte_distributor_get_pkt needs to wait while a buffer is
> filled in by the processing thread, while RETURN_BUF indicates that a worker is
> leaving and the buffer needs to be (re)assigned to an alternate worker, is that
> correct?
Pretty much. I'll add additional comments to the code.

> 
> > +#define RTE_DISTRIB_BACKLOG_SIZE 8
> > +#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
> > +
> > +#define RTE_DISTRIB_MAX_RETURNS 128
> > +#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
> > +
> > +union rte_distributor_buffer {
> > +	volatile int64_t bufptr64;
> > +	char pad[CACHE_LINE_SIZE*3];
> Do you need the pad, if you mark the struct as cache aligned?
Yes, for performance reasons we actually want the structure to take up three cache lines, not just one. For instance, this will guarantee that we don't have adjacent line prefetcher in hardware pull in an additional cache line -belonging to a different worker - when we access the memory.

> > +} __rte_cache_aligned;
> >
> +
> ><snip>
> > +
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > +		unsigned worker_id, struct rte_mbuf *oldpkt,
> > +		unsigned reserved __rte_unused)
> > +{
> > +	union rte_distributor_buffer *buf = &d->bufs[worker_id];
> > +	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS) |
> \
> > +			RTE_DISTRIB_GET_BUF;
> > +	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> > +		rte_pause();
> > +	buf->bufptr64 = req;
> > +	while (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> > +		rte_pause();
> You may want to document the fact that this is deadlock prone.  You clearly
> state that only a single thread can run the processing routine, but if a user
> selects a single worker thread to preform double duty, the GET_BUF_FLAG will
> never get cleared here, and no other queues will get processed.
Agreed, I'll update the comments.

> 
> > +	/* since bufptr64 is a signed value, this should be an arithmetic shift */
> > +	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
> > +	return (struct rte_mbuf *)((uintptr_t)ret);
> > +}
> > +
> > +int
> > +rte_distributor_return_pkt(struct rte_distributor *d,
> > +		unsigned worker_id, struct rte_mbuf *oldpkt)
> > +{
> Maybe some optional sanity checking, here and above, to ensure that a packet
> returned through get_pkt doesn't also get returned here, mangling the flags
> field?
That actually shouldn't be an issue. 
When we return a packet using this call, we just get the in_flight_ids value for the worker to zero (and re-assign the backlog, if any), and move on to the next worker. No checking of the returned packet is done. Also, since get_pkt always returns a new packet, the internal logic will still work ok - all that will happen if you return the wrong packet, e.g. by returning the same packet twice rather than returning the latest packet each time, is that the returns array will have the duplicated pointer in it. Whatever gets passed back by the worker gets stored directly there - it's up to the worker to return the correct pointer to the distributor.

> 
> ><snip>
> > +
> > +/* flush the distributor, so that there are no outstanding packets in flight or
> > + * queued up. */
> > +int
> > +rte_distributor_flush(struct rte_distributor *d)
> > +{
> You need to document that this function can only be called by the same thread
> that is running rte_distributor_process, lest your corrupt your queue data.
> Alternatively, it might be nicer to modify this functions internals to set a
> flag in the distributor status bits to make the process routine do the flush
> work when it gets set.  that would allow this function to be called by any
> other thread, which seems like a more natural interface.
Agreed. At minimum I'll update the comments, and I'll also look into what would be involved in changing the mechanism like you describe. However, given the limited time to code freeze date, it may not be possible to do here. [I also don't anticipate this function being much used in normal operations anyway - it was written in order to allow me to write proper unit tests to test the process function. We need a flush function for unit testing to sure that our packet counts are predictable at the end of each test run, and eliminate any dependency in the tests on the internal buffer sizes of the distributor.]

> 
> ><snip>
> > +}
> > +
> > +/* clears the internal returns array in the distributor */
> > +void
> > +rte_distributor_clear_returns(struct rte_distributor *d)
> > +{
> This can also only be called by the same thread that runs the process routine,
> lest the start and count values get mis-assigned.
Agreed. Will update comments.

> 
> > +	d->returns.start = d->returns.count = 0;
> > +#ifndef __OPTIMIZE__
> > +	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
> > +#endif
> > +}
> > +
> > +/* creates a distributor instance */
> > +struct rte_distributor *
> > +rte_distributor_create(const char *name,
> > +		unsigned socket_id,
> > +		unsigned num_workers,
> > +		struct rte_distributor_extra_args *args __rte_unused)
> > +{
> > +	struct rte_distributor *d;
> > +	struct rte_distributor_list *distributor_list;
> > +	char mz_name[RTE_MEMZONE_NAMESIZE];
> > +	const struct rte_memzone *mz;
> > +
> > +	/* compilation-time checks */
> > +	RTE_BUILD_BUG_ON((sizeof(*d) & CACHE_LINE_MASK) != 0);
> > +	RTE_BUILD_BUG_ON((RTE_MAX_LCORE & 7) != 0);
> > +
> > +	if (name == NULL || num_workers >= RTE_MAX_LCORE) {
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +	rte_snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s",
> name);
> > +	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id,
> NO_FLAGS);
> > +	if (mz == NULL) {
> > +		rte_errno = ENOMEM;
> > +		return NULL;
> > +	}
> > +
> > +	/* check that we have an initialised tail queue */
> > +	if ((distributor_list =
> RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_DISTRIBUTOR,
> > +			rte_distributor_list)) == NULL) {
> > +		rte_errno = E_RTE_NO_TAILQ;
> > +		return NULL;
> > +	}
> > +
> > +	d = mz->addr;
> > +	rte_snprintf(d->name, sizeof(d->name), "%s", name);
> > +	d->num_workers = num_workers;
> > +	TAILQ_INSERT_TAIL(distributor_list, d, next);
> You need locking around this list unless you intend to assert that distributor
> creation and destruction must only be preformed from a single thread.  Also,
> where is the API method to tear down a distributor instance?
Ack re locking, will make this as used in other structures.
For tearing down, that's not possible until such time as we get a function to free memzones back. Rings and mempools similarly have no free function.

> 
> ><snip>
> > +#endif
> > +
> > +#include <rte_mbuf.h>
> > +
> > +#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance
> */
> > +
> > +struct rte_distributor;
> > +
> > +struct rte_distributor_extra_args { }; /**< reserved for future use*/
> > +
> You don't need to reserve a struct name for future use.  No one will use it
> until you create it.
> 
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > +		unsigned worker_id, struct rte_mbuf *oldpkt, unsigned
> reserved);
> > +
> Don't need to reserve an extra argument here.  You're not ABI safe currently,
> and if DPDK becomes ABI safe in the future, we will use a linker script to
> provide versions with backward compatibility easily enough.
We may not have ABI compatibility between releases, but on the other hand we try to reduce the amount of code changes that need to be made by our customers who are compiling their code against the libraries - generally linking against static rather than shared libraries. Since we have a reasonable expectation that this field will be needed in a future release, we want to include it now so that when we do need it, no code changes need to be made to upgrade this particular library to a new Intel DPDK version.



More information about the dev mailing list