[dpdk-dev] [RFC PATCH 0/1] eventtimer: introduce event timer wheel

Jerin Jacob jerin.jacob at caviumnetworks.com
Fri Aug 25 12:25:38 CEST 2017


-----Original Message-----
> Date: Wed, 23 Aug 2017 22:57:08 +0000
> From: "Carrillo, Erik G" <erik.g.carrillo at intel.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>, "dev at dpdk.org"
>  <dev at dpdk.org>
> CC: "thomas at monjalon.net" <thomas at monjalon.net>, "Richardson, Bruce"
>  <bruce.richardson at intel.com>, "Van Haaren, Harry"
>  <harry.van.haaren at intel.com>, "hemant.agrawal at nxp.com"
>  <hemant.agrawal at nxp.com>, "Eads, Gage" <gage.eads at intel.com>,
>  "nipun.gupta at nxp.com" <nipun.gupta at nxp.com>, "Vangati, Narender"
>  <narender.vangati at intel.com>, "Rao, Nikhil" <nikhil.rao at intel.com>,
>  "pbhagavatula at caviumnetworks.com" <pbhagavatula at caviumnetworks.com>,
>  "jianbo.liu at linaro.org" <jianbo.liu at linaro.org>, "rsanford at akamai.com"
>  <rsanford at akamai.com>
> Subject: RE: [dpdk-dev] [RFC PATCH 0/1] eventtimer: introduce event timer
>  wheel
> 
> Hi Jerin,

Hi Carrillo,

> 
> Thanks for sharing your proposal.
> 
> We have implemented something quite similar locally.  In applications that utilize the eventdev framework, entities we call "bridge drivers" are configured, and they are analogous to service cores. 

OK.

> 
> One such bridge driver, the Timer Bridge Driver, runs on an lcore specified during application startup and, once it is started, will manage a set of event timers and enqueue timer events into an event device upon their expiry.  To use event timers, the application will allocate them and set a payload pointer and queue id in pretty much the same way you've shown.  Then we call rte_event_timer_reset() to arm the timer, which will install it in one of the rte_timer library's skiplists.  Concurrently, the Timer Bridge Driver will be executing a run() function in a loop, which will repeatedly execute rte_timer_manage().  For any timers that have expired, a callback defined in the bridge driver will execute, and this callback will enqueue a new event of type TIMER.  As workers are dequeuing events, they will encounter the timer event and can use the "payload pointer" to get back to some specified object.

OK
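
A minimal sketch of that expiry path as I read it; TIMER_EV_DEV,
TIMER_EV_PORT, TIMER_EV_QUEUE and keep_running below are illustrative
placeholders, not part of any proposed API:

#include <rte_common.h>
#include <rte_timer.h>
#include <rte_eventdev.h>

/* Expiry callback: turns an expired rte_timer into a TIMER event. */
static void
timer_expiry_cb(struct rte_timer *tim, void *arg)
{
	struct rte_event ev;

	RTE_SET_USED(tim);
	ev.event = 0;
	ev.event_type = RTE_EVENT_TYPE_TIMER;
	ev.op = RTE_EVENT_OP_NEW;
	ev.queue_id = TIMER_EV_QUEUE;
	ev.event_ptr = arg; /* the application's "payload pointer" */

	/* Retry until the event device accepts the expiry event. */
	while (rte_event_enqueue_burst(TIMER_EV_DEV, TIMER_EV_PORT,
				       &ev, 1) != 1)
		;
}

/* The bridge driver's run() loop, pinned to its own lcore. */
static int
timer_bridge_run(void *arg)
{
	RTE_SET_USED(arg);
	while (keep_running)
		rte_timer_manage(); /* executes callbacks of expired timers */
	return 0;
}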

> 
> Some differences include:
>  - our API doesn't currently support burst-style arming
>  - our API supports periodic timers (since they are free from the rte_timer lib)
> 
> Regarding the implementation you describe in the "Implementation thoughts" section of your email, our timer bridge driver doesn't have a ring in which it enqueues timer events.  Instead, the rte_event_timer_reset() is mostly a wrapper for rte_timer_reset().  Because various worker lcores could be modifying timers concurrently, and they were all headed for the same skiplist, we encountered performance issues with lock contention for that skiplist.  I modified the timer library itself in various ways, including adding a multiple-producer single-consumer ring for requests to modify the skiplists.  I had the best results, however, when I created per-installer skiplists for each target lcore.  I submitted the patch to the ML today[1].  I personally saw contention on the CAS operation to update the head pointer of the ring when multiple lcores installed timers repeatedly and simultaneously, but perhaps burst-enqueuing can avoid that.

OK
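
For what it's worth, burst-enqueuing would amortize that head-pointer
CAS roughly as in the sketch below; tim_ring here is a placeholder for
the wheel's internal request ring, not proposed API:

#include <rte_ring.h>

/* Enqueue n arm requests with a single multi-producer burst, so the
 * ring's producer head is advanced by one CAS instead of one per
 * timer. */
static unsigned int
arm_burst_sw(struct rte_ring *tim_ring, struct rte_event_timer **tims,
	     unsigned int n)
{
	return rte_ring_mp_enqueue_burst(tim_ring, (void **)tims, n, NULL);
}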

> 
> On a separate note, it also looks like attributes of the timer wheel pertaining to resolution or max number of timers will have no effect in the software implementation.  Instead, timer resolution would be a function of the frequency with which a service core can invoke rte_timer_manage.  Also, it would seem that no limit on the number of timers would be necessary.  Does that sound right?

Yes. The SW implementation can ignore the resolution and max number of
timers; these are just hints from the application to partition the HW
resources effectively.

> 
> In summary, it looks like our solutions align fairly well, and I propose that we take on the software implementation if there are no objections.

Sure, no objection.

Could you please comment on the API changes required in the header file?
Once it is finalized, we need to have:
a) common code for ops
b) SW driver
c) HW driver

and we would like to contribute (a) and (c).

/Jerin


> 
> [1] http://dpdk.org/ml/archives/dev/2017-August/073317.html
> 
> Thanks,
> Gabriel
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Thursday, August 17, 2017 11:11 AM
> > To: dev at dpdk.org
> > Cc: thomas at monjalon.net; Richardson, Bruce
> > <bruce.richardson at intel.com>; Van Haaren, Harry
> > <harry.van.haaren at intel.com>; hemant.agrawal at nxp.com; Eads, Gage
> > <gage.eads at intel.com>; nipun.gupta at nxp.com; Vangati, Narender
> > <narender.vangati at intel.com>; Rao, Nikhil <nikhil.rao at intel.com>;
> > pbhagavatula at caviumnetworks.com; jianbo.liu at linaro.org;
> > rsanford at akamai.com; Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > Subject: [dpdk-dev] [RFC PATCH 0/1] eventtimer: introduce event timer
> > wheel
> > 
> > Some NPU-class networking hardware has timer hardware with which the
> > user can arm and cancel event timers. On expiry of the timeout, the
> > hardware posts the notification as an event to the eventdev HW,
> > instead of invoking a callback as in CPU-based timer schemes. This
> > enables high-resolution (1 us or so) timer management using internal
> > or external clock domains, and offloads the timer housekeeping work
> > from the worker lcores.
> > 
> > This RFC attempts to abstract such NPU-class timer hardware and
> > introduce an event timer wheel subsystem inside the eventdev, as the
> > two are tightly coupled.
> > 
> > This RFC introduces the functionality to create an event timer wheel. This
> > allows an application to arm event timers, which shall enqueue an event to a
> > specified event queue on expiry of a given interval.
> > 
> > The event timer wheel uses an ops table to which the various event devices
> > (e.g. Cavium OCTEONTX, NXP DPAA2, and SW) register their
> > implementation-specific timer subsystem ops.
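
For illustration only, such an ops table could be shaped as below; the
exact names and signatures are what we need to finalize in the header:

/* Illustrative shape of a per-driver ops table; not the final API. */
struct rte_event_timer_wheel_ops {
	int (*create)(struct rte_event_timer_wheel *wheel,
		      const struct rte_event_timer_wheel_config *config);
	int (*start)(struct rte_event_timer_wheel *wheel);
	int (*stop)(struct rte_event_timer_wheel *wheel);
	void (*free)(struct rte_event_timer_wheel *wheel);
	/* Burst arm/cancel, mirroring the public API. */
	uint16_t (*arm_burst)(struct rte_event_timer_wheel *wheel,
			      struct rte_event_timer **tims, uint16_t nb);
	uint16_t (*cancel_burst)(struct rte_event_timer_wheel *wheel,
				 struct rte_event_timer **tims, uint16_t nb);
};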
> > 
> > The RFC extends the DPDK event-based programming model so that an event
> > can be of type timer, and the expiry event is delivered to the CPU over
> > eventdev ports.
> > 
> > Some of the use cases of the event timer wheel are Beacon Timers, Generic
> > SW Timeouts, Wireless MAC Scheduling, 3G Frame Protocols, Packet Scheduling,
> > Protocol Retransmission Timers, and Supervision Timers.
> > All these use cases require high resolution and low time drift.
> > 
> > The abstract working model of an event timer wheel is as follows:
> > =================================================================
> >                                timer_tick_ns
> >                                    +
> >                       +-------+    |
> >                       |       |    |
> >               +-------+ bkt 0 +----v---+
> >               |       |       |        |
> >               |       +-------+        |
> >           +---+---+                +---+---+  +---+---+---+---+
> >           |       |                |       |  |   |   |   |   |
> >           | bkt n |                | bkt 1 |<-> t0| t1| t2| tn|
> >           |       |                |       |  |   |   |   |   |
> >           +---+---+                +---+---+  +---+---+---+---+
> >               |       Timer wheel      |
> >           +---+---+                +---+---+
> >           |       |                |       |
> >           | bkt 4 |                | bkt 2 |<--- Current bucket
> >           |       |                |       |
> >           +---+---+                +---+---+
> >                |      +-------+       |
> >                |      |       |       |
> >                +------+ bkt 3 +-------+
> >                       |       |
> >                       +-------+
> > 
> >  - It has a virtual monotonically increasing 64-bit timer wheel clock based on
> >    an *enum rte_event_timer_wheel_clk_src* clock source. The clock source
> >    could be a CPU clock or a platform-dependent external clock.
> > 
> >  - The application creates a timer wheel instance with a given clock source,
> >    the total number of event timers, and the resolution (expressed in ns) to
> >    traverse between the buckets.
> > 
> >  - Each timer wheel may have 0 to n buckets based on the configured
> >    max timeout (max_tmo_ns) and resolution (timer_tick_ns); see the indexing
> >    sketch after this list. On timer wheel start, the timer starts ticking at
> >    *timer_tick_ns* resolution.
> > 
> >  - The application arms an event timer to expire after a given number of
> >    *timer_tick_ns* ticks from now.
> > 
> >  - The application can cancel an existing armed timer if required.
> > 
> >  - If the timer is not canceled by the application and it expires, the
> >    library injects the timer expiry event into the designated event queue.
> > 
> >  - The timer expiry event is received through *rte_event_dequeue_burst*.
> > 
> >  - The application frees the created timer wheel instance.
> > 
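To make the bucket traversal concrete, here is a back-of-the-envelope
indexing sketch assuming a classic single-level wheel; individual
drivers may lay buckets out differently:

#include <stdint.h>

/* Classic single-level wheel indexing: nb_buckets = max_tmo_ns /
 * timer_tick_ns. E.g. a 100 ms tick with a 180 s max timeout gives
 * 1800 buckets; a timer armed for 30 ticks lands 30 buckets ahead of
 * the current one, modulo the wheel size. */
static uint64_t
expiry_bucket(uint64_t cur_bucket, uint64_t timeout_ticks,
	      uint64_t max_tmo_ns, uint64_t timer_tick_ns)
{
	uint64_t nb_buckets = max_tmo_ns / timer_tick_ns;

	return (cur_bucket + timeout_ticks) % nb_buckets;
}
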
> > A more detailed description of the event timer wheel is contained in the
> > header's comments.
> > 
> > Implementation thoughts
> > =======================
> > The event devices have to provide a driver-level function that is used to
> > get the event timer subsystem capability and the respective event timer
> > wheel ops. If the event device is not capable, a software implementation
> > of the event timer wheel ops will be selected.
> > 
> > The software implementation of the timer wheel will make use of the existing
> > rte_timer[1] and rte_ring libraries and EAL service cores[2] to achieve event
> > generation. The worker cores call the event timer arm function, which
> > enqueues the event timer to an rte_ring. The registered service core then
> > dequeues the event timer from the rte_ring and uses the rte_timer library to
> > register a timer. The service core then invokes rte_timer_manage() to
> > retrieve expired timers and generate the associated events.
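
Under those assumptions, the service core's function could look roughly
as below; the impl_tim field, to_cycles() helper, ring name and burst
size are illustrative (timer_expiry_cb is as in the earlier sketch):

#include <rte_ring.h>
#include <rte_timer.h>
#include <rte_lcore.h>
#include <rte_cycles.h>

#define REQ_BURST 32 /* illustrative drain burst size */

/* Hypothetical conversion from 100 ms wheel ticks to TSC cycles. */
static uint64_t
to_cycles(uint64_t ticks)
{
	return ticks * rte_get_timer_hz() / 10;
}

static int32_t
sw_wheel_service(void *args)
{
	struct rte_ring *tim_ring = args; /* worker -> service requests */
	struct rte_event_timer *tims[REQ_BURST];
	unsigned int i, n;

	/* Drain pending arm requests and register each with rte_timer;
	 * impl_tim is an rte_timer assumed to be embedded per event
	 * timer. */
	n = rte_ring_sc_dequeue_burst(tim_ring, (void **)tims,
				      REQ_BURST, NULL);
	for (i = 0; i < n; i++)
		rte_timer_reset(&tims[i]->impl_tim,
				to_cycles(tims[i]->timeout_ticks),
				SINGLE, rte_lcore_id(),
				timer_expiry_cb, tims[i]);

	/* Execute callbacks of expired timers, generating TIMER events. */
	rte_timer_manage();
	return 0;
}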
> > 
> > The implementation of the event timer wheel subsystem for both hardware
> > (Cavium OCTEONTX) and software (if there are no volunteers) will be
> > undertaken by Cavium.
> > 
> > [1] http://dpdk.org/doc/guides/prog_guide/timer_lib.html
> > [2] http://dpdk.org/ml/archives/dev/2017-May/065207.html
> > 
> > An example code snippet to show the proposed API usage
> > ======================================================
> > example: TCP Retransmission in abstract form.
> > 
> > uint8_t
> > configure_event_dev(...)
> > {
> > 	/* Create the event device. */
> > 	const struct rte_event_dev_config config = {
> > 		.nb_event_queues = 1,
> > 		/* Event device related configuration. */
> > 		...
> > 	};
> > 
> > 	rte_event_dev_configure(event_dev_id, &config);
> > 	/* Event queue and port configuration. */
> > 	...
> > 	/* Start the event device.*/
> > 	rte_event_dev_start(event_dev_id);
> > }
> > 
> > #define NSECPERSEC	1E9 // No of ns for 1 sec
> > uint8_t
> > configure_event_timer_wheel(...)
> > {
> > 	/* Create an event timer wheel for reliable connections. */
> > 	const struct rte_event_timer_wheel_config wheel_config = {
> > 		.event_dev_id = event_dev_id,
> > 		.timer_wheel_id = 0,
> > 		.clk_src = RTE_EVENT_TIMER_WHEEL_CPU_CLK,
> > 		.timer_tick_ns = NSECPERSEC / 10, // 100 milliseconds
> > 		.max_tmo_nsec = 180 * NSECPERSEC, // 3 minutes
> > 		.nb_timers = 40000, // Number of timers the wheel can hold.
> > 		.timer_wheel_flags = 0,
> > 	};
> > 	struct rte_event_timer_wheel *wheel = NULL;
> > 	wheel = rte_event_timer_wheel_create(&wheel_config);
> > 	if (wheel == NULL) {
> > 		/* Failed to create event timer wheel. */
> > 		...
> > 		return false;
> > 	}
> > 	/* Start the event timer wheel. */
> > 	rte_event_timer_wheel_start(wheel);
> > 
> > 	/* Create a mempool of event timers. */
> > 	struct rte_mempool *event_timer_pool = NULL;
> > 
> > 	event_timer_pool = rte_mempool_create("event_timer_mempool", SIZE,
> > 			sizeof(struct rte_event_timer), ...);
> > 	if (event_timer_pool == NULL) {
> > 		/* Failed to create event timer mempool. */
> > 		...
> > 		return false;
> > 	}
> > }
> > 
> > 
> > uint8_t
> > process_tcp_data_packet(...)
> > {
> > 	/* Classify based on type */
> > 	switch (...) {
> > 	case ...:
> > 		/* Setting up a new connection (Protocol dependent.) */
> > 		...
> > 		/* Setting up a new event timer. */
> > 		conn->timer = NULL;
> > 		rte_mempool_get(event_timer_pool, (void **)&conn->timer);
> > 		if (conn->timer == NULL) {
> > 			/* Failed to get an event timer instance. */
> > 			/* Tear down the connection. */
> > 			return false;
> > 		}
> > 
> > 		/* Set up the timer event. */
> > 		conn->timer->ev.event_ptr = conn;
> > 		conn->timer->ev.queue_id = event_queue_id;
> > 		...
> > 		/* All necessary resources successfully allocated */
> > 		/* Compute the timer timeout ticks */
> > 		conn->timer->timeout_ticks = 30; // 3 sec (30 x 100 ms ticks),
> > 						 // per RFC 1122 TCP retransmission
> > 		/* Arm the timer with our timeout */
> > 		ret = rte_event_timer_arm_burst(wheel, &conn->timer, 1);
> > 		if (ret != 1) {
> > 			/* Check return value for a too-early or too-late
> > 			 * expiration tick. */
> > 			...
> > 			return false;
> > 		}
> > 		return true;
> > 	case ...:
> > 		/* An ACK for the previous TCP data packet has been received. */
> > 		/* Cancel the retransmission timer. */
> > 		rte_event_timer_cancel_burst(wheel, &conn->timer, 1);
> > 		break;
> > 	}
> > }
> > 
> > uint8_t
> > process_timer_event(...)
> > {
> > 	/* A retransmission timeout for the connection has been received. */
> > 	conn = ev.event_ptr;
> > 	/* Retransmit last packet (e.g. TCP segment). */
> > 	...
> > 	/* Re-arm timer using original values. */
> > 	rte_event_timer_arm_burst(wheel, &conn->timer, 1);
> > }
> > 
> > void
> > events_processing_loop(...)
> > {
> > 	while (...) {
> > 		/* Receive events from the configured event port. */
> > 		rte_event_dequeue_burst(event_dev_id, event_port, &ev, 1, 0);
> > 		...
> > 		/* Classify events based on event_type. */
> > 		switch (ev.event_type) {
> > 		case RTE_EVENT_TYPE_ETHDEV:
> > 			...
> > 			process_packets(...);
> > 			break;
> > 		case RTE_EVENT_TYPE_TIMER:
> > 			process_timer_event(ev);
> > 			...
> > 			break;
> > 		}
> > 	}
> > }
> > 
> > int main()
> > {
> > 
> > 	configure_event_dev();
> > 	configure_event_timer_wheel();
> > 	on_each_worker_lcores(events_processing_loop());
> > }
> > 
> > Jerin Jacob (1):
> >   eventtimer: introduce event timer wheel
> > 
> >  doc/api/doxy-api-index.md                   |   3 +-
> >  lib/librte_eventdev/Makefile                |   1 +
> >  lib/librte_eventdev/rte_event_timer_wheel.h | 493
> > ++++++++++++++++++++++++++++
> >  lib/librte_eventdev/rte_eventdev.h          |   4 +-
> >  4 files changed, 498 insertions(+), 3 deletions(-)  create mode 100644
> > lib/librte_eventdev/rte_event_timer_wheel.h
> > 
> > --
> > 2.14.1
> 

