[dpdk-dev,1/4] eventdev: introduce event driven programming model

Message ID 1479447902-3700-2-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State Superseded, archived
Checks

Context Check Description
checkpatch/checkpatch success coding style OK

Commit Message

Jerin Jacob Nov. 18, 2016, 5:44 a.m. UTC
  In a polling model, lcores poll ethdev ports and associated
rx queues directly to look for packets. In an event driven model,
by contrast, lcores call the scheduler that selects packets for
them based on programmer-specified criteria. The eventdev library
adds support for an event driven programming model, which offers
applications automatic multicore scaling, dynamic load balancing,
pipelining, packet ingress order maintenance and
synchronization services to simplify application packet processing.

By introducing an event driven programming model, DPDK can support
both polling and event driven programming models for packet processing,
and applications are free to choose whichever model
(or combination of the two) best suits their needs.
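
To make the contrast concrete, a minimal sketch of the two loops follows
(the single-event dequeue signature and the process()/process_event()
handlers are illustrative assumptions, not part of this patch):

#include <rte_ethdev.h>
#include <rte_eventdev.h>

/* Polling model: the lcore owns an explicit (port, queue) pair. */
static void
poll_lcore_loop(uint8_t port, uint16_t queue)
{
	struct rte_mbuf *pkts[32];
	uint16_t i, nb_rx;

	while (1) {
		nb_rx = rte_eth_rx_burst(port, queue, pkts, 32);
		for (i = 0; i < nb_rx; i++)
			process(pkts[i]);	/* app-defined handler */
	}
}

/* Event driven model: the scheduler picks the next event for the lcore. */
static void
event_lcore_loop(uint8_t dev_id, uint8_t ev_port)
{
	struct rte_event ev;

	while (1) {
		if (rte_event_dequeue(dev_id, ev_port, &ev, 0 /* no-wait */))
			process_event(&ev);	/* app-defined handler */
	}
}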

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 MAINTAINERS                        |    3 +
 doc/api/doxy-api-index.md          |    1 +
 doc/api/doxy-api.conf              |    1 +
 lib/librte_eventdev/rte_eventdev.h | 1439 ++++++++++++++++++++++++++++++++++++
 4 files changed, 1444 insertions(+)
 create mode 100644 lib/librte_eventdev/rte_eventdev.h
  

Comments

Thomas Monjalon Nov. 23, 2016, 6:39 p.m. UTC | #1
Hi Jerin,

Thanks for bringing a big new piece in DPDK.

I made some comments below.

2016-11-18 11:14, Jerin Jacob:
> +Eventdev API - EXPERIMENTAL
> +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> +F: lib/librte_eventdev/

OK to mark it experimental.
What is the plan to remove the experimental word?

> + * RTE event device drivers do not use interrupts for enqueue or dequeue
> + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
> + * functions to applications.

To the question "what makes DPDK different" it could be answered
that DPDK event drivers implement polling functions :)

> +#include <stdbool.h>
> +
> +#include <rte_pci.h>
> +#include <rte_dev.h>
> +#include <rte_memory.h>

Is it possible to remove some of these includes from the API?

> +
> +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> +/**< Skeleton event device PMD name */

I do not understand this #define.
And it is not properly prefixed.

> +struct rte_event_dev_info {
> +	const char *driver_name;	/**< Event driver name */
> +	struct rte_pci_device *pci_dev;	/**< PCI information */

There is some work in progress to remove PCI information from ethdev.
Please do not add any PCI related structure in eventdev.
The generic structure is rte_device.

> +struct rte_event_dev_config {
> +	uint32_t dequeue_wait_ns;
> +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.

Please explain exactly when the wait occurs and why.

> +	 * This value should be in the range of *min_dequeue_wait_ns* and
> +	 * *max_dequeue_wait_ns* which previously provided in
> +	 * rte_event_dev_info_get()
> +	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT

I think the @see syntax would be more consistent than \see.

> +	uint8_t nb_event_port_dequeue_depth;
> +	/**< Number of dequeue queue depth for any event port on this device.

I think it deserves more explanations.

> +	uint32_t event_dev_cfg;
> +	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/

How this field differs from others in the struct?
Should it be named flags?

> +	uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */

Same comment about the naming of this field for the event_queue config struct.

> +/** Event port configuration structure */
> +struct rte_event_port_conf {
> +	int32_t new_event_threshold;
> +	/**< A backpressure threshold for new event enqueues on this port.
> +	 * Use for *closed system* event dev where event capacity is limited,
> +	 * and cannot exceed the capacity of the event dev.
> +	 * Configuring ports with different thresholds can make higher priority
> +	 * traffic less likely to  be backpressured.
> +	 * For example, a port used to inject NIC Rx packets into the event dev
> +	 * can have a lower threshold so as not to overwhelm the device,
> +	 * while ports used for worker pools can have a higher threshold.
> +	 * This value cannot exceed the *nb_events_limit*
> +	 * which previously supplied to rte_event_dev_configure()
> +	 */
> +	uint8_t dequeue_depth;
> +	/**< Configure number of bulk dequeues for this event port.
> +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> +	 * which previously supplied to rte_event_dev_configure()
> +	 */
> +	uint8_t enqueue_depth;
> +	/**< Configure number of bulk enqueues for this event port.
> +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> +	 * which previously supplied to rte_event_dev_configure()
> +	 */
> +};

The depth configuration is not clear to me.

> +/* Event types to classify the event source */

Why this classification is needed?

> +#define RTE_EVENT_TYPE_ETHDEV           0x0
> +/**< The event generated from ethdev subsystem */
> +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> +/**< The event generated from crypodev subsystem */
> +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> +/**< The event generated from timerdev subsystem */
> +#define RTE_EVENT_TYPE_CORE             0x3
> +/**< The event generated from core.

What is core?

> +/* Event enqueue operations */

I feel a longer explanation is needed here to describe
what is an operation and where this data is useful.

> +#define RTE_EVENT_OP_NEW                0
> +/**< New event without previous context */
> +#define RTE_EVENT_OP_FORWARD            1
> +/**< Re-enqueue previously dequeued event */
> +#define RTE_EVENT_OP_RELEASE            2

There is no comment for the release operation.

> +/**
> + * Release the flow context associated with the schedule type.
> + *
[...]
> + */

There is no function declaration below this comment.

> +/**
> + * The generic *rte_event* structure to hold the event attributes
> + * for dequeue and enqueue operation
> + */
> +struct rte_event {
> +	/** WORD0 */
> +	RTE_STD_C11
> +	union {
> +		uint64_t event;
[...]
> +	};
> +	/** WORD1 */
> +	RTE_STD_C11
> +	union {
> +		uintptr_t event_ptr;

I wonder if it can be a problem to have the size of this field
not constant across machines.

> +		/**< Opaque event pointer */
> +		struct rte_mbuf *mbuf;
> +		/**< mbuf pointer if dequeued event is associated with mbuf */

How do we know that an event is associated with mbuf?
Does it mean that such events are always converted into mbuf even if the
application does not need it?

> +struct rte_eventdev_driver;
> +struct rte_eventdev_ops;

I think it is better to split API and driver interface in two files.
(we should do this split in ethdev)

> +/**
> + * Enqueue the event object supplied in the *rte_event* structure on an
> + * event device designated by its *dev_id* through the event port specified by
> + * *port_id*. The event object specifies the event queue on which this
> + * event will be enqueued.
> + *
> + * @param dev_id
> + *   Event device identifier.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param ev
> + *   Pointer to struct rte_event
> + *
> + * @return
> + *  - 0 on success
> + *  - <0 on failure. Failure can occur if the event port's output queue is
> + *     backpressured, for instance.
> + */
> +static inline int
> +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)

Is it really needed to have non-burst variant of enqueue/dequeue?

> +/**
> + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> + *
> + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> + * application can use this function to convert wait value in nanoseconds to
> + * implementations specific wait value supplied in rte_event_dequeue()

Why is it implementation-specific?
Why this conversion is not internal in the driver?

End of review for this patch ;)
  
Jerin Jacob Nov. 24, 2016, 1:59 a.m. UTC | #2
On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> Hi Jerin,

Hi Thomas,

> 
> Thanks for bringing a big new piece in DPDK.
> 
> I made some comments below.

Thanks for the review.

> 
> 2016-11-18 11:14, Jerin Jacob:
> > +Eventdev API - EXPERIMENTAL
> > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > +F: lib/librte_eventdev/
> 
> OK to mark it experimental.
> What is the plan to remove the experimental word?

IMO, EXPERIMENTAL status can be changed when
- At least two event drivers are available (Intel and Cavium are
  working on SW and HW event drivers)
- Functional test applications are fine with at least two drivers
- Portable example application to showcase the features of the library
- eventdev integration with another dpdk subsystem such as ethdev

Thoughts? I am not sure what criteria were used in the cryptodev case.


> 
> > + * RTE event device drivers do not use interrupts for enqueue or dequeue
> > + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
> > + * functions to applications.
> 
> To the question "what makes DPDK different" it could be answered
> that DPDK event drivers implement polling functions :)

Mostly taken from ethdev API header file :-)

> 
> > +#include <stdbool.h>
> > +
> > +#include <rte_pci.h>
> > +#include <rte_dev.h>
> > +#include <rte_memory.h>
> 
> Is it possible to remove some of these includes from the API?

OK. I will scan through all the header files and remove the ones that are
not required.

> 
> > +
> > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > +/**< Skeleton event device PMD name */
> 
> I do not understand this #define.

Applications can explicitly request a specific driver through the driver
name. This will go as the argument to rte_event_dev_get_dev_id(const char *name).
The reason for keeping this #define in rte_eventdev.h is that the
application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.

I will remove the definition from this patch and add it in the
skeleton driver patch (patch 03/04).
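
For illustration, the intended application usage is roughly the following
(a sketch; the return convention of rte_event_dev_get_dev_id() is an
assumption):

/* Select the device instance created by a specific driver, by name. */
int dev_id = rte_event_dev_get_dev_id("event_skeleton");

if (dev_id < 0)
	rte_exit(EXIT_FAILURE, "event_skeleton PMD not found\n");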

> And it is not properly prefixed.

OK. I will prefix with RTE_ in v2.

> 
> > +struct rte_event_dev_info {
> > +	const char *driver_name;	/**< Event driver name */
> > +	struct rte_pci_device *pci_dev;	/**< PCI information */
> 
> There is some work in progress to remove PCI information from ethdev.
> Please do not add any PCI related structure in eventdev.
> The generic structure is rte_device.

OK. Makes sense. A grep of "rte_device" shows that none of the subsystems
have implemented it yet and the work is in progress. I will change to
rte_device when it is in mainline. The skeleton eventdev driver, which is
based on the PCI bus, needs this for the moment.


> 
> > +struct rte_event_dev_config {
> > +	uint32_t dequeue_wait_ns;
> > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> 
> Please explain exactly when the wait occurs and why.

Here is the explanation from rte_event_dequeue() API definition,
-
@param wait
0 - no-wait, returns immediately if there is no event.
>0 - wait for the event, if the device is configured with
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
the event available or *wait* time.
if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
then this function will wait until the event available or *dequeue_wait_ns*
                                                      ^^^^^^^^^^^^^^^^^^^^^^
ns which was previously supplied to rte_event_dev_configure()
-
This provides the application with control over how long the
implementation should wait if an event is not available.

Let me know what exact changes are required if the details in the
rte_event_dequeue() API definition are not enough.

> 
> > +	 * This value should be in the range of *min_dequeue_wait_ns* and
> > +	 * *max_dequeue_wait_ns* which previously provided in
> > +	 * rte_event_dev_info_get()
> > +	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> 
> I think the @see syntax would be more consistent than \see.

OK. I will change to @see

> 
> > +	uint8_t nb_event_port_dequeue_depth;
> > +	/**< Number of dequeue queue depth for any event port on this device.
> 
> I think it deserves more explanations.

see below

> 
> > +	uint32_t event_dev_cfg;
> > +	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/
> 
> How this field differs from others in the struct?
> Should it be named flags?

OK. I will change to flags

> 
> > +	uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */
> 
> Same comment about the naming of this field for the event_queue config struct.

OK. I will change to flags

> 
> > +/** Event port configuration structure */
> > +struct rte_event_port_conf {
> > +	int32_t new_event_threshold;
> > +	/**< A backpressure threshold for new event enqueues on this port.
> > +	 * Use for *closed system* event dev where event capacity is limited,
> > +	 * and cannot exceed the capacity of the event dev.
> > +	 * Configuring ports with different thresholds can make higher priority
> > +	 * traffic less likely to  be backpressured.
> > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > +	 * can have a lower threshold so as not to overwhelm the device,
> > +	 * while ports used for worker pools can have a higher threshold.
> > +	 * This value cannot exceed the *nb_events_limit*
> > +	 * which previously supplied to rte_event_dev_configure()
> > +	 */
> > +	uint8_t dequeue_depth;
> > +	/**< Configure number of bulk dequeues for this event port.
> > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > +	 * which previously supplied to rte_event_dev_configure()
> > +	 */
> > +	uint8_t enqueue_depth;
> > +	/**< Configure number of bulk enqueues for this event port.
> > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > +	 * which previously supplied to rte_event_dev_configure()
> > +	 */
> > +};
> 
> The depth configuration is not clear to me.

Basically, the maximum number of events that can be enqueued/dequeued at a
time from a given event port. A depth of one == non-burst mode.
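
To illustrate with the fields quoted above (a sketch; rte_event_port_setup()
is named later in this thread, and the numeric values are only examples):

/* Rx injection port: low new-event threshold, non-burst dequeue. */
struct rte_event_port_conf rx_conf = {
	.new_event_threshold = 1024,
	.dequeue_depth = 1,	/* depth of one == non-burst */
	.enqueue_depth = 16,	/* bulk enqueue of up to 16 events */
};

/* Worker port: higher threshold, so workers are backpressured last. */
struct rte_event_port_conf worker_conf = {
	.new_event_threshold = 4096,
	.dequeue_depth = 16,
	.enqueue_depth = 16,
};

rte_event_port_setup(dev_id, rx_port_id, &rx_conf);
rte_event_port_setup(dev_id, worker_port_id, &worker_conf);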

> 
> > +/* Event types to classify the event source */
> 
> Why this classification is needed?

This is for application pipelining and for cases where the application wants to know
which subsystem generated the event.

example packet forwarding loop on the worker cores:
while(1) {
	ev = dequeue()
	// event from ethdev subsystem
	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
		- swap the mac address
		- push to atomic queue for ingress flow order maintenance
		  by CORE
	/* events from core */
	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {

	}
	enqueue(ev);
}
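
Spelled out a bit more in C (a sketch; swap_mac(), ATOMIC_TX_QUEUE and the
ev.queue_id/ev.op field names are assumptions about the elided v1 layout):

while (1) {
	struct rte_event ev;

	if (!rte_event_dequeue(dev_id, port_id, &ev, 0))
		continue;
	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
		swap_mac(ev.mbuf);		/* event carries an mbuf */
		ev.queue_id = ATOMIC_TX_QUEUE;	/* keep ingress flow order */
		ev.op = RTE_EVENT_OP_FORWARD;
	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
		/* event injected by another lcore stage */
	}
	rte_event_enqueue(dev_id, port_id, &ev);
}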

> 
> > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > +/**< The event generated from ethdev subsystem */
> > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > +/**< The event generated from crypodev subsystem */
> > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > +/**< The event generated from timerdev subsystem */
> > +#define RTE_EVENT_TYPE_CORE             0x3
> > +/**< The event generated from core.
> 
> What is core?

The events are generated by an lcore for pipelining. Any suggestion for a
better name? lcore?

> 
> > +/* Event enqueue operations */
> 
> I feel a longer explanation is needed here to describe
> what is an operation and where this data is useful.

I will try to add it. The v1 has a lengthy description for release
because it is not self-explanatory.

> 
> > +#define RTE_EVENT_OP_NEW                0
> > +/**< New event without previous context */
> > +#define RTE_EVENT_OP_FORWARD            1
> > +/**< Re-enqueue previously dequeued event */
> > +#define RTE_EVENT_OP_RELEASE            2
> 
> There is no comment for the release operation.

It's there; see the next comment.

> 
> > +/**
> > + * Release the flow context associated with the schedule type.
> > + *
> [...]
> > + */
> 
> There is no function declaration below this comment.

This comment was for the previous RTE_EVENT_OP_RELEASE. I will fix the doxygen
formatting issue.

> 
> > +/**
> > + * The generic *rte_event* structure to hold the event attributes
> > + * for dequeue and enqueue operation
> > + */
> > +struct rte_event {
> > +	/** WORD0 */
> > +	RTE_STD_C11
> > +	union {
> > +		uint64_t event;
> [...]
> > +	};
> > +	/** WORD1 */
> > +	RTE_STD_C11
> > +	union {
> > +		uintptr_t event_ptr;
> 
> I wonder if it can be a problem to have the size of this field
> not constant across machines.

OK. Maybe I can make it "uint64_t u64" to reserve space, or I can
remove it.

> 
> > +		/**< Opaque event pointer */
> > +		struct rte_mbuf *mbuf;
> > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> 
> How do we know that an event is associated with mbuf?

By looking at the event source/type RTE_EVENT_TYPE_*

> Does it mean that such events are always converted into mbuf even if the
> application does not need it?

The hardware has a dependency on getting the physical address of the event, so any
struct that has "phys_addr_t buf_physaddr" works.
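
Concretely (a sketch; buf_physaddr is the mbuf field as of 16.11):

/* An mbuf qualifies because it carries its own physical address,
 * which a HW driver can hand directly to the scheduler hardware. */
phys_addr_t pa = ev.mbuf->buf_physaddr;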

> 
> > +struct rte_eventdev_driver;
> > +struct rte_eventdev_ops;
> 
> I think it is better to split API and driver interface in two files.
> (we should do this split in ethdev)

I thought so, but then the "static inline" versions of the northbound
API (like rte_event_enqueue) will go in another file (due to the fact that
the implementation needs to dereference "dev->data->ports[port_id]"). Do you want it that way?
I would like to keep all northbound API in rte_eventdev.h and not any of them
in rte_eventdev_pmd.h.

Any suggestions?
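
For reference, the pattern in question looks roughly like this (a sketch
modeled on ethdev's fast-path wrappers; the rte_eventdevs array and the
enqueue function pointer are assumptions):

/* In rte_eventdev.h: the inline wrapper must dereference the device
 * data, which is why it cannot move to rte_eventdev_pmd.h. */
static inline int
rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
{
	struct rte_eventdev *dev = &rte_eventdevs[dev_id];

	return (*dev->enqueue)(dev->data->ports[port_id], ev);
}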

> 
> > +/**
> > + * Enqueue the event object supplied in the *rte_event* structure on an
> > + * event device designated by its *dev_id* through the event port specified by
> > + * *port_id*. The event object specifies the event queue on which this
> > + * event will be enqueued.
> > + *
> > + * @param dev_id
> > + *   Event device identifier.
> > + * @param port_id
> > + *   The identifier of the event port.
> > + * @param ev
> > + *   Pointer to struct rte_event
> > + *
> > + * @return
> > + *  - 0 on success
> > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > + *     backpressured, for instance.
> > + */
> > +static inline int
> > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> 
> Is it really needed to have non-burst variant of enqueue/dequeue?

Yes. Certain HW can work only with non-burst variants.
> 
> > +/**
> > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > + *
> > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > + * application can use this function to convert wait value in nanoseconds to
> > + * implementations specific wait value supplied in rte_event_dequeue()
> 
> Why is it implementation-specific?
> Why this conversion is not internal in the driver?

This is a performance optimization; otherwise drivers would
need to convert ns to ticks in the "fast path".
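
The intended split is to convert once in the slow path and reuse the value
in the fast path (a sketch; the helper's name is paraphrased from the
comment quoted above):

/* Slow path, once at setup: nanoseconds -> device-specific wait value. */
uint64_t wait = rte_event_dequeue_wait_time(dev_id, 10 * 1000); /* 10 us */

/* Fast path: no ns-to-tick arithmetic per call. */
while (1) {
	struct rte_event ev;

	if (rte_event_dequeue(dev_id, port_id, &ev, wait))
		process(&ev);	/* app-defined handler */
}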

> 
> End of review for this patch ;)
  
Bruce Richardson Nov. 24, 2016, 12:26 p.m. UTC | #3
On Thu, Nov 24, 2016 at 07:29:13AM +0530, Jerin Jacob wrote:
> On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:

Just some comments of mine, triggered by Thomas' comments.

<snip>
> > + */
> > > +static inline int
> > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > 
> > Is it really needed to have non-burst variant of enqueue/dequeue?
> 
> Yes. Certain HW can work only with non-burst variants.

In those cases, is it not acceptable just to have the dequeue_burst
function return 1 all the time? It would allow apps to be more portable
between burst and non-burst variants, would it not?

> > 
> > > +/**
> > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > + *
> > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > + * application can use this function to convert wait value in nanoseconds to
> > > + * implementations specific wait value supplied in rte_event_dequeue()
> > 
> > Why is it implementation-specific?
> > Why this conversion is not internal in the driver?
> 
> This is a performance optimization; otherwise drivers would
> need to convert ns to ticks in the "fast path".
> 
> > 
Is that really likely to be a performance bottleneck? I would expect
modern cores to fly through basic arithmetic in a negligible number of
cycles.

/Bruce
  
Thomas Monjalon Nov. 24, 2016, 3:35 p.m. UTC | #4
2016-11-24 07:29, Jerin Jacob:
> On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > 2016-11-18 11:14, Jerin Jacob:
> > > +Eventdev API - EXPERIMENTAL
> > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > +F: lib/librte_eventdev/
> > 
> > OK to mark it experimental.
> > What is the plan to remove the experimental word?
> 
> IMO, EXPERIMENTAL status can be changed when
> - At least two event drivers are available (Intel and Cavium are
>   working on SW and HW event drivers)
> - Functional test applications are fine with at least two drivers
> - Portable example application to showcase the features of the library
> - eventdev integration with another dpdk subsystem such as ethdev
> 
> Thoughts? I am not sure what criteria were used in the cryptodev case.

Sounds good.
We will be more confident when drivers and tests will be implemented.

I think the roadmap for the SW driver targets the release 17.05.
Do you still plan 17.02 for this API and the Cavium driver?

> > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > +/**< Skeleton event device PMD name */
> > 
> > I do not understand this #define.
> 
> Applications can explicitly request a specific driver through the driver
> name. This will go as the argument to rte_event_dev_get_dev_id(const char *name).
> The reason for keeping this #define in rte_eventdev.h is that the
> application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.

So each driver must register its name in the API?
Is it really needed?

> > > +struct rte_event_dev_config {
> > > +	uint32_t dequeue_wait_ns;
> > > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> > 
> > Please explain exactly when the wait occurs and why.
> 
> Here is the explanation from rte_event_dequeue() API definition,
> -
> @param wait
> 0 - no-wait, returns immediately if there is no event.
> >0 - wait for the event, if the device is configured with
> RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> the event available or *wait* time.
> if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> then this function will wait until the event available or *dequeue_wait_ns*
>                                                       ^^^^^^^^^^^^^^^^^^^^^^
> ns which was previously supplied to rte_event_dev_configure()
> -
> This provides the application with control over how long the
> implementation should wait if an event is not available.
> 
> Let me know what exact changes are required if the details in the
> rte_event_dequeue() API definition are not enough.

Maybe that timeout would be a better name.
It waits only if there is nothing in the queue.
It can be interesting to highlight in this comment that this parameter
makes the dequeue function a blocking call.

> > > +/** Event port configuration structure */
> > > +struct rte_event_port_conf {
> > > +	int32_t new_event_threshold;
> > > +	/**< A backpressure threshold for new event enqueues on this port.
> > > +	 * Use for *closed system* event dev where event capacity is limited,
> > > +	 * and cannot exceed the capacity of the event dev.
> > > +	 * Configuring ports with different thresholds can make higher priority
> > > +	 * traffic less likely to  be backpressured.
> > > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > > +	 * can have a lower threshold so as not to overwhelm the device,
> > > +	 * while ports used for worker pools can have a higher threshold.
> > > +	 * This value cannot exceed the *nb_events_limit*
> > > +	 * which previously supplied to rte_event_dev_configure()
> > > +	 */
> > > +	uint8_t dequeue_depth;
> > > +	/**< Configure number of bulk dequeues for this event port.
> > > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > > +	 * which previously supplied to rte_event_dev_configure()
> > > +	 */
> > > +	uint8_t enqueue_depth;
> > > +	/**< Configure number of bulk enqueues for this event port.
> > > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > > +	 * which previously supplied to rte_event_dev_configure()
> > > +	 */
> > > +};
> > 
> > The depth configuration is not clear to me.
> 
> Basically, the maximum number of events that can be enqueued/dequeued at a
> time from a given event port. A depth of one == non-burst mode.

OK so depth is the queue size. Please could you reword?

> > > +/* Event types to classify the event source */
> > 
> > Why this classification is needed?
> 
> This is for application pipelining and for cases where the application wants to know
> which subsystem generated the event.
> 
> example packet forwarding loop on the worker cores:
> while(1) {
> 	ev = dequeue()
> 	// event from ethdev subsystem
> 	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> 		- swap the mac address
> 		- push to atomic queue for ingress flow order maintenance
> 		  by CORE
> 	/* events from core */
> 	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
> 
> 	}
> 	enqueue(ev);
> }

I don't know why but I feel this classification is weak.
You need to track the source of the event. Does it make sense to go beyond
and identify the source device?

> > > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > > +/**< The event generated from ethdev subsystem */
> > > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > > +/**< The event generated from crypodev subsystem */
> > > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > > +/**< The event generated from timerdev subsystem */
> > > +#define RTE_EVENT_TYPE_CORE             0x3
> > > +/**< The event generated from core.
> > 
> > What is core?
> 
> The events are generated by an lcore for pipelining. Any suggestion for a
> better name? lcore?

What about CPU or SW?

> > > +		/**< Opaque event pointer */
> > > +		struct rte_mbuf *mbuf;
> > > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> > 
> > How do we know that an event is associated with mbuf?
> 
> By looking at the event source/type RTE_EVENT_TYPE_*
> 
> > Does it mean that such events are always converted into mbuf even if the
> > application does not need it?
> 
> The hardware has a dependency on getting the physical address of the event, so any
> struct that has "phys_addr_t buf_physaddr" works.

I do not understand.

I thought that decoding the event would be the responsibility of the app
by calling a function like
rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).

> > > +struct rte_eventdev_driver;
> > > +struct rte_eventdev_ops;
> > 
> > I think it is better to split API and driver interface in two files.
> > (we should do this split in ethdev)
> 
> I thought so, but then the "static inline" versions of the northbound
> API (like rte_event_enqueue) will go in another file (due to the fact that
> the implementation needs to dereference "dev->data->ports[port_id]"). Do you want it that way?
> I would like to keep all northbound API in rte_eventdev.h and not any of them
> in rte_eventdev_pmd.h.

My comment was confusing.
You are doing 2 files, one for API (what you call northbound I think)
and the other one for driver interface (what you call southbound I think),
it's very fine.

> > > +/**
> > > + * Enqueue the event object supplied in the *rte_event* structure on an
> > > + * event device designated by its *dev_id* through the event port specified by
> > > + * *port_id*. The event object specifies the event queue on which this
> > > + * event will be enqueued.
> > > + *
> > > + * @param dev_id
> > > + *   Event device identifier.
> > > + * @param port_id
> > > + *   The identifier of the event port.
> > > + * @param ev
> > > + *   Pointer to struct rte_event
> > > + *
> > > + * @return
> > > + *  - 0 on success
> > > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > > + *     backpressured, for instance.
> > > + */
> > > +static inline int
> > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > 
> > Is it really needed to have non-burst variant of enqueue/dequeue?
> 
> Yes. Certain HW can work only with non-burst variants.

Same comment as Bruce, we must keep only the burst variant.
We cannot have different API for different HW.

> > > +/**
> > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > + *
> > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > + * application can use this function to convert wait value in nanoseconds to
> > > + * implementations specific wait value supplied in rte_event_dequeue()
> > 
> > Why is it implementation-specific?
> > Why this conversion is not internal in the driver?
> 
> This is a performance optimization; otherwise drivers would
> need to convert ns to ticks in the "fast path".

So why not defining the unit of this timeout as CPU cycles like the ones
returned by rte_get_timer_cycles()?
  
Bruce Richardson Nov. 24, 2016, 4:24 p.m. UTC | #5
On Fri, Nov 18, 2016 at 11:14:59AM +0530, Jerin Jacob wrote:
> In a polling model, lcores poll ethdev ports and associated
> rx queues directly to look for packets. In an event driven model,
> by contrast, lcores call the scheduler that selects packets for
> them based on programmer-specified criteria. The eventdev library
> adds support for an event driven programming model, which offers
> applications automatic multicore scaling, dynamic load balancing,
> pipelining, packet ingress order maintenance and
> synchronization services to simplify application packet processing.
> 
> By introducing an event driven programming model, DPDK can support
> both polling and event driven programming models for packet processing,
> and applications are free to choose whichever model
> (or combination of the two) best suits their needs.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---

Hi Jerin,

Thanks for the patchset. A few minor comments in general on the API that
we found from working with it (thus far - more may follow :-) ).

1. Priorities: priorities are used in a number of places in the API, but
   all are uint8_t types and have their own MAX/NORMAL/MIN values. I think
   it would be simpler for the user just to have one priority type in the
   library, and use that everywhere. I suggest using RTE_EVENT_PRIORITY_*
   and drop the separate defines for SERVICE_PRIORITY, and QUEUE_PRIORITY
   etc. Ideally, I'd see things like this converted to enums too, rather
   than defines, but I'm not sure it's possible in this case.

2. Functions for config and setup can have their structure parameter
   types as const as they don't/shouldn't change the values internally.
   So add "const" to parameters to:
     rte_event_dev_configure()
     rte_event_queue_setup()
     rte_event_port_setup()
     rte_event_port_link()

3. in event schedule() function, the dev->schedule() function needs the
   dev instance pointer passed in as parameter.

4. The event op values and the event type values would be better as
   enums rather than as a set of #defines (see the sketch just below).
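
   For example (a sketch; the signatures beyond the added const qualifiers
   are assumptions):

   enum rte_event_op {
	RTE_EVENT_OP_NEW,
	RTE_EVENT_OP_FORWARD,
	RTE_EVENT_OP_RELEASE,
   };

   int rte_event_dev_configure(uint8_t dev_id,
			       const struct rte_event_dev_config *config);
   int rte_event_queue_setup(uint8_t dev_id, uint8_t queue_id,
			     const struct rte_event_queue_conf *conf);
   int rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
			    const struct rte_event_port_conf *conf);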

Regards,
/Bruce
  
Jerin Jacob Nov. 24, 2016, 7:30 p.m. UTC | #6
On Thu, Nov 24, 2016 at 04:24:11PM +0000, Bruce Richardson wrote:
> On Fri, Nov 18, 2016 at 11:14:59AM +0530, Jerin Jacob wrote:
> > In a polling model, lcores poll ethdev ports and associated
> > rx queues directly to look for packets. In an event driven model,
> > by contrast, lcores call the scheduler that selects packets for
> > them based on programmer-specified criteria. The eventdev library
> > adds support for an event driven programming model, which offers
> > applications automatic multicore scaling, dynamic load balancing,
> > pipelining, packet ingress order maintenance and
> > synchronization services to simplify application packet processing.
> > 
> > By introducing an event driven programming model, DPDK can support
> > both polling and event driven programming models for packet processing,
> > and applications are free to choose whichever model
> > (or combination of the two) best suits their needs.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> 
> Hi Jerin,
> 
> Thanks for the patchset. A few minor comments in general on the API that
> we found from working with it (thus far - more may follow :-) ).

Thanks Bruce.

> 
> 1. Priorities: priorities are used in a number of places in the API, but
>    all are uint8_t types and have their own MAX/NORMAL/MIN values. I think
>    it would be simpler for the user just to have one priority type in the
>    library, and use that everywhere. I suggest using RTE_EVENT_PRIORITY_*
>    and drop the separate defines for SERVICE_PRIORITY, and QUEUE_PRIORITY
>    etc. Ideally, I'd see things like this converted to enums too, rather
>    than defines, but I'm not sure it's possible in this case.

OK. I will address it in v2

> 
> 2. Functions for config and setup can have their structure parameter
>    types as const as they don't/shouldn't change the values internally.
>    So add "const" to parameters to:
>      rte_event_dev_configure()
>      rte_event_queue_setup()
>      rte_event_port_setup()
>      rte_event_port_link()
> 

OK. I will address it in v2

> 3. in event schedule() function, the dev->schedule() function needs the
>    dev instance pointer passed in as parameter.

OK. I will address it in v2

> 
> 4. The event op values and the event type values would be better as
>    enums rather than as a set of #defines.

OK. I will address it in v2

I will reply to your other comments in Thomas's email.

> 
> Regards,
> /Bruce
  
Jerin Jacob Nov. 25, 2016, 12:23 a.m. UTC | #7
On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> 2016-11-24 07:29, Jerin Jacob:
> > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > 2016-11-18 11:14, Jerin Jacob:
> > > > +Eventdev API - EXPERIMENTAL
> > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > +F: lib/librte_eventdev/
> > > 
> > > OK to mark it experimental.
> > > What is the plan to remove the experimental word?
> > 
> > IMO, EXPERIMENTAL status can be changed when
> > - At least two event drivers are available (Intel and Cavium are
> >   working on SW and HW event drivers)
> > - Functional test applications are fine with at least two drivers
> > - Portable example application to showcase the features of the library
> > - eventdev integration with another dpdk subsystem such as ethdev
> > 
> > Thoughts? I am not sure what criteria were used in the cryptodev case.
> 
> Sounds good.
> We will be more confident when drivers and tests will be implemented.
> 
> I think the roadmap for the SW driver targets the release 17.05.
> Do you still plan 17.02 for this API and the Cavium driver?

No. 17.02 is too short for upstreaming the Cavium driver. However, I think the API and
skeleton event driver can go in 17.02 if there are no objections.

> 
> > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > +/**< Skeleton event device PMD name */
> > > 
> > > I do not understand this #define.
> > 
> > Applications can explicitly request a specific driver through the driver
> > name. This will go as the argument to rte_event_dev_get_dev_id(const char *name).
> > The reason for keeping this #define in rte_eventdev.h is that the
> > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> 
> So each driver must register its name in the API?
> Is it really needed?

Otherwise, how would the application know the name of the driver?
A similar scheme is used in cryptodev:
http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
No strong opinion here. Open for suggestions.

> 
> > > > +struct rte_event_dev_config {
> > > > +	uint32_t dequeue_wait_ns;
> > > > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> > > 
> > > Please explain exactly when the wait occurs and why.
> > 
> > Here is the explanation from rte_event_dequeue() API definition,
> > -
> > @param wait
> > 0 - no-wait, returns immediately if there is no event.
> > >0 - wait for the event, if the device is configured with
> > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> > the event available or *wait* time.
> > if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> > then this function will wait until the event available or *dequeue_wait_ns*
> >                                                       ^^^^^^^^^^^^^^^^^^^^^^
> > ns which was previously supplied to rte_event_dev_configure()
> > -
> > This provides the application with control over how long the
> > implementation should wait if an event is not available.
> > 
> > Let me know what exact changes are required if the details in the
> > rte_event_dequeue() API definition are not enough.
> 
> Maybe that timeout would be a better name.
> It waits only if there is nothing in the queue.
> It can be interesting to highlight in this comment that this parameter
> makes the dequeue function a blocking call.

OK. I will change to timeout then

> 
> > > > +/** Event port configuration structure */
> > > > +struct rte_event_port_conf {
> > > > +	int32_t new_event_threshold;
> > > > +	/**< A backpressure threshold for new event enqueues on this port.
> > > > +	 * Use for *closed system* event dev where event capacity is limited,
> > > > +	 * and cannot exceed the capacity of the event dev.
> > > > +	 * Configuring ports with different thresholds can make higher priority
> > > > +	 * traffic less likely to  be backpressured.
> > > > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > > > +	 * can have a lower threshold so as not to overwhelm the device,
> > > > +	 * while ports used for worker pools can have a higher threshold.
> > > > +	 * This value cannot exceed the *nb_events_limit*
> > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > +	 */
> > > > +	uint8_t dequeue_depth;
> > > > +	/**< Configure number of bulk dequeues for this event port.
> > > > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > +	 */
> > > > +	uint8_t enqueue_depth;
> > > > +	/**< Configure number of bulk enqueues for this event port.
> > > > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > +	 */
> > > > +};
> > > 
> > > The depth configuration is not clear to me.
> > 
> > Basically, the maximum number of events that can be enqueued/dequeued at a
> > time from a given event port. A depth of one == non-burst mode.
> 
> OK so depth is the queue size. Please could you reword?

OK

> 
> > > > +/* Event types to classify the event source */
> > > 
> > > Why this classification is needed?
> > 
> > This is for application pipelining and for cases where the application wants to know
> > which subsystem generated the event.
> > 
> > example packet forwarding loop on the worker cores:
> > while(1) {
> > 	ev = dequeue()
> > 	// event from ethdev subsystem
> > 	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> > 		- swap the mac address
> > 		- push to atomic queue for ingress flow order maintenance
> > 		  by CORE
> > 	/* events from core */
> > 	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
> > 
> > 	}
> > 	enqueue(ev);
> > }
> 
> I don't know why but I feel this classification is weak.
> You need to track the source of the event. Does it make sense to go beyond
> and identify the source device?

No, dequeue has a dev_id argument, so events come only from that device.

> 
> > > > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > > > +/**< The event generated from ethdev subsystem */
> > > > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > > > +/**< The event generated from crypodev subsystem */
> > > > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > > > +/**< The event generated from timerdev subsystem */
> > > > +#define RTE_EVENT_TYPE_CORE             0x3
> > > > +/**< The event generated from core.
> > > 
> > > What is core?
> > 
> > The events are generated by an lcore for pipelining. Any suggestion for a
> > better name? lcore?
> 
> What about CPU or SW?

No strong opinion here. I will go with CPU then

> 
> > > > +		/**< Opaque event pointer */
> > > > +		struct rte_mbuf *mbuf;
> > > > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> > > 
> > > How do we know that an event is associated with mbuf?
> > 
> > By looking at the event source/type RTE_EVENT_TYPE_*
> > 
> > > Does it mean that such events are always converted into mbuf even if the
> > > application does not need it?
> > 
> > The hardware has a dependency on getting the physical address of the event, so any
> > struct that has "phys_addr_t buf_physaddr" works.
> 
> I do not understand.

In HW based implementations, the event pointer will be submitted to the HW.
As you know, since the HW can't understand virtual addresses, the pointer needs
to be converted to a physical address; any DPDK object that provides a phys_addr_t,
such as mbuf, can be used with libeventdev.

> 
> I thought that decoding the event would be the responsibility of the app
> by calling a function like
> rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).

It can be, but it is costly, i.e. yet another function-pointer-based
driver interface on the fast path. Instead, the driver itself can
convert to an mbuf (in the case of an ETHDEV device) and tag the source/event type
as RTE_EVENT_TYPE_ETHDEV.
IMO the proposed scheme helps SW based implementations, as there is no real
mbuf conversion. Something we can revisit in the ethdev integration if
required.

> 
> > > > +struct rte_eventdev_driver;
> > > > +struct rte_eventdev_ops;
> > > 
> > > I think it is better to split API and driver interface in two files.
> > > (we should do this split in ethdev)
> > 
> > I thought so, but then the "static inline" versions of the northbound
> > API (like rte_event_enqueue) will go in another file (due to the fact that
> > the implementation needs to dereference "dev->data->ports[port_id]"). Do you want it that way?
> > I would like to keep all northbound API in rte_eventdev.h and not any of them
> > in rte_eventdev_pmd.h.
> 
> My comment was confusing.
> You are doing 2 files, one for API (what you call northbound I think)
> and the other one for driver interface (what you call southbound I think),
> it's very fine.
> 
> > > > +/**
> > > > + * Enqueue the event object supplied in the *rte_event* structure on an
> > > > + * event device designated by its *dev_id* through the event port specified by
> > > > + * *port_id*. The event object specifies the event queue on which this
> > > > + * event will be enqueued.
> > > > + *
> > > > + * @param dev_id
> > > > + *   Event device identifier.
> > > > + * @param port_id
> > > > + *   The identifier of the event port.
> > > > + * @param ev
> > > > + *   Pointer to struct rte_event
> > > > + *
> > > > + * @return
> > > > + *  - 0 on success
> > > > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > > > + *     backpressured, for instance.
> > > > + */
> > > > +static inline int
> > > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > > 
> > > Is it really needed to have non-burst variant of enqueue/dequeue?
> > 
> > Yes. Certain HW can work only with non-burst variants.
> 
> Same comment as Bruce, we must keep only the burst variant.
> We cannot have different API for different HW.

I don't think there is any portability issue here; I can explain.

At the application level, we have two more use cases that call for the
non-burst variant:

- latency critical work
- on dequeue, if the application wants to deal with only one flow (i.e. to
  avoid processing two different application flows and thus cache thrashing)

Selection of the burst variants will be based on
rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
So I don't think there is a portability issue here, and I don't want to waste
CPU cycles on the for loop if the application is known to be working with the
non-burst variant, like below:

nb_events = rte_event_dequeue_burst();
for (i = 0; i < nb_events; i++) {
	process ev[i]
}

And most importantly, the NPU can get almost the same throughput
without the burst variant, so why not?

> 
> > > > +/**
> > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > + *
> > > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > + * application can use this function to convert wait value in nanoseconds to
> > > > + * implementations specific wait value supplied in rte_event_dequeue()
> > > 
> > > Why is it implementation-specific?
> > > Why this conversion is not internal in the driver?
> > 
> > This is a performance optimization; otherwise drivers would
> > need to convert ns to ticks in the "fast path".
> 
> So why not defining the unit of this timeout as CPU cycles like the ones
> returned by rte_get_timer_cycles()?

Because the HW co-processor can run in a different clock domain; it need
not be at the CPU frequency.

> 
>
  
Bruce Richardson Nov. 25, 2016, 11 a.m. UTC | #8
On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > 2016-11-24 07:29, Jerin Jacob:
> > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > +Eventdev API - EXPERIMENTAL
> > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > +F: lib/librte_eventdev/
> > > > 
> > > > OK to mark it experimental.
> > > > What is the plan to remove the experimental word?
> > > 
> > > IMO, EXPERIMENTAL status can be changed when
> > > - At least two event drivers are available (Intel and Cavium are
> > >   working on SW and HW event drivers)
> > > - Functional test applications are fine with at least two drivers
> > > - Portable example application to showcase the features of the library
> > > - eventdev integration with another dpdk subsystem such as ethdev
> > > 
> > > Thoughts? I am not sure what criteria were used in the cryptodev case.
> > 
> > Sounds good.
> > We will be more confident when drivers and tests will be implemented.
> > 
> > I think the roadmap for the SW driver targets the release 17.05.
> > Do you still plan 17.02 for this API and the Cavium driver?
> 
> No. 17.02 is too short for upstreaming the Cavium driver. However, I think the API and
> skeleton event driver can go in 17.02 if there are no objections.
> 
> > 
> > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > +/**< Skeleton event device PMD name */
> > > > 
> > > > I do not understand this #define.
> > > 
> > > Applications can explicitly request a specific driver through the driver
> > > name. This will go as the argument to rte_event_dev_get_dev_id(const char *name).
> > > The reason for keeping this #define in rte_eventdev.h is that the
> > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> > 
> > So each driver must register its name in the API?
> > Is it really needed?
> 
> Otherwise, how would the application know the name of the driver?
> A similar scheme is used in cryptodev:
> http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> No strong opinion here. Open for suggestions.
> 

I like having a name registered. I think we need a scheme where an app
can find and use an implementation using a specific driver.

> > 
> > > > > +struct rte_event_dev_config {
> > > > > +	uint32_t dequeue_wait_ns;
> > > > > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> > > > 
> > > > Please explain exactly when the wait occurs and why.
> > > 
> > > Here is the explanation from rte_event_dequeue() API definition,
> > > -
> > > @param wait
> > > 0 - no-wait, returns immediately if there is no event.
> > > >0 - wait for the event, if the device is configured with
> > > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> > > the event available or *wait* time.
> > > if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> > > then this function will wait until the event available or *dequeue_wait_ns*
> > >                                                       ^^^^^^^^^^^^^^^^^^^^^^
> > > ns which was previously supplied to rte_event_dev_configure()
> > > -
> > > This provides the application with control over how long the
> > > implementation should wait if an event is not available.
> > > 
> > > Let me know what exact changes are required if the details in the
> > > rte_event_dequeue() API definition are not enough.
> > 
> > Maybe that timeout would be a better name.
> > It waits only if there is nothing in the queue.
> > It can be interesting to highlight in this comment that this parameter
> > makes the dequeue function a blocking call.
> 
> OK. I will change to timeout then
> 
> > 
> > > > > +/** Event port configuration structure */
> > > > > +struct rte_event_port_conf {
> > > > > +	int32_t new_event_threshold;
> > > > > +	/**< A backpressure threshold for new event enqueues on this port.
> > > > > +	 * Use for *closed system* event dev where event capacity is limited,
> > > > > +	 * and cannot exceed the capacity of the event dev.
> > > > > +	 * Configuring ports with different thresholds can make higher priority
> > > > > +	 * traffic less likely to  be backpressured.
> > > > > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > > > > +	 * can have a lower threshold so as not to overwhelm the device,
> > > > > +	 * while ports used for worker pools can have a higher threshold.
> > > > > +	 * This value cannot exceed the *nb_events_limit*
> > > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t dequeue_depth;
> > > > > +	/**< Configure number of bulk dequeues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t enqueue_depth;
> > > > > +	/**< Configure number of bulk enqueues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +};
> > > > 
> > > > The depth configuration is not clear to me.
> > > 
> > > Basically, the maximum number of events that can be enqueued/dequeued at a
> > > time from a given event port. A depth of one == non-burst mode.
> > 
> > OK so depth is the queue size. Please could you reword?
> 
> OK
> 
> > 
> > > > > +/* Event types to classify the event source */
> > > > 
> > > > Why this classification is needed?
> > > 
> > > This is for application pipelining and for cases where the application wants to know
> > > which subsystem generated the event.
> > > 
> > > example packet forwarding loop on the worker cores:
> > > while(1) {
> > > 	ev = dequeue()
> > > 	// event from ethdev subsystem
> > > 	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> > > 		- swap the mac address
> > > 		- push to atomic queue for ingress flow order maintenance
> > > 		  by CORE
> > > 	/* events from core */
> > > 	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
> > > 
> > > 	}
> > > 	enqueue(ev);
> > > }
> > 
> > I don't know why but I feel this classification is weak.
> > You need to track the source of the event. Does it make sense to go beyond
> > and identify the source device?
> 
> No, dequeue has a dev_id argument, so events come only from that device.
> 
> > 
> > > > > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > > > > +/**< The event generated from ethdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > > > > +/**< The event generated from crypodev subsystem */
> > > > > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > > > > +/**< The event generated from timerdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CORE             0x3
> > > > > +/**< The event generated from core.
> > > > 
> > > > What is core?
> > > 
> > > The events are generated by an lcore for pipelining. Any suggestion for a
> > > better name? lcore?
> > 
> > What about CPU or SW?
> 
> No strong opinion here. I will go with CPU then

If you have no strong opinion, I think I'd prefer SW to CPU, as the main
difference to my mind is that this comes from another SW entity rather
than a hardware block.

> 
> > 
> > > > > +		/**< Opaque event pointer */
> > > > > +		struct rte_mbuf *mbuf;
> > > > > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> > > > 
> > > > How do we know that an event is associated with mbuf?
> > > 
> > > By looking at the event source/type RTE_EVENT_TYPE_*
> > > 
> > > > Does it mean that such events are always converted into mbuf even if the
> > > > application does not need it?
> > > 
> > > The hardware has a dependency on getting the physical address of the event, so any
> > > struct that has "phys_addr_t buf_physaddr" works.
> > 
> > I do not understand.
> 
> In HW based implementations, the event pointer will be submitted to the HW.
> As you know, since the HW can't understand virtual addresses, the pointer needs
> to be converted to a physical address; any DPDK object that provides a phys_addr_t,
> such as mbuf, can be used with libeventdev.
> 
> > 
> > I thought that decoding the event would be the responsibility of the app
> > by calling a function like
> > rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).
> 
> It can be, but it is costly, i.e. yet another function-pointer-based
> driver interface on the fast path. Instead, the driver itself can
> convert to an mbuf (in the case of an ETHDEV device) and tag the source/event type
> as RTE_EVENT_TYPE_ETHDEV.
> IMO the proposed scheme helps SW based implementations, as there is no real
> mbuf conversion. Something we can revisit in the ethdev integration if
> required.
> 
> > 
> > > > > +struct rte_eventdev_driver;
> > > > > +struct rte_eventdev_ops;
> > > > 
> > > > I think it is better to split API and driver interface in two files.
> > > > (we should do this split in ethdev)
> > > 
> > > I thought so, but then the "static inline" versions of the northbound
> > > API (like rte_event_enqueue) will go in another file (due to the fact that
> > > the implementation needs to dereference "dev->data->ports[port_id]"). Do you want it that way?
> > > I would like to keep all northbound API in rte_eventdev.h and not any of them
> > > in rte_eventdev_pmd.h.
> > 
> > My comment was confusing.
> > You are doing 2 files, one for API (what you call northbound I think)
> > and the other one for driver interface (what you call southbound I think),
> > it's very fine.
> > 
> > > > > +/**
> > > > > + * Enqueue the event object supplied in the *rte_event* structure on an
> > > > > + * event device designated by its *dev_id* through the event port specified by
> > > > > + * *port_id*. The event object specifies the event queue on which this
> > > > > + * event will be enqueued.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   Event device identifier.
> > > > > + * @param port_id
> > > > > + *   The identifier of the event port.
> > > > > + * @param ev
> > > > > + *   Pointer to struct rte_event
> > > > > + *
> > > > > + * @return
> > > > > + *  - 0 on success
> > > > > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > > > > + *     backpressured, for instance.
> > > > > + */
> > > > > +static inline int
> > > > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > > > 
> > > > Is it really needed to have non-burst variant of enqueue/dequeue?
> > > 
> > > Yes. Certain HW can work only with non-burst variants.
> > 
> > Same comment as Bruce, we must keep only the burst variant.
> > We cannot have different API for different HW.
> 
> I don't think there is any portability issue here; I can explain.
> 
> At the application level, we have two more use cases that call for the
> non-burst variant:
> 
> - latency critical work
> - on dequeue, if the application wants to deal with only one flow (i.e. to
>   avoid processing two different application flows and thus cache thrashing)
> 
> Selection of the burst variants will be based on
> rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> So I don't think there is a portability issue here, and I don't want to waste
> CPU cycles on the for loop if the application is known to be working with the
> non-burst variant, like below:
> 

If the application is known to be working on non-burst variants, then
it can always request a burst size of 1 and skip the loop completely.
There is no extra performance hit in that case in either the app or the
driver (since the non-burst driver always returns 1, irrespective of the
number requested).

> nb_events = rte_event_dequeue_burst(...);
> for (i = 0; i < nb_events; i++) {
> 	process(ev[i]);
> }
> 
> And most importantly, the NPU can get almost the same throughput
> without the burst variant, so why not?
> 
> > 
> > > > > +/**
> > > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > > + *
> > > > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > > + * application can use this function to convert wait value in nanoseconds to
> > > > > + * implementations specific wait value supplied in rte_event_dequeue()
> > > > 
> > > > Why is it implementation-specific?
> > > > Why this conversion is not internal in the driver?
> > > 
> > > This is for performance optimization; otherwise, drivers would
> > > need to convert ns to ticks in the "fast path".
> > 
> > So why not defining the unit of this timeout as CPU cycles like the ones
> > returned by rte_get_timer_cycles()?
> 
> Because a HW co-processor can run in a different clock domain; it need not
> be at the CPU frequency.
> 
While I've no huge objection to this API, since it will not be
implemented by our SW implementation, I'm just curious as to how much
having this will save. How complicated is the arithmetic that needs to
be done, and how many cycles on your platform is that going to take?

/Bruce
  
Van Haaren, Harry Nov. 25, 2016, 11:59 a.m. UTC | #9
Hi All,

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Friday, November 25, 2016 12:24 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Van Haaren, Harry
> <harry.van.haaren@intel.com>; hemant.agrawal@nxp.com; Eads, Gage <gage.eads@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model
> 
> On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > 2016-11-24 07:29, Jerin Jacob:
> > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > +Eventdev API - EXPERIMENTAL
> > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > +F: lib/librte_eventdev/
> > > >
> > > > OK to mark it experimental.
> > > > What is the plan to remove the experimental word?
> > >
> > > IMO, EXPERIMENTAL status can be changed when
> > > - At least two event drivers available(Intel and Cavium are working on
> > >   SW and HW event drivers)
> > > - Functional test applications are fine with at least two drivers
> > > - Portable example application to showcase the features of the library
> > > - eventdev integration with another dpdk subsystem such as ethdev
> > >
> > > Thoughts? I am not sure of the criteria used in the cryptodev case.
> >
> > Sounds good.
> > We will be more confident when drivers and tests will be implemented.
> >
> > I think the roadmap for the SW driver targets the release 17.05.
> > Do you still plan 17.02 for this API and the Cavium driver?
> 
> No, 17.02 is too short for upstreaming the Cavium driver. However, I think the
> API and skeleton event driver can go in 17.02 if there are no objections.
> 
> >
> > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > +/**< Skeleton event device PMD name */
> > > >
> > > > I do not understand this #define.
> > >
> > > Applications can explicitly request a specific driver through the driver
> > > name. This will go as an argument to rte_event_dev_get_dev_id(const char *name).
> > > The reason for keeping this #define in rte_eventdev.h is that the
> > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> >
> > So each driver must register its name in the API?
> > Is it really needed?
> 
> Otherwise, how does the application know the name of the driver?
> A similar scheme is used in cryptodev:
> http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> No strong opinion here. Open for suggestions.
> 
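For illustration, a hedged sketch of the lookup being discussed.
rte_event_dev_get_dev_id() is named above; stringifying the PMD name macro
with RTE_STR() is an assumption borrowed from the cryptodev scheme.

#include <rte_common.h>
#include <rte_eventdev.h>

/* The name macro expands to an unquoted token, so stringify it first. */
int dev_id = rte_event_dev_get_dev_id(RTE_STR(EVENTDEV_NAME_SKELETON_PMD));

if (dev_id < 0)
	printf("event_skeleton PMD not available\n");
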
> >
> > > > > +struct rte_event_dev_config {
> > > > > +	uint32_t dequeue_wait_ns;
> > > > > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> > > >
> > > > Please explain exactly when the wait occurs and why.
> > >
> > > Here is the explanation from rte_event_dequeue() API definition,
> > > -
> > > @param wait
> > > 0 - no-wait, returns immediately if there is no event.
> > > >0 - wait for the event, if the device is configured with
> > > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> > > the event available or *wait* time.
> > > if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> > > then this function will wait until the event available or *dequeue_wait_ns*
> > >                                                       ^^^^^^^^^^^^^^^^^^^^^^
> > > ns which was previously supplied to rte_event_dev_configure()
> > > -
> > > This provides the application control over how long the
> > > implementation should wait if an event is not available.
> > >
> > > Let me know what exact changes are required if the details in the
> > > rte_event_dequeue() API definition are not enough.
> >
> > Maybe "timeout" would be a better name.
> > It waits only if there is nothing in the queue.
> > It could be worth highlighting in this comment that this parameter
> > makes the dequeue function a blocking call.
> 
> OK. I will change it to timeout then.
> 
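To recap the two behaviours described above, a hedged usage sketch
(signature as in the patch under discussion; dev_id, port_id and wait_ticks
are assumed to be set up elsewhere):

struct rte_event ev;

/* wait == 0: non-blocking, returns immediately if no event is ready. */
rte_event_dequeue(dev_id, port_id, &ev, 0);

/* wait > 0: blocks until an event arrives or the timeout expires. With
 * RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT the per-call value below is used;
 * without it, the dequeue_wait_ns from rte_event_dev_configure() applies. */
rte_event_dequeue(dev_id, port_id, &ev, wait_ticks);
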
> >
> > > > > +/** Event port configuration structure */
> > > > > +struct rte_event_port_conf {
> > > > > +	int32_t new_event_threshold;
> > > > > +	/**< A backpressure threshold for new event enqueues on this port.
> > > > > +	 * Use for *closed system* event dev where event capacity is limited,
> > > > > +	 * and cannot exceed the capacity of the event dev.
> > > > > +	 * Configuring ports with different thresholds can make higher priority
> > > > > +	 * traffic less likely to  be backpressured.
> > > > > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > > > > +	 * can have a lower threshold so as not to overwhelm the device,
> > > > > +	 * while ports used for worker pools can have a higher threshold.
> > > > > +	 * This value cannot exceed the *nb_events_limit*
> > > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t dequeue_depth;
> > > > > +	/**< Configure number of bulk dequeues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +	uint8_t enqueue_depth;
> > > > > +	/**< Configure number of bulk enqueues for this event port.
> > > > > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > > > > +	 * which previously supplied to rte_event_dev_configure()
> > > > > +	 */
> > > > > +};
> > > >
> > > > The depth configuration is not clear to me.
> > >
> > > Basically, the maximum number of events that can be enqueued/dequeued at a
> > > time from a given event port. A depth of one == non-burst mode.
> >
> > OK so depth is the queue size. Please could you reword?
> 
> OK
> 
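For example, a hedged sketch of a port setup using these depth fields
(field names as in the patch; the values are illustrative only):

struct rte_event_port_conf conf = {
	.new_event_threshold = 4096, /* must not exceed nb_events_limit */
	.dequeue_depth = 16,  /* max events fetched by one dequeue call */
	.enqueue_depth = 16,  /* max events accepted by one enqueue call */
};

rte_event_port_setup(dev_id, port_id, &conf);
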
> >
> > > > > +/* Event types to classify the event source */
> > > >
> > > > Why this classification is needed?
> > >
> > > This is for application pipelining and for cases where the application wants
> > > to know which subsystem generated the event.
> > >
> > > example packet forwarding loop on the worker cores:
> > > while (1) {
> > > 	ev = dequeue();
> > > 	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
> > > 		/* event from the ethdev subsystem:
> > > 		 * - swap the mac address
> > > 		 * - push to an atomic queue for ingress flow order
> > > 		 *   maintenance by CORE
> > > 		 */
> > > 	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
> > > 		/* event generated by a core */
> > > 	}
> > > 	enqueue(ev);
> > > }
> >
> > I don't know why but I feel this classification is weak.
> > You need to track the source of the event. Does it make sense to go beyond
> > and identify the source device?
> 
> No, dequeue has a dev_id argument, so the event comes only from that device.
> 
> >
> > > > > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > > > > +/**< The event generated from ethdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > > > > +/**< The event generated from crypodev subsystem */
> > > > > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > > > > +/**< The event generated from timerdev subsystem */
> > > > > +#define RTE_EVENT_TYPE_CORE             0x3
> > > > > +/**< The event generated from core.
> > > >
> > > > What is core?
> > >
> > > The events are generated by an lcore for pipelining. Any suggestion for a
> > > better name? lcore?
> >
> > What about CPU or SW?
> 
> No strong opinion here. I will go with CPU then


+1 for CPU (as SW is the software PMD name).


> > > > > +		/**< Opaque event pointer */
> > > > > +		struct rte_mbuf *mbuf;
> > > > > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> > > >
> > > > How do we know that an event is associated with mbuf?
> > >
> > > By looking at the event source/type RTE_EVENT_TYPE_*
> > >
> > > > Does it mean that such events are always converted into mbuf even if the
> > > > application does not need it?
> > >
> > > Hardware has a dependency on getting the physical address of the event, so
> > > any struct that has "phys_addr_t buf_physaddr" works.
> >
> > I do not understand.
> 
> In HW based implementations, the event pointer will be submitted to HW.
> As you know, HW can't understand a virtual address and it needs to be
> converted to a physical address, so any DPDK object that provides phys_addr_t,
> such as mbuf, can be used with libeventdev.
> 
> >
> > I thought that decoding the event would be the responsibility of the app
> > by calling a function like
> > rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).
> 
> It can be, but it is costly, i.e. yet another function-pointer-based
> driver interface on the fast path. Instead, the driver itself can
> convert to mbuf (in the case of an ETHDEV device) and tag the source/event
> type as RTE_EVENT_TYPE_ETHDEV.
> IMO the proposed scheme helps the SW-based implementation as there is no
> real mbuf conversion. Something we can revisit in the ethdev integration if
> required.
> 
> >
> > > > > +struct rte_eventdev_driver;
> > > > > +struct rte_eventdev_ops;
> > > >
> > > > I think it is better to split API and driver interface in two files.
> > > > (we should do this split in ethdev)
> > >
> > > I thought so, but then the "static inline" versions of the northbound
> > > API (like rte_event_enqueue) will go into another file (due to the fact that
> > > the implementation needs to dereference "dev->data->ports[port_id]"). Do you want it that way?
> > > I would like to keep all northbound API in rte_eventdev.h and not any of them
> > > in rte_eventdev_pmd.h.
> >
> > My comment was confusing.
> > You are doing 2 files, one for API (what you call northbound I think)
> > and the other one for driver interface (what you call southbound I think),
> > it's very fine.
> >
> > > > > +/**
> > > > > + * Enqueue the event object supplied in the *rte_event* structure on an
> > > > > + * event device designated by its *dev_id* through the event port specified by
> > > > > + * *port_id*. The event object specifies the event queue on which this
> > > > > + * event will be enqueued.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   Event device identifier.
> > > > > + * @param port_id
> > > > > + *   The identifier of the event port.
> > > > > + * @param ev
> > > > > + *   Pointer to struct rte_event
> > > > > + *
> > > > > + * @return
> > > > > + *  - 0 on success
> > > > > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > > > > + *     backpressured, for instance.
> > > > > + */
> > > > > +static inline int
> > > > > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> > > >
> > > > Is it really needed to have a non-burst variant of enqueue/dequeue?
> > >
> > > Yes. Certain HW can work only with non-burst variants.
> >
> > Same comment as Bruce, we must keep only the burst variant.
> > We cannot have different API for different HW.
> 
> I don't think there is any portability issue here; I can explain.
> 
> At the application level, we have two more use cases to deal with for the
> non-burst variant:
> 
> - latency-critical work
> - on dequeue, if the application wants to deal with only one flow (i.e. to
>   avoid processing two different application flows and thus cache thrashing)
> 
> Selection of the burst variants will be based on
> rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> So I don't think there is a portability issue here, and I don't want to waste
> CPU cycles on the for loop if the application is known to be working with the
> non-burst variant, like below
> 
> nb_events = rte_event_dequeue_burst(...);
> for (i = 0; i < nb_events; i++) {
> 	process(ev[i]);
> }
> 
> And most importantly, the NPU can get almost the same throughput
> without the burst variant, so why not?


Perhaps I'm misunderstanding, but can you not just dequeue 1 from the burst() function?

struct rte_event ev;
rte_event_dequeue_burst(dev, port, &ev, 1, wait);
process( &ev );

I mean, if an application *demands* to not use bursts, the above allows it. Of course it won't scale to other implementations that would benefit from bursts - but that's the application author's choice?


> > > > > +/**
> > > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > > + *
> > > > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > > + * application can use this function to convert wait value in nanoseconds to
> > > > > + * implementations specific wait value supplied in rte_event_dequeue()
> > > >
> > > > Why is it implementation-specific?
> > > > Why this conversion is not internal in the driver?
> > >
> > > This is for performance optimization; otherwise, drivers would
> > > need to convert ns to ticks in the "fast path".
> >
> > So why not defining the unit of this timeout as CPU cycles like the ones
> > returned by rte_get_timer_cycles()?
> 
> Because a HW co-processor can run in a different clock domain; it need not
> be at the CPU frequency.
  
Bruce Richardson Nov. 25, 2016, 12:09 p.m. UTC | #10
> -----Original Message-----
> From: Van Haaren, Harry
> Sent: Friday, November 25, 2016 11:59 AM
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> hemant.agrawal@nxp.com; Eads, Gage <gage.eads@intel.com>
> Subject: RE: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven
> programming model
> 
> Hi All,
> 
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Friday, November 25, 2016 12:24 AM
> > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Van
> > Haaren, Harry <harry.van.haaren@intel.com>; hemant.agrawal@nxp.com;
> > Eads, Gage <gage.eads@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven
> > programming model
> >
> > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > 2016-11-24 07:29, Jerin Jacob:
> > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-18 11:14, Jerin Jacob:
>
> > > > > > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > > > > > +/**< The event generated from ethdev subsystem */
> > > > > > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > > > > > +/**< The event generated from crypodev subsystem */
> > > > > > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > > > > > +/**< The event generated from timerdev subsystem */
> > > > > > +#define RTE_EVENT_TYPE_CORE             0x3
> > > > > > +/**< The event generated from core.
> > > > >
> > > > > What is core?
> > > >
> > > > The events are generated by an lcore for pipelining. Any suggestion for a
> > > > better name? lcore?
> > >
> > > What about CPU or SW?
> >
> > No strong opinion here. I will go with CPU then
> 
> 
> +1 for CPU (as SW is the software PMD name).
> 

Fine, I'm outvoted. I'll learn to live with it. :-)

/Bruce
  
Thomas Monjalon Nov. 25, 2016, 1:09 p.m. UTC | #11
2016-11-25 11:00, Bruce Richardson:
> On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > 2016-11-24 07:29, Jerin Jacob:
> > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > > +/**< Skeleton event device PMD name */
> > > > > 
> > > > > I do not understand this #define.
> > > > 
> > > > Applications can explicitly request a specific driver through the driver
> > > > name. This will go as an argument to rte_event_dev_get_dev_id(const char *name).
> > > > The reason for keeping this #define in rte_eventdev.h is that the
> > > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> > > 
> > > So each driver must register its name in the API?
> > > Is it really needed?
> > 
> > Otherwise, how does the application know the name of the driver?
> > A similar scheme is used in cryptodev:
> > http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> > No strong opinion here. Open for suggestions.
> > 
> 
> I like having a name registered. I think we need a scheme where an app
> can find and use an implementation using a specific driver.

I do not like having the driver names in the API.
An API should not know its drivers.
If an application does some driver-specific processing, it knows
the driver name as well. The driver name is written in the driver.
  
Jerin Jacob Nov. 26, 2016, 12:57 a.m. UTC | #12
On Fri, Nov 25, 2016 at 02:09:22PM +0100, Thomas Monjalon wrote:
> 2016-11-25 11:00, Bruce Richardson:
> > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > > > +/**< Skeleton event device PMD name */
> > > > > > 
> > > > > > I do not understand this #define.
> > > > > 
> > > > > Applications can explicitly request a specific driver through the driver
> > > > > name. This will go as an argument to rte_event_dev_get_dev_id(const char *name).
> > > > > The reason for keeping this #define in rte_eventdev.h is that the
> > > > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> > > > 
> > > > So each driver must register its name in the API?
> > > > Is it really needed?
> > > 
> > > Otherwise, how does the application know the name of the driver?
> > > A similar scheme is used in cryptodev:
> > > http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> > > No strong opinion here. Open for suggestions.
> > > 
> > 
> > I like having a name registered. I think we need a scheme where an app
> > can find and use an implementation using a specific driver.
> 
> I do not like having the driver names in the API.
> An API should not know its drivers.
> If an application does some driver-specific processing, it knows
> the driver name as well. The driver name is written in the driver.

If Bruce doesn't have further objections, then I will go with Thomas's
suggestion.
  
Jerin Jacob Nov. 26, 2016, 2:54 a.m. UTC | #13
On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > 2016-11-24 07:29, Jerin Jacob:
> > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > > +F: lib/librte_eventdev/
> > > > > 
> > 
> > I don't think there is any portability issue here; I can explain.
> > 
> > At the application level, we have two more use cases to deal with for the
> > non-burst variant:
> > 
> > - latency-critical work
> > - on dequeue, if the application wants to deal with only one flow (i.e. to
> >   avoid processing two different application flows and thus cache thrashing)
> > 
> > Selection of the burst variants will be based on
> > rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> > So I don't think there is a portability issue here, and I don't want to waste
> > CPU cycles on the for loop if the application is known to be working with the
> > non-burst variant, like below
> > 
> 
> If the application is known to be working on non-burst variants, then
> it can always request a burst size of 1 and skip the loop completely.
> There is no extra performance hit in that case in either the app or the
> driver (since the non-burst driver always returns 1, irrespective of the
> number requested).

Hmm. I am afraid there is.
On the app side, the const "1" cannot be optimized by the compiler, because
on the other side it is a function-pointer-based driver interface.
On the driver side, the implementation would be for-loop based instead
of a plain access
(the compiler can never see the const "1" in the driver interface).

We are planning to implement burst mode as a kind of emulation mode and
have a different scheme for burst and non-burst. We took a similar approach
in introducing rte_event_schedule(), splitting the responsibility so
that the SW driver can work without additional performance overhead and with
a neat driver interface.

If you are concerned about the usability part and regression on the SW
driver, that is not the case: the application will use the non-burst variant
only if dequeue_depth == 1 and/or in explicit cases where latency matters.

On the portability side, we support both cases, and an application written
based on dequeue_depth will perform well in both implementations. IMO, there
is no other shortcut for a performance-optimized application running on
different models. I think it is not an issue as, in the event model, each
core is identical and the main loop can be changed based on dequeue_depth
if performance is needed (anyway, the main loop will be function pointer based).
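
To illustrate, a sketch (not from the patch) of why the constant cannot
propagate: the call crosses a function pointer, so the driver keeps its loop
even when every caller passes 1. pop_one_event() is a hypothetical driver
internal.

static int pop_one_event(void *port, struct rte_event *ev, uint64_t wait);

typedef uint16_t (*dequeue_burst_t)(void *port, struct rte_event ev[],
				    uint16_t nb_events, uint64_t wait);

static uint16_t
sw_dequeue_burst(void *port, struct rte_event ev[], uint16_t nb_events,
		 uint64_t wait)
{
	uint16_t i;

	/* The caller's constant "1" is invisible here, so this loop and its
	 * bookkeeping survive even for single-event dequeues. */
	for (i = 0; i < nb_events; i++)
		if (pop_one_event(port, &ev[i], wait) != 0)
			break;
	return i;
}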

> 
> > nb_events = rte_event_dequeue_burst(...);
> > for (i = 0; i < nb_events; i++) {
> > 	process(ev[i]);
> > }
> > 
> > And most importantly, the NPU can get almost the same throughput
> > without the burst variant, so why not?
> > 
> > > 
> > > > > > +/**
> > > > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > > > + *
> > > > > > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > > > + * application can use this function to convert wait value in nanoseconds to
> > > > > > + * implementations specific wait value supplied in rte_event_dequeue()
> > > > > 
> > > > > Why is it implementation-specific?
> > > > > Why this conversion is not internal in the driver?
> > > > 
> > > > This is for performance optimization; otherwise, drivers would
> > > > need to convert ns to ticks in the "fast path".
> > > 
> > > So why not defining the unit of this timeout as CPU cycles like the ones
> > > returned by rte_get_timer_cycles()?
> > 
> > Because a HW co-processor can run in a different clock domain; it need not
> > be at the CPU frequency.
> > 
> While I've no huge objection to this API, since it will not be
> implemented by our SW implementation, I'm just curious as to how much
> having this will save. How complicated is the arithmetic that needs to
> be done, and how many cycles on your platform is that going to take?

One load, and a division and/or multiplication of (floating-point) numbers. It
could be 6 cycles or more, but it matters when the burst size is small (worst
case 1). I think the software implementation could use rte_get_timer_cycles()
here if required. I think there is no harm in moving some work to the slow path
when it can be done, as in this case.
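
A sketch of the arithmetic in question, done once at configure time instead
of per dequeue; dev_tick_hz (the co-processor clock rate) is a hypothetical
per-device value.

#define NSEC_PER_SEC 1000000000ULL

/* One multiply and one divide: cheap in the slow path, but measurable if
 * repeated on every dequeue with a small burst size. */
static uint64_t
ns_to_dev_ticks(uint64_t ns, uint64_t dev_tick_hz)
{
	return ns * dev_tick_hz / NSEC_PER_SEC;
}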
  
Bruce Richardson Nov. 28, 2016, 9:10 a.m. UTC | #14
On Sat, Nov 26, 2016 at 06:27:57AM +0530, Jerin Jacob wrote:
> On Fri, Nov 25, 2016 at 02:09:22PM +0100, Thomas Monjalon wrote:
> > 2016-11-25 11:00, Bruce Richardson:
> > > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > > > > +/**< Skeleton event device PMD name */
> > > > > > > 
> > > > > > > I do not understand this #define.
> > > > > > 
> > > > > > Applications can explicitly request a specific driver through the driver
> > > > > > name. This will go as an argument to rte_event_dev_get_dev_id(const char *name).
> > > > > > The reason for keeping this #define in rte_eventdev.h is that the
> > > > > > application needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.
> > > > > 
> > > > > So each driver must register its name in the API?
> > > > > Is it really needed?
> > > > 
> > > > Otherwise, how does the application know the name of the driver?
> > > > A similar scheme is used in cryptodev:
> > > > http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> > > > No strong opinion here. Open for suggestions.
> > > > 
> > > 
> > > I like having a name registered. I think we need a scheme where an app
> > > can find and use an implementation using a specific driver.
> > 
> > I do not like having the driver names in the API.
> > An API should not know its drivers.
> > If an application does some driver-specific processing, it knows
> > the driver name as well. The driver name is written in the driver.
> 
> If Bruce doesn't have further objections, then I will go with Thomas's
> suggestion.
>
Go with it.
  
Bruce Richardson Nov. 28, 2016, 9:16 a.m. UTC | #15
On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > > > +F: lib/librte_eventdev/
> > > > > > 
> > > 
> > > I don't think there is any portability issue here; I can explain.
> > > 
> > > At the application level, we have two more use cases to deal with for the
> > > non-burst variant:
> > > 
> > > - latency-critical work
> > > - on dequeue, if the application wants to deal with only one flow (i.e. to
> > >   avoid processing two different application flows and thus cache thrashing)
> > > 
> > > Selection of the burst variants will be based on
> > > rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> > > So I don't think there is a portability issue here, and I don't want to waste
> > > CPU cycles on the for loop if the application is known to be working with the
> > > non-burst variant, like below
> > > 
> > 
> > If the application is known to be working on non-burst variants, then
> > it can always request a burst size of 1 and skip the loop completely.
> > There is no extra performance hit in that case in either the app or the
> > driver (since the non-burst driver always returns 1, irrespective of the
> > number requested).
> 
> Hmm. I am afraid there is.
> On the app side, the const "1" cannot be optimized by the compiler, because
> on the other side it is a function-pointer-based driver interface.
> On the driver side, the implementation would be for-loop based instead
> of a plain access
> (the compiler can never see the const "1" in the driver interface).
> 
> We are planning to implement burst mode as a kind of emulation mode and
> have a different scheme for burst and non-burst. We took a similar approach
> in introducing rte_event_schedule(), splitting the responsibility so
> that the SW driver can work without additional performance overhead and with
> a neat driver interface.
> 
> If you are concerned about the usability part and regression on the SW
> driver, that is not the case: the application will use the non-burst variant
> only if dequeue_depth == 1 and/or in explicit cases where latency matters.
> 
> On the portability side, we support both cases, and an application written
> based on dequeue_depth will perform well in both implementations. IMO, there
> is no other shortcut for a performance-optimized application running on
> different models. I think it is not an issue as, in the event model, each
> core is identical and the main loop can be changed based on dequeue_depth
> if performance is needed (anyway, the main loop will be function pointer based).
> 

Ok, I think I see your point now. Here is an alternative suggestion.

1. Keep the single user API.
2. Have both single and burst function pointers in the driver
3. Call appropriately in the eventdev layer based on parameters. For
example:

rte_event_dequeue_burst(..., int num)
{
	if (num == 1 && single_dequeue_fn != NULL)
		return single_dequeue_fn(...);
	return burst_dequeue_fn(...);
}

This way drivers can optionally special-case the single dequeue case -
the function pointer check will definitely be predictable in HW making
that a near-zero-cost check - while not forcing all drivers to do so.
It also reduces the public API surface, and gives us a single enqueue
and dequeue function.

/Bruce
  
Thomas Monjalon Nov. 28, 2016, 11:30 a.m. UTC | #16
2016-11-28 09:16, Bruce Richardson:
> On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> > On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > > > > +F: lib/librte_eventdev/
> > > > > > > 
> > > > 
> > > > I don't think there is any portability issue here; I can explain.
> > > > 
> > > > At the application level, we have two more use cases to deal with for the
> > > > non-burst variant:
> > > > 
> > > > - latency-critical work
> > > > - on dequeue, if the application wants to deal with only one flow (i.e. to
> > > >   avoid processing two different application flows and thus cache thrashing)
> > > > 
> > > > Selection of the burst variants will be based on
> > > > rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> > > > So I don't think there is a portability issue here, and I don't want to waste
> > > > CPU cycles on the for loop if the application is known to be working with the
> > > > non-burst variant, like below
> > > > 
> > > 
> > > If the application is known to be working on non-burst variants, then
> > > it can always request a burst size of 1 and skip the loop completely.
> > > There is no extra performance hit in that case in either the app or the
> > > driver (since the non-burst driver always returns 1, irrespective of the
> > > number requested).
> > 
> > Hmm. I am afraid there is.
> > On the app side, the const "1" cannot be optimized by the compiler, because
> > on the other side it is a function-pointer-based driver interface.
> > On the driver side, the implementation would be for-loop based instead
> > of a plain access
> > (the compiler can never see the const "1" in the driver interface).
> > 
> > We are planning to implement burst mode as a kind of emulation mode and
> > have a different scheme for burst and non-burst. We took a similar approach
> > in introducing rte_event_schedule(), splitting the responsibility so
> > that the SW driver can work without additional performance overhead and with
> > a neat driver interface.
> > 
> > If you are concerned about the usability part and regression on the SW
> > driver, that is not the case: the application will use the non-burst variant
> > only if dequeue_depth == 1 and/or in explicit cases where latency matters.
> > 
> > On the portability side, we support both cases, and an application written
> > based on dequeue_depth will perform well in both implementations. IMO, there
> > is no other shortcut for a performance-optimized application running on
> > different models. I think it is not an issue as, in the event model, each
> > core is identical and the main loop can be changed based on dequeue_depth
> > if performance is needed (anyway, the main loop will be function pointer based).
> > 
> 
> Ok, I think I see your point now. Here is an alternative suggestion.
> 
> 1. Keep the single user API.
> 2. Have both single and burst function pointers in the driver
> 3. Call appropriately in the eventdev layer based on parameters. For
> example:
> 
> rte_event_dequeue_burst(..., int num)
> {
> 	if (num == 1 && single_dequeue_fn != NULL)
> 		return single_dequeue_fn(...);
> 	return burst_dequeue_fn(...);
> }
> 
> This way drivers can optionally special-case the single dequeue case -
> the function pointer check will definitely be predictable in HW making
> that a near-zero-cost check - while not forcing all drivers to do so.
> It also reduces the public API surface, and gives us a single enqueue
> and dequeue function.

+1
  
Jerin Jacob Nov. 29, 2016, 4:01 a.m. UTC | #17
On Mon, Nov 28, 2016 at 09:16:10AM +0000, Bruce Richardson wrote:
> On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> > On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > > > > +F: lib/librte_eventdev/
> > > > > > > 
> > > > 
> > > > I don't think there is any portability issue here; I can explain.
> > > > 
> > > > At the application level, we have two more use cases to deal with for the
> > > > non-burst variant:
> > > > 
> > > > - latency-critical work
> > > > - on dequeue, if the application wants to deal with only one flow (i.e. to
> > > >   avoid processing two different application flows and thus cache thrashing)
> > > > 
> > > > Selection of the burst variants will be based on
> > > > rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> > > > So I don't think there is a portability issue here, and I don't want to waste
> > > > CPU cycles on the for loop if the application is known to be working with the
> > > > non-burst variant, like below
> > > > 
> > > 
> > > If the application is known to be working on non-burst variants, then
> > > it can always request a burst size of 1 and skip the loop completely.
> > > There is no extra performance hit in that case in either the app or the
> > > driver (since the non-burst driver always returns 1, irrespective of the
> > > number requested).
> > 
> > Hmm. I am afraid there is.
> > On the app side, the const "1" cannot be optimized by the compiler, because
> > on the other side it is a function-pointer-based driver interface.
> > On the driver side, the implementation would be for-loop based instead
> > of a plain access
> > (the compiler can never see the const "1" in the driver interface).
> > 
> > We are planning to implement burst mode as a kind of emulation mode and
> > have a different scheme for burst and non-burst. We took a similar approach
> > in introducing rte_event_schedule(), splitting the responsibility so
> > that the SW driver can work without additional performance overhead and with
> > a neat driver interface.
> > 
> > If you are concerned about the usability part and regression on the SW
> > driver, that is not the case: the application will use the non-burst variant
> > only if dequeue_depth == 1 and/or in explicit cases where latency matters.
> > 
> > On the portability side, we support both cases, and an application written
> > based on dequeue_depth will perform well in both implementations. IMO, there
> > is no other shortcut for a performance-optimized application running on
> > different models. I think it is not an issue as, in the event model, each
> > core is identical and the main loop can be changed based on dequeue_depth
> > if performance is needed (anyway, the main loop will be function pointer based).
> > 
> 
> Ok, I think I see your point now. Here is an alternative suggestion.
> 
> 1. Keep the single user API.
> 2. Have both single and burst function pointers in the driver
> 3. Call appropriately in the eventdev layer based on parameters. For
> example:
> 
> rte_event_dequeue_burst(..., int num)
> {
> 	if (num == 1 && single_dequeue_fn != NULL)
> 		return single_dequeue_fn(...);
> 	return burst_dequeue_fn(...);
> }
> 
> This way drivers can optionally special-case the single dequeue case -
> the function pointer check will definitely be predictable in HW making
> that a near-zero-cost check - while not forcing all drivers to do so.
> It also reduces the public API surface, and gives us a single enqueue
> and dequeue function.

The alternative suggestion looks good to me. Yes, it makes sense to reduce the
public API surface if possible.

Regarding the implementation, I thought of taking an approach like the one below
to reduce the cost of the additional AND operation (with const "1", the compiler
can choose the correct one without any overhead):

rte_event_dequeue_burst(..., int num)
{
	if (num == 1)
		return single_dequeue_fn(...);
	return burst_dequeue_fn(...);
}

"single_dequeue_fn" populated from the driver layer.
In the absence of populating the "single_dequeue_fn" from the driver layer,
The common code can create the single_dequeue_fn using driver
provided "burst_dequeue_fn"

something like:

generic_single_dequeue_fn(dev)
{
	return dev->burst_dequeue_fn(.., 1);
}

Any concerns?

> 
> /Bruce
>
  
Bruce Richardson Nov. 29, 2016, 10 a.m. UTC | #18
On Tue, Nov 29, 2016 at 09:31:42AM +0530, Jerin Jacob wrote:
> On Mon, Nov 28, 2016 at 09:16:10AM +0000, Bruce Richardson wrote:
> > On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> > > On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > > > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > > > +M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > > > > > +F: lib/librte_eventdev/
> > > > > > > > 
> > > > > 
> > > > > I don't think there is any portability issue here; I can explain.
> > > > > 
> > > > > At the application level, we have two more use cases to deal with for the
> > > > > non-burst variant:
> > > > > 
> > > > > - latency-critical work
> > > > > - on dequeue, if the application wants to deal with only one flow (i.e. to
> > > > >   avoid processing two different application flows and thus cache thrashing)
> > > > > 
> > > > > Selection of the burst variants will be based on
> > > > > rte_event_dev_info_get() and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > > > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth).
> > > > > So I don't think there is a portability issue here, and I don't want to waste
> > > > > CPU cycles on the for loop if the application is known to be working with the
> > > > > non-burst variant, like below
> > > > > 
> > > > 
> > > > If the application is known to be working on non-burst variants, then
> > > > it can always request a burst size of 1 and skip the loop completely.
> > > > There is no extra performance hit in that case in either the app or the
> > > > driver (since the non-burst driver always returns 1, irrespective of the
> > > > number requested).
> > > 
> > > Hmm. I am afraid there is.
> > > On the app side, the const "1" cannot be optimized by the compiler, because
> > > on the other side it is a function-pointer-based driver interface.
> > > On the driver side, the implementation would be for-loop based instead
> > > of a plain access
> > > (the compiler can never see the const "1" in the driver interface).
> > > 
> > > We are planning to implement burst mode as a kind of emulation mode and
> > > have a different scheme for burst and non-burst. We took a similar approach
> > > in introducing rte_event_schedule(), splitting the responsibility so
> > > that the SW driver can work without additional performance overhead and with
> > > a neat driver interface.
> > > 
> > > If you are concerned about the usability part and regression on the SW
> > > driver, that is not the case: the application will use the non-burst variant
> > > only if dequeue_depth == 1 and/or in explicit cases where latency matters.
> > > 
> > > On the portability side, we support both cases, and an application written
> > > based on dequeue_depth will perform well in both implementations. IMO, there
> > > is no other shortcut for a performance-optimized application running on
> > > different models. I think it is not an issue as, in the event model, each
> > > core is identical and the main loop can be changed based on dequeue_depth
> > > if performance is needed (anyway, the main loop will be function pointer based).
> > > 
> > 
> > Ok, I think I see your point now. Here is an alternative suggestion.
> > 
> > 1. Keep the single user API.
> > 2. Have both single and burst function pointers in the driver
> > 3. Call appropriately in the eventdev layer based on parameters. For
> > example:
> > 
> > rte_event_dequeue_burst(..., int num)
> > {
> > 	if (num == 1 && single_dequeue_fn != NULL)
> > 		return single_dequeue_fn(...);
> > 	return burst_dequeue_fn(...);
> > }
> > 
> > This way drivers can optionally special-case the single dequeue case -
> > the function pointer check will definitely be predictable in HW making
> > that a near-zero-cost check - while not forcing all drivers to do so.
> > It also reduces the public API surface, and gives us a single enqueue
> > and dequeue function.
> 
> The alternative suggestion looks good to me. Yes, it makes sense to reduce the
> public API surface if possible.
> 
> Regarding the implementation, I thought of taking an approach like the one below
> to reduce the cost of the additional AND operation (with const "1", the compiler
> can choose the correct one without any overhead):
> 
> rte_event_dequeue_burst(..., int num)
> {
> 	if (num == 1)
> 		return single_dequeue_fn(...);
> 	return burst_dequeue_fn(...);
> }
> 
> "single_dequeue_fn" populated from the driver layer.
> In the absence of populating the "single_dequeue_fn" from the driver layer,
> The common code can create the single_dequeue_fn using driver
> provided "burst_dequeue_fn"
> 
> something like:
> 
> generic_single_dequeue_fn(dev)
> {
> 	return dev->burst_dequeue_fn(.., 1);
> }
> 
> Any concerns?
> 
No, that works OK for me.

/Bruce
  
Jerin Jacob Dec. 6, 2016, 3:52 a.m. UTC | #19
As previously discussed in RFC v1 [1], RFC v2 [2], with changes
described in [3] (also pasted below), here is the first non-draft series
for this new API.

[1] http://dpdk.org/ml/archives/dev/2016-August/045181.html
[2] http://dpdk.org/ml/archives/dev/2016-October/048592.html
[3] http://dpdk.org/ml/archives/dev/2016-October/048196.html

v1..v2:
1) Remove unnecessary header files from rte_eventdev.h(Thomas)
2) Removed PMD driver name(EVENTDEV_NAME_SKELETON_PMD) from rte_eventdev.h(Thomas)
3) Removed different #define for different priority schemes. Changed to
one event device RTE_EVENT_DEV_PRIORITY_* priority (Bruce)
4) add const to rte_event_dev_configure(), rte_event_queue_setup(),
rte_event_port_setup(), rte_event_port_link()(Bruce)
5) Fixed missing dev argument in dev->schedule() function(Bruce)
6) Changed \see to @see in doxygen comments(Thomas)
7) Added additional text in specification to clarify the queue depth(Thomas)
8) Changed wait to timeout across the specification(Thomas)
9) Added longer explanation for RTE_EVENT_OP_NEW and RTE_EVENT_OP_FORWARD(Thomas)
10) Fixed issue with RTE_EVENT_OP_RELEASE doxygen formatting(Thomas)
11) Changed to RTE_EVENT_DEV_CFG_FLAG_ from RTE_EVENT_DEV_CFG_(Thomas)
12) Changed to EVENT_QUEUE_CFG_FLAG_ from EVENT_QUEUE_CFG_(Thomas)
13) s/RTE_EVENT_TYPE_CORE/RTE_EVENT_TYPE_CPU/(Thomas, Gage)
14) Removed non burst API and kept only the burst API in the API specification
(Thomas, Bruce, Harry, Jerin)
-- Driver interface has non burst API, selection of the non burst API is based
on num_objects == 1
15) sizeof(struct rte_event) was not 16 in v1. Fixed it in v2
-- reduced the width of event_type to 4bit to save space for future change
-- introduced impl_opaque for implementation-specific opaque data(Harry),
something useful for the HW driver too, in the context of removing the need
for a separate release API.
-- squashed other element size and provided enough space to impl_opaque(Jerin)
-- added RTE_BUILD_BUG_ON(sizeof(struct rte_event) != 16); check
16) add union of uint64_t in the second element in struct rte_event to
make sure the structure has 16byte address all arch(Thomas)
17) Fixed invalid check of nb_atomic_order_sequences in implementation(Gage)
18) s/EDEV_LOG_ERR/RTE_EDEV_LOG_ERR(Thomas)
19) s/rte_eventdev_pmd_/rte_event_pmd_/(Bruce)
20) added fine details of distributed vs centralized scheduling information
in the specification and introduced RTE_EVENT_DEV_CAP_FLAG_DISTRIBUTED_SCHED
flag(Gage)
21)s/RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_CONSUMER/RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_LINK (Jerin)
to remove the confusion between producer and consumer in the sw eventdev driver
22) Northbound API implementation patch split into more logical patches(Thomas)


Changes since RFC v2:

- Updated the documentation to define the need for this library[Jerin]
- Added RTE_EVENT_QUEUE_CFG_*_ONLY configuration parameters in
  struct rte_event_queue_conf to enable optimized sw implementation [Bruce]
- Introduced RTE_EVENT_OP* ops [Bruce]
- Added nb_event_queue_flows,nb_event_port_dequeue_depth, nb_event_port_enqueue_depth
  in rte_event_dev_configure() like ethdev and crypto library[Jerin]
- Removed rte_event_release() and replaced with RTE_EVENT_OP_RELEASE ops to
  reduce fast-path APIs, as it is redundant[Jerin]
- In view of better application portability, removed pin_event
  from rte_event_enqueue as it is just a hint and Intel/NXP cannot support it[Jerin]
- Added rte_event_port_links_get()[Jerin]
- Added rte_event_dev_dump[Harry]

Notes:

- This patch set is check-patch clean with an exception that
02/04 has one WARNING:MACRO_WITH_FLOW_CONTROL
- Looking forward to getting additional maintainers for libeventdev


TODO:
1) Create user guide

Jerin Jacob (6):
  eventdev: introduce event driven programming model
  eventdev: define southbound driver interface
  eventdev: implement the northbound APIs
  eventdev: implement PMD registration functions
  event/skeleton: add skeleton eventdev driver
  app/test: unit test case for eventdev APIs

 MAINTAINERS                                        |    5 +
 app/test/Makefile                                  |    2 +
 app/test/test_eventdev.c                           |  775 +++++++++++
 config/common_base                                 |   14 +
 doc/api/doxy-api-index.md                          |    1 +
 doc/api/doxy-api.conf                              |    1 +
 drivers/Makefile                                   |    1 +
 drivers/event/Makefile                             |   36 +
 drivers/event/skeleton/Makefile                    |   55 +
 .../skeleton/rte_pmd_skeleton_event_version.map    |    4 +
 drivers/event/skeleton/skeleton_eventdev.c         |  540 ++++++++
 drivers/event/skeleton/skeleton_eventdev.h         |   72 +
 lib/Makefile                                       |    1 +
 lib/librte_eal/common/include/rte_log.h            |    1 +
 lib/librte_eventdev/Makefile                       |   57 +
 lib/librte_eventdev/rte_eventdev.c                 | 1237 +++++++++++++++++
 lib/librte_eventdev/rte_eventdev.h                 | 1408 ++++++++++++++++++++
 lib/librte_eventdev/rte_eventdev_pmd.h             |  506 +++++++
 lib/librte_eventdev/rte_eventdev_version.map       |   39 +
 mk/rte.app.mk                                      |    5 +
 20 files changed, 4760 insertions(+)
 create mode 100644 app/test/test_eventdev.c
 create mode 100644 drivers/event/Makefile
 create mode 100644 drivers/event/skeleton/Makefile
 create mode 100644 drivers/event/skeleton/rte_pmd_skeleton_event_version.map
 create mode 100644 drivers/event/skeleton/skeleton_eventdev.c
 create mode 100644 drivers/event/skeleton/skeleton_eventdev.h
 create mode 100644 lib/librte_eventdev/Makefile
 create mode 100644 lib/librte_eventdev/rte_eventdev.c
 create mode 100644 lib/librte_eventdev/rte_eventdev.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_pmd.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_version.map
  
Bruce Richardson Dec. 6, 2016, 4:46 p.m. UTC | #20
On Tue, Dec 06, 2016 at 09:22:14AM +0530, Jerin Jacob wrote:
> As previously discussed in RFC v1 [1], RFC v2 [2], with changes
> described in [3] (also pasted below), here is the first non-draft series
> for this new API.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-August/045181.html
> [2] http://dpdk.org/ml/archives/dev/2016-October/048592.html
> [3] http://dpdk.org/ml/archives/dev/2016-October/048196.html
> 
> v1..v2:
> 1) Remove unnecessary header files from rte_eventdev.h(Thomas)
> 2) Removed PMD driver name(EVENTDEV_NAME_SKELETON_PMD) from rte_eventdev.h(Thomas)
> 3) Removed different #define for different priority schemes. Changed to
> one event device RTE_EVENT_DEV_PRIORITY_* priority (Bruce)
> 4) add const to rte_event_dev_configure(), rte_event_queue_setup(),
> rte_event_port_setup(), rte_event_port_link()(Bruce)
> 5) Fixed missing dev argument in dev->schedule() function(Bruce)
> 6) Changed \see to @see in doxygen comments(Thomas)
> 7) Added additional text in specification to clarify the queue depth(Thomas)
> 8) Changed wait to timeout across the specification(Thomas)
> 9) Added longer explanation for RTE_EVENT_OP_NEW and RTE_EVENT_OP_FORWARD(Thomas)
> 10) Fixed issue with RTE_EVENT_OP_RELEASE doxygen formatting(Thomas)
> 11) Changed to RTE_EVENT_DEV_CFG_FLAG_ from RTE_EVENT_DEV_CFG_(Thomas)
> 12) Changed to EVENT_QUEUE_CFG_FLAG_ from EVENT_QUEUE_CFG_(Thomas)
> 13) s/RTE_EVENT_TYPE_CORE/RTE_EVENT_TYPE_CPU/(Thomas, Gage)
> 14) Removed non burst API and kept only the burst API in the API specification
> (Thomas, Bruce, Harry, Jerin)
> -- Driver interface has non burst API, selection of the non burst API is based
> on num_objects == 1
> 15) sizeof(struct rte_event) was not 16 in v1. Fixed it in v2
> -- reduced the width of event_type to 4bit to save space for future change
> -- introduced impl_opaque for implementation-specific opaque data(Harry),
> something useful for the HW driver too, in the context of removing the need
> for a separate release API.
> -- squashed other element size and provided enough space to impl_opaque(Jerin)
> -- added RTE_BUILD_BUG_ON(sizeof(struct rte_event) != 16); check
> 16) add union of uint64_t in the second element in struct rte_event to
> make sure the structure has 16byte address all arch(Thomas)
> 17) Fixed invalid check of nb_atomic_order_sequences in implementation(Gage)
> 18) s/EDEV_LOG_ERR/RTE_EDEV_LOG_ERR(Thomas)
> 19) s/rte_eventdev_pmd_/rte_event_pmd_/(Bruce)
> 20) added fine details of distributed vs centralized scheduling information
> in the specification and introduced RTE_EVENT_DEV_CAP_FLAG_DISTRIBUTED_SCHED
> flag(Gage)
> 21)s/RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_CONSUMER/RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_LINK (Jerin)
> to remove the confusion between producer and consumer in the sw eventdev driver
> 22) Northbound API implementation patch split into more logical patches(Thomas)
> 
> 
Thanks for this Jerin, great job tracking the changes between the
versions.
A couple of comments to make on the patches thus far, but I think
we are near having a first version we can commit to a next-event tree.

/Bruce
  
Jerin Jacob Dec. 21, 2016, 9:25 a.m. UTC | #21
As previously discussed in RFC v1 [1], RFC v2 [2], with changes
described in [3] (also pasted below), here is the first non-draft series
for this new API.

[1] http://dpdk.org/ml/archives/dev/2016-August/045181.html
[2] http://dpdk.org/ml/archives/dev/2016-October/048592.html
[3] http://dpdk.org/ml/archives/dev/2016-October/048196.html

v3..v4:

1) Fixed the shared lib build issue(Bruce)
2) Added missing "struct rte_eventdev *dev" in eventdev_queue_release_t(Bruce)
3) In order to shorten the macro name, removed _FLAG_ in it(Bruce)
4) Fixed the wrong 'in-reply-to' while sending the v3 (Shreyansh)

v2..v3:

1) Changed struct rte_event layout to be more alignment balanced(Harry, Jerin)
2) Changed event_ptr type to void* from uintptr_t(Bruce)
3) Changed ev[] to const in rte_event_enqueue_burst to disallow
drivers from modifying the events passed in(Bruce)
4) Removed queue memory allocation from common code as some drivers may not need
it(Bruce)
5) Removed "struct rte_event_queue_link" and replaced with queues and priorities
in the link and link_get API to avoid one redirection to use the API(Bruce)

v1..v2:
1) Remove unnecessary header files from rte_eventdev.h(Thomas)
2) Removed PMD driver name(EVENTDEV_NAME_SKELETON_PMD) from rte_eventdev.h(Thomas)
3) Removed different #define for different priority schemes. Changed to
one event device RTE_EVENT_DEV_PRIORITY_* priority (Bruce)
4) add const to rte_event_dev_configure(), rte_event_queue_setup(),
rte_event_port_setup(), rte_event_port_link()(Bruce)
5) Fixed missing dev argument in dev->schedule() function(Bruce)
6) Changed \see to @see in doxygen comments(Thomas)
7) Added additional text in specification to clarify the queue depth(Thomas)
8) Changed wait to timeout across the specification(Thomas)
9) Added longer explanation for RTE_EVENT_OP_NEW and RTE_EVENT_OP_FORWARD(Thomas)
10) Fixed issue with RTE_EVENT_OP_RELEASE doxygen formatting(Thomas)
11) Changed to RTE_EVENT_DEV_CFG_FLAG_ from RTE_EVENT_DEV_CFG_(Thomas)
12) Changed to EVENT_QUEUE_CFG_FLAG_ from EVENT_QUEUE_CFG_(Thomas)
13) s/RTE_EVENT_TYPE_CORE/RTE_EVENT_TYPE_CPU/(Thomas, Gage)
14) Removed non burst API and kept only the burst API in the API specification
(Thomas, Bruce, Harry, Jerin)
-- Driver interface has non burst API, selection of the non burst API is based
on num_objects == 1
15) sizeof(struct rte_event) was not 16 in v1. Fixed it in v2
-- reduced the width of event_type to 4bit to save space for future change
-- introduced impl_opaque for implementation-specific opaque data(Harry),
something useful for the HW driver too, in the context of removing the need
for a separate release API.
-- squashed other element size and provided enough space to impl_opaque(Jerin)
-- added RTE_BUILD_BUG_ON(sizeof(struct rte_event) != 16); check
16) add union of uint64_t in the second element in struct rte_event to
make sure the structure has 16byte address all arch(Thomas)
17) Fixed invalid check of nb_atomic_order_sequences in implementation(Gage)
18) s/EDEV_LOG_ERR/RTE_EDEV_LOG_ERR(Thomas)
19) s/rte_eventdev_pmd_/rte_event_pmd_/(Bruce)
20) added fine details of distributed vs centralized scheduling information
in the specification and introduced RTE_EVENT_DEV_CAP_FLAG_DISTRIBUTED_SCHED
flag(Gage)
21)s/RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_CONSUMER/RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_LINK (Jerin)
to remove the confusion between producer and consumer in the sw eventdev driver
22) Northbound API implementation patch split into more logical patches(Thomas)

Changes since RFC v2:

- Updated the documentation to define the need for this library[Jerin]
- Added RTE_EVENT_QUEUE_CFG_*_ONLY configuration parameters in
  struct rte_event_queue_conf to enable optimized sw implementation [Bruce]
- Introduced RTE_EVENT_OP* ops [Bruce]
- Added nb_event_queue_flows, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth
  in rte_event_dev_configure() like ethdev and crypto library[Jerin]
- Removed rte_event_release() and replaced with RTE_EVENT_OP_RELEASE ops to
  reduce fast path APIs and it is redundant too[Jerin]
- In the interest of better application portability, removed pin_event
  from rte_event_enqueue as it is just a hint and Intel/NXP cannot support it[Jerin]
- Added rte_event_port_links_get()[Jerin]
- Added rte_event_dev_dump[Harry]

Notes:

- This patch set is check-patch clean with one exception:
03/06 has one WARNING:MACRO_WITH_FLOW_CONTROL
- Looking forward to getting additional maintainers for libeventdev

TODO:
1) Create user guide

Jerin Jacob (6):
  eventdev: introduce event driven programming model
  eventdev: define southbound driver interface
  eventdev: implement the northbound APIs
  eventdev: implement PMD registration functions
  event/skeleton: add skeleton eventdev driver
  app/test: unit test case for eventdev APIs

 MAINTAINERS                                        |    5 +
 app/test/Makefile                                  |    2 +
 app/test/test_eventdev.c                           |  778 +++++++++++
 config/common_base                                 |   14 +
 doc/api/doxy-api-index.md                          |    1 +
 doc/api/doxy-api.conf                              |    1 +
 drivers/Makefile                                   |    1 +
 drivers/event/Makefile                             |   36 +
 drivers/event/skeleton/Makefile                    |   55 +
 .../skeleton/rte_pmd_skeleton_event_version.map    |    4 +
 drivers/event/skeleton/skeleton_eventdev.c         |  519 ++++++++
 drivers/event/skeleton/skeleton_eventdev.h         |   68 +
 lib/Makefile                                       |    1 +
 lib/librte_eal/common/include/rte_log.h            |    1 +
 lib/librte_eventdev/Makefile                       |   57 +
 lib/librte_eventdev/rte_eventdev.c                 | 1222 +++++++++++++++++
 lib/librte_eventdev/rte_eventdev.h                 | 1407 ++++++++++++++++++++
 lib/librte_eventdev/rte_eventdev_pmd.h             |  514 +++++++
 lib/librte_eventdev/rte_eventdev_version.map       |   39 +
 mk/rte.app.mk                                      |    5 +
 20 files changed, 4730 insertions(+)
 create mode 100644 app/test/test_eventdev.c
 create mode 100644 drivers/event/Makefile
 create mode 100644 drivers/event/skeleton/Makefile
 create mode 100644 drivers/event/skeleton/rte_pmd_skeleton_event_version.map
 create mode 100644 drivers/event/skeleton/skeleton_eventdev.c
 create mode 100644 drivers/event/skeleton/skeleton_eventdev.h
 create mode 100644 lib/librte_eventdev/Makefile
 create mode 100644 lib/librte_eventdev/rte_eventdev.c
 create mode 100644 lib/librte_eventdev/rte_eventdev.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_pmd.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_version.map
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..e430ca7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -249,6 +249,9 @@  F: lib/librte_cryptodev/
 F: app/test/test_cryptodev*
 F: examples/l2fwd-crypto/
 
+Eventdev API - EXPERIMENTAL
+M: Jerin Jacob <jerin.jacob@caviumnetworks.com>
+F: lib/librte_eventdev/
 
 Networking Drivers
 ------------------
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..28c1329 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -40,6 +40,7 @@  There are many libraries, so their headers may be grouped by topics:
   [ethdev]             (@ref rte_ethdev.h),
   [ethctrl]            (@ref rte_eth_ctrl.h),
   [cryptodev]          (@ref rte_cryptodev.h),
+  [eventdev]           (@ref rte_eventdev.h),
   [devargs]            (@ref rte_devargs.h),
   [bond]               (@ref rte_eth_bond.h),
   [vhost]              (@ref rte_virtio_net.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..9841477 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -41,6 +41,7 @@  INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_cryptodev \
                           lib/librte_distributor \
                           lib/librte_ether \
+                          lib/librte_eventdev \
                           lib/librte_hash \
                           lib/librte_ip_frag \
                           lib/librte_jobstats \
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
new file mode 100644
index 0000000..778d6dc
--- /dev/null
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -0,0 +1,1439 @@ 
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 Cavium.
+ *   Copyright 2016 Intel Corporation.
+ *   Copyright 2016 NXP.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Cavium nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_EVENTDEV_H_
+#define _RTE_EVENTDEV_H_
+
+/**
+ * @file
+ *
+ * RTE Event Device API
+ *
+ * In a polling model, lcores poll ethdev ports and associated rx queues
+ * directly to look for packet. In an event driven model, by contrast, lcores
+ * call the scheduler that selects packets for them based on programmer
+ * specified criteria. Eventdev library adds support for event driven
+ * programming model, which offer applications automatic multicore scaling,
+ * dynamic load balancing, pipelining, packet ingress order maintenance and
+ * synchronization services to simplify application packet processing.
+ *
+ * The Event Device API is composed of two parts:
+ *
+ * - The application-oriented Event API that includes functions to setup
+ *   an event device (configure it, setup its queues, ports and start it), to
+ *   establish the link between queues to port and to receive events, and so on.
+ *
+ * - The driver-oriented Event API that exports a function allowing
+ *   an event poll Mode Driver (PMD) to simultaneously register itself as
+ *   an event device driver.
+ *
+ * Event device components:
+ *
+ *                     +-----------------+
+ *                     | +-------------+ |
+ *        +-------+    | |    flow 0   | |
+ *        |Packet |    | +-------------+ |
+ *        |event  |    | +-------------+ |
+ *        |       |    | |    flow 1   | |port_link(port0, queue0)
+ *        +-------+    | +-------------+ |     |     +--------+
+ *        +-------+    | +-------------+ o-----v-----o        |dequeue +------+
+ *        |Crypto |    | |    flow n   | |           | event  +------->|Core 0|
+ *        |work   |    | +-------------+ o----+      | port 0 |        |      |
+ *        |done ev|    |  event queue 0  |    |      +--------+        +------+
+ *        +-------+    +-----------------+    |
+ *        +-------+                           |
+ *        |Timer  |    +-----------------+    |      +--------+
+ *        |expiry |    | +-------------+ |    +------o        |dequeue +------+
+ *        |event  |    | |    flow 0   | o-----------o event  +------->|Core 1|
+ *        +-------+    | +-------------+ |      +----o port 1 |        |      |
+ *       Event enqueue | +-------------+ |      |    +--------+        +------+
+ *     o-------------> | |    flow 1   | |      |
+ *        enqueue(     | +-------------+ |      |
+ *        queue_id,    |                 |      |    +--------+        +------+
+ *        flow_id,     | +-------------+ |      |    |        |dequeue |Core 2|
+ *        sched_type,  | |    flow n   | o-----------o event  +------->|      |
+ *        event_type,  | +-------------+ |      |    | port 2 |        +------+
+ *        subev_type,  |  event queue 1  |      |    +--------+
+ *        event)       +-----------------+      |    +--------+
+ *                                              |    |        |dequeue +------+
+ *        +-------+    +-----------------+      |    | event  +------->|Core n|
+ *        |Core   |    | +-------------+ o-----------o port n |        |      |
+ *        |(SW)   |    | |    flow 0   | |      |    +--------+        +--+---+
+ *        |event  |    | +-------------+ |      |                         |
+ *        +-------+    | +-------------+ |      |                         |
+ *            ^        | |    flow 1   | |      |                         |
+ *            |        | +-------------+ o------+                         |
+ *            |        | +-------------+ |                                |
+ *            |        | |    flow n   | |                                |
+ *            |        | +-------------+ |                                |
+ *            |        |  event queue n  |                                |
+ *            |        +-----------------+                                |
+ *            |                                                           |
+ *            +-----------------------------------------------------------+
+ *
+ *
+ *
+ * Event device: A hardware or software-based event scheduler.
+ *
+ * Event: A unit of scheduling that encapsulates a packet or another datatype,
+ * such as a SW-generated event from a core, a crypto work completion
+ * notification, a timer expiry notification, etc., as well as metadata.
+ * The metadata includes the flow ID, scheduling type, event priority,
+ * event_type, sub_event_type etc.
+ *
+ * Event queue: A queue containing events that are scheduled by the event dev.
+ * An event queue contains events of different flows associated with scheduling
+ * types, such as atomic, ordered, or parallel.
+ *
+ * Event port: An application's interface into the event dev for enqueue and
+ * dequeue operations. Each event port can be linked with one or more
+ * event queues for dequeue operations.
+ *
+ * By default, all the functions of the Event Device API exported by a PMD
+ * are lock-free functions which are assumed not to be invoked in parallel on
+ * different logical cores to work on the same target object. For instance,
+ * the dequeue function of a PMD cannot be invoked in parallel on two logical
+ * cores to operate on the same event port. Of course, this function
+ * can be invoked in parallel by different logical cores on different ports.
+ * It is the responsibility of the upper level application to enforce this rule.
+ *
+ * In all functions of the Event API, the Event device is
+ * designated by an integer >= 0 named the device identifier *dev_id*.
+ *
+ * At the Event driver level, Event devices are represented by a generic
+ * data structure of type *rte_event_dev*.
+ *
+ * Event devices are dynamically registered during the PCI/SoC device probing
+ * phase performed at EAL initialization time.
+ * When an Event device is being probed, a *rte_event_dev* structure and
+ * a new device identifier are allocated for that device. Then, the
+ * event_dev_init() function supplied by the Event driver matching the probed
+ * device is invoked to properly initialize the device.
+ *
+ * The role of the device init function consists of resetting the hardware or
+ * software event driver implementations.
+ *
+ * If the device init operation is successful, the correspondence between
+ * the device identifier assigned to the new device and its associated
+ * *rte_event_dev* structure is effectively registered.
+ * Otherwise, both the *rte_event_dev* structure and the device identifier are
+ * freed.
+ *
+ * The functions exported by the application Event API to setup a device
+ * designated by its device identifier must be invoked in the following order:
+ *     - rte_event_dev_configure()
+ *     - rte_event_queue_setup()
+ *     - rte_event_port_setup()
+ *     - rte_event_port_link()
+ *     - rte_event_dev_start()
+ *
+ * Then, the application can invoke, in any order, the functions
+ * exported by the Event API to schedule events, dequeue events, enqueue events,
+ * [un]link event queues to event ports, and so on.
+ *
+ * An application may use rte_event_[queue/port]_default_conf_get() to get the
+ * default configuration of an event queue or event port, and set it up by
+ * overriding a few default values.
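+ *
+ * As an illustrative sketch of the setup ordering (not a complete
+ * application: *dev_id* is assumed to be a valid device identifier,
+ * *dev_conf* a configuration filled within the limits reported by
+ * rte_event_dev_info_get(), and return codes are left unchecked):
+ * \code{.c}
+ *	rte_event_dev_configure(dev_id, &dev_conf);
+ *	rte_event_queue_setup(dev_id, 0, NULL); // default queue config
+ *	rte_event_port_setup(dev_id, 0, NULL);  // default port config
+ *	rte_event_port_link(dev_id, 0, NULL, 0); // NULL links all queues
+ *	rte_event_dev_start(dev_id);
+ * \endcode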
+ *
+ * If the application wants to change the configuration (i.e. call
+ * rte_event_dev_configure(), rte_event_queue_setup(), or
+ * rte_event_port_setup()), it must call rte_event_dev_stop() first to stop the
+ * device and then do the reconfiguration before calling rte_event_dev_start()
+ * again. The schedule, enqueue and dequeue functions should not be invoked
+ * when the device is stopped.
+ *
+ * Finally, an application can close an Event device by invoking the
+ * rte_event_dev_close() function.
+ *
+ * Each function of the application Event API invokes a specific function
+ * of the PMD that controls the target device designated by its device
+ * identifier.
+ *
+ * For this purpose, all device-specific functions of an Event driver are
+ * supplied through a set of pointers contained in a generic structure of type
+ * *event_dev_ops*.
+ * The address of the *event_dev_ops* structure is stored in the *rte_event_dev*
+ * structure by the device init function of the Event driver, which is
+ * invoked during the PCI/SoC device probing phase, as explained earlier.
+ *
+ * In other words, each function of the Event API simply retrieves the
+ * *rte_event_dev* structure associated with the device identifier and
+ * performs an indirect invocation of the corresponding driver function
+ * supplied in the *event_dev_ops* structure of the *rte_event_dev* structure.
+ *
+ * For performance reasons, the address of the fast-path functions of the
+ * Event driver is not contained in the *event_dev_ops* structure.
+ * Instead, they are directly stored at the beginning of the *rte_event_dev*
+ * structure to avoid an extra indirect memory access during their invocation.
+ *
+ * RTE event device drivers do not use interrupts for enqueue or dequeue
+ * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
+ * functions to applications.
+ *
+ * A typical event-driven application has the following fast-path workflow:
+ * \code{.c}
+ *	while (1) {
+ *
+ *		rte_event_schedule(dev_id);
+ *
+ *		rte_event_dequeue(...);
+ *
+ *		(event processing)
+ *
+ *		rte_event_enqueue(...);
+ *	}
+ * \endcode
+ *
+ * The *schedule* operation is intended to do event scheduling, and the
+ * *dequeue* operation returns the scheduled events. An implementation
+ * is free to define the semantics between *schedule* and *dequeue*. For
+ * example, a system based on a hardware scheduler can define its
+ * rte_event_schedule() to be a NOOP, whereas a software scheduler can use
+ * the *schedule* operation to schedule events.
+ *
+ * Events are injected into the event device through the *enqueue* operation
+ * by event producers in the system. Typical event producers are the ethdev
+ * subsystem for generating packet events, cores (SW) for generating events
+ * based on different stages of application processing, and cryptodev for
+ * generating crypto work completion notifications, etc.
+ *
+ * The *dequeue* operation gets one or more events from the event ports.
+ * The application processes the events and, if it is an intermediate stage of
+ * event processing, sends them to a downstream event queue through
+ * rte_event_enqueue(). On the final stage, the application may hand the event
+ * over to a different subsystem, such as ethdev, to transmit the packet/event
+ * on the wire using the rte_eth_tx_burst() API.
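+ *
+ * As a rough sketch (process() and *next_queue_id* are hypothetical
+ * application helpers; error handling omitted), an intermediate pipeline
+ * stage may look like:
+ * \code{.c}
+ *	struct rte_event ev;
+ *
+ *	while (1) {
+ *		rte_event_schedule(dev_id);
+ *		if (!rte_event_dequeue(dev_id, port_id, &ev, 0))
+ *			continue;
+ *		process(ev.mbuf);
+ *		ev.operation = RTE_EVENT_OP_FORWARD;
+ *		ev.queue_id = next_queue_id;
+ *		rte_event_enqueue(dev_id, port_id, &ev);
+ *	}
+ * \endcode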
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdbool.h>
+
+#include <rte_pci.h>
+#include <rte_dev.h>
+#include <rte_memory.h>
+#include <rte_errno.h>
+
+#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
+/**< Skeleton event device PMD name */
+
+/**
+ * Get the total number of event devices that have been successfully
+ * initialised.
+ *
+ * @return
+ *   The total number of usable event devices.
+ */
+uint8_t
+rte_event_dev_count(void);
+
+/**
+ * Get the device identifier for the named event device.
+ *
+ * @param name
+ *   Event device name to select the event device identifier.
+ *
+ * @return
+ *   Returns event device identifier on success.
+ *   - <0: Failure to find named event device.
+ */
+int
+rte_event_dev_get_dev_id(const char *name);
+
+/**
+ * Return the NUMA socket to which a device is connected.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @return
+ *   The NUMA socket id to which the device is connected or
+ *   a default of zero if the socket could not be determined.
+ *   - (-EINVAL) dev_id value is out of range.
+ */
+int
+rte_event_dev_socket_id(uint8_t dev_id);
+
+/* Event device capability bitmap flags */
+#define RTE_EVENT_DEV_CAP_QUEUE_QOS        (1ULL << 0)
+/**< Event scheduling prioritization is based on the priority associated with
+ *  each event queue.
+ *
+ *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
+ */
+#define RTE_EVENT_DEV_CAP_EVENT_QOS        (1ULL << 1)
+/**< Event scheduling prioritization is based on the priority associated with
+ *  each event. Priority of each event is supplied in *rte_event* structure
+ *  on each enqueue operation.
+ *
+ *  \see rte_event_enqueue()
+ */
+
+/**
+ * Event device information
+ */
+struct rte_event_dev_info {
+	const char *driver_name;	/**< Event driver name */
+	struct rte_pci_device *pci_dev;	/**< PCI information */
+	uint32_t min_dequeue_wait_ns;
+	/**< Minimum global dequeue wait delay (ns) supported by this device */
+	uint32_t max_dequeue_wait_ns;
+	/**< Maximum global dequeue wait delay (ns) supported by this device */
+	uint32_t dequeue_wait_ns;
+	/**< Configured global dequeue wait delay (ns) for this device */
+	uint8_t max_event_queues;
+	/**< Maximum event_queues supported by this device */
+	uint32_t max_event_queue_flows;
+	/**< Maximum number of flows in an event queue supported by this device */
+	uint8_t max_event_queue_priority_levels;
+	/**< Maximum number of event queue priority levels supported by
+	 * this device.
+	 * Valid when the device has the RTE_EVENT_DEV_CAP_QUEUE_QOS capability
+	 */
+	uint8_t max_event_priority_levels;
+	/**< Maximum number of event priority levels supported by this device.
+	 * Valid when the device has the RTE_EVENT_DEV_CAP_EVENT_QOS capability
+	 */
+	uint8_t max_event_ports;
+	/**< Maximum number of event ports supported by this device */
+	uint8_t max_event_port_dequeue_depth;
+	/**< Maximum dequeue depth for any event port.
+	 * Implementations can schedule N events at a time to an event port.
+	 * A device that does not support bulk dequeue will set this as 1.
+	 */
+	uint32_t max_event_port_enqueue_depth;
+	/**< Maximum enqueue depth for any event port. Implementations
+	 * can batch N events at a time to enqueue through event port.
+	 */
+	int32_t max_num_events;
+	/**< A *closed system* event dev has a limit on the number of events it
+	 * can manage at a time. An *open system* event dev does not have a
+	 * limit and will specify this as -1.
+	 */
+	uint32_t event_dev_cap;
+	/**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/
+};
+
+/**
+ * Retrieve the contextual information of an event device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param[out] dev_info
+ *   A pointer to a structure of type *rte_event_dev_info* to be filled with the
+ *   contextual information of the device.
+ *
+ * @return
+ *   - 0: Success, driver updates the contextual information of the event device
+ *   - <0: Error code returned by the driver info get function.
+ *
+ */
+int
+rte_event_dev_info_get(uint8_t dev_id, struct rte_event_dev_info *dev_info);
+
+/* Event device configuration bitmap flags */
+#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1ULL << 0)
+/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
+ *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
+ */
+
+/** Event device configuration structure */
+struct rte_event_dev_config {
+	uint32_t dequeue_wait_ns;
+	/**< rte_event_dequeue() waits for *dequeue_wait_ns* ns on this device
+	 * for an event to become available when no event is ready on the port.
+	 * This value should be in the range of *min_dequeue_wait_ns* and
+	 * *max_dequeue_wait_ns* which was previously provided in
+	 * rte_event_dev_info_get()
+	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+	 */
+	int32_t nb_events_limit;
+	/**< Applies to *closed system* event dev only. This field limits the
+	 * number of events that ethdev-like devices may inject
+	 * into the system, so as not to overwhelm core-to-core events.
+	 * This value cannot exceed the *max_num_events* which was
+	 * previously provided in rte_event_dev_info_get()
+	 */
+	uint8_t nb_event_queues;
+	/**< Number of event queues to configure on this device.
+	 * This value cannot exceed the *max_event_queues* which was
+	 * previously provided in rte_event_dev_info_get()
+	 */
+	uint8_t nb_event_ports;
+	/**< Number of event ports to configure on this device.
+	 * This value cannot exceed the *max_event_ports* which was
+	 * previously provided in rte_event_dev_info_get()
+	 */
+	uint32_t nb_event_queue_flows;
+	/**< Number of flows for any event queue on this device.
+	 * This value cannot exceed the *max_event_queue_flows* which was
+	 * previously provided in rte_event_dev_info_get()
+	 */
+	uint8_t nb_event_port_dequeue_depth;
+	/**< Dequeue depth (maximum number of events that may be dequeued at a
+	 * time) for any event port on this device.
+	 * This value cannot exceed the *max_event_port_dequeue_depth*
+	 * which was previously provided in rte_event_dev_info_get()
+	 * \see rte_event_port_setup()
+	 */
+	uint32_t nb_event_port_enqueue_depth;
+	/**< Enqueue depth (maximum number of events that may be enqueued at a
+	 * time) for any event port on this device.
+	 * This value cannot exceed the *max_event_port_enqueue_depth*
+	 * which was previously provided in rte_event_dev_info_get()
+	 * \see rte_event_port_setup()
+	 */
+	uint32_t event_dev_cfg;
+	/**< Event device config flags (RTE_EVENT_DEV_CFG_*) */
+};
+
+/**
+ * Configure an event device.
+ *
+ * This function must be invoked before any other function in the
+ * API. It can also be re-invoked when a device is in the
+ * stopped state.
+ *
+ * The caller may use rte_event_dev_info_get() to get the capabilities and
+ * resources available for this event device.
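+ *
+ * A minimal sketch (queue/port counts and flow count are illustrative only,
+ * the device is assumed to support them, and error returns are ignored):
+ * \code{.c}
+ *	struct rte_event_dev_info info;
+ *	struct rte_event_dev_config conf = {0};
+ *
+ *	rte_event_dev_info_get(dev_id, &info);
+ *	conf.dequeue_wait_ns = info.min_dequeue_wait_ns;
+ *	conf.nb_events_limit = info.max_num_events;
+ *	conf.nb_event_queues = 2;
+ *	conf.nb_event_ports = 2;
+ *	conf.nb_event_queue_flows = 1024;
+ *	conf.nb_event_port_dequeue_depth = 1;
+ *	conf.nb_event_port_enqueue_depth = 1;
+ *	rte_event_dev_configure(dev_id, &conf);
+ * \endcode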
+ *
+ * @param dev_id
+ *   The identifier of the device to configure.
+ * @param dev_conf
+ *   The event device configuration structure.
+ *
+ * @return
+ *   - 0: Success, device configured.
+ *   - <0: Error code returned by the driver configuration function.
+ */
+int
+rte_event_dev_configure(uint8_t dev_id, struct rte_event_dev_config *dev_conf);
+
+
+/* Event queue specific APIs */
+
+#define RTE_EVENT_QUEUE_PRIORITY_HIGHEST   0
+/**< Highest event queue priority */
+#define RTE_EVENT_QUEUE_PRIORITY_NORMAL    128
+/**< Normal event queue priority */
+#define RTE_EVENT_QUEUE_PRIORITY_LOWEST    255
+/**< Lowest event queue priority */
+
+/* Event queue configuration bitmap flags */
+#define RTE_EVENT_QUEUE_CFG_DEFAULT            (0)
+/**< Default value of *event_queue_cfg* when rte_event_queue_setup() is invoked
+ * with queue_conf == NULL
+ *
+ * \see rte_event_queue_setup()
+ */
+#define RTE_EVENT_QUEUE_CFG_TYPE_MASK          (3ULL << 0)
+/**< Mask for event queue schedule type configuration request */
+#define RTE_EVENT_QUEUE_CFG_ALL_TYPES          (0ULL << 0)
+/**< Allow ATOMIC, ORDERED and PARALLEL schedule type enqueue
+ *
+ * \see RTE_SCHED_TYPE_ORDERED, RTE_SCHED_TYPE_ATOMIC, RTE_SCHED_TYPE_PARALLEL
+ * \see rte_event_enqueue()
+ */
+#define RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY        (1ULL << 0)
+/**< Allow only ATOMIC schedule type enqueue
+ *
+ * The rte_event_enqueue() result is undefined if the queue is configured
+ * as ATOMIC only and sched_type != RTE_SCHED_TYPE_ATOMIC
+ *
+ * \see RTE_SCHED_TYPE_ATOMIC, rte_event_enqueue()
+ */
+#define RTE_EVENT_QUEUE_CFG_ORDERED_ONLY       (2ULL << 0)
+/**< Allow only ORDERED schedule type enqueue
+ *
+ * The rte_event_enqueue() result is undefined if the queue is configured
+ * as ORDERED only and sched_type != RTE_SCHED_TYPE_ORDERED
+ *
+ * \see RTE_SCHED_TYPE_ORDERED, rte_event_enqueue()
+ */
+#define RTE_EVENT_QUEUE_CFG_PARALLEL_ONLY      (3ULL << 0)
+/**< Allow only PARALLEL schedule type enqueue
+ *
+ * The rte_event_enqueue() result is undefined if the queue is configured
+ * as PARALLEL only and sched_type != RTE_SCHED_TYPE_PARALLEL
+ *
+ * \see RTE_SCHED_TYPE_PARALLEL, rte_event_enqueue()
+ */
+#define RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER    (1ULL << 2)
+/**< This event queue links only to a single event port.
+ *
+ *  \see rte_event_port_setup(), rte_event_port_link()
+ */
+
+/** Event queue configuration structure */
+struct rte_event_queue_conf {
+	uint32_t nb_atomic_flows;
+	/**< The maximum number of active flows this queue can track at any
+	 * given time. The value must be in the range of
+	 * [1, nb_event_queue_flows], which was previously supplied to
+	 * rte_event_dev_configure().
+	 */
+	uint32_t nb_atomic_order_sequences;
+	/**< The maximum number of outstanding events waiting to be
+	 * reordered by this queue. In other words, the number of entries in
+	 * this queue's reorder buffer. When the number of events in the
+	 * reorder buffer reaches *nb_atomic_order_sequences* then the
+	 * scheduler cannot schedule the events from this queue and an invalid
+	 * event will be returned from dequeue until one or more entries are
+	 * freed up/released.
+	 * The value must be in the range of [1, nb_event_queue_flows],
+	 * which was previously supplied to rte_event_dev_configure().
+	 */
+	uint32_t event_queue_cfg; /**< Queue config flags(RTE_EVENT_QUEUE_CFG_) */
+	uint8_t priority;
+	/**< Priority for this event queue relative to other event queues.
+	 * The requested priority should be in the range of
+	 * [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST].
+	 * The implementation shall normalize the requested priority to the
+	 * event device supported priority value.
+	 * Valid when the device has the RTE_EVENT_DEV_CAP_QUEUE_QOS capability
+	 */
+};
+
+/**
+ * Retrieve the default configuration information of an event queue designated
+ * by its *queue_id* from the event driver for an event device.
+ *
+ * This function is intended to be used in conjunction with
+ * rte_event_queue_setup() where the caller needs to set up the queue by
+ * overriding a few default values.
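+ *
+ * A typical override sketch (*dev_id* and *queue_id* assumed valid, return
+ * codes unchecked):
+ * \code{.c}
+ *	struct rte_event_queue_conf conf;
+ *
+ *	rte_event_queue_default_conf_get(dev_id, queue_id, &conf);
+ *	conf.priority = RTE_EVENT_QUEUE_PRIORITY_HIGHEST;
+ *	rte_event_queue_setup(dev_id, queue_id, &conf);
+ * \endcode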
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param queue_id
+ *   The index of the event queue to get the configuration information.
+ *   The value must be in the range [0, nb_event_queues - 1]
+ *   previously supplied to rte_event_dev_configure().
+ * @param[out] queue_conf
+ *   The pointer to the default event queue configuration data.
+ * @return
+ *   - 0: Success, driver updates the default event queue configuration data.
+ *   - <0: Error code returned by the driver info get function.
+ *
+ * \see rte_event_queue_setup()
+ *
+ */
+int
+rte_event_queue_default_conf_get(uint8_t dev_id, uint8_t queue_id,
+				 struct rte_event_queue_conf *queue_conf);
+
+/**
+ * Allocate and set up an event queue for an event device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param queue_id
+ *   The index of the event queue to setup. The value must be in the range
+ *   [0, nb_event_queues - 1] previously supplied to rte_event_dev_configure().
+ * @param queue_conf
+ *   The pointer to the configuration data to be used for the event queue.
+ *   A NULL value is allowed, in which case the default configuration is used.
+ *
+ * \see rte_event_queue_default_conf_get()
+ *
+ * @return
+ *   - 0: Success, event queue correctly set up.
+ *   - <0: event queue configuration failed
+ */
+int
+rte_event_queue_setup(uint8_t dev_id, uint8_t queue_id,
+		      struct rte_event_queue_conf *queue_conf);
+
+/**
+ * Get the number of event queues on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @return
+ *   - The number of configured event queues
+ */
+uint8_t
+rte_event_queue_count(uint8_t dev_id);
+
+/**
+ * Get the priority of the event queue on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param queue_id
+ *   Event queue identifier.
+ * @return
+ *   - If the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability then the
+ *    configured priority of the event queue in
+ *    [RTE_EVENT_QUEUE_PRIORITY_HIGHEST, RTE_EVENT_QUEUE_PRIORITY_LOWEST] range
+ *    else the value RTE_EVENT_QUEUE_PRIORITY_NORMAL
+ */
+uint8_t
+rte_event_queue_priority(uint8_t dev_id, uint8_t queue_id);
+
+/* Event port specific APIs */
+
+/** Event port configuration structure */
+struct rte_event_port_conf {
+	int32_t new_event_threshold;
+	/**< A backpressure threshold for new event enqueues on this port.
+	 * Use for *closed system* event dev where event capacity is limited,
+	 * and cannot exceed the capacity of the event dev.
+	 * Configuring ports with different thresholds can make higher priority
+	 * traffic less likely to be backpressured.
+	 * For example, a port used to inject NIC Rx packets into the event dev
+	 * can have a lower threshold so as not to overwhelm the device,
+	 * while ports used for worker pools can have a higher threshold.
+	 * This value cannot exceed the *nb_events_limit*
+	 * which was previously supplied to rte_event_dev_configure()
+	 */
+	uint8_t dequeue_depth;
+	/**< Configure the number of bulk dequeues for this event port.
+	 * This value cannot exceed the *nb_event_port_dequeue_depth*
+	 * which was previously supplied to rte_event_dev_configure()
+	 */
+	uint8_t enqueue_depth;
+	/**< Configure the number of bulk enqueues for this event port.
+	 * This value cannot exceed the *nb_event_port_enqueue_depth*
+	 * which was previously supplied to rte_event_dev_configure()
+	 */
+};
+
+/**
+ * Retrieve the default configuration information of an event port designated
+ * by its *port_id* from the event driver for an event device.
+ *
+ * This function is intended to be used in conjunction with
+ * rte_event_port_setup() where the caller needs to set up the port by
+ * overriding a few default values.
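+ *
+ * A typical override sketch (*dev_id* and *port_id* assumed valid; the
+ * threshold value is illustrative):
+ * \code{.c}
+ *	struct rte_event_port_conf conf;
+ *
+ *	rte_event_port_default_conf_get(dev_id, port_id, &conf);
+ *	conf.new_event_threshold = 4096;
+ *	rte_event_port_setup(dev_id, port_id, &conf);
+ * \endcode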
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The index of the event port to get the configuration information.
+ *   The value must be in the range [0, nb_event_ports - 1]
+ *   previously supplied to rte_event_dev_configure().
+ * @param[out] port_conf
+ *   The pointer to the default event port configuration data
+ * @return
+ *   - 0: Success, driver updates the default event port configuration data.
+ *   - <0: Error code returned by the driver info get function.
+ *
+ * \see rte_event_port_setup()
+ *
+ */
+int
+rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
+				struct rte_event_port_conf *port_conf);
+
+/**
+ * Allocate and set up an event port for an event device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The index of the event port to setup. The value must be in the range
+ *   [0, nb_event_ports - 1] previously supplied to rte_event_dev_configure().
+ * @param port_conf
+ *   The pointer to the configuration data to be used for the event port.
+ *   A NULL value is allowed, in which case the default configuration is used.
+ *
+ * \see rte_event_port_default_conf_get()
+ *
+ * @return
+ *   - 0: Success, event port correctly set up.
+ *   - <0: Port configuration failed
+ *   - (-EDQUOT) Quota exceeded (the application tried to link a queue
+ *   configured with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one
+ *   event port)
+ */
+int
+rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
+		     struct rte_event_port_conf *port_conf);
+
+/**
+ * Get the dequeue depth configured for the event port designated
+ * by its *port_id* on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   Event port identifier.
+ * @return
+ *   - The configured dequeue depth
+ *
+ * \see rte_event_dequeue_burst()
+ */
+uint8_t
+rte_event_port_dequeue_depth(uint8_t dev_id, uint8_t port_id);
+
+/**
+ * Get the enqueue depth configured for the event port designated
+ * by its *port_id* on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   Event port identifier.
+ * @return
+ *   - The configured enqueue depth
+ *
+ * \see rte_event_enqueue_burst()
+ */
+uint8_t
+rte_event_port_enqueue_depth(uint8_t dev_id, uint8_t port_id);
+
+/**
+ * Get the number of ports on a specific event device
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @return
+ *   - The number of configured ports
+ */
+uint8_t
+rte_event_port_count(uint8_t dev_id);
+
+/**
+ * Start an event device.
+ *
+ * The device start step is the last one and consists of setting the event
+ * queues to start accepting events and scheduling them to event ports.
+ *
+ * On success, all basic functions exported by the API (event enqueue,
+ * event dequeue and so on) can be invoked.
+ *
+ * @param dev_id
+ *   Event device identifier
+ * @return
+ *   - 0: Success, device started.
+ *   - <0: Error code of the driver device start function.
+ */
+int
+rte_event_dev_start(uint8_t dev_id);
+
+/**
+ * Stop an event device. The device can be restarted with a call to
+ * rte_event_dev_start()
+ *
+ * @param dev_id
+ *   Event device identifier.
+ */
+void
+rte_event_dev_stop(uint8_t dev_id);
+
+/**
+ * Close an event device. The device cannot be restarted!
+ *
+ * @param dev_id
+ *   Event device identifier
+ *
+ * @return
+ *  - 0 on successfully closing device
+ *  - <0 on failure to close device
+ *  - (-EAGAIN) if device is busy
+ */
+int
+rte_event_dev_close(uint8_t dev_id);
+
+/* Scheduler type definitions */
+#define RTE_SCHED_TYPE_ORDERED          0
+/**< Ordered scheduling
+ *
+ * Events from an ordered flow of an event queue can be scheduled to multiple
+ * ports for concurrent processing while maintaining the original event order.
+ * This scheme enables the user to achieve high single flow throughput by
+ * avoiding SW synchronization for ordering between ports bound to cores.
+ *
+ * The source flow ordering from an event queue is maintained when events are
+ * enqueued to their destination queue within the same ordered flow context.
+ * An event port holds the context until the application calls
+ * rte_event_dequeue() from the same port, which implicitly releases the
+ * context. The user may allow the scheduler to release the context earlier
+ * than that by invoking rte_event_enqueue() with the RTE_EVENT_OP_RELEASE
+ * operation.
+ *
+ * Events from the source queue appear in their original order when dequeued
+ * from a destination queue.
+ * Event ordering is based on the received event(s), but also other
+ * (newly allocated or stored) events are ordered when enqueued within the same
+ * ordered context. Events not enqueued (e.g. released or stored) within the
+ * context are considered missing from reordering and are skipped at this time
+ * (but can be ordered again within another context).
+ *
+ * \see rte_event_queue_setup(), rte_event_dequeue(), RTE_EVENT_OP_RELEASE
+ */
+
+#define RTE_SCHED_TYPE_ATOMIC           1
+/**< Atomic scheduling
+ *
+ * Events from an atomic flow of an event queue can be scheduled only to a
+ * single port at a time. The port is guaranteed to have exclusive (atomic)
+ * access to the associated flow context, which enables the user to avoid SW
+ * synchronization. Atomic flows also help to maintain event ordering
+ * since only one port at a time can process events from a flow of an
+ * event queue.
+ *
+ * The atomic queue synchronization context is dedicated to the port until
+ * the application calls rte_event_dequeue() from the same port, which
+ * implicitly releases the context. The user may allow the scheduler to
+ * release the context earlier than that by invoking rte_event_enqueue() with
+ * RTE_EVENT_OP_RELEASE operation.
+ *
+ * \see rte_event_queue_setup(), rte_event_dequeue(), RTE_EVENT_OP_RELEASE
+ */
+
+#define RTE_SCHED_TYPE_PARALLEL         2
+/**< Parallel scheduling
+ *
+ * The scheduler performs priority scheduling, load balancing, etc. functions
+ * but does not provide additional event synchronization or ordering.
+ * It is free to schedule events from a single parallel flow of an event queue
+ * to multiple event ports for concurrent processing.
+ * The application is responsible for flow context synchronization and
+ * event ordering (SW synchronization).
+ *
+ * \see rte_event_queue_setup(), rte_event_dequeue()
+ */
+
+/* Event types to classify the event source */
+#define RTE_EVENT_TYPE_ETHDEV           0x0
+/**< The event generated from ethdev subsystem */
+#define RTE_EVENT_TYPE_CRYPTODEV        0x1
+/**< The event generated from cryptodev subsystem */
+#define RTE_EVENT_TYPE_TIMERDEV         0x2
+/**< The event generated from timerdev subsystem */
+#define RTE_EVENT_TYPE_CORE             0x3
+/**< The event generated from a core.
+ * The application may use *sub_event_type* to further classify the event
+ */
+#define RTE_EVENT_TYPE_MAX              0x10
+/**< Maximum number of event types */
+
+/* Event priority */
+#define RTE_EVENT_PRIORITY_HIGHEST      0
+/**< Highest event priority */
+#define RTE_EVENT_PRIORITY_NORMAL       128
+/**< Normal event priority */
+#define RTE_EVENT_PRIORITY_LOWEST       255
+/**< Lowest event priority */
+
+/* Event enqueue operations */
+#define RTE_EVENT_OP_NEW                0
+/**< New event without previous context */
+#define RTE_EVENT_OP_FORWARD            1
+/**< Re-enqueue previously dequeued event */
+#define RTE_EVENT_OP_RELEASE            2
+/**
+ * Release the flow context associated with the schedule type.
+ *
+ * If the current flow's scheduling type is *RTE_SCHED_TYPE_ATOMIC*
+ * then this operation hints the scheduler that the user has completed
+ * critical section processing in the current atomic context.
+ * The scheduler is now allowed to schedule events from the same flow from
+ * an event queue to another port. However, the context may be still held
+ * until the next rte_event_dequeue() or rte_event_dequeue_burst() call, this
+ * call allows but does not force the scheduler to release the context early.
+ *
+ * Early atomic context release may increase parallelism and thus system
+ * performance, but the user needs to design carefully the split into critical
+ * vs non-critical sections.
+ *
+ * If the current flow's scheduling type is *RTE_SCHED_TYPE_ORDERED*
+ * then this operation hints the scheduler that the user has done all that is
+ * needed to maintain event order in the current ordered context.
+ * The scheduler is allowed to release the ordered context of this port and
+ * avoid reordering any following enqueues.
+ *
+ * Early ordered context release may increase parallelism and thus system
+ * performance.
+ *
+ * If the current flow's scheduling type is *RTE_SCHED_TYPE_PARALLEL*
+ * or no scheduling context is held, then this operation may be a NOOP,
+ * depending on the implementation.
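+ *
+ * A minimal sketch of an early release (do_critical_section() and
+ * do_non_critical_work() are hypothetical application stages; error
+ * handling omitted):
+ * \code{.c}
+ *	struct rte_event ev;
+ *
+ *	if (rte_event_dequeue(dev_id, port_id, &ev, 0)) {
+ *		do_critical_section(ev.mbuf);
+ *		ev.operation = RTE_EVENT_OP_RELEASE;
+ *		rte_event_enqueue(dev_id, port_id, &ev);
+ *		do_non_critical_work(ev.mbuf);
+ *	}
+ * \endcode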
+ *
+ */
+
+/**
+ * The generic *rte_event* structure to hold the event attributes
+ * for dequeue and enqueue operations
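+ *
+ * A sketch of building and enqueuing a new CPU-generated event (*m* is a
+ * previously allocated mbuf, *flow_hash* a hypothetical flow hash, and the
+ * identifiers are illustrative):
+ * \code{.c}
+ *	struct rte_event ev = {0};
+ *
+ *	ev.operation = RTE_EVENT_OP_NEW;
+ *	ev.queue_id = 0;
+ *	ev.flow_id = flow_hash & 0xFFFFFF; // 24-bit flow identifier
+ *	ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
+ *	ev.event_type = RTE_EVENT_TYPE_CORE;
+ *	ev.priority = RTE_EVENT_PRIORITY_NORMAL;
+ *	ev.mbuf = m;
+ *	rte_event_enqueue(dev_id, port_id, &ev);
+ * \endcode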
+ */
+struct rte_event {
+	/** WORD0 */
+	RTE_STD_C11
+	union {
+		uint64_t event;
+		/** Event attributes for dequeue or enqueue operation */
+		struct {
+			uint32_t flow_id:24;
+			/**< Targeted flow identifier for the enqueue and
+			 * dequeue operation.
+			 * The value must be in the range of
+			 * [0, nb_event_queue_flows - 1] which was
+			 * previously supplied to rte_event_dev_configure().
+			 */
+			uint32_t operation:6;
+			/**< The type of event being enqueued - new/forward/etc
+			 *  This field is not preserved across an instance and
+			 *  is undefined on dequeue.
+			 */
+			uint8_t queue_id:8;
+			/**< Targeted event queue identifier for the enqueue or
+			 * dequeue operation.
+			 * The value must be in the range of
+			 * [0, nb_event_queues - 1] which was previously supplied to
+			 * rte_event_dev_configure().
+			 */
+			uint8_t  sched_type;
+			/**< Scheduler synchronization type (RTE_SCHED_TYPE_)
+			 * associated with flow id on a given event queue
+			 * for the enqueue and dequeue operation.
+			 */
+			uint8_t  event_type;
+			/**< Event type to classify the event source. */
+			uint8_t  sub_event_type;
+			/**< Sub-event types based on the event source.
+			 * \see RTE_EVENT_TYPE_CORE
+			 */
+			uint8_t  priority;
+			/**< Event priority relative to other events in the
+			 * event queue. The requested priority should be in the
+			 * range of [RTE_EVENT_PRIORITY_HIGHEST,
+			 * RTE_EVENT_PRIORITY_LOWEST].
+			 * The implementation shall normalize the requested
+			 * priority to the supported priority value.
+			 * Valid when the device has RTE_EVENT_DEV_CAP_EVENT_QOS
+			 * capability.
+			 */
+		};
+	};
+	/** WORD1 */
+	RTE_STD_C11
+	union {
+		uintptr_t event_ptr;
+		/**< Opaque event pointer */
+		struct rte_mbuf *mbuf;
+		/**< mbuf pointer if dequeued event is associated with mbuf */
+	};
+};
+
+typedef int (*event_schedule_t)(void);
+/**< @internal Schedule one or more events in the event dev. */
+
+typedef int (*event_enqueue_t)(void *port, struct rte_event *ev);
+/**< @internal Enqueue event on port of a device */
+
+typedef uint16_t (*event_enqueue_burst_t)(void *port, struct rte_event ev[],
+		uint16_t nb_events);
+/**< @internal Enqueue burst of events on port of a device */
+
+typedef bool (*event_dequeue_t)(void *port, struct rte_event *ev,
+		uint64_t wait);
+/**< @internal Dequeue event from port of a device */
+
+typedef uint16_t (*event_dequeue_burst_t)(void *port, struct rte_event ev[],
+		uint16_t nb_events, uint64_t wait);
+/**< @internal Dequeue burst of events from port of a device */
+
+struct rte_eventdev_driver;
+struct rte_eventdev_ops;
+
+#define RTE_EVENTDEV_NAME_MAX_LEN	(64)
+/**< @internal Max length of name of event PMD */
+
+/**
+ * @internal
+ * The data part, with no function pointers, associated with each device.
+ *
+ * This structure is safe to place in shared memory to be common among
+ * different processes in a multi-process configuration.
+ */
+struct rte_eventdev_data {
+	int socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t dev_id;
+	/**< Device ID for this instance */
+	uint8_t nb_queues;
+	/**< Number of event queues. */
+	uint8_t nb_ports;
+	/**< Number of event ports. */
+	void **ports;
+	/**< Array of pointers to ports. */
+	uint8_t *ports_dequeue_depth;
+	/**< Array of port dequeue depth. */
+	uint8_t *ports_enqueue_depth;
+	/**< Array of port enqueue depth. */
+	void **queues;
+	/**< Array of pointers to queues. */
+	uint8_t *queues_prio;
+	/**< Array of queue priority. */
+	uint16_t *links_map;
+	/**< Memory to store queues to port connections. */
+	void *dev_private;
+	/**< PMD-specific private data */
+	uint32_t event_dev_cap;
+	/**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/
+	struct rte_event_dev_config dev_conf;
+	/**< Configuration applied to device. */
+
+	RTE_STD_C11
+	uint8_t dev_started : 1;
+	/**< Device state: STARTED(1)/STOPPED(0) */
+
+	char name[RTE_EVENTDEV_NAME_MAX_LEN];
+	/**< Unique identifier name */
+} __rte_cache_aligned;
+
+
+/** @internal The data structure associated with each event device. */
+struct rte_eventdev {
+	event_schedule_t schedule;
+	/**< Pointer to PMD schedule function. */
+	event_enqueue_t enqueue;
+	/**< Pointer to PMD enqueue function. */
+	event_enqueue_burst_t enqueue_burst;
+	/**< Pointer to PMD enqueue burst function. */
+	event_dequeue_t dequeue;
+	/**< Pointer to PMD dequeue function. */
+	event_dequeue_burst_t dequeue_burst;
+	/**< Pointer to PMD dequeue burst function. */
+
+	struct rte_eventdev_data *data;
+	/**< Pointer to device data */
+	const struct rte_eventdev_ops *dev_ops;
+	/**< Functions exported by PMD */
+	struct rte_pci_device *pci_dev;
+	/**< PCI info. supplied by probing */
+	const struct rte_eventdev_driver *driver;
+	/**< Driver for this device */
+
+	RTE_STD_C11
+	uint8_t attached : 1;
+	/**< Flag indicating the device is attached */
+} __rte_cache_aligned;
+
+extern struct rte_eventdev *rte_eventdevs;
+/** @internal The pool of rte_eventdev structures. */
+
+
+/**
+ * Schedule one or more events in the event dev.
+ *
+ * An event dev implementation may define this as a NOOP, for instance if
+ * the event dev performs its scheduling in hardware.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ */
+static inline void
+rte_event_schedule(uint8_t dev_id)
+{
+	struct rte_eventdev *dev = &rte_eventdevs[dev_id];
+	if (dev->schedule)
+		(*dev->schedule)();
+}
+
+/**
+ * Enqueue the event object supplied in the *rte_event* structure on an
+ * event device designated by its *dev_id* through the event port specified by
+ * *port_id*. The event object specifies the event queue on which this
+ * event will be enqueued.
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param ev
+ *   Pointer to struct rte_event
+ *
+ * @return
+ *  - 0 on success
+ *  - <0 on failure. Failure can occur if the event port's output queue is
+ *     backpressured, for instance.
+ */
+static inline int
+rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
+{
+	struct rte_eventdev *dev = &rte_eventdevs[dev_id];
+
+	return (*dev->enqueue)(
+			dev->data->ports[port_id], ev);
+}
+
+/**
+ * Enqueue a burst of event objects supplied in *rte_event* structures on an
+ * event device designated by its *dev_id* through the event port specified by
+ * *port_id*. Each event object specifies the event queue on which it
+ * will be enqueued.
+ *
+ * The rte_event_enqueue_burst() function is invoked to enqueue
+ * multiple event objects.
+ * It is the burst variant of rte_event_enqueue() function.
+ *
+ * The *nb_events* parameter is the number of event objects to enqueue which are
+ * supplied in the *ev* array of *rte_event* structure.
+ *
+ * The rte_event_enqueue_burst() function returns the number of
+ * event objects it actually enqueued. A return value equal to *nb_events*
+ * means that all event objects have been enqueued.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param ev
+ *   Points to an array of *nb_events* objects of type *rte_event* structure
+ *   which contain the event object enqueue operations to be processed.
+ * @param nb_events
+ *   The number of event objects to enqueue, typically number of
+ *   rte_event_port_enqueue_depth() available for this port.
+ *
+ * @return
+ *   The number of event objects actually enqueued on the event device. The
+ *   return value can be less than the value of the *nb_events* parameter when
+ *   the event device's queue is full or if invalid parameters are specified in
+ *   a *rte_event*. If the return value is less than *nb_events*, the remaining
+ *   events at the end of ev[] are not consumed, and the caller has to take
+ *   care of them.
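+ *
+ *   As a sketch, a caller that must enqueue all *nb_rx* events (both *ev[]*
+ *   and *nb_rx* are hypothetical caller-side names) can retry the remainder:
+ * \code{.c}
+ *	uint16_t sent = 0;
+ *
+ *	while (sent < nb_rx)
+ *		sent += rte_event_enqueue_burst(dev_id, port_id,
+ *						&ev[sent], nb_rx - sent);
+ * \endcode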
+ *
+ * \see rte_event_enqueue(), rte_event_port_enqueue_depth()
+ */
+static inline uint16_t
+rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
+			uint16_t nb_events)
+{
+	struct rte_eventdev *dev = &rte_eventdevs[dev_id];
+
+	return (*dev->enqueue_burst)(
+			dev->data->ports[port_id], ev, nb_events);
+}
+
+/**
+ * Converts nanoseconds to *wait* value for rte_event_dequeue()
+ *
+ * If the device is configured with the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag
+ * then the application can use this function to convert a wait value in
+ * nanoseconds to the implementation-specific wait value supplied in
+ * rte_event_dequeue()
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param ns
+ *   Wait time in nanosecond
+ * @param[out] wait_ticks
+ *   Value for the *wait* parameter in rte_event_dequeue() function
+ *
+ * @return
+ *  - 0 on success.
+ *  - <0 on failure.
+ *
+ * \see rte_event_dequeue(), RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
+ * \see rte_event_dev_configure()
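+ *
+ * A short sketch (a 10 us wait is illustrative; *ev* is declared by the
+ * caller and the return code of the dequeue is unchecked):
+ * \code{.c}
+ *	uint64_t ticks;
+ *	struct rte_event ev;
+ *
+ *	if (rte_event_dequeue_wait_time(dev_id, 10 * 1000, &ticks) == 0)
+ *		rte_event_dequeue(dev_id, port_id, &ev, ticks);
+ * \endcode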
+ *
+ */
+extern int
+rte_event_dequeue_wait_time(uint8_t dev_id, uint64_t ns, uint64_t *wait_ticks);
+
+/**
+ * Dequeue an event from the event port specified by *port_id* on the
+ * event device designated by its *dev_id*.
+ *
+ * rte_event_dequeue() does not dictate the specifics of the scheduling
+ * algorithm, as each eventdev driver may have different criteria for
+ * scheduling an event. However, in general, from an application perspective
+ * the scheduler may use the following scheme to dispatch an event to the port.
+ *
+ * 1) Selection of event queue based on
+ *   a) The list of event queues linked to the event port.
+ *   b) If the device has the RTE_EVENT_DEV_CAP_QUEUE_QOS capability, then the
+ *   event queue selection from the list is based on the event queue priority
+ *   relative to other event queues, supplied as *priority* in
+ *   rte_event_queue_setup()
+ *   c) If the device has the RTE_EVENT_DEV_CAP_EVENT_QOS capability, then the
+ *   event queue selection from the list is based on the event priority
+ *   supplied as *priority* in rte_event_enqueue_burst()
+ * 2) Selection of event
+ *   a) The number of flows available in the selected event queue.
+ *   b) The schedule type method associated with the event
+ *
+ * On a successful dequeue, the event port holds flow id and schedule type
+ * context associated with the dispatched event. The context is automatically
+ * released in the next rte_event_dequeue() invocation, or invoking
+ * rte_event_enqueue() with RTE_EVENT_OP_RELEASE operation can be used
+ * to release the context early.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   Pointer to struct rte_event. On successful event dispatch, implementation
+ *   updates the event attributes.
+ *
+ * @param wait
+ *   0 - no-wait, returns immediately if there is no event.
+ *   >0 - wait for the event. If the device is configured with
+ *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT, then this function will wait until
+ *   an event is available or until *wait* time has elapsed;
+ *   if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT,
+ *   then this function will wait until an event is available or until the
+ *   *dequeue_wait_ns* ns which was previously supplied to
+ *   rte_event_dev_configure() has elapsed
+ *
+ * @return
+ * When true, a valid event has been dispatched by the scheduler.
+ *
+ */
+static inline bool
+rte_event_dequeue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev,
+		  uint64_t wait)
+{
+	struct rte_eventdev *dev = &rte_eventdevs[dev_id];
+
+	return (*dev->dequeue)(
+			dev->data->ports[port_id], ev, wait);
+}
+
+/**
+ * Dequeue a burst of event objects from the event port designated by its
+ * *port_id*, on an event device designated by its *dev_id*.
+ *
+ * The rte_event_dequeue_burst() function is invoked to dequeue
+ * multiple event objects. It is the burst variant of rte_event_dequeue()
+ * function.
+ *
+ * The *nb_events* parameter is the maximum number of event objects to dequeue
+ * which are returned in the *ev* array of *rte_event* structure.
+ *
+ * The rte_event_dequeue_burst() function returns the number of
+ * event objects it actually dequeued. A return value equal to
+ * *nb_events* means that all event objects have been dequeued.
+ *
+ * The number of events dequeued is the number of scheduler contexts held by
+ * this port. These contexts are automatically released in the next
+ * rte_event_dequeue() invocation, or invoking rte_event_enqueue() with
+ * RTE_EVENT_OP_RELEASE operation can be used to release the contexts early.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   Points to an array of *nb_events* objects of type *rte_event* structure
+ *   for output to be populated with the dequeued event objects.
+ * @param nb_events
+ *   The maximum number of event objects to dequeue, typically number of
+ *   rte_event_port_dequeue_depth() available for this port.
+ *
+ * @param wait
+ *   0 - no-wait, returns immediately if there is no event.
+ *   >0 - wait for the event. If the device is configured with
+ *   RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT, then this function will wait until
+ *   an event is available or until *wait* time has elapsed;
+ *   if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT,
+ *   then this function will wait until an event is available or until the
+ *   *dequeue_wait_ns* ns which was previously supplied to
+ *   rte_event_dev_configure() has elapsed
+ *
+ * @return
+ * The number of event objects actually dequeued from the port. The return
+ * value can be less than the value of the *nb_events* parameter when
+ * fewer than *nb_events* events are available on the event port.
+ *
+ * \see rte_event_dequeue(), rte_event_port_dequeue_depth()
+ */
+static inline uint16_t
+rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
+			uint16_t nb_events, uint64_t wait)
+{
+	struct rte_eventdev *dev = &rte_eventdevs[dev_id];
+
+	return (*dev->dequeue_burst)(
+			dev->data->ports[port_id], ev, nb_events, wait);
+}
+
+#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST  0
+/**< Highest event queue servicing priority */
+#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL   128
+/**< Normal event queue servicing priority */
+#define RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST   255
+/**< Lowest event queue servicing priority */
+
+/** Structure to hold the queue to port link establishment attributes */
+struct rte_event_queue_link {
+	uint8_t queue_id;
+	/**< Event queue identifier to select the source queue to link */
+	uint8_t priority;
+	/**< The priority of the event queue for this event port.
+	 * The priority defines the event port's servicing priority for the
+	 * event queue, which may be ignored by an implementation.
+	 * The requested priority should be in the range of
+	 * [RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST,
+	 * RTE_EVENT_QUEUE_SERVICE_PRIORITY_LOWEST].
+	 * The implementation shall normalize the requested priority to
+	 * implementation supported priority value.
+	 */
+};
+
+/**
+ * Link multiple source event queues supplied in *rte_event_queue_link*
+ * structure as *queue_id* to the destination event port designated by its
+ * *port_id* on the event device designated by its *dev_id*.
+ *
+ * The link establishment shall enable the event port *port_id* to
+ * receive events from the specified event queue *queue_id*
+ *
+ * An event queue may link to one or more event ports.
+ * The number of links that can be established from an event queue to an event
+ * port is implementation defined.
+ *
+ * Event queue(s) to event port link establishment can be changed at runtime
+ * without re-configuring the device to support scaling and to reduce the
+ * latency of critical work by establishing the link with more event ports
+ * at runtime.
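+ *
+ * A linking sketch (queue identifiers and priorities are illustrative;
+ * handle_partial_links() is a hypothetical error path):
+ * \code{.c}
+ *	struct rte_event_queue_link qlink[2] = {
+ *		{ .queue_id = 0,
+ *		  .priority = RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST },
+ *		{ .queue_id = 1,
+ *		  .priority = RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL },
+ *	};
+ *	int n = rte_event_port_link(dev_id, port_id, qlink, 2);
+ *
+ *	if (n < 2)
+ *		handle_partial_links(n);
+ * \endcode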
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param port_id
+ *   Event port identifier to select the destination port to link.
+ *
+ * @param link
+ *   Points to an array of *nb_links* objects of type *rte_event_queue_link*
+ *   structure which contain the event queue to event port link establishment
+ *   attributes.
+ *   NULL value is allowed, in which case this function links all the
+ *   *nb_event_queues* event queues which were previously supplied to
+ *   rte_event_dev_configure() to the event port *port_id* with normal servicing
+ *   priority (RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL).
+ *
+ * @param nb_links
+ *   The number of links to establish
+ *
+ * @return
+ * The number of links actually established. The return value can be less than
+ * the value of the *nb_links* parameter when the implementation has a
+ * limitation on specific queue to port link establishment or if invalid
+ * parameters are specified in a *rte_event_queue_link*.
+ * If the return value is less than *nb_links*, the remaining links at the end
+ * of link[] are not established, and the caller has to take care of them.
+ * If the return value is less than *nb_links*, the implementation shall
+ * update rte_errno accordingly. Possible rte_errno values are:
+ * (-EDQUOT) Quota exceeded (the application tried to link a queue configured
+ *  with RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER to more than one event port)
+ * (-EINVAL) Invalid parameter
+ *
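+ * A sketch of linking two queues with different servicing priorities
+ * (assumes both queues were set up via rte_event_dev_configure(); error
+ * handling beyond the partial-link check is omitted):
+ * @code
+ *	struct rte_event_queue_link lk[2] = {
+ *		{ .queue_id = 0,
+ *		  .priority = RTE_EVENT_QUEUE_SERVICE_PRIORITY_HIGHEST },
+ *		{ .queue_id = 1,
+ *		  .priority = RTE_EVENT_QUEUE_SERVICE_PRIORITY_NORMAL },
+ *	};
+ *	int ret = rte_event_port_link(dev_id, port_id, lk, 2);
+ *
+ *	if (ret < 2)
+ *		printf("partial link: %d established, rte_errno=%d\n",
+ *			ret, rte_errno);
+ * @endcode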
+ */
+int
+rte_event_port_link(uint8_t dev_id, uint8_t port_id,
+		    struct rte_event_queue_link link[], uint16_t nb_links);
+
+/**
+ * Unlink multiple source event queues supplied in *queues* from the destination
+ * event port designated by its *port_id* on the event device designated
+ * by its *dev_id*.
+ *
+ * The unlink operation shall disable the event port *port_id* from receiving
+ * events from the specified event queues.
+ *
+ * Event queue to event port unlinks can be performed at runtime without
+ * re-configuring the device.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param port_id
+ *   Event port identifier to select the destination port to unlink.
+ *
+ * @param queues
+ *   Points to an array of *nb_unlinks* event queues to be unlinked
+ *   from the event port.
+ *   A NULL value is allowed, in which case this function unlinks all the
+ *   event queues currently linked to the event port *port_id*.
+ *
+ * @param nb_unlinks
+ *   The number of unlinks to establish
+ *
+ * @return
+ * The number of unlinks actually performed. The return value can be less
+ * than the value of the *nb_unlinks* parameter when the implementation has a
+ * limitation on specific queue to port unlink establishment or if invalid
+ * parameters are specified.
+ * If the return value is less than *nb_unlinks*, the remaining queues at the
+ * end of queues[] are not unlinked, and the caller has to take care of them.
+ * If the return value is less than *nb_unlinks*, the implementation shall
+ * update rte_errno accordingly. Possible rte_errno values are:
+ * (-EINVAL) Invalid parameter
+ *
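+ * A sketch of unlinking a single queue (queue 0 here, purely as an example)
+ * from a port:
+ * @code
+ *	uint8_t q = 0;
+ *
+ *	if (rte_event_port_unlink(dev_id, port_id, &q, 1) != 1)
+ *		printf("unlink failed, rte_errno=%d\n", rte_errno);
+ * @endcode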
+ */
+int
+rte_event_port_unlink(uint8_t dev_id, uint8_t port_id,
+		      uint8_t queues[], uint16_t nb_unlinks);
+
+/**
+ * Retrieve the list of source event queues and their associated attributes
+ * linked to the destination event port designated by its *port_id*
+ * on the event device designated by its *dev_id*.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param port_id
+ *   Event port identifier.
+ *
+ * @param[out] link
+ *   Points to an array of *rte_event_queue_link* structures for output.
+ *   The caller has to allocate an array of *RTE_EVENT_MAX_QUEUES_PER_DEV*
+ *   *rte_event_queue_link* structures to store the event queue to event port
+ *   link establishment attributes.
+ *
+ * @return
+ * The number of links established on the event port designated by its
+ * *port_id*.
+ * - <0 on failure.
+ *
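+ * A sketch of printing a port's current links (with a caller-allocated
+ * array sized as described above):
+ * @code
+ *	struct rte_event_queue_link lk[RTE_EVENT_MAX_QUEUES_PER_DEV];
+ *	int i, nb;
+ *
+ *	nb = rte_event_port_links_get(dev_id, port_id, lk);
+ *	for (i = 0; i < nb; i++)
+ *		printf("queue %d -> port %d, priority %d\n",
+ *			lk[i].queue_id, port_id, lk[i].priority);
+ * @endcode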
+ */
+int
+rte_event_port_links_get(uint8_t dev_id, uint8_t port_id,
+			struct rte_event_queue_link link[]);
+
+/**
+ * Dump internal information about *dev_id* to the FILE* provided in *f*.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param f
+ *   A pointer to a file for output
+ *
+ * @return
+ *   - 0: on success
+ *   - <0: on failure.
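+ *
+ * For example, to dump the device state to the standard output:
+ * @code
+ *	rte_event_dev_dump(dev_id, stdout);
+ * @endcode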
+ */
+int
+rte_event_dev_dump(uint8_t dev_id, FILE *f);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_EVENTDEV_H_ */