[dpdk-dev,V9,1/5] eal: add uevent monitor api and callback func

Message ID 1515575544-2141-2-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Guo, Jia Jan. 10, 2018, 9:12 a.m. UTC
  This patch aim to add a general uevent mechanism in eal device layer,
to enable all linux kernel object uevent monitoring, user could use these
APIs to monitor and read out the device status info that sent from the
kernel side, then corresponding to handle it, such as when detect hotplug
uevent type, user could detach or attach the device, and more it benefit
to use to do smoothly fail safe work.

About uevent monitoring:
a: add one epolling to poll the netlink socket, to monitor the uevent of
   the device.
b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
c: add below APIs in rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_monitor_start
   rte_dev_monitor_stop

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
split the patch set into small and explicit patch
---
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  36 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          |  39 ++++
 lib/librte_eal/common/eal_common_dev.c             | 150 ++++++++++++++
 lib/librte_eal/common/include/rte_dev.h            |  98 +++++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 223 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        |  39 ++++
 7 files changed, 587 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
  

Comments

Stephen Hemminger Jan. 10, 2018, 4:34 p.m. UTC | #1
On Wed, 10 Jan 2018 17:12:20 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +static int
> +dev_monitor_fd_new(void)
> +{
> +
> +	int uevent_fd;
> +
> +	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (uevent_fd < 0) {

If you used a blocking socket, then epoll would not be necessary.
There is one netlink socket for whole system, and the thread is only
reading from one fd.
  
Thomas Monjalon Jan. 11, 2018, 1:43 a.m. UTC | #2
Hi,

Thanks for splitting the patches.
I will review the first one today. Please see below.

10/01/2018 10:12, Jeff Guo:
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> +int
> +rte_dev_monitor_start(void)
> +{
> +	return -1;
> +}
> +
> +int
> +rte_dev_monitor_stop(void)
> +{
> +	return -1;
> +}

You should add a log to show it is not supported.

> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
> +#ifndef _RTE_DEV_H_
> +#error "don't include this file directly, please include generic <rte_dev.h>"
> +#endif

Why creating different rte_dev.h for BSD and Linux?
This is an API, it should be the same.

> +/**
> + * Start the device uevent monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_monitor_start(void);
> +
> +/**
> + * Stop the device uevent monitoring .
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +
> +int
> +rte_dev_monitor_stop(void);

> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -42,9 +42,32 @@
>  #include <rte_devargs.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> +#include <rte_spinlock.h>
> +#include <rte_malloc.h>
>  
>  #include "eal_private.h"
>  
> +/* spinlock for device callbacks */
> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;

Please rename to rte_dev_event_lock.
Let's use rte_dev_event_ prefix consistently.

> + * The user application callback description.
> + *
> + * It contains callback address to be registered by user application,
> + * the pointer to the parameters for callback, and the event type.
> + */
> +struct rte_eal_dev_callback {

Rename to rte_dev_event?

> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */

Rename to rte_dev_event_callback?

> +	void *cb_arg;                           /**< Parameter for callback */

Comment should be about opaque context.

> +	void *ret_param;                        /**< Return parameter */
> +	enum rte_dev_event_type event;      /**< device event type */
> +	uint32_t active;                        /**< Callback is executing */

Why active is needed?

> +};
> +
> +/* A genaral callback for all new devices be added onto the bus */
> +static struct rte_eal_dev_callback *dev_add_cb;

It should not be a different callback for new devices.
You must allow registering the callback for all and new devices.
Please look how it's done for ethdev:
	https://dpdk.org/patch/32900/

> +int
> +rte_dev_callback_register(struct rte_device *device,
> +			enum rte_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
> +{

Why passing an event type at registration?
I think the event processing dispatch must be done in the callback,
not at registration.

> +		/* allocate a new interrupt callback entity */
> +		user_cb = rte_zmalloc("eal device event",
> +					sizeof(*user_cb), 0);

No need to use rte_malloc here.
Please check this callback API patch:
	https://dpdk.org/patch/33144/

> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> +enum uev_monitor_netlink_group {
> +	UEV_MONITOR_KERNEL,
> +	UEV_MONITOR_UDEV,
> +};

Please keep a namespace prefix like RTE_DEV_EVENT_ (same for enum name).
Some comments are missing for these constants.

> +/**
> + * The device event type.
> + */
> +enum rte_dev_event_type {
> +	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
> +	RTE_DEV_EVENT_ADD,	/**< device being added */
> +	RTE_DEV_EVENT_REMOVE,
> +				/**< device being removed */
> +	RTE_DEV_EVENT_CHANGE,
> +				/**< device status being changed,
> +				 * etc charger percent
> +				 */

What means status changed?
What means charger percent?

> +	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */

sysfs is Linux specific

> +	RTE_DEV_EVENT_ONLINE,	/**< device being enable */

You mean a device can be added but not enabled?
So enabling is switching it on by a register? or something else?

> +	RTE_DEV_EVENT_OFFLINE,	/**< device being disable */
> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
> +};
> +
> +struct rte_eal_uevent {
> +	enum rte_dev_event_type type;	/**< device event type */
> +	int subsystem;				/**< subsystem id */
> +	char *devname;				/**< device name */
> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
> +};

I don't understand why this struct is exposed in the public API.
Please rename from rte_eal_ to rte_dev_.

> @@ -166,6 +204,8 @@ struct rte_device {
>  	const struct rte_driver *driver;/**< Associated driver */
>  	int numa_node;                /**< NUMA node connection */
>  	struct rte_devargs *devargs;  /**< Device user arguments */
> +	/** User application callbacks for device event */
> +	struct rte_eal_dev_cb_list uev_cbs;

Do not use uev word in API, it refers to uevent which is implementation
specific. You can name it event_callbacks.

I am afraid this change is breaking the ABI.
For the first time, 18.02 will be ABI stable.

> +/**
> + * It registers the callback for the specific event. Multiple
> + * callbacks cal be registered at the same time.
> + * @param event
> + *  The device event type.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_register(struct rte_device *device,
> +			enum rte_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
> +
> +/**
> + * It unregisters the callback according to the specified event.
> + *
> + * @param event
> + *  The event type which corresponding to the callback.
> + * @param cb_fn
> + *  callback address.
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_unregister(struct rte_device *device,
> +			enum rte_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);

Such new functions should be added as experimental.

There will be probably more to review in this patch.
Let's progress on these comments please.
  
Guo, Jia Jan. 11, 2018, 2:24 p.m. UTC | #3
On 1/11/2018 9:43 AM, Thomas Monjalon wrote:
> Hi,
>
> Thanks for splitting the patches.
> I will review the first one today. Please see below.
>
> 10/01/2018 10:12, Jeff Guo:
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> +int
>> +rte_dev_monitor_start(void)
>> +{
>> +	return -1;
>> +}
>> +
>> +int
>> +rte_dev_monitor_stop(void)
>> +{
>> +	return -1;
>> +}
> You should add a log to show it is not supported.
ok.
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> +#ifndef _RTE_DEV_H_
>> +#error "don't include this file directly, please include generic <rte_dev.h>"
>> +#endif
> Why creating different rte_dev.h for BSD and Linux?
> This is an API, it should be the same.
if no need at this time, combine it to a file.
>> +/**
>> + * Start the device uevent monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_monitor_start(void);
>> +
>> +/**
>> + * Stop the device uevent monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +
>> +int
>> +rte_dev_monitor_stop(void);
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -42,9 +42,32 @@
>>   #include <rte_devargs.h>
>>   #include <rte_debug.h>
>>   #include <rte_log.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_malloc.h>
>>   
>>   #include "eal_private.h"
>>   
>> +/* spinlock for device callbacks */
>> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
> Please rename to rte_dev_event_lock.
> Let's use rte_dev_event_ prefix consistently.
make consistently, agree.
>> + * The user application callback description.
>> + *
>> + * It contains callback address to be registered by user application,
>> + * the pointer to the parameters for callback, and the event type.
>> + */
>> +struct rte_eal_dev_callback {
> Rename to rte_dev_event?
>> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
>> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
> Rename to rte_dev_event_callback?
>> +	void *cb_arg;                           /**< Parameter for callback */
> Comment should be about opaque context.
>> +	void *ret_param;                        /**< Return parameter */
>> +	enum rte_dev_event_type event;      /**< device event type */
>> +	uint32_t active;                        /**< Callback is executing */
> Why active is needed?
avoid the lock when unregistered  callback.
>> +};
>> +
>> +/* A genaral callback for all new devices be added onto the bus */
>> +static struct rte_eal_dev_callback *dev_add_cb;
> It should not be a different callback for new devices.
> You must allow registering the callback for all and new devices.
> Please look how it's done for ethdev:
> 	https://dpdk.org/patch/32900/
the aim to use this special callback is because when new device add onto 
the bus, no device instance to store the callback. i saw ethdev 
solution, that is base on port but that would not make sense in rte 
device layer. so
i try to abandon add callback in rte device, replace of add device name 
into callback , please see my v10 patch.
>> +int
>> +rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
>> +{
> Why passing an event type at registration?
> I think the event processing dispatch must be done in the callback,
> not at registration.
make sense, just register all type for device ,and let eal to pass the 
event.
>> +		/* allocate a new interrupt callback entity */
>> +		user_cb = rte_zmalloc("eal device event",
>> +					sizeof(*user_cb), 0);
> No need to use rte_malloc here.
> Please check this callback API patch:
> 	https://dpdk.org/patch/33144/
could be better to concentration the code. but if you could tell me why 
not use rte_zmalloc.
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> +enum uev_monitor_netlink_group {
>> +	UEV_MONITOR_KERNEL,
>> +	UEV_MONITOR_UDEV,
>> +};
> Please keep a namespace prefix like RTE_DEV_EVENT_ (same for enum name).
> Some comments are missing for these constants.
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_dev_event_type {
>> +	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_DEV_EVENT_ADD,	/**< device being added */
>> +	RTE_DEV_EVENT_REMOVE,
>> +				/**< device being removed */
>> +	RTE_DEV_EVENT_CHANGE,
>> +				/**< device status being changed,
>> +				 * etc charger percent
>> +				 */
> What means status changed?
> What means charger percent?
status changed means that object path change or other more,  charger 
percent just a example for some kobject status.  so i don't think we 
should explicit identify all , i will  delete it until we want to use it.
>> +	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */
> sysfs is Linux specific
>
>> +	RTE_DEV_EVENT_ONLINE,	/**< device being enable */
> You mean a device can be added but not enabled?
> So enabling is switching it on by a register? or something else?
>
>> +	RTE_DEV_EVENT_OFFLINE,	/**< device being disable */
>> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
>> +};
>> +
>> +struct rte_eal_uevent {
>> +	enum rte_dev_event_type type;	/**< device event type */
>> +	int subsystem;				/**< subsystem id */
>> +	char *devname;				/**< device name */
>> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
>> +};
> I don't understand why this struct is exposed in the public API.
> Please rename from rte_eal_ to rte_dev_.
will modify the uevent to event.
>> @@ -166,6 +204,8 @@ struct rte_device {
>>   	const struct rte_driver *driver;/**< Associated driver */
>>   	int numa_node;                /**< NUMA node connection */
>>   	struct rte_devargs *devargs;  /**< Device user arguments */
>> +	/** User application callbacks for device event */
>> +	struct rte_eal_dev_cb_list uev_cbs;
> Do not use uev word in API, it refers to uevent which is implementation
> specific. You can name it event_callbacks.
>
> I am afraid this change is breaking the ABI.
> For the first time, 18.02 will be ABI stable.
will not modify rte device struct in v10.
>> +/**
>> + * It registers the callback for the specific event. Multiple
>> + * callbacks cal be registered at the same time.
>> + * @param event
>> + *  The device event type.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * It unregisters the callback according to the specified event.
>> + *
>> + * @param event
>> + *  The event type which corresponding to the callback.
>> + * @param cb_fn
>> + *  callback address.
>> + *  address of parameter for callback, (void *)-1 means to remove all
>> + *  registered which has the same callback address.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_unregister(struct rte_device *device,
>> +			enum rte_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
> Such new functions should be added as experimental.
>
> There will be probably more to review in this patch.
> Let's progress on these comments please.
thanks for your review!
  

Patch

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..7fdc2c0
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,36 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_monitor_start(void)
+{
+	return -1;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..70413b3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,39 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring .
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..24c410e 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,32 @@ 
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_dev_callback {
+	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
+	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_dev_event_type event;      /**< device event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A genaral callback for all new devices be added onto the bus */
+static struct rte_eal_dev_callback *dev_add_cb;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +257,130 @@  int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_dev_callback *user_cb;
+
+	if (!cb_fn || device == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	if (TAILQ_EMPTY(&(device->uev_cbs)))
+		TAILQ_INIT(&(device->uev_cbs));
+
+	if (event == RTE_DEV_EVENT_ADD) {
+		user_cb = NULL;
+	} else {
+		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
+			if (user_cb->cb_fn == cb_fn &&
+				user_cb->cb_arg == cb_arg &&
+				user_cb->event == event) {
+				break;
+			}
+		}
+	}
+
+	/* create a new callback. */
+	if (user_cb == NULL) {
+		/* allocate a new interrupt callback entity */
+		user_cb = rte_zmalloc("eal device event",
+					sizeof(*user_cb), 0);
+		if (user_cb == NULL) {
+			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			return -ENOMEM;
+		}
+		user_cb->cb_fn = cb_fn;
+		user_cb->cb_arg = cb_arg;
+		user_cb->event = event;
+		if (event == RTE_DEV_EVENT_ADD)
+			dev_add_cb = user_cb;
+		else
+			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb, next);
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return 0;
+}
+
+int
+rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_dev_callback *cb, *next;
+
+	if (!cb_fn || device == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	ret = 0;
+	if (event == RTE_DEV_EVENT_ADD) {
+		rte_free(dev_add_cb);
+		dev_add_cb = NULL;
+	} else {
+		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
+		      cb = next) {
+
+			next = TAILQ_NEXT(cb, next);
+
+			if (cb->cb_fn != cb_fn || cb->event != event ||
+					(cb->cb_arg != (void *)-1 &&
+					cb->cb_arg != cb_arg))
+				continue;
+
+			/*
+			 * if this callback is not executing right now,
+			 * then remove it.
+			 */
+			if (cb->active == 0) {
+				TAILQ_REMOVE(&(device->uev_cbs), cb, next);
+				rte_free(cb);
+			} else {
+				ret = -EAGAIN;
+			}
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_dev_event_type event,
+			void *ret_param)
+{
+	struct rte_eal_dev_callback dev_cb;
+	struct rte_eal_dev_callback *cb_lst;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+	if (event == RTE_DEV_EVENT_ADD) {
+		if (ret_param != NULL)
+			dev_add_cb->ret_param = ret_param;
+
+		rc = dev_add_cb->cb_fn(dev_add_cb->event,
+				dev_add_cb->cb_arg, dev_add_cb->ret_param);
+	} else {
+		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
+			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+				continue;
+			dev_cb = *cb_lst;
+			cb_lst->active = 1;
+			if (ret_param != NULL)
+				dev_cb.ret_param = ret_param;
+			rc = dev_cb.cb_fn(dev_cb.event,
+					dev_cb.cb_arg, dev_cb.ret_param);
+			cb_lst->active = 0;
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..ab12862 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,44 @@  extern "C" {
 
 #include <rte_log.h>
 
+#include <exec-env/rte_dev.h>
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,
+				/**< device being removed */
+	RTE_DEV_EVENT_CHANGE,
+				/**< device status being changed,
+				 * etc charger percent
+				 */
+	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */
+	RTE_DEV_EVENT_ONLINE,	/**< device being enable */
+	RTE_DEV_EVENT_OFFLINE,	/**< device being disable */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+typedef int (*rte_eal_dev_cb_fn)(enum rte_dev_event_type event,
+					void *cb_arg, void *ret_param);
+
+struct rte_eal_dev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -166,6 +204,8 @@  struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	/** User application callbacks for device event */
+	struct rte_eal_dev_cb_list uev_cbs;
 };
 
 /**
@@ -293,4 +333,62 @@  __attribute__((used)) = str
 }
 #endif
 
+/**
+ * It registers the callback for the specific event. Multiple
+ * callbacks cal be registered at the same time.
+ * @param event
+ *  The device event type.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * It unregisters the callback according to the specified event.
+ *
+ * @param event
+ *  The event type which corresponding to the callback.
+ * @param cb_fn
+ *  callback address.
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param event
+ *  The device event type.
+ * @param cb_arg
+ *  callback parameter.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_dev_event_type event,
+			void *ret_param);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 588c0bd..b9f5d31 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
@@ -92,7 +93,7 @@  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
 endif
 
-INC := rte_kni_common.h
+INC := rte_kni_common.h rte_dev.h
 
 SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
 	$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..4812dbc
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,223 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+
+bool service_no_init = true;
+
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error addding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+				continue;
+			}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	uint32_t id;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore_1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32,
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service");
+		return ret;
+	}
+	ret = rte_service_component_runstate_set(id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service");
+		return ret;
+	}
+	rte_service_lcore_start(slcore_1);
+	service_no_init = false;
+	return 0;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	service_exit = true;
+	service_no_init = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..70413b3
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,39 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring .
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */