[dpdk-dev,v3,2/3] eal: add synchronous multi-process communication

Message ID 1516853783-108023-3-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation fail Compilation issues

Commit Message

Jianfeng Tan Jan. 25, 2018, 4:16 a.m. UTC
  We need the synchronous way for multi-process communication,
i.e., blockingly waiting for reply message when we send a request
to the peer process.

We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
such use case. By invoking rte_eal_mp_request(), a request message
is sent out, and then it waits there for a reply message. The caller
can specify the timeout. And the response messages will be collected
and returned so that the caller can decide how to translate them.

The API rte_eal_mp_reply() is always called by an mp action handler.
Here we add another parameter for rte_eal_mp_t so that the action
handler knows which peer address to reply.

       sender-process                receiver-process
   ----------------------            ----------------

    thread-n
     |_rte_eal_mp_request() ----------> mp-thread
        |_timedwait()                    |_process_msg()
                                           |_action()
                                               |_rte_eal_mp_reply()
	        mp_thread  <---------------------|
                  |_process_msg()
                     |_signal(send_thread)
    thread-m <----------|
     |_collect-reply

 * A secondary process is only allowed to talk to the primary process.
 * If there are multiple secondary processes for the primary proces,
   it will send request to peer1, collect response from peer1; then
   send request to peer2, collect reponse from peer2, and so on.
 * When thread-n is sending request, thread-m of that process can send
   request at the same time.
 * For pair <action_name, peer>, we guarantee that only one such request
   is on the fly.

Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 doc/guides/rel_notes/release_18_02.rst  |  15 ++
 lib/librte_eal/common/eal_common_proc.c | 237 +++++++++++++++++++++++++++++---
 lib/librte_eal/common/include/rte_eal.h |  58 +++++++-
 lib/librte_eal/rte_eal_version.map      |   3 +
 4 files changed, 295 insertions(+), 18 deletions(-)
  

Comments

Anatoly Burakov Jan. 25, 2018, noon UTC | #1
On the overall patch,

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

For request(), returning number of replies received actually makes 
sense, because now we get use the value to read our replies, if we were 
a primary process sending messages to secondary processes.

Few comments below.

On 25-Jan-18 4:16 AM, Jianfeng Tan wrote:
> We need the synchronous way for multi-process communication,
> i.e., blockingly waiting for reply message when we send a request
> to the peer process.
> 
> We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
> such use case. By invoking rte_eal_mp_request(), a request message
> is sent out, and then it waits there for a reply message. The caller
> can specify the timeout. And the response messages will be collected
> and returned so that the caller can decide how to translate them.
> 
> The API rte_eal_mp_reply() is always called by an mp action handler.
> Here we add another parameter for rte_eal_mp_t so that the action
> handler knows which peer address to reply.
> 
>         sender-process                receiver-process
>     ----------------------            ----------------
> 
>      thread-n
>       |_rte_eal_mp_request() ----------> mp-thread
>          |_timedwait()                    |_process_msg()
>                                             |_action()
>                                                 |_rte_eal_mp_reply()
> 	        mp_thread  <---------------------|
>                    |_process_msg()
>                       |_signal(send_thread)
>      thread-m <----------|
>       |_collect-reply
> 
>   * A secondary process is only allowed to talk to the primary process.
>   * If there are multiple secondary processes for the primary proces,
>     it will send request to peer1, collect response from peer1; then
>     send request to peer2, collect reponse from peer2, and so on.
>   * When thread-n is sending request, thread-m of that process can send
>     request at the same time.
>   * For pair <action_name, peer>, we guarantee that only one such request
>     is on the fly.
> 
> Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>   doc/guides/rel_notes/release_18_02.rst  |  15 ++
>   lib/librte_eal/common/eal_common_proc.c | 237 +++++++++++++++++++++++++++++---
>   lib/librte_eal/common/include/rte_eal.h |  58 +++++++-
>   lib/librte_eal/rte_eal_version.map      |   3 +
>   4 files changed, 295 insertions(+), 18 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
> index 00b3224..f6ed666 100644
> --- a/doc/guides/rel_notes/release_18_02.rst
> +++ b/doc/guides/rel_notes/release_18_02.rst
> @@ -151,6 +151,21 @@ New Features
>     renamed the application from SW PMD specific ``eventdev_pipeline_sw_pmd``
>     to PMD agnostic ``eventdev_pipeline``.
>   
> +* **Added new multi-process communication channel**
> +
> +  Added a generic channel in EAL for multi-process (primary/secondary) synchronous
> +  and asynchronous communication. Each component who wants to reponse a message
> +  shall register the action; and each process has a thread to receive the message
> +  and invokes the registered action. The list of new APIs:
> +
> +  * ``rte_eal_mp_register``
> +  * ``rte_eal_mp_unregister``
> +  * ``rte_eal_mp_sendmsg``
> +  * ``rte_eal_mp_request``
> +  * ``rte_eal_mp_reply``
> +
> +  Note as we changed to use the new channel for communication, applications cannot
> +  talk with old version through the old (private) communication channel.

Some of this should've probably been added into previous patch.

>   
>   API Changes
>   -----------
> diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
> index baeb7d1..69df943 100644
> --- a/lib/librte_eal/common/eal_common_proc.c
> +++ b/lib/librte_eal/common/eal_common_proc.c
> @@ -12,6 +12,7 @@
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
> +#include <sys/time.h>
>   #include <sys/types.h>
>   #include <sys/socket.h>
>   #include <sys/un.h>
> @@ -44,6 +45,50 @@ TAILQ_HEAD(action_entry_list, action_entry);
>   static struct action_entry_list action_entry_list =
>   	TAILQ_HEAD_INITIALIZER(action_entry_list);
>   

<snip>

> +		return 0;
> +	}
> +
> +	if (send_msg(dst, req, MP_REQ) != 1) {
> +		RTE_LOG(ERR, EAL, "Fail to send request %s:%s\n",
> +			dst, req->name);
> +		return 0;
> +	}
> +
> +	pthread_mutex_lock(&sync_requests.lock);
> +	do {
> +		pthread_cond_timedwait(&sync_req.cond, &sync_requests.lock, ts);
> +		/* Check spurious wakeups */
> +		if (sync_req.reply_received == 1)
> +			break;
> +		/* Check if time is out */
> +		if (gettimeofday(&now, NULL) < 0)
> +			break;
> +		if (now.tv_sec < ts->tv_sec)
> +			break;
> +		else if (now.tv_sec == ts->tv_sec &&
> +			 now.tv_usec * 1000 < ts->tv_nsec)
> +			break;
> +	} while (1);
> +	/* We got the lock now */
> +	TAILQ_REMOVE(&sync_requests.requests, &sync_req, next);
> +	pthread_mutex_unlock(&sync_requests.lock);
> +
> +	if (sync_req.reply_received == 0) {
> +		RTE_LOG(ERR, EAL, "Fail to recv reply for request %s:%s\n",
> +			dst, req->name);
> +		return 1;

Why are we returning 1 here? There was no reply, so no reply structure 
was allocated. This looks like a potential buffer overflow on trying to 
read replies if one of them wasn't delivered.

> +	}
> +
> +	tmp = realloc(reply->msgs, sizeof(msg) * (reply->nb_msgs + 1));
> +	if (!tmp) {
> +		RTE_LOG(ERR, EAL, "Fail to alloc reply for request %s:%s\n",
> +			dst, req->name);
> +		return 1;
> +	}

Same here - we couldn't allocate a reply, so it won't get to the user. 
Why return 1 here?

> +	memcpy(&tmp[reply->nb_msgs], &msg, sizeof(msg));
> +	reply->msgs = tmp;
> +	reply->nb_msgs++;
> +	return 1;
> +}
> +
> +int
> +rte_eal_mp_request(struct rte_mp_msg *req,
> +		   struct rte_mp_reply *reply,
> +		   const struct timespec *ts)
> +{
> +	DIR *mp_dir;
> +	struct dirent *ent;
> +	int nb_snds = 0;
> +	struct timeval now;
> +	struct timespec end;
> +

<snip>

>   /**
>    * @warning
> @@ -262,6 +268,56 @@ void rte_eal_mp_action_unregister(const char *name);
>   int rte_eal_mp_sendmsg(struct rte_mp_msg *msg);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Send a request to the peer process and expect a reply.
> + *
> + * This function sends a request message to the peer process, and will
> + * block until receiving reply message from the peer process.
> + *
> + * @note The caller is responsible to free reply->replies.
> + *
> + * @param req
> + *   The req argument contains the customized request message.
> + *
> + * @param reply
> + *   The reply argument will be for storing all the replied messages;
> + *   the caller is responsible for free reply->replies.
> + *
> + * @param ts
> + *   The ts argument specifies how long we can wait for the peer(s) to reply.
> + *
> + * @return
> + *  - (<0) on invalid parameters;
> + *  - (>=0) as the number of messages being sent successfully.
> + */
> +int rte_eal_mp_request(struct rte_mp_msg *req,
> +			struct rte_mp_reply *reply, const struct timespec *ts);

See above: it would be much more useful to return number of replies 
received, rather than number of messages sent, as that's the number we 
are most interested in. Otherwise, if we e.g. sent 5 messages but 
received 1 reply, you're essentially not telling the user how far can he 
index the reply pointer.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Send a reply to the peer process.
> + *
> + * This function will send a reply message in response to a request message
> + * received previously.
> + *
> + * @param msg
> + *   The msg argument contains the customized message.
> + *
> + * @param peer
> + *   The peer argument is the pointer to the peer socket path.
> + *
> + * @return
> + *  - (1) on success;
> + *  - (0) on failure;
> + *  - (<0) on invalid parameters.
> + */
> +int rte_eal_mp_reply(struct rte_mp_msg *msg, const char *peer);

I don't think there's much point in making distinction between invalid 
parameters and failure.

> +
> +/**
>    * Usage function typedef used by the application usage function.
>    *
>    * Use this function typedef to define and call rte_set_application_usage_hook()
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index adeadfb..3015bc6 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -220,6 +220,9 @@ EXPERIMENTAL {
>   	rte_eal_mp_action_register;
>   	rte_eal_mp_action_unregister;
>   	rte_eal_mp_sendmsg;
> +	rte_eal_mp_request;
> +	rte_eal_mp_reply;
> +	rte_eal_mp_sendmsg;

You're adding rte_eal_mp_sendmsg twice.

>   	rte_service_attr_get;
>   	rte_service_attr_reset_all;
>   	rte_service_component_register;
>
  
Anatoly Burakov Jan. 25, 2018, 12:19 p.m. UTC | #2
> 
> See above: it would be much more useful to return number of replies 
> received, rather than number of messages sent, as that's the number we 
> are most interested in. Otherwise, if we e.g. sent 5 messages but 
> received 1 reply, you're essentially not telling the user how far can he 
> index the reply pointer.

Apologies, just noticed that rte_mp_reply has nb_messages in it. So if 
we are getting number of replies along with reply, this API should too 
switch to 0/-1 on success/failure respectively, as the number of sent 
messages also becomes meaningless to the user.
  
Ananyev, Konstantin Jan. 25, 2018, 12:19 p.m. UTC | #3
> -----Original Message-----

> From: Burakov, Anatoly

> Sent: Thursday, January 25, 2018 12:00 PM

> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org

> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net

> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process communication

> 

> On the overall patch,

> 

> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

> 

> For request(), returning number of replies received actually makes

> sense, because now we get use the value to read our replies, if we were

> a primary process sending messages to secondary processes.


Yes, I also think it is good to return number of sends.
Then caller can compare number of sended requests with number of
received replies and decide should it be considered a failure or no.

> > +		return 0;

> > +	}

> > +

> > +	if (send_msg(dst, req, MP_REQ) != 1) {

> > +		RTE_LOG(ERR, EAL, "Fail to send request %s:%s\n",

> > +			dst, req->name);

> > +		return 0;

> > +	}

> > +

> > +	pthread_mutex_lock(&sync_requests.lock);

> > +	do {

> > +		pthread_cond_timedwait(&sync_req.cond, &sync_requests.lock, ts);

> > +		/* Check spurious wakeups */

> > +		if (sync_req.reply_received == 1)

> > +			break;

> > +		/* Check if time is out */

> > +		if (gettimeofday(&now, NULL) < 0)

> > +			break;

> > +		if (now.tv_sec < ts->tv_sec)

> > +			break;

> > +		else if (now.tv_sec == ts->tv_sec &&

> > +			 now.tv_usec * 1000 < ts->tv_nsec)

> > +			break;

> > +	} while (1);

> > +	/* We got the lock now */

> > +	TAILQ_REMOVE(&sync_requests.requests, &sync_req, next);

> > +	pthread_mutex_unlock(&sync_requests.lock);

> > +

> > +	if (sync_req.reply_received == 0) {

> > +		RTE_LOG(ERR, EAL, "Fail to recv reply for request %s:%s\n",

> > +			dst, req->name);

> > +		return 1;

> 

> Why are we returning 1 here? There was no reply, so no reply structure

> was allocated. This looks like a potential buffer overflow on trying to

> read replies if one of them wasn't delivered.



As I understand - because we receive a number of sended requests.
Number of received replies  will be available in reply->nb_msgs.
Same below.
Konstantin

> 

> > +	}

> > +

> > +	tmp = realloc(reply->msgs, sizeof(msg) * (reply->nb_msgs + 1));

> > +	if (!tmp) {

> > +		RTE_LOG(ERR, EAL, "Fail to alloc reply for request %s:%s\n",

> > +			dst, req->name);

> > +		return 1;

> > +	}

> 

> Same here - we couldn't allocate a reply, so it won't get to the user.

> Why return 1 here?

> 

> > +	memcpy(&tmp[reply->nb_msgs], &msg, sizeof(msg));

> > +	reply->msgs = tmp;

> > +	reply->nb_msgs++;

> > +	return 1;

> > +}

> > +

> > +int

> > +rte_eal_mp_request(struct rte_mp_msg *req,

> > +		   struct rte_mp_reply *reply,

> > +		   const struct timespec *ts)

> > +{

> > +	DIR *mp_dir;

> > +	struct dirent *ent;

> > +	int nb_snds = 0;

> > +	struct timeval now;

> > +	struct timespec end;

> > +

> 

> <snip>

> 

> >   /**

> >    * @warning

> > @@ -262,6 +268,56 @@ void rte_eal_mp_action_unregister(const char *name);

> >   int rte_eal_mp_sendmsg(struct rte_mp_msg *msg);

> >

> >   /**

> > + * @warning

> > + * @b EXPERIMENTAL: this API may change without prior notice

> > + *

> > + * Send a request to the peer process and expect a reply.

> > + *

> > + * This function sends a request message to the peer process, and will

> > + * block until receiving reply message from the peer process.

> > + *

> > + * @note The caller is responsible to free reply->replies.

> > + *

> > + * @param req

> > + *   The req argument contains the customized request message.

> > + *

> > + * @param reply

> > + *   The reply argument will be for storing all the replied messages;

> > + *   the caller is responsible for free reply->replies.

> > + *

> > + * @param ts

> > + *   The ts argument specifies how long we can wait for the peer(s) to reply.

> > + *

> > + * @return

> > + *  - (<0) on invalid parameters;

> > + *  - (>=0) as the number of messages being sent successfully.

> > + */

> > +int rte_eal_mp_request(struct rte_mp_msg *req,

> > +			struct rte_mp_reply *reply, const struct timespec *ts);

> 

> See above: it would be much more useful to return number of replies

> received, rather than number of messages sent, as that's the number we

> are most interested in. Otherwise, if we e.g. sent 5 messages but

> received 1 reply, you're essentially not telling the user how far can he

> index the reply pointer.

> 

> > +

> > +/**

> > + * @warning

> > + * @b EXPERIMENTAL: this API may change without prior notice

> > + *

> > + * Send a reply to the peer process.

> > + *

> > + * This function will send a reply message in response to a request message

> > + * received previously.

> > + *

> > + * @param msg

> > + *   The msg argument contains the customized message.

> > + *

> > + * @param peer

> > + *   The peer argument is the pointer to the peer socket path.

> > + *

> > + * @return

> > + *  - (1) on success;

> > + *  - (0) on failure;

> > + *  - (<0) on invalid parameters.

> > + */

> > +int rte_eal_mp_reply(struct rte_mp_msg *msg, const char *peer);

> 

> I don't think there's much point in making distinction between invalid

> parameters and failure.

> 

> > +

> > +/**

> >    * Usage function typedef used by the application usage function.

> >    *

> >    * Use this function typedef to define and call rte_set_application_usage_hook()

> > diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map

> > index adeadfb..3015bc6 100644

> > --- a/lib/librte_eal/rte_eal_version.map

> > +++ b/lib/librte_eal/rte_eal_version.map

> > @@ -220,6 +220,9 @@ EXPERIMENTAL {

> >   	rte_eal_mp_action_register;

> >   	rte_eal_mp_action_unregister;

> >   	rte_eal_mp_sendmsg;

> > +	rte_eal_mp_request;

> > +	rte_eal_mp_reply;

> > +	rte_eal_mp_sendmsg;

> 

> You're adding rte_eal_mp_sendmsg twice.

> 

> >   	rte_service_attr_get;

> >   	rte_service_attr_reset_all;

> >   	rte_service_component_register;

> >

> 

> 

> --

> Thanks,

> Anatoly
  
Ananyev, Konstantin Jan. 25, 2018, 12:22 p.m. UTC | #4
> We need the synchronous way for multi-process communication,
> i.e., blockingly waiting for reply message when we send a request
> to the peer process.
> 
> We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
> such use case. By invoking rte_eal_mp_request(), a request message
> is sent out, and then it waits there for a reply message. The caller
> can specify the timeout. And the response messages will be collected
> and returned so that the caller can decide how to translate them.
> 
> The API rte_eal_mp_reply() is always called by an mp action handler.
> Here we add another parameter for rte_eal_mp_t so that the action
> handler knows which peer address to reply.
> 
>        sender-process                receiver-process
>    ----------------------            ----------------
> 
>     thread-n
>      |_rte_eal_mp_request() ----------> mp-thread
>         |_timedwait()                    |_process_msg()
>                                            |_action()
>                                                |_rte_eal_mp_reply()
> 	        mp_thread  <---------------------|
>                   |_process_msg()
>                      |_signal(send_thread)
>     thread-m <----------|
>      |_collect-reply
> 
>  * A secondary process is only allowed to talk to the primary process.
>  * If there are multiple secondary processes for the primary proces,
>    it will send request to peer1, collect response from peer1; then
>    send request to peer2, collect reponse from peer2, and so on.
>  * When thread-n is sending request, thread-m of that process can send
>    request at the same time.
>  * For pair <action_name, peer>, we guarantee that only one such request
>    is on the fly.
> 
> Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
  
Anatoly Burakov Jan. 25, 2018, 12:25 p.m. UTC | #5
On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Thursday, January 25, 2018 12:00 PM
>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process communication
>>
>> On the overall patch,
>>
>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>
>> For request(), returning number of replies received actually makes
>> sense, because now we get use the value to read our replies, if we were
>> a primary process sending messages to secondary processes.
> 
> Yes, I also think it is good to return number of sends.
> Then caller can compare number of sended requests with number of
> received replies and decide should it be considered a failure or no.
> 

Well, OK, that might make sense. However, i think it would've be of more 
value to make the API consistent (0/-1 on success/failure) and put 
number of sent messages into the reply, like number of received. I.e. 
something like

struct reply {
    int nb_sent;
    int nb_received;
};

We do it for the latter already, so why not the former?
  
Ananyev, Konstantin Jan. 25, 2018, 1 p.m. UTC | #6
> -----Original Message-----

> From: Burakov, Anatoly

> Sent: Thursday, January 25, 2018 12:26 PM

> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org

> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net

> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process communication

> 

> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:

> >

> >

> >> -----Original Message-----

> >> From: Burakov, Anatoly

> >> Sent: Thursday, January 25, 2018 12:00 PM

> >> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org

> >> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net

> >> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process communication

> >>

> >> On the overall patch,

> >>

> >> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

> >>

> >> For request(), returning number of replies received actually makes

> >> sense, because now we get use the value to read our replies, if we were

> >> a primary process sending messages to secondary processes.

> >

> > Yes, I also think it is good to return number of sends.

> > Then caller can compare number of sended requests with number of

> > received replies and decide should it be considered a failure or no.

> >

> 

> Well, OK, that might make sense. However, i think it would've be of more

> value to make the API consistent (0/-1 on success/failure) and put

> number of sent messages into the reply, like number of received. I.e.

> something like

> 

> struct reply {

>     int nb_sent;

>     int nb_received;

> };

> 

> We do it for the latter already, so why not the former?


The question is what treat as success/failure?
Let say we sent 2 requests (of 3 possible), got back 1 response...
Should we consider it as success or failure?
  
Anatoly Burakov Jan. 25, 2018, 1:05 p.m. UTC | #7
On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Thursday, January 25, 2018 12:26 PM
>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process communication
>>
>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly
>>>> Sent: Thursday, January 25, 2018 12:00 PM
>>>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process communication
>>>>
>>>> On the overall patch,
>>>>
>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>
>>>> For request(), returning number of replies received actually makes
>>>> sense, because now we get use the value to read our replies, if we were
>>>> a primary process sending messages to secondary processes.
>>>
>>> Yes, I also think it is good to return number of sends.
>>> Then caller can compare number of sended requests with number of
>>> received replies and decide should it be considered a failure or no.
>>>
>>
>> Well, OK, that might make sense. However, i think it would've be of more
>> value to make the API consistent (0/-1 on success/failure) and put
>> number of sent messages into the reply, like number of received. I.e.
>> something like
>>
>> struct reply {
>>      int nb_sent;
>>      int nb_received;
>> };
>>
>> We do it for the latter already, so why not the former?
> 
> The question is what treat as success/failure?
> Let say we sent 2 requests (of 3 possible), got back 1 response...
> Should we consider it as success or failure?
> 

I think "failure" is "something went wrong", not "secondary processes 
didn't respond". For example, invalid parameters, or our socket suddenly 
being closed, or some other error that prevents us from sending requests 
to secondaries.

As far as i can tell from the code, there's no way to know if the 
secondary process is running other than by attempting to connect to it, 
and get a response. So, failed connection should not be a failure 
condition, because we can't know if we *can* connect to the process 
until we do. Process may have ended, but socket files will still be 
around, and there's nothing we can do about that. So i wouldn't consider 
inability to send a message a failure condition.
  
Anatoly Burakov Jan. 25, 2018, 1:10 p.m. UTC | #8
On 25-Jan-18 1:05 PM, Burakov, Anatoly wrote:
> On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:
>>
>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly
>>> Sent: Thursday, January 25, 2018 12:26 PM
>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng 
>>> <jianfeng.tan@intel.com>; dev@dpdk.org
>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process 
>>> communication
>>>
>>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Burakov, Anatoly
>>>>> Sent: Thursday, January 25, 2018 12:00 PM
>>>>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, 
>>>>> Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process 
>>>>> communication
>>>>>
>>>>> On the overall patch,
>>>>>
>>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>
>>>>> For request(), returning number of replies received actually makes
>>>>> sense, because now we get use the value to read our replies, if we 
>>>>> were
>>>>> a primary process sending messages to secondary processes.
>>>>
>>>> Yes, I also think it is good to return number of sends.
>>>> Then caller can compare number of sended requests with number of
>>>> received replies and decide should it be considered a failure or no.
>>>>
>>>
>>> Well, OK, that might make sense. However, i think it would've be of more
>>> value to make the API consistent (0/-1 on success/failure) and put
>>> number of sent messages into the reply, like number of received. I.e.
>>> something like
>>>
>>> struct reply {
>>>      int nb_sent;
>>>      int nb_received;
>>> };
>>>
>>> We do it for the latter already, so why not the former?
>>
>> The question is what treat as success/failure?
>> Let say we sent 2 requests (of 3 possible), got back 1 response...
>> Should we consider it as success or failure?
>>
> 
> I think "failure" is "something went wrong", not "secondary processes 
> didn't respond". For example, invalid parameters, or our socket suddenly 
> being closed, or some other error that prevents us from sending requests 
> to secondaries.
> 
> As far as i can tell from the code, there's no way to know if the 
> secondary process is running other than by attempting to connect to it, 
> and get a response. So, failed connection should not be a failure 
> condition, because we can't know if we *can* connect to the process 
> until we do. Process may have ended, but socket files will still be 
> around, and there's nothing we can do about that. So i wouldn't consider 
> inability to send a message a failure condition.
> 

Just to clarify - i'm suggesting leaving this decision up to the user. 
If a user expects there to be "n" processes running, but only "m" 
responses were received, he could treat it as error. Another user might 
simply send periodical updates/polls to secondaries, for whatever reason 
(say, stats display), and won't really care if one of them just died, so 
there's no error for that user.

However, all of this has nothing to do with API. If we're able to send 
messages - it's not a failure. If we can't - it is. That's the part API 
should be concerned about, and that's what the return value should 
indicate, IMO.
  
Ananyev, Konstantin Jan. 25, 2018, 3:03 p.m. UTC | #9
> -----Original Message-----

> From: Burakov, Anatoly

> Sent: Thursday, January 25, 2018 1:10 PM

> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org

> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net

> Subject: Re: [dpdk-dev] [PATCH v3 2/3] eal: add synchronous multi-process communication

> 

> On 25-Jan-18 1:05 PM, Burakov, Anatoly wrote:

> > On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:

> >>

> >>

> >>> -----Original Message-----

> >>> From: Burakov, Anatoly

> >>> Sent: Thursday, January 25, 2018 12:26 PM

> >>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng

> >>> <jianfeng.tan@intel.com>; dev@dpdk.org

> >>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net

> >>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process

> >>> communication

> >>>

> >>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:

> >>>>

> >>>>

> >>>>> -----Original Message-----

> >>>>> From: Burakov, Anatoly

> >>>>> Sent: Thursday, January 25, 2018 12:00 PM

> >>>>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org

> >>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev,

> >>>>> Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net

> >>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process

> >>>>> communication

> >>>>>

> >>>>> On the overall patch,

> >>>>>

> >>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

> >>>>>

> >>>>> For request(), returning number of replies received actually makes

> >>>>> sense, because now we get use the value to read our replies, if we

> >>>>> were

> >>>>> a primary process sending messages to secondary processes.

> >>>>

> >>>> Yes, I also think it is good to return number of sends.

> >>>> Then caller can compare number of sended requests with number of

> >>>> received replies and decide should it be considered a failure or no.

> >>>>

> >>>

> >>> Well, OK, that might make sense. However, i think it would've be of more

> >>> value to make the API consistent (0/-1 on success/failure) and put

> >>> number of sent messages into the reply, like number of received. I.e.

> >>> something like

> >>>

> >>> struct reply {

> >>>      int nb_sent;

> >>>      int nb_received;

> >>> };

> >>>

> >>> We do it for the latter already, so why not the former?

> >>

> >> The question is what treat as success/failure?

> >> Let say we sent 2 requests (of 3 possible), got back 1 response...

> >> Should we consider it as success or failure?

> >>

> >

> > I think "failure" is "something went wrong", not "secondary processes

> > didn't respond". For example, invalid parameters, or our socket suddenly

> > being closed, or some other error that prevents us from sending requests

> > to secondaries.

> >

> > As far as i can tell from the code, there's no way to know if the

> > secondary process is running other than by attempting to connect to it,

> > and get a response. So, failed connection should not be a failure

> > condition, because we can't know if we *can* connect to the process

> > until we do. Process may have ended, but socket files will still be

> > around, and there's nothing we can do about that. So i wouldn't consider

> > inability to send a message a failure condition.

> >

> 

> Just to clarify - i'm suggesting leaving this decision up to the user.

> If a user expects there to be "n" processes running, but only "m"

> responses were received, he could treat it as error. Another user might

> simply send periodical updates/polls to secondaries, for whatever reason

> (say, stats display), and won't really care if one of them just died, so

> there's no error for that user.

> 

> However, all of this has nothing to do with API. If we're able to send

> messages - it's not a failure. If we can't - it is. That's the part API

> should be concerned about, and that's what the return value should

> indicate, IMO.


Ok so to clarify, you are suggesting: 
we have N peers - if send_msg() returns success for all N - return success
(no matter did we get a reply or not)
Otherwise return a failure.
?
Konstantin


> 

> --

> Thanks,

> Anatoly
  
Anatoly Burakov Jan. 25, 2018, 4:22 p.m. UTC | #10
On 25-Jan-18 3:03 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Thursday, January 25, 2018 1:10 PM
>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
>> Subject: Re: [dpdk-dev] [PATCH v3 2/3] eal: add synchronous multi-process communication
>>
>> On 25-Jan-18 1:05 PM, Burakov, Anatoly wrote:
>>> On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Burakov, Anatoly
>>>>> Sent: Thursday, January 25, 2018 12:26 PM
>>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng
>>>>> <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>> communication
>>>>>
>>>>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Burakov, Anatoly
>>>>>>> Sent: Thursday, January 25, 2018 12:00 PM
>>>>>>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev,
>>>>>>> Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
>>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>>> communication
>>>>>>>
>>>>>>> On the overall patch,
>>>>>>>
>>>>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>>
>>>>>>> For request(), returning number of replies received actually makes
>>>>>>> sense, because now we get use the value to read our replies, if we
>>>>>>> were
>>>>>>> a primary process sending messages to secondary processes.
>>>>>>
>>>>>> Yes, I also think it is good to return number of sends.
>>>>>> Then caller can compare number of sended requests with number of
>>>>>> received replies and decide should it be considered a failure or no.
>>>>>>
>>>>>
>>>>> Well, OK, that might make sense. However, i think it would've be of more
>>>>> value to make the API consistent (0/-1 on success/failure) and put
>>>>> number of sent messages into the reply, like number of received. I.e.
>>>>> something like
>>>>>
>>>>> struct reply {
>>>>>       int nb_sent;
>>>>>       int nb_received;
>>>>> };
>>>>>
>>>>> We do it for the latter already, so why not the former?
>>>>
>>>> The question is what treat as success/failure?
>>>> Let say we sent 2 requests (of 3 possible), got back 1 response...
>>>> Should we consider it as success or failure?
>>>>
>>>
>>> I think "failure" is "something went wrong", not "secondary processes
>>> didn't respond". For example, invalid parameters, or our socket suddenly
>>> being closed, or some other error that prevents us from sending requests
>>> to secondaries.
>>>
>>> As far as i can tell from the code, there's no way to know if the
>>> secondary process is running other than by attempting to connect to it,
>>> and get a response. So, failed connection should not be a failure
>>> condition, because we can't know if we *can* connect to the process
>>> until we do. Process may have ended, but socket files will still be
>>> around, and there's nothing we can do about that. So i wouldn't consider
>>> inability to send a message a failure condition.
>>>
>>
>> Just to clarify - i'm suggesting leaving this decision up to the user.
>> If a user expects there to be "n" processes running, but only "m"
>> responses were received, he could treat it as error. Another user might
>> simply send periodical updates/polls to secondaries, for whatever reason
>> (say, stats display), and won't really care if one of them just died, so
>> there's no error for that user.
>>
>> However, all of this has nothing to do with API. If we're able to send
>> messages - it's not a failure. If we can't - it is. That's the part API
>> should be concerned about, and that's what the return value should
>> indicate, IMO.
> 
> Ok so to clarify, you are suggesting:
> we have N peers - if send_msg() returns success for all N - return success
> (no matter did we get a reply or not)
> Otherwise return a failure.
> ?
> Konstantin

More along the lines of, return -1 if and only if something went wrong. 
That might be invalid parameters, or that might be an error with our own 
socket, or something else to that effect. In all other cases, return 0 
(that includes cases where we sent N messages but M replies where N != 
M). So, in other words, return 0 if we *could have succeeded* if nothing 
went wrong on the other side, and only return -1 if something went wrong 
on our side.

> 
> 
>>
>> --
>> Thanks,
>> Anatoly
  
Jianfeng Tan Jan. 25, 2018, 5:10 p.m. UTC | #11
On 1/26/2018 12:22 AM, Burakov, Anatoly wrote:
> On 25-Jan-18 3:03 PM, Ananyev, Konstantin wrote:
>>
>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly
>>> Sent: Thursday, January 25, 2018 1:10 PM
>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, 
>>> Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
>>> Subject: Re: [dpdk-dev] [PATCH v3 2/3] eal: add synchronous 
>>> multi-process communication
>>>
>>> On 25-Jan-18 1:05 PM, Burakov, Anatoly wrote:
>>>> On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Burakov, Anatoly
>>>>>> Sent: Thursday, January 25, 2018 12:26 PM
>>>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, 
>>>>>> Jianfeng
>>>>>> <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; 
>>>>>> thomas@monjalon.net
>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>> communication
>>>>>>
>>>>>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Burakov, Anatoly
>>>>>>>> Sent: Thursday, January 25, 2018 12:00 PM
>>>>>>>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev,
>>>>>>>> Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
>>>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>>>> communication
>>>>>>>>
>>>>>>>> On the overall patch,
>>>>>>>>
>>>>>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>>>
>>>>>>>> For request(), returning number of replies received actually makes
>>>>>>>> sense, because now we get use the value to read our replies, if we
>>>>>>>> were
>>>>>>>> a primary process sending messages to secondary processes.
>>>>>>>
>>>>>>> Yes, I also think it is good to return number of sends.
>>>>>>> Then caller can compare number of sended requests with number of
>>>>>>> received replies and decide should it be considered a failure or 
>>>>>>> no.
>>>>>>>
>>>>>>
>>>>>> Well, OK, that might make sense. However, i think it would've be 
>>>>>> of more
>>>>>> value to make the API consistent (0/-1 on success/failure) and put
>>>>>> number of sent messages into the reply, like number of received. 
>>>>>> I.e.
>>>>>> something like
>>>>>>
>>>>>> struct reply {
>>>>>>       int nb_sent;
>>>>>>       int nb_received;
>>>>>> };
>>>>>>
>>>>>> We do it for the latter already, so why not the former?
>>>>>
>>>>> The question is what treat as success/failure?
>>>>> Let say we sent 2 requests (of 3 possible), got back 1 response...
>>>>> Should we consider it as success or failure?
>>>>>
>>>>
>>>> I think "failure" is "something went wrong", not "secondary processes
>>>> didn't respond". For example, invalid parameters, or our socket 
>>>> suddenly
>>>> being closed, or some other error that prevents us from sending 
>>>> requests
>>>> to secondaries.
>>>>
>>>> As far as i can tell from the code, there's no way to know if the
>>>> secondary process is running other than by attempting to connect to 
>>>> it,
>>>> and get a response. So, failed connection should not be a failure
>>>> condition, because we can't know if we *can* connect to the process
>>>> until we do. Process may have ended, but socket files will still be
>>>> around, and there's nothing we can do about that. So i wouldn't 
>>>> consider
>>>> inability to send a message a failure condition.
>>>>
>>>
>>> Just to clarify - i'm suggesting leaving this decision up to the user.
>>> If a user expects there to be "n" processes running, but only "m"
>>> responses were received, he could treat it as error. Another user might
>>> simply send periodical updates/polls to secondaries, for whatever 
>>> reason
>>> (say, stats display), and won't really care if one of them just 
>>> died, so
>>> there's no error for that user.
>>>
>>> However, all of this has nothing to do with API. If we're able to send
>>> messages - it's not a failure. If we can't - it is. That's the part API
>>> should be concerned about, and that's what the return value should
>>> indicate, IMO.
>>
>> Ok so to clarify, you are suggesting:
>> we have N peers - if send_msg() returns success for all N - return 
>> success
>> (no matter did we get a reply or not)
>> Otherwise return a failure.
>> ?
>> Konstantin
>
> More along the lines of, return -1 if and only if something went 
> wrong. That might be invalid parameters, or that might be an error 
> with our own socket,

To check if the error is caused by our own socket, we check the errno 
after sendmsg?

Like for remote socket errors, we check:
- ECONNRESET
- ECONNREFUSED
- ENOBUFS

Right?

Thanks,
Jianfeng


> or something else to that effect. In all other cases, return 0 (that 
> includes cases where we sent N messages but M replies where N != M). 
> So, in other words, return 0 if we *could have succeeded* if nothing 
> went wrong on the other side, and only return -1 if something went 
> wrong on our side.
>
>>
>>
>>>
>>> -- 
>>> Thanks,
>>> Anatoly
>
>
  
Anatoly Burakov Jan. 25, 2018, 6:02 p.m. UTC | #12
On 25-Jan-18 5:10 PM, Tan, Jianfeng wrote:
> 
> 
> On 1/26/2018 12:22 AM, Burakov, Anatoly wrote:
>> On 25-Jan-18 3:03 PM, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly
>>>> Sent: Thursday, January 25, 2018 1:10 PM
>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, 
>>>> Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net
>>>> Subject: Re: [dpdk-dev] [PATCH v3 2/3] eal: add synchronous 
>>>> multi-process communication
>>>>
>>>> On 25-Jan-18 1:05 PM, Burakov, Anatoly wrote:
>>>>> On 25-Jan-18 1:00 PM, Ananyev, Konstantin wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Burakov, Anatoly
>>>>>>> Sent: Thursday, January 25, 2018 12:26 PM
>>>>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, 
>>>>>>> Jianfeng
>>>>>>> <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; 
>>>>>>> thomas@monjalon.net
>>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>>> communication
>>>>>>>
>>>>>>> On 25-Jan-18 12:19 PM, Ananyev, Konstantin wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Burakov, Anatoly
>>>>>>>>> Sent: Thursday, January 25, 2018 12:00 PM
>>>>>>>>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; dev@dpdk.org
>>>>>>>>> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev,
>>>>>>>>> Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
>>>>>>>>> Subject: Re: [PATCH v3 2/3] eal: add synchronous multi-process
>>>>>>>>> communication
>>>>>>>>>
>>>>>>>>> On the overall patch,
>>>>>>>>>
>>>>>>>>> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>>>>
>>>>>>>>> For request(), returning number of replies received actually makes
>>>>>>>>> sense, because now we get use the value to read our replies, if we
>>>>>>>>> were
>>>>>>>>> a primary process sending messages to secondary processes.
>>>>>>>>
>>>>>>>> Yes, I also think it is good to return number of sends.
>>>>>>>> Then caller can compare number of sended requests with number of
>>>>>>>> received replies and decide should it be considered a failure or 
>>>>>>>> no.
>>>>>>>>
>>>>>>>
>>>>>>> Well, OK, that might make sense. However, i think it would've be 
>>>>>>> of more
>>>>>>> value to make the API consistent (0/-1 on success/failure) and put
>>>>>>> number of sent messages into the reply, like number of received. 
>>>>>>> I.e.
>>>>>>> something like
>>>>>>>
>>>>>>> struct reply {
>>>>>>>       int nb_sent;
>>>>>>>       int nb_received;
>>>>>>> };
>>>>>>>
>>>>>>> We do it for the latter already, so why not the former?
>>>>>>
>>>>>> The question is what treat as success/failure?
>>>>>> Let say we sent 2 requests (of 3 possible), got back 1 response...
>>>>>> Should we consider it as success or failure?
>>>>>>
>>>>>
>>>>> I think "failure" is "something went wrong", not "secondary processes
>>>>> didn't respond". For example, invalid parameters, or our socket 
>>>>> suddenly
>>>>> being closed, or some other error that prevents us from sending 
>>>>> requests
>>>>> to secondaries.
>>>>>
>>>>> As far as i can tell from the code, there's no way to know if the
>>>>> secondary process is running other than by attempting to connect to 
>>>>> it,
>>>>> and get a response. So, failed connection should not be a failure
>>>>> condition, because we can't know if we *can* connect to the process
>>>>> until we do. Process may have ended, but socket files will still be
>>>>> around, and there's nothing we can do about that. So i wouldn't 
>>>>> consider
>>>>> inability to send a message a failure condition.
>>>>>
>>>>
>>>> Just to clarify - i'm suggesting leaving this decision up to the user.
>>>> If a user expects there to be "n" processes running, but only "m"
>>>> responses were received, he could treat it as error. Another user might
>>>> simply send periodical updates/polls to secondaries, for whatever 
>>>> reason
>>>> (say, stats display), and won't really care if one of them just 
>>>> died, so
>>>> there's no error for that user.
>>>>
>>>> However, all of this has nothing to do with API. If we're able to send
>>>> messages - it's not a failure. If we can't - it is. That's the part API
>>>> should be concerned about, and that's what the return value should
>>>> indicate, IMO.
>>>
>>> Ok so to clarify, you are suggesting:
>>> we have N peers - if send_msg() returns success for all N - return 
>>> success
>>> (no matter did we get a reply or not)
>>> Otherwise return a failure.
>>> ?
>>> Konstantin
>>
>> More along the lines of, return -1 if and only if something went 
>> wrong. That might be invalid parameters, or that might be an error 
>> with our own socket,
> 
> To check if the error is caused by our own socket, we check the errno 
> after sendmsg?
> 
> Like for remote socket errors, we check:
> - ECONNRESET
> - ECONNREFUSED
> - ENOBUFS
> 
> Right?
> 
> Thanks,
> Jianfeng

Well, that was only an example. If it doesn't make much sense to do so 
in this case, then don't, and only return -1 on invalid parameters. 
AFAIU we're using connectionless sockets so a bunch of these errors 
won't be applicable to us. Maybe -ENOBUFS, but i'm not sure it's worth 
it to check for that.

> 
> 
>> or something else to that effect. In all other cases, return 0 (that 
>> includes cases where we sent N messages but M replies where N != M). 
>> So, in other words, return 0 if we *could have succeeded* if nothing 
>> went wrong on the other side, and only return -1 if something went 
>> wrong on our side.
>>
>>>
>>>
>>>>
>>>> -- 
>>>> Thanks,
>>>> Anatoly
>>
>>
> 
>
  

Patch

diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
index 00b3224..f6ed666 100644
--- a/doc/guides/rel_notes/release_18_02.rst
+++ b/doc/guides/rel_notes/release_18_02.rst
@@ -151,6 +151,21 @@  New Features
   renamed the application from SW PMD specific ``eventdev_pipeline_sw_pmd``
   to PMD agnostic ``eventdev_pipeline``.
 
+* **Added new multi-process communication channel**
+
+  Added a generic channel in EAL for multi-process (primary/secondary) synchronous
+  and asynchronous communication. Each component who wants to reponse a message
+  shall register the action; and each process has a thread to receive the message
+  and invokes the registered action. The list of new APIs:
+
+  * ``rte_eal_mp_register``
+  * ``rte_eal_mp_unregister``
+  * ``rte_eal_mp_sendmsg``
+  * ``rte_eal_mp_request``
+  * ``rte_eal_mp_reply``
+
+  Note as we changed to use the new channel for communication, applications cannot
+  talk with old version through the old (private) communication channel.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index baeb7d1..69df943 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -12,6 +12,7 @@ 
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/time.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/un.h>
@@ -44,6 +45,50 @@  TAILQ_HEAD(action_entry_list, action_entry);
 static struct action_entry_list action_entry_list =
 	TAILQ_HEAD_INITIALIZER(action_entry_list);
 
+enum mp_type {
+	MP_MSG, /* Share message with peers, will not block */
+	MP_REQ, /* Request for information, Will block for a reply */
+	MP_REP, /* Response to previously-received request */
+};
+
+struct mp_msg_internal {
+	int type;
+	struct rte_mp_msg msg;
+};
+
+struct sync_request {
+	int reply_received;
+	char dst[PATH_MAX];
+	struct rte_mp_msg *request;
+	struct rte_mp_msg *reply;
+	pthread_cond_t cond;
+	TAILQ_ENTRY(sync_request) next;
+};
+
+TAILQ_HEAD(sync_request_list, sync_request);
+
+static struct {
+	struct sync_request_list requests;
+	pthread_mutex_t lock;
+} sync_requests = {
+	.requests = TAILQ_HEAD_INITIALIZER(sync_requests.requests),
+	.lock = PTHREAD_MUTEX_INITIALIZER
+};
+
+static struct sync_request *
+find_sync_request(const char *dst, const char *act_name)
+{
+	struct sync_request *r;
+
+	TAILQ_FOREACH(r, &sync_requests.requests, next) {
+		if (!strcmp(r->dst, dst) &&
+		    !strcmp(r->request->name, act_name))
+			break;
+	}
+
+	return r;
+}
+
 int
 rte_eal_primary_proc_alive(const char *config_file_path)
 {
@@ -147,19 +192,21 @@  rte_eal_mp_action_unregister(const char *name)
 }
 
 static int
-read_msg(struct rte_mp_msg *msg)
+read_msg(struct mp_msg_internal *m, struct sockaddr_un *s)
 {
 	int msglen;
 	struct iovec iov;
 	struct msghdr msgh;
-	char control[CMSG_SPACE(sizeof(msg->fds))];
+	char control[CMSG_SPACE(sizeof(m->msg.fds))];
 	struct cmsghdr *cmsg;
-	int buflen = sizeof(*msg) - sizeof(msg->fds);
+	int buflen = sizeof(*m) - sizeof(m->msg.fds);
 
 	memset(&msgh, 0, sizeof(msgh));
-	iov.iov_base = msg;
+	iov.iov_base = m;
 	iov.iov_len  = buflen;
 
+	msgh.msg_name = s;
+	msgh.msg_namelen = sizeof(*s);
 	msgh.msg_iov = &iov;
 	msgh.msg_iovlen = 1;
 	msgh.msg_control = control;
@@ -181,7 +228,7 @@  read_msg(struct rte_mp_msg *msg)
 		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
 		if ((cmsg->cmsg_level == SOL_SOCKET) &&
 			(cmsg->cmsg_type == SCM_RIGHTS)) {
-			memcpy(msg->fds, CMSG_DATA(cmsg), sizeof(msg->fds));
+			memcpy(m->msg.fds, CMSG_DATA(cmsg), sizeof(m->msg.fds));
 			break;
 		}
 	}
@@ -190,12 +237,28 @@  read_msg(struct rte_mp_msg *msg)
 }
 
 static void
-process_msg(struct rte_mp_msg *msg)
+process_msg(struct mp_msg_internal *m, struct sockaddr_un *s)
 {
+	struct sync_request *sync_req;
 	struct action_entry *entry;
+	struct rte_mp_msg *msg = &m->msg;
 	rte_eal_mp_t action = NULL;
 
 	RTE_LOG(DEBUG, EAL, "msg: %s\n", msg->name);
+
+	if (m->type == MP_REP) {
+		pthread_mutex_lock(&sync_requests.lock);
+		sync_req = find_sync_request(s->sun_path, msg->name);
+		if (sync_req) {
+			memcpy(sync_req->reply, msg, sizeof(*msg));
+			sync_req->reply_received = 1;
+			pthread_cond_signal(&sync_req->cond);
+		} else
+			RTE_LOG(ERR, EAL, "Drop mp reply: %s\n", msg->name);
+		pthread_mutex_unlock(&sync_requests.lock);
+		return;
+	}
+
 	pthread_mutex_lock(&mp_mutex_action);
 	entry = find_action_entry_by_name(msg->name);
 	if (entry != NULL)
@@ -204,18 +267,19 @@  process_msg(struct rte_mp_msg *msg)
 
 	if (!action)
 		RTE_LOG(ERR, EAL, "Cannot find action: %s\n", msg->name);
-	else if (action(msg) < 0)
+	else if (action(msg, s->sun_path) < 0)
 		RTE_LOG(ERR, EAL, "Fail to handle message: %s\n", msg->name);
 }
 
 static void *
 mp_handle(void *arg __rte_unused)
 {
-	struct rte_mp_msg msg;
+	struct mp_msg_internal msg;
+	struct sockaddr_un sa;
 
 	while (1) {
-		if (read_msg(&msg) == 0)
-			process_msg(&msg);
+		if (read_msg(&msg, &sa) == 0)
+			process_msg(&msg, &sa);
 	}
 
 	return NULL;
@@ -309,16 +373,20 @@  rte_eal_mp_channel_init(void)
 }
 
 static int
-send_msg(const char *dst_path, struct rte_mp_msg *msg)
+send_msg(const char *dst_path, struct rte_mp_msg *msg, int type)
 {
 	int snd;
 	struct iovec iov;
 	struct msghdr msgh;
 	struct cmsghdr *cmsg;
 	struct sockaddr_un dst;
+	struct mp_msg_internal m;
 	int fd_size = msg->num_fds * sizeof(int);
 	char control[CMSG_SPACE(fd_size)];
 
+	m.type = type;
+	memcpy(&m.msg, msg, sizeof(*msg));
+
 	memset(&dst, 0, sizeof(dst));
 	dst.sun_family = AF_UNIX;
 	snprintf(dst.sun_path, sizeof(dst.sun_path), "%s", dst_path);
@@ -326,8 +394,8 @@  send_msg(const char *dst_path, struct rte_mp_msg *msg)
 	memset(&msgh, 0, sizeof(msgh));
 	memset(control, 0, sizeof(control));
 
-	iov.iov_base = msg;
-	iov.iov_len = sizeof(*msg) - sizeof(msg->fds);
+	iov.iov_base = &m;
+	iov.iov_len = sizeof(m) - sizeof(msg->fds);
 
 	msgh.msg_name = &dst;
 	msgh.msg_namelen = sizeof(dst);
@@ -355,12 +423,16 @@  send_msg(const char *dst_path, struct rte_mp_msg *msg)
 }
 
 static int
-mp_send(struct rte_mp_msg *msg)
+mp_send(struct rte_mp_msg *msg, const char *peer, int type)
 {
 	int n = 0;
 	DIR *mp_dir;
 	struct dirent *ent;
 
+
+	if (peer)
+		return send_msg(peer, msg, type);
+
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		/* broadcast to all secondary processes */
 		mp_dir = opendir(mp_dir_path);
@@ -373,11 +445,11 @@  mp_send(struct rte_mp_msg *msg)
 			if (fnmatch(mp_filter, ent->d_name, 0) != 0)
 				continue;
 
-			n += send_msg(ent->d_name, msg);
+			n += send_msg(ent->d_name, msg, type);
 		}
 		closedir(mp_dir);
 	} else
-		n += send_msg(eal_mp_socket_path(), msg);
+		n += send_msg(eal_mp_socket_path(), msg, type);
 
 	return n;
 }
@@ -417,5 +489,136 @@  rte_eal_mp_sendmsg(struct rte_mp_msg *msg)
 		return -1;
 
 	RTE_LOG(DEBUG, EAL, "sendmsg: %s\n", msg->name);
-	return mp_send(msg);
+	return mp_send(msg, NULL, MP_MSG);
+}
+
+static int
+mp_request_one(const char *dst, struct rte_mp_msg *req,
+	       struct rte_mp_reply *reply, const struct timespec *ts)
+{
+	struct timeval now;
+	struct rte_mp_msg msg, *tmp;
+	struct sync_request sync_req, *exist;
+
+	sync_req.reply_received = 0;
+	strcpy(sync_req.dst, dst);
+	sync_req.request = req;
+	sync_req.reply = &msg;
+	pthread_cond_init(&sync_req.cond, NULL);
+
+	pthread_mutex_lock(&sync_requests.lock);
+	exist = find_sync_request(dst, req->name);
+	if (!exist)
+		TAILQ_INSERT_TAIL(&sync_requests.requests, &sync_req, next);
+	pthread_mutex_unlock(&sync_requests.lock);
+	if (exist) {
+		RTE_LOG(ERR, EAL, "A pending request %s:%s\n", dst, req->name);
+		return 0;
+	}
+
+	if (send_msg(dst, req, MP_REQ) != 1) {
+		RTE_LOG(ERR, EAL, "Fail to send request %s:%s\n",
+			dst, req->name);
+		return 0;
+	}
+
+	pthread_mutex_lock(&sync_requests.lock);
+	do {
+		pthread_cond_timedwait(&sync_req.cond, &sync_requests.lock, ts);
+		/* Check spurious wakeups */
+		if (sync_req.reply_received == 1)
+			break;
+		/* Check if time is out */
+		if (gettimeofday(&now, NULL) < 0)
+			break;
+		if (now.tv_sec < ts->tv_sec)
+			break;
+		else if (now.tv_sec == ts->tv_sec &&
+			 now.tv_usec * 1000 < ts->tv_nsec)
+			break;
+	} while (1);
+	/* We got the lock now */
+	TAILQ_REMOVE(&sync_requests.requests, &sync_req, next);
+	pthread_mutex_unlock(&sync_requests.lock);
+
+	if (sync_req.reply_received == 0) {
+		RTE_LOG(ERR, EAL, "Fail to recv reply for request %s:%s\n",
+			dst, req->name);
+		return 1;
+	}
+
+	tmp = realloc(reply->msgs, sizeof(msg) * (reply->nb_msgs + 1));
+	if (!tmp) {
+		RTE_LOG(ERR, EAL, "Fail to alloc reply for request %s:%s\n",
+			dst, req->name);
+		return 1;
+	}
+	memcpy(&tmp[reply->nb_msgs], &msg, sizeof(msg));
+	reply->msgs = tmp;
+	reply->nb_msgs++;
+	return 1;
+}
+
+int
+rte_eal_mp_request(struct rte_mp_msg *req,
+		   struct rte_mp_reply *reply,
+		   const struct timespec *ts)
+{
+	DIR *mp_dir;
+	struct dirent *ent;
+	int nb_snds = 0;
+	struct timeval now;
+	struct timespec end;
+
+	RTE_LOG(DEBUG, EAL, "request: %s\n", req->name);
+
+	if (check_input(req) == false)
+		return -1;
+	if (gettimeofday(&now, NULL) < 0) {
+		RTE_LOG(ERR, EAL, "Faile to get current time\n");
+		return -1;
+	}
+	end.tv_nsec = (now.tv_usec * 1000 + ts->tv_nsec) % 1000000000;
+	end.tv_sec = now.tv_sec + ts->tv_sec +
+			(now.tv_usec * 1000 + ts->tv_nsec) / 1000000000;
+
+	reply->nb_msgs = 0;
+	reply->msgs = NULL;
+
+	/* for secondary process, send request to the primary process only */
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+		return mp_request_one(eal_mp_socket_path(), req, reply, &end);
+
+	/* for primary process, broadcast request, and collect reply 1 by 1 */
+	mp_dir = opendir(mp_dir_path);
+	if (!mp_dir) {
+		RTE_LOG(ERR, EAL, "Unable to open directory %s\n", mp_dir_path);
+		return -1;
+	}
+	while ((ent = readdir(mp_dir))) {
+		if (fnmatch(mp_filter, ent->d_name, 0) != 0)
+			continue;
+
+		nb_snds += mp_request_one(ent->d_name, req, reply, &end);
+	}
+	closedir(mp_dir);
+
+	return nb_snds;
+}
+
+int
+rte_eal_mp_reply(struct rte_mp_msg *msg, const char *peer)
+{
+
+	RTE_LOG(DEBUG, EAL, "reply: %s\n", msg->name);
+
+	if (check_input(msg) == false)
+		return -1;
+
+	if (peer == NULL) {
+		RTE_LOG(ERR, EAL, "peer is not specified\n");
+		return -1;
+	}
+
+	return mp_send(msg, peer, MP_REP);
 }
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 9a1aac2..8e234e0 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -13,6 +13,7 @@ 
 
 #include <stdint.h>
 #include <sched.h>
+#include <time.h>
 
 #include <rte_config.h>
 #include <rte_per_lcore.h>
@@ -197,13 +198,18 @@  struct rte_mp_msg {
 	int fds[RTE_MP_MAX_FD_NUM];
 };
 
+struct rte_mp_reply {
+	int nb_msgs;
+	struct rte_mp_msg *msgs;
+};
+
 /**
  * Action function typedef used by other components.
  *
  * As we create  socket channel for primary/secondary communication, use
  * this function typedef to register action for coming messages.
  */
-typedef int (*rte_eal_mp_t)(const struct rte_mp_msg *msg);
+typedef int (*rte_eal_mp_t)(const struct rte_mp_msg *msg, const void *peer);
 
 /**
  * @warning
@@ -262,6 +268,56 @@  void rte_eal_mp_action_unregister(const char *name);
 int rte_eal_mp_sendmsg(struct rte_mp_msg *msg);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Send a request to the peer process and expect a reply.
+ *
+ * This function sends a request message to the peer process, and will
+ * block until receiving reply message from the peer process.
+ *
+ * @note The caller is responsible to free reply->replies.
+ *
+ * @param req
+ *   The req argument contains the customized request message.
+ *
+ * @param reply
+ *   The reply argument will be for storing all the replied messages;
+ *   the caller is responsible for free reply->replies.
+ *
+ * @param ts
+ *   The ts argument specifies how long we can wait for the peer(s) to reply.
+ *
+ * @return
+ *  - (<0) on invalid parameters;
+ *  - (>=0) as the number of messages being sent successfully.
+ */
+int rte_eal_mp_request(struct rte_mp_msg *req,
+			struct rte_mp_reply *reply, const struct timespec *ts);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Send a reply to the peer process.
+ *
+ * This function will send a reply message in response to a request message
+ * received previously.
+ *
+ * @param msg
+ *   The msg argument contains the customized message.
+ *
+ * @param peer
+ *   The peer argument is the pointer to the peer socket path.
+ *
+ * @return
+ *  - (1) on success;
+ *  - (0) on failure;
+ *  - (<0) on invalid parameters.
+ */
+int rte_eal_mp_reply(struct rte_mp_msg *msg, const char *peer);
+
+/**
  * Usage function typedef used by the application usage function.
  *
  * Use this function typedef to define and call rte_set_application_usage_hook()
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index adeadfb..3015bc6 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -220,6 +220,9 @@  EXPERIMENTAL {
 	rte_eal_mp_action_register;
 	rte_eal_mp_action_unregister;
 	rte_eal_mp_sendmsg;
+	rte_eal_mp_request;
+	rte_eal_mp_reply;
+	rte_eal_mp_sendmsg;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;