[V4,5/9] bus: add helper to handle sigbus

Message ID 1530268248-7328-6-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [V4,1/9] bus: introduce hotplug failure handler |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply issues

Commit Message

Guo, Jia June 29, 2018, 10:30 a.m. UTC
  This patch aim to add a helper to iterate all buses to find the
corresponding bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 lib/librte_eal/common/eal_common_bus.c | 34 +++++++++++++++++++++++++++++++++-
 lib/librte_eal/common/eal_private.h    | 11 +++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)
  

Comments

Ananyev, Konstantin June 29, 2018, 10:51 a.m. UTC | #1
> +int
> +rte_bus_sigbus_handler(const void *failure_addr)
> +{
> +	struct rte_bus *bus;
> +	int old_errno = rte_errno;
> +	int ret = 0;
> +
> +	rte_errno = 0;
> +
> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> +	if (bus == NULL) {
> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
> +		ret = -1;
> +	} else if (rte_errno != 0) {
> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
> +		ret = -1;
> +	}
> +
> +	/* if sigbus not be handled, return back old errno. */
> +	if (ret)
> +		rte_errno = old_errno;

Hmm, not sure why we need to set/restore rte_errno here?

> +
> +	return ret;
> +}
  
Guo, Jia June 29, 2018, 11:23 a.m. UTC | #2
hi, konstantin


On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
>> +int
>> +rte_bus_sigbus_handler(const void *failure_addr)
>> +{
>> +	struct rte_bus *bus;
>> +	int old_errno = rte_errno;
>> +	int ret = 0;
>> +
>> +	rte_errno = 0;
>> +
>> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>> +	if (bus == NULL) {
>> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
>> +		ret = -1;
>> +	} else if (rte_errno != 0) {
>> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
>> +		ret = -1;
>> +	}
>> +
>> +	/* if sigbus not be handled, return back old errno. */
>> +	if (ret)
>> +		rte_errno = old_errno;
> Hmm, not sure why we need to set/restore rte_errno here?

restore old_errno just use to let caller know that the generic sigbus 
still not handler by bus hotplug handler,  that involve find a bus 
handle but failed and can not find a hander,  and can corresponding use 
the previous sigbus handler to process it.
that is also unwser your question in other patch. do you think that make 
sense?

>> +
>> +	return ret;
>> +}
  
Ananyev, Konstantin June 29, 2018, 12:21 p.m. UTC | #3
> -----Original Message-----
> From: Guo, Jia
> Sent: Friday, June 29, 2018 12:23 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH V4 5/9] bus: add helper to handle sigbus
> 
> hi, konstantin
> 
> 
> On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
> >> +int
> >> +rte_bus_sigbus_handler(const void *failure_addr)
> >> +{
> >> +	struct rte_bus *bus;
> >> +	int old_errno = rte_errno;
> >> +	int ret = 0;
> >> +
> >> +	rte_errno = 0;
> >> +
> >> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> >> +	if (bus == NULL) {
> >> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
> >> +		ret = -1;
> >> +	} else if (rte_errno != 0) {
> >> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
> >> +		ret = -1;
> >> +	}
> >> +
> >> +	/* if sigbus not be handled, return back old errno. */
> >> +	if (ret)
> >> +		rte_errno = old_errno;
> > Hmm, not sure why we need to set/restore rte_errno here?
> 
> restore old_errno just use to let caller know that the generic sigbus
> still not handler by bus hotplug handler,  that involve find a bus
> handle but failed and can not find a hander,  and can corresponding use
> the previous sigbus handler to process it.
> that is also unwser your question in other patch. do you think that make
> sense?

Sorry, still don't understand the intention.
Suppose rte_bus_find() will return NULL, in that case you'll setup rte_errno
to what it was before calling that function.
If the returned bus is not NULL, but bus_find() set's an rte_errno,
you still would restore rte_ernno?
What is the prupose?
Why do you need to touch rte_errno at all in that function?
Konstantin

> 
> >> +
> >> +	return ret;
> >> +}
  
Gaëtan Rivet June 29, 2018, 12:52 p.m. UTC | #4
On Fri, Jun 29, 2018 at 12:21:39PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Guo, Jia
> > Sent: Friday, June 29, 2018 12:23 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
> > <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> > <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> > <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> > <bernard.iremonger@intel.com>
> > Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> > Subject: Re: [PATCH V4 5/9] bus: add helper to handle sigbus
> > 
> > hi, konstantin
> > 
> > 
> > On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
> > >> +int
> > >> +rte_bus_sigbus_handler(const void *failure_addr)
> > >> +{
> > >> +	struct rte_bus *bus;
> > >> +	int old_errno = rte_errno;
> > >> +	int ret = 0;
> > >> +
> > >> +	rte_errno = 0;
> > >> +
> > >> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> > >> +	if (bus == NULL) {
> > >> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
> > >> +		ret = -1;
> > >> +	} else if (rte_errno != 0) {
> > >> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
> > >> +		ret = -1;
> > >> +	}
> > >> +
> > >> +	/* if sigbus not be handled, return back old errno. */
> > >> +	if (ret)
> > >> +		rte_errno = old_errno;
> > > Hmm, not sure why we need to set/restore rte_errno here?
> > 
> > restore old_errno just use to let caller know that the generic sigbus
> > still not handler by bus hotplug handler,  that involve find a bus
> > handle but failed and can not find a hander,  and can corresponding use
> > the previous sigbus handler to process it.
> > that is also unwser your question in other patch. do you think that make
> > sense?
> 
> Sorry, still don't understand the intention.
> Suppose rte_bus_find() will return NULL, in that case you'll setup rte_errno
> to what it was before calling that function.
> If the returned bus is not NULL, but bus_find() set's an rte_errno,
> you still would restore rte_ernno?
> What is the prupose?
> Why do you need to touch rte_errno at all in that function?
> Konstantin
> 

The way it is written here does not work, but the intention is
to make sure that a previous error is still catched. Something like
that:

   int old_errno = rte_errno;
   
   rte_errno = 0;
   rte_eal_call();
   
   if (rte_errno)
       return -1;
   else {
       rte_errno = old_errno;
       return 0;
   }

If someone calls the function while rte_errno is already set, then an
earlier error would be hidden by setting rte_errno to 0 within the
function.

I'm not sure this is useful, but sometimes when using errno within a
library call I'm bothered that I am masking previous issues.

Should it be avoided?
  
Guo, Jia July 3, 2018, 11:24 a.m. UTC | #5
hi, gaetan and konstantin

answer both of your questions here as below.

On 6/29/2018 8:52 PM, Gaëtan Rivet wrote:
> On Fri, Jun 29, 2018 at 12:21:39PM +0000, Ananyev, Konstantin wrote:
>>
>>> -----Original Message-----
>>> From: Guo, Jia
>>> Sent: Friday, June 29, 2018 12:23 PM
>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
>>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
>>> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
>>> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
>>> <bernard.iremonger@intel.com>
>>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
>>> Subject: Re: [PATCH V4 5/9] bus: add helper to handle sigbus
>>>
>>> hi, konstantin
>>>
>>>
>>> On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
>>>>> +int
>>>>> +rte_bus_sigbus_handler(const void *failure_addr)
>>>>> +{
>>>>> +	struct rte_bus *bus;
>>>>> +	int old_errno = rte_errno;
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	rte_errno = 0;
>>>>> +
>>>>> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>>>>> +	if (bus == NULL) {
>>>>> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
>>>>> +		ret = -1;
>>>>> +	} else if (rte_errno != 0) {
>>>>> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
>>>>> +		ret = -1;
>>>>> +	}
>>>>> +
>>>>> +	/* if sigbus not be handled, return back old errno. */
>>>>> +	if (ret)
>>>>> +		rte_errno = old_errno;
>>>> Hmm, not sure why we need to set/restore rte_errno here?
>>> restore old_errno just use to let caller know that the generic sigbus
>>> still not handler by bus hotplug handler,  that involve find a bus
>>> handle but failed and can not find a hander,  and can corresponding use
>>> the previous sigbus handler to process it.
>>> that is also unwser your question in other patch. do you think that make
>>> sense?
>> Sorry, still don't understand the intention.
>> Suppose rte_bus_find() will return NULL, in that case you'll setup rte_errno
>> to what it was before calling that function.
>> If the returned bus is not NULL, but bus_find() set's an rte_errno,
>> you still would restore rte_ernno?
>> What is the prupose?
>> Why do you need to touch rte_errno at all in that function?
>> Konstantin
>>
> The way it is written here does not work, but the intention is
> to make sure that a previous error is still catched. Something like
> that:
>
>     int old_errno = rte_errno;
>     
>     rte_errno = 0;
>     rte_eal_call();
>     
>     if (rte_errno)
>         return -1;
>     else {
>         rte_errno = old_errno;
>         return 0;
>     }
>
> If someone calls the function while rte_errno is already set, then an
> earlier error would be hidden by setting rte_errno to 0 within the
> function.
>
> I'm not sure this is useful, but sometimes when using errno within a
> library call I'm bothered that I am masking previous issues.
>
> Should it be avoided?

i agree with konstantin about distinguish to process the handle failed 
or no handle,
and agree with gaetan about restore the errno if it is not belong to the 
sigbus handler.
Could you check if it is fulfill that as bellow,

-1 means find bus but handle failed,  use rte_exit.
1 means can no find bus, use older handler to handle.
0 means find bus and success handle. the handler is the new handler.

static int
bus_handle_sigbus(const struct rte_bus *bus,
             const void *failure_addr)
{
     int ret;
     ret = bus->sigbus_handler(failure_addr);
     rte_errno = ret;
     return !(bus->sigbus_handler && ret <= 0);
}

int
rte_bus_sigbus_handler(const void *failure_addr)
{
     struct rte_bus *bus;
     int ret = 0;
     int old_errno = rte_errno;
     rte_errno = 0;
     bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
     /* failed to handle the sigbus, pass the new errno. */
     if (bus && rte_errno == -1)
         return -1;
     else if (!bus)
         ret =1;

     /* otherwise restore the old errno. */
     rte_errno = old_errno;

     return ret;
}

static void sigbus_handler(int signum, siginfo_t *info,
                 void *ctx __rte_unused)
{
     int ret;

     rte_spinlock_lock(&dev_failure_lock);
     ret = rte_bus_sigbus_handler(info->si_addr);
     rte_spinlock_unlock(&dev_failure_lock);
     if (ret == -1) {
         rte_exit(EXIT_FAILURE,
              "Failed to handle SIGBUS for hotplug, "
              "(rte_errno: %s)!", strerror(rte_errno));
     } else if (ret == 1) {
         if (sigbus_action_old.sa_handler)
             (*(sigbus_action_old.sa_handler))(signum);
         else
             rte_exit(EXIT_FAILURE,
                  "Failed to handle generic SIGBUS!");
     }
}
  

Patch

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..34c4f2d 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@ 
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -220,7 +221,6 @@  rte_bus_find_by_device_name(const char *str)
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
 
-
 /*
  * Get iommu class of devices on the bus.
  */
@@ -242,3 +242,35 @@  rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	return !(bus->sigbus_handler && bus->sigbus_handler(failure_addr) <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+	int old_errno = rte_errno;
+	int ret = 0;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	if (bus == NULL) {
+		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
+		ret = -1;
+	} else if (rte_errno != 0) {
+		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
+		ret = -1;
+	}
+
+	/* if sigbus not be handled, return back old errno. */
+	if (ret)
+		rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..9517f2b 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,15 @@  int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+
+/**
+ * Iterate all buses to find the corresponding bus, to handle the sigbus error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 on success.
+ *	-1 on error
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
 #endif /* _EAL_PRIVATE_H_ */