[dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API

Stephen Hemminger stephen at networkplumber.org
Tue Aug 17 17:52:31 CEST 2021


On Tue, 17 Aug 2021 20:57:50 +0530
Jerin Jacob <jerinjacobk at gmail.com> wrote:

> On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> <stephen at networkplumber.org> wrote:
> >
> > On Tue, 17 Aug 2021 13:08:46 +0530
> > Jerin Jacob <jerinjacobk at gmail.com> wrote:
> >  
> > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > <stephen at networkplumber.org> wrote:  
> > > >
> > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > <jerinj at marvell.com> wrote:
> > > >  
> > > > > From: Jerin Jacob <jerinj at marvell.com>
> > > > >
> > > > > > Introduce an oops handling API with the following specification
> > > > > > and enable a stub implementation for Linux and FreeBSD.
> > > > >
> > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > oops handler for the essential signals.
> > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > of signals for which the EAL installed the handler.
> > > >
> > > > This is a big change, and many applications already handle these
> > > > signals themselves. Therefore adding this needs to be opt-in
> > > > and not enabled by default.  
> > >
> > > To avoid requiring every application to register this signal handler
> > > explicitly, and to coexist with application-specific signal-handler
> > > usage, the following design has been chosen. (It is mentioned in the
> > > commit log; I will describe it here for clarity.)
> > >
> > > Case 1:
> > > a) The application installs its signal handler prior to rte_eal_init().
> > > b) The implementation stores the application-specific signal handler and
> > > replaces it with the EAL oops handler.
> > > c) When the application/DPDK gets a segfault, the default EAL oops
> > > handler is invoked.
> > > d) After it dumps the EAL-specific message, it calls the
> > > application-specific signal handler installed in step a) by the
> > > application. This avoids breaking any contract with the application,
> > > i.e. the behavior is the same as with the current EAL.
> > > That is the reason for not using SA_RESETHAND (which would call SIG_DFL
> > > after the EAL oops handler instead of the application-specific handler).
> > >
> > > Case 2:
> > > a) The application installs its signal handler after rte_eal_init().
> > > b) The EAL handler is replaced by the application handler; the application
> > > can then call rte_oops_decode() to decode the fault.
> > >
> > > To cater to the above use cases, rte_oops_signals_enabled() and
> > > rte_oops_decode() are provided.
> > >
> > > Here we are not breaking any contract with the application.
> > > Do you have concerns about this design?  
> >
> > In our application as a service it is important not to do any backtrace
> > in production. We rely on other infrastructure to process coredumps.  
> 
> Other infrastructure will still work. For example, if we are using the
> standard Linux coredump infrastructure, then in the current implementation:
> - The EAL handler dumps the DPDK oops, like a kernel oops, on stderr.
> - The implementation calls SIG_DFL in the EAL oops handler.
> - The above step creates the coredump or hands over to whatever other
> infrastructure you are using for coredumps.
> 
> >
> > This should be enabled by a command-line argument.
> 
> If we allow other coredump infrastructure to work as-is, why is an
> enable/disable option required in EAL?

Adding the DPDK oops handler introduces extra steps, which cause all
faults to be identified as originating in the oops code.


