[PATCH v2] eal/unix: allow creating thread with real-time priority

Stephen Hemminger stephen at networkplumber.org
Thu Oct 26 18:32:13 CEST 2023


On Thu, 26 Oct 2023 09:33:42 +0200
Morten Brørup <mb at smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Wednesday, 25 October 2023 23.33
> > 
> > On Wed, 25 Oct 2023 19:54:06 +0200
> > Morten Brørup <mb at smartsharesystems.com> wrote:
> >   
> > > I agree with Thomas on this.
> > >
> > > If you want the log message, please degrade it to INFO or DEBUG level. It is
> > > only relevant when chasing problems, not for normal production - and thus
> > > NOTICE is too high.
> > 
> > I don't want the message to be hidden.
> > If we get any bug reports, I want to be able to say "read the log, don't do
> > that".
> 
> Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.
> 
> It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems).
> As Stephen points out, it is a good starting point to check for bug reports possibly related to this. And, I suppose the experienced users who really understand it will not be seriously confused by such a NOTICE message in the log.
> 
> >   
> > > Someone might build a kernel with options to keep non-dataplane threads off
> > > some dedicated CPU cores, so they can be used for guaranteed low-latency
> > > dataplane threads. We do. We don't use real-time priority, though.
> > 
> > This is really hard to do.
> 
> As my kids would say: This is really, really, really, really, really hard to do!
> 
> We have not been able to find an authoritative source of documentation describing how to do it. :-(
> 
> And our experiment shows that we didn't 100 % succeed doing it. But we got close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz CPU correspond to max 3 microseconds of added worst-case latency.
> 
> It would be great for latency-sensitive applications if the DPDK documentation went more into detail on this topic. However, if the DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if running on top of a custom built Linux Kernel, it essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)
> 
> > Isolated CPUs are not isolated from interrupts
> > and other sources which end up scheduling work as kernel threads. Plus there
> > is the behavior where kernel decides to turn a soft irq into a kernel thread,
> > then starve itself.  
> 
> We have configured the kernel to put all of this on CPU 0. (Details further below.)
> 
> > Under starvation, disk corruption is likely if interrupts never get
> > processed :-(
> >   
> > > For reference, we did some experiments (using this custom built kernel) with
> > > a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> > > registering the delta. Although the overwhelming majority is ca. 80 CPU
> > > cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> > > magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> > > threads steal some cycles from this thread, regardless of our customizations.
> > > We haven't bothered analyzing and optimizing it further.
> > 
> > Was this on isolated CPU?  
> 
> Yes. We isolate all CPUs but CPU 0.
> 
> > Did you check that that CPU was excluded from the smp_affinity mask on all
> > devices?
> 
> Not sure how to do that?
> 
> NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?
> 
> > Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?  
> 
> Yes:
> # Timers subsystem
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y
> 
> CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"
> 
> > Same thing for RCU, need to adjust parameters?  
> 
> Yes:
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> CONFIG_SRCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_CONTEXT_TRACKING=y
> CONFIG_RCU_NOCB_CPU=y
> CONFIG_RCU_NOCB_CPU_ALL=y
> 
> > 
> > Also, on many systems there can be SMI BIOS hidden execution that will cause
> > big outliers.  
> 
> Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.
> 
> > 
> > Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> > places.  
> 
> Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.
> 
> >   
> > > I think our experiment supports the need to allow kernel threads to run,
> > > e.g. by calling sleep() or similar, when an EAL thread has real-time priority.
> 

One benefit of a real-time thread is that the kernel will be more precise in
any calls to sleep. If you do a short sleep in a normal thread, the kernel will round
up the timer to avoid reprogramming the timer chip and to save power (fewer wakeups from idle).
With an RT thread it will do "you wanted 21us, OK, you will get 21us".
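
A rough way to see this for yourself is to time a short sleep before and
after switching the thread to a real-time policy. The sketch below is only
an illustration, not code from the patch: the function names are made up,
priority 1 and the 21us request are arbitrary, and it assumes Linux with
clock_nanosleep() and sched_setscheduler(). The rounding for normal threads
is, as far as I know, the per-thread timer slack (see PR_SET_TIMERSLACK in
prctl(2)), which is not applied to real-time threads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: try to switch the caller to SCHED_FIFO.
 * Returns 0 on success or an errno value (typically EPERM when the
 * process lacks CAP_SYS_NICE or a suitable RLIMIT_RTPRIO). */
int try_set_fifo(int prio)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	return sched_setscheduler(0, SCHED_FIFO, &sp) == 0 ? 0 : errno;
}

/* Hypothetical helper: sleep for request_ns and return how many
 * nanoseconds longer than requested the sleep took, or -1 if the
 * sleep failed or was interrupted. */
long sleep_overshoot_ns(long request_ns)
{
	struct timespec req, t0, t1;
	long elapsed;

	assert(request_ns >= 0 && request_ns < 1000000000L);
	req.tv_sec = 0;
	req.tv_nsec = request_ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL) != 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	elapsed = (long)(t1.tv_sec - t0.tv_sec) * 1000000000L
		+ (t1.tv_nsec - t0.tv_nsec);
	return elapsed - request_ns;
}

/* Print the overshoot of a 21us sleep under the current policy and,
 * if the switch is permitted, under SCHED_FIFO. */
void compare_sleep_precision(void)
{
	int err;

	printf("current policy: +%ld ns\n", sleep_overshoot_ns(21000));
	err = try_set_fifo(1);
	if (err != 0)
		printf("SCHED_FIFO not available: %s\n", strerror(err));
	else
		printf("SCHED_FIFO:     +%ld ns\n", sleep_overshoot_ns(21000));
}
```

Call compare_sleep_precision() from main() and run it a few times. A single
sample is noisy, so look at the trend rather than one number.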

The project that was originally Vyatta has a script that tries to isolate interrupts etc.
I started it, but they have kept working on it since then.

   https://github.com/danos/vyatta-cpu-shield

It adjusts kernel workers, softirq, cgroups, etc.
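
On the smp_affinity question above: a small check along the following
lines lists the IRQs whose affinity mask still includes a given CPU. It is
a sketch written for this mail, not taken from the script; the function
names are invented, and it assumes the usual /proc/irq/<N>/smp_affinity
format (hex digits, most significant first, commas between 32-bit groups).

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `cpu` is set in a mask string in the
 * smp_affinity format, e.g. "ffffffff,00000001", else 0. */
int cpu_in_mask(const char *mask, int cpu)
{
	char digits[256];
	size_t n = 0, nibble;
	const char *p;
	char c;
	int value;

	assert(cpu >= 0);
	/* Keep only the hex digits; commas and the newline are dropped. */
	for (p = mask; *p != '\0' && n < sizeof(digits); p++)
		if (isxdigit((unsigned char)*p))
			digits[n++] = *p;

	nibble = (size_t)cpu / 4;	/* counted from the rightmost digit */
	if (nibble >= n)
		return 0;
	c = (char)tolower((unsigned char)digits[n - 1 - nibble]);
	value = isdigit((unsigned char)c) ? c - '0' : c - 'a' + 10;
	return (value >> (cpu % 4)) & 1;
}

/* Hypothetical helper: print every IRQ whose mask includes `cpu`.
 * Returns the number of such IRQs, or -1 if /proc/irq is unreadable. */
int report_irqs_on_cpu(int cpu)
{
	DIR *dir = opendir("/proc/irq");
	struct dirent *de;
	int hits = 0;

	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char path[300], mask[256];
		FILE *f;

		/* Only numeric entries are IRQ directories. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity",
			 de->d_name);
		f = fopen(path, "r");
		if (f == NULL)
			continue;
		if (fgets(mask, sizeof(mask), f) != NULL &&
		    cpu_in_mask(mask, cpu)) {
			mask[strcspn(mask, "\n")] = '\0';
			printf("IRQ %s may run on CPU %d (mask %s)\n",
			       de->d_name, cpu, mask);
			hits++;
		}
		fclose(f);
	}
	closedir(dir);
	return hits;
}
```

Calling report_irqs_on_cpu() for each isolated CPU should ideally report
nothing. Note that some interrupts (per-CPU timers, for example) cannot be
moved regardless of the mask, so an empty report is necessary but not
sufficient.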

