[PATCH v2] eal/unix: allow creating thread with real-time priority

Morten Brørup mb at smartsharesystems.com
Thu Oct 26 09:33:42 CEST 2023


> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Wednesday, 25 October 2023 23.33
> 
> On Wed, 25 Oct 2023 19:54:06 +0200
> Morten Brørup <mb at smartsharesystems.com> wrote:
> 
> > I agree with Thomas on this.
> >
> > If you want the log message, please degrade it to INFO or DEBUG level. It is
> only relevant when chasing problems, not for normal production - and thus
> NOTICE is too high.
> 
> I don't want the message to be hidden.
> If we get any bug reports want to be able to say "read the log, don't do
> that".

Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.

It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems).
As Stephen points out, it is a good starting point to check for bug reports possibly related to this. And, I suppose the experienced users who really understands it will not be seriously confused by such a NOTICE message in the log.

> 
> > Someone might build a kernel with options to keep non-dataplane threads off
> some dedicated CPU cores, so they can be used for guaranteed low-latency
> dataplane threads. We do. We don't use real-time priority, though.
> 
> This is really, hard to do.

As my kids would say: This is really, really, really, really, really hard to do!

We have not been able to find an authoritative source of documentation describing how to do it. :-(

And our experiment shows that we didn't 100 % succeed doing it. But we got close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz CPU corresponds to max 3 microseconds of added worst-case latency.

It would be great for latency-sensitive applications if the DPDK documentation went more into detail on this topic. However, if the DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if running on top of a custom built Linux Kernel, it essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)

> Isolated CPU's are not isolated from interrupts
> and other sources which end up scheduling work as kernel threads. Plus there
> is the behavior where kernel decides to turn a soft irq into a kernel thread,
> then starve itself.

We have configured the kernel to put all of this on CPU 0. (Details further below.)

> Under starvation, disk corruption is likely if interrupts never get
> processed :-(
> 
> > For reference, we did some experiments (using this custom built kernel) with
> a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> registering the delta. Although the overwhelming majority is ca. CPU 80
> cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> threads steal some cycles from this thread, regardless of our customizations.
> We haven't bothered analyzing and optimizing it further.
> 
> Was this on isolated CPU?

Yes. We isolate all CPUs but CPU 0.

> Did you check that that CPU was excluded from the smp_affinty mask on all
> devices?

Not sure how to do that?

NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?

> Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?

Yes:
# Timers subsystem
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y

CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"

> Same thing for RCU, need to adjust parameters?

Yes:
# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y

> 
> Also, on many systems there can be SMI BIOS hidden execution that will cause
> big outliers.

Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.

> 
> Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> places.

Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.

> 
> > I think our experiment supports the need to allow kernel threads to run,
> e.g. by calling sleep() or similar, when an EAL thread has real-time priority.



More information about the stable mailing list