[RFC PATCH] ring: adding TPAUSE instruction to ring dequeue

Coyle, David david.coyle at intel.com
Wed May 3 17:31:47 CEST 2023


Hi Morten

> -----Original Message-----
> From: Morten Brørup <mb at smartsharesystems.com>
> 
> > From: David Coyle [mailto:david.coyle at intel.com]
> > Sent: Wednesday, 3 May 2023 13.39
> >
> > This is NOT for upstreaming. This is being submitted to allow early
> > comparison testing with the preferred solution, which will add TPAUSE
> > power management support to the ring library through the addition of
> > callbacks. Initial stages of the preferred solution are available at
> > http://dpdk.org/patch/125454.
> >
> > This patch adds functionality directly to rte_ring_dequeue functions
> > to monitor the empty reads of the ring. When a configurable number of
> > empty reads is reached, a TPAUSE instruction is triggered by using
> > rte_power_pause() on supported architectures. rte_pause() is used on
> > other architectures. The functionality can be included or excluded at
> > compilation time using the RTE_RING_PMGMT flag. If included, the new
> > API can be used to enable/disable the feature on a per-ring basis.
> > Other related settings can also be configured using the API.
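For readers unfamiliar with the approach, the empty-read heuristic described in the patch can be sketched roughly as below. This is not the patch's code: the struct, the function names and the pause stub are hypothetical. The stub only counts calls at the point where the patch would call rte_power_pause() (TPAUSE) or fall back to rte_pause(), and the RTE_RING_PMGMT build-time switch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring power-management state (illustrative names only). */
struct ring_pmgmt {
    bool enabled;             /* per-ring on/off switch */
    uint32_t empty_threshold; /* consecutive empty dequeues before pausing */
    uint32_t empty_count;     /* consecutive empty dequeues seen so far */
    uint64_t pauses;          /* pauses triggered (kept for observability) */
};

/* Stand-in for rte_power_pause()/rte_pause(): only counts the call. */
static void pmgmt_pause(struct ring_pmgmt *pm)
{
    pm->pauses++;
}

static void ring_pmgmt_init(struct ring_pmgmt *pm, uint32_t threshold)
{
    assert(threshold > 0);
    pm->enabled = true;
    pm->empty_threshold = threshold;
    pm->empty_count = 0;
    pm->pauses = 0;
}

/* Call after every dequeue with the number of objects it returned. */
static void ring_pmgmt_update(struct ring_pmgmt *pm, unsigned int n_dequeued)
{
    if (!pm->enabled)
        return;
    if (n_dequeued > 0) {
        pm->empty_count = 0; /* ring had work: reset the counter */
        return;
    }
    if (++pm->empty_count >= pm->empty_threshold) {
        pmgmt_pause(pm);
        pm->empty_count = 0; /* start counting again after the pause */
    }
}
```

Any non-empty dequeue resets the counter, so a pause only happens after an unbroken run of empty reads.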
> 
> I don't understand why DPDK developers keep spending time on trying to
> invent methods to determine application busyness based on entry/exit
> points in a variety of libraries, when the application is in a much better
> position to determine busyness. All of these "busyness measuring" library
> extensions have their own specific assumptions and weird limitations.
> 
> I do understand that the goal is power saving, which certainly is relevant! I
> only criticize the measuring methods.
> 
> For reference, we implemented something very simple in our application
> framework:
> 1. When each pipeline stage has completed a burst, it reports if it was busy or
> not.
> 2. If the pipeline busyness is low, we take a nap to save some power.
> 
> And here is the magic twist to this simple algorithm:
> 3. A pipeline stage is not considered busy unless it processed a full burst, and
> is ready to process more packets immediately. This interpretation of
> busyness has a significant impact on the percentage of time spent napping
> during the low-traffic hours.
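A minimal sketch of the busyness rule described above, assuming a fixed burst size of 32 and a hypothetical percentage threshold for "low" busyness (the message does not say how "low" is defined). Under rule 3, a stage that processed 31 of 32 packets is not counted as busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BURST 32 /* assumed burst size */

/* Rule 3: a stage is busy only if it processed a full burst, i.e. it is
 * likely that more packets are already waiting. */
static bool stage_is_busy(unsigned int n_processed, unsigned int burst_size)
{
    return n_processed >= burst_size;
}

/* Returns true if the caller should nap after this pass over all stages.
 * busy_pct_threshold (0-100) is a hypothetical tuning knob. */
static bool pipeline_should_nap(const unsigned int *n_processed,
                                size_t n_stages, unsigned int burst_size,
                                unsigned int busy_pct_threshold)
{
    assert(busy_pct_threshold <= 100);
    size_t busy = 0;
    for (size_t i = 0; i < n_stages; i++)
        if (stage_is_busy(n_processed[i], burst_size))
            busy++;
    /* Integer percentage of stages that reported busy in this pass. */
    return n_stages == 0 || (busy * 100) / n_stages < busy_pct_threshold;
}
```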
> 
> This algorithm was very quickly implemented. It might not be perfect, and we
> do intend to improve it (also to determine CPU Utilization on a scale that the
> end user can translate to a linear interpretation of how busy the system is).
> But I seriously doubt that any of the proposed "busyness measuring" library
> extensions are any better.
> 
> So: The application knows better, please spend your precious time on
> something useful instead.
> 
> @David, my outburst is not directed at you specifically. Generally, I do
> appreciate experimenting as a good way of obtaining knowledge. So thank
> you for sharing your experiments with this audience!
> 
> PS: If cruft can be disabled at build time, I generally don't object to it.

[DC] I appreciate that feedback; it is certainly another way of looking at
and tackling the problem that we are ultimately trying to solve (i.e. power
saving).

The problem, however, is that we work with a large number of ISVs and
operators, each with their own workload architecture and implementation. That
means we would have to work with each of them individually to integrate this
type of pipeline-stage busyness algorithm into their applications. And as these
applications are usually commercial, closed-source applications, that could
prove to be very difficult.

Also, most ISVs and operators don't want to have to worry about changing their
application, especially their fast-path dataplane, in order to get power
savings. They would prefer it to just happen without having to care about the
finer details.

For these reasons, consolidating the busyness algorithms into the DPDK
libraries and PMDs is currently the preferred solution. As you say, the
libraries and PMDs may not be in the best position to determine the busyness
of the pipeline, but this approach provides a good balance between achieving
power savings and ease of adoption.

It's also worth calling out again that this patch is only intended to allow
some customers to test, at an early stage, the benefit of adding TPAUSE support
to the ring library. We don't intend for this patch to be upstreamed. The
preferred longer-term solution is to use callbacks from the ring library to
initiate the pause (either via the DPDK power management API or through
functions that an ISV may write themselves). This is mentioned in the commit
message.
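The callback-based direction can be illustrated, very loosely, as follows. This does not reflect the API in the referenced patch; every type and function name here is hypothetical. The point is only the split of responsibilities: the ring reports each dequeue result, and a separately registered policy (from a power management library or written by an ISV) decides whether to pause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: called after every dequeue with the number of
 * objects returned and an opaque context owned by the policy. */
typedef void (*ring_dequeue_cb_t)(unsigned int n_dequeued, void *ctx);

/* Hypothetical ring handle; only the callback hook is modelled here. */
struct sketch_ring {
    ring_dequeue_cb_t cb;
    void *cb_ctx;
};

/* Register a policy callback; returns 0 on success, -1 on bad arguments. */
static int sketch_ring_register_cb(struct sketch_ring *r,
                                   ring_dequeue_cb_t cb, void *ctx)
{
    if (r == NULL || cb == NULL)
        return -1;
    r->cb = cb;
    r->cb_ctx = ctx;
    return 0;
}

/* Library side: report the dequeue result to the registered callback and
 * pass the count through unchanged. No power policy lives here. */
static unsigned int sketch_ring_dequeue_done(struct sketch_ring *r,
                                             unsigned int n)
{
    assert(r != NULL);
    if (r->cb != NULL)
        r->cb(n, r->cb_ctx);
    return n;
}

/* Policy side: a deliberately simple policy that "pauses" on every empty
 * dequeue. A real policy might issue TPAUSE; here the pause is only counted. */
struct pause_counter {
    unsigned long pauses;
};

static void pause_on_empty_cb(unsigned int n_dequeued, void *ctx)
{
    struct pause_counter *pc = ctx;
    if (n_dequeued == 0)
        pc->pauses++;
}
```

Because the ring only invokes the hook, swapping in a different policy requires no change to the dequeue path itself.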

Also, regarding the pipeline stage busyness algorithm that you have added to
your pipeline: have you ever considered implementing it in DPDK as a generic
library? This could certainly benefit other DPDK application developers, and
having this mechanism in DPDK could again ease the adoption and realisation of
power savings for others. I understand, though, if this is your own secret
sauce and you want to keep it that way :)

David

