[RFC] Dynamic log/trace control via telemetry

Stephen Hemminger stephen at networkplumber.org
Wed Aug 17 17:34:15 CEST 2022


On Wed, 17 Aug 2022 18:15:03 +0300
Dmitry Kozlyuk <dmitry.kozliuk at gmail.com> wrote:

> 2022-08-16 19:08 (UTC-0700), Stephen Hemminger:
> > Not sure if turning telemetry into a do all control api makes sense.  
> 
> I'm sure it doesn't, for "do all".
> Controlling diagnostic collection and output, however,
> is directly related to the telemetry purpose.
> 
> > This seems like a different API.
> > Also, the default would have to be disabled for application safety reasons.  
> 
> This feature would be for collecting additional info
> in case the collection was not planned and a restart is not desired.
> If it is disabled by default, it is likely to be off when it's needed.
> 
> Let's consider how exactly safety can be compromised.
> 
> 1. Securing telemetry socket access is out of scope for DPDK,
>    that is, any successful access is considered trusted.
> 
> 2. Even read-only telemetry still comes at a cost, for example,
>    memory telemetry takes a global lock that blocks all allocations,
>    so affecting the app performance is already possible.
> 
> 3. Important logs and traces enabled at startup may be disabled dynamically.
>    If it's an issue, the API can refuse to disable them.
> 
> 4. Bogus logs may flood the output and slow down the app.
>    Bogus traces can exhaust disk space.
>    Logs should be monitored automatically, so flooding is just an annoyance.
>    Disk space can have a quota.
>    Since the user is trusted (item 1), even if they do it by mistake,
>    they can quickly correct themselves using the same API.

Telemetry can have a security impact.
Telemetry always has some performance cost.

My interest is that we run a performance-sensitive application that gets
lots of security review. If a new version of DPDK silently enabled something
that had an impact, it would cause extra effort and confusion.

Developers often take the wrong point of view: "my feature is great, everyone
wants it" and also "why should I test with this disabled?". New features should
be opt-in, not opt-out.
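
As a rough illustration of opt-in, here is a minimal sketch that also
includes the refusal suggested in item 3 of the quoted list. Every name in it
(log_ctl_enable, log_ctl_set_level, struct log_type) is hypothetical and is
not existing DPDK API. It assumes that a higher level number means more
verbose output and that the application sets the flag once at startup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing DPDK API. */

/* Off unless the application explicitly opts in at startup. */
static bool log_ctl_enabled = false;

struct log_type {
    const char *name;
    int level;         /* current level; higher means more verbose (assumption) */
    int startup_level; /* level configured at startup, kept as a floor */
};

static struct log_type log_types[] = {
    { "lib.eal", 4, 4 },
    { "pmd.net", 7, 7 },
};

/* Called by the application, e.g. when a command-line option is given. */
void log_ctl_enable(void)
{
    log_ctl_enabled = true;
}

/* Returns 0 on success, -1 if the request is refused. */
int log_ctl_set_level(const char *name, int level)
{
    size_t i;

    if (!log_ctl_enabled)
        return -1; /* dynamic control was not opted in */

    for (i = 0; i < sizeof(log_types) / sizeof(log_types[0]); i++) {
        struct log_type *t = &log_types[i];

        if (strcmp(t->name, name) != 0)
            continue;
        if (level < t->startup_level)
            return -1; /* would disable logs that were enabled at startup */
        t->level = level;
        return 0;
    }
    return -1; /* unknown log type */
}
```

With this shape, an application that never calls log_ctl_enable() behaves
exactly as before, so testing with the feature disabled is the default case.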


