[PATCH v2 1/5] telemetry: escape special char when tel string

Power, Ciara ciara.power at intel.com
Wed Jun 22 09:57:43 CEST 2022


Hi folks,

> -----Original Message-----
> From: Morten Brørup <mb at smartsharesystems.com>
> Sent: Saturday 18 June 2022 10:59
> To: fengchengwen <fengchengwen at huawei.com>; Stephen Hemminger
> <stephen at networkplumber.org>; Richardson, Bruce
> <bruce.richardson at intel.com>
> Cc: thomas at monjalon.net; ferruh.yigit at xilinx.com; Laatz, Kevin
> <kevin.laatz at intel.com>; andrew.rybchenko at oktetlabs.ru;
> jerinj at marvell.com; sachin.saxena at oss.nxp.com;
> hemant.agrawal at nxp.com; dev at dpdk.org; Power, Ciara
> <ciara.power at intel.com>
> Subject: RE: [PATCH v2 1/5] telemetry: escape special char when tel string
> 
> +CC: Ciara Power, Telemetry library maintainer
> 
> > From: fengchengwen [mailto:fengchengwen at huawei.com]
> > Sent: Saturday, 18 June 2022 05.52
> >
> > On 2022/6/18 1:05, Stephen Hemminger wrote:
> > > On Fri, 17 Jun 2022 12:25:04 +0100
> > > Bruce Richardson <bruce.richardson at intel.com> wrote:
> > >
> > >> On Fri, Jun 17, 2022 at 01:16:08PM +0200, Morten Brørup wrote:
> > >>>> From: Chengwen Feng [mailto:fengchengwen at huawei.com]
> > >>>> Sent: Friday, 17 June 2022 11.46
> > >>>>
> > >>>> This patch supports escape special characters (including:
> > \",\\,/,\b,
> > >>>> /f,/n,/r,/t) when telemetry string.
> > >>>> This patch is used to support telemetry xxx-dump commands which
> > the
> > >>>> string may include special characters.
> > >>>>
> > >>>> Signed-off-by: Chengwen Feng <fengchengwen at huawei.com>
> > >>>> ---
> > >>>>  lib/telemetry/telemetry.c | 96
> > +++++++++++++++++++++++++++++++++++++--
> > >>>>  1 file changed, 93 insertions(+), 3 deletions(-)
> > >>>>
> > >>>> diff --git a/lib/telemetry/telemetry.c
> > >>>> b/lib/telemetry/telemetry.c index c6fd03a5ab..0f762f633e 100644
> > >>>> --- a/lib/telemetry/telemetry.c
> > >>>> +++ b/lib/telemetry/telemetry.c
> > >>>> @@ -215,6 +215,94 @@ container_to_json(const struct rte_tel_data
> > *d,
> > >>>> char *out_buf, size_t buf_len)
> > >>>>  	return used;
> > >>>>  }
> > >>>>
> > >>>> +static bool
> > >>>> +json_is_special_char(char ch)
> > >>>> +{
> > >>>> +	static unsigned char is_spec[256] = { 0 };
> > >>>> +	static bool init_once;
> > >>>> +
> > >>>> +	if (!init_once) {
> > >>>> +		is_spec['\"'] = 1;
> > >>>> +		is_spec['\\'] = 1;
> > >>>> +		is_spec['/'] = 1;
> > >>>> +		is_spec['\b'] = 1;
> > >>>> +		is_spec['\f'] = 1;
> > >>>> +		is_spec['\n'] = 1;
> > >>>> +		is_spec['\r'] = 1;
> > >>>> +		is_spec['\t'] = 1;
> > >>>> +		init_once = true;
> > >>>> +	}
> > >>>> +
> > >>>> +	return (bool)is_spec[(unsigned char)ch]; }
> > >>
> > >> According to the json spec at [1], the characters that need to be
> > escaped
> > >> are:
> > >> a) any characters <0x20
> > >> b) inverted commas/quote character \"
> > >> c) the "reverse solidus character", better known to you and I as
> > >> the back-slash.
> > >>
> > >> Therefore, I think this table generation could be simplified, but
> > also
> > >> expanded using this. For completeness we should also see about
> > handling all
> > >> control characters if they are encountered.
> > >>
> > >> [1] https://www.rfc-editor.org/rfc/rfc8259.txt
> > >>
> > >> /Bruce
> > >
> > > Since it is trivial could be initializer?
> > >
> > > static const uint8_t is_spec[256] = {
> > >    [0 ... 0x20] = 1,
> > >    ['\"' ] = 1,
> > >    ['\\' ] = 1,
> > >    ['/'] = 1,
> > >
> > > etc
> > >
> > > Or we could change the telemetry API to disallow control characters?
> >
> > I was thinking about converting 0~0x20, but I don't think there's a
> > scenario.
> >
> > I prefer change the telemetry API to disallow control characters, and
> > this may not be a violation of the ABI, if yes, the dpdk-telemetry.py
> > will returns an error.
> 
> I agree with Chengwen Feng. The telemetry data type is STRING, not BLOB.
> 
> So we need to define exactly what the STRING type contains.
> 
> I hope we can all agree that control characters should be disallowed.
> 
> The more complicated question is: Do we want to use the ASCII character set
> only, or do we want to use UTF-8 encoded Unicode?
> 
> Personally, think UTF-8 encoded Unicode is more future proof, and would
> vote for that.
> 
> But I wouldn't reject limiting it to ASCII, and perhaps in the future introduce
> another data type for UTF-8 strings.
> 
> UTF-8 is the modern choice, but it is incompatible with old stuff, e.g. many
> SNMP MIBs.
> 
[CP] 

Just from looking at the spec [1] , I would say UTF-8, as it seems to suggest its use for JSON (section 8.1).

[1] https://www.rfc-editor.org/rfc/rfc8259.txt

> >
> > So I think we could add declaring and checking functions to make sure
> > telemetry string do not allow control characters.
[CP] 

I am not sure why we don't want these at all - I thought we do want some of them, like tab (\u0009) for example.

<snip>

In general, I think Bruce's suggestion of using a customised printf function instead of snprintf would be a good way forward, to scan the chars as they are being copied in.

Thanks,
Ciara


More information about the dev mailing list