[v2,02/13] telemetry: fix escaping of invalid json characters

Message ID: 20220725163543.875775-3-bruce.richardson@intel.com (mailing list archive)
State: Superseded, archived
Delegated to: Thomas Monjalon
Series: telemetry JSON escaping and other enhancements

Checks

Context: ci/checkpatch
Check: success
Description: coding style OK

Commit Message

Bruce Richardson July 25, 2022, 4:35 p.m. UTC
For string values returned from telemetry, escape any values that cannot
normally appear in a json string. According to the json spec[1], the
characters that need to be handled are control chars (char value < 0x20)
and '"' and '\' characters.

To handle this, we replace the snprintf call with a separate string
copying and encapsulation routine which checks each character as it
copies it to the final array.

[1] https://www.rfc-editor.org/rfc/rfc8259.txt

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
Bugzilla ID: 1037

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/telemetry/telemetry.c      | 11 +++++---
 lib/telemetry/telemetry_json.h | 48 +++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 4 deletions(-)
  

Comments

Morten Brørup July 26, 2022, 6:25 p.m. UTC | #1
Patchwork didn't pick up my reply to the 00/13 of the series, so I'll try again here...

Series-Acked-by: Morten Brørup <mb@smartsharesystems.com>
  
fengchengwen July 27, 2022, 1:13 a.m. UTC | #2
On 2022/7/26 0:35, Bruce Richardson wrote:
> For string values returned from telemetry, escape any values that cannot
> normally appear in a json string. According to the json spec[1], the
> characters that need to be handled are control chars (char value < 0x20)
> and '"' and '\' characters.
> 
> To handle this, we replace the snprintf call with a separate string
> copying and encapsulation routine which checks each character as it
> copies it to the final array.
> 
> [1] https://www.rfc-editor.org/rfc/rfc8259.txt
> 
> Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
> Bugzilla ID: 1037
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  lib/telemetry/telemetry.c      | 11 +++++---
>  lib/telemetry/telemetry_json.h | 48 +++++++++++++++++++++++++++++++++-
>  2 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
> index c6fd03a5ab..7188b1905c 100644
> --- a/lib/telemetry/telemetry.c
> +++ b/lib/telemetry/telemetry.c
> @@ -232,9 +232,14 @@ output_json(const char *cmd, const struct rte_tel_data *d, int s)
>  				MAX_CMD_LEN, cmd ? cmd : "none");
>  		break;
>  	case RTE_TEL_STRING:
> -		used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":\"%.*s\"}",
> -				MAX_CMD_LEN, cmd,
> -				RTE_TEL_MAX_SINGLE_STRING_LEN, d->data.str);
> +		prefix_used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":",
> +				MAX_CMD_LEN, cmd);

The cmd also needs to be escaped.
However, I notice that [PATCH v2 06/13] limits it. I suggest moving 06 to the head of the patchset.

> +		cb_data_buf = &out_buf[prefix_used];
> +		buf_len = sizeof(out_buf) - prefix_used - 1; /* space for '}' */
> +
> +		used = rte_tel_json_str(cb_data_buf, buf_len, 0, d->data.str);
> +		used += prefix_used;
> +		used += strlcat(out_buf + used, "}", sizeof(out_buf) - used);
>  		break;
>  	case RTE_TEL_DICT:
>  		prefix_used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":",
> diff --git a/lib/telemetry/telemetry_json.h b/lib/telemetry/telemetry_json.h
> index db70690274..13df5d07e3 100644
> --- a/lib/telemetry/telemetry_json.h
> +++ b/lib/telemetry/telemetry_json.h
> @@ -44,6 +44,52 @@ __json_snprintf(char *buf, const int len, const char *format, ...)
>  	return 0; /* nothing written or modified */
>  }
>  
> +static const char control_chars[0x20] = {
> +		['\n'] = 'n',
> +		['\r'] = 'r',
> +		['\t'] = 't',
> +};
> +
> +/**
> + * @internal
> + * Does the same as __json_snprintf(buf, len, "\"%s\"", str)
> + * except that it does proper escaping as necessary.
> + * Drops any invalid characters we don't support
> + */
> +static inline int
> +__json_format_str(char *buf, const int len, const char *str)
> +{
> +	char tmp[len];

Could reuse buf rather than tmp.

> +	int tmpidx = 0;
> +
> +	tmp[tmpidx++] = '"';
> +	while (*str != '\0') {
> +		if (*str < (int)RTE_DIM(control_chars)) {
> +			int idx = *str;  /* compilers don't like char type as index */
> +			if (control_chars[idx] != 0) {
> +				tmp[tmpidx++] = '\\';
> +				tmp[tmpidx++] = control_chars[idx];

Why not escape all control chars?

> +			}
> +		} else if (*str == '"' || *str == '\\') {
> +			tmp[tmpidx++] = '\\';
> +			tmp[tmpidx++] = *str;
> +		} else
> +			tmp[tmpidx++] = *str;
> +		/* we always need space for closing quote and null character.
> +		 * Ensuring at least two free characters also means we can always take an
> +		 * escaped character like "\n" without overflowing
> +		 */
> +		if (tmpidx > len - 2)
> +			return 0;

I suggest adding a log here to help find the problem.

> +		str++;
> +	}
> +	tmp[tmpidx++] = '"';
> +	tmp[tmpidx] = '\0';
> +
> +	strcpy(buf, tmp);
> +	return tmpidx;
> +}
> +
>  /* Copies an empty array into the provided buffer. */
>  static inline int
>  rte_tel_json_empty_array(char *buf, const int len, const int used)
> @@ -62,7 +108,7 @@ rte_tel_json_empty_obj(char *buf, const int len, const int used)
>  static inline int
>  rte_tel_json_str(char *buf, const int len, const int used, const char *str)
>  {
> -	return used + __json_snprintf(buf + used, len - used, "\"%s\"", str);
> +	return used + __json_format_str(buf + used, len - used, str);
>  }
>  
>  /* Appends a string into the JSON array in the provided buffer. */
>
  
Bruce Richardson July 27, 2022, 8:21 a.m. UTC | #3
On Tue, Jul 26, 2022 at 08:25:05PM +0200, Morten Brørup wrote:
> Patchwork didn't pick up my reply to the 00/13 of the series, so I'll try again here...
> 
> Series-Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
Unfortunately, patchwork doesn't work with series acks, only individual
ones. However, the maintainers recognise when they are present, and if I do
a V3, I'll split your ack across the set so patchwork does recognise it.

/Bruce
  
Bruce Richardson July 27, 2022, 8:27 a.m. UTC | #4
On Wed, Jul 27, 2022 at 09:13:18AM +0800, fengchengwen wrote:
> On 2022/7/26 0:35, Bruce Richardson wrote:
> > For string values returned from telemetry, escape any values that cannot
> > normally appear in a json string. According to the json spec[1], the
> > characters that need to be handled are control chars (char value < 0x20)
> > and '"' and '\' characters.
> > 
> > To handle this, we replace the snprintf call with a separate string
> > copying and encapsulation routine which checks each character as it
> > copies it to the final array.
> > 
> > [1] https://www.rfc-editor.org/rfc/rfc8259.txt
> > 
> > Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
> > Bugzilla ID: 1037
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  lib/telemetry/telemetry.c      | 11 +++++---
> >  lib/telemetry/telemetry_json.h | 48 +++++++++++++++++++++++++++++++++-
> >  2 files changed, 55 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
> > index c6fd03a5ab..7188b1905c 100644
> > --- a/lib/telemetry/telemetry.c
> > +++ b/lib/telemetry/telemetry.c
> > @@ -232,9 +232,14 @@ output_json(const char *cmd, const struct rte_tel_data *d, int s)
> >  				MAX_CMD_LEN, cmd ? cmd : "none");
> >  		break;
> >  	case RTE_TEL_STRING:
> > -		used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":\"%.*s\"}",
> > -				MAX_CMD_LEN, cmd,
> > -				RTE_TEL_MAX_SINGLE_STRING_LEN, d->data.str);
> > +		prefix_used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":",
> > +				MAX_CMD_LEN, cmd);
> 
> The cmd also needs to be escaped.
> However, I notice that [PATCH v2 06/13] limits it. I suggest moving 06 to the head of the patchset.
>
Right. I'll try some patch reordering in the next version of this set.
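
For illustration only, a check of the kind being discussed could look
like the sketch below. The helper name cmd_name_is_json_safe() is
hypothetical, and the allowed set (alphanumerics, '_' and '/') is an
assumption made for this example; the actual restriction in patch 06/13
is not shown in this thread. The idea is that a command name built only
from such characters never needs escaping when it is printed as a json
key.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the series: return 1 if the command
 * name contains only characters that can appear unescaped inside a json
 * string, 0 otherwise. The allowed set is an assumption for this sketch.
 */
static int
cmd_name_is_json_safe(const char *cmd)
{
	if (cmd == NULL || *cmd == '\0')
		return 0;

	for (; *cmd != '\0'; cmd++) {
		/* cast first: passing a negative char to isalnum() is undefined */
		unsigned char c = (unsigned char)*cmd;

		if (!isalnum(c) && c != '_' && c != '/')
			return 0;
	}
	return 1;
}
```

With a check like this applied when a command is registered, the
"{\"%.*s\":" prefix can keep using plain snprintf.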
 
> > +		cb_data_buf = &out_buf[prefix_used];
> > +		buf_len = sizeof(out_buf) - prefix_used - 1; /* space for '}' */
> > +
> > +		used = rte_tel_json_str(cb_data_buf, buf_len, 0, d->data.str);
> > +		used += prefix_used;
> > +		used += strlcat(out_buf + used, "}", sizeof(out_buf) - used);
> >  		break;
> >  	case RTE_TEL_DICT:
> >  		prefix_used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":",
> > diff --git a/lib/telemetry/telemetry_json.h b/lib/telemetry/telemetry_json.h
> > index db70690274..13df5d07e3 100644
> > --- a/lib/telemetry/telemetry_json.h
> > +++ b/lib/telemetry/telemetry_json.h
> > @@ -44,6 +44,52 @@ __json_snprintf(char *buf, const int len, const char *format, ...)
> >  	return 0; /* nothing written or modified */
> >  }
> >  
> > +static const char control_chars[0x20] = {
> > +		['\n'] = 'n',
> > +		['\r'] = 'r',
> > +		['\t'] = 't',
> > +};
> > +
> > +/**
> > + * @internal
> > + * Does the same as __json_snprintf(buf, len, "\"%s\"", str)
> > + * except that it does proper escaping as necessary.
> > + * Drops any invalid characters we don't support
> > + */
> > +static inline int
> > +__json_format_str(char *buf, const int len, const char *str)
> > +{
> > +	char tmp[len];
> 
> Could reuse buf rather than tmp.
> 
The approach here is to guarantee that we always output valid json.
Therefore, we build up the output in a temporary buffer until we are sure
that it's all correct and can fit, before moving it into the final buffer.
That way, if there are any issues, the original buffer is unmodified, and
we can return the bytes-appended as 0.
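
A stripped-down sketch of that stage-then-commit idea is shown below.
It is not the patch code: append_quoted() is a hypothetical name, the
fixed 256-byte scratch array is an assumption made to keep the example
short, and no escaping is done here. It only shows that the destination
is written once the whole quoted string is known to fit.

```c
#include <string.h>

/*
 * Hypothetical helper: write a quoted copy of str into buf, which has
 * len bytes available. Output is staged in tmp and copied across only on
 * success, so on failure buf is untouched and 0 is returned.
 */
static int
append_quoted(char *buf, int len, const char *str)
{
	char tmp[256];	/* scratch size is an assumption for this sketch */
	int n = 0;

	/* need room for at least two quotes and the terminating NUL */
	if (len < 3 || len > (int)sizeof(tmp))
		return 0;

	tmp[n++] = '"';
	for (; *str != '\0'; str++) {
		if (n > len - 3)	/* keep room for closing quote + NUL */
			return 0;
		tmp[n++] = *str;
	}
	tmp[n++] = '"';
	tmp[n] = '\0';

	memcpy(buf, tmp, n + 1);	/* commit only once everything fits */
	return n;
}
```

A caller can treat a return value of 0 as "nothing appended" and rely on
whatever was already in the buffer still being valid.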

> > +	int tmpidx = 0;
> > +
> > +	tmp[tmpidx++] = '"';
> > +	while (*str != '\0') {
> > +		if (*str < (int)RTE_DIM(control_chars)) {
> > +			int idx = *str;  /* compilers don't like char type as index */
> > +			if (control_chars[idx] != 0) {
> > +				tmp[tmpidx++] = '\\';
> > +				tmp[tmpidx++] = control_chars[idx];
> 
> Why not escape all control chars?
> 
Because only certain characters have valid escape codes, and any other
characters would have to be replaced with Unicode escape values. These
should never appear in our text output fields anyway.
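
For reference, a sketch of what escaping every control character could
look like, using the \u00XX form that RFC 8259 allows for any control
character. This is an alternative to the behaviour in this patch, which
drops such characters; json_escape_char() is a hypothetical name and is
not part of the series. Bytes of 0x80 and above are passed through
unchanged, on the assumption that the input is already valid UTF-8.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the json representation of one character
 * into out, which must hold at least 7 bytes ("\u00XX" plus NUL).
 * Returns the number of bytes written, excluding the NUL.
 */
static int
json_escape_char(char *out, char c)
{
	/* work on an unsigned value so bytes >= 0x80 are never negative */
	unsigned char uc = (unsigned char)c;

	switch (uc) {
	case '"':
		return snprintf(out, 7, "\\\"");
	case '\\':
		return snprintf(out, 7, "\\\\");
	case '\n':
		return snprintf(out, 7, "\\n");
	case '\r':
		return snprintf(out, 7, "\\r");
	case '\t':
		return snprintf(out, 7, "\\t");
	default:
		if (uc < 0x20)	/* remaining control chars: generic form */
			return snprintf(out, 7, "\\u%04x", (unsigned int)uc);
		return snprintf(out, 7, "%c", uc);
	}
}
```

The cost of this variant is that one input byte can expand to six
output bytes, so a fixed two-byte headroom check in the copy loop would
no longer be enough.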

> > +			}
> > +		} else if (*str == '"' || *str == '\\') {
> > +			tmp[tmpidx++] = '\\';
> > +			tmp[tmpidx++] = *str;
> > +		} else
> > +			tmp[tmpidx++] = *str;
> > +		/* we always need space for closing quote and null character.
> > +		 * Ensuring at least two free characters also means we can always take an
> > +		 * escaped character like "\n" without overflowing
> > +		 */
> > +		if (tmpidx > len - 2)
> > +			return 0;
> 
> I suggest adding a log here to help find the problem.
> 
Telemetry is operating in a background thread, so not sure logging is a
good idea in such cases. I'd look for other opinions on this...

> > +		str++;
> > +	}
> > +	tmp[tmpidx++] = '"';
> > +	tmp[tmpidx] = '\0';
> > +
> > +	strcpy(buf, tmp);
> > +	return tmpidx;
> > +}
> > +
> >  /* Copies an empty array into the provided buffer. */
> >  static inline int
> >  rte_tel_json_empty_array(char *buf, const int len, const int used)
> > @@ -62,7 +108,7 @@ rte_tel_json_empty_obj(char *buf, const int len, const int used)
> >  static inline int
> >  rte_tel_json_str(char *buf, const int len, const int used, const char *str)
> >  {
> > -	return used + __json_snprintf(buf + used, len - used, "\"%s\"", str);
> > +	return used + __json_format_str(buf + used, len - used, str);
> >  }
> >  
> >  /* Appends a string into the JSON array in the provided buffer. */
> > 
>
  

Patch

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index c6fd03a5ab..7188b1905c 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -232,9 +232,14 @@  output_json(const char *cmd, const struct rte_tel_data *d, int s)
 				MAX_CMD_LEN, cmd ? cmd : "none");
 		break;
 	case RTE_TEL_STRING:
-		used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":\"%.*s\"}",
-				MAX_CMD_LEN, cmd,
-				RTE_TEL_MAX_SINGLE_STRING_LEN, d->data.str);
+		prefix_used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":",
+				MAX_CMD_LEN, cmd);
+		cb_data_buf = &out_buf[prefix_used];
+		buf_len = sizeof(out_buf) - prefix_used - 1; /* space for '}' */
+
+		used = rte_tel_json_str(cb_data_buf, buf_len, 0, d->data.str);
+		used += prefix_used;
+		used += strlcat(out_buf + used, "}", sizeof(out_buf) - used);
 		break;
 	case RTE_TEL_DICT:
 		prefix_used = snprintf(out_buf, sizeof(out_buf), "{\"%.*s\":",
diff --git a/lib/telemetry/telemetry_json.h b/lib/telemetry/telemetry_json.h
index db70690274..13df5d07e3 100644
--- a/lib/telemetry/telemetry_json.h
+++ b/lib/telemetry/telemetry_json.h
@@ -44,6 +44,52 @@  __json_snprintf(char *buf, const int len, const char *format, ...)
 	return 0; /* nothing written or modified */
 }
 
+static const char control_chars[0x20] = {
+		['\n'] = 'n',
+		['\r'] = 'r',
+		['\t'] = 't',
+};
+
+/**
+ * @internal
+ * Does the same as __json_snprintf(buf, len, "\"%s\"", str)
+ * except that it does proper escaping as necessary.
+ * Drops any invalid characters we don't support
+ */
+static inline int
+__json_format_str(char *buf, const int len, const char *str)
+{
+	char tmp[len];
+	int tmpidx = 0;
+
+	tmp[tmpidx++] = '"';
+	while (*str != '\0') {
+		if (*str < (int)RTE_DIM(control_chars)) {
+			int idx = *str;  /* compilers don't like char type as index */
+			if (control_chars[idx] != 0) {
+				tmp[tmpidx++] = '\\';
+				tmp[tmpidx++] = control_chars[idx];
+			}
+		} else if (*str == '"' || *str == '\\') {
+			tmp[tmpidx++] = '\\';
+			tmp[tmpidx++] = *str;
+		} else
+			tmp[tmpidx++] = *str;
+		/* we always need space for closing quote and null character.
+		 * Ensuring at least two free characters also means we can always take an
+		 * escaped character like "\n" without overflowing
+		 */
+		if (tmpidx > len - 2)
+			return 0;
+		str++;
+	}
+	tmp[tmpidx++] = '"';
+	tmp[tmpidx] = '\0';
+
+	strcpy(buf, tmp);
+	return tmpidx;
+}
+
 /* Copies an empty array into the provided buffer. */
 static inline int
 rte_tel_json_empty_array(char *buf, const int len, const int used)
@@ -62,7 +108,7 @@  rte_tel_json_empty_obj(char *buf, const int len, const int used)
 static inline int
 rte_tel_json_str(char *buf, const int len, const int used, const char *str)
 {
-	return used + __json_snprintf(buf + used, len - used, "\"%s\"", str);
+	return used + __json_format_str(buf + used, len - used, str);
 }
 
 /* Appends a string into the JSON array in the provided buffer. */