[dpdk-dev] ip_frag: handle MTU sizes not aligned to 8 bytes

Message ID 1489504487-67456-1-git-send-email-allain.legacy@windriver.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon

Checks

Context               Check    Description
ci/Intel-compilation  success  Compilation OK
ci/checkpatch         success  coding style OK

Commit Message

Allain Legacy March 14, 2017, 3:14 p.m. UTC
The rte_ipv4_fragment_packet API expects that the fragment size derived
from the link/interface MTU passed in (the MTU minus the IPv4 header) be
divisible by 8 bytes.  Given that the parameter is named "mtu" rather than
"frag_size", callers cannot be expected to guarantee this.  An MTU of 1500
happens to produce a max fragment size of 1480 (1500 - sizeof(ipv4_hdr)),
which is divisible by 8, but other MTU values such as 1600 or 9000 produce
1580 and 8980, which are not.
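
To make the arithmetic above concrete, here is a minimal sketch.  It assumes
a 20-byte IPv4 header without options; the helper names are hypothetical and
are not part of the DPDK API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 20-byte IPv4 header with no options, i.e. sizeof(struct ipv4_hdr). */
#define IPV4_HDR_LEN 20u

/* Largest IP payload that fits in one fragment for a given MTU (hypothetical helper). */
static uint16_t max_frag_payload(uint16_t mtu)
{
	assert(mtu > IPV4_HDR_LEN);
	return (uint16_t)(mtu - IPV4_HDR_LEN);
}

/* True when a payload length can be expressed exactly in 8-byte units. */
static bool is_multiple_of_8(uint16_t len)
{
	return (len & 7u) == 0;
}
```

With these helpers, an MTU of 1500 gives 1480 (a multiple of 8), while 1600
and 9000 give 1580 and 8980, each leaving a remainder of 4.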

Unfortunately, the API checks that the resulting frag_size value is
divisible by 8 only with a call to RTE_ASSERT, which is enabled only when
RTE_LOG_LEVEL >= RTE_LOG_DEBUG.  With a lower log level the check is
disabled, so the code silently continues and produces IP fragments that
have invalid fragment offset values.
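
Why an unaligned fragment size leads to invalid offsets can be sketched as
follows.  RFC 791 stores the fragment offset in a 13-bit field counted in
units of 8 bytes, so a byte offset that is not a multiple of 8 cannot be
represented.  The helper names below are hypothetical and only illustrate
the encoding; they are not taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 791: the fragment offset field counts units of 8 bytes. */
#define FO_SHIFT 3
#define FO_MAX_UNITS 0x1FFFu	/* 13-bit field */

/* Convert a byte offset into 8-byte units; low 3 bits are lost. */
static uint16_t fo_encode(uint16_t byte_offset)
{
	return (uint16_t)(byte_offset >> FO_SHIFT);
}

/* Convert 8-byte units back into the byte offset a receiver would see. */
static uint16_t fo_decode(uint16_t units)
{
	assert(units <= FO_MAX_UNITS);
	return (uint16_t)(units << FO_SHIFT);
}
```

For an MTU of 1600 the second fragment would start at byte 1580, but the
field can only carry 197 units, which a receiver reads back as 1576.  The
reassembled datagram is therefore misplaced by 4 bytes.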

An application may not have control over what MTU a user selects.  Rather
than have each application adjust the MTU to pass a suitable value to the
fragmentation API, this change modifies the fragmentation API to handle
cases where the "mtu" argument does not yield a fragment size divisible by
8, and to automatically round the internal "frag_size" down to a multiple
of 8.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
---
 lib/librte_ip_frag/rte_ipv4_fragmentation.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
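
The rounding described in the commit message can be sketched outside DPDK
as below.  This is a stand-alone approximation of what the patch does with
RTE_ALIGN_FLOOR; the function name and the 20-byte header constant are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 20-byte IPv4 header with no options. */
#define IPV4_HDR_LEN 20u
/* Fragment payloads must be a multiple of 8 bytes (RFC 791). */
#define FO_ALIGN 8u

/* Largest 8-byte-aligned payload that fits in the MTU (hypothetical helper). */
static uint16_t aligned_frag_size(uint16_t mtu)
{
	/* Require room for the header plus at least one 8-byte unit. */
	assert(mtu >= IPV4_HDR_LEN + FO_ALIGN);
	/* Clear the low 3 bits to round down to a multiple of 8. */
	return (uint16_t)((mtu - IPV4_HDR_LEN) & ~(FO_ALIGN - 1u));
}
```

An MTU of 1500 still yields 1480, while 1600 and 9000 now yield 1576 and
8976.  Each fragment may carry up to 4 bytes less than the MTU allows, but
every fragment offset is exactly representable.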
  

Comments

Ananyev, Konstantin June 4, 2017, 4:55 p.m. UTC | #1
> The rte_ipv4_fragment_packet API expects that the link/interface MTU value
> passed in be divisible by 8 bytes.  Given the name of the parameter is
> "mtu" rather than "frag_size" it is not necessarily the case that it will
> be divisible by 8.  An MTU of 1500 happens to produce a max fragment size
> of 1480 (1500 - sizeof(ipv4_hdr)) which is divisible by 8 but other MTU
> values such as 1600 or 9000 do not produce values that are divisible by 8.
>
> Unfortunately, the API checks that the frag_size value produced is
> divisible by 8 with a call to RTE_ASSERT which is only enabled when the
> RTE_LOG_LEVEL >= RTE_LOG_DEBUG.  In cases where the log level is set
> normally the code silently continues and produces IP fragments that have
> invalid fragment offset values.
>
> An application may not have control over what MTU a user selects and rather
> than have each application adjust the MTU to pass a suitable value to the
> fragmentation API this change modifies the fragmentation API to handle
> cases where the "mtu" argument is not divisible by 8 and automatically
> adjust the internal "frag_size".
>
> Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
> ---

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
  
Thomas Monjalon June 22, 2017, 4 p.m. UTC | #2
> > The rte_ipv4_fragment_packet API expects that the link/interface MTU value
> > passed in be divisible by 8 bytes.  Given the name of the parameter is
> > "mtu" rather than "frag_size" it is not necessarily the case that it will
> > be divisible by 8.  An MTU of 1500 happens to produce a max fragment size
> > of 1480 (1500 - sizeof(ipv4_hdr)) which is divisible by 8 but other MTU
> > values such as 1600 or 9000 do not produce values that are divisible by 8.
> > 
> > Unfortunately, the API checks that the frag_size value produced is
> > divisible by 8 with a call to RTE_ASSERT which is only enabled when the
> > RTE_LOG_LEVEL >= RTE_LOG_DEBUG.  In cases where the log level is set
> > normally the code silently continues and produces IP fragments that have
> > invalid fragment offset values.
> > 
> > An application may not have control over what MTU a user selects and rather
> > than have each application adjust the MTU to pass a suitable value to the
> > fragmentation API this change modifies the fragmentation API to handle
> > cases where the "mtu" argument is not divisible by 8 and automatically
> > adjust the internal "frag_size".
> > 
> > Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Applied, thanks
  

Patch

diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
index a2259e8..8c5f5ec 100644
--- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c
+++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c
@@ -48,7 +48,7 @@ 
 #define	IPV4_HDR_DF_MASK			(1 << IPV4_HDR_DF_SHIFT)
 #define	IPV4_HDR_MF_MASK			(1 << IPV4_HDR_MF_SHIFT)
 
-#define	IPV4_HDR_FO_MASK			((1 << IPV4_HDR_FO_SHIFT) - 1)
+#define	IPV4_HDR_FO_ALIGN			(1 << IPV4_HDR_FO_SHIFT)
 
 static inline void __fill_ipv4hdr_frag(struct ipv4_hdr *dst,
 		const struct ipv4_hdr *src, uint16_t len, uint16_t fofs,
@@ -103,11 +103,14 @@  static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
 	uint32_t out_pkt_pos, in_seg_data_pos;
 	uint32_t more_in_segs;
 	uint16_t fragment_offset, flag_offset, frag_size;
+	uint16_t frag_bytes_remaining;
 
-	frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr));
-
-	/* Fragment size should be a multiply of 8. */
-	RTE_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0);
+	/*
+	 * Ensure the IP payload length of all fragments is aligned to a
+	 * multiple of 8 bytes as per RFC791 section 2.3.
+	 */
+	frag_size = RTE_ALIGN_FLOOR((mtu_size - sizeof(struct ipv4_hdr)),
+				    IPV4_HDR_FO_ALIGN);
 
 	in_hdr = rte_pktmbuf_mtod(pkt_in, struct ipv4_hdr *);
 	flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset);
@@ -142,6 +145,7 @@  static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
 		/* Reserve space for the IP header that will be built later */
 		out_pkt->data_len = sizeof(struct ipv4_hdr);
 		out_pkt->pkt_len = sizeof(struct ipv4_hdr);
+		frag_bytes_remaining = frag_size;
 
 		out_seg_prev = out_pkt;
 		more_out_segs = 1;
@@ -161,7 +165,7 @@  static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
 
 			/* Prepare indirect buffer */
 			rte_pktmbuf_attach(out_seg, in_seg);
-			len = mtu_size - out_pkt->pkt_len;
+			len = frag_bytes_remaining;
 			if (len > (in_seg->data_len - in_seg_data_pos)) {
 				len = in_seg->data_len - in_seg_data_pos;
 			}
@@ -171,9 +175,10 @@  static inline void __free_fragments(struct rte_mbuf *mb[], uint32_t num)
 			    out_pkt->pkt_len);
 			out_pkt->nb_segs += 1;
 			in_seg_data_pos += len;
+			frag_bytes_remaining -= len;
 
 			/* Current output packet (i.e. fragment) done ? */
-			if (unlikely(out_pkt->pkt_len >= mtu_size))
+			if (unlikely(frag_bytes_remaining == 0))
 				more_out_segs = 0;
 
 			/* Current input segment done ? */