[dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support

Kavanagh, Mark B mark.b.kavanagh at intel.com
Thu Sep 14 11:35:43 CEST 2017


>From: Ananyev, Konstantin
>Sent: Thursday, September 14, 2017 10:11 AM
>To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Hu, Jiayu
><jiayu.hu at intel.com>
>Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>
>
>
>> -----Original Message-----
>> From: Kavanagh, Mark B
>> Sent: Thursday, September 14, 2017 10:01 AM
>> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Hu, Jiayu
><jiayu.hu at intel.com>
>> Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>>
>> >From: Ananyev, Konstantin
>> >Sent: Thursday, September 14, 2017 9:40 AM
>> >To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Hu, Jiayu
>> ><jiayu.hu at intel.com>
>> >Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >
>> >
>> >
>> >> -----Original Message-----
>> >> From: Kavanagh, Mark B
>> >> Sent: Thursday, September 14, 2017 9:35 AM
>> >> To: Hu, Jiayu <jiayu.hu at intel.com>; Ananyev, Konstantin
>> ><konstantin.ananyev at intel.com>
>> >> Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >>
>> >> >From: Hu, Jiayu
>> >> >Sent: Thursday, September 14, 2017 2:00 AM
>> >> >To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Kavanagh, Mark B
>> >> ><mark.b.kavanagh at intel.com>
>> >> >Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>> >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >
>> >> >Hi Konstantin,
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Ananyev, Konstantin
>> >> >> Sent: Wednesday, September 13, 2017 11:13 PM
>> >> >> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Hu, Jiayu
>> >> >> <jiayu.hu at intel.com>
>> >> >> Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>> >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >>
>> >> >> Hi Mark,
>> >> >>
>> >> >> > -----Original Message-----
>> >> >> > From: Kavanagh, Mark B
>> >> >> > Sent: Wednesday, September 13, 2017 3:52 PM
>> >> >> > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Hu, Jiayu
>> >> >> <jiayu.hu at intel.com>
>> >> >> > Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
>> >> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >> >
>> >> >> > >From: Ananyev, Konstantin
>> >> >> > >Sent: Wednesday, September 13, 2017 10:38 AM
>> >> >> > >To: Hu, Jiayu <jiayu.hu at intel.com>
>> >> >> > >Cc: dev at dpdk.org; Kavanagh, Mark B <mark.b.kavanagh at intel.com>;
>> >> >> Tan, Jianfeng
>> >> >> > ><jianfeng.tan at intel.com>
>> >> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >> > > +
>> >> >> > >> > > +int
>> >> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
>> >> >> > >> > > +		uint16_t gso_size,
>> >> >> > >> > > +		uint8_t ipid_delta,
>> >> >> > >> > > +		struct rte_mempool *direct_pool,
>> >> >> > >> > > +		struct rte_mempool *indirect_pool,
>> >> >> > >> > > +		struct rte_mbuf **pkts_out,
>> >> >> > >> > > +		uint16_t nb_pkts_out)
>> >> >> > >> > > +{
>> >> >> > >> > > +	struct ipv4_hdr *ipv4_hdr;
>> >> >> > >> > > +	uint16_t tcp_dl;
>> >> >> > >> > > +	uint16_t pyld_unit_size;
>> >> >> > >> > > +	uint16_t hdr_offset;
>> >> >> > >> > > +	int ret = 1;
>> >> >> > >> > > +
>> >> >> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
>char *)
>> >> >> +
>> >> >> > >> > > +			pkt->l2_len);
>> >> >> > >> > > +	/* Don't process the fragmented packet */
>> >> >> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset &
>rte_cpu_to_be_16(
>> >> >> > >> > > +
>> >> >> 	IPV4_HDR_DF_MASK)) == 0)) {
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> > It is not a check for fragmented packet - it is a check that
>> >> >> fragmentation
>> >> >> > >is allowed for that packet.
>> >> >> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
>> >> >> > >
>> >> >> > >DF bit doesn't indicate is packet fragmented or not.
>> >> >> > >It forbids to fragment packet any further.
>> >> >> > >To check is packet already fragmented or not, you have to check MF
>bit
>> >> >> and
>> >> >> > >frag_offset.
>> >> >> > >Both have to be zero for un-fragmented packets.
>> >> >> > >
>> >> >> > >>
>> >> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF
>> >bit.
>> >> >> It's a
>> >> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian
>> >order.
>> >> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
>> >> >> > >rte_cpu_to_be_16(
>> >> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is
>> >fragmented.
>> >> >> > >>
>> >> >> > >> >
>> >> >> > >> > > +		pkts_out[0] = pkt;
>> >> >> > >> > > +		return ret;
>> >> >> > >> > > +	}
>> >> >> > >> > > +
>> >> >> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
>> >> >> >l3_len -
>> >> >> > >> > > +		pkt->l4_len;
>> >> >> > >> >
>> >> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len -
>pkt_l4_len?
>> >> >> > >>
>> >> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len -
>pkt_l4_len
>> >> >here.
>> >> >> > >>
>> >> >> > >> >
>> >> >> > >> > > +	/* Don't process the packet without data */
>> >> >> > >> > > +	if (unlikely(tcp_dl == 0)) {
>> >> >> > >> > > +		pkts_out[0] = pkt;
>> >> >> > >> > > +		return ret;
>> >> >> > >> > > +	}
>> >> >> > >> > > +
>> >> >> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
>> >> >> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
>> >> >> > >> >
>> >> >> > >> > Hmm, why do we need to count CRC_LEN here?
>> >> >> > >>
>> >> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
>> >> >> > >> included in gso_size.
>> >> >> > >
>> >> >> > >Why?
>> >> >> > >What is the point to account crc len into this computation?
>> >> >> > >Why not just assume that gso_size is already a max_frame_size -
>> >crc_len
>> >> >> > >As I remember, when we RX packet crc bytes will be already
>stripped,
>> >> >> > >when user populates the packet, he doesn't care about crc bytes
>too.
>> >> >> >
>> >> >> > Hi Konstantin,
>> >> >> >
>> >> >> > When packet is tx'd, the 4B for CRC are added back into the packet;
>if
>> >the
>> >> >> payload is already at max capacity, then the actual segment size
>> >> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
>> >> >> > To prevent that from happening, we account for the CRC len in this
>> >> >> calculation.
>> >> >>
>> >> >>
>> >> >> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether
>frame
>> >> >> size without crc bytes */
>> >> >> ?
>> >>
>> >> Hey Konstantin,
>> >>
>> >> If the user sets the gso_size to 1514, the resultant output segments'
>size
>> >should be 1514, and not 1518.
>>
>> Just to clarify - I meant here that the final output segment, including CRC
>len, should be 1514. I think this is where we're crossing wires ;)
>>
>> >
>> >Yes and then NIC HW will add CRC bytes for you.
>> >You are not filling CRC bytes in HW, and when providing to the HW size to
>send
>> >- it is a payload size
>> >(CRC bytes are not accounted).
>> >Konstantin
>>
>> Yes, exactly - in that case though, the gso_size specified by the user is
>not the actual final output segment size, but (segment size - 4B),
>> right?
>
>CRC bytes will be add by HW, it is totally transparent for user.

Yes - I completely agree/understand.

>
>>
>> We can set that expectation in documentation, but from an
>application's/user's perspective, do you think that this might be
>> confusing/misleading?
>
>I think it would be much more confusing to make user account for CRC bytes.
>Let say when in DPDK you form a packet and send it out via rte_eth_tx_burst()
>you specify only your payload size, not payload size plus crc bytes that HW
>will add for you.
>Konstantin

I guess I've just been looking at it from a different perspective (i.e. the user wants to decide the final total packet size); using the example of rte_eth_tx_burst above, I see where you're coming from though.
Thanks for clarifying,
Mark

>
>>
>> Thanks again,
>> Mark
>>
>> >
>> > Consequently, the payload capacity
>> >> of each segment would be reduced accordingly.
>> >> The user only cares about the output segment size (i.e.
>gso_ctx.gso_size);
>> >we need to ensure that the size of the segments that are
>> >> produced is consistent with that. As a result, we need to ensure that any
>> >packet overhead is accounted for in the segment size, before we
>> >> can determine how much space remains for data.
>> >>
>> >> Hope this makes sense.
>> >>
>> >> Thanks,
>> >> Mark
>> >>
>> >> >
>> >> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if the
>> >lower
>> >> >layer
>> >> >will add CRC to the packet.
>> >> >
>> >> >Jiayu
>> >> >
>> >> >> Konstantin
>> >> >>
>> >> >> >
>> >> >> > If I've missed anything, please do let me know!
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Mark
>> >> >> >
>> >> >> > >
>> >> >> > >Konstantin


More information about the dev mailing list