[PATCH v2 2/2] net: have checksum routines accept unaligned data

Mattias Rönnblom mattias.ronnblom at ericsson.com
Mon Jul 11 12:53:27 CEST 2022


On 2022-07-11 11:53, Olivier Matz wrote:
> Hi,
> 
> On Fri, Jul 08, 2022 at 02:56:08PM +0200, Mattias Rönnblom wrote:
>> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
>> data through an uint16_t pointer, which allowed the compiler to assume
>> the data was 16-bit aligned. This in turn would, with certain
>> architectures and compiler flag combinations, result in code with SIMD
>> load or store instructions with restrictions on data alignment.
>>
>> This patch keeps the old algorithm, but data is read using memcpy()
>> instead of direct pointer access, forcing the compiler to always
>> generate code that handles unaligned input. The __may_alias__ GCC
>> attribute is no longer needed.
>>
>> The data on which the Internet checksum functions operates are almost
>> always 16-bit aligned, but there are exceptions. In particular, the
>> PDCP protocol header may (literally) have an odd size.
>>
>> Performance impact seems to range from none to a very slight
>> regression.
>>
>> Bugzilla ID: 1035
>> Cc: stable at dpdk.org
>>
>> ---
> 
> Using memcpy() looks to be a good solution fix the issue, while avoiding a
> branch and the __may_alias__.
> 
> I just have one minor comment below.
> 
>>
>> v2:
>>    * Simplified the odd-length conditional (Morten Brørup).
>>
>> Reviewed-by: Morten Brørup <mb at smartsharesystems.com>
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
>> ---
>>   lib/net/rte_ip.h | 17 ++++++++++-------
>>   1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
>> index b502481670..a0334d931e 100644
>> --- a/lib/net/rte_ip.h
>> +++ b/lib/net/rte_ip.h
>> @@ -160,18 +160,21 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
>>   static inline uint32_t
>>   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>>   {
>> -	/* extend strict-aliasing rules */
>> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
>> -	const u16_p *u16_buf = (const u16_p *)buf;
>> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
>> +	const void *end;
>>   
>> -	for (; u16_buf != end; ++u16_buf)
>> -		sum += *u16_buf;
>> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * sizeof(uint16_t));
> 
> What do you think about this form:
> 
> 	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
> 
> This also has the good property to solve the debate about the
> spaces around the '/' :)
> 

Shorter, more readable. Looks good to me.

Thanks.

> 
>> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
>> +		uint16_t v;
>> +
>> +		memcpy(&v, buf, sizeof(uint16_t));
>> +		sum += v;
>> +	}
>>   
>>   	/* if length is odd, keeping it byte order independent */
>>   	if (unlikely(len % 2)) {
>>   		uint16_t left = 0;
>> -		*(unsigned char *)&left = *(const unsigned char *)end;
>> +
>> +		memcpy(&left, end, 1);
>>   		sum += left;
>>   	}
>>   
>> -- 
>> 2.25.1
>>



More information about the stable mailing list