[dpdk-dev] [PATCH 1/2] net/ixgbe: calculate the correct number of received packets in bulk alloc function

Jianbo Liu jianbo.liu at linaro.org
Fri Feb 3 07:22:15 CET 2017


On 2 February 2017 at 00:19, Ananyev, Konstantin
<konstantin.ananyev at intel.com> wrote:
> Hi,
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
>> Sent: Monday, December 19, 2016 6:09 AM
>> To: dev at dpdk.org; Zhang, Helin <helin.zhang at intel.com>; Ananyev, Konstantin <konstantin.ananyev at intel.com>;
>> jerin.jacob at caviumnetworks.com
>> Cc: Jianbo Liu <jianbo.liu at linaro.org>
>> Subject: [PATCH 1/2] net/ixgbe: calculate the correct number of received packets in bulk alloc function
>>
>> To get better performance, Rx bulk alloc recv function will scan 8 descriptors
>> in one time, but the statuses are not consistent on ARM platform because
>> the memory allocated for Rx descriptors is cacheable hugepages.
>> This patch is to calculate the number of received packets by scanning DD bit
>> sequentially, and stops when meeting the first packet with DD bit unset.
>>
>> Signed-off-by: Jianbo Liu <jianbo.liu at linaro.org>
>> ---
>>  drivers/net/ixgbe/ixgbe_rxtx.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index b2d9f45..2866bdb 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -1402,17 +1402,21 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
>>       for (i = 0; i < RTE_PMD_IXGBE_RX_MAX_BURST;
>>            i += LOOK_AHEAD, rxdp += LOOK_AHEAD, rxep += LOOK_AHEAD) {
>>               /* Read desc statuses backwards to avoid race condition */
>> -             for (j = LOOK_AHEAD-1; j >= 0; --j)
>> +             for (j = LOOK_AHEAD - 1; j >= 0; --j) {
>>                       s[j] = rte_le_to_cpu_32(rxdp[j].wb.upper.status_error);
>> -
>> -             for (j = LOOK_AHEAD - 1; j >= 0; --j)
>>                       pkt_info[j] = rte_le_to_cpu_32(rxdp[j].wb.lower.
>>                                                      lo_dword.data);
>> +             }
>> +
>> +             rte_smp_rmb();
>
> If reads can be reordered, shouldn't we fill pkt_info[] after smp_rmb() here?

The barrier is to forbid the reordering from the following readings,
which will count the number of actual received packets.
And as wb.uper and wb.lower of one descriptor are in the same
cacheline, could it be better to read them at the same time?.

> As another nit - with rmb() in and because you are looking the first gap in s[] now,
> no need to read TXDs in backward order.

Reading backward is just to keep as it is for x86 platform.

> How it looks to me (as a suggestion):
>
> for (j = 0; j != LOOK_AHEAD; j++)
>         s[j] = rte_le_to_cpu_32(rxdp[j].wb.upper.status_error);
>
> rte_smp_rmb();
>
> for (j = 0; j < LOOK_AHEAD && (s[j] & IXGBE_RXDADV_STAT_DD) != 0; j++)
>         ;
>
> for (j = 0; j < nb_dd; ++j) {
>         pkt_info[j] = rte_le_to_cpu_32(rxdp[j].wb.lower.lo_dword.data);
>                ....
>
> Konstantin
>
>
>>
>>               /* Compute how many status bits were set */
>>               nb_dd = 0;
>>               for (j = 0; j < LOOK_AHEAD; ++j)
>> -                     nb_dd += s[j] & IXGBE_RXDADV_STAT_DD;
>> +                     if (s[j] & IXGBE_RXDADV_STAT_DD)
>> +                             ++nb_dd;
>> +                     else
>> +                             break;
>>
>>               nb_rx += nb_dd;
>>
>> --
>> 2.4.11
>


More information about the dev mailing list