[dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx

Ferruh Yigit ferruh.yigit at intel.com
Tue Oct 26 16:30:00 CEST 2021


On 10/5/2021 4:11 PM, Tudor Cornea wrote:
> Hi Ferruh,
> 
> I have attempted to narrow down the issue.
> I have the following bash script, which computes packet rates on an
> interface.
> 
> [root at localhost ~]# cat compute-rates.sh
> #!/usr/bin/env bash
> 
> if [[ ${#} -ne 2 ]]; then
>      echo "Usage: ${0} <iface-name> <sleep-interval-seconds>"
>      exit 1
> fi
> 
> IFACE_NAME="${1}"
> SLEEP_INTERVAL_SECONDS="${2}"
> TMP_STATS_FILE="/tmp/netstat"
> 
> # Clear Previous stats file
> echo "0 0 0 0" > "${TMP_STATS_FILE}"
> 
> echo "Press CTRL+C to exit..."
> 
> while true; do
>      export "RxB=0" "RxP=0" "TxB=0" "TxP=0"
> 
>      # Extract Rx{Bytes,Packets} and Tx{Bytes,Packets} and
>      # format the output. Individual fields will be exported
>      export $(\
>          ifconfig "${IFACE_NAME}" \
>              | grep 'packets' \
>              | awk '{print $5, $3}' \
>              | xargs echo \
>              | sed -E -e \
>                  "s/([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/RxB=\1 RxP=\2 TxB=\3 TxP=\4/")
> 
>      # Print Packet and Byte Rates
>      # Format: | Rx Bytes | Rx Packets | Tx Bytes | Tx Packets |
> 
>      echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" $(cat "${TMP_STATS_FILE}") \
>          | awk '{print "RxB="$1-$5, "RxP="$2-$6, "TxB="$3-$7, "TxP="$4-$8}'
> 
>      # Save the new values
>      echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" > "${TMP_STATS_FILE}"
> 
>      sleep "${SLEEP_INTERVAL_SECONDS}"
> 
> done
> 
> On the transmit side, I'm using the engine behind [1] with the af_packet
> PMD.
> 
> The configuration for the af_packet PMD is the following:
> --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0
> 
> I'm configuring a Tx rate of 335 packets / second and a packet size of 300
> bytes.
> These are the values with which we seem to have the best chance of
> seeing the problem. I suspect it might also be linked with the af_packet
> configuration.
> 
> I'm starting traffic using the specified configuration, and in parallel,
> running the script that computes the rates as follows:
> ./compute-rates.sh eth1 0.1
> 
> Initially, the packet rates seem steady
> 
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10952 TxP=37
> 
> [...]
> 
> After a while, we toggle the interface up / down with a sleep between the
> steps. I suspect the length of the sleep might be a variable in the
> equation.
> 
> ifconfig eth1 down; sleep 7; ifconfig eth1 up
> 
> 
> What we see is that, even after the interface is toggled back up, the rates
> never seem to recover.
> 
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=2072 TxP=7
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=521256 TxP=1761
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> 
> [...]
> 
> 
> I've attempted to mirror the same behavior using dpdk-pktgen [2] on a
> different machine (Ubuntu 20.04). This time, af_packet runs on top of
> a Linux virtio_net interface.
> 
> I seem to be getting similar behavior. I have used the following
> dpdk-pktgen configuration and run-time settings:
> 
> 
> pktgen \
>      -l 1-4 \
>      -n 4 \
>      --proc-type=primary \
>      --no-pci \
>      --no-telemetry \
>      --no-huge \
>      -m 512 \
>      --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0 \
>      -- \
>      -P \
>      -T \
>      -m "3.0" \
>      -f themes/black-yellow.theme
> 
> set 0 size 300
> set 0 rate 0.008
> set 0 burst 1
> start 0
> 
> 
> [1] https://github.com/open-traffic-generator/ixia-c
> [2] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md
> 
> On Wed, 29 Sept 2021 at 13:03, Tudor Cornea <tudor.cornea at gmail.com> wrote:
> 

Hi Tudor,

I have used testpmd with 'txonly' forwarding. Tx recovers after the interface
comes back up, but by adding some debug logs I can see that 'poll()' returns
with POLLOUT even when there is no space in the buffer.

According to the logic in the PMD, when 'poll()' returns successfully, the PMD
expects to have some space in the Tx buffer.

So I agree to add the check.

I only have a question on POLLERR: should we separate the POLLERR check
to cover the ifdown case? What do you think about the following logic:

if (!TP_STATUS_AVAILABLE) {
     if (poll() < 0)
         break;
     if (pfd.revents & POLLERR)
         break;
}

if (!TP_STATUS_AVAILABLE)
     break;
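
To make the proposal above more concrete, here is a minimal standalone sketch
of the same check. It is not the PMD code: the helper name
tx_frame_writable() and its parameters are made up for illustration, and it
assumes a Linux build where <linux/if_packet.h> is available. Only poll(),
struct tpacket2_hdr and TP_STATUS_AVAILABLE are existing interfaces.

```c
#include <linux/if_packet.h> /* struct tpacket2_hdr, TP_STATUS_AVAILABLE */
#include <poll.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the af_packet PMD.
 *
 * Returns true only if the Tx frame described by 'ppd' can be written.
 * 'fd' is the descriptor to poll and 'timeout_ms' is handed to poll().
 */
static bool
tx_frame_writable(volatile struct tpacket2_hdr *ppd, int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };

	if (ppd->tp_status != TP_STATUS_AVAILABLE) {
		/* Frame is still in use: wait on the socket. */
		if (poll(&pfd, 1, timeout_ms) < 0)
			return false;
		/* Error condition on the socket, e.g. interface down. */
		if (pfd.revents & POLLERR)
			return false;
	}

	/*
	 * poll() may have reported POLLOUT although the ring is full,
	 * so look at the frame status again before using the frame.
	 */
	return ppd->tp_status == TP_STATUS_AVAILABLE;
}
```

In the Tx loop, a 'false' return would map to the 'break' statements in the
pseudo-code above. The timeout is a parameter here only so that the helper can
be exercised without blocking; the existing code passes -1.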



>> Hi Ferruh,
>>
>> What you described above looks like a ring buffer with single producer and
>>> single consumer, and producer overwrites the not consumed items.
>>
>>
>> Indeed. This is also my understanding of the bug.
>> I am going to try to isolate the issue, and should probably be able to
>> come up with a script in a few days.
>>
>> Out of curiosity, are you using a modified af_packet implementation in
>>> kernel
>>> for above described usage?
>>
>>
>> We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
>> We don't have any kernel patches for the af_packet implementation to my
>> knowledge (probably excepting patches that are back-ported by Ubuntu
>> maintainers from newer releases).
>>
>>
>> On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yigit at intel.com>
>> wrote:
>>
>>> On 9/13/2021 2:45 PM, Tudor Cornea wrote:
>>>> The poll call can return POLLERR which is ignored, or it can return
>>>> POLLOUT, even if there are no free frames in the mmap-ed area.
>>>>
>>>> We can account for both of these cases by re-checking if the next
>>>> frame is empty before writing into it.
>>>>
>>>> Signed-off-by: Mihai Pogonaru <pogonarumihai at gmail.com>
>>>> Signed-off-by: Tudor Cornea <tudor.cornea at gmail.com>
>>>> ---
>>>>   drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
>>>>   1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index b73b211..087c196 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>>>                    (poll(&pfd, 1, -1) < 0))
>>>>                        break;
>>>>
>>>> +             /*
>>>> +              * Poll can return POLLERR if the interface is down
>>>> +              *
>>>> +              * It will almost always return POLLOUT, even if there
>>>> +              * are no extra buffers available
>>>> +              *
>>>> +              * This happens, because packet_poll() calls datagram_poll()
>>>> +              * which checks the space left in the socket buffer and,
>>>> +              * in the case of packet_mmap, the default socket buffer length
>>>> +              * doesn't match the requested size for the tx_ring.
>>>> +              * As such, there is almost always space left in socket buffer,
>>>> +              * which doesn't seem to be correlated to the requested size
>>>> +              * for the tx_ring in packet_mmap.
>>>> +              *
>>>> +              * This results in poll() returning POLLOUT.
>>>> +              */
>>>> +             if (ppd->tp_status != TP_STATUS_AVAILABLE)
>>>> +                     break;
>>>> +
>>>
>>> If 'POLLOUT' doesn't indicate that there is space in the buffer, what is the
>>> point of the 'poll()' at all?
>>>
>>> How can we test/reproduce the mentioned behavior? Or is there a way to fix the
>>> behavior of poll() or use an alternative to it?
>>>
>>>
>>> OK to break on the 'POLLERR', I guess it can be detected in the
>>> 'pfd.revents'.
>>>
>>>
>>>>                /* copy the tx frame data */
>>>>                pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>>>>                        sizeof(struct sockaddr_ll);
>>>>
>>>
>>>


