[dpdk-users] Strange packet loss with multi-frame payloads

Harold Demure harold.demure87 at gmail.com
Mon Jul 17 15:18:30 CEST 2017


Hello,
  I am having a problem with packet loss and I hope you can help me out.
Below you will find a description of the application and of the problem.
It is a little long, but I really hope somebody out there can help me,
because this is driving me crazy.

*Application*

I have a client-server application; single server, multiple clients.
The machines have 8 active cores which poll 8 distinct RX queues to receive
packets and use 8 distinct TX queues to burst out packets (i.e.,
run-to-completion model).

*Workload*

The workload is composed mostly of single-frame packets, but occasionally
clients send multi-frame packets to the server, and occasionally the server
sends multi-frame replies back to the client.
Packets are fragmented at the UDP level (i.e., no IP fragmentation: every
packet of the same request has frag_id == 0, even though they all share the
same packet_id).
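
To give a sense of the burst size involved, here is a rough sketch of how
many frames one request turns into, for instance for a 300 Kb payload (the
size at which the loss shows up). It assumes a 1500-byte MTU, a 20-byte IPv4
header, an 8-byte UDP header and a hypothetical 16-byte application header
carrying the fragment index; "Kb" is read as 1024 bytes. None of these sizes
are taken from the actual application.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes, NOT taken from the application described in this post. */
#define MTU_BYTES      1500u
#define IPV4_HDR_BYTES 20u
#define UDP_HDR_BYTES  8u
#define APP_HDR_BYTES  16u  /* hypothetical per-frame application header */

/* Number of frames needed to carry payload_bytes of application data. */
static size_t frames_for_payload(size_t payload_bytes)
{
    const size_t per_frame =
        MTU_BYTES - IPV4_HDR_BYTES - UDP_HDR_BYTES - APP_HDR_BYTES;

    assert(per_frame > 0);
    if (payload_bytes == 0)
        return 1;  /* an empty request still needs one frame */
    /* Ceiling division: the last frame may be only partly filled. */
    return (payload_bytes + per_frame - 1) / per_frame;
}
```

Under these assumptions frames_for_payload(300 * 1024) is 211 and
frames_for_payload(200 * 1024) is 141.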

*Problem*

I experience huge packet loss on the server when the occasional multi-frame
requests of the clients carry a big payload (> 300 Kb).
The eth stats that I gather on the server report no errors and no packet
loss (q_errors, imissed, ierrors, oerrors, rx_nombuf are all equal to 0).
Yet, the application is not seeing some packets of the big requests that
the clients send.
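
The check on the counters can be sketched as below. The struct is only a
stand-in that mirrors the fields named above; in a DPDK application the
values would be copied out of the struct rte_eth_stats filled in by
rte_eth_stats_get(). The struct name, the function name and the choice of
8 per-queue slots (one per RX queue) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 8u  /* assumed: one counter slot per RX queue in use */

/* Hypothetical stand-in for the rte_eth_stats fields named in the post. */
struct port_counters {
    uint64_t imissed;
    uint64_t ierrors;
    uint64_t oerrors;
    uint64_t rx_nombuf;
    uint64_t q_errors[NUM_QUEUES];
};

/* Returns 1 if no counter reports a drop or an error, 0 otherwise. */
static int counters_clean(const struct port_counters *c)
{
    assert(c != NULL);
    if (c->imissed || c->ierrors || c->oerrors || c->rx_nombuf)
        return 0;
    for (size_t q = 0; q < NUM_QUEUES; q++) {
        if (c->q_errors[q])
            return 0;
    }
    return 1;
}
```

In the situation described here counters_clean() returns 1 even though
frames are missing at the application level.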

I have recorded some interesting facts:
1) The clients do not experience such packet loss, although they also
receive packets with an aggregate payload of the same size as the packets
received by the server. The only differences w.r.t. the server are that a
client machine of course has a lower RX load (it only gets the replies to
its own requests) and that a client thread only receives packets from a
single machine (the server).
2) This behavior does not arise as long as the biggest payload exchanged
between clients and server is < 200 Kb. This leads me to conclude that
fragmentation is not the issue (also, if I implement a stubborn
retransmission, eventually all packets are received even with bigger
payloads). Also, I reserve plenty of memory for my mempool, so I don't
think the server runs out of mbufs (and if that were the case I guess I
would see this in the dropped packets count, right?).
3) If I switch to the pipeline model (on the server only) this problem
basically disappears. By pipeline model I mean something like the
load-balancing app, where a single core on the server receives client
packets on a single RX queue (worker cores reply back to the client using
their own TX queue). This leads me to think that the problem is on the
server, and not on the clients.
4) It doesn't seem to be a "load" problem. If I run the same tests multiple
times, in some "lucky" runs the run-to-completion model outperforms the
pipeline one. Also, with single-frame packets only, the run-to-completion
model can handle a number of packets per second that is much higher than
the number of frames per second generated by the workload with some big
packets.
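
Regarding fact 2, one possible way to tell which frames of a big request
never reach the application (and therefore which ones a retransmission has
to cover) is a per-request bitmap. The sketch below is only an assumption
about how this could be done: the struct, the 512-frame cap and the function
names are hypothetical and are not taken from the application described
here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAGS 512u  /* assumed upper bound on frames per request */

struct frag_tracker {
    uint32_t total;                 /* frames expected for this request */
    uint32_t received;              /* distinct frames seen so far */
    uint64_t seen[MAX_FRAGS / 64];  /* one bit per frame index */
};

static void tracker_init(struct frag_tracker *t, uint32_t total)
{
    assert(total > 0 && total <= MAX_FRAGS);
    memset(t, 0, sizeof(*t));
    t->total = total;
}

/* Record frame idx; returns 1 if it was new, 0 if it was a duplicate. */
static int tracker_mark(struct frag_tracker *t, uint32_t idx)
{
    assert(idx < t->total);
    const uint64_t bit = UINT64_C(1) << (idx % 64);
    if (t->seen[idx / 64] & bit)
        return 0;
    t->seen[idx / 64] |= bit;
    t->received++;
    return 1;
}

/* Returns 1 once every expected frame has been seen. */
static int tracker_complete(const struct frag_tracker *t)
{
    return t->received == t->total;
}

/* Stores up to cap missing frame indices in out (ascending order) and
 * returns the total number of missing frames. */
static uint32_t tracker_missing(const struct frag_tracker *t,
                                uint32_t *out, uint32_t cap)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < t->total; i++) {
        if (!(t->seen[i / 64] & (UINT64_C(1) << (i % 64)))) {
            if (n < cap)
                out[n] = i;
            n++;
        }
    }
    return n;
}
```

A receiver would call tracker_mark() for every frame of a request and, after
a timeout, use tracker_missing() to decide which frame indices to ask for
again. Duplicates produced by a retransmission are ignored by tracker_mark().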


*Question*

Do you have any idea why I am witnessing this behavior? I know that having
fewer queues can help performance by relieving contention on the NIC, but
is it possible that the contention is actually causing packets to get
dropped?

*Platform*

DPDK: v2.2-0 (I know this is an old version, but I am dealing with
legacy code I cannot change)

MLNX_OFED_LINUX-3.1-1.0.3-ubuntu14.04-x86_64

My NIC: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

My machine runs a 4.4.0-72-generic kernel on Ubuntu 16.04.2

CPU is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2x8 cores


Thank you very much, especially if you went through the whole email :)
Regards,
   Harold

