Bug 29
Summary: | pktgen hangs when it tries to send packets through libvirt driver, works for all other drivers | ||
---|---|---|---|
Product: | DPDK | Reporter: | Gregory Shimansky (gregory.shimansky) |
Component: | ethdev | Assignee: | Keith Wiles (keith.wiles) |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | ajit.khaparde, daria.kolistratova, keith.wiles, tiwei.bie |
Priority: | Normal | ||
Version: | 18.02 | ||
Target Milestone: | --- | ||
Hardware: | x86 | ||
OS: | Linux | ||
Attachments: | Vagrantfile and scripts to create test VMs |
Description
Gregory Shimansky
2018-04-27 22:03:47 CEST
There is also a comment from the pktgen maintainer that virtio makes modifications to mbuf structures which may interfere with how pktgen works: https://github.com/pktgen/Pktgen-DPDK/issues/148#issuecomment-383088512

What are the DPDK (dpdk-devbind.py) and Pktgen command line options for each nff-go instance you are executing? I see three interfaces, but I assumed you were using virtio. Are these eth0-2 virtio interfaces?

The first interface is the management interface and is used to connect to the external world. The second and third are interfaces connected to each other, like this:

    |            |
nff-go-0 ==== nff-go-1

There is a scripts.sh file which is sourced when you log in, so you should have two functions: "bindports" and "runpktgen". The first one binds the second and third ports to the DPDK driver built inside the NFF-Go source tree, and the second one launches pktgen on these two ports.

I want to make some changes to DPDK or pktgen inside the vagrant instance. What is the command to rebuild DPDK and pktgen?

You can run make within the ${NFF_GO}/dpdk directory. It should rebuild both DPDK and pktgen.

I found a line in the file virtio_rxtx.c at around line 1048: 'need = RTE_MIN(need, (int)nb_used);'. The problem appears when nb_used is zero and virtio_xmit_cleanup is called with zero. I commented out the line and now pktgen runs. It appears a deadlock is happening when nb_used is zero and we need a ring entry. I do not know the virtio driver that well, and the virtio maintainer(s) need to address the correct fix.

I tried to run pktgen with this change several times on Libvirt/Qemu and VirtualBox VMs, and got several different results:
1. Several times on Qemu the terminal froze and the VM became unreachable, so I had to revive it with the "vagrant reload nff-go-0" command because I couldn't ssh to the system any more. But the packets were actually going out; I saw them arriving on the second VM, so pktgen was successfully sending packets!
2. A lot of times pktgen crashed with a segmentation fault, both on Qemu and VB.
3. Several times on VB I saw it start to send packets just fine.

I suspect that in case #1 in the comment above, something bad happens with the Linux kernel driver on the network interface used by ssh, because the VM doesn't freeze, the kernel doesn't crash and pktgen executes just fine. There is just no ssh connection, which is why the terminal freezes.

Pktgen is consuming all of the cycles sending traffic, and the VM cannot timeshare enough time to manage anything else in the VM. To me this means the VM is telling us we have 8 cores, but in reality that is not the case; most likely there are one or two physical cores. Look at how QEMU is set up and see if you can increase the number of cores, as DPDK wants physical cores. I ran pktgen a number of times and never saw a crash, and I am guessing it is not a pktgen problem, but a DPDK or virtio problem.

I have a similar problem on AWS images with the ENA driver. Who gets to work on this?

I tried to reproduce this issue (that pktgen stops sending packets after a few seconds) in a similar setup (two VMs with virtio ports connected by a UDP tunnel, one of them running pktgen and the other one running testpmd/rxonly). It seems the issue has already been fixed in the latest code (pktgen-3.6.2/DPDK v19.02-rc2). (The issue can be reproduced in this setup with pktgen-3.4.9/DPDK-18.02.)

The issue is fixed in a way that pktgen doesn't stop sending packets any more (I see them coming on the other side). But pktgen's UI freezes and I can only kill it on another console because it doesn't accept any commands after I run "start 0". Possibly now it is an application bug, not DPDK. I used DPDK 19.02.0-rc3 (0a703f0f36c11b6f23fad4fab9e79c308811329d) and pktgen-3.6.4.

DPDK also had a bug reported like the one you referenced, and they just reported it was fixed after testing it again.
This one will have to wait until I am back from vacation, and I am having an operation on my shoulder when I get back. This means it may be some time before I can get to it.
> On Jan 24, 2019, at 6:49 PM, "bugzilla@dpdk.org" <bugzilla@dpdk.org> wrote:
>
> https://bugs.dpdk.org/show_bug.cgi?id=29
>
> --- Comment #13 from Gregory Shimansky (gregory.shimansky@intel.com) ---
> The issue is fixed in a way that pktgen doesn't stop sending packets any more
> (I see them coming on the other side). But pktgen's UI freezes and I can only
> kill it on another console because it doesn't accept any commands after I run
> "start 0". Possibly now it is an application bug, not DPDK.
>
> I used DPDK 19.02.0-rc3 (0a703f0f36c11b6f23fad4fab9e79c308811329d) and
> pktgen-3.6.4.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
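The stall described earlier in this thread around 'need = RTE_MIN(need, (int)nb_used);' can be illustrated with a toy model. This is only a sketch of the arithmetic, not the virtio driver code: the struct toy_txq and the functions toy_min, toy_reclaim and toy_try_send are hypothetical names invented for illustration, and the real virtqueue bookkeeping is more involved. It shows that when the clamp is applied and nb_used is zero, the reclaim step frees nothing, so a sender that needs a ring entry makes no progress until the device side returns descriptors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy TX queue; NOT the real virtio virtqueue layout. */
struct toy_txq {
    int free_cnt; /* descriptors the sender may use right now */
    int nb_used;  /* descriptors handed back by the device, not yet reclaimed */
};

static int toy_min(int a, int b)
{
    return a < b ? a : b;
}

/* Reclaim up to `need` descriptors. With `clamp` set, never reclaim more
 * than nb_used (this mirrors the role of the RTE_MIN line in the report).
 * Returns how many descriptors were moved back to the free count. */
static int toy_reclaim(struct toy_txq *q, int need, bool clamp)
{
    assert(need >= 0);
    if (clamp)
        need = toy_min(need, q->nb_used);
    q->free_cnt += need;
    q->nb_used -= toy_min(need, q->nb_used);
    return need;
}

/* Try to send one packet that needs `slots` descriptors.
 * Returns true if the packet could be queued. */
static bool toy_try_send(struct toy_txq *q, int slots, bool clamp)
{
    int need = slots - q->free_cnt;

    if (need > 0)
        toy_reclaim(q, need, clamp);
    if (q->free_cnt < slots)
        return false; /* no progress: caller would retry */
    q->free_cnt -= slots;
    return true;
}
```

With free_cnt and nb_used both zero, toy_try_send(&q, 1, true) returns false on every call, which matches the hang in the report. Passing false for clamp lets the toy model proceed, but only because it hands out descriptors the device has not returned; the model says nothing about whether that is safe in the real driver, and the segmentation faults reported after commenting the line out suggest caution.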
I remember now the testing that dpdk did was around virtio and not VMs.
I do not use VMs much, but sometimes the CPUs in a VM are not mapped to distinct CPUs on the host. Because pktgen needs a minimum of two CPUs to run, could it be that the VM configuration is not really using two different CPUs?
Pktgen tries to make sure you do not put the rx/tx core on the same core as the CLI and display. Could the VM be putting the vCPUs on the same cores?
Not sure if that is the problem here.
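As a first sanity check for the two-CPU requirement mentioned above, the guest can at least report how many CPUs it sees online. The sketch below uses sysconf(_SC_NPROCESSORS_ONLN), which is available on Linux with glibc; the helper names guest_online_cpus and enough_cpus_for_split are hypothetical. Note the limitation: this only shows what the guest kernel reports, and it cannot tell whether the vCPUs are backed by distinct physical cores on the host, which has to be checked in the hypervisor configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Number of CPUs the guest kernel currently reports as online.
 * Returns -1 if the value cannot be determined. */
static long guest_online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* True if `online` CPUs are enough to keep `required` roles
 * (for example CLI/display and rx/tx) on separate CPUs. */
static bool enough_cpus_for_split(long online, long required)
{
    assert(required > 0);
    return online >= required;
}
```

A caller could run enough_cpus_for_split(guest_online_cpus(), 2) before launching pktgen; a false result would point at the VM configuration rather than at pktgen or DPDK.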
Which is the case: does the pktgen UI freeze while traffic is still being sent, or does it stop sending entirely after a few seconds?