[dpdk-dev] dpdk-2.0.0: crash in ixgbe_recv_scattered_pkts_vec->_recv_raw_pkts_vec->desc_to_olflags_v

Gopakumar Choorakkot Edakkunni gopakumar.c.e at gmail.com
Wed Jul 1 02:50:14 CEST 2015


So, an update on this. In summary, it was purely my fault; apologies for
prematurely suspecting the wrong areas. Details below:

1. My AWS box had an eth0 interface without DPDK. I enabled DPDK AND
also created a KNI interface, AND named the KNI interface eth0.

2. Ubuntu started its DHCP client on that interface, but my app doesn't
do anything to read the DHCP (renew) packets from the KNI and send them
out the physical port, or vice versa. The KNI was just sitting there,
doing hardly any Rx/Tx.

3. My l2fwd-equivalent code started and worked fine, but after a few
minutes the DHCP client on Ubuntu gave up attempting the DHCP renew
(eth0 already had an IP) and tried to remove the IP from eth0.

4. At this point the callbacks registered from the standard KNI example
in DPDK ended up being invoked. The example callbacks have a port_stop()
and a port_start() in them, and exactly at this point my app crashed.

So my bad! For now I just made the callbacks no-ops and changed the AWS
eth0 from DHCP to a static IP, and things are fine now! My system has
been up for a long time with no issues.

Thanks again, Thomas and Bruce, for the quick responses and suggestions.

Rgds,
Gopa.

On Tue, Jun 30, 2015 at 11:28 AM, Gopakumar Choorakkot Edakkunni
<gopakumar.c.e at gmail.com> wrote:
> Hi Thomas, Bruce,
>
> Thanks for the responses. Please find my answers as below.
>
> Thomas>> "You mean you are using SR-IOV from Amazon, right? Do you
> have more hardware details?"
>
> That is correct. I am attaching three files: cpuinfo.txt, lcpci.txt and
> portconf.txt (just the port config that I am using, nothing special,
> taken from the l2fwd example). The two 82599 VF interfaces seen in the
> lspci output are the ones of interest; I use one of them in DPDK mode.
>
> Thomas>> Did you try to disable CONFIG_RTE_IXGBE_INC_VECTOR?
>
> Thanks for the suggestion. I made that change and have been giving it
> some time. The result is not entirely black and white: previously (in
> vector mode) my app used to Rx/Tx packets nicely without any hiccups,
> but would crash within 10 minutes :). With the suggested change, it has
> been running for a while and doesn't crash, but the Tx latency and Tx
> loss are so high (around 10% Tx loss) that the app is not doing a great
> job. That might just be something I need to adapt to when using
> non-vector mode? I will experiment on that a bit more. So I "think" it's
> fair to say that with the vector path disabled there's no crash, but I
> now need to chase this latency/loss.
>
> Thomas>> Not needed. A DPDK application is fast enough to do the job
> in 10 minutes ;)
>
> Haha, good one :). That's where I want to get to eventually, but I'm
> some distance from it right now.
>
> Bruce>> Can you perhaps isolate the root cause of the issue any
> further? For example, does it only occur when you get three packets as
> the receive ring wraps back around to zero?
>
> I will try some more experiments, and read this Rx code a bit more so
> that I can answer the question about whether the ring wraps around when
> the problem happens, etc.
>
> Rgds,
> Gopa.
>
>
> On Tue, Jun 30, 2015 at 9:08 AM, Thomas Monjalon
> <thomas.monjalon at 6wind.com> wrote:
>> 2015-06-30 08:49, Gopakumar Choorakkot Edakkunni:
>>> I am starting to try out dpdk-2.0.0 with a simple Rx routine very
>>> similar to the l2fwd example. I am running this on a c3.8xlarge AWS
>>> SR-IOV-enabled VPC instance (inside the VM it uses the ixgbevf driver).
>>
>> You mean you are using SR-IOV from Amazon, right?
>> Do you have more hardware details?
>>
>>> Once every 10 minutes my application crashes in the receive path.
>>> Whenever I check the crash, the burst array (I have provided an array
>>> size of 32) always holds three packets instead of the four that it
>>> tries to collect in one batch. Inside desc_to_olflags_v() there is an
>>> assumption that there are four packets, and it crashes trying to
>>> access the fourth buffer.
>>
>> Did you try to disable CONFIG_RTE_IXGBE_INC_VECTOR?
>>
>>> From a brief look at the code, I really can't make out how it is
>>> guaranteed that we will always have four descriptors fully populated.
>>> After the first iteration, the loop does break out if (likely(var !=
>>> RTE_IXGBE_DESCS_PER_LOOP)), but what about the very first iteration,
>>> where we might not have four?
>>>
>>> Any thoughts would be helpful here; I'm trying to get my app working
>>> for more than 10 minutes :)
>>
>> Not needed. A DPDK application is fast enough to do the job in 10 minutes ;)
>>

