[dpdk-users] BUG: unable to handle kernel paging request

Sushil Adhikari sushil446 at gmail.com
Tue Feb 28 17:49:42 CET 2017


I tried to print a byte at the data_kva address, and it fails right there
without printing anything.

Thank you Keith for your help and support.

@ferruh.yigit, can you please look into this problem and suggest what
could be wrong?

I will summarize my progress and findings so far.

I was trying to run the DPDK KNI application with DPDK version 16.07.2.

For that, I first unbound the ports from ixgbe and bound them to the igb_uio
module with the following commands:

echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id

I compiled the KNI application for the target machine, which runs Linux
version 4.4.20 (sushila at dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) )
#1 SMP Fri Feb 24 14:32:28 CST 2017,

and when I ran the application it hung with the following message:

Feb 28 10:09:37 (none) user.alert kernel: [   87.029554] BUG: unable to handle kernel paging request at 0000077e1d012900
Feb 28 10:09:37 (none) user.alert kernel: [   87.029695] IP: [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [   87.029801] PGD 0
Feb 28 10:09:37 (none) user.warn kernel: [   87.029889] Oops: 0000 [#1] SMP
Feb 28 10:09:37 (none) user.warn kernel: [   87.030010] Modules linked in: rte_kni(O) igb_uio(O)
Feb 28 10:09:37 (none) user.warn kernel: [   87.030167] CPU: 7 PID: 709 Comm: kni_single Tainted: G          IO    4.4.20 #1
Feb 28 10:09:37 (none) user.warn kernel: [   87.030242] Hardware name:              /DX58SO2, BIOS SOX5820J.86A.0603.2010.1117.1506 11/17/2010
Feb 28 10:09:37 (none) user.warn kernel: [   87.030320] task: ffff8805a8ad8000 ti: ffff8805a7ae0000 task.ti: ffff8805a7ae0000
Feb 28 10:09:37 (none) user.warn kernel: [   87.030395] RIP: 0010:[<ffffffffa0033722>]  [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [   87.030517] RSP: 0018:ffff8805a7ae3d30  EFLAGS: 00010286
Feb 28 10:09:37 (none) user.warn kernel: [   87.030576] RAX: 0000077e1d012900 RBX: 0000000000000020 RCX: 0000000000000010
Feb 28 10:09:37 (none) user.warn kernel: [   87.030639] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa00388a3
Feb 28 10:09:37 (none) user.warn kernel: [   87.030701] RBP: ffff8805a7ae3e80 R08: 000000000000000a R09: 00000000fffffffe
Feb 28 10:09:37 (none) user.warn kernel: [   87.030766] R10: 00000000ffff2fea R11: 0000000000000006 R12: ffff8805a8a75000
Feb 28 10:09:37 (none) user.warn kernel: [   87.030829] R13: ffff8800b8c12800 R14: 0000000000000000 R15: ffff8805a8a75800
Feb 28 10:09:37 (none) user.warn kernel: [   87.030893] FS:  0000000000000000(0000) GS:ffff88062fce0000(0000) knlGS:0000000000000000
Feb 28 10:09:37 (none) user.warn kernel: [   87.030971] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 28 10:09:37 (none) user.warn kernel: [   87.031031] CR2: 0000077e1d012900 CR3: 0000000001e0a000 CR4: 00000000000006e0
Feb 28 10:09:37 (none) user.warn kernel: [   87.031094] Stack:
Feb 28 10:09:37 (none) user.warn kernel: [   87.031148]  ffff88062fcf5940 ffff8805a8ad8560 0000000000000000 ffff88060000054e
Feb 28 10:09:37 (none) user.warn kernel: [   87.031367]  0000077e1d012900 00000000b8c12800 00000000b8c11ec0 00000000b8c11580
Feb 28 10:09:37 (none) user.warn kernel: [   87.031587]  00000000b8c10c40 00000000b8c10300 00000000b8c0f9c0 00000000b8c0f080
Feb 28 10:09:37 (none) user.warn kernel: [   87.031811] Call Trace:
Feb 28 10:09:37 (none) user.warn kernel: [   87.031871]  [<ffffffffa00343af>] kni_net_rx+0xf/0x20 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [   87.031937]  [<ffffffffa0032f05>] kni_thread_single+0x45/0xb0 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [   87.032004]  [<ffffffffa0032ec0>] ? kni_init_net+0x50/0x50 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [   87.032067]  [<ffffffff8107b7cb>] kthread+0xdb/0x100
Feb 28 10:09:37 (none) user.warn kernel: [   87.032125]  [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [   87.032186]  [<ffffffff81834c2f>] ret_from_fork+0x3f/0x70
Feb 28 10:09:37 (none) user.warn kernel: [   87.032246]  [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [   87.032306] Code: 48 89 85 d0 fe ff ff eb 80 41 f6 c6 0f 75 0e 48 c7 c7 9f 88 03 a0 31 c0 e8 02 e9 11 e1 48 8b 85 d0 fe ff ff 48 c7 c7 a3 88 03 a0 <42> 0f b6 34 30 31 c0 49 83 c6 01 e8 e4 e8 11 e1 e9 5e fe ff ff
Feb 28 10:09:37 (none) user.alert kernel: [   87.034742] RIP  [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [   87.034844]  RSP <ffff8805a7ae3d30>
Feb 28 10:09:37 (none) user.warn kernel: [   87.034956] CR2: 0000077e1d012900
Feb 28 10:09:37 (none) user.warn kernel: [   87.034956] ---[ end trace 5b31765eb0372d51 ]---

From the trace, I saw that it was failing somewhere in the
kni_net_rx_normal() function in kni_net.c.
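
As a rough aid for reading the oops, the sketch below classifies a 64-bit
address, assuming x86-64 with 48-bit virtual addresses (an assumption about
this machine, not something the log states). Under that assumption, an
address whose bits 63..47 are all zero is in the user half, all ones is in
the kernel half, and anything else is non-canonical. The enum and function
names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper names; assumes x86-64 with 48-bit virtual addresses. */
enum addr_half { ADDR_USER_HALF, ADDR_KERNEL_HALF, ADDR_NON_CANONICAL };

static enum addr_half classify_addr(uint64_t addr)
{
    /* Bits 63..47 (17 bits) must all be equal for a canonical address. */
    uint64_t top = addr >> 47;

    if (top == 0)
        return ADDR_USER_HALF;      /* lower half: user-space range */
    if (top == 0x1ffff)
        return ADDR_KERNEL_HALF;    /* upper half: kernel range */
    return ADDR_NON_CANONICAL;
}
```

Applied to the faulting address 0000077e1d012900 (CR2 in the oops), this
gives ADDR_USER_HALF, while the other pointers in the register dump, such as
ffff8800b8c12800, fall in the kernel half. So the kernel thread appears to be
dereferencing something that is not a kernel virtual address at all.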

So I narrowed down the line of code where it was failing, and it came to
line 169, where the memcpy happens.
Next I printed some addresses in that function, which gave me:
kva data addresses: data_kva 0000077e1d012900 kva->buff_add
00007f7e1d012880 kva->data_off 128 kni->mbuf_va  (null) and kni->mbuf_kva
ffff880000000000
Next I tried to print the data at the data_kva address, and it failed
there. So it fails when I try to access data_kva at 0000077e1d012900. I
guess the address is wrong, but I don't know why. Can you give me some
ideas, or some things to try, to debug the problem?
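
The printed values are at least self-consistent. Assuming the translation in
kni_net_rx_normal() has the form buf_addr + data_off - mbuf_va + mbuf_kva
(an assumption about the 16.07 source, not checked against this exact
build), the numbers in the debug output reproduce the faulting address
exactly, because the 64-bit sum wraps around when mbuf_va is 0. The function
name below is hypothetical; this is a user-space sketch of the arithmetic,
not the driver code.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the address translation, assuming it is
 * buf_addr + data_off - mbuf_va + mbuf_kva evaluated in 64-bit arithmetic.
 */
static uint64_t model_data_kva(uint64_t buf_addr, uint16_t data_off,
                               uint64_t mbuf_va, uint64_t mbuf_kva)
{
    /* Unsigned 64-bit arithmetic wraps modulo 2^64. */
    return buf_addr + data_off - mbuf_va + mbuf_kva;
}
```

With buf_addr = 0x00007f7e1d012880, data_off = 128, mbuf_va = 0 and
mbuf_kva = 0xffff880000000000, this returns 0x0000077e1d012900, the same
value as data_kva and CR2. So the arithmetic behaves as written; the
suspicious inputs are kni->mbuf_va being NULL and kni->mbuf_kva being
exactly ffff880000000000.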


Thank you

Sushil

On Tue, Feb 28, 2017 at 10:00 AM, Wiles, Keith <keith.wiles at intel.com>
wrote:

>
> > On Feb 28, 2017, at 9:42 AM, Sushil Adhikari <sushil446 at gmail.com>
> wrote:
> >
> > Since printf is not working here, I'm using printk. Do you mean whether
> skb_put(skb, len) fails or the memcpy fails (line 169)? I separated line
> 169 into two statements, one for skb_put and another for just the memcpy,
> and it doesn't fail on skb_put, so it's the memory copy that is causing the
> failure. Since the address in data_kva and the address in "BUG: unable to
> handle kernel paging request at 000007529d212900" match, I thought the
> data_kva address is not correct or something.
>
> If you try printing a byte or word at the data_kva address, does the
> printf fail?
>
> If not, try looping over the address, reading every 128 bytes, and see how
> far you get. If it is the first address then I am guessing the mbuf_kva
> address is bad. Then we need to look in the MAINTAINERS file and email the
> maintainer directly to see if he knows what is happening.
>
> >
> > On Tue, Feb 28, 2017 at 9:34 AM, Wiles, Keith <keith.wiles at intel.com>
> wrote:
> >
> > > On Feb 28, 2017, at 9:30 AM, Sushil Adhikari <sushil446 at gmail.com>
> wrote:
> > >
> > > It's failing at the data_kva address, because that is the address in the
> kernel paging request failure:
> > > BUG: unable to handle kernel paging request at 000007529d212900
> > > and this is what my debug output shows:
> > > kva data addresses: data_kva 000007529d212900, kva->buff_add
> 00007f529d212880, kva->data_off 128, kni->mbuf_va  (null), and
> kni->mbuf_kva ffff880000000000
> >
> > I was thinking of using GDB to dump memory or use printf to see which
> one is failing.
> >
> > >
> > > I'm not sure how to verify that these are normal
> > >
> > > On Mon, Feb 27, 2017 at 4:41 PM, Wiles, Keith <keith.wiles at intel.com>
> wrote:
> > >
> > > > On Feb 27, 2017, at 4:22 PM, Sushil Adhikari <sushil446 at gmail.com>
> wrote:
> > > >
> > > > I narrowed it to the location where it was failing. It's coming from
> http://dpdk.org/browse/dpdk-stable/tree/lib/librte_eal/linuxapp/kni/kni_net.c?h=v16.07.2
> line 169. I am getting the value of len to be 1358 from len=kva->pkt_len,
> which seems right for an IP packet, and the memory allocation on line 157
> also seems to be working fine. When I print sizeof(*skb) or
> sizeof(struct sk_buff) it gives me 208. I don't know whether that should
> be the size we allocate on line 157, which is len + 2 = 1360, or whether
> it's a fixed-size structure of 208 bytes. I would appreciate any insight.
> > > > Linux version 4.4.20 (sushila at dev03) (gcc version 4.9.2
> (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017
> > >
> > > Looks like we need to determine which address is failing: the skb_put()
> address or the data_kva address. If the address that fails is at the end of
> the skb_put() then I would think the len is wrong, meaning we are stepping
> on memory just past a page for the skb. If the address that fails is in the
> data_kva then the calculations for that address are wrong in line 154. You
> may want to print out the kva->data_off, buf_addr, mbuf_va and mbuf_kva to
> verify these values seem normal. The data_off value should be reasonable (I
> guess), meaning within a 2K range.
> > >
> > > Also print out the two values skb_put() and data_kva. You can use gdb
> to examine the memory using the dump memory command. (is it x/nn <address>)
> nn is the read width, but you could leave off the ‘/nn’ for the default.
> > >
> > > >
> > > > Thank you
> > > > Sushil
> > > >
> > > > On Sat, Feb 25, 2017 at 10:31 AM, Wiles, Keith <
> keith.wiles at intel.com> wrote:
> > > >
> > > > > On Feb 24, 2017, at 8:07 AM, Sushil Adhikari <sushil446 at gmail.com>
> wrote:
> > > > >
> > > > > Resending because of unsupported email content type
> > > > >
> > > > >
> > > > > Yes, hanging is the better word I guess; ctrl + c is not working
> to actually stop the program. I also had a display connected to the target
> machine, and I have attached a picture that shows the messages on that
> display. That is where I saw "BUG:Unable to handle kernel paging request at
> xxxxxx", which made me think that the program is in a bad state.
> > > >
> > > > Sorry, I do not see why you are getting this message. All I can
> suggest is to use GDB and see if you can determine why the message is
> happening.
> > > >
> > > > >
> > > > > info thread in gdb shows one thread running
> > > > > Id   Target Id         Frame
> > > > > * 1    LWP 843 "dpdkKni" 0x000000000044eaee in rte_kni_tx_burst ()
> > > > >
> > > > > On Thu, Feb 23, 2017 at 5:41 PM, Wiles, Keith <
> keith.wiles at intel.com> wrote:
> > > > >
> > > > > > On Feb 23, 2017, at 2:38 PM, Sushil Adhikari <
> sushil446 at gmail.com> wrote:
> > > > > >
> > > > > > While trying to run the DPDK KNI application I ran into a problem,
> > > > > > with the following error message:
> > > > > > BUG: unable to handle kernel paging request at 000007ffe2b92780
> > > > > >
> > > > > > To run the application I first unbound the ports from the kernel
> > > > > > module and bound them to igb_uio:
> > > > > >> echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
> > > > > >> echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
> > > > > >> echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
> > > > > >
> > > > > > I ran the application using gdb as
> > > > > >
> > > > > > [~]$ /root/gdb dpdkKni
> > > > > > GNU gdb (crosstool-NG 1.20.0) 7.8
> > > > > > Copyright (C) 2014 Free Software Foundation, Inc.
> > > > > > License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html
> > > > > >>
> > > > > > This is free software: you are free to change and redistribute
> it.
> > > > > > There is NO WARRANTY, to the extent permitted by law.  Type
> "show copying"
> > > > > > and "show warranty" for details.
> > > > > > This GDB was configured as "x86_64-unknown-linux-gnu".
> > > > > > Type "show configuration" for configuration details.
> > > > > > For bug reporting instructions, please see:
> > > > > > <http://www.gnu.org/software/gdb/bugs/>.
> > > > > > Find the GDB manual and other documentation resources online at:
> > > > > > <http://www.gnu.org/software/gdb/documentation/>.
> > > > > > For help, type "help".
> > > > > > Type "apropos word" to search for commands related to "word"...
> > > > > > Reading symbols from dpdkKni...(no debugging symbols
> found)...done.
> > > > > > (gdb) Run dpdkKni -c 0x0f -n 4 -- -P -p 0x3
> --config="(0,0,1),(1,2,3)"
> > > > > > Starting program: /root/dpdkKni dpdkKni -c 0x0f -n 4 -- -P -p 0x3
> > > > > > --config="(0,0,1),(1,2,3)"
> > > > > > warning: Could not load shared library symbols for
> linux-vdso.so.1.
> > > > > > Do you need "set solib-search-path" or "set sysroot"?
> > > > > > warning: Unable to find libthread_db matching inferior's thread
> library,
> > > > > > thread debugging will not be available.
> > > > > > EAL: Detected 4 lcore(s)
> > > > > > EAL: Probing VFIO support...
> > > > > > EAL: PCI device 0000:05:00.0 on NUMA socket -1
> > > > > > EAL:   probe driver: 8086:1528 net_ixgbe
> > > > > > EAL: PCI device 0000:05:00.1 on NUMA socket -1
> > > > > > EAL:   probe driver: 8086:1528 net_ixgbe
> > > > > > Address of pktmbuf_pool 0x7ffff5a7dec0
> > > > > > APP: Initialising port 0 ...
> > > > > > KNI: pci: 05:00:00       8086:1528
> > > > > > kni created for port 0 with kni[i] address 0x7fff75638280 with i
> 0
> > > > > > APP: Initialising port 1 ...
> > > > > > KNI: pci: 05:00:01       8086:1528
> > > > > > kni created for port 1 with kni[i] address 0x7fff75629e00 with i
> 0
> > > > > > APP: Lcore 1 is writing to port 0
> > > > > > APP: Lcore 2 is reading from port 1
> > > > > > APP: Lcore 3 is writing to port 1
> > > > > > APP: Lcore 0 is reading from port 0
> > > > > > ^C
> > > > > > Program received signal SIGINT, Interrupt.
> > > > >
> > > > > The program did not crash or get a segfault, but you hit control-c
> which stopped the application. When you ran the application you started 4
> threads and this is why it would appear in different places when stopped.
> > > > >
> > > > > If the application is hanging then you can use control-C and then
> do the ‘info threads’ command to see the location of all threads. You can use
> the ‘thread X’ command to switch between threads. Please check the command
> usage here; I am going from memory.
> > > > >
> > > > > I am not sure if the application has a -i option to get a command
> line; if so, that may be useful to enable. Check the application to see if it
> uses the cmdline feature.
> > > > >
> > > > > It may be that the application just sits running and you have to use
> other tools or apps to send traffic to the KNI application. Sorry, I have
> not really used the KNI example.
> > > > >
> > > > > > 0x000000000044e916 in rte_kni_tx_burst ()
> > > > > > (gdb) backtrace
> > > > > > #0  0x000000000044e916 in rte_kni_tx_burst ()
> > > > > > #1  0x0000000000619758 in main_loop(void*) ()
> > > > > > #2  0x0000000000431183 in rte_eal_mp_remote_launch ()
> > > > > > #3  0x000000000040d312 in main ()
> > > > > >
> > > > > > (this is where the program crashes)
> > > > > >
> > > > > > I tried to trace the crash with gdb(I am new to gdb)
> > > > > >
> > > > > > and when I do the backtrace it ends up in different functions
> each time:
> > > > > > this time it gave me rte_kni_tx_burst()
> > > > > >
> > > > > > I'm running latest dpdk version 17.02 and linux kernel is
> > > > > > Linux version 4.4.20 (tcuser at cibuild08) (gcc version 4.9.2
> (crosstool-NG
> > > > > > 1.20.0) ) #1 SMP
> > > > > >
> > > > > > I would appreciate any suggestion or insight regarding this
> issue.
> > > > >
> > > > > Regards,
> > > > > Keith
> > > > >
> > > > >
> > > > > <kni.jpg>
> > > >
> > > > Regards,
> > > > Keith
> > > >
> > > >
> > >
> > > Regards,
> > > Keith
> > >
> > >
> >
> > Regards,
> > Keith
> >
> >
>
> Regards,
> Keith
>
>

