[dpdk-dev] [PATCH 0/2] support mailbox interruption on ixgbe/igb VF

Luca Boccassi lboccass at Brocade.com
Tue Jun 28 17:10:24 CEST 2016


On Tue, 2016-06-28 at 10:16 +0000, Luca Boccassi wrote:
> On Tue, 2016-05-24 at 14:06 +0800, Wenzhuo Lu wrote:
> > This patch set adds support for the mailbox interrupt on the VF,
> > so the VF can receive the messages for physical link down/up.
> > 
> > PS: This patch set is split from a previous patch set, *automatic
> > link recovery on ixgbe/igb VF*.
> > 
> > Wenzhuo Lu (2):
> >   ixgbe: VF supports mailbox interruption for PF link up/down
> >   igb: VF supports mailbox interruption for PF link up/down
> > 
> >  doc/guides/rel_notes/release_16_07.rst |   6 ++
> >  drivers/net/e1000/igb_ethdev.c         | 159 +++++++++++++++++++++++++++++++++
> >  drivers/net/ixgbe/ixgbe_ethdev.c       |  85 +++++++++++++++++-
> >  3 files changed, 247 insertions(+), 3 deletions(-)
> > 
> 
> Hi,
> 
> After backporting these patches to 16.04 or 2.2, we get a segmentation
> fault when using interface bonding when the interfaces go down. The
> scenario is:
> 
> - Host has an X540-AT2 10Gb card using the ixgbe driver; 2 VFs are
> created and passed to the qemu/kvm guest VM via libvirt
> - Guest creates a bonded link using the 2 VFs
> - Host sets the VFs' state to down via ip link
> - Guest DPDK app segfaults
> 
> Backtrace:
> 
> #0  0x0000000000000000 in ?? ()
> No symbol table info available.
> #1  0x00007ffff5003957 in bond_ethdev_slave_link_status_change_monitor (
>     cb_arg=0x727748 <rte_eth_devices@@DPDK_2.2+4168>)
>     at /usr/src/packages/BUILD/drivers/net/bonding/rte_eth_bond_pmd.c:1938
>         internals = 0x7fffeb8f5ec0
>         i = 0
>         polling_slave_found = 0
> #2  0x00007ffff68ea88c in eal_alarm_callback (hdl=<optimized out>, arg=<optimized out>)
>     at /usr/src/packages/BUILD/lib/librte_eal/linuxapp/eal/eal_alarm.c:120
>         now = {tv_sec = 356, tv_nsec = 551082574}
>         ap = 0x7fffebc22380
> #3  0x00007ffff68e926d in eal_intr_process_interrupts (nfds=<optimized out>, events=<optimized out>)
>     at /usr/src/packages/BUILD/lib/librte_eal/linuxapp/eal/eal_interrupts.c:752
>         bytes_read = <optimized out>
>         buf = {uio_intr_count = 1, vfio_intr_count = 1, timerfd_num = 1, 
>           charbuf = "\001\000\000\000\000\000\000\000D\260~\363\377\177\000"}
>         n = 0
>         src = 0x7fffeb8d2640
>         cb = 0x7fffeb8d2d80
>         next = <optimized out>
>         active_cb = <optimized out>
> #4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=12)
>     at /usr/src/packages/BUILD/lib/librte_eal/linuxapp/eal/eal_interrupts.c:800
>         events = 0x7fffefb1ba20
>         nfds = 1
> #5  eal_intr_thread_main (arg=<optimized out>)
>     at /usr/src/packages/BUILD/lib/librte_eal/linuxapp/eal/eal_interrupts.c:870
>         pipe_event = {events = 3, data = {ptr = 0x6, fd = 6, u32 = 6, u64 = 6}}
>         src = <optimized out>
>         numfds = <optimized out>
>         pfd = 12
>         ev = {events = 3, data = {ptr = 0xf7df02e500000005, fd = 5, u32 = 5, 
>             u64 = 17860997829745442821}}
>         __func__ = "eal_intr_thread_main"
> #6  0x00007ffff37eb0a4 in start_thread (arg=0x7fffefb3c700) at pthread_create.c:309
>         __res = <optimized out>
>         pd = 0x7fffefb3c700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737214924544, 2510814068564645188, 1, 
>                 140737354125408, 140737336548072, 140737214924544, -2510779380161489596, 
>                 -2510806361332034236}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
>             data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>         __PRETTY_FUNCTION__ = "start_thread"
> #7  0x00007ffff1b8287d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> 
> It dies in this bit:
> 
> /* Update slave link status */
> (*slave_ethdev->dev_ops->link_update)(slave_ethdev,
> 	internals->slaves[i].link_status_wait_to_complete);
> 
> (gdb) print rte_eth_devices[internals->slaves[i].port_id]
> $7 = {rx_pkt_burst = 0x0, tx_pkt_burst = 0x0, data = 0x0, driver = 0x0, dev_ops = 0x0, {pci_dev = 0x0, 
>     vmbus_dev = 0x0}, link_intr_cbs = {tqh_first = 0x0, tqh_last = 0x0}, post_rx_burst_cbs = {
>     0x0 <repeats 256 times>}, pre_tx_burst_cbs = {0x0 <repeats 256 times>}, attached = 0 '\000', 
>   dev_type = RTE_ETH_DEV_UNKNOWN}
> 
> I'm assuming it's not simply a matter of checking the dev_type or for
> nulls. Do you have any suggestions/insight? I'm delving into the issue,
> but it's the first time I have looked at the bonding code, so any help or
> pointers would be greatly appreciated.
> 
> Note that I also tried to backport the additional patches for reset that
> are currently under review on top of these, but there's no difference.
> I have not yet used the new reset API in our app, though.
> 
> Thanks!

I noticed that we were using 2 of the patches that were self-nacked, and
they were causing the crash. I'll switch to the new version that is
under review instead.
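
For reference, the crash mechanism visible in the backtrace above (frame #0
at address 0x0, reached through a zeroed rte_eth_devices entry whose dev_ops
is NULL) can be sketched in isolation. This is only a minimal illustration
under stated assumptions: the mock_* structures and functions below are
hypothetical stand-ins, not the real DPDK definitions, and a NULL check like
this would most likely only hide the underlying problem (here, the self-nacked
patches) rather than fix it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the DPDK structures; only the
 * fields needed for this illustration are modelled. */
struct mock_eth_dev;

struct mock_eth_dev_ops {
	int (*link_update)(struct mock_eth_dev *dev, int wait_to_complete);
};

struct mock_eth_dev {
	const struct mock_eth_dev_ops *dev_ops;
	int link_up;
};

/* Guarded version of the call that faulted: returns 0 if link_update was
 * invoked, -1 if the device entry has no usable ops table (as in the
 * zeroed entry printed by gdb above). */
static int
mock_slave_link_update(struct mock_eth_dev *slave_ethdev, int wait_to_complete)
{
	assert(slave_ethdev != NULL);

	if (slave_ethdev->dev_ops == NULL ||
	    slave_ethdev->dev_ops->link_update == NULL)
		return -1;

	(*slave_ethdev->dev_ops->link_update)(slave_ethdev, wait_to_complete);
	return 0;
}

/* Hypothetical link_update implementation, used only to exercise the
 * non-NULL path: it marks the link as up and reports success. */
static int
mock_link_update(struct mock_eth_dev *dev, int wait_to_complete)
{
	(void)wait_to_complete;
	dev->link_up = 1;
	return 0;
}
```

With a zero-initialised mock_eth_dev the guarded helper returns -1 instead of
jumping to address 0; with an ops table pointing at mock_link_update it
performs the call as before.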

-- 
Kind regards,
Luca Boccassi

