Bug 701 - mlx5: reading xstats smashes stack
Summary: mlx5: reading xstats smashes stack
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: 22.03
Hardware: x86 Linux
Importance: Normal normal
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2021-05-11 09:50 CEST by Jan Viktorin
Modified: 2023-12-21 12:40 CET

Attachments
Proposed hotfix for the problem (1.17 KB, patch)
2023-12-21 12:40 CET, Timofei Kushnir

Description Jan Viktorin 2021-05-11 09:50:47 CEST
When reading xstats via the mlx5 driver *before the port is started*, the application crashes with "stack smashing detected". After the port has been started (and then stopped), the crash no longer happens.

Terminal 1:

# dpdk-testpmd -a 0000:04:00.0 -- -i --disable-device-start
...
EAL: RTE Version: 'DPDK 20.11.1'
...
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:04:00.0 (socket 0)
...
testpmd>

Terminal 2:

# dpdk-proc-info -- --xstats
Device 0000:04:00.1 is not driven by the primary process
mlx5_pci: can not attach rte ethdev
mlx5_pci: probe of PCI device 0000:04:00.1 aborted after encountering an error: Cannot allocate memory
common_mlx5: Failed to load driver = mlx5_pci.

EAL: Requested device 0000:04:00.1 cannot be used
EAL: Cannot find resource for device
EAL: No legacy callbacks, legacy socket not created
...
*** stack smashing detected ***: dpdk-proc-info terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fd76cdf9697]
/lib64/libc.so.6(+0x118652)[0x7fd76cdf9652]
/usr/lib64/dpdk/pmds-21.0/librte_net_mlx5.so.21.0(+0x1b41e4)[0x7fd7602271e4]
======= Memory map: ========
...
Comment 1 Thomas Monjalon 2021-05-11 10:06:21 CEST
Please can you check it reproduces on the latest upstream branch?

Any details from gdb?
Comment 2 Jan Viktorin 2021-05-11 10:27:22 CEST
Backtrace:

Program received signal SIGABRT, Aborted.
0x00007ffff552c387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-324.el7_9.x86_64 lbr_libnl3-3.5.0-2.el7.x86_64 libfdt-1.4.6-1.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libibverbs-52mlnx1-1.52104.x86_64 libpcap-1.5.3-12.el7.x86_64 numactl-libs-2.0.12-5.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt full
#0  0x00007ffff552c387 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ffff552da78 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007ffff556ef67 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007ffff560e697 in __fortify_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007ffff560e652 in __stack_chk_fail () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007fffe8a3c1e4 in mlx5_xstats_get (dev=0x608540 <rte_eth_devices>, stats=0x7fffffffdfa0, n=8) at ../drivers/net/mlx5/mlx5_stats.c:81
        priv = 0x1003d0200
        i = <optimized out>
        counters = <optimized out>
        xstats_ctrl = 0x1003d0af0
        mlx5_stats_n = <optimized out>
#6  0x0000000000000000 in ?? ()
No symbol table info available.

I will try to check the upstream as well.
Comment 3 Jan Viktorin 2021-05-11 11:55:03 CEST
Same issue with main be81f77d8077 ("hash: fix tuple adjustment").

dpdk/ $ rm -rf build && CFLAGS="-fstack-protector-strong" meson build && ninja -C build

1 # build/app/dpdk-testpmd -a 0000:04:00.0 -- -i --disable-device-start

2 # build/app/dpdk-proc-info -- --xstats
...
*** stack smashing detected ***: /home/shared/xvikto03/Projects/cisticka/dpdk/build/app/dpdk-proc-info terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7ffff5c99697]
/lib64/libc.so.6(+0x118652)[0x7ffff5c99652]
/home/shared/xvikto03/Projects/cisticka/dpdk/build/app/dpdk-proc-info[0x100bff4]
Comment 4 Kamil Vojanec 2022-07-29 13:51:41 CEST
Using current main 7220632 ("version: 22.11-rc0"), this bug can be reproduced using the same steps as described above.

With DPDK built with debug symbols, I was able to get the following backtrace:

#0  0x00007ffff5bda387 in raise () from /lib64/libc.so.6
#1  0x00007ffff5bdba78 in abort () from /lib64/libc.so.6
#2  0x00007ffff5c1ced7 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff5cbc577 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007ffff5cbc532 in __stack_chk_fail () from /lib64/libc.so.6
#5  0x0000000001fc840a in mlx5_xstats_get (dev=0x16, stats=0x49d4, n=0) at ../drivers/net/mlx5/mlx5_stats.c:82
#6  0x0000000000a19c6e in rte_eth_xstats_get (port_id=0, xstats=0x7fffffffdf10, n=16) at ../lib/ethdev/rte_ethdev.c:2985
#7  0x0000000000a19a21 in rte_eth_xstats_get_by_id (port_id=0, ids=0x0, values=0x7a76460, size=16) at ../lib/ethdev/rte_ethdev.c:2940
#8  0x00000000005a711e in nic_xstats_display (port_id=0) at ../app/proc-info/main.c:559
#9  0x00000000005a9994 in main (argc=2, argv=0x7fffffffe4f0) at ../app/proc-info/main.c:1552
Comment 5 Kamil Vojanec 2022-08-01 12:28:58 CEST
Using GDB, I was able to track down the cause of the problem. It seems to be the `counters` variable defined at `mlx5_stats.c:43`. That variable is stack allocated with `n` elements. However, in `mlx5_os_read_dev_counters()` it is then passed to memset with an incorrect size (`xstats_ctrl->mlx5_stats_n` elements rather than `n`). When `mlx5_stats_n` is larger than `n`, the memset writes past the end of the array and overwrites the stack canary, which crashes the application.
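
A minimal, self-contained sketch of the pattern described above. It is not the driver code and not the attached patch. A callee clears a caller-supplied buffer using its own counter count instead of the caller's capacity. The struct `xstats_ctrl_sketch`, the function `clear_counters_sketch()` and its `capacity` parameter are hypothetical names introduced for illustration; only the field name `mlx5_stats_n` is taken from the report. Clamping the length to the capacity is one possible guard, assumed here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's control structure; only the
 * field named in the report is modelled. */
struct xstats_ctrl_sketch {
	uint16_t mlx5_stats_n; /* number of counters the callee knows about */
};

/* Hypothetical helper: zero a caller-supplied buffer that holds
 * `capacity` elements and return how many elements were cleared. */
static unsigned int
clear_counters_sketch(const struct xstats_ctrl_sketch *ctrl,
		      uint64_t *stats, unsigned int capacity)
{
	unsigned int count;

	assert(ctrl != NULL && stats != NULL);
	count = ctrl->mlx5_stats_n;

	/* The unguarded form,
	 *   memset(stats, 0, sizeof(*stats) * ctrl->mlx5_stats_n);
	 * writes past the end of `stats` whenever mlx5_stats_n > capacity. */
	if (count > capacity)
		count = capacity; /* assumed guard: never exceed the buffer */

	memset(stats, 0, sizeof(*stats) * count);
	return count;
}
```

With a buffer of 8 elements and `mlx5_stats_n` set to 16 (illustrative values, not taken from the driver), the call clears 8 elements and leaves the memory after the buffer untouched.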
Comment 6 Timofei Kushnir 2023-12-21 12:40:55 CET
Created attachment 268
Proposed hotfix for the problem

Proposed patch works well for me.