Bug 796 - Lack of error checking in rte_pktmbuf_mtod could lead to a seg fault
Summary: Lack of error checking in rte_pktmbuf_mtod could lead to a seg fault
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: 21.05
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assignee: dev
Reported: 2021-08-26 16:39 CEST by Juan Camilo Vega
Modified: 2021-08-30 09:12 CEST

Description Juan Camilo Vega 2021-08-26 16:39:55 CEST
I have been working with DPDK and came across a bug that, unless well documented, could lead to a segmentation fault at runtime. I have a workaround, but I wanted to let you know to help the community improve. When calling rte_eth_rx_burst at very high speeds (above 10G), on very rare occasions (it usually takes hours to reproduce) we receive packet bursts where most of the packets are valid but one or more packets in the middle have pkts[i]->buf_addr equal to NULL (potentially due to a NIC glitch, I assume). The problem is that this field is usually abstracted away from the user, who would normally rely on rte_pktmbuf_mtod to perform the checks and return the address of the payload rather than access buf_addr directly. Unless the documentation explicitly requires it, the user would therefore not think to check whether pkts[i]->buf_addr is NULL. rte_pktmbuf_mtod does not check this condition either, so if we call



"char *base_pkt_address = rte_pktmbuf_mtod(pkts[i],char *);" 



where pkts[i]->buf_addr == NULL, then base_pkt_address does not get a valid address but instead gets 0 plus the size of the preamble (which is usually the constant 0x80). This is not a value the user would normally check for; they would typically check whether base_pkt_address is NULL and otherwise assume the pointer is valid. Attempting to use the data pointed to by base_pkt_address then leads to a segmentation fault, as we try to access the invalid address 0x80.
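
For illustration, the arithmetic described above and a user-side guard can be sketched as follows. This is a minimal, self-contained sketch and not DPDK code: struct demo_mbuf, DEMO_MTOD and demo_mtod_checked are hypothetical stand-ins that only mimic the buf_addr plus data_off computation, and the 0x80 offset is taken from the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the two fields involved. */
struct demo_mbuf {
    void    *buf_addr;  /* start of the buffer; NULL in the faulty case */
    uint16_t data_off;  /* offset of packet data in the buffer, 0x80 in the report */
};

/* Mimics the unchecked computation: buffer base plus data offset.
 * Integer arithmetic is used so the NULL case is well defined in this demo. */
#define DEMO_MTOD(m, t) ((t)((uintptr_t)(m)->buf_addr + (m)->data_off))

/* Hypothetical checked wrapper: returns NULL instead of a bogus small address. */
static inline char *demo_mtod_checked(const struct demo_mbuf *m)
{
    if (m == NULL || m->buf_addr == NULL)
        return NULL;
    return DEMO_MTOD(m, char *);
}
```

With buf_addr == NULL and data_off == 0x80, DEMO_MTOD yields the address 0x80, which passes a plain NULL test on the result, while demo_mtod_checked returns NULL. Such a guard only catches the NULL case; it cannot detect a buf_addr holding some other invalid value.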


I am using dpdk-21.05. I have an older NIC that does not support VFIO so this was produced using the uio_pci_generic driver.


Thanks
Comment 1 Dmitry Kozlyuk 2021-08-29 01:17:26 CEST
Thank you for sharing your experience, Juan.

Neither the DPDK ethdev layer nor user code can reliably detect or mitigate NIC/firmware/PMD glitches that produce incorrect mbufs (consider buf_addr containing a random value). rte_pktmbuf_mtod() is a data path function; as such, it avoids any checks that are not absolutely necessary and assumes valid input in order to maximize performance.

Running faulty hardware with uio_pci_generic is extremely dangerous to the system because of its unrestricted DMA ability. However, if the error has a software cause, e.g. a PMD bug or a firmware blocking an unusable mbuf, a specific PMD may be able to detect the failure and, for example, avoid filling and returning an invalid mbuf. So please tell us which NIC and PMD you are using. You can also try updating DPDK to 20.11 (which should be compatible with yours).
Comment 2 Olivier Matz 2021-08-30 09:12:56 CEST
Agree with Dmitry. If the PMD builds an invalid mbuf, it has to be fixed in the PMD. Adding a check in rte_pktmbuf_mtod() would slow down everyone for a case that should not happen.
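
One way to reconcile the two positions in application code is a check that exists only in debug builds, so the release fast path pays nothing. The following is a rough sketch under stated assumptions: struct sketch_mbuf, sketch_mbuf_ok and the SKETCH_MBUF_DEBUG switch are hypothetical names introduced here, not part of DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf fields of interest. */
struct sketch_mbuf {
    void    *buf_addr;
    uint16_t data_off;
};

/* Returns nonzero if the mbuf looks usable.
 * The test is compiled in only when SKETCH_MBUF_DEBUG is defined
 * (e.g. via -DSKETCH_MBUF_DEBUG); otherwise it always reports success. */
static inline int sketch_mbuf_ok(const struct sketch_mbuf *m)
{
#ifdef SKETCH_MBUF_DEBUG
    return m != NULL && m->buf_addr != NULL;
#else
    (void)m;   /* no check on the fast path */
    return 1;
#endif
}
```

This only helps while hunting for the faulty component; as noted above, the actual fix belongs in the PMD that builds the invalid mbuf.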