[dpdk-users] [DPDK-PDUMP] Segmentation fault while Restarting PDUMP

RAJESH KUMAR S.R rajuuu1992 at gmail.com
Wed Mar 21 14:43:38 CET 2018


Hi Rami,

I found a temporary fix.

My understanding of the issue is as follows.

When pdump is run for the first time,

rte_eth_devices entries are attached to new ports.

rte_eth_dev_find_free_port() is called and returns 2 for rx_port and 3 for
tx_port.

Correspondingly, tx_queues, rx_queues and other data are stored in
rte_eth_devices[2].data and rte_eth_devices[3].data, obtained from
rte_eth_dev_shared_data->data[2] and rte_eth_dev_shared_data->data[3].

During program exit, this data is not cleaned up.
rte_eth_dev_shared_data->data[2].name and
rte_eth_dev_shared_data->data[3].name still hold the old values
("net_pcap_rx_0", "net_pcap_tx_0").
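
To illustrate the missing cleanup, here is a minimal self-contained sketch. It does not use the DPDK API: shared_port_data, claim_slot(), release_slot() and slot_is_free() are hypothetical names that only model a shared array whose entries outlive the pdump process. The assumption is that some exit-time step would have to clear the name for the slot to look free again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8   /* stand-in for RTE_MAX_ETHPORTS */
#define NAME_LEN  64

/* Hypothetical model of one entry in the shared data array. */
struct shared_port_data {
	char name[NAME_LEN];
};

/* Models memory that outlives the secondary process (a shared memzone). */
static struct shared_port_data shared_data[MAX_PORTS];

/* Record a device name in a slot, as happens when a vdev is attached. */
static void claim_slot(int port_id, const char *name)
{
	assert(port_id >= 0 && port_id < MAX_PORTS);
	snprintf(shared_data[port_id].name, NAME_LEN, "%s", name);
}

/* Hypothetical cleanup step: zero the entry so the slot reads as free. */
static void release_slot(int port_id)
{
	assert(port_id >= 0 && port_id < MAX_PORTS);
	memset(&shared_data[port_id], 0, sizeof(shared_data[port_id]));
}

/* A slot counts as free here when its name is empty. */
static int slot_is_free(int port_id)
{
	assert(port_id >= 0 && port_id < MAX_PORTS);
	return shared_data[port_id].name[0] == '\0';
}
```

In this model, a slot claimed with claim_slot(2, "net_pcap_rx_0") keeps reporting as occupied until release_slot(2) runs, which mirrors the stale names seen after pdump exits.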


rte_eth_dev_find_free_port() iterates over all ports and selects the first
port that has no name.

When pdump is restarted, the rte_eth_devices entries are therefore attached
to new ports (rx_port at 4, tx_port at 5).

So, rte_eth_dev_find_free_port() returns 4 for rx and 5 for tx instead of 2
and 3, which are unused but still carry the old names.
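
The difference between the two search strategies can be sketched in plain C. This is a simplified model, not the DPDK source: local_state, shared_name, set_shared_name() and simulate_second_run() are hypothetical, and the sketch assumes (from the log, where the pcap ports come up as 2 and 3) that ports 0 and 1 belong to the primary process. The placeholder names given to ports 0 and 1 are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8   /* stand-in for RTE_MAX_ETHPORTS */
#define NAME_LEN  64

enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Per-process view: starts empty each time the secondary process runs. */
static enum dev_state local_state[MAX_PORTS];
/* Shared view: survives a restart of the secondary process. */
static char shared_name[MAX_PORTS][NAME_LEN];

static void set_shared_name(int i, const char *name)
{
	assert(i >= 0 && i < MAX_PORTS);
	snprintf(shared_name[i], NAME_LEN, "%s", name);
}

/* Search by shared name, like the check added by the commit. */
static int find_free_by_name(void)
{
	for (int i = 0; i < MAX_PORTS; i++)
		if (shared_name[i][0] == '\0')
			return i;
	return MAX_PORTS;
}

/* Search by per-process state, like the check the commit removed. */
static int find_free_by_state(void)
{
	for (int i = 0; i < MAX_PORTS; i++)
		if (local_state[i] == DEV_UNUSED)
			return i;
	return MAX_PORTS;
}

/* Set up the situation at the start of the second pdump run. */
static void simulate_second_run(void)
{
	memset(local_state, 0, sizeof(local_state));
	memset(shared_name, 0, sizeof(shared_name));
	/* Assumed: ports 0 and 1 are owned by the primary process. */
	set_shared_name(0, "primary_port_0");
	set_shared_name(1, "primary_port_1");
	local_state[0] = DEV_ATTACHED;
	local_state[1] = DEV_ATTACHED;
	/* Stale names left behind by the first pdump run. */
	set_shared_name(2, "net_pcap_rx_0");
	set_shared_name(3, "net_pcap_tx_0");
}
```

After simulate_second_run(), the name-based search skips slots 2 and 3 because of the stale names, while the state-based search still picks slot 2.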

In configure_vdev in pdump/main.c, the check (port_id > nb_ports) is then
true for these ports, so the function returns -1 without configuring them.
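
A small model of that check shows why ids 4 and 5 are rejected while 2 and 3 pass. It is only a sketch: attached[], attach(), reset_ports() and count_attached() are hypothetical stand-ins (count_attached() plays the role of rte_eth_dev_count()), and only the comparison itself is taken from the pdump code quoted later in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 8   /* stand-in for RTE_MAX_ETHPORTS */

/* 1 if the port is attached in this process, 0 otherwise (hypothetical). */
static uint8_t attached[MAX_PORTS];

static void reset_ports(void)
{
	memset(attached, 0, sizeof(attached));
}

static void attach(uint16_t port_id)
{
	assert(port_id < MAX_PORTS);
	attached[port_id] = 1;
}

/* Stand-in for rte_eth_dev_count(): number of attached ports. */
static uint8_t count_attached(void)
{
	uint8_t n = 0;

	for (int i = 0; i < MAX_PORTS; i++)
		if (attached[i])
			n++;
	return n;
}

/* Mirrors the quoted check: reject ids larger than the port count. */
static int count_based_check(uint16_t port_id)
{
	const uint8_t nb_ports = count_attached();

	if (port_id > nb_ports)
		return -1;
	return 0;
}
```

With ports 0, 1 and 4 attached the count is 3, so count_based_check(4) returns -1. The comparison only works while port ids are contiguous, which is the property the gap at 2 and 3 breaks.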

While debugging with gdb, I manually set the ports to rx=2 and tx=3 at the
point where they were being set to rx=4 and tx=5, and it worked fine.

So, I reverted this DPDK commit: 8ee892a2385c50427c03db5cef1789babceb5999

 ethdev: fix port id allocation

        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
-               if (rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED)
+               /* Using shared name field to find a free port. */
+               if (rte_eth_dev_data[i].name[0] == '\0') {
+                       RTE_ASSERT(rte_eth_devices[i].state ==
+                                  RTE_ETH_DEV_UNUSED);
                        return i;
+               }
        }
        return RTE_MAX_ETHPORTS;
 }

After reverting, it worked fine.
I'm not sure whether this is the right way to fix it.



Regards,
Rajesh kumar S R

On Tue, Mar 20, 2018 at 2:55 PM, RAJESH KUMAR S.R <rajuuu1992 at gmail.com>
wrote:

> Hi Rami,
>
>
> I don't have a proper understanding of the whole flow, and I'm not sure
> what I have missed, so I need some help.
> I got some info while debugging.
>
>
> There is a structure "rte_eth_dev_shared_data". The "data" field in this
> structure has some old data from the previous run.
>
> It is getting assigned in rte_eth_dev_shared_data() -->
>
>  mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
>  ....
>  rte_eth_dev_shared_data = mz->addr;
>
>
> a.
>
> Function: rte_eth_dev_find_free_port
> backtrace:  (create_mp_ring_vdev -->rte_eth_dev_attach((devargs=0x7fffffffe6e0
> "net_pcap_rx_0,tx_pcap=/tmp/rx.pcap", port_id=0x7fffffffe6ca))
>  --> rte_eal_dev_attach -->rte_eal_hotplug_add --> vdev_plug--->
> vdev_probe_all_drivers --> pmd_pcap_probe  --> eth_from_pcaps -->
> eth_from_pcaps_common -->pmd_init_internals --> rte_eth_vdev_allocate-->
> rte_eth_dev_allocate --> rte_eth_dev_find_free_port)
>
>
> During the first run,
>
> Port_ids that are allocated from rte_eth_dev_find_free_port function are
> 2(rx) and 3(tx).
>
> The  data values in rte_eth_dev_shared_data are
>
> rte_eth_dev_shared_data->data[2]
> {name = "net_pcap_rx_0", .....}
>
> p rte_eth_dev_shared_data->data[3]
> {name = "net_pcap_tx_0", ....}
>
> The packet capture is working fine.
>
> b.
>
> During the second run,
> In function, rte_eth_dev_find_free_port
>
> "....
> if (rte_eth_dev_shared_data->data[i].name[0] == '\0') {
>                         RTE_ASSERT(rte_eth_devices[i].state ==
>                                    RTE_ETH_DEV_UNUSED);
>                         return i;
>                 }
> ..."
>
>
> p rte_eth_dev_shared_data->data[2]
> $3 = {name = "net_pcap_rx_0", .....}
>
> p rte_eth_dev_shared_data->data[3]
> $4 = {name = "net_pcap_tx_0", ....}
>
>
> The previous data exists here.
> The port_ids returned by rte_eth_dev_find_free_port are 4 (rx) and
> 5 (tx).
>
> c.
> pdump/main.c -> In configure_vdev() , for port_id=4
> const uint8_t nb_ports = rte_eth_dev_count();
> (= 3)
> This check is failing
>  if (port_id > nb_ports)
>                 return -1;
>
>
> In configure_vdev() , for port_id=5
> const uint8_t nb_ports = rte_eth_dev_count();
> (= 4)
> This check is failing
>  if (port_id > nb_ports)
>                 return -1;
>
> rte_eth_dev_configure and rte_eth_dev_start are not getting called.
> So, in rte_eth_tx_burst (dump_packets --> pdump_rxtx --> rte_eth_tx_burst),
> some data in rte_eth_devices[port_id] seems to be NULL.
>
>
> I'll update with more info
>
>
> Regards,
> Rajesh Kumar S R
>
> On Mon, Mar 19, 2018 at 4:54 PM, RAJESH KUMAR S.R <rajuuu1992 at gmail.com>
> wrote:
>
>> Hi Rami,
>> Thanks for the response.
>>
>> I used "echo 0 > /proc/sys/kernel/randomize_va_space "
>> for disabling ASLR, but the error still exists.
>> I checked with testpmd as primary process and the same error occurs when
>> I restart the pdump tool.
>>
>> I'm just trying to debug the segmentation fault in rte_eth_tx_burst
>> function for now.
>>
>>
>>
>> Thanks & Regards,
>> Rajesh kumar S R
>>
>> On Mon, Mar 19, 2018 at 12:39 AM, Rosen, Rami <rami.rosen at intel.com>
>> wrote:
>>
>>> Hi Rajesh,
>>>
>>>
>>>
>>> First notice the warning you get:
>>>
>>> >EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in
>>> the kernel.
>>>
>>> >EAL:    This may cause issues with mapping memory into secondary
>>> processes
>>>
>>>
>>>
>>> So first, I would suggest that you disable ASLR and try again, to
>>> verify it is not the cause. IIRC, in the past I had some issue with
>>> dpdk-pdump when ASLR was not enabled (but IIRC the issue itself was
>>> different – probably not getting packets written to the pcap file or
>>> something else).
>>>
>>> Disabling ASLR is done easily:
>>>
>>> Probably you have : cat   /proc/sys/kernel/randomize_va_space
>>>
>>> gives “2”
>>>
>>> To disable it run
>>>
>>> echo 0 > /proc/sys/kernel/randomize_va_space
>>>
>>>
>>>
>>> And also I would suggest to try it with testpmd. I actually ran it today
>>> with
>>>
>>> DPDK 18.02 with testpmd, and it worked for me. I sent L2 traffic with
>>> scapy, and captured only RX packets with rx-dev; any other L2 packet
>>> generator should be OK.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Rami Rosen
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* RAJESH KUMAR S.R [mailto:rajuuu1992 at gmail.com]
>>> *Sent:* Sunday, March 18, 2018 19:52
>>> *To:* Rosen, Rami <rami.rosen at intel.com>; users at dpdk.org
>>> *Subject:* Re: [dpdk-users] [DPDK-PDUMP] Issue: RING: Cannot reserve
>>> memory
>>>
>>>
>>>
>>>
>>>
>>> Hi Rami,
>>>
>>> Thanks for the response.
>>>
>>> DPDK version I'm using is 18.02.0.
>>>
>>> The primary process I'm using is a custom program which uses DPDK.
>>>
>>> I have initialized pdump in the primary process using
>>> "rte_pdump_init(NULL)" similar to that in testpmd example.
>>>
>>> I'm using the pdump tool without making any changes.
>>>
>>>
>>>
>>> I enabled the following flags
>>> 1. CONFIG_RTE_LIBRTE_PDUMP=y
>>> 2. CONFIG_RTE_LIBRTE_PMD_PCAP=y
>>>
>>> I added the libpcap library
>>>
>>> The problem I'm facing is a bit different from what I mentioned earlier.
>>>
>>> The first time when I run, I'm able to see the packets getting saved in
>>> the output pcap file, rx.pcap and tx.pcap.
>>>
>>> The second time when I run, I'm getting a segmentation fault as I have
>>> pasted the output in the previous message.
>>>
>>> I'm using gdb for debugging, the functions rte_eal_init(),
>>> create_mp_ring_vdev();
>>>         enable_pdump();  are working fine.
>>>
>>>
>>>
>>> The segmentation fault is occurring in the rte_eth_tx_burst
>>> function (dump_packets --> pdump_rxtx --> rte_eth_tx_burst).
>>>
>>> The rings tx_ring_0 and rx_ring_0 are not getting freed. I checked that
>>> with rte_memzone_dump().  So, next time when I start the pdump tool I was
>>> getting " memzone <RG_rx_ring_0> already exists .  RING: Cannot reserve
>>> memory" error.
>>>
>>> So, issue is in rte_eth_tx_burst function, where the
>>> "dev->data->tx_queues[queue_id]" is NULL.
>>>
>>>
>>> "
>>> EAL: Detected lcore 0 as core 0 on socket 0
>>> EAL: Detected lcore 1 as core 0 on socket 0
>>> EAL: Support maximum 128 logical core(s) by configuration.
>>> EAL: Detected 2 lcore(s)
>>> EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or
>>> directory)
>>> EAL: VFIO PCI modules not loaded
>>> EAL: Multi-process socket /var/run/.rte_unix_1817_14c6748457c5
>>> [New Thread 0x7ffff67d4700 (LWP 1821)]
>>> EAL: Probing VFIO support...
>>> EAL: Module /sys/module/vfio not found! error 2 (No such file or
>>> directory)
>>> EAL: VFIO modules not loaded, skipping VFIO support...
>>> EAL: Module /sys/module/vfio not found! error 2 (No such file or
>>> directory)
>>> EAL: Setting up physically contiguous memory...
>>> EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in
>>> the kernel.
>>> EAL:    This may cause issues with mapping memory into secondary
>>> processes
>>> EAL: Analysing 300 files
>>> EAL: Mapped segment 0 of size 0x400000
>>> EAL: Mapped segment 1 of size 0x200000
>>> EAL: Mapped segment 2 of size 0x200000
>>> EAL: Mapped segment 3 of size 0x200000
>>> EAL: Mapped segment 4 of size 0x24c00000
>>> EAL: Mapped segment 5 of size 0x200000
>>> EAL: TSC frequency is ~2904192 KHz
>>> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
>>> unreliable clock cycles !
>>> EAL: Master lcore 0 is ready (tid=f7fec8c0;cpuset=[0])
>>> [New Thread 0x7ffff5fd3700 (LWP 1822)]
>>> EAL: PCI device 0000:00:03.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI Port IO found start=0xc060
>>> EAL: PCI device 0000:00:04.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI Port IO found start=0xc080
>>> EAL: PCI device 0000:00:05.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: Requested device 0000:00:05.0 cannot be used
>>> PMD: Initializing pmd_pcap for net_pcap_rx_0
>>> PMD: Creating pcap-backed ethdev on numa socket -1
>>> PMD: Initializing pmd_pcap for net_pcap_tx_0
>>> PMD: Creating pcap-backed ethdev on numa socket -1
>>>
>>> Thread 1 "dpdk-pdump" received signal SIGSEGV, Segmentation fault.
>>> 0x000000000043ee07 in rte_eth_tx_burst (port_id=4, queue_id=0,
>>> tx_pkts=0x7fffffffe8c0, nb_pkts=1)
>>> "
>>>
>>> I don't know why tx_queues is NULL.
>>>
>>> I'm currently checking where that is getting populated.
>>>
>>>
>>>
>>> Thanks & Regards,
>>>
>>> Rajesh kumar S R
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Mar 17, 2018 at 9:28 PM, Rosen, Rami <rami.rosen at intel.com>
>>> wrote:
>>>
>>> Hi Rajesh,
>>>
>>> Can you please provide more details:
>>>
>>> - Which DPDK version are you using?
>>>
>>> - dpdk-pdump is a secondary process and must be launched along with
>>> any primary process. Which is the primary process you are running ?
>>> is it something that you developed on your own/took something from DPDK
>>> examples (with/without extending it), etc ?
>>>
>>> Regards,
>>> Rami Rosen
>>>
>>>
>>> -----Original Message-----
>>> From: users [mailto:users-bounces at dpdk.org] On Behalf Of RAJESH KUMAR
>>> S.R
>>> Sent: Monday, March 12, 2018 12:25
>>> To: users at dpdk.org
>>> Subject: [dpdk-users] [DPDK-PDUMP] Issue: RING: Cannot reserve memory
>>>
>>> Hi,
>>>
>>> I am new to dpdk.
>>> I'm trying to use dpdk-pdump tool. I'm able to capture packets on dpdk
>>> ports.
>>>
>>> But, I'm facing the following issue
>>> I'm getting an error "RING:Cannot reserve memory" while trying to restart
>>> the pdump tool or running 2 instances of pdump.
>>> I have also used the rte_eal_cleanup while exiting pdump main function.
>>>
>>>
>>> Output:
>>> * sudo /opt/pep/active/bin/dpdk-pdump -- --pdump
>>> 'port=0,queue=*,rx-dev=/tmp/rx.pcap,tx-dev=/tmp/tx.pcap'*
>>>
>>> [sudo] password for admin:
>>> EAL: Detected 2 lcore(s)
>>> EAL: Multi-process socket /var/run/.rte_unix_1061_ce297dfe759
>>> EAL: Probing VFIO support...
>>> EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in
>>> the kernel.
>>> EAL:    This may cause issues with mapping memory into secondary
>>> processes
>>> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
>>> unreliable clock cycles !
>>> EAL: PCI device 0000:00:03.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI device 0000:00:04.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI device 0000:00:05.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: Requested device 0000:00:05.0 cannot be used
>>> PMD: Initializing pmd_pcap for net_pcap_rx_0
>>> PMD: Creating pcap-backed ethdev on numa socket -1
>>> Port 2 MAC: 00 00 00 01 02 03
>>> PMD: Initializing pmd_pcap for net_pcap_tx_0
>>> PMD: Creating pcap-backed ethdev on numa socket -1
>>> Port 3 MAC: 00 00 00 01 02 03
>>> ^C
>>>
>>> Signal 2 received, preparing to exit...
>>> ##### PDUMP DEBUG STATS #####
>>>  -packets dequeued:            4
>>>  -packets transmitted to vdev:        4
>>>  -packets freed:            0
>>>
>>>
>>> Restarting........
>>>
>>> *> sudo /opt/pep/active/bin/dpdk-pdump -- --pdump
>>> 'port=0,queue=*,rx-dev=/tmp/rx.pcap,tx-dev=/tmp/tx.pcap'*
>>> EAL: Detected 2 lcore(s)
>>> EAL: Multi-process socket /var/run/.rte_unix_1073_cea9cc51241
>>> EAL: Probing VFIO support...
>>> EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in
>>> the kernel.
>>> EAL:    This may cause issues with mapping memory into secondary
>>> processes
>>> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
>>> unreliable clock cycles !
>>> EAL: PCI device 0000:00:03.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI device 0000:00:04.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI device 0000:00:05.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: Requested device 0000:00:05.0 cannot be used
>>> PMD: Initializing pmd_pcap for net_pcap_rx_0
>>> PMD: Creating pcap-backed ethdev on numa socket -1
>>> PMD: Initializing pmd_pcap for net_pcap_tx_0
>>> PMD: Creating pcap-backed ethdev on numa socket -1
>>> Segmentation fault
>>>
>>> *> sudo /opt/pep/active/bin/dpdk-pdump -- --pdump
>>> 'port=0,queue=*,rx-dev=/tmp/rx.pcap,tx-dev=/tmp/tx.pcap'*
>>>
>>> EAL: Detected 2 lcore(s)
>>> EAL: Multi-process socket /var/run/.rte_unix_1087_cec3ab006b1
>>> EAL: Probing VFIO support...
>>> EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in
>>> the kernel.
>>> EAL:    This may cause issues with mapping memory into secondary
>>> processes
>>> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
>>> unreliable clock cycles !
>>> EAL: PCI device 0000:00:03.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI device 0000:00:04.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: PCI device 0000:00:05.0 on NUMA socket -1
>>> EAL:   Invalid NUMA socket, default to 0
>>> EAL:   probe driver: 1af4:1000 net_virtio
>>> EAL: Requested device 0000:00:05.0 cannot be used
>>> RING: Cannot reserve memory
>>> EAL: Error - exiting with code: 1
>>>   Cause: File exists:create_mp_ring_vdev:634
>>>
>>>
>>>
>>> Hugepage info:
>>> cat /proc/meminfo | grep Huge
>>> AnonHugePages:      6144 kB
>>> HugePages_Total:     300
>>> HugePages_Free:        0
>>> HugePages_Rsvd:        0
>>> HugePages_Surp:        0
>>> Hugepagesize:       2048 kB
>>>
>>>
>>> Can you please help me in finding the issue.
>>>
>>>
>>>
>>> Thanks,
>>> Rajesh kumar S R
>>>
>>>
>>>
>>
>>
>

