Bug 1213 - [Windows] BSOD when shutting down testpmd
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: core
Version: 22.11
Hardware: All Windows
Importance: Normal major
Target Milestone: ---
Assignee: Tyler Retzlaff
URL:
Depends on:
Blocks:
 
Reported: 2023-04-07 16:34 CEST by Antoine Pollenus
Modified: 2023-09-08 06:46 CEST

Description Antoine Pollenus 2023-04-07 16:34:57 CEST
Hello,

When shutting down testpmd I get a BSOD (PROCESS_HAS_LOCKED_PAGES).

This testpmd run allocates 40 RX queues and 40 TX queues. The total number of mbufs is set to 7000000. The command is expected to allocate roughly 40 GB of memory in total for the 80 queues.

.\dpdk-testpmd.exe -l 1-5 -n 4 -a 0000:5e:00.0 -- --socket-num=0 --burst=16 --txd=1024 --rxd=1024 --mbcache=512 --txq=40 --rxq=40 --nb-cores=4 --txpkts=1500 -i --forward-mode=flowgen --eth-peer=0,01:00:5e:00:00:05 --tx-ip=10.10.1.14,239.0.0.5 --tx-udp=42500,42500 --total-num-mbuf=7000000

This testpmd command works properly until you call quit on the interactive command line. When the application shuts down, the machine freezes and shows a BSOD (PROCESS_HAS_LOCKED_PAGES).

Analysing the generated minidump shows that locked pages were found in the process being terminated.

The minidump analysis is in the comment below.

Could you try to reproduce it on your side?

The problem doesn't seem to be linked to an out-of-memory condition. (The server has a total of 384 GB of RAM and should easily handle a 40 GB allocation.)

SETUP:
CPU: 2x Intel Xeon Platinum 8268
RAM: 6 memory channels, 192 GB on each NUMA node
OS: Windows Server 2022 21H2
NIC: Intel E810 100G Dual port
DPDK version: 22.11
Comment 1 Antoine Pollenus 2023-04-07 16:35:11 CEST
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PROCESS_HAS_LOCKED_PAGES (76)
Caused by a driver not cleaning up correctly after an I/O.
Arguments:
Arg1: 0000000000000000, Locked memory pages found in process being terminated.
Arg2: ffff9386c70280c0, Process address.
Arg3: 00000000007c0000, Number of locked pages.
Arg4: 0000000000000000, Pointer to driver stacks (if enabled) or 0 if not.
	Issue a !search over all of physical memory for the current process pointer.
	This will yield at least one MDL which points to it.  Then do another !search
	for each MDL found, this will yield the IRP(s) that point to it, revealing
	which driver is leaking the pages.
	Otherwise, set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory
	Management\TrackLockedPages to a DWORD 1 value and reboot.  Then the system
	will save stack traces so the guilty driver can be easily identified.
	When you enable this flag, if the driver commits the error again you will
	see a different BugCheck - DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (0xCB) -
	which can identify the offending driver(s).

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2046

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 6691

    Key  : Analysis.IO.Other.Mb
    Value: 19

    Key  : Analysis.IO.Read.Mb
    Value: 0

    Key  : Analysis.IO.Write.Mb
    Value: 29

    Key  : Analysis.Init.CPU.mSec
    Value: 1031

    Key  : Analysis.Init.Elapsed.mSec
    Value: 32443

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 95

    Key  : Bugcheck.Code.DumpHeader
    Value: 0x76

    Key  : Bugcheck.Code.Register
    Value: 0x76

    Key  : WER.OS.Branch
    Value: fe_release_svc_prod2

    Key  : WER.OS.Timestamp
    Value: 2022-07-07T18:32:00Z

    Key  : WER.OS.Version
    Value: 10.0.20348.859


FILE_IN_CAB:  040723-12859-01.dmp

BUGCHECK_P1: 0

BUGCHECK_P2: ffff9386c70280c0

BUGCHECK_P3: 7c0000

BUGCHECK_P4: 0

PROCESS_NAME:  vcs.exe

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT:  1

STACK_TEXT:  
ffffc788`c7f720b8 fffff800`350c441f     : 00000000`00000076 00000000`00000000 ffff9386`c70280c0 00000000`007c0000 : nt!KeBugCheckEx
ffffc788`c7f720c0 fffff800`34f249d1     : ffff9386`c7028508 ffffc788`c7f72180 ffff9b85`4efaf080 ffff9386`c70280c0 : nt!MmDeleteProcessAddressSpace+0x104e97
ffffc788`c7f72110 fffff800`34ea8ba0     : ffff9386`c7028090 ffff9386`c7028090 00000000`00000000 00000000`00000000 : nt!PspProcessDelete+0x171
ffffc788`c7f721b0 fffff800`34aa08e7     : 00000000`00000000 00000000`00000000 ffffc788`c7f72309 ffff9386`c70280c0 : nt!ObpRemoveObjectRoutine+0x80
ffffc788`c7f72210 fffff800`34ea6b8d     : 00000000`00000001 ffff9386`c7028090 ffff9386`c7028090 ffff9386`c24b9900 : nt!ObfDereferenceObjectWithTag+0xc7
ffffc788`c7f72250 fffff800`34ea52f9     : 00000000`00000000 000000eb`0c47f8c8 00000000`00000810 00000000`00000000 : nt!ObpCloseHandle+0x2dd
ffffc788`c7f72370 fffff800`34c31085     : ffff9b85`4efaf080 000000eb`0c47ef30 ffffc788`c7f723b8 ffffffff`f4143e00 : nt!NtClose+0x39
ffffc788`c7f723a0 00007ffb`dfabf654     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
000000eb`0c47f5d8 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffb`dfabf654


SYMBOL_NAME:  nt!MmDeleteProcessAddressSpace+104e97

MODULE_NAME: nt

IMAGE_VERSION:  10.0.20348.1607

STACK_COMMAND:  .cxr; .ecxr ; kb

IMAGE_NAME:  ntkrnlmp.exe

BUCKET_ID_FUNC_OFFSET:  104e97

FAILURE_BUCKET_ID:  0x76_vcs.exe_nt!MmDeleteProcessAddressSpace

OS_VERSION:  10.0.20348.859

BUILDLAB_STR:  fe_release_svc_prod2

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {443bfbfd-3528-74d9-5e4f-ad04aa666062}

Followup:     MachineOwner
---------
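
As a rough sanity check on the numbers above, Arg3 of the bugcheck (00000000007c0000) can be converted from a page count to a size in bytes. The sketch below assumes that Arg3 counts base pages of 4 KiB, which is the usual x64 page size; that assumption is not confirmed anywhere in this report, and the helper name locked_bytes is only illustrative.

```python
# Assumption: Arg3 counts 4 KiB base pages (not confirmed by the dump itself).
PAGE_SIZE = 4096


def locked_bytes(arg3_hex, page_size=PAGE_SIZE):
    """Convert the bugcheck's hex page count (Arg3) into a byte count."""
    return int(arg3_hex, 16) * page_size


# Arg3 exactly as printed in the bugcheck analysis above.
locked = locked_bytes("00000000007c0000")
print(f"{locked} bytes = {locked / 2**30:.1f} GiB")
```

Under that assumption the locked pages add up to 31 GiB, which is the same order of magnitude as the roughly 40 GB testpmd is expected to allocate. That would be consistent with the leaked pages being the DPDK memory itself, although it does not prove it.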
Comment 2 Tyler Retzlaff 2023-04-07 23:20:37 CEST
hi Antoine,

i will log a bug for this internally. for the bug, could you please let me know which git commit ids, specifically for netuio and virt2phys, were used to build the drivers? i can't accept any customer crash dumps for privacy reasons and will have to reproduce the issue myself.

i will request a system to diagnose the problem internally. but to set expectations it could be some time before i receive the system and am able to look into the bugcheck.
Comment 3 Antoine Pollenus 2023-04-11 21:48:20 CEST
Hi Tyler, 

Sorry for the late answer and thank you for taking the time to look at this issue.

The commit ID of dpdk-kmods is:
Short Hash: 4a589f7
Long Hash: 4a589f7bed00fc7009c93d430bd214ac7ad2bb6b

Both virt2phys and netuio were built from this same commit.

Thank you in advance for the help and the fix of this bug.

If you need more information, feel free to ask.
Comment 4 Antoine Pollenus 2023-07-18 10:55:25 CEST
Hi Tyler,

Any news on the investigation of this issue?

regards,

Antoine
Comment 5 Tyler Retzlaff 2023-09-08 06:46:55 CEST
i'm afraid i haven't had time to try and diagnose the failure. we are in the process of integrating and authoring new drivers that would supplant the unsigned open source netuio and virt2phys drivers.

we haven't forgotten, i'm sorry for the delay.
