[dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file

David Coen d.coen at resi.it
Wed Mar 15 16:48:53 CET 2017


Hi Kai,

I'm sure that it's not necessary to use the --base-virtaddr option on the secondary process.

Referring to the addresses in your last post: to fully try my method,
you should start your real primary application with

 --base-virtaddr=0x7ffef5000000

which is the smallest address I can see in your post (see "Region 5" below).
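
For what it's worth, picking the lowest address out of a region dump can be scripted. This is only a sketch: it assumes the dump uses the same "Region N: virtual address [start, end]" wording as in your post, and the helper names (parse_regions, lowest_base) are made up for illustration; they are not part of DPDK.

```python
import re

# Matches lines such as:
#   Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address ...
# The line format is an assumption taken from the dump quoted in this thread.
REGION_RE = re.compile(
    r"Region\s+(\d+):\s+virtual address\s+"
    r"\[(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+)\]"
)

# Region dump copied from the dummy primary application output in this thread.
DUMP = """\
Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
"""


def parse_regions(dump):
    """Return a list of (index, start, end) tuples found in the dump text."""
    return [
        (int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16))
        for m in REGION_RE.finditer(dump)
    ]


def lowest_base(dump):
    """Return the (index, start, end) tuple with the lowest start address."""
    regions = parse_regions(dump)
    if not regions:
        raise ValueError("no regions found in dump")
    return min(regions, key=lambda region: region[1])


if __name__ == "__main__":
    index, start, _end = lowest_base(DUMP)
    print(f"Region {index}: --base-virtaddr={start:#x}")
```

Run against the dump above, it selects Region 5 and prints --base-virtaddr=0x7ffef5000000, which matches the value I suggested.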

I hope this helps,

David
------------------------------------------------------------------------------------------
From: Kai Zhang [mailto:kay21s at gmail.com]
Sent: Wednesday, 15 March 2017 05:14
To: Wiles, Keith
Cc: David Coen; Van Haaren, Harry
Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file

I have also tried using the same option, --base-virtaddr=0x7fffdc200000, on the secondary process, but it does not help.
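
One thing worth ruling out when --base-virtaddr does not help is whether ASLR is really off at run time, since a line added to /etc/sysctl.conf only takes effect after a reboot or "sysctl -p". The sketch below reads the standard Linux file /proc/sys/kernel/randomize_va_space; the function names are hypothetical helpers, not DPDK or system APIs.

```python
from pathlib import Path

# Standard Linux procfs entry that mirrors the kernel.randomize_va_space sysctl.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value as documented for the Linux kernel sysctl.
ASLR_MODES = {
    0: "disabled",
    1: "conservative randomization (stack, mmap base, VDSO, shared libraries)",
    2: "full randomization (mode 1 plus the heap/brk area)",
}


def parse_aslr(text):
    """Parse the file contents and return the ASLR mode as an int."""
    value = int(text.strip())
    if value not in ASLR_MODES:
        raise ValueError(f"unexpected randomize_va_space value: {value}")
    return value


def aslr_disabled(text):
    """Return True only when the kernel reports ASLR as switched off."""
    return parse_aslr(text) == 0


if __name__ == "__main__":
    if ASLR_PATH.exists():
        mode = parse_aslr(ASLR_PATH.read_text())
        print(f"randomize_va_space = {mode}: {ASLR_MODES[mode]}")
    else:
        print(f"{ASLR_PATH} not found (not a Linux system?)")
```

If this reports anything other than 0 on the machine running both processes, the sysctl.conf change has not been applied yet.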

Thank you, Keith. I think I can try to figure it out first, if the internals are not too complicated ...

Regards,
Kai


On Wed, Mar 15, 2017 at 11:28 AM, Wiles, Keith <keith.wiles at intel.com> wrote:

> On Mar 15, 2017, at 10:56 AM, Kai Zhang <kay21s at gmail.com> wrote:
>
> Hi David,
>
> I find that your method does not work for me :-(
>
> The dummy primary application shows the following regions:
> Region 0: virtual address [0x7fffdc200000, 0x7ffff5a00000], physical address 0x59c00000, len 427819008
> Region 1: virtual address [0x7fffdbe00000, 0x7fffdc000000], physical address 0x7b600000, len 2097152
> Region 2: virtual address [0x7fffdba00000, 0x7fffdbc00000], physical address 0xf25800000, len 2097152
> Region 3: virtual address [0x7ffef5800000, 0x7fffdb800000], physical address 0xf25c00000, len 3858759680
> Region 4: virtual address [0x7ffef5400000, 0x7ffef5600000], physical address 0x100f000000, len 2097152
> Region 5: virtual address [0x7ffef5000000, 0x7ffef5200000], physical address 0x1024000000, len 2097152
>
> I set the real primary application with --base-virtaddr=0x7fffdc200000
>
> The error in the secondary process is:
> EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7ffff2bfd000

This one seems like a hardware issue where the PCI device cannot be mapped at the correct address. The path above is the path to the resource0 file of the PCI device, and the system is having a problem mapping it at that address. Does the secondary process need to have the same option setting the base address?
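
One way to test whether the target address is simply already occupied in the secondary process is to compare it against that process's /proc/<pid>/maps. The following is a rough sketch under stated assumptions: the sample maps text and the 4 KiB length are invented for illustration (the real BAR size is not given in this thread), and the helper names are hypothetical.

```python
# Hypothetical excerpt in /proc/<pid>/maps format; the real file would be
# read from the failing secondary process.
SAMPLE_MAPS = """\
7ffff2800000-7ffff3000000 rw-p 00000000 00:00 0
7ffff7dd3000-7ffff7dfc000 r-xp 00000000 08:01 1234 /lib64/ld-2.17.so
"""


def parse_maps(text):
    """Return (start, end, name) tuples for each mapping line."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start_s, _, end_s = fields[0].partition("-")
        name = " ".join(fields[5:])  # empty for anonymous mappings
        ranges.append((int(start_s, 16), int(end_s, 16), name))
    return ranges


def overlapping(ranges, addr, length):
    """Return the mappings that intersect [addr, addr + length)."""
    end = addr + length
    return [r for r in ranges if r[0] < end and addr < r[1]]


if __name__ == "__main__":
    target = 0x7ffff2bfd000  # address from the EAL error message
    length = 0x1000          # assumed length, for illustration only
    for start, end, name in overlapping(parse_maps(SAMPLE_MAPS), target, length):
        print(f"{start:#x}-{end:#x} {name or '[anonymous]'}")
```

If the real maps file shows an existing mapping over the requested address, that would point at an address-space conflict rather than the hardware.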

Sorry, I am not much help here, as I am not able to focus on the problem more; I am off site at a week-long meeting.

>
> It seems that they are not accessing the same region.
>
> Regards,
> Kai
>
> On Wed, Mar 15, 2017 at 12:47 AM, David Coen <d.coen at resi.it> wrote:
> Hi Kai, I agree with you.
>
>
>
> I have much the same issue: a primary application and a secondary one running, sometimes, with more than 4 cores.
>
> I'm using DPDK 16.11 on RedHat 6.7.
>
>
>
> So far I have solved it this way:
>
>
>
> - Disabling ASLR by adding these two lines to "/etc/sysctl.conf":
>
>                 # Disable Address Space Layout Randomization (ASLR)  (needed by DPDK)
>
>                 kernel.randomize_va_space = 0
>
>
>
> - Getting the virtual address of the first memory segment (the one with the lowest address) returned by the function "rte_eal_get_physmem_layout()", called from a "dummy" primary application used only to obtain this address.
>
> - Passing that virtual address to the "real" primary application using the "--base-virtaddr=" DPDK command-line option. When the secondary app starts, everything goes well with the specified base address.
>
>
>
> I've tested this solution on different servers and it's always ok.
>
> I think that there is some kind of limitation on DPDK primary/secondary initialization process that could be improved.
>
>
>
> Regards,
>
> David
>
>
>
> -----Original Message-----
>
> From: Kai Zhang [mailto:kay21s at gmail.com]
>
> Sent: Monday, 13 March 2017 11:59
>
> To: Van Haaren, Harry
>
> Cc: Wiles, Keith; users at dpdk.org
>
> Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot mmap device resource file
>
>
>
> Thank you for your info, Harry.
>
>
>
> Even if ASLR is the root cause, I don't think DPDK should expect users to disable it to use the primary/secondary model. Is it possible for the DPDK team to check this issue and fix the bug?
>
>
>
> Regards,
>
> Kai
>
>
>
> On Mon, Mar 13, 2017 at 5:58 PM, Van Haaren, Harry < harry.van.haaren at intel.com> wrote:
>
>
>
> > > From: users [mailto:users-bounces at dpdk.org] On Behalf Of Kai Zhang
>
> > > Subject: Re: [dpdk-users] Issue with more Cores assigned: Cannot
>
> > > mmap
>
> > device resource
>
> > > file
>
> > >
>
> > > Yes, my application is somewhat special and should run with the
>
> > > primary/secondary mode. I will search for the way to turn off the
>
> > > random page mapping and try it.
>
> >
>
> >
>
> > You're searching for ASLR, or Address Space Layout Randomization.
>
> >
>
> > Some useful links regarding ASLR, DPDK and Linux;
>
> > http://dpdk.readthedocs.io/en/v16.04/prog_guide/multi_proc_support.html#multi-process-limitations
>
> > http://askubuntu.com/questions/318315/how-can-i-temporarily-disable-aslr-address-space-layout-randomization
>
> > http://dpdk.org/ml/archives/dev/2015-June/019364.html
>
> >
>
> > Please note that ASLR is a security feature of the OS, think twice
>
> > before disabling it.
>
> >
>
> >
>
> > Hope that helps, -Harry
>
> >
>
> >
>
> > > Thanks for your help :)
>
> > >
>
> > > Regards,
>
> > > Kai
>
> > >
>
> > > On Mon, Mar 13, 2017 at 3:24 AM, Wiles, Keith
>
> > > <keith.wiles at intel.com>
>
> > wrote:
>
> > >
>
> > > >
>
> > > > > On Mar 12, 2017, at 6:39 PM, Kai Zhang <kay21s at gmail.com> wrote:
>
> > > > >
>
> > > > >
>
> > > > > Your application may be attaching to the same port for each core.
>
> > > > Normally this means each core could be allocating memory and
>
> > > > the
>
> > 4th
>
> > > > core just goes over the amount of memory you have reserved.
>
> > > > >
>
> > > > > I don't think so. Because the error is in the rte_eal_init(),
>
> > > > > which
>
> > is
>
> > > > executed in the first line of the main() function. At the time,
>
> > > > the
>
> > other
>
> > > > threads are not even launched.
>
> > > > >
>
> > > > > Is it possible to consider this as a bug in DPDK?
>
> > > >
>
> > > > One more thing, I run Pktgen as two processes all of the time. The
>
> > > > big difference is I do not run in primary and secondary modes. I
>
> > > > run two different instances of pktgen at the same time without
>
> > > > seeing this type problem. If the failure is associated with
>
> > > > primary/secondary
>
> > application
>
> > > > model, then it could be a bug in that code as a lot of syncing up
>
> > between
>
> > > > the two processes needs to be done because of memory/device sharing.
>
> > One
>
> > > > problem with P/S applications is memory needs to be mapped at the
>
> > > > same address between the processes and Linux has the Random memory
>
> > > > mapping builtin for security reasons. I forget the name of the
>
> > > > mode in Linux to turn off the random page mapping and Google is not working for me ATM.
>
> > > >
>
> > > > Does your application require running as a primary/secondary
>
> > application?
>
> > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Kai
>
> > > > >
>
> > > > >
>
> > > > > >
>
> > > > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
>
> > > > > > EAL: Error - exiting with code: 1
>
> > > > > >   Cause: Requested device 0000:02:00.0 cannot be used
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Kai
>
> > > > > >
>
> > > > > > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <kay21s at gmail.com>
>
> > wrote:
>
> > > > > >
>
> > > > > > Command line:
>
> > > > > > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
>
> > > > > > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4
>
> > > > > > --proc-type=secondary
>
> > > > > >
>
> > > > > > The configurations are as follows:
>
> > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind
>
> > > > 02:00.0,    2048 x 4k huge page
>
> > > > > > 02:00.0 Ethernet controller: Intel Corporation Ethernet
>
> > > > > > Controller
>
> > > > XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
>
> > > > > > 02:00.1 Ethernet controller: Intel Corporation Ethernet
>
> > > > > > Controller
>
> > > > XL710 for 40GbE QSFP+ (rev 02)
>
> > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > >         Socket 0
>
> > > > > > --------
>
> > > > > > Core 0  [0, 12]
>
> > > > > > Core 1  [1, 13]
>
> > > > > > Core 2  [2, 14]
>
> > > > > > Core 3  [3, 15]
>
> > > > > > Core 4  [4, 16]
>
> > > > > > Core 5  [5, 17]
>
> > > > > > Core 8  [6, 18]
>
> > > > > > Core 9  [7, 19]
>
> > > > > > Core 10 [8, 20]
>
> > > > > > Core 11 [9, 21]
>
> > > > > > Core 12 [10, 22]
>
> > > > > > Core 13 [11, 23]
>
> > > > > >
>
> > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,
>
> > 2048 x
>
> > > > 4k huge page
>
> > > > > > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit
>
> > > > > > Network
>
> > > > Connection (rev 03)
>
> > > > > >         Socket 0        Socket 1
>
> > > > > >         --------        --------
>
> > > > > > Core 0  [0, 20]         [10, 30]
>
> > > > > > Core 1  [1, 21]         [11, 31]
>
> > > > > > Core 2  [2, 22]         [12, 32]
>
> > > > > > Core 3  [3, 23]         [13, 33]
>
> > > > > > Core 4  [4, 24]         [14, 34]
>
> > > > > > Core 8  [5, 25]         [15, 35]
>
> > > > > > Core 9  [6, 26]         [16, 36]
>
> > > > > > Core 10 [7, 27]         [17, 37]
>
> > > > > > Core 11 [8, 28]         [18, 38]
>
> > > > > > Core 12 [9, 29]         [19, 39]
>
> > > > > >
>
> > > > > > Ah, as machine B does not have a 40GbE, I did not bind any NIC
>
> > > > > > and
>
> > run
>
> > > > my program with locally generated packets. But I am using other
>
> > > > DPDK features, such as memory sharing and message passing. Maybe
>
> > > > that is the reason it works correctly? I can only access machine B
>
> > > > remotely, so I
>
> > am
>
> > > > unable to install a NIC on it. I have another PC that is used as a
>
> > client
>
> > > > that only has four cores, which also cannot be used for verification...
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Kai
>
> > > > > >
>
> > > > > >
>
> > > > > > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <
>
> > keith.wiles at intel.com>
>
> > > > wrote:
>
> > > > > >
>
> > > > > > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <kay21s at gmail.com> wrote:
>
> > > > > > >
>
> > > > > > > Hi Keith,
>
> > > > > > >
>
> > > > > > > Thank you for your reply.
>
> > > > > > >
>
> > > > > > > I have tested my program on two machines
>
> > > > > > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
>
> > > > > > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
>
> > > > > > >
>
> > > > > > > I am very sure that the primary process uses different cores
>
> > > > > > > with
>
> > > > the secondary process. The strange thing is that my program works
>
> > correctly
>
> > > > on machine B. But on machine A, the above issue happens with more
>
> > > > than
>
> > 4
>
> > > > cores assigned to the secondary process.
>
> > > > > > >
>
> > > > > > > I have tried to assign cores 1-5  to the secondary process
>
> > > > > > > and
>
> > also
>
> > > > tried other core assignment policies, but the error still happens in
>
> > > > rte_eal_init() with more than 4 cores.
>
> > > > > >
>
> > > > > > It would be nice to see both command lines. I am not sure I
>
> > > > > > can
>
> > help
>
> > > > more all I can do is suggest some ideas to look at.
>
> > > > > >
>
> > > > > > Does machine B have the same number and type of NICs? Use
>
> > > > > > ‘lspci |
>
> > > > grep Ethernet’ to get a list of all Ethernet devices on both machines.
>
> > > > > >
>
> > > > > > What is the number of hugepages you have allocated for both
>
> > machines?
>
> > > > > >
>
> > > > > > Also look at the cpu_layout.py script to see why adding the
>
> > > > > > 5th
>
> > core
>
> > > > would be different on the two machines and try to make them the same.
>
> > > > > >
>
> > > > > > >
>
> > > > > > > Regards,
>
> > > > > > > Kai
>
> > > > > > >
>
> > > > > > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <
>
> > > > keith.wiles at intel.com> wrote:
>
> > > > > > >
>
> > > > > > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <kay21s at gmail.com>
>
> > wrote:
>
> > > > > > > >
>
> > > > > > > > Hi, there
>
> > > > > > > >
>
> > > > > > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS
>
> > > > > > > > 7.3.1611
>
> > with
>
> > > > Linux
>
> > > > > > > > kernel version 3.8.0-30.
>
> > > > > > > >
>
> > > > > > > > I have a master process and a secondary process. When I
>
> > > > > > > > run the
>
> > > > secondary
>
> > > > > > > > process with less than or equal to 4 cores, it works correctly.
>
> > > > Such as:
>
> > > > > > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
>
> > > > > > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
>
> > > > > > > >
>
> > > > > > > > However, there will be error in the rte_eal_init if I
>
> > > > > > > > assign
>
> > more
>
> > > > than 4
>
> > > > > > > > cores.
>
> > > > > > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
>
> > > > > > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
>
> > > > > > > >
>
> > > > > > > > EAL: Cannot mmap device resource file
>
> > > > > > > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address:
>
> > > > 0x7fff65bfc000
>
> > > > > > > > EAL: Error - exiting with code: 1
>
> > > > > > > >  Cause: Requested device 0000:02:00.0 cannot be used
>
> > > > > > >
>
> > > > > > > I assume you have at least 8 cores. Have you tried -l 1-5 on
>
> > > > > > > the
>
> > > > secondary process.
>
> > > > > > >
>
> > > > > > > You did not show the primary process command line, but
>
> > > > > > > if you
>
> > > > use 1-5 then you can only give primary process -l 6-7 or two
>
> > > > cores. It
>
> > is
>
> > > > always a reasonable thing to leave core zero for Linux to use.
>
> > > > > > >
>
> > > > > > > Also it could be you ran out of memory or hugepages you
>
> > allocated to
>
> > > > the system.
>
> > > > > > >
>
> > > > > > > >
>
> > > > > > > > Does anyone know why this happens?
>
> > > > > > > >
>
> > > > > > > > Thanks a lot,
>
> > > > > > > > Kai Zhang
>
> > > > > > >
>
> > > > > > > Regards,
>
> > > > > > > Keith
>
> > > > > > >
>
> > > > > > >
>
> > > > > >
>
> > > > > > Regards,
>
> > > > > > Keith
>
> > > > > >
>
> > > > > >
>
> > > > > >
>
> > > > >
>
> > > > > Regards,
>
> > > > > Keith
>
> > > >
>
> > > > Regards,
>
> > > > Keith
>
> > > >
>
> > > >
>
> >
>
>
>
>
>
>
Regards,
Keith



