[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

Burakov, Anatoly anatoly.burakov at intel.com
Sat Jan 13 15:13:53 CET 2018


On 19-Dec-17 11:14 AM, Anatoly Burakov wrote:
> This patchset introduces a prototype implementation of dynamic memory allocation
> for DPDK. It is intended to start a conversation and build consensus on the best
> way to implement this functionality. The patchset is complete enough to pass all
> unit tests and to forward traffic, provided the device drivers are adjusted to
> ensure contiguous memory allocation where it matters.
> 
> The vast majority of changes are in the EAL and malloc; the external API
> disruption is minimal: a new set of APIs is added for contiguous memory
> allocation (for rte_malloc and rte_memzone), along with a few API additions in
> rte_memory. Every other API change is internal to EAL, and all memory
> allocation/freeing is handled through rte_malloc, with no externally visible
> API changes, aside from the call to get the physmem layout, which no longer
> makes sense given that there are multiple memseg lists.
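> 
> As a rough illustration of the intended usage (function names here are
> placeholders for the new contiguous-allocation APIs, not necessarily what the
> patchset ends up with), a driver that needs physically contiguous memory would
> have to ask for it explicitly:
> 
>     #include <rte_malloc.h>
>     #include <rte_memzone.h>
> 
>     static int
>     setup_rings(void **sw_ring, const struct rte_memzone **hw_ring)
>     {
>             /* regular allocation: physical contiguity is no longer
>              * guaranteed with this patchset */
>             *sw_ring = rte_malloc("sw_ring", 4096, 64);
> 
>             /* hardware-visible memory is requested as physically
>              * contiguous explicitly (placeholder function name) */
>             *hw_ring = rte_memzone_reserve_contig("hw_ring", 1 << 20,
>                                                   SOCKET_ID_ANY, 0);
> 
>             return (*sw_ring != NULL && *hw_ring != NULL) ? 0 : -1;
>     }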
> 
> Quick outline of all changes done as part of this patchset:
> 
>   * Malloc heap adjusted to handle holes in address space
>   * Single memseg list replaced by multiple expandable memseg lists
>   * VA space for hugepages is preallocated in advance
>   * Added dynamic alloc/free for pages, happening as needed on malloc/free
>   * Added contiguous memory allocation APIs for rte_malloc and rte_memzone
>   * Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
>     with VFIO
> 
> The biggest difference is that a "memseg" now represents a single page (as
> opposed to being a big contiguous block of pages). As a consequence, both
> memzones and malloc elements are no longer guaranteed to be physically
> contiguous, unless the user asks for it. To preserve whatever functionality
> was dependent on the previous behavior, a legacy memory option is also
> provided, however it is expected to be a temporary solution. The drivers
> weren't adjusted in this patchset, and it is expected that whoever tests the
> drivers with this patchset will modify the relevant drivers to support the new
> set of APIs. Basic testing with forwarding traffic was performed, both with
> UIO and VFIO, and no performance degradation was observed.
> 
> Why multiple memseg lists instead of one? It makes things easier on a number
> of fronts. Since a memseg is now a single page, the list will get quite big,
> and we need to locate pages somehow when we allocate and free them. We could
> of course just walk the list and allocate one contiguous chunk of VA space for
> memsegs, but I chose to use separate lists instead, to speed up many list
> operations.
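> 
> For reference, each memseg list in this prototype is roughly a per-socket,
> per-page-size table of single-page memsegs within a preallocated VA region;
> conceptually something like the following (a simplified sketch, not the actual
> structures from the patchset):
> 
>     #include <stdint.h>
> 
>     /* simplified sketch, not the actual structures from the patchset */
>     struct memseg {                 /* one entry == one page */
>             void *addr;             /* VA of the page */
>             uint64_t phys_addr;     /* PA/IOVA of the page */
>     };
> 
>     struct memseg_list {
>             void *base_va;          /* start of preallocated VA space */
>             uint64_t page_sz;       /* all pages in this list are this size */
>             int socket_id;          /* all pages come from this socket */
>             struct memseg *segs;    /* indexed array, populated on demand */
>             int n_segs;
>     };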
> 
> It would be great to see discussion within the community on the following
> points, regarding both the current implementation and future work:
> 
>   * Any suggestions to improve the current implementation. The whole system
>     with multiple memseg lists is kind of unwieldy, so maybe there are better
>     ways to do the same thing. Maybe use a single list after all? We're not
>     expecting malloc/free on the hot path, so maybe it doesn't matter that we
>     have to walk a list of potentially thousands of pages?
>   * Pluggable memory allocators. Right now, allocators are hardcoded, but down
>     the line it would be great to have custom allocators (e.g. for externally
>     allocated memory). I've tried to keep the memalloc API minimal and generic
>     enough to be easy to change later, but suggestions are welcome. Memory
>     drivers, with ops for alloc/free etc.? (A rough sketch of such an ops
>     structure follows this list.)
>   * Memory tagging. This is related to the previous item. Right now, we can
>     only ask malloc to allocate memory by page size, but one could potentially
>     have different memory regions backed by pages of similar sizes (for
>     example, locked 1G pages, to completely avoid TLB misses, alongside
>     regular 1G pages), and it would be good to have a mechanism to distinguish
>     between the different memory types available to a DPDK application. One
>     could, for example, tag memory by "purpose" (e.g. "fast", "slow"), or in
>     other ways.
>   * Secondary process implementation, in particular when it comes to
>     allocating/freeing new memory. The current plan is to make use of the RPC
>     mechanism proposed by Jianfeng [2] to communicate between primary and
>     secondary processes, however other suggestions are welcome.
>   * Support for non-hugepage memory. This work is planned down the line. Aside
>     from the obvious concerns about physical addresses, 4K pages are small and
>     will eat up enormous amounts of memseg list space, so my proposal would be
>     to allocate 4K pages in bigger blocks (say, 2MB).
>   * 32-bit support. The current implementation lacks it, and I don't see a
>     trivial way to make it work if we are to preallocate huge chunks of VA
>     space in advance. We could limit it to 1G per page size, but even that, on
>     multiple sockets, won't work that well, and we can't know in advance what
>     kind of memory the user will try to allocate. Drop it? Leave it in legacy
>     mode only?
>   * Preallocation. Right now, malloc will free any and all memory that it can,
>     which could lead to a (perhaps counterintuitive) situation where a user
>     calls DPDK with --socket-mem=1024,1024, does a single "rte_free" and loses
>     all of the preallocated memory in the process. Would preallocating memory
>     *and keeping it no matter what* be a valid use case? E.g. if DPDK was run
>     without any memory requirements specified, grow and shrink as needed, but
>     if DPDK was asked to preallocate memory, grow as needed yet never shrink
>     past the preallocated amount?
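> 
> To make the "memory drivers" idea above a bit more concrete, here is a very
> rough sketch of what an allocator ops structure could look like (entirely
> hypothetical, just to frame the discussion):
> 
>     #include <stdbool.h>
>     #include <stddef.h>
>     #include <stdint.h>
> 
>     struct memseg_list;     /* whatever the final memseg list type becomes */
> 
>     /* hypothetical ops structure, for discussion only */
>     struct memalloc_ops {
>             const char *name;
>             /* allocate/free n_pages pages of page_sz bytes on a socket,
>              * filling in / clearing entries in the given memseg list */
>             int (*alloc_pages)(struct memseg_list *msl, int n_pages,
>                                uint64_t page_sz, int socket_id);
>             int (*free_pages)(struct memseg_list *msl, int n_pages);
>             /* notification hook, e.g. for VFIO DMA map/unmap on alloc/free */
>             int (*mem_event)(const void *addr, size_t len, bool is_alloc);
>     };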
> 
> Any other feedback about things I didn't think of or missed is greatly
> appreciated.
> 
> [1] http://dpdk.org/dev/patchwork/patch/24484/
> [2] http://dpdk.org/dev/patchwork/patch/31838/
> 
Hi all,

Could this proposal be discussed at the next tech board meeting?

-- 
Thanks,
Anatoly

