[RFC] mem: do not merge elems from different heap allocs

Message ID 20181129191330.31792-1-james.r.harris@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [RFC] mem: do not merge elems from different heap allocs

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch warning coding style issues

Commit Message

Harris, James R Nov. 29, 2018, 7:13 p.m. UTC
  SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory.  This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.

DPDK creates internal malloc_elem structures for these
allocated regions.  As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine.  This results
in subsequent allocations that can span across multiple
RDMA MRs.  This requires SPDK to check each DPDK buffer to
see if it crosses an MR boundary, and if so, to add
considerable logic and complexity to describe that
buffer before it can be accessed by the RNIC.  It is somewhat
analogous to rte_malloc returning a buffer that is not
IOVA-contiguous.

As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification.  This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing.  It is not possible
to unregister part of a memory region.

SPDK is currently working around this latter issue by
setting the RTE_MEMSEG_FLAG_DO_NOT_FREE on each of the
memory segments associated with an RTE_MEM_EVENT_ALLOC
notification.  But the first issue with merged elements
crossing MR boundaries is still possible and we have
some rather simple allocation patterns that can
trigger it.  So we would like to propose disabling the
merging of malloc_elems that originate from different
RTE_MEM_EVENT_ALLOC events.

This patch demonstrates how this merging could be
avoided.  There are some allocation patterns (especially
frequent multi-huge-page rte_mallocs and rte_frees) that
could result in higher memory usage which need to be
evaluated.  It's possible this behavior would need to be
explicitly enabled through an init flag or a separate API,
or implicitly based on whether any memory registration
callbacks have been registered - implementation of one
of those is deferred pending feedback on this RFC.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
---
 lib/librte_eal/common/malloc_elem.c | 14 ++++++++++----
 lib/librte_eal/common/malloc_elem.h |  6 +++++-
 lib/librte_eal/common/malloc_heap.c |  7 ++++++-
 3 files changed, 21 insertions(+), 6 deletions(-)
  

Comments

Anatoly Burakov Dec. 4, 2018, 4:49 p.m. UTC | #1
On 29-Nov-18 7:13 PM, Jim Harris wrote:
> SPDK uses the rte_mem_event_callback_register API to
> create RDMA memory regions (MRs) for newly allocated regions
> of memory.  This is used in both the SPDK NVMe-oF target
> and the NVMe-oF host driver.
> 
> DPDK creates internal malloc_elem structures for these
> allocated regions.  As users malloc and free memory, DPDK
> will sometimes merge malloc_elems that originated from
> different allocations that were notified through the
> registered mem_event callback routine.  This results
> in subsequent allocations that can span across multiple
> RDMA MRs.  This requires SPDK to check each DPDK buffer to
> see if it crosses an MR boundary, and if so, to add
> considerable logic and complexity to describe that
> buffer before it can be accessed by the RNIC.  It is somewhat
> analogous to rte_malloc returning a buffer that is not
> IOVA-contiguous.
> 
> As a malloc_elem gets split and some of these elements
> get freed, it can also result in DPDK sending an
> RTE_MEM_EVENT_FREE notification for a subset of the
> original RTE_MEM_EVENT_ALLOC notification.  This is also
> problematic for RDMA memory regions, since unregistering
> the memory region is all-or-nothing.  It is not possible
> to unregister part of a memory region.
> 
> SPDK is currently working around this latter issue by
> setting the RTE_MEMSEG_FLAG_DO_NOT_FREE on each of the
> memory segments associated with an RTE_MEM_EVENT_ALLOC
> notification.  But the first issue with merged elements
> crossing MR boundaries is still possible and we have
> some rather simple allocation patterns that can
> trigger it.  So we would like to propose disabling the
> merging of malloc_elems that originate from different
> RTE_MEM_EVENT_ALLOC events.
> 
> This patch demonstrates how this merging could be
> avoided.  There are some allocation patterns (especially
> frequent multi-huge-page rte_mallocs and rte_frees) that
> could result in higher memory usage which need to be
> evaluated.  It's possible this behavior would need to be
> explicitly enabled through an init flag or a separate API,
> or implicitly based on whether any memory registration
> callbacks have been registered - implementation of one
> of those is deferred pending feedback on this RFC.
> 
> Signed-off-by: Jim Harris <james.r.harris@intel.com>

Hi Jim,

I can see the use case, and i don't think there's a better way to do 
this than to fix it at the allocator stage.

You're correct in pointing out that some amount of memory will be wasted 
with this approach due to free space fragmentation. However, if we make 
this behavior non-default, that is fine.

One thing i'm a bit concerned about is that this change will push malloc 
element size into two cache lines on any arch that is 64-bit and has a 
64-byte cache line. Malloc is not a performance API, so structure size 
is not important as long as it's cache-line aligned, but wasting an 
additional 64 bytes per allocation on some architectures (including 
ours!) can be a concern.

Moreover, this extra overhead will be there regardless of whether this 
functionality is enabled or disabled (unless we're talking #ifdef-ery, 
which i'd rather not do). This overhead will even be there on FreeBSD, 
which doesn't support dynamic memory allocation in the first place. I 
don't see a good way to avoid this other than #ifdef-magic, but as i 
said, i'd really like to avoid having another config option (or worse, 
OS-specific code in common files).

I personally think an additional cache line per allocation on x86_64 and 
similar archs isn't *that* big of a price to pay for this functionality 
(which i think is useful for certain use cases like yours!), but i'm 
adding Techboard folks to this thread in case i'm mistaken on that front :)

Provided we go forward with the proposal, i believe the best way to 
implement this would be through an EAL flag.

Also, some code comments below.

> ---
>   lib/librte_eal/common/malloc_elem.c | 14 ++++++++++----
>   lib/librte_eal/common/malloc_elem.h |  6 +++++-
>   lib/librte_eal/common/malloc_heap.c |  7 ++++++-
>   3 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
> index 9d3dcb6a9..785ecdadd 100644
> --- a/lib/librte_eal/common/malloc_elem.c
> +++ b/lib/librte_eal/common/malloc_elem.c
> @@ -110,7 +110,8 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
>    */
>   void
>   malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
> -		struct rte_memseg_list *msl, size_t size)
> +		struct rte_memseg_list *msl, size_t size,
> +		struct malloc_elem *orig_elem, size_t orig_size)
>   {
>   	elem->heap = heap;
>   	elem->msl = msl;
> @@ -120,6 +121,8 @@ malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
>   	elem->state = ELEM_FREE;
>   	elem->size = size;
>   	elem->pad = 0;
> +	elem->orig_elem = orig_elem;
> +	elem->orig_size = orig_size;
>   	set_header(elem);
>   	set_trailer(elem);
>   }
> @@ -278,7 +281,8 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
>   	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
>   	const size_t new_elem_size = elem->size - old_elem_size;
>   
> -	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size);
> +	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size,
> +			 elem->orig_elem, elem->orig_size);
>   	split_pt->prev = elem;
>   	split_pt->next = next_elem;
>   	if (next_elem)
> @@ -469,7 +473,8 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem)
>   	 * with it, need to remove from free list.
>   	 */
>   	if (elem->next != NULL && elem->next->state == ELEM_FREE &&
> -			next_elem_is_adjacent(elem)) {
> +			next_elem_is_adjacent(elem) &&
> +			elem->orig_elem == elem->next->orig_elem) {

Why not make this check part of adjacency test? IMO it would be much 
clearer that way.

>   		void *erase;
>   		size_t erase_len;
>   
> @@ -490,7 +495,8 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem)
>   	 * with it, need to remove from free list.
>   	 */
>   	if (elem->prev != NULL && elem->prev->state == ELEM_FREE &&
> -			prev_elem_is_adjacent(elem)) {
> +			prev_elem_is_adjacent(elem) &&
> +			elem->orig_elem == elem->prev->orig_elem) {

Same as above.

>   		struct malloc_elem *new_elem;
>   		void *erase;
>   		size_t erase_len;
> diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
> index e2bda4c02..207767c94 100644
> --- a/lib/librte_eal/common/malloc_elem.h
> +++ b/lib/librte_eal/common/malloc_elem.h
> @@ -32,6 +32,8 @@ struct malloc_elem {
>   	volatile enum elem_state state;
>   	uint32_t pad;
>   	size_t size;
> +	struct malloc_elem *orig_elem;
> +	size_t orig_size;
>   #ifdef RTE_MALLOC_DEBUG
>   	uint64_t header_cookie;         /* Cookie marking start of data */
>   	                                /* trailer cookie at start + size */

This changes struct size to two cache lines on x86_64, which breaks the 
malloc autotest. That's not the fault of this patchset (malloc autotest 
was kind of stupid in how it handled overhead anyway), but this can be 
fixed by detecting malloc overhead size at runtime. Something like the 
following:


static int
test_multi_alloc_statistics(void)
{
	<snip>
	struct rte_malloc_socket_stats pre_stats, post_stats;
	int overhead = 0;
	
	rte_malloc_get_socket_stats(socket, &pre_stats);

	/* allocate one cacheline */
	dummy = rte_zmalloc_socket(NULL, RTE_CACHE_LINE_SIZE, 0, socket);
	if (dummy == NULL)
		return -1;

	rte_malloc_get_socket_stats(socket, &post_stats);

	/* after subtracting cache line, remainder is overhead */
	overhead = post_stats.heap_allocsz_bytes - pre_stats.heap_allocsz_bytes;
	overhead -= RTE_CACHE_LINE_SIZE;

	rte_free(dummy);

	<proceed with test>
}

> @@ -116,7 +118,9 @@ void
>   malloc_elem_init(struct malloc_elem *elem,
>   		struct malloc_heap *heap,
>   		struct rte_memseg_list *msl,
> -		size_t size);
> +		size_t size,
> +		struct malloc_elem *orig_elem,
> +		size_t orig_size);
>   
>   void
>   malloc_elem_insert(struct malloc_elem *elem);
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
> index c6a6d4f6b..602528299 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -94,7 +94,7 @@ malloc_heap_add_memory(struct malloc_heap *heap, struct rte_memseg_list *msl,
>   {
>   	struct malloc_elem *elem = start;
>   
> -	malloc_elem_init(elem, heap, msl, len);
> +	malloc_elem_init(elem, heap, msl, len, elem, len);
>   
>   	malloc_elem_insert(elem);
>   
> @@ -857,6 +857,11 @@ malloc_heap_free(struct malloc_elem *elem)
>   	if (elem->size < page_sz)
>   		goto free_unlock;
>   
> +	/* we can only free memory back to the system as it was
> +	 * originally allocated */
> +	if (elem->size != elem->orig_size)
> +		goto free_unlock;
> +
>   	/* probably, but let's make sure, as we may not be using up full page */
>   	start = elem;
>   	len = elem->size;
>
  

Patch

diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index 9d3dcb6a9..785ecdadd 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -110,7 +110,8 @@  malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
  */
 void
 malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
-		struct rte_memseg_list *msl, size_t size)
+		struct rte_memseg_list *msl, size_t size,
+		struct malloc_elem *orig_elem, size_t orig_size)
 {
 	elem->heap = heap;
 	elem->msl = msl;
@@ -120,6 +121,8 @@  malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 	elem->state = ELEM_FREE;
 	elem->size = size;
 	elem->pad = 0;
+	elem->orig_elem = orig_elem;
+	elem->orig_size = orig_size;
 	set_header(elem);
 	set_trailer(elem);
 }
@@ -278,7 +281,8 @@  split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
 	const size_t new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size,
+			 elem->orig_elem, elem->orig_size);
 	split_pt->prev = elem;
 	split_pt->next = next_elem;
 	if (next_elem)
@@ -469,7 +473,8 @@  malloc_elem_join_adjacent_free(struct malloc_elem *elem)
 	 * with it, need to remove from free list.
 	 */
 	if (elem->next != NULL && elem->next->state == ELEM_FREE &&
-			next_elem_is_adjacent(elem)) {
+			next_elem_is_adjacent(elem) &&
+			elem->orig_elem == elem->next->orig_elem) {
 		void *erase;
 		size_t erase_len;
 
@@ -490,7 +495,8 @@  malloc_elem_join_adjacent_free(struct malloc_elem *elem)
 	 * with it, need to remove from free list.
 	 */
 	if (elem->prev != NULL && elem->prev->state == ELEM_FREE &&
-			prev_elem_is_adjacent(elem)) {
+			prev_elem_is_adjacent(elem) &&
+			elem->orig_elem == elem->prev->orig_elem) {
 		struct malloc_elem *new_elem;
 		void *erase;
 		size_t erase_len;
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index e2bda4c02..207767c94 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -32,6 +32,8 @@  struct malloc_elem {
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
+	struct malloc_elem *orig_elem;
+	size_t orig_size;
 #ifdef RTE_MALLOC_DEBUG
 	uint64_t header_cookie;         /* Cookie marking start of data */
 	                                /* trailer cookie at start + size */
@@ -116,7 +118,9 @@  void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
 		struct rte_memseg_list *msl,
-		size_t size);
+		size_t size,
+		struct malloc_elem *orig_elem,
+		size_t orig_size);
 
 void
 malloc_elem_insert(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c6a6d4f6b..602528299 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -94,7 +94,7 @@  malloc_heap_add_memory(struct malloc_heap *heap, struct rte_memseg_list *msl,
 {
 	struct malloc_elem *elem = start;
 
-	malloc_elem_init(elem, heap, msl, len);
+	malloc_elem_init(elem, heap, msl, len, elem, len);
 
 	malloc_elem_insert(elem);
 
@@ -857,6 +857,11 @@  malloc_heap_free(struct malloc_elem *elem)
 	if (elem->size < page_sz)
 		goto free_unlock;
 
+	/* we can only free memory back to the system as it was
+	 * originally allocated */
+	if (elem->size != elem->orig_size)
+		goto free_unlock;
+
 	/* probably, but let's make sure, as we may not be using up full page */
 	start = elem;
 	len = elem->size;