[dpdk-dev] [PATCH v6 1/2] mbuf: support attaching external buffer to mbuf

Andrew Rybchenko arybchenko at solarflare.com
Thu Apr 26 18:05:01 CEST 2018


On 04/26/2018 04:10 AM, Yongseok Koh wrote:
> This patch introduces a new way of attaching an external buffer to a mbuf.
>
> Attaching an external buffer is quite similar to mbuf indirection in
> replacing buffer addresses and length of a mbuf, but a few differences:
>    - When an indirect mbuf is attached, refcnt of the direct mbuf would be
>      2 as long as the direct mbuf itself isn't freed after the attachment.
>      In such cases, the buffer area of a direct mbuf must be read-only. But
>      external buffer has its own refcnt and it starts from 1. Unless
>      multiple mbufs are attached to a mbuf having an external buffer, the
>      external buffer is writable.
>    - There's no need to allocate buffer from a mempool. Any buffer can be
>      attached with appropriate free callback.
>    - Smaller metadata is required to maintain shared data such as refcnt.

I'm still unsure how reference counters for external buffer work.

Let's consider the following example:

                         |<-mbuf1 buf_len-->|<-mbuf2 buf_len-->|
+--+-----------+----+--------------+----+--------------+---+- - -
|  global      |head|mbuf1 data    |head|mbuf2 data    | |
|  shinfo=2    |room| |room|              |   |
+--+-----------+----+--------------+----+--------------+---+- - -
                ^ ^
+----------+   |      +----------+ |
| mbuf1    +---+      | mbuf2    +-+
| refcnt=1 |          | refcnt=1 |
+----------+          +----------+

I.e. we have big buffer which is sliced into many small data
buffers referenced from mbufs.

shinfo reference counter is used to control when big buffer
may be freed. But what controls sharing of each block?

headroom and following mbuf data (buf_len) is owned by
corresponding mbuf and the mbuf owner can do everything
with the space (prepend data, append data, modify etc).
I.e. it is read-write in above terminology.

What should happen if mbuf1 is cloned? Right now it will result
in a new mbuf1a with reference counter 1 and incremented shinfo
reference counter. And what does say that corresponding area
is read-only now? It looks like nothing.

As I understand it should be solved using per data area shinfo
which free callback decrements big buffer reference counter.

So, we have two reference counters per each mbuf with external
buffer (plus reference counter per big buffer).
Two reference counters sounds too much and it looks like
mbuf-with-extbuf reference counter is not really used
(since on clone/attach update shinfo refcnt).
It is still two counters to check on free.

Have you considered alternative approach to use mbuf refcnt
as sharing indicator for extbuf data? However, in this case
indirect referencing extbuf would logically look like:

+----------+    +--------+     +--------+
| indirect +--->| extbuf +---->|  data  |
|  mbuf    |    |  mbuf  |     |        |
+----------+    +--------+     +--------+

It looks like it would allow to avoid two reference counters
per data block as above. Personally I'm not sure which approach
is better and would like to hear what you and other reviewers
think about it.

Some minor notes below as well.

> Signed-off-by: Yongseok Koh <yskoh at mellanox.com>
> ---
>
> ** This patch can pass the mbuf_autotest. **
>
> Submitting only non-mlx5 patches to meet deadline for RC1. mlx5 patches
> will be submitted separately rebased on a differnet patchset which
> accommodates new memory hotplug design to mlx PMDs.
>
> v6:
> * rte_pktmbuf_attach_extbuf() doesn't take NULL shinfo. Instead,
>    rte_pktmbuf_ext_shinfo_init_helper() is added.
> * bug fix in rte_pktmbuf_attach() - shinfo wasn't saved to mi.
> * minor changes from review.
>
> v5:
> * rte_pktmbuf_attach_extbuf() sets headroom to 0.
> * if shinfo is provided when attaching, user should initialize it.
> * minor changes from review.
>
> v4:
> * rte_pktmbuf_attach_extbuf() takes new arguments - buf_iova and shinfo.
>    user can pass memory for shared data via shinfo argument.
> * minor changes from review.
>
> v3:
> * implement external buffer attachment instead of introducing buf_off for
>    mbuf indirection.
>
>   lib/librte_mbuf/rte_mbuf.h | 335 +++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 306 insertions(+), 29 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 43aaa9c5f..0a6885281 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h

[...]

> +/**
> + * Shared data at the end of an external buffer.
> + */
> +struct rte_mbuf_ext_shared_info {
> +	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
> +	void *fcb_opaque;                        /**< Free callback argument */
> +	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +};
> +
>   /**< Maximum number of nb_segs allowed. */
>   #define RTE_MBUF_MAX_NB_SEGS	UINT16_MAX
>   
> @@ -706,14 +728,34 @@ rte_mbuf_to_baddr(struct rte_mbuf *md)
>   }
>   
>   /**
> + * Returns TRUE if given mbuf is cloned by mbuf indirection, or FALSE
> + * otherwise.
> + *
> + * If a mbuf has its data in another mbuf and references it by mbuf
> + * indirection, this mbuf can be defined as a cloned mbuf.
> + */
> +#define RTE_MBUF_CLONED(mb)     ((mb)->ol_flags & IND_ATTACHED_MBUF)
> +
> +/**
>    * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
>    */
> -#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
> +#define RTE_MBUF_INDIRECT(mb)   RTE_MBUF_CLONED(mb)

We have discussed that it would be good to deprecate RTE_MBUF_INDIRECT()
since it is not !RTE_MBUF_DIREC(). Is it lost here or intentional (may 
be I've lost
in the thread)?

>   /** Mbuf prefetch */
>   #define RTE_MBUF_PREFETCH_TO_FREE(m) do {       \
>   	if ((m) != NULL)                        \
> @@ -1213,11 +1307,157 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
>   }
>   
>   /**
> + * Initialize shared data at the end of an external buffer before attaching
> + * to a mbuf by ``rte_pktmbuf_attach_extbuf()``. This is not a mandatory
> + * initialization but a helper function to simply spare a few bytes at the
> + * end of the buffer for shared data. If shared data is allocated
> + * separately, this should not be called but application has to properly
> + * initialize the shared data according to its need.
> + *
> + * Free callback and its argument is saved and the refcnt is set to 1.
> + *
> + * @warning
> + * buf_len must be adjusted to RTE_PTR_DIFF(shinfo, buf_addr) after this
> + * initialization. For example,

May be buf_len should be inout and it should be done by the function?
Just a question since current approach looks fragile.

> + *
> + *   struct rte_mbuf_ext_shared_info *shinfo =
> + *          rte_pktmbuf_ext_shinfo_init_helpfer(buf_addr, buf_len,
> + *                                              free_cb, fcb_arg);
> + *   buf_len = RTE_PTR_DIFF(shinfo, buf_addr);
> + *   rte_pktmbuf_attach_extbuf(m, buf_addr, buf_iova, buf_len, shinfo);
> + *
> + * @param buf_addr
> + *   The pointer to the external buffer.
> + * @param buf_len
> + *   The size of the external buffer. buf_len must be larger than the size
> + *   of ``struct rte_mbuf_ext_shared_info`` and padding for alignment. If
> + *   not enough, this function will return NULL.
> + * @param free_cb
> + *   Free callback function to call when the external buffer needs to be
> + *   freed.
> + * @param fcb_opaque
> + *   Argument for the free callback function.
> + *
> + * @return
> + *   A pointer to the initialized shared data on success, return NULL
> + *   otherwise.
> + */
> +static inline struct rte_mbuf_ext_shared_info *
> +rte_pktmbuf_ext_shinfo_init_helper(void *buf_addr, uint16_t buf_len,
> +	rte_mbuf_extbuf_free_callback_t free_cb, void *fcb_opaque)
> +{
> +	struct rte_mbuf_ext_shared_info *shinfo;
> +	void *buf_end = RTE_PTR_ADD(buf_addr, buf_len);
> +
> +	shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end,
> +				sizeof(*shinfo)), sizeof(uintptr_t));
> +	if ((void *)shinfo <= buf_addr)
> +		return NULL;
> +
> +	rte_mbuf_ext_refcnt_set(shinfo, 1);
> +
> +	shinfo->free_cb = free_cb;
> +	shinfo->fcb_opaque = fcb_opaque;

Just a nit, but I'd suggest to initialize in the same order as in the 
struct.
(if there is no reasons why reference counter should be initialized first)

> +
> +	/* buf_len must be adjusted to RTE_PTR_DIFF(shinfo, buf_addr) */
> +	return shinfo;
> +}

[...]


More information about the dev mailing list