[dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

Dumitrescu, Cristian cristian.dumitrescu at intel.com
Thu Jun 5 15:22:56 CEST 2014


Hi,

In order to minimize the number of iterations, Ivan and I had an offline discussion on this.

Ivan is concerned with the way the mbuf cloning/scatter-gather feature is currently implemented in DPDK, and not with this particular patch.

We agreed to take the discussion on cloning/scatter-gather implementation during the DPDK 1.8 time-frame, at this belongs to the mbuf refresh discussion. The mbuf library is not just the format of the mbuf data structure, it also includes all the features associated with the mbuf, as: accessing mbuf fields through get/set methods, buffer chaining, cloning/scatter-gather, enabling HW offloads through mbuf flags and fields, etc.

Regards,
Cristian

-----Original Message-----
From: Ivan Boule [mailto:ivan.boule at 6wind.com] 
Sent: Monday, June 2, 2014 1:24 PM
To: Dumitrescu, Cristian; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

Hi Cristian,

I agree with you, the natural way to store application metadata into 
mbufs consists in putting it right after the rte_mbuf data structure.
This can be simply implemented this way:

struct metadata {
     ...
};

struct app_mbuf {
     struct rte_mbuf mbuf;
     struct metadata mtdt;
};

With such a representation, the application initializes the "buf_addr" 
field of each mbuf pointed to by a (struct app_mbuf *)amb pointer as:

     amb->mbuf.buf_addr = ((char *amb) + sizeof(struct app_mbuf));

But such a natural programming approach breaks the assumptions of the 
macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and 
RTE_MBUF_DIRECT.

For instance, in the function  __rte_pktmbuf_prefree_seg(m), after the line
     struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
"md" is always different from "m", and thus systematically (and most of 
the time wrongly) considered as an indirect mbuf.

In the same way, the operations done by the function rte_pktmbuf_detach(m)
     void *buf = RTE_MBUF_TO_BADDR(m);
     m->buf_addr = buf;
does not set buf_addr to its [dafault] correct value.

To summarize, adding metadata after the rte_mbuf data structure is
incompatible with the packet cloning feature behind the [wrongly named]
RTE_MBUF_SCATTER_GATHER configuration option.

By the way, you suggest to use the headroom to also store packet metadata.
But, then, how does an application can store both types of information 
in a given mbuf at the same time, when the actual length of network 
headers in a mbuf is variable, as it depends on the protocol path 
followed by the packet in a networking stack (tunnels, etc)?

Regards,
Ivan


On 05/30/2014 12:28 AM, Dumitrescu, Cristian wrote:
> Hi Ivan,
>
> I hate to disagree with you :), but no, we do not break the scatter-gather feature. We actually implemented the Packet Framework IPv4 fragmentation and reassembly ports to validate exactly this.
>
> Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only works (and it is only used) for direct mbufs, so probably the correct name for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and it is not used) for indirect mbufs.
>
> I am describing the rationale behind meta-data design below, sorry for the long description that looks like a design document ...
>
> Quick recap:
> - mbuf->buf_addr points to the first address where a byte of data for the current segment _can_ be placed
> - direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer with the packet itself, so mbuf->buf_addr = mbuf + 1;
> - indirect mbuf: the packet descriptor is located in a different mempool buffer than the packet itself, so mbuf->buf_addr != mbuf + 1;
> - Regardless of the mbuf type, it is not necessarily true that the first byte of data is located at mbuf->buf_addr, as we save a headroom (configurable, by default initially of CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at the start of the data buffer (mbuf->buf_addr) for prepending packet headers ( ... or, why not, meta-data!).  The location of the first real data byte is mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does not change.
>
> Packet meta-data rationale:
> - I think we both agree that we need to be able to store packet meta-data in the packet buffer. The packet buffer regions where meta-data could be stored can only be the in the headroom or in the tailroom, which are both managed by the application and both go up and down during the packet lifetime.
>
> - To me, packet-metadata is part of the packet descriptor: mbuf is the mandatory & standard part of the packet descriptor, while meta-data is the optional & non-standard (per application) part of the packet descriptor. Therefore, it makes sense to put the meta-data immediately after mbuf, but this is not mandatory.
>
> - The zero-size field called meta-data is really just an offset: it points to the first buffer location where meta-data _can_ be placed. The reason for having this field is to provide a standard way to place meta-data in the packet buffer rather that hide it in the Packet Framework libraries and potentially conflict with other mbuf changes in the future. It flags that meta-data should be taken into consideration when any mbuf change is done in  the future.
>
> - For direct mbufs, the same buffer space (the headroom) is shared between packet data (header prepending) and meta-data similar to how the stack and the heap manage the same memory. Of course, since it is managed by the application, it is the responsibility of the application to make sure the packet bytes and the meta-data do not step on one another, but this problem is not at all new, nor introduced by the meta-data field: even currently, the application has to make sure the headroom is dimensioned correctly, so that even in the worst case scenario (application-specific), the packet bytes do not run into the mbuf fields, right?
>
> - For indirect mbufs, the packet meta-data is associated with the indirect mbuf rather than the direct mbuf (the one the indirect mbuf attaches to, as the direct mbuf contains a different packet that has different meta-data), so it makes sense that meta-data of the indirect mbuf is stored in the same buffer as the indirect mbuf (rather than the buffer of the attached direct mbuf). So this means that the buffer size used to store the indirect mbuf is sized appropriately (mbuf size, currently 64 bytes, plus the max size for additional meta-data). This is illustrated in the code of the IPv4 fragmentation port, where for every child packet (output fragment) we copy the parent meta-data in the child buffer (which stores an indirect buffer attached to the direct buffer of the input jumbo, plus now additional meta-data).
>
> - We are also considering having a user_data field in the mbuf itself to point to meta-data in the same buffer or any other buffer. The Packet Framework functionally works with both approaches, but there is a performance problem associated with the mbuf->user_data approach that we are not addressing for this 1.7 release timeframe. The issue is the data dependency that is created, as in order to find out the location of the meta-data, we need to first read the mbuf  and then read meta-data from mbuf->user_data. The cost of the additional memory access is high, due to data dependency preventing efficient prefetch, therefore (at least for the time being) we need to use a compile-time known meta-data offset. The application can fill the meta-data on packet RX and then all additional CPU cores processing the packet can leave of the meta-data for a long time with no need to access the mbuf or the packet (efficiency). For the time being, if somebody needs more yet meta-data in yet another 
buffer, they can add (for their application) a user_data pointer as part of their application meta-data (instead of standard mbuf).
>
> - As said, the mbuf->metadata points to the first location where meta-data _can_ be placed, if somebody needs a different offset, they can add it on top of the mbuf->metadata (e.g. by having a reserved field in their struct app_pkt_metadata). We have demonstrated the use of meta-data in the examples/ip_pipeline sample app (see struct app_pkt_metadata in "main.h").
>
> Please let me know if this makes sense?
>
> Regards,
> Cristian
>
> PS: This is where a white board would help a lot ...
>
>
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
> Sent: Wednesday, May 28, 2014 1:03 PM
> To: Dumitrescu, Cristian; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
>
> Hi Cristian,
>
> Currently, the DPDK framework does not make any assumption on the actual
> layout of a mbuf.
> More precisely, the DPDK does not impose any constraint on the actual
> location of additional metadata, if any, or on the actual location and
> size of its associated payload data buffer.
> This is coherent with the fact that mbuf pools are not created by the
> DPDK itself, but by network applications that are free to choose
> whatever packet mbuf layout that fits their particular needs.
>
> There is one exception to this basic DPDK rule: the mbuf cloning feature
> available through the RTE_MBUF_SCATTER_GATHER configuration option
> assumes that the payload data buffer of the mbuf immediately follows the
> rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR,
> RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file
> lib/librte_mbuf/rte_mbuf.h).
>
> The cloning feature prevents to build packet mbufs with their metadata
> located immediately after the rte_mbuf data structure, which is exactly
> what your patch introduces.
>
> At least, a comment that clearly warns the user of this incompatibility
> might be worth adding into both the code and your patch log.
>
> Regards,
> Ivan
>
> On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
>> Added zero-size field (offset in data structure) to specify the beginning of packet meta-data in the packet buffer just after the mbuf.
>>
>> The size of the packet meta-data is application specific and the packet meta-data is managed by the application.
>>
>> The packet meta-data should always be accessed through the provided macros.
>>
>> This is used by the Packet Framework libraries (port, table, pipeline).
>>
>> There is absolutely no performance impact due to this mbuf field, as it does not take any space in the mbuf structure (zero-size field).
>>
>> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu at intel.com>
>> ---
>>    lib/librte_mbuf/rte_mbuf.h |   17 +++++++++++++++++
>>    1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 4a9ab41..bf09618 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -201,8 +201,25 @@ struct rte_mbuf {
>>    		struct rte_ctrlmbuf ctrl;
>>    		struct rte_pktmbuf pkt;
>>    	};
>> +	
>> +	union {
>> +		uint8_t metadata[0];
>> +		uint16_t metadata16[0];
>> +		uint32_t metadata32[0];
>> +		uint64_t metadata64[0];
>> +	};
>>    } __rte_cache_aligned;
>>
>> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset)       (mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset)      (mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset)      (mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset)      (mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  (&mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  (&mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  (&mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>>    /**
>>     * Given the buf_addr returns the pointer to corresponding mbuf.
>>     */
>>
>
>


-- 
Ivan Boule
6WIND Development Engineer
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.




More information about the dev mailing list