[dpdk-dev] [PATCH v3 0/5] A means to negotiate delivery of Rx meta data

Thomas Monjalon thomas at monjalon.net
Fri Oct 1 11:48:52 CEST 2021


01/10/2021 10:55, Ivan Malov:
> On 01/10/2021 11:11, Thomas Monjalon wrote:
> > 01/10/2021 08:47, Andrew Rybchenko:
> >> On 9/30/21 10:30 PM, Ivan Malov wrote:
> >>> On 30/09/2021 19:18, Thomas Monjalon wrote:
> >>>> 23/09/2021 13:20, Ivan Malov:
> >>>>> In 2019, commit [1] announced changes in DEV_RX_OFFLOAD namespace
> >>>>> intending to add new flags, RSS_HASH and FLOW_MARK. Since then,
> >>>>> only the former has been added. The problem hasn't been solved.
> >>>>> Applications still assume that no efforts are needed to enable
> >>>>> flow mark and similar meta data delivery.
> >>>>>
> >>>>> The team behind the net/sfc driver has to take over the effort
> >>>>> since the problem has started impacting us. Riverhead, a
> >>>>> cutting-edge Xilinx smart NIC family, has two Rx prefix types.
> >>>>> Rx meta data is available only from the long Rx prefix, and
> >>>>> switching between the prefix formats can't happen while the port
> >>>>> is started. Hence, we run into the same problem which [1] was
> >>>>> aiming to solve.
> >>>>
> >>>> Sorry, I don't understand: what is an Rx prefix?
> >>>
> >>> A small chunk of per-packet metadata in the Rx packet buffer that
> >>> precedes the actual packet data. In mbuf terms, this could be
> >>> something lying before m->data_off.
> > 
> > I've never seen the word "Rx prefix".
> > In general we talk about mbuf headroom and mbuf metadata,
> > the rest being the mbuf payload and mbuf tailroom.
> > I guess you mean mbuf metadata in the space of the struct rte_mbuf?
> 
> In this paragraph I describe the two ways in which the NIC itself can
> provide metadata buffers of different sizes. Hence the term "Rx
> prefix". As you understand, the NIC HW is unaware of DPDK, mbufs and
> whatever other SW concepts. To the NIC, this is an "Rx prefix", that
> is, a chunk of per-packet metadata *preceding* the actual packet data.
> It's the responsibility of the PMD to treat this the right way and to
> take care of headroom, payload and tailroom. I describe the two Rx
> prefix formats in NIC terminology just to provide the gist of the
> problem.
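> 
> To connect this to mbuf terms, here is a minimal sketch using only
> standard mbuf fields (where exactly the HW writes the prefix is the
> vendor-specific part):
> 
>     #include <rte_mbuf.h>
> 
>     static void
>     locate_rx_prefix(const struct rte_mbuf *m)
>     {
>         /* Start of the data buffer attached to the mbuf. */
>         const char *buf = m->buf_addr;
>         /* Packet data starts here; [buf, pkt) is the headroom. */
>         const char *pkt = buf + m->data_off;
> 
>         /* A HW-written Rx prefix occupies the start of the buffer,
>          * i.e. part of the headroom; a long prefix simply consumes
>          * more of it. The PMD parses it and fills the corresponding
>          * mbuf fields before handing the packet to the application. */
>         (void)pkt;
>     }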

OK, but it is confusing as it is vendor-specific.
Please stick with DPDK terms if possible.

> >>>>> Rx meta data (mark, flag, tunnel ID) delivery is not an offload
> >>>>> on its own since the corresponding flows must be active to set
> >>>>> the data in the first place. Hence, adding offload flags
> >>>>> similar to RSS_HASH is not a good idea.
> >>>>
> >>>> What does "active" mean here?
> >>>
> >>> Active = inserted and functional. What this paragraph is trying to say
> >>> is that when you enable, say, RSS_HASH, that implies both computation of
> >>> the hash and the driver's ability to extract it from packets
> >>> ("delivery"). But when it comes to MARK, it's just "delivery". No
> >>> "offload" here: the NIC won't set any mark in packets unless you create
> >>> a flow rule to make it do so. That's the gist of it.
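> >>>
> >>> For instance, no packet gets marked until something like the
> >>> following minimal sketch succeeds (hypothetical match and mark ID;
> >>> port_id assumed to be a configured port):
> >>>
> >>>     #include <rte_flow.h>
> >>>
> >>>     struct rte_flow_error err;
> >>>     struct rte_flow_attr attr = { .ingress = 1 };
> >>>     struct rte_flow_action_mark mark = { .id = 42 };
> >>>     struct rte_flow_item pattern[] = {
> >>>         { .type = RTE_FLOW_ITEM_TYPE_ETH },
> >>>         { .type = RTE_FLOW_ITEM_TYPE_END },
> >>>     };
> >>>     struct rte_flow_action actions[] = {
> >>>         { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
> >>>         { .type = RTE_FLOW_ACTION_TYPE_END },
> >>>     };
> >>>
> >>>     /* Matching packets now carry mark 42; whether the PMD
> >>>      * *delivers* it in mbufs is the separate question. */
> >>>     struct rte_flow *flow =
> >>>         rte_flow_create(port_id, &attr, pattern, actions, &err);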
> > 
> > OK
> > Yes I agree RTE_FLOW_ACTION_TYPE_MARK doesn't need any offload flag.
> > Same for RTE_FLOW_ACTION_TYPE_SET_META.
> > 
> >>>>> Patch [1/5] of this series adds a generic API to let applications
> >>>>> negotiate delivery of Rx meta data during initialisation period.
> > 
> > What is metadata?
> > Do you mean RTE_FLOW_ITEM_TYPE_META and RTE_FLOW_ITEM_TYPE_MARK?
> > The word "metadata" could cover any field in the mbuf struct, so it
> > is vague.
> 
> Metadata here is *any* additional information provided by the NIC for 
> each received packet. For example, Rx flag, Rx mark, RSS hash, packet 
> classification info, you name it. I'd like to stress that the
> suggested API comes with flags, each of which is crystal clear on what
> concrete kind of metadata it covers, e.g. Rx mark.

I missed the flags.
You mean these 3 flags?

+/** The ethdev sees flagged packets if there are flows with action FLAG. */
+#define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
+
+/** The ethdev sees mark IDs in packets if there are flows with action MARK. */
+#define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
+
+/** The ethdev detects missed packets if there are "tunnel_set" flows in use. */
+#define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)

It is not crystal clear because it does not reference the API,
like RTE_FLOW_ACTION_TYPE_MARK.
And it covers a limited set of metadata.
Do you intend to extend to all mbuf metadata?

> >>>>> This way, an application knows right from the start which parts
> >>>>> of Rx meta data won't be delivered. Hence, there is no need to
> >>>>> try inserting flows requesting such data and to handle the
> >>>>> failures.
> >>>>
> >>>> Sorry, I don't understand the problem you want to solve.
> >>>> And sorry for not noticing earlier.
> >>>
> >>> No worries. *Some* PMDs do not enable delivery of, say, Rx mark with the
> >>> packets by default (for performance reasons). If the application tries
> >>> to insert a flow with action MARK, the PMD may not be able to enable
> >>> delivery of Rx mark without restarting the Rx sub-system. And
> >>> that's fraught with traffic disruption and similar bad consequences. In
> >>> order to address it, we need to let the application express its interest
> >>> in receiving mark with packets as early as possible. This way, the PMD
> >>> can enable Rx mark delivery in advance. And, as an additional benefit,
> >>> the application can learn *from the very beginning* whether it will be
> >>> possible to use the feature or not. If this API tells the application
> >>> that no mark delivery will be enabled, then the application can just
> >>> skip many unnecessary attempts to insert knowingly unsupported flows
> >>> during runtime.
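> >>>
> >>> As a minimal sketch of that early check (the function name below
> >>> is only assumed from the RTE_ETH_RX_META_* flag prefix; see patch
> >>> [1/5] for the actual spelling; the PMD clears the bits it cannot
> >>> honour):
> >>>
> >>>     uint64_t features = RTE_ETH_RX_META_USER_MARK;
> >>>
> >>>     if (rte_eth_rx_meta_negotiate(port_id, &features) != 0 ||
> >>>         (features & RTE_ETH_RX_META_USER_MARK) == 0) {
> >>>         /* Mark delivery won't be enabled: don't bother inserting
> >>>          * flows with action MARK at runtime. */
> >>>     }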
> > 
> > I'm puzzled, because we could have the same reasoning for any offload.
> 
> We're not discussing *offloads*. An offload is when the NIC *computes
> something* and *delivers* it. We are discussing precisely *delivery*.

OK, but still, there is a lot more mbuf metadata being delivered.

> > I don't understand why we are focusing on mark only.
> 
> We are not focusing on mark on purpose; that's just how our discussion
> has gone. I chose mark (could've chosen flag or anything else) just to
> give you an example.
> 
> > I would prefer we find a generic solution using the rte_flow API.
> > Can we make rte_flow_validate() work before port start?
> > If validating a fake rule doesn't make sense,
> > why not having a new function accepting a single action as parameter?
> 
> A noble idea, but if we feed the entire flow rule to the driver for 
> validation, then the driver can't just look for actions FLAG or MARK
> in it (to enable or disable metadata delivery). It is obliged to also
> validate the match criteria, attributes, etc. And,
> if something is unsupported (say, some specific item), the driver will
> have to reject the rule as a whole, thus leaving the application to
> join the dots itself.
>
> Say, you ask the driver to validate the following rule:
> pattern blah-blah-1 / blah-blah-2 / end action flag / end
> intending to check support for FLAG delivery. Suppose the driver
> doesn't support pattern item "blah-blah-1". It will throw an error
> right after seeing this unsupported item and won't even go further to
> see the action FLAG. How can the application know whether its request
> for FLAG was heard or not?
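> 
> In code, the failure mode looks like this sketch (GENEVE stands in
> for the hypothetical unsupported item):
> 
>     struct rte_flow_error err;
>     struct rte_flow_attr attr = { .ingress = 1 };
>     struct rte_flow_item pattern[] = {
>         { .type = RTE_FLOW_ITEM_TYPE_GENEVE }, /* unsupported, say */
>         { .type = RTE_FLOW_ITEM_TYPE_END },
>     };
>     struct rte_flow_action actions[] = {
>         { .type = RTE_FLOW_ACTION_TYPE_FLAG },
>         { .type = RTE_FLOW_ACTION_TYPE_END },
>     };
> 
>     /* Fails on the item; the error says nothing about whether
>      * action FLAG by itself would have been accepted. */
>     int rc = rte_flow_validate(port_id, &attr, pattern, actions, &err);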

No, I'm proposing a new function to validate the action alone,
without any match etc.
Example:
	rte_flow_action_request(RTE_FLOW_ACTION_TYPE_MARK)
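
Spelled out, such a helper might look like this (a hypothetical sketch,
not an existing symbol; a real version would presumably also take a
port_id and an rte_flow_error for details):

	int rte_flow_action_request(enum rte_flow_action_type type);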


> And I'd not bind delivery of metadata to the flow API. Consider the
> following example. We have a DPDK application sitting at the *host* and 
> we have a *guest* with its *own* DPDK instance. The guest DPDK has asked 
> the NIC (by virtue of the flow API) to mark all outgoing packets.
> These packets reach the *host* DPDK. Say, the host application just
> wants to see the marked packets from the guest. Its own (the host's)
> use of the flow API is a don't-care here. The host doesn't want to
> mark packets itself; it wants to see packets marked by the guest.

It does not make sense to me. We are talking about a DPDK API.
My concern is to avoid defining new flags
when we already have rte_flow actions.




