[dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API

Dumitrescu, Cristian cristian.dumitrescu at intel.com
Fri Apr 7 19:47:40 CEST 2017


Hi Jerin,

Thanks for your review!

> 
> Thanks Cristian for v3.
> 
> From Cavium HW perspective, v3 is in relatively good shape to consume it,
> except for the below mentioned two pointers.
> 
> 1) We strongly believe the application explicitly needs to give a level id while
> creating the topology (i.e. rte_tm_node_add()). Reasons are:
> 
> - On the capability side we are exposing nr_levels etc.
> - I think _all_ the HW implementations expect to be connected from
>   level-n to level n-1. IMO, it is better to express that in the API.
> - The SW implementations which don't care about the specific level id for
>   the connection can ignore the level id passed by the application.
>   Let the definition be "level" aware.
> 

The current API proposal creates a new node and connects it to its parent in a single step, so when a new node is added, its level is completely determined by its parent's level.

Therefore, specifying the level of the node when it is added is redundant and not needed. My concern is that this requirement can introduce inconsistency into the API, as the user could specify a level ID for the new node that differs from (parent node level ID + 1).

But based on private conversation it looks to me that you guys have a strong opinion on this, so I am taking the action to identify a (nice) way to implement your requirement and do it.
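Just to illustrate the point (a rough sketch, not part of the patch): since the parent is passed to rte_tm_node_add(), the implementation can always derive the level internally, and an explicit level parameter could only be validated against it:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-internal node representation (illustration only). */
struct tm_node {
	uint32_t level;          /* 0 for the root node */
	struct tm_node *parent;
};

/* The level of a new node is (parent level + 1), so an explicit level id
 * passed by the application can only be checked for consistency.
 */
static int
tm_node_level_check(struct tm_node *parent, uint32_t level_id)
{
	uint32_t derived_level = (parent == NULL) ? 0 : parent->level + 1;

	return (level_id == derived_level) ? 0 : -EINVAL;
}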

> 2) There are a lot of capabilities in the TM definition. I don't have a strong
> opinion here as the TM stuff comes in the control path.
> 
> So except for point (1), generally we are fine with v3.
> 

This is good news!

> Detailed comments below,
> 
> > +
> > +#ifndef __INCLUDE_RTE_TM_H__
> > +#define __INCLUDE_RTE_TM_H__
> > +
> > +/**
> > + * @file
> > + * RTE Generic Traffic Manager API
> > + *
> > + * This interface provides the ability to configure the traffic manager in a
> > + * generic way. It includes features such as: hierarchical scheduling,
> > + * traffic shaping, congestion management, packet marking, etc.
> > + */
> 
> Fix missing API documentation doxygen hooks.
> 
> Files: doc/api/doxy-api-index.md and doc/api/doxy-api.conf.
> Ref:
> http://dpdk.org/browse/dpdk/commit/?id=71f2384328651dced05eceee87119a71f0cf16a7
> 

Thanks, will do.

> 
> > +
> > +#include <stdint.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/** Ethernet framing overhead
> > + *
> > + * Overhead fields per Ethernet frame:
> > + * 1. Preamble:                                            7 bytes;
> > + * 2. Start of Frame Delimiter (SFD):                      1 byte;
> > + * 3. Inter-Frame Gap (IFG):                              12 bytes.
> > + */
> > +#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
> 
> This definition is not used anywhere

This is the typical value to be passed to the shaper profile adjust parameter. Will take the action to document this.
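Something along these lines (the struct/field names below are just an assumption for illustration, to be confirmed when I document it):

#include <rte_tm.h>

/* Illustration only: account for preamble + SFD + IFG when shaping
 * Ethernet traffic. The field name pkt_length_adjust is an assumption.
 */
struct rte_tm_shaper_params profile_params = {
	.pkt_length_adjust = RTE_TM_ETH_FRAMING_OVERHEAD, /* 20 bytes/frame */
	/* committed/peak rate and bucket size fields set as needed */
};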

> 
> > +/**
> > + * Color
> > + */
> > +enum rte_tm_color {
> > +	RTE_TM_GREEN = 0, /**< Green */
> 
> Explicit zero assignment is not required.

I think it is needed if we want RTE_TM_COLORS to be equal to the number of colors?
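For example, the intended usage is (sketch only):

#include <rte_tm.h>

/* RTE_TM_COLORS is the number of colors, so it can size per-color tables
 * (e.g. WRED thresholds), with the color value used as the array index.
 */
static void
wred_thresholds_reset(uint32_t min_th[RTE_TM_COLORS],
	uint32_t max_th[RTE_TM_COLORS])
{
	enum rte_tm_color color;

	for (color = RTE_TM_GREEN; color < RTE_TM_COLORS; color++) {
		min_th[color] = 0;
		max_th[color] = 0;
	}
}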

> 
> > +	RTE_TM_YELLOW, /**< Yellow */
> > +	RTE_TM_RED, /**< Red */
> > +	RTE_TM_COLORS /**< Number of colors */
> > +};
> > +
> > +/**
> > + * Node statistics counter type
> > + */
> > +enum rte_tm_stats_type {
> > +	/**< Number of packets scheduled from current node. */
> > +	RTE_TM_STATS_N_PKTS = 1 << 0,
> > +
> > +	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
> > +
> > +	/**< Number of bytes currently waiting in the packet queue of current
> > +	 * leaf node.
> > +	 */
> > +	 */
> > +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
> 
> I think, For bitmask, it is better to add ULL.
> example:
>         RTE_TM_STATS_N_BYTES_QUEUED = 1ULL << 9,

The enum values are uint32_t, not uint64_t, so we cannot use uint64_t constants here; this is why all enums in DPDK use uint32_t values. There might be a way to use uint64_t constants by relying on compiler extensions, but I wanted to keep us out of trouble :).


> > +};
> 
> I think, The above definitions are used as "uint64_t stats_mask" in
> the remaining sections. How about changing to "enum rte_tm_stats_type
> stats_mask" instead of uint64_t stats_mask?
> 

The mask is a collection of flags (multiple enum flags are typically enabled in this mask, e.g. stats_mask = RTE_TM_STATS_N_PKTS | RTE_TM_STATS_N_BYTES | ...), therefore it cannot be of the same type as the enum.

I am taking the action to document that stats_mask is built out of these specific enum flags.
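For example (usage sketch):

/* stats_mask is a bitwise OR of enum rte_tm_stats_type flags. */
uint64_t stats_mask = RTE_TM_STATS_N_PKTS |
	RTE_TM_STATS_N_PKTS_QUEUED |
	RTE_TM_STATS_N_BYTES_QUEUED;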

> > +
> > +/**
> > + * Traffic manager dynamic updates
> > + */
> > +enum rte_tm_dynamic_update_type {
> > +	/**< Dynamic parent node update. The new parent node is located on same
> > +	 * hierarchy level as the former parent node. Consequently, the node
> > +	 * whose parent is changed preserves its hierarchy level.
> > +	 */
> > +	/**< Dynamic update of the set of enabled stats counter types. */
> > +	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
> > +
> > +	/**< Dynamic update of congestion management mode for leaf nodes. */
> > +	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
> > +};
> 
> Same as above comment on enum.

Same as above answer on enum :)

> 
> > +struct rte_tm_level_capabilities {
> 
> IMO, level can be either leaf or nonleaf. If so, following struct makes more
> sense to me
> 
>         int is_leaf;
>         uint32_t n_nodes_max;
>         union {
>                 struct rte_tm_node_capabilities nonleaf;
>                 struct rte_tm_node_capabilities leaf;
>         };
> 

This is how it was done in previous versions, but Hemant rightly observed that leaf nodes typically have different capabilities than non-leaf nodes, hence the current solution.
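As a usage sketch (the query function name rte_tm_level_capabilities_get() is my assumption here, following the naming of the rest of the API), the two summaries are consumed separately:

#include <rte_tm.h>

/* Sketch only: check whether a given hierarchy level can hold leaf nodes. */
static int
level_supports_leaf_nodes(uint8_t port_id, uint32_t level_id)
{
	struct rte_tm_level_capabilities cap;
	struct rte_tm_error error;

	if (rte_tm_level_capabilities_get(port_id, level_id, &cap, &error) != 0)
		return 0;

	/* cap.leaf is only meaningful when n_nodes_leaf_max > 0,
	 * cap.nonleaf only when n_nodes_nonleaf_max > 0.
	 */
	return cap.n_nodes_leaf_max > 0;
}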

> > +	/**< Maximum number of nodes for the current hierarchy level. */
> > +	uint32_t n_nodes_max;
> > +
> > +	/**< Maximum number of non-leaf nodes for the current hierarchy level.
> > +	 * The value of 0 indicates that current level only supports leaf
> > +	 * nodes. The maximum value is *n_nodes_max*.
> > +	 */
> > +	uint32_t n_nodes_nonleaf_max;
> > +
> > +	/**< Maximum number of leaf nodes for the current hierarchy level. The
> > +	 * value of 0 indicates that current level only supports non-leaf
> > +	 * nodes. The maximum value is *n_nodes_max*.
> > +	 */
> > +	uint32_t n_nodes_leaf_max;
> > +
> > +	/**< Summary of node-level capabilities across all the non-leaf nodes
> > +	 * of the current hierarchy level. Valid only when
> > +	 * *n_nodes_nonleaf_max* is greater than 0.
> > +	 */
> > +	struct rte_tm_node_capabilities nonleaf;
> > +
> > +	/**< Summary of node-level capabilities across all the leaf nodes of
> > +	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
> > +	 * greater than 0.
> > +	 */
> > +	struct rte_tm_node_capabilities leaf;
> > +};
> > +
> > +/**
> > + * Traffic manager capabilities
> > + */
> > +struct rte_tm_capabilities {
> > +	/**< Maximum number of nodes. */
> > +	uint32_t n_nodes_max;
> > +
> > +	/**< Set of supported dynamic update operations
> > +	 * (see enum rte_tm_dynamic_update_type).
> > +	 */
> > +	uint64_t dynamic_update_mask;
> 
> IMO, It is better to change as
> enum rte_tm_dynamic_update_type dynamic_update_mask
> instead of
> uint64_t dynamic_update_mask;
> 

Same answer as for the other enum above (mask is not equal to a single enum value, but a set of enum flags; basically each bit of the mask corresponds to a different enum value).

> > +
> > +	/**< Summary of node-level capabilities across all non-leaf nodes. */
> > +	struct rte_tm_node_capabilities nonleaf;
> > +
> > +	/**< Summary of node-level capabilities across all leaf nodes. */
> > +	struct rte_tm_node_capabilities leaf;
> > +};
> > +
> > +/**
> > + * Congestion management (CMAN) mode
> > + *
> > + * This is used for controlling the admission of packets into a packet queue or
> > + * group of packet queues on congestion. On request of writing a new packet
> > + * into the current queue while the queue is full, the *tail drop* algorithm
> > + * drops the new packet while leaving the queue unmodified, as opposed to *head
> > + * drop* algorithm, which drops the packet at the head of the queue (the oldest
> > + * packet waiting in the queue) and admits the new packet at the tail of the
> > + * queue.
> > + *
> > + * The *Random Early Detection (RED)* algorithm works by proactively dropping
> > + * more and more input packets as the queue occupancy builds up. When the queue
> > + * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> > + * RED* algorithm uses a separate set of RED thresholds for each packet color.
> > + */
> > +enum rte_tm_cman_mode {
> > +	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
> 
> explicit zero assignment may not be required

Agree, I don't mind either way, but I see that most of the DPDK library code uses explicit initial assignment?

> 
> > +	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
> > +	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> > +};
> > +
> 
> > +struct rte_tm_node_params {
> > +	/**< Shaper profile for the private shaper. The absence of the private
> > +	 * shaper for the current node is indicated by setting this parameter
> > +	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
> > +	 */
> > +	uint32_t shaper_profile_id;
> > +
> > +	/**< User allocated array of valid shared shaper IDs. */
> > +	uint32_t *shared_shaper_id;
> > +
> > +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> > +	uint32_t n_shared_shapers;
> > +
> > +	/**< Mask of statistics counter types to be enabled for this node. This
> > +	 * needs to be a subset of the statistics counter types available for
> > +	 * the current node. Any statistics counter type not included in this
> > +	 * set is to be disabled for the current node.
> > +	 */
> > +	uint64_t stats_mask;
> 
> How about changing to "enum rte_tm_stats_type" instead of uint64_t ?
> 

Same answer as above ones on enums & masks.

> > +
> > +	union {
> > +		/**< Parameters only valid for non-leaf nodes. */
> > +		struct {
> > +			/**< For each priority, indicates whether the children
> > +			 * nodes sharing the same priority are to be scheduled
> > +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> > +			 * is to be used for all priorities. When non-NULL, it
> > +			 * points to a pre-allocated array of *n_priority*
> > +			 * elements, with a non-zero value element indicating
> > +			 * WFQ and a zero value element for WRR.
> > +			 */
> > +			int *scheduling_mode_per_priority;
> > +
> > +			/**< Number of priorities. */
> > +			uint32_t n_priorities;
> > +		} nonleaf;
> 
> 
> Since we are adding all node "connecting" parameters in rte_tm_node_add(),
> how about adding WFQ vs WRR as a boolean value in rte_tm_node_add() to
> maintain the consistency?

This is not about how the parent node manages this new node as one of its children; it is about how this new node will manage its own future children, hence the reason to put it here.
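To illustrate (sketch only, assuming the union in rte_tm_node_params is anonymous, as in the quote above):

#include <string.h>
#include <rte_tm.h>

/* Sketch only: fill the node parameters for a non-leaf node. The per-priority
 * WFQ/WRR selection describes how this node will schedule its own children.
 */
static void
nonleaf_node_params_fill(struct rte_tm_node_params *p, int *wfq_per_priority,
	uint32_t n_priorities)
{
	memset(p, 0, sizeof(*p));
	p->shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE;
	p->stats_mask = RTE_TM_STATS_N_PKTS;
	p->nonleaf.scheduling_mode_per_priority = wfq_per_priority;
	p->nonleaf.n_priorities = n_priorities;
}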

> 
> How about a new error type in "enum rte_tm_error_type" to specify the
> connection error due to the requested mode (WFQ or WRR) not being supported?
> 

I think we already have it, it is called: RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE.

> > +
> > +/**
> > + * Traffic manager get number of leaf nodes
> > + *
> > + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> > + * Therefore, the set of leaf nodes is predefined, their number is always equal
> > + * to N (where N is the number of TX queues configured for the current port)
> > + * and their IDs are 0 .. (N-1).
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param n_leaf_nodes
> > + *   Number of leaf nodes for the current port.
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_get_leaf_nodes(uint8_t port_id,
> > +	uint32_t *n_leaf_nodes,
> > +	struct rte_tm_error *error);
> 
> In order to keep consistency with rest of the API, IMO, the API
> name can be changed to rte_tm_leaf_nodes_get()
> 

IMO this is not a node API, it is a port-level API, hence the attempt to avoid the rte_tm_node_XYZ() naming.

Maybe a longer but less confusing name is rte_tm_get_number_of_leaf_nodes()?
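Either way, the intended usage stays the same (sketch):

#include <rte_tm.h>

/* Leaf node IDs are predefined as 0 .. N-1, one per TX queue of the port. */
static void
leaf_nodes_walk(uint8_t port_id)
{
	struct rte_tm_error error;
	uint32_t n_leaf_nodes, node_id;

	if (rte_tm_get_leaf_nodes(port_id, &n_leaf_nodes, &error) != 0)
		return;

	for (node_id = 0; node_id < n_leaf_nodes; node_id++) {
		/* node_id is the leaf node on top of TX queue node_id */
	}
}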

> > +
> > +/**
> > + * Traffic manager node type (i.e. leaf or non-leaf) get
> > + *
> > + * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
> > + * the number of TX queues of the current Ethernet port. The non-leaf nodes
> > + * have their IDs generated by the application outside of the above range,
> > + * which is reserved for leaf nodes.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param node_id
> > + *   Node ID value. Needs to be valid.
> > + * @param is_leaf
> 
> Change to "@param[out] is_leaf" to indicate the parameter is output.
> I guess, That scheme is missing in overall header file. It is good to have.
> 

OK, will do for the entire file.

> 
> > + *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_node_type_get(uint8_t port_id,
> > +	uint32_t node_id,
> > +	int *is_leaf,
> > +	struct rte_tm_error *error);
> > +
> > +/**
> > + * Traffic manager capabilities get
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param cap
> 
> missing [out]. See above
> 

OK, will do for the entire file.

> > + *   Traffic manager capabilities. Needs to be pre-allocated and valid.
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_capabilities_get(uint8_t port_id,
> > +	struct rte_tm_capabilities *cap,
> > +	struct rte_tm_error *error);
> > +
> 
> > +int
> > +rte_tm_node_add(uint8_t port_id,
> > +	uint32_t node_id,
> > +	uint32_t parent_node_id,
> > +	uint32_t priority,
> > +	uint32_t weight,
> > +	struct rte_tm_node_params *params,
> > +	struct rte_tm_error *error);
> 
> See the first comment in the beginning of the file.
> 

Noted.

> > +
> > + * Traffic manager node parent update
> > + *
> > + * Restriction for root node: its parent cannot be changed.
> 
> IMO, it is nice to mention the corresponding "enum rte_tm_dynamic_update_type"
> flag for this API support here. Maybe in the common code itself we can check
> that and return an error if the implementation does not meet the capability.
> 
> Applicable to all update APIs
> 

OK, will do for the entire file.
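As a sketch of what such a check could look like (the flag name RTE_TM_UPDATE_NODE_PARENT is my assumption here, since the corresponding enum member is not visible in the quote above):

#include <errno.h>
#include <rte_tm.h>

/* Sketch only: refuse the dynamic update when the corresponding capability
 * flag is not advertised by the implementation.
 */
static int
node_parent_update_checked(uint8_t port_id, uint32_t node_id,
	uint32_t parent_node_id, uint32_t priority, uint32_t weight,
	struct rte_tm_error *error)
{
	struct rte_tm_capabilities cap;
	int ret;

	ret = rte_tm_capabilities_get(port_id, &cap, error);
	if (ret != 0)
		return ret;

	if ((cap.dynamic_update_mask & RTE_TM_UPDATE_NODE_PARENT) == 0)
		return -ENOTSUP;

	return rte_tm_node_parent_update(port_id, node_id, parent_node_id,
		priority, weight, error);
}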

> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param node_id
> > + *   Node ID. Needs to be valid.
> > + * @param parent_node_id
> > + *   Node ID for the new parent. Needs to be valid.
> > + * @param priority
> > + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> > + *   running on the parent of the current node for scheduling this child node.
> > + * @param weight
> > + *   Node weight. The node weight is relative to the weight sum of all siblings
> > + *   that have the same priority. The lowest weight is zero. Used by the
> > + *   WFQ/WRR algorithm running on the parent of the current node for scheduling
> > + *   this child node.
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_node_parent_update(uint8_t port_id,
> > +	uint32_t node_id,
> > +	uint32_t parent_node_id,
> > +	uint32_t priority,
> > +	uint32_t weight,
> > +	struct rte_tm_error *error);
> > +

Regards,
Cristian


