[dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler
Hemant Agrawal
hemant.agrawal at nxp.com
Fri Jan 13 11:36:20 CET 2017
On 11/30/2016 11:46 PM, Cristian Dumitrescu wrote:
> This RFC proposes an ethdev-based abstraction layer for Quality of Service (QoS)
> hierarchical scheduler. The goal of the abstraction layer is to provide a simple
> generic API that is agnostic of the underlying HW, SW or mixed HW-SW complex
> implementation.
>
> Q1: What is the benefit for having an abstraction layer for QoS hierarchical
> layer?
> A1: There is growing interest in the industry for handling various HW-based,
> SW-based or mixed hierarchical scheduler implementations using a unified DPDK
> API.
>
> Q2: Which devices are targeted by this abstraction layer?
> A2: All current and future devices that expose a hierarchical scheduler feature
> under DPDK, including NICs, FPGAs, ASICs, SOCs, SW libraries.
>
> Q3: Which scheduler hierarchies are supported by the API?
> A3: Hopefully any scheduler hierarchy can be described and covered by the
> current API. Of course, functional correctness, accuracy and performance levels
> depend on the specific implementations of this API.
>
> Q4: Why have this abstraction layer into ethdev as opposed to a new type of
> device (e.g. scheddev) similar to ethdev, cryptodev, eventdev, etc?
> A4: Packets are sent to the Ethernet device using the ethdev API
> rte_eth_tx_burst() function, with the hierarchical scheduling taking place
> automatically (i.e. no SW intervention) in HW implementations. Basically, the
> hierarchical scheduler is done as part of packet TX operation.
> The hierarchical scheduler is typically the last stage before packet TX and it
> is tightly integrated with the TX stage. The hierarchical scheduler is just
> another offload feature of the Ethernet device, which needs to be accommodated
> by the ethdev API similar to any other offload feature (such as RSS, DCB,
> flow director, etc).
> Once the decision to schedule a specific packet has been taken, this packet
> cannot be dropped and it has to be sent over the wire as is, otherwise what
> takes place on the wire is not what was planned at scheduling time, so the
> scheduling is not accurate (Note: there are some devices which allow prepending
> headers to the packet after the scheduling stage at the expense of sending
> correction requests back to the scheduler, but this only strengthens the bond
> between scheduling and TX).
>
Egress QoS can be applied to a physical or a logical network device.
At present, network devices are presented as ethdevs in DPDK, and even a
logical device can be presented by creating a new ethdev. So associating
the scheduler with ethdev seems to be a good idea.
> Q5: Given that the packet scheduling takes place automatically for pure HW
> implementations, how does packet scheduling take place for poll-mode SW
> implementations?
> A5: The API provided function rte_sched_run() is designed to take care of this.
> For HW implementations, this function typically does nothing. For SW
> implementations, this function is typically expected to perform dequeue of
> packets from the hierarchical scheduler and their write to Ethernet device TX
> queue, periodic flush of any buffers on enqueue-side into the hierarchical
> scheduler for burst-oriented implementations, etc.
>
I think this is *rte_eth_sched_run* in your APIs.
It will be a no-op for HW. How do you envision its usage in typical
software, e.g. in the l3fwd application? Some options:
- call it every time you do a rte_eth_tx_burst (there may be a locking
concern here)
- create a per-port thread that keeps calling rte_eth_sched_run
- call it in one of the existing polling threads for the port
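For the last option, I am thinking of something along the lines of the sketch below. It is plain C with stand-ins only: fake_tx_burst() and fake_sched_run() are hypothetical placeholders for rte_eth_tx_burst() and the proposed rte_eth_sched_run(), and SCHED_RUN_PERIOD is an invented tuning knob. None of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_eth_tx_burst() and the proposed
 * rte_eth_sched_run(). They only count calls, so that the control
 * flow of the polling loop can be followed. */
struct fake_port {
	uint64_t n_tx_burst_calls;
	uint64_t n_sched_run_calls;
};

static void fake_tx_burst(struct fake_port *p) { p->n_tx_burst_calls++; }
static void fake_sched_run(struct fake_port *p) { p->n_sched_run_calls++; }

/* Assumed tuning knob: drive the SW scheduler once every N iterations. */
#define SCHED_RUN_PERIOD 4

/* A single existing polling thread owns the port. It enqueues on every
 * iteration and runs the scheduler periodically from the same thread,
 * so enqueue and scheduler run never race with each other. */
static void poll_port(struct fake_port *p, uint32_t n_iterations)
{
	uint32_t i;

	assert(p != NULL);
	for (i = 0; i < n_iterations; i++) {
		fake_tx_burst(p);
		if ((i + 1) % SCHED_RUN_PERIOD == 0)
			fake_sched_run(p);
	}
}
```

With this shape, the cost for a HW port is one no-op call every SCHED_RUN_PERIOD iterations, while a SW port gets its dequeue/flush work done without an extra thread.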
> Q6: Which are the scheduling algorithms supported?
> A6: The fundamental scheduling algorithms that are supported are Strict Priority
> (SP) and Weighted Fair Queuing (WFQ). The SP and WFQ algorithms are supported at
> the level of each node of the scheduling hierarchy, regardless of the node
> level/position in the tree. The SP algorithm is used to schedule between sibling
> nodes with different priority, while WFQ is used to schedule between groups of
> siblings that have the same priority.
> Algorithms such as Weighted Round Robin (WRR), byte-level WRR, Deficit WRR
> (DWRR), etc are considered approximations of the ideal WFQ and are therefore
> assimilated to WFQ, although an associated implementation-dependent accuracy,
> performance and resource usage trade-off might exist.
>
> Q7: Which are the supported congestion management algorithms?
> A7: Tail drop, head drop and Weighted Random Early Detection (WRED). They are
> available for every leaf node in the hierarchy, subject to the specific
> implementation supporting them.
>
We may need to introduce some kind of capability API, e.g. NXP HW does
not support head drop.
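As a rough illustration of what I mean, the application could query a mask of supported congestion management modes before building the hierarchy. The struct and function below are hypothetical (they are not in the patch); only the three modes mirror the enum from the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the congestion management modes listed in the RFC. */
enum cman_mode {
	CMAN_TAIL_DROP = 0,
	CMAN_HEAD_DROP,
	CMAN_WRED,
};

/* Hypothetical capability report: one bit per supported cman mode. */
struct sched_cman_caps {
	uint32_t cman_mode_mask;
};

/* Returns 1 when the device reports support for the mode, 0 otherwise. */
static int cman_mode_supported(const struct sched_cman_caps *caps,
	enum cman_mode mode)
{
	assert(caps != NULL);
	return (caps->cman_mode_mask & (1u << mode)) != 0;
}
```

A device without head drop would simply leave that bit clear, and the application can fall back to tail drop instead of discovering the limitation through an error code at node creation time.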
> Q8: Is traffic shaping supported?
> A8: Yes, there are a number of shapers (rate limiters) that can be supported for
> each node in the hierarchy (built-in limit is currently set to 4 per node). Each
> shaper can be private to a node (used only by that node) or shared between
> multiple nodes.
>
What do you mean by supporting 4 shapers per node? If you need more
shapers, then create new hierarchy nodes.
Similarly, if a shaper is to be shared between two nodes, shouldn't it
be in the parent node?
Why do you want to create a shaper hierarchy within a node of the
hierarchical QoS?
> Q9: What is the purpose of having shaper profiles and WRED profiles?
> A9: In most implementations, many shapers typically share the same configuration
> parameters, so defining shaper profiles simplifies the configuration task. Same
> considerations apply to WRED contexts and profiles.
>
Agree
> Q10: How is the scheduling hierarchy defined and created?
> A10: Scheduler hierarchy tree is set up by creating new nodes and connecting
> them to other existing nodes, which thus become parent nodes. The unique ID that
> is assigned to each node when the node is created is further used to update the
> node configuration or to connect children nodes to it. The leaf nodes of the
> scheduler hierarchy are each attached to one of the Ethernet device TX queues.
It may be cleaner to differentiate between a leaf (i.e. a qos_queue) and
a scheduling node.
> Q11: Are on-the-fly changes of the scheduling hierarchy allowed by the API?
> A11: Yes. The actual changes take place subject to the specific implementation
> supporting them, otherwise error code is returned.
What kind of changes do you have in mind here? Creating new nodes/levels?
Reconnecting a node from one parent node to another?
This is more of an implementation capability.
> Q12: What is the typical function call sequence to set up and run the Ethernet
> device scheduler?
> A12: The typical simplified function call sequence is listed below:
> i) Configure the Ethernet device and its TX queues: rte_eth_dev_configure(),
> rte_eth_tx_queue_setup()
> ii) Create WRED profiles and WRED contexts, shaper profiles and shapers:
> rte_eth_sched_wred_profile_add(), rte_eth_sched_wred_context_add(),
> rte_eth_sched_shaper_profile_add(), rte_eth_sched_shaper_add()
> iii) Create the scheduler hierarchy nodes and tree: rte_eth_sched_node_add()
> iv) Freeze the start-up hierarchy and ask the device whether it supports it:
> rte_eth_sched_hierarchy_set()
> v) Start the Ethernet port: rte_eth_dev_start()
> vi) Run-time scheduler hierarchy updates: rte_eth_sched_node_add(),
> rte_eth_sched_node_<attribute>_set()
> vii) Run-time packet enqueue into the hierarchical scheduler: rte_eth_tx_burst()
> viii) Run-time support for SW poll-mode implementations (see previous answer):
> rte_sched_run()
>
> Q13: Which are the possible options for the user when the Ethernet port does not
> support the scheduling hierarchy required by the user?
> A13: The following options are available to the user:
> i) abort
> ii) try out a new hierarchy (e.g. with fewer leaf nodes), if acceptable
> iii) wrap the Ethernet device into a new type of Ethernet device that has a SW
> front-end implementing the hierarchical scheduler (e.g. existing DPDK library
> librte_sched); instantiate the new device type on-the-fly and check if the
> hierarchy requirements can be met by the new device.
>
>
I would like to see some kind of capability APIs upfront:
1. Number of levels supported
2. Per-level capability (the capability of each level may be different)
   - Number of nodes supported at a given level
   - Max number of input nodes supported
   - Type of scheduling algorithms supported (SP, WFQ, etc.)
   - Shaper support - dual rate
   - Congestion control
   - Max priorities
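To make the list above a bit more concrete, here is a minimal sketch of how such a capability report could be laid out and checked. All names (sched_caps, sched_level_caps, sched_level_fits, SCHED_MAX_LEVELS, the SCHED_ALGO_* flags) are invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCHED_MAX_LEVELS 8          /* assumed bound, for the sketch only */

/* Hypothetical scheduling algorithm flags. */
#define SCHED_ALGO_SP  (1u << 0)
#define SCHED_ALGO_WFQ (1u << 1)

/* Hypothetical per-level capabilities, one field per item in the list. */
struct sched_level_caps {
	uint32_t n_nodes_max;       /* nodes supported at this level */
	uint32_t n_inputs_max;      /* input (child) nodes per node */
	uint32_t algo_mask;         /* SCHED_ALGO_* flags */
	int shaper_dual_rate;       /* non-zero when dual rate is supported */
	int cman_supported;         /* non-zero when congestion control exists */
	uint32_t n_priorities_max;  /* max priorities */
};

struct sched_caps {
	uint32_t n_levels;          /* number of levels supported */
	struct sched_level_caps level[SCHED_MAX_LEVELS];
};

/* Returns 1 when n_nodes nodes using the algorithms in algo_mask can be
 * placed at the given level, 0 otherwise. */
static int sched_level_fits(const struct sched_caps *caps, uint32_t level,
	uint32_t n_nodes, uint32_t algo_mask)
{
	const struct sched_level_caps *lc;

	assert(caps != NULL);
	if (level >= caps->n_levels || level >= SCHED_MAX_LEVELS)
		return 0;

	lc = &caps->level[level];
	return n_nodes <= lc->n_nodes_max &&
		(algo_mask & ~lc->algo_mask) == 0;
}
```

An application could run a check like this per level before creating any node, rather than building the whole tree and waiting for the hierarchy set call to fail.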
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu at intel.com>
> ---
> lib/librte_ether/rte_ethdev.h | 794 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 794 insertions(+)
> mode change 100644 => 100755 lib/librte_ether/rte_ethdev.h
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> old mode 100644
> new mode 100755
> index 9678179..d4d8604
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -182,6 +182,8 @@ extern "C" {
> #include <rte_pci.h>
> #include <rte_dev.h>
> #include <rte_devargs.h>
> +#include <rte_meter.h>
> +#include <rte_red.h>
> #include "rte_ether.h"
> #include "rte_eth_ctrl.h"
> #include "rte_dev_info.h"
> @@ -1038,6 +1040,152 @@ TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
> /**< l2 tunnel forwarding mask */
> #define ETH_L2_TUNNEL_FORWARDING_MASK 0x00000008
>
> +/**
> + * Scheduler configuration
> + */
> +
> +/**< Max number of shapers per node */
> +#define RTE_ETH_SCHED_SHAPERS_PER_NODE 4
> +/**< Invalid shaper ID */
> +#define RTE_ETH_SCHED_SHAPER_ID_NONE UINT32_MAX
> +/**< Max number of WRED contexts per node */
> +#define RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE 4
> +/**< Invalid WRED context ID */
> +#define RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE UINT32_MAX
> +/**< Invalid node ID */
> +#define RTE_ETH_SCHED_NODE_NULL UINT32_MAX
> +
> +/**
> + * Congestion management (CMAN) mode
> + *
> + * This is used for controlling the admission of packets into a packet queue or
> + * group of packet queues on congestion. On request of writing a new packet
> + * into the current queue while the queue is full, the *tail drop* algorithm
> + * drops the new packet while leaving the queue unmodified, as opposed to *head
> + * drop* algorithm, which drops the packet at the head of the queue (the oldest
> + * packet waiting in the queue) and admits the new packet at the tail of the
> + * queue.
> + *
> + * The *Random Early Detection (RED)* algorithm works by proactively dropping
> + * more and more input packets as the queue occupancy builds up. When the queue
> + * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> + * RED* algorithm uses a separate set of RED thresholds per packet color.
> + */
> +enum rte_eth_sched_cman_mode {
> + RTE_ETH_SCHED_CMAN_TAIL_DROP = 0, /**< Tail drop */
> + RTE_ETH_SCHED_CMAN_HEAD_DROP, /**< Head drop */
> + RTE_ETH_SCHED_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> +};
> +
You may also need a parameter indicating whether the cman is byte-based
or frame-based.
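The difference matters for the admission decision itself. A small sketch, assuming a hypothetical cman_unit selector and a simplified queue state (neither is in the patch), using tail drop as the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for how queue occupancy is measured. */
enum cman_unit {
	CMAN_UNIT_FRAMES = 0,
	CMAN_UNIT_BYTES,
};

/* Simplified queue state, for illustration only. */
struct queue_state {
	enum cman_unit unit;
	uint64_t limit;             /* threshold, in frames or in bytes */
	uint64_t n_pkts_queued;
	uint64_t n_bytes_queued;
};

/* Tail drop admission. Returns 1 and updates the occupancy counters when
 * the packet is admitted, 0 when it has to be dropped. */
static int tail_drop_admit(struct queue_state *q, uint32_t pkt_len)
{
	uint64_t occupancy, cost;

	assert(q != NULL);
	if (q->unit == CMAN_UNIT_BYTES) {
		occupancy = q->n_bytes_queued;
		cost = pkt_len;
	} else {
		occupancy = q->n_pkts_queued;
		cost = 1;
	}

	if (occupancy + cost > q->limit)
		return 0;

	q->n_pkts_queued++;
	q->n_bytes_queued += pkt_len;
	return 1;
}
```

With a frame-based limit, a 64-byte and a 1500-byte packet consume the same share of the queue; with a byte-based limit they do not, so the same threshold value behaves very differently depending on the unit.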
> +/**
> + * WRED profile
> + */
> +struct rte_eth_sched_wred_params {
> + /**< One set of RED parameters per packet color */
> + struct rte_red_params red_params[e_RTE_METER_COLORS];
> +};
> +
> +/**
> + * Shaper (rate limiter) profile
> + *
> + * Multiple shaper instances can share the same shaper profile. Each node can
> + * have multiple shapers enabled (up to RTE_ETH_SCHED_SHAPERS_PER_NODE). Each
> + * shaper can be private to a node (only one node using it) or shared (multiple
> + * nodes use the same shaper instance).
> + */
> +struct rte_eth_sched_shaper_params {
> + uint64_t rate; /**< Token bucket rate (bytes per second) */
> + uint64_t size; /**< Token bucket size (bytes) */
> +};
> +
A dual-rate shaper could be supported here.
I guess by size you mean the max burst size?
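For example, the profile could carry a committed and a peak token bucket. The sketch below is only an assumption of how that might look: the struct, its field names and the convention that peak_rate == 0 means "single rate" are all invented here and are not in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dual-rate shaper profile. */
struct shaper_dual_rate_params {
	uint64_t committed_rate;    /* committed rate (bytes per second) */
	uint64_t committed_burst;   /* committed burst size (bytes) */
	uint64_t peak_rate;         /* peak rate (bytes per second), 0 = unused */
	uint64_t peak_burst;        /* peak burst size (bytes) */
};

/* Basic sanity check. Returns 0 when the profile is usable, -1 otherwise.
 * Assumed convention: peak_rate == 0 selects single-rate operation. */
static int shaper_params_check(const struct shaper_dual_rate_params *p)
{
	assert(p != NULL);
	if (p->committed_rate == 0 || p->committed_burst == 0)
		return -1;
	if (p->peak_rate == 0)
		return 0;           /* single rate, peak bucket ignored */
	if (p->peak_rate < p->committed_rate || p->peak_burst == 0)
		return -1;
	return 0;
}
```

A device that only supports single-rate shaping would reject any profile with a non-zero peak_rate, which again ties back to the capability question.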
> +/**
> + * Node parameters
> + *
> + * Each scheduler hierarchy node has multiple inputs (children nodes of the
> + * current parent node) and a single output (which is input to its parent
> + * node). The current node arbitrates its inputs using Strict Priority (SP)
> + * and Weighted Fair Queuing (WFQ) algorithms to schedule input packets on its
> + * output while observing its shaping/rate limiting constraints. Algorithms
> + * such as Weighted Round Robin (WRR), byte-level WRR, Deficit WRR (DWRR), etc
> + * are considered approximations of the ideal WFQ and are assimilated to WFQ,
> + * although an associated implementation-dependent trade-off on accuracy,
> + * performance and resource usage might exist.
> + *
> + * Children nodes with different priorities are scheduled using the SP
> + * algorithm, based on their priority, with zero (0) as the highest priority.
> + * Children with same priority are scheduled using the WFQ algorithm, based on
> + * their weight, which is relative to the sum of the weights of all siblings
> + * with same priority, with one (1) as the lowest weight.
> + */
> +struct rte_eth_sched_node_params {
> + /**< Child node priority (used by SP). The highest priority is zero. */
> + uint32_t priority;
> + /**< Child node weight (used by WFQ), relative to sum of weights of all
> + siblings with same priority. The lowest weight is one. */
> + uint32_t weight;
> + /**< Set of shaper instances enabled for current node. Each node shaper
> + can be disabled by setting it to RTE_ETH_SCHED_SHAPER_ID_NONE. */
> + uint32_t shaper_id[RTE_ETH_SCHED_SHAPERS_PER_NODE];
> + /**< Set to zero if current node is not a hierarchy leaf node, set to a
> + non-zero value otherwise. A leaf node is a hierarchy node that does
> + not have any children. A leaf node has to be connected to a valid
> + packet queue. */
> + int is_leaf;
> + /**< Parameters valid for leaf nodes only */
> + struct {
> + /**< Packet queue ID */
> + uint64_t queue_id;
> + /**< Congestion management mode */
> + enum rte_eth_sched_cman_mode cman;
> + /**< Set of WRED contexts enabled for current leaf node. Each
> + leaf node WRED context can be disabled by setting it to
> + RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE. Only valid when
> + congestion management for current leaf node is set to WRED. */
> + uint32_t wred_context_id[RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE];
> + } leaf;
> +};
> +
It would be better to separate the leaf (i.e. a qos_queue) from a sched
node; it will simplify things.
e.g.
struct rte_eth_sched_qos_queue {
	/**< Child node priority (used by SP). The highest priority is zero. */
	uint32_t priority;
	/**< Child node weight (used by WFQ), relative to the sum of the weights
	of all siblings with same priority. The lowest weight is one. */
	uint32_t weight;
	/**< Packet queue ID */
	uint64_t queue_id;
	/**< Congestion management params */
	enum rte_eth_sched_cman_mode cman;
};
struct rte_eth_sched_node_params {
	/**< Child node priority (used by SP). The highest priority is zero. */
	uint32_t priority;
	/**< Child node weight (used by WFQ), relative to the sum of the weights
	of all siblings with same priority. The lowest weight is one. */
	uint32_t weight;
	/**< Set of shaper instances enabled for current node. Each node shaper
	can be disabled by setting it to RTE_ETH_SCHED_SHAPER_ID_NONE. */
	uint32_t shaper_id;
	/**< WRED contexts enabled for current leaf node. Each leaf node WRED
	context can be disabled by setting it to
	RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE. Only valid when
	congestion management for current leaf node is set to WRED. */
	uint32_t wred_context_id;
};
One or more sched_qos_queue instances will be connected to a schedule node.
> +/**
> + * Node statistics counter type
> + */
> +enum rte_eth_sched_stats_counter {
> + /**< Number of packets scheduled from current node. */
> + RTE_ETH_SCHED_STATS_COUNTER_N_PKTS = 1 << 0,
> + /**< Number of bytes scheduled from current node. */
> + RTE_ETH_SCHED_STATS_COUNTER_N_BYTES = 1 << 1,
> + RTE_ETH_SCHED_STATS_COUNTER_N_PKTS_DROPPED = 1 << 2,
> + RTE_ETH_SCHED_STATS_COUNTER_N_BYTES_DROPPED = 1 << 3,
> + /**< Number of packets currently waiting in the packet queue of current
> + leaf node. */
> + RTE_ETH_SCHED_STATS_COUNTER_N_PKTS_QUEUED = 1 << 4,
> + /**< Number of bytes currently waiting in the packet queue of current
> + leaf node. */
> + RTE_ETH_SCHED_STATS_COUNTER_N_BYTES_QUEUED = 1 << 5,
> +};
> +
> +/**
> + * Node statistics counters
> + */
> +struct rte_eth_sched_node_stats {
> + /**< Number of packets scheduled from current node. */
> + uint64_t n_pkts;
> + /**< Number of bytes scheduled from current node. */
> + uint64_t n_bytes;
> + /**< Statistics counters for leaf nodes only */
> + struct {
> + /**< Number of packets dropped by current leaf node. */
> + uint64_t n_pkts_dropped;
> + /**< Number of bytes dropped by current leaf node. */
> + uint64_t n_bytes_dropped;
> + /**< Number of packets currently waiting in the packet queue of
> + current leaf node. */
> + uint64_t n_pkts_queued;
> + /**< Number of bytes currently waiting in the packet queue of
> + current leaf node. */
> + uint64_t n_bytes_queued;
> + } leaf;
> +};
> +
> /*
> * Definitions of all functions exported by an Ethernet driver through the
> * the generic structure of type *eth_dev_ops* supplied in the *rte_eth_dev*
> @@ -1421,6 +1569,120 @@ typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
> struct rte_eth_dcb_info *dcb_info);
> /**< @internal Get dcb information on an Ethernet device */
>
> +typedef int (*eth_sched_wred_profile_add_t)(struct rte_eth_dev *dev,
> + uint32_t wred_profile_id,
> + struct rte_eth_sched_wred_params *profile);
> +/**< @internal Scheduler WRED profile add */
> +
> +typedef int (*eth_sched_wred_profile_delete_t)(struct rte_eth_dev *dev,
> + uint32_t wred_profile_id);
> +/**< @internal Scheduler WRED profile delete */
> +
> +typedef int (*eth_sched_wred_context_add_t)(struct rte_eth_dev *dev,
> + uint32_t wred_context_id,
> + uint32_t wred_profile_id);
> +/**< @internal Scheduler WRED context add */
> +
> +typedef int (*eth_sched_wred_context_delete_t)(struct rte_eth_dev *dev,
> + uint32_t wred_context_id);
> +/**< @internal Scheduler WRED context delete */
> +
> +typedef int (*eth_sched_shaper_profile_add_t)(struct rte_eth_dev *dev,
> + uint32_t shaper_profile_id,
> + struct rte_eth_sched_shaper_params *profile);
> +/**< @internal Scheduler shaper profile add */
> +
> +typedef int (*eth_sched_shaper_profile_delete_t)(struct rte_eth_dev *dev,
> + uint32_t shaper_profile_id);
> +/**< @internal Scheduler shaper profile delete */
> +
> +typedef int (*eth_sched_shaper_add_t)(struct rte_eth_dev *dev,
> + uint32_t shaper_id,
> + uint32_t shaper_profile_id);
> +/**< @internal Scheduler shaper instance add */
> +
> +typedef int (*eth_sched_shaper_delete_t)(struct rte_eth_dev *dev,
> + uint32_t shaper_id);
> +/**< @internal Scheduler shaper instance delete */
> +
> +typedef int (*eth_sched_node_add_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint32_t parent_node_id,
> + struct rte_eth_sched_node_params *params);
> +/**< @internal Scheduler node add */
> +
> +typedef int (*eth_sched_node_delete_t)(struct rte_eth_dev *dev,
> + uint32_t node_id);
> +/**< @internal Scheduler node delete */
> +
> +typedef int (*eth_sched_hierarchy_set_t)(struct rte_eth_dev *dev,
> + int clear_on_fail);
> +/**< @internal Scheduler hierarchy set */
> +
> +typedef int (*eth_sched_node_priority_set_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint32_t priority);
> +/**< @internal Scheduler node priority set */
> +
> +typedef int (*eth_sched_node_weight_set_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint32_t weight);
> +/**< @internal Scheduler node weight set */
> +
> +typedef int (*eth_sched_node_shaper_set_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint32_t shaper_pos,
> + uint32_t shaper_id);
> +/**< @internal Scheduler node shaper set */
> +
> +typedef int (*eth_sched_node_queue_set_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint32_t queue_id);
> +/**< @internal Scheduler node queue set */
> +
> +typedef int (*eth_sched_node_cman_set_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + enum rte_eth_sched_cman_mode cman);
> +/**< @internal Scheduler node congestion management mode set */
> +
> +typedef int (*eth_sched_node_wred_context_set_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint32_t wred_context_pos,
> + uint32_t wred_context_id);
> +/**< @internal Scheduler node WRED context set */
> +
> +typedef int (*eth_sched_stats_get_enabled_t)(struct rte_eth_dev *dev,
> + uint64_t *nonleaf_node_capability_stats_mask,
> + uint64_t *nonleaf_node_enabled_stats_mask,
> + uint64_t *leaf_node_capability_stats_mask,
> + uint64_t *leaf_node_enabled_stats_mask);
> +/**< @internal Scheduler get set of stats counters enabled for all nodes */
> +
> +typedef int (*eth_sched_stats_enable_t)(struct rte_eth_dev *dev,
> + uint64_t nonleaf_node_enabled_stats_mask,
> + uint64_t leaf_node_enabled_stats_mask);
> +/**< @internal Scheduler enable selected stats counters for all nodes */
> +
> +typedef int (*eth_sched_node_stats_get_enabled_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint64_t *capability_stats_mask,
> + uint64_t *enabled_stats_mask);
> +/**< @internal Scheduler get set of stats counters enabled for specific node */
> +
> +typedef int (*eth_sched_node_stats_enable_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + uint64_t enabled_stats_mask);
> +/**< @internal Scheduler enable selected stats counters for specific node */
> +
> +typedef int (*eth_sched_node_stats_read_t)(struct rte_eth_dev *dev,
> + uint32_t node_id,
> + struct rte_eth_sched_node_stats *stats,
> + int clear);
> +/**< @internal Scheduler read stats counters for specific node */
> +
> +typedef int (*eth_sched_run_t)(struct rte_eth_dev *dev);
> +/**< @internal Scheduler run */
> +
> /**
> * @internal A structure containing the functions exported by an Ethernet driver.
> */
> @@ -1547,6 +1809,53 @@ struct eth_dev_ops {
> eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
> /** Enable/disable l2 tunnel offload functions */
> eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
> +
> + /** Scheduler WRED profile add */
> + eth_sched_wred_profile_add_t sched_wred_profile_add;
> + /** Scheduler WRED profile delete */
> + eth_sched_wred_profile_delete_t sched_wred_profile_delete;
> + /** Scheduler WRED context add */
> + eth_sched_wred_context_add_t sched_wred_context_add;
> + /** Scheduler WRED context delete */
> + eth_sched_wred_context_delete_t sched_wred_context_delete;
> + /** Scheduler shaper profile add */
> + eth_sched_shaper_profile_add_t sched_shaper_profile_add;
> + /** Scheduler shaper profile delete */
> + eth_sched_shaper_profile_delete_t sched_shaper_profile_delete;
> + /** Scheduler shaper instance add */
> + eth_sched_shaper_add_t sched_shaper_add;
> + /** Scheduler shaper instance delete */
> + eth_sched_shaper_delete_t sched_shaper_delete;
> + /** Scheduler node add */
> + eth_sched_node_add_t sched_node_add;
> + /** Scheduler node delete */
> + eth_sched_node_delete_t sched_node_delete;
> + /** Scheduler hierarchy set */
> + eth_sched_hierarchy_set_t sched_hierarchy_set;
> + /** Scheduler node priority set */
> + eth_sched_node_priority_set_t sched_node_priority_set;
> + /** Scheduler node weight set */
> + eth_sched_node_weight_set_t sched_node_weight_set;
> + /** Scheduler node shaper set */
> + eth_sched_node_shaper_set_t sched_node_shaper_set;
> + /** Scheduler node queue set */
> + eth_sched_node_queue_set_t sched_node_queue_set;
> + /** Scheduler node congestion management mode set */
> + eth_sched_node_cman_set_t sched_node_cman_set;
> + /** Scheduler node WRED context set */
> + eth_sched_node_wred_context_set_t sched_node_wred_context_set;
> + /** Scheduler get statistics counter type enabled for all nodes */
> + eth_sched_stats_get_enabled_t sched_stats_get_enabled;
> + /** Scheduler enable selected statistics counters for all nodes */
> + eth_sched_stats_enable_t sched_stats_enable;
> + /** Scheduler get statistics counter type enabled for current node */
> + eth_sched_node_stats_get_enabled_t sched_node_stats_get_enabled;
> + /** Scheduler enable selected statistics counters for current node */
> + eth_sched_node_stats_enable_t sched_node_stats_enable;
> + /** Scheduler read statistics counters for current node */
> + eth_sched_node_stats_read_t sched_node_stats_read;
> + /** Scheduler run */
> + eth_sched_run_t sched_run;
> };
>
> /**
> @@ -4336,6 +4645,491 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
> uint8_t en);
>
> /**
> + * Scheduler WRED profile add
> + *
> + * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
> + * is used to create one or several WRED contexts.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + * WRED profile ID for the new profile. Needs to be unused.
> + * @param profile
> + * WRED profile parameters. Needs to be pre-allocated and valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_wred_profile_add(uint8_t port_id,
> + uint32_t wred_profile_id,
> + struct rte_eth_sched_wred_params *profile);
> +
> +/**
> + * Scheduler WRED profile delete
> + *
> + * Delete an existing WRED profile. This operation fails when there is currently
> + * at least one user (i.e. WRED context) of this WRED profile.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + * WRED profile ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_wred_profile_delete(uint8_t port_id,
> + uint32_t wred_profile_id);
> +
> +/**
> + * Scheduler WRED context add or update
> + *
> + * When *wred_context_id* is invalid, a new WRED context with this ID is created
> + * by using the WRED profile identified by *wred_profile_id*.
> + *
> + * When *wred_context_id* is valid, this WRED context is no longer using the
> + * profile previously assigned to it and is updated to use the profile
> + * identified by *wred_profile_id*.
> + *
> + * A valid WRED context is assigned to one or several scheduler hierarchy leaf
> + * nodes configured to use WRED as the congestion management mode.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param wred_context_id
> + * WRED context ID
> + * @param wred_profile_id
> + * WRED profile ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_wred_context_add(uint8_t port_id,
> + uint32_t wred_context_id,
> + uint32_t wred_profile_id);
> +
> +/**
> + * Scheduler WRED context delete
> + *
> + * Delete an existing WRED context. This operation fails when there is currently
> + * at least one user (i.e. scheduler hierarchy leaf node) of this WRED context.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param wred_context_id
> + * WRED context ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_wred_context_delete(uint8_t port_id,
> + uint32_t wred_context_id);
> +
> +/**
> + * Scheduler shaper profile add
> + *
> + * Create a new shaper profile with ID set to *shaper_profile_id*. The new
> + * shaper profile is used to create one or several shapers.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + * Shaper profile ID for the new profile. Needs to be unused.
> + * @param profile
> + * Shaper profile parameters. Needs to be pre-allocated and valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_shaper_profile_add(uint8_t port_id,
> + uint32_t shaper_profile_id,
> + struct rte_eth_sched_shaper_params *profile);
> +
> +/**
> + * Scheduler shaper profile delete
> + *
> + * Delete an existing shaper profile. This operation fails when there is
> + * currently at least one user (i.e. shaper) of this shaper profile.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + * Shaper profile ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +/* no users (shapers) using this profile */
> +int rte_eth_sched_shaper_profile_delete(uint8_t port_id,
> + uint32_t shaper_profile_id);
> +
> +/**
> + * Scheduler shaper add or update
> + *
> + * When *shaper_id* is not a valid shaper ID, a new shaper with this ID is
> + * created using the shaper profile identified by *shaper_profile_id*.
> + *
> + * When *shaper_id* is a valid shaper ID, this shaper is no longer using the
> + * shaper profile previously assigned to it and is updated to use the shaper
> + * profile identified by *shaper_profile_id*.
> + *
> + * A valid shaper is assigned to one or several scheduler hierarchy nodes.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param shaper_id
> + * Shaper ID
> + * @param shaper_profile_id
> + * Shaper profile ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_shaper_add(uint8_t port_id,
> + uint32_t shaper_id,
> + uint32_t shaper_profile_id);
> +
> +/**
> + * Scheduler shaper delete
> + *
> + * Delete an existing shaper. This operation fails when there is currently at
> + * least one user (i.e. scheduler hierarchy node) of this shaper.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param shaper_id
> + * Shaper ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_shaper_delete(uint8_t port_id,
> + uint32_t shaper_id);
> +
> +/**
> + * Scheduler node add or remap
> + *
> + * When *node_id* is not a valid node ID, a new node with this ID is created and
> + * connected as child to the existing node identified by *parent_node_id*.
> + *
> + * When *node_id* is a valid node ID, this node is disconnected from its current
> + * parent and connected as child to another existing node identified by
> + * *parent_node_id*.
> + *
> + * This function can be called during port initialization phase (before the
> + * Ethernet port is started) for building the scheduler start-up hierarchy.
> + * Subject to the specific Ethernet port supporting on-the-fly scheduler
> + * hierarchy updates, this function can also be called during run-time (after
> + * the Ethernet port is started).
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID
> + * @param parent_node_id
> + * Parent node ID. Needs to be valid.
> + * @param params
> + * Node parameters. Needs to be pre-allocated and valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_add(uint8_t port_id,
> + uint32_t node_id,
> + uint32_t parent_node_id,
> + struct rte_eth_sched_node_params *params);
> +
> +/**
> + * Scheduler node delete
> + *
> + * Delete an existing node. This operation fails when this node currently has at
> + * least one user (i.e. child node).
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_delete(uint8_t port_id,
> + uint32_t node_id);
> +
> +/**
> + * Scheduler hierarchy set
> + *
> + * This function is called during the port initialization phase (before the
> + * Ethernet port is started) to freeze the scheduler start-up hierarchy.
> + *
> + * This function fails when the currently configured scheduler hierarchy is not
> + * supported by the Ethernet port, in which case the user can abort or try out
> + * another hierarchy configuration (e.g. a hierarchy with fewer leaf nodes),
> + * which can be built from scratch (when *clear_on_fail* is enabled) or by
> + * modifying the existing hierarchy configuration (when *clear_on_fail* is
> + * disabled).
> + *
> + * Note that, even when the configured scheduler hierarchy is supported (so this
> + * function is successful), the Ethernet port start might still fail, e.g. due
> + * to not enough memory being available in the system.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param clear_on_fail
> + * On function call failure, hierarchy is cleared when this parameter is
> + * non-zero and preserved when this parameter is equal to zero.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_hierarchy_set(uint8_t port_id,
> + int clear_on_fail);
> +
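The retry flow described for hierarchy set could look roughly like the sketch below. Everything here is a stand-in: struct mock_port, the leaf limit and mock_hierarchy_set() are hypothetical and only mimic the documented behaviour (freeze on success, clear or preserve the configuration on failure).

```c
#include <errno.h>
#include <stdint.h>

#define MOCK_MAX_LEAVES_SUPPORTED 4 /* hypothetical device limit */

/* Hypothetical port state: only the leaf count matters for this sketch. */
struct mock_port {
	uint32_t n_leaves;
	int frozen;
};

/* Stand-in for the hierarchy set call: freeze on success, else report failure. */
static int
mock_hierarchy_set(struct mock_port *p, int clear_on_fail)
{
	if (p->n_leaves > 0 && p->n_leaves <= MOCK_MAX_LEAVES_SUPPORTED) {
		p->frozen = 1;
		return 0;
	}

	if (clear_on_fail)
		p->n_leaves = 0; /* hierarchy is dropped, caller rebuilds from scratch */

	return -ENOTSUP;
}

/*
 * Ask for want_leaves leaf nodes and, on failure, keep the existing
 * configuration (clear_on_fail == 0) and retry with one leaf fewer.
 */
static int
mock_freeze_with_fallback(struct mock_port *p, uint32_t want_leaves)
{
	p->n_leaves = want_leaves;

	while (p->n_leaves > 0) {
		if (mock_hierarchy_set(p, 0) == 0)
			return 0;
		p->n_leaves--;
	}

	return -ENOTSUP;
}
```

With clear_on_fail set to non-zero the application would instead rebuild the whole hierarchy before the next attempt, which is simpler but repeats all the node add calls.
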
> +/**
> + * Scheduler node priority set
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param priority
> + * Node priority. The highest node priority is zero. Used by the SP algorithm
> + * running on the parent of the current node for scheduling this child node.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_priority_set(uint8_t port_id,
> + uint32_t node_id,
> + uint32_t priority);
> +
> +/**
> + * Scheduler node weight set
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param weight
> + * Node weight. The node weight is relative to the weight sum of all siblings
> + * that have the same priority. The lowest weight is zero. Used by the WFQ
> + * algorithm running on the parent of the current node for scheduling this
> + * child node.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_weight_set(uint8_t port_id,
> + uint32_t node_id,
> + uint32_t weight);
> +
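A short sketch of how the priority and weight parameters interact on the parent node, assuming the usual reading of SP and WFQ: the sibling with the lowest priority value is served first, and siblings of equal priority share bandwidth in proportion to their weights. The mock_* names and the per-mille unit are made up for the example, and real implementations will differ in accuracy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one child as seen by its parent node. */
struct mock_child {
	uint32_t priority; /* 0 is the highest priority */
	uint32_t weight;   /* relative to same-priority siblings */
};

/* SP: index of the child with the highest priority, or -1 when n is 0. */
static int
mock_sp_pick(const struct mock_child *c, size_t n)
{
	size_t i;
	int best = -1;

	for (i = 0; i < n; i++)
		if (best < 0 || c[i].priority < c[best].priority)
			best = (int)i;

	return best;
}

/* WFQ: share of child idx among its same-priority siblings, in 1/1000 units. */
static uint32_t
mock_wfq_share_permille(const struct mock_child *c, size_t n, size_t idx)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].priority == c[idx].priority)
			sum += c[i].weight;

	/* All weights in this priority group are zero: nothing to divide. */
	if (sum == 0)
		return 0;

	return (uint32_t)((uint64_t)c[idx].weight * 1000 / sum);
}
```

For two priority-0 siblings with weights 1 and 3, the shares come out as 250 and 750 per mille; a priority-1 sibling is not part of that sum at all.
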
> +/**
> + * Scheduler node shaper set
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param shaper_pos
> + * Position in the shaper array of the current node
> + * (0 .. RTE_ETH_SCHED_SHAPERS_PER_NODE-1).
> + * @param shaper_id
> + * Shaper ID. Needs to be either a valid shaper ID or set to
> + * RTE_ETH_SCHED_SHAPER_ID_NONE in order to invalidate the shaper on position
> + * *shaper_pos* within the current node.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_shaper_set(uint8_t port_id,
> + uint32_t node_id,
> + uint32_t shaper_pos,
> + uint32_t shaper_id);
> +
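The per-node shaper array and the "none" ID can be pictured as below. The constants stand in for RTE_ETH_SCHED_SHAPERS_PER_NODE and RTE_ETH_SCHED_SHAPER_ID_NONE, whose values are not visible in this part of the patch, so the numbers and the mock_* names are assumptions. The WRED context array would presumably follow the same pattern.

```c
#include <errno.h>
#include <stdint.h>

#define MOCK_SHAPERS_PER_NODE 2          /* assumed value */
#define MOCK_SHAPER_ID_NONE   UINT32_MAX /* assumed value */
#define MOCK_MAX_SHAPERS      16         /* hypothetical number of shapers */

/* Hypothetical node: one shaper ID per position in the shaper array. */
struct mock_node {
	uint32_t shaper_id[MOCK_SHAPERS_PER_NODE];
};

/* A fresh node has no shaper attached on any position. */
static void
mock_node_init(struct mock_node *n)
{
	uint32_t i;

	for (i = 0; i < MOCK_SHAPERS_PER_NODE; i++)
		n->shaper_id[i] = MOCK_SHAPER_ID_NONE;
}

/* Attach shaper_id on position shaper_pos, or detach it with the NONE ID. */
static int
mock_node_shaper_set(struct mock_node *n, uint32_t shaper_pos,
		     uint32_t shaper_id)
{
	if (shaper_pos >= MOCK_SHAPERS_PER_NODE)
		return -EINVAL;

	if (shaper_id != MOCK_SHAPER_ID_NONE && shaper_id >= MOCK_MAX_SHAPERS)
		return -EINVAL;

	n->shaper_id[shaper_pos] = shaper_id;
	return 0;
}
```
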
> +/**
> + * Scheduler node queue set
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param queue_id
> + * Queue ID. Needs to be valid.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_queue_set(uint8_t port_id,
> + uint32_t node_id,
> + uint32_t queue_id);
> +
> +/**
> + * Scheduler node congestion management mode set
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be a valid leaf node ID.
> + * @param cman
> + * Congestion management mode.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_cman_set(uint8_t port_id,
> + uint32_t node_id,
> + enum rte_eth_sched_cman_mode cman);
> +
> +/**
> + * Scheduler node WRED context set
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be a valid leaf node ID that has WRED selected as the
> + * congestion management mode.
> + * @param wred_context_pos
> + * Position in the WRED context array of the current leaf node
> + * (0 .. RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE-1).
> + * @param wred_context_id
> + * WRED context ID. Needs to be either a valid WRED context ID or set to
> + * RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE in order to invalidate the WRED context
> + * on position *wred_context_pos* within the current leaf node.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_wred_context_set(uint8_t port_id,
> + uint32_t node_id,
> + uint32_t wred_context_pos,
> + uint32_t wred_context_id);
> +
> +/**
> + * Scheduler get statistics counter types enabled for all nodes
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param nonleaf_node_capability_stats_mask
> + * Statistics counter types available per node for all non-leaf nodes. Needs
> + * to be pre-allocated.
> + * @param nonleaf_node_enabled_stats_mask
> + * Statistics counter types currently enabled per node for each non-leaf node.
> + * This is a subset of *nonleaf_node_capability_stats_mask*. Needs to be
> + * pre-allocated.
> + * @param leaf_node_capability_stats_mask
> + * Statistics counter types available per node for all leaf nodes. Needs to
> + * be pre-allocated.
> + * @param leaf_node_enabled_stats_mask
> + * Statistics counter types currently enabled for each leaf node. This is
> + * a subset of *leaf_node_capability_stats_mask*. Needs to be pre-allocated.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_stats_get_enabled(uint8_t port_id,
> + uint64_t *nonleaf_node_capability_stats_mask,
> + uint64_t *nonleaf_node_enabled_stats_mask,
> + uint64_t *leaf_node_capability_stats_mask,
> + uint64_t *leaf_node_enabled_stats_mask);
> +
> +/**
> + * Scheduler enable selected statistics counters for all nodes
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param nonleaf_node_enabled_stats_mask
> + * Statistics counter types to be enabled per node for each non-leaf node.
> + * This needs to be a subset of the statistics counter types available per
> + * node for all non-leaf nodes. Any statistics counter type not included in
> + * this set is to be disabled for all non-leaf nodes.
> + * @param leaf_node_enabled_stats_mask
> + * Statistics counter types to be enabled per node for each leaf node. This
> + * needs to be a subset of the statistics counter types available per node for
> + * all leaf nodes. Any statistics counter type not included in this set is to
> + * be disabled for all leaf nodes.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_stats_enable(uint8_t port_id,
> + uint64_t nonleaf_node_enabled_stats_mask,
> + uint64_t leaf_node_enabled_stats_mask);
> +
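The "subset of the available counter types" rule for the stats masks boils down to a single bit test. A minimal sketch, with made-up counter bits (the real counter type flags are defined elsewhere in the patch):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical counter type bits, for illustration only. */
#define MOCK_STATS_PKTS     (1ULL << 0)
#define MOCK_STATS_BYTES    (1ULL << 1)
#define MOCK_STATS_DROPPED  (1ULL << 2)

/*
 * Enable exactly the requested counter types. The request must be a subset
 * of the capability mask; every type left out of it ends up disabled.
 */
static int
mock_stats_enable(uint64_t capability_mask, uint64_t requested_mask,
		  uint64_t *enabled_mask)
{
	/* Any requested bit outside the capability mask is an error. */
	if (requested_mask & ~capability_mask)
		return -EINVAL;

	*enabled_mask = requested_mask;
	return 0;
}
```

An application would typically read the capability mask first, AND it with the counters it cares about, and pass the result back as the enable mask.
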
> +/**
> + * Scheduler get statistics counter types enabled for current node
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param capability_stats_mask
> + * Statistics counter types available for the current node. Needs to be
> + * pre-allocated.
> + * @param enabled_stats_mask
> + * Statistics counter types currently enabled for the current node. This is
> + * a subset of *capability_stats_mask*. Needs to be pre-allocated.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_stats_get_enabled(uint8_t port_id,
> + uint32_t node_id,
> + uint64_t *capability_stats_mask,
> + uint64_t *enabled_stats_mask);
> +
> +/**
> + * Scheduler enable selected statistics counters for current node
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param enabled_stats_mask
> + * Statistics counter types to be enabled for the current node. This needs to
> + * be a subset of the statistics counter types available for the current node.
> + * Any statistics counter type not included in this set is to be disabled for
> + * the current node.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_stats_enable(uint8_t port_id,
> + uint32_t node_id,
> + uint64_t enabled_stats_mask);
> +
> +/**
> + * Scheduler node statistics counters read
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param node_id
> + * Node ID. Needs to be valid.
> + * @param stats
> + * When non-NULL, it contains the current value for the statistics counters
> + * enabled for the current node.
> + * @param clear
> + * When this parameter has a non-zero value, the statistics counters are
> + * cleared (i.e. set to zero) immediately after they have been read, otherwise
> + * the statistics counters are left untouched.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +int rte_eth_sched_node_stats_read(uint8_t port_id,
> + uint32_t node_id,
> + struct rte_eth_sched_node_stats *stats,
> + int clear);
> +
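The read-with-clear behaviour of the stats read call can be sketched as follows. The struct layout and field names are hypothetical (the real struct rte_eth_sched_node_stats is defined elsewhere in the patch), and a real driver would need to make the read and the clear atomic with respect to the data path, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters; not the layout from the patch. */
struct mock_node_stats {
	uint64_t n_pkts;
	uint64_t n_bytes;
};

/*
 * Copy the live counters to *stats when stats is non-NULL, then zero the
 * live counters when clear is non-zero.
 */
static int
mock_node_stats_read(struct mock_node_stats *counters,
		     struct mock_node_stats *stats, int clear)
{
	if (stats != NULL)
		*stats = *counters;

	if (clear)
		memset(counters, 0, sizeof(*counters));

	return 0;
}
```

Passing NULL for stats together with a non-zero clear gives a plain counter reset in this model.
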
> +/**
> + * Scheduler run
> + *
> + * The packet enqueue side of the scheduler hierarchy is typically done through
> + * the Ethernet device TX function. For HW implementations, the packet dequeue
> + * side is typically done by the Ethernet device without any SW intervention,
> + * therefore this function should not do anything.
> + *
> + * However, for poll-mode SW or mixed HW-SW implementations, SW intervention
> + * is likely to be required for running the packet dequeue side of the scheduler
> + * hierarchy. Another potential task performed by this function is the periodic
> + * flush of any packet enqueue-side buffers used by burst-mode implementations.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @return
> + * 0 on success, non-zero error code otherwise.
> + */
> +static inline int
> +rte_eth_sched_run(uint8_t port_id)
> +{
> + struct rte_eth_dev *dev;
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> +#endif
> +
> + dev = &rte_eth_devices[port_id];
> +
> + return (dev->dev_ops->sched_run) ? dev->dev_ops->sched_run(dev) : 0;
> +}
> +
> +/**
> * Get the port id from pci adrress or device name
> * Ex: 0000:2:00.0 or vdev name net_pcap0
> *
>
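For reference, the optional-callback dispatch used by rte_eth_sched_run() and the way a poll-mode application might call it can be modelled without DPDK as below. The mock_* types and functions are hypothetical; the comment inside the loop only marks where the real rte_eth_tx_burst() call would sit.

```c
#include <stddef.h>

struct mock_dev;

/* Hypothetical driver op, mirroring the sched_run entry in dev_ops. */
typedef int (*mock_sched_run_t)(struct mock_dev *dev);

struct mock_dev {
	mock_sched_run_t sched_run; /* NULL for pure HW schedulers */
	unsigned int n_runs;        /* how often the SW dequeue side ran */
};

/* SW or mixed HW-SW driver: does the dequeue work on every call. */
static int
mock_sw_sched_run(struct mock_dev *dev)
{
	dev->n_runs++;
	return 0;
}

/* Same shape as the inline wrapper: a missing op means nothing to do. */
static inline int
mock_sched_run(struct mock_dev *dev)
{
	return dev->sched_run ? dev->sched_run(dev) : 0;
}

/* Poll loop: enqueue through TX, then give the scheduler a chance to run. */
static int
mock_poll(struct mock_dev *dev, unsigned int iterations)
{
	unsigned int i;
	int ret;

	for (i = 0; i < iterations; i++) {
		/* rte_eth_tx_burst() would be called here in a real application. */
		ret = mock_sched_run(dev);
		if (ret != 0)
			return ret;
	}

	return 0;
}
```

The point of the pattern is that the same application loop works for both kinds of device: with a HW scheduler the call costs one NULL check, with a SW scheduler it drives the dequeue side.
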