The SoftNIC PMD is intended to provide SW fall-back options for specific
ethdev APIs in a generic way for NICs that do not support those features.
Currently, the only implemented ethdev API is Traffic Management (TM),
but other ethdev APIs, such as rte_flow or traffic metering & policing,
can be easily implemented.
Overview:
* Generic: The SoftNIC PMD works with any "hard" PMD that implements the
ethdev API. It does not change the "hard" PMD in any way.
* Creation: For any given "hard" ethdev port, the user can decide to
create an associated "soft" ethdev port to drive the "hard" port. The
"soft" port is a virtual device that can be created at app start-up
through EAL vdev arg or later through the virtual device API.
* Configuration: The app explicitly decides which features are to be
enabled on the "soft" port and which features are still to be used from
the "hard" port. The app continues to explicitly configure both the
"hard" and the "soft" ports after the creation of the "soft" port.
* RX/TX: The app reads packets from/writes packets to the "soft" port
instead of the "hard" port. The RX and TX queues of the "soft" port are
thread safe, as for any ethdev port.
* Execution: The "soft" port is a feature-rich NIC implemented by the CPU,
so the run function of the "soft" port has to be executed by the CPU in
order to get packets moving between the "hard" port and the app.
* Meets the NFV vision: The app should be (almost) agnostic of the NIC
implementation (different vendors/models, HW-SW mix): it should not
require changes to use different NICs and should use the same API for
all of them. If a NIC does not implement a specific feature, the HW
should be augmented with SW to provide the functionality while still
preserving the same API.
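As a sketch of the RX/TX and Execution points above (illustrative only,
not code from this patch set; the function names softnic_run_loop and
app_io are made up), the app drives the "soft" port through the regular
ethdev API while a dedicated CPU core keeps invoking the SoftNIC run
function:

```c
#include <rte_ethdev.h>
#include "rte_eth_softnic.h"

/* Dedicated core: keep moving packets between "hard" port and app */
static int
softnic_run_loop(void *arg)
{
	uint8_t soft_port_id = *(uint8_t *)arg;

	for ( ; ; )
		rte_pmd_softnic_run(soft_port_id);

	return 0;
}

/* App datapath: same ethdev API, but against the "soft" port */
static void
app_io(uint8_t soft_port_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t n;

	n = rte_eth_rx_burst(soft_port_id, 0, pkts, 32);
	if (n)
		rte_eth_tx_burst(soft_port_id, 0, pkts, n);
}
```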
Traffic Management SW fall-back overview:
* Implements the ethdev traffic management API (rte_tm.h).
* Based on the existing librte_sched DPDK library.
Example: Create "soft" port for "hard" port "0000:04:00.1", enable the TM
feature with default settings:
--vdev 'net_softnic0,hard_name=0000:04:00.1,soft_tm=on'
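The same "soft" port can also be created at run-time through the virtual
device API (hedged sketch: depending on the DPDK version, rte_vdev_init()
is declared in rte_vdev.h or rte_bus_vdev.h):

```c
#include <rte_vdev.h>

/* Create the "soft" port programmatically, equivalent to the EAL
 * --vdev argument above. Returns 0 on success, negative on error.
 */
static int
create_softnic_port(void)
{
	return rte_vdev_init("net_softnic0",
		"hard_name=0000:04:00.1,soft_tm=on");
}
```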
Q1: Why generic name, if only TM is supported (for now)?
A1: The intention is to have the SoftNIC PMD implement many other (all?)
ethdev APIs under a single "ideal" ethdev, hence the generic name.
The initial motivation is the TM API, but the mechanism is generic and
can be used for many other ethdev APIs. Somebody looking to provide SW
fall-back for another ethdev API is likely to end up reinventing the
same mechanism, hence it would be good to consolidate everything under
a single PMD and have the user explicitly enable/disable the features
needed for each "soft" device.
Q2: Are there any performance requirements for SoftNIC?
A2: Yes, performance should be great/decent for every feature, otherwise
the SW fall-back is unusable, thus useless.
Q3: Why not change the "hard" device (and keep a single device) instead of
creating a new "soft" device (and thus having two devices)?
A3: This is not possible with the current librte_ether ethdev
implementation. The ethdev->dev_ops are defined as a constant
structure, so they cannot be changed per device (nor per PMD). The new
ops also need memory space to store their context data structures,
which requires updating the ethdev->data->dev_private of the existing
device; at best, ethdev->data->dev_private could perhaps be resized,
assuming that librte_ether introduces a way to find out its size, but
this cannot be done while the device is running. Other side effects
might exist, as the changes are very intrusive, and it likely needs
more changes in librte_ether.
Q4: Why not call the SW fall-back dev_ops directly in librte_ether for
devices which do not support the specific feature? If the device
supports the capability, let's call its dev_ops, otherwise call the
SW fall-back dev_ops.
A4: First, for reasons similar to Q&A3. This removes the need to change
the ethdev->dev_ops of the device, but it does not do anything to
solve the other significant issue of where to store the context data
structures needed by the SW fall-back functions (which, in this
approach, are called implicitly by librte_ether).
Second, the SW fall-back options should not be restricted arbitrarily
by the librte_ether library; the decision should belong to the app.
For example, the TM SW fall-back should not be limited to only
librte_sched, which (like any SW fall-back) supports a specific
hierarchy and feature set and cannot implement every possible
hierarchy. If alternatives exist, the one to use should be picked by
the app, not by the ethdev layer.
Q5: Why is the app required to continue to configure both the "hard" and
the "soft" devices even after the "soft" device has been created? Why
not hide the "hard" device under the "soft" device and have the
"soft" device configure the "hard" device under the hood?
A5: This was the approach tried in V2 of this patch set (the overlay
"soft" device taking over the configuration of the underlay "hard"
device) and was eventually dropped due to the increased complexity of
keeping the configuration of two distinct devices in sync with a
librte_ether implementation that is not friendly to such an approach.
Basically, each ethdev API call for the overlay device needs to
configure the overlay device, invoke the same configuration with
possibly modified parameters for the underlay device, then resume the
configuration of the overlay device, turning this into a device
emulation project.
V2 minuses: increased complexity (dealing with two devices at the same
time); the need to implement every ethdev API, even those out of scope
for the SW fall-back; intrusive; sometimes decisions that should be
left to the app had to be taken silently.
V3 pluses: lower complexity (only one device); only the APIs in scope of
the SW fall-back need to be implemented; non-intrusive (the "hard"
device is driven through the ethdev API); app decisions are taken by
the app in an explicit way.
Q6: Why expose the SW fall-back in a PMD and not in a SW library?
A6: The SW fall-back for an ethdev API has to implement that specific
ethdev API (hence expose an ethdev object through a PMD), as opposed
to providing a different API. This approach allows the app to use the
same API (NFV vision). For example, we already have a library for the
TM SW fall-back (librte_sched) that can be called directly by apps
that need it outside of the ethdev context (such use-cases exist), but
an app that works with TM-aware NICs through the ethdev TM API would
have to be changed significantly in order to work with TM-agnostic
NICs through the librte_sched API.
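To illustrate the point (sketch only, not part of this patch set; the
rte_tm_node_add() calls that build the hierarchy are elided, and the
port_id type follows this patch set), the app commits a TM hierarchy
through the generic ethdev TM API regardless of whether the port is a
TM-aware HW NIC or a SoftNIC "soft" port:

```c
#include <rte_tm.h>

static int
commit_tm_hierarchy(uint8_t port_id)
{
	struct rte_tm_error error;

	/* ... build the hierarchy with rte_tm_node_add() calls ... */

	/* Same call for HW TM and for the SoftNIC SW fall-back */
	return rte_tm_hierarchy_commit(port_id, 1 /* clear_on_fail */,
		&error);
}
```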
Q7: Why have all the SW fall-backs in a single PMD? Why not develop
the SW fall-back for each ethdev API in a separate PMD, then create a
chain of "soft" devices for each "hard" device? Potentially, this
results in smaller PMDs that are easier to maintain.
A7: Arguments for single ethdev/PMD and against chain of ethdevs/PMDs:
1. All the existing PMDs for HW NICs implement a lot of features under
the same PMD, so there is no reason for the single-PMD approach to
break code modularity. See the V3 code: a lot of care has been taken
for code modularity.
2. We should avoid the proliferation of SW PMDs.
3. A single device should be handled by a single PMD.
4. People are used to feature-rich PMDs, not single-feature PMDs, so
why force a change of mindset?
5. [Configuration nightmare] A chain of "soft" devices attached to a
single "hard" device requires the app to be aware that the N "soft"
devices in the chain plus the "hard" device refer to the same HW
device, and of which device should be invoked to configure which
feature. Moreover, the length of the chain and the functionality of
each link differ for each HW device. This breaks the requirement of
preserving the same API while working with different NICs (NFV) and
most likely results in a configuration nightmare; nobody is going to
use this seriously.
6. [Feature inter-dependency] Sometimes different features need to be
configured and executed together (e.g. share the same set of
resources, are inter-dependent, etc), so it is better and more
performant to do them in the same ethdev/PMD.
7. [Code duplication] There is a lot of duplication in the
configuration code for the chain of ethdevs approach. The ethdev
dev_configure, rx_queue_setup, tx_queue_setup API functions have to
be implemented per device, and they become meaningless/inconsistent
with the chain approach.
8. [Data structure duplication] The per device data structures have to
be duplicated and read repeatedly for each "soft" ethdev. The
ethdev device, dev_private, data, per RX/TX queue data structures
have to be replicated per "soft" device. They have to be re-read for
each stage, so the same cache misses are now multiplied with the
number of stages in the chain.
9. [rte_ring proliferation] Thread safety requirements for the ethdev
RX/TX queues require an rte_ring to be used for every RX/TX queue
of each "soft" ethdev. This rte_ring proliferation unnecessarily
increases the memory footprint and lowers performance, especially
when each "soft" ethdev ends up on a different CPU core (ping-pong
of cache lines).
10.[Meta-data proliferation] A chain of ethdevs is likely to result
in proliferation of meta-data that has to be passed between the
ethdevs (e.g. policing needs the output of flow classification),
which results in more cache line ping-pong between cores, hence
performance drops.
Jasvinder Singh (4):
net/softnic: add softnic PMD
net/softnic: add traffic management support
net/softnic: add TM capabilities ops
net/softnic: add TM hierarchy related ops
Jasvinder Singh (1):
app/testpmd: add traffic management forwarding mode
MAINTAINERS | 5 +
app/test-pmd/Makefile | 8 +
app/test-pmd/cmdline.c | 88 +
app/test-pmd/testpmd.c | 15 +
app/test-pmd/testpmd.h | 46 +
app/test-pmd/tm.c | 865 +++++
config/common_base | 5 +
doc/api/doxy-api-index.md | 3 +-
doc/api/doxy-api.conf | 1 +
doc/guides/rel_notes/release_17_11.rst | 6 +
drivers/net/Makefile | 5 +
drivers/net/softnic/Makefile | 57 +
drivers/net/softnic/rte_eth_softnic.c | 852 +++++
drivers/net/softnic/rte_eth_softnic.h | 83 +
drivers/net/softnic/rte_eth_softnic_internals.h | 291 ++
drivers/net/softnic/rte_eth_softnic_tm.c | 3452 ++++++++++++++++++++
.../net/softnic/rte_pmd_eth_softnic_version.map | 7 +
mk/rte.app.mk | 5 +-
18 files changed, 5792 insertions(+), 2 deletions(-)
create mode 100644 app/test-pmd/tm.c
create mode 100644 drivers/net/softnic/Makefile
create mode 100644 drivers/net/softnic/rte_eth_softnic.c
create mode 100644 drivers/net/softnic/rte_eth_softnic.h
create mode 100644 drivers/net/softnic/rte_eth_softnic_internals.h
create mode 100644 drivers/net/softnic/rte_eth_softnic_tm.c
create mode 100644 drivers/net/softnic/rte_pmd_eth_softnic_version.map
Series Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
@@ -47,6 +47,7 @@ LIBABIVER := 1
# all source are stored in SRCS-y
#
SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += rte_eth_softnic.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += rte_eth_softnic_tm.c
#
# Export include files
@@ -42,6 +42,7 @@
#include <rte_kvargs.h>
#include <rte_errno.h>
#include <rte_ring.h>
+#include <rte_sched.h>
#include "rte_eth_softnic.h"
#include "rte_eth_softnic_internals.h"
@@ -49,10 +50,29 @@
#define PRIV_TO_HARD_DEV(p) \
(&rte_eth_devices[p->hard.port_id])
+#define PMD_PARAM_SOFT_TM "soft_tm"
+#define PMD_PARAM_SOFT_TM_RATE "soft_tm_rate"
+#define PMD_PARAM_SOFT_TM_NB_QUEUES "soft_tm_nb_queues"
+#define PMD_PARAM_SOFT_TM_QSIZE0 "soft_tm_qsize0"
+#define PMD_PARAM_SOFT_TM_QSIZE1 "soft_tm_qsize1"
+#define PMD_PARAM_SOFT_TM_QSIZE2 "soft_tm_qsize2"
+#define PMD_PARAM_SOFT_TM_QSIZE3 "soft_tm_qsize3"
+#define PMD_PARAM_SOFT_TM_ENQ_BSZ "soft_tm_enq_bsz"
+#define PMD_PARAM_SOFT_TM_DEQ_BSZ "soft_tm_deq_bsz"
+
#define PMD_PARAM_HARD_NAME "hard_name"
#define PMD_PARAM_HARD_TX_QUEUE_ID "hard_tx_queue_id"
static const char *pmd_valid_args[] = {
+ PMD_PARAM_SOFT_TM,
+ PMD_PARAM_SOFT_TM_RATE,
+ PMD_PARAM_SOFT_TM_NB_QUEUES,
+ PMD_PARAM_SOFT_TM_QSIZE0,
+ PMD_PARAM_SOFT_TM_QSIZE1,
+ PMD_PARAM_SOFT_TM_QSIZE2,
+ PMD_PARAM_SOFT_TM_QSIZE3,
+ PMD_PARAM_SOFT_TM_ENQ_BSZ,
+ PMD_PARAM_SOFT_TM_DEQ_BSZ,
PMD_PARAM_HARD_NAME,
PMD_PARAM_HARD_TX_QUEUE_ID,
NULL
@@ -157,6 +177,13 @@ pmd_dev_start(struct rte_eth_dev *dev)
{
struct pmd_internals *p = dev->data->dev_private;
+ if (tm_used(dev)) {
+ int status = tm_start(p);
+
+ if (status)
+ return status;
+ }
+
dev->data->dev_link.link_status = ETH_LINK_UP;
if (p->params.soft.intrusive) {
@@ -172,7 +199,12 @@ pmd_dev_start(struct rte_eth_dev *dev)
static void
pmd_dev_stop(struct rte_eth_dev *dev)
{
+ struct pmd_internals *p = dev->data->dev_private;
+
dev->data->dev_link.link_status = ETH_LINK_DOWN;
+
+ if (tm_used(dev))
+ tm_stop(p);
}
static void
@@ -293,6 +325,77 @@ rte_pmd_softnic_run_default(struct rte_eth_dev *dev)
return 0;
}
+static __rte_always_inline int
+rte_pmd_softnic_run_tm(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *p = dev->data->dev_private;
+
+ /* Persistent context: Read Only (update not required) */
+ struct rte_sched_port *sched = p->soft.tm.sched;
+ struct rte_mbuf **pkts_enq = p->soft.tm.pkts_enq;
+ struct rte_mbuf **pkts_deq = p->soft.tm.pkts_deq;
+ uint32_t enq_bsz = p->params.soft.tm.enq_bsz;
+ uint32_t deq_bsz = p->params.soft.tm.deq_bsz;
+ uint16_t nb_tx_queues = dev->data->nb_tx_queues;
+
+ /* Persistent context: Read - Write (update required) */
+ uint32_t txq_pos = p->soft.tm.txq_pos;
+ uint32_t pkts_enq_len = p->soft.tm.pkts_enq_len;
+ uint32_t flush_count = p->soft.tm.flush_count;
+
+ /* Not part of the persistent context */
+ uint32_t pkts_deq_len, pos;
+ uint16_t i;
+
+ /* Soft device TXQ read, TM enqueue */
+ for (i = 0; i < nb_tx_queues; i++) {
+ struct rte_ring *txq = dev->data->tx_queues[txq_pos];
+
+ /* Read TXQ burst to packet enqueue buffer */
+ pkts_enq_len += rte_ring_sc_dequeue_burst(txq,
+ (void **)&pkts_enq[pkts_enq_len],
+ enq_bsz,
+ NULL);
+
+ /* Increment TXQ */
+ txq_pos++;
+ if (txq_pos >= nb_tx_queues)
+ txq_pos = 0;
+
+ /* TM enqueue when complete burst is available */
+ if (pkts_enq_len >= enq_bsz) {
+ rte_sched_port_enqueue(sched, pkts_enq, pkts_enq_len);
+
+ pkts_enq_len = 0;
+ flush_count = 0;
+ break;
+ }
+ }
+
+ if (flush_count >= FLUSH_COUNT_THRESHOLD) {
+ if (pkts_enq_len)
+ rte_sched_port_enqueue(sched, pkts_enq, pkts_enq_len);
+
+ pkts_enq_len = 0;
+ flush_count = 0;
+ }
+
+ p->soft.tm.txq_pos = txq_pos;
+ p->soft.tm.pkts_enq_len = pkts_enq_len;
+ p->soft.tm.flush_count = flush_count + 1;
+
+ /* TM dequeue, Hard device TXQ write */
+ pkts_deq_len = rte_sched_port_dequeue(sched, pkts_deq, deq_bsz);
+
+ for (pos = 0; pos < pkts_deq_len; )
+ pos += rte_eth_tx_burst(p->hard.port_id,
+ p->params.hard.tx_queue_id,
+ &pkts_deq[pos],
+ (uint16_t)(pkts_deq_len - pos));
+
+ return 0;
+}
+
int
rte_pmd_softnic_run(uint8_t port_id)
{
@@ -302,7 +405,9 @@ rte_pmd_softnic_run(uint8_t port_id)
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
#endif
- return rte_pmd_softnic_run_default(dev);
+ return (tm_used(dev)) ?
+ rte_pmd_softnic_run_tm(dev) :
+ rte_pmd_softnic_run_default(dev);
}
static struct ether_addr eth_addr = { .addr_bytes = {0} };
@@ -383,12 +488,25 @@ pmd_init(struct pmd_params *params, int numa_node)
return NULL;
}
+ /* Traffic Management (TM)*/
+ if (params->soft.flags & PMD_FEATURE_TM) {
+ status = tm_init(p, params, numa_node);
+ if (status) {
+ default_free(p);
+ rte_free(p);
+ return NULL;
+ }
+ }
+
return p;
}
static void
pmd_free(struct pmd_internals *p)
{
+ if (p->params.soft.flags & PMD_FEATURE_TM)
+ tm_free(p);
+
default_free(p);
rte_free(p);
@@ -468,7 +586,7 @@ static int
pmd_parse_args(struct pmd_params *p, const char *name, const char *params)
{
struct rte_kvargs *kvlist;
- int ret;
+ int i, ret;
kvlist = rte_kvargs_parse(params, pmd_valid_args);
if (kvlist == NULL)
@@ -478,8 +596,120 @@ pmd_parse_args(struct pmd_params *p, const char *name, const char *params)
memset(p, 0, sizeof(*p));
p->soft.name = name;
p->soft.intrusive = INTRUSIVE;
+ p->soft.tm.rate = 0;
+ p->soft.tm.nb_queues = SOFTNIC_SOFT_TM_NB_QUEUES;
+ for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++)
+ p->soft.tm.qsize[i] = SOFTNIC_SOFT_TM_QUEUE_SIZE;
+ p->soft.tm.enq_bsz = SOFTNIC_SOFT_TM_ENQ_BSZ;
+ p->soft.tm.deq_bsz = SOFTNIC_SOFT_TM_DEQ_BSZ;
p->hard.tx_queue_id = SOFTNIC_HARD_TX_QUEUE_ID;
+ /* SOFT: TM (optional) */
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM) == 1) {
+ char *s;
+
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM,
+ &get_string, &s);
+ if (ret < 0)
+ goto out_free;
+
+ if (strcmp(s, "on") == 0)
+ p->soft.flags |= PMD_FEATURE_TM;
+ else if (strcmp(s, "off") == 0)
+ p->soft.flags &= ~PMD_FEATURE_TM;
+ else
+ goto out_free;
+ }
+
+ /* SOFT: TM rate (measured in bytes/second) (optional) */
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_RATE) == 1) {
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_RATE,
+ &get_uint32, &p->soft.tm.rate);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ /* SOFT: TM number of queues (optional) */
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_NB_QUEUES) == 1) {
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_NB_QUEUES,
+ &get_uint32, &p->soft.tm.nb_queues);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ /* SOFT: TM queue size 0 .. 3 (optional) */
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_QSIZE0) == 1) {
+ uint32_t qsize;
+
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_QSIZE0,
+ &get_uint32, &qsize);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.tm.qsize[0] = (uint16_t)qsize;
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_QSIZE1) == 1) {
+ uint32_t qsize;
+
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_QSIZE1,
+ &get_uint32, &qsize);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.tm.qsize[1] = (uint16_t)qsize;
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_QSIZE2) == 1) {
+ uint32_t qsize;
+
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_QSIZE2,
+ &get_uint32, &qsize);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.tm.qsize[2] = (uint16_t)qsize;
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_QSIZE3) == 1) {
+ uint32_t qsize;
+
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_QSIZE3,
+ &get_uint32, &qsize);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.tm.qsize[3] = (uint16_t)qsize;
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ /* SOFT: TM enqueue burst size (optional) */
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_ENQ_BSZ) == 1) {
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_ENQ_BSZ,
+ &get_uint32, &p->soft.tm.enq_bsz);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
+ /* SOFT: TM dequeue burst size (optional) */
+ if (rte_kvargs_count(kvlist, PMD_PARAM_SOFT_TM_DEQ_BSZ) == 1) {
+ ret = rte_kvargs_process(kvlist, PMD_PARAM_SOFT_TM_DEQ_BSZ,
+ &get_uint32, &p->soft.tm.deq_bsz);
+ if (ret < 0)
+ goto out_free;
+
+ p->soft.flags |= PMD_FEATURE_TM;
+ }
+
/* HARD: name (mandatory) */
if (rte_kvargs_count(kvlist, PMD_PARAM_HARD_NAME) == 1) {
ret = rte_kvargs_process(kvlist, PMD_PARAM_HARD_NAME,
@@ -512,6 +742,7 @@ pmd_probe(struct rte_vdev_device *vdev)
int status;
struct rte_eth_dev_info hard_info;
+ uint32_t hard_speed;
uint8_t hard_port_id;
int numa_node;
void *dev_private;
@@ -534,11 +765,19 @@ pmd_probe(struct rte_vdev_device *vdev)
return -EINVAL;
rte_eth_dev_info_get(hard_port_id, &hard_info);
+ hard_speed = eth_dev_speed_max_mbps(hard_info.speed_capa);
numa_node = rte_eth_dev_socket_id(hard_port_id);
if (p.hard.tx_queue_id >= hard_info.max_tx_queues)
return -EINVAL;
+ if (p.soft.flags & PMD_FEATURE_TM) {
+ status = tm_params_check(&p, hard_speed);
+
+ if (status)
+ return status;
+ }
+
/* Allocate and initialize soft ethdev private data */
dev_private = pmd_init(&p, numa_node);
if (dev_private == NULL)
@@ -591,5 +830,14 @@ static struct rte_vdev_driver pmd_softnic_drv = {
RTE_PMD_REGISTER_VDEV(net_softnic, pmd_softnic_drv);
RTE_PMD_REGISTER_PARAM_STRING(net_softnic,
+ PMD_PARAM_SOFT_TM "=on|off "
+ PMD_PARAM_SOFT_TM_RATE "=<int> "
+ PMD_PARAM_SOFT_TM_NB_QUEUES "=<int> "
+ PMD_PARAM_SOFT_TM_QSIZE0 "=<int> "
+ PMD_PARAM_SOFT_TM_QSIZE1 "=<int> "
+ PMD_PARAM_SOFT_TM_QSIZE2 "=<int> "
+ PMD_PARAM_SOFT_TM_QSIZE3 "=<int> "
+ PMD_PARAM_SOFT_TM_ENQ_BSZ "=<int> "
+ PMD_PARAM_SOFT_TM_DEQ_BSZ "=<int> "
PMD_PARAM_HARD_NAME "=<string> "
PMD_PARAM_HARD_TX_QUEUE_ID "=<int>");
@@ -40,6 +40,22 @@
extern "C" {
#endif
+#ifndef SOFTNIC_SOFT_TM_NB_QUEUES
+#define SOFTNIC_SOFT_TM_NB_QUEUES 65536
+#endif
+
+#ifndef SOFTNIC_SOFT_TM_QUEUE_SIZE
+#define SOFTNIC_SOFT_TM_QUEUE_SIZE 64
+#endif
+
+#ifndef SOFTNIC_SOFT_TM_ENQ_BSZ
+#define SOFTNIC_SOFT_TM_ENQ_BSZ 32
+#endif
+
+#ifndef SOFTNIC_SOFT_TM_DEQ_BSZ
+#define SOFTNIC_SOFT_TM_DEQ_BSZ 24
+#endif
+
#ifndef SOFTNIC_HARD_TX_QUEUE_ID
#define SOFTNIC_HARD_TX_QUEUE_ID 0
#endif
@@ -37,10 +37,19 @@
#include <stdint.h>
#include <rte_mbuf.h>
+#include <rte_sched.h>
#include <rte_ethdev.h>
#include "rte_eth_softnic.h"
+/**
+ * PMD Parameters
+ */
+
+enum pmd_feature {
+ PMD_FEATURE_TM = 1, /**< Traffic Management (TM) */
+};
+
#ifndef INTRUSIVE
#define INTRUSIVE 0
#endif
@@ -57,6 +66,16 @@ struct pmd_params {
* (potentially faster).
*/
int intrusive;
+
+ /** Traffic Management (TM) */
+ struct {
+ uint32_t rate; /**< Rate (bytes/second) */
+ uint32_t nb_queues; /**< Number of queues */
+ uint16_t qsize[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
+ /**< Queue size per traffic class */
+ uint32_t enq_bsz; /**< Enqueue burst size */
+ uint32_t deq_bsz; /**< Dequeue burst size */
+ } tm;
} soft;
/** Parameters for the hard device (existing) */
@@ -75,7 +94,7 @@ struct pmd_params {
#endif
#ifndef FLUSH_COUNT_THRESHOLD
-#define FLUSH_COUNT_THRESHOLD (1 << 17)
+#define FLUSH_COUNT_THRESHOLD (1 << 17)
#endif
struct default_internals {
@@ -86,6 +105,66 @@ struct default_internals {
};
/**
+ * Traffic Management (TM) Internals
+ */
+
+#ifndef TM_MAX_SUBPORTS
+#define TM_MAX_SUBPORTS 8
+#endif
+
+#ifndef TM_MAX_PIPES_PER_SUBPORT
+#define TM_MAX_PIPES_PER_SUBPORT 4096
+#endif
+
+struct tm_params {
+ struct rte_sched_port_params port_params;
+
+ struct rte_sched_subport_params subport_params[TM_MAX_SUBPORTS];
+
+ struct rte_sched_pipe_params
+ pipe_profiles[RTE_SCHED_PIPE_PROFILES_PER_PORT];
+ uint32_t n_pipe_profiles;
+ uint32_t pipe_to_profile[TM_MAX_SUBPORTS * TM_MAX_PIPES_PER_SUBPORT];
+};
+
+/* TM Levels */
+enum tm_node_level {
+ TM_NODE_LEVEL_PORT = 0,
+ TM_NODE_LEVEL_SUBPORT,
+ TM_NODE_LEVEL_PIPE,
+ TM_NODE_LEVEL_TC,
+ TM_NODE_LEVEL_QUEUE,
+ TM_NODE_LEVEL_MAX,
+};
+
+/* TM Hierarchy Specification */
+struct tm_hierarchy {
+ uint32_t n_tm_nodes[TM_NODE_LEVEL_MAX];
+};
+
+struct tm_internals {
+ /** Hierarchy specification
+ *
+ * -Hierarchy is unfrozen at init and when port is stopped.
+ * -Hierarchy is frozen on successful hierarchy commit.
+ * -Run-time hierarchy changes are not allowed, therefore it makes
+ * sense to keep the hierarchy frozen after the port is started.
+ */
+ struct tm_hierarchy h;
+
+ /** Blueprints */
+ struct tm_params params;
+
+ /** Run-time */
+ struct rte_sched_port *sched;
+ struct rte_mbuf **pkts_enq;
+ struct rte_mbuf **pkts_deq;
+ uint32_t pkts_enq_len;
+ uint32_t txq_pos;
+ uint32_t flush_count;
+};
+
+/**
* PMD Internals
*/
struct pmd_internals {
@@ -95,6 +174,7 @@ struct pmd_internals {
/** Soft device */
struct {
struct default_internals def; /**< Default */
+ struct tm_internals tm; /**< Traffic Management */
} soft;
/** Hard device */
@@ -111,4 +191,28 @@ struct pmd_rx_queue {
} hard;
};
+int
+tm_params_check(struct pmd_params *params, uint32_t hard_rate);
+
+int
+tm_init(struct pmd_internals *p, struct pmd_params *params, int numa_node);
+
+void
+tm_free(struct pmd_internals *p);
+
+int
+tm_start(struct pmd_internals *p);
+
+void
+tm_stop(struct pmd_internals *p);
+
+static inline int
+tm_used(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *p = dev->data->dev_private;
+
+ return (p->params.soft.flags & PMD_FEATURE_TM) &&
+ p->soft.tm.h.n_tm_nodes[TM_NODE_LEVEL_PORT];
+}
+
#endif /* __INCLUDE_RTE_ETH_SOFTNIC_INTERNALS_H__ */
new file mode 100644
@@ -0,0 +1,181 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_malloc.h>
+
+#include "rte_eth_softnic_internals.h"
+#include "rte_eth_softnic.h"
+
+#define BYTES_IN_MBPS (1000 * 1000 / 8)
+
+int
+tm_params_check(struct pmd_params *params, uint32_t hard_rate)
+{
+ uint64_t hard_rate_bytes_per_sec = (uint64_t)hard_rate * BYTES_IN_MBPS;
+ uint32_t i;
+
+ /* rate */
+ if (params->soft.tm.rate) {
+ if (params->soft.tm.rate > hard_rate_bytes_per_sec)
+ return -EINVAL;
+ } else {
+ params->soft.tm.rate =
+ (hard_rate_bytes_per_sec > UINT32_MAX) ?
+ UINT32_MAX : hard_rate_bytes_per_sec;
+ }
+
+ /* nb_queues */
+ if (params->soft.tm.nb_queues == 0)
+ return -EINVAL;
+
+ if (params->soft.tm.nb_queues < RTE_SCHED_QUEUES_PER_PIPE)
+ params->soft.tm.nb_queues = RTE_SCHED_QUEUES_PER_PIPE;
+
+ params->soft.tm.nb_queues =
+ rte_align32pow2(params->soft.tm.nb_queues);
+
+ /* qsize */
+ for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
+ if (params->soft.tm.qsize[i] == 0)
+ return -EINVAL;
+
+ params->soft.tm.qsize[i] =
+ rte_align32pow2(params->soft.tm.qsize[i]);
+ }
+
+ /* enq_bsz, deq_bsz */
+ if ((params->soft.tm.enq_bsz == 0) ||
+ (params->soft.tm.deq_bsz == 0) ||
+ (params->soft.tm.deq_bsz >= params->soft.tm.enq_bsz))
+ return -EINVAL;
+
+ return 0;
+}
+
+int
+tm_init(struct pmd_internals *p,
+ struct pmd_params *params,
+ int numa_node)
+{
+ uint32_t enq_bsz = params->soft.tm.enq_bsz;
+ uint32_t deq_bsz = params->soft.tm.deq_bsz;
+
+ p->soft.tm.pkts_enq = rte_zmalloc_socket(params->soft.name,
+ 2 * enq_bsz * sizeof(struct rte_mbuf *),
+ 0,
+ numa_node);
+
+ if (p->soft.tm.pkts_enq == NULL)
+ return -ENOMEM;
+
+ p->soft.tm.pkts_deq = rte_zmalloc_socket(params->soft.name,
+ deq_bsz * sizeof(struct rte_mbuf *),
+ 0,
+ numa_node);
+
+ if (p->soft.tm.pkts_deq == NULL) {
+ rte_free(p->soft.tm.pkts_enq);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+void
+tm_free(struct pmd_internals *p)
+{
+ rte_free(p->soft.tm.pkts_enq);
+ rte_free(p->soft.tm.pkts_deq);
+}
+
+int
+tm_start(struct pmd_internals *p)
+{
+ struct tm_params *t = &p->soft.tm.params;
+ uint32_t n_subports, subport_id;
+ int status;
+
+ /* Port */
+ p->soft.tm.sched = rte_sched_port_config(&t->port_params);
+ if (p->soft.tm.sched == NULL)
+ return -1;
+
+ /* Subport */
+ n_subports = t->port_params.n_subports_per_port;
+ for (subport_id = 0; subport_id < n_subports; subport_id++) {
+ uint32_t n_pipes_per_subport =
+ t->port_params.n_pipes_per_subport;
+ uint32_t pipe_id;
+
+ status = rte_sched_subport_config(p->soft.tm.sched,
+ subport_id,
+ &t->subport_params[subport_id]);
+ if (status) {
+ rte_sched_port_free(p->soft.tm.sched);
+ return -1;
+ }
+
+ /* Pipe */
+ n_pipes_per_subport = t->port_params.n_pipes_per_subport;
+ for (pipe_id = 0; pipe_id < n_pipes_per_subport; pipe_id++) {
+ int pos = subport_id * TM_MAX_PIPES_PER_SUBPORT +
+ pipe_id;
+ int profile_id = t->pipe_to_profile[pos];
+
+ if (profile_id < 0)
+ continue;
+
+ status = rte_sched_pipe_config(p->soft.tm.sched,
+ subport_id,
+ pipe_id,
+ profile_id);
+ if (status) {
+ rte_sched_port_free(p->soft.tm.sched);
+ return -1;
+ }
+ }
+ }
+
+ return 0;
+}
+
+void
+tm_stop(struct pmd_internals *p)
+{
+ if (p->soft.tm.sched)
+ rte_sched_port_free(p->soft.tm.sched);
+}