[dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler
Alan Robertson
aroberts at Brocade.com
Thu Dec 8 16:41:08 CET 2016
Hi Cristian,
The way qos works just now should be feasible for dynamic targets. That is, functions similar
to rte_sched_port_enqueue() and rte_sched_port_dequeue() would be called: the first to
enqueue the mbufs onto the queues, the second to dequeue them. The qos structures and scheduler
don't need to be as functionally rich, though. I would have thought a simple pipe with child
nodes should suffice for most cases. That would allow each tunnel/session to be shaped, with the
queueing and drop logic inherited from what is there just now.
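
A minimal sketch of the idea above, assuming a scheduler object that is owned by a tunnel rather than by an ethdev. Every name here (the sketch_* types and functions, the packet stand-in, the sizes) is hypothetical and is not part of librte_sched or of the proposed API. Strict priority between child queues and a plain byte-credit shaper are assumptions made only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CHILDREN 4   /* child nodes (queues) per pipe; hypothetical */
#define SKETCH_QLEN     8   /* slots per child queue; must be a power of two */

/* Stand-in for an mbuf: only the fields this sketch needs. */
struct sketch_pkt { uint32_t len; uint32_t child; };

struct sketch_queue {
    struct sketch_pkt *slot[SKETCH_QLEN];
    uint32_t head, tail;            /* free-running read/write indices */
};

/* Standalone scheduler: a simple pipe with child queues, not tied to an ethdev. */
struct sketch_pipe {
    struct sketch_queue q[SKETCH_CHILDREN];
    uint64_t credits;               /* bytes the shaper currently allows out */
};

/* A tunnel (or session) embeds the scheduler it wants qos applied to. */
struct sketch_tunnel {
    uint32_t id;
    struct sketch_pipe qos;
};

/* Place each packet on its child queue; tail-drop when that queue is full.
 * Returns the number of packets accepted. Dropped packets stay with the caller. */
static inline uint32_t
sketch_pipe_enqueue(struct sketch_pipe *p, struct sketch_pkt **pkts, uint32_t n)
{
    uint32_t i, accepted = 0;

    assert(p != NULL && pkts != NULL);
    for (i = 0; i < n; i++) {
        struct sketch_queue *q = &p->q[pkts[i]->child % SKETCH_CHILDREN];

        if (q->tail - q->head == SKETCH_QLEN)
            continue;               /* queue full: drop */
        q->slot[q->tail++ % SKETCH_QLEN] = pkts[i];
        accepted++;
    }
    return accepted;
}

/* Take up to n packets, lowest child index first, until the credits run out.
 * Returns the number of packets written to pkts[]. */
static inline uint32_t
sketch_pipe_dequeue(struct sketch_pipe *p, struct sketch_pkt **pkts, uint32_t n)
{
    uint32_t c, out = 0;

    assert(p != NULL && pkts != NULL);
    for (c = 0; c < SKETCH_CHILDREN && out < n; c++) {
        struct sketch_queue *q = &p->q[c];

        while (out < n && q->head != q->tail) {
            struct sketch_pkt *pkt = q->slot[q->head % SKETCH_QLEN];

            if (pkt->len > p->credits)
                return out;         /* shaper: not enough credits left */
            p->credits -= pkt->len;
            q->head++;
            pkts[out++] = pkt;
        }
    }
    return out;
}
```

With this layout, the tunnel code would call sketch_pipe_enqueue(&tunnel->qos, ...) before encryption and sketch_pipe_dequeue(&tunnel->qos, ...) when it is ready to encrypt and transmit, so the scheduling target is whatever structure holds the pipe.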
Thanks,
Alan.
-----Original Message-----
From: Dumitrescu, Cristian [mailto:cristian.dumitrescu at intel.com]
Sent: Wednesday, December 07, 2016 7:52 PM
To: Alan Robertson
Cc: dev at dpdk.org; Thomas Monjalon
Subject: RE: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler
Hi Alan,
Thanks for your comments!
> Hi Cristian,
> Looking at points 10 and 11 it's good to hear nodes can be dynamically added.
Yes, many implementations allow remapping a node from one parent to another on the fly, or simply adding more nodes after initialization, so it is natural for the API to provide this.
> We've been trying to decide the best way to do this for support of qos
> on tunnels for some time now, and the existing implementation doesn't
> allow this, so it effectively ruled out hierarchical queueing for tunnel targets on the output interface.
> Having said that, has thought been given to separating the queueing from being so closely
> tied to the Ethernet transmit process ? When queueing on a tunnel for example we may
> be working with encryption. When running with an anti-replay window it is really much
> better to do the QOS (packet reordering) before the encryption. To
> support this would it be possible to have a separate scheduler
> structure which can be passed into the scheduling API ? This means
> the calling code can hang the structure off whatever entity it wishes to perform qos on, and we get dynamic target support (sessions/tunnels etc).
Yes, this is one point where we need to look for a better solution. The current proposal attaches the hierarchical scheduler function to an ethdev, so scheduling traffic for tunnels that have a pre-defined bandwidth is not supported nicely. This question was also raised in VPP, but there tunnels are supported as a type of output interface, so attaching scheduling to an output interface also covers the tunnels case.
Looks to me that nice tunnel abstractions are a gap in DPDK as well. Any thoughts about how tunnels should be supported in DPDK? What do other people think about this?
> Regarding the structure allocation, would it be possible to make the
> number of queues associated with a TC a compile time option which the scheduler would accommodate ?
> We frequently only use one queue per tc which means 75% of the space
> allocated at the queueing layer for that tc is never used. This may
> be specific to our implementation, but if other implementations do the
> same and folks could say so, we would get a better idea of whether this is a common case.
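
To make the 75% figure concrete, here is a small sketch assuming four queues per traffic class (which is what the figure implies when only one queue is used) and a hypothetical build-time knob for that count. The macro and struct names are invented for illustration and are not librte_sched definitions; the slot count is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time knob, e.g. -DSKETCH_QUEUES_PER_TC=1. */
#ifndef SKETCH_QUEUES_PER_TC
#define SKETCH_QUEUES_PER_TC 4
#endif
#define SKETCH_TCS_PER_PIPE 4   /* assumed number of traffic classes per pipe */
#define SKETCH_QUEUE_SLOTS  64  /* arbitrary per-queue depth for the sketch */

/* Queue storage for one traffic class; its size scales with the knob. */
struct sketch_tc_storage {
    void *slot[SKETCH_QUEUES_PER_TC][SKETCH_QUEUE_SLOTS];
};

/* Queue storage for one pipe: one block per traffic class. */
struct sketch_pipe_storage {
    struct sketch_tc_storage tc[SKETCH_TCS_PER_PIPE];
};

/* Percentage of one TC's queue storage left idle when only `used`
 * of its queues ever carry traffic. */
static inline unsigned
sketch_tc_idle_percent(unsigned used)
{
    assert(used <= SKETCH_QUEUES_PER_TC);
    return 100u * (SKETCH_QUEUES_PER_TC - used) / SKETCH_QUEUES_PER_TC;
}
```

Built with the default of four queues, a configuration that uses one queue per tc leaves three of the four queue blocks idle; building with the knob set to 1 would shrink sketch_tc_storage to a quarter of its default size.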
> Whilst touching on the scheduler, the token replenishment works using
> a division and multiplication obviously to cater for the fact that it
> may be run after several tc windows have passed. The most commonly
> used industrial scheduler simply checks whether the tc has elapsed and then adds
> the bc. This relies on the scheduler being called within the tc
> window though. It would be nice to have this as a configurable option since it's much more efficient, assuming the infra code from which it's called can guarantee the calling frequency.
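
The two replenishment strategies described above can be sketched side by side. This is an illustration under stated assumptions, not the librte_sched code: the struct, field and function names are hypothetical, time is an abstract counter, and the simple variant is assumed to be called at least once per tc window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token bucket state; units are bytes and abstract time ticks. */
struct sketch_bucket {
    uint64_t credits;      /* tokens currently available */
    uint64_t bc;           /* tokens added per tc period */
    uint64_t max_credits;  /* bucket size (upper bound on credits) */
    uint64_t tc_period;    /* length of one tc period in ticks */
    uint64_t last_update;  /* start of the last period already credited */
};

/* General variant: one division and one multiplication, so it stays
 * correct even when several tc periods have passed since the last call. */
static inline void
sketch_refill_general(struct sketch_bucket *b, uint64_t now)
{
    uint64_t n_periods;

    assert(b->tc_period != 0);
    n_periods = (now - b->last_update) / b->tc_period;
    b->credits += n_periods * b->bc;
    if (b->credits > b->max_credits)
        b->credits = b->max_credits;
    b->last_update += n_periods * b->tc_period;
}

/* Simple variant: a compare and an add. It credits at most one period
 * per call, so it is only accurate if the caller runs it within every
 * tc window. */
static inline void
sketch_refill_simple(struct sketch_bucket *b, uint64_t now)
{
    if (now - b->last_update < b->tc_period)
        return;                     /* tc has not elapsed yet */
    b->credits += b->bc;
    if (b->credits > b->max_credits)
        b->credits = b->max_credits;
    b->last_update += b->tc_period; /* advance by exactly one period */
}
```

When both are called once per period they produce the same credits. If a call arrives several periods late, the general variant credits all the missed periods at once, while the simple variant credits only one per call, which is the accuracy given up in exchange for avoiding the division and multiplication.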
This is probably feedback for librte_sched as opposed to the current API proposal, as the latter is intended to be generic/implementation-agnostic and therefore its scope far exceeds the existing set of librte_sched features.
Btw, we do plan to use librte_sched as the default fall-back when the HW ethdev is not scheduler-enabled, as well as the implementation of choice for a lot of use-cases where it fits really well, so we do have to continue to evolve and improve librte_sched feature-wise and performance-wise.
> I hope you'll consider these points for inclusion into a future road
> map. Hopefully in the future my employer will increase the priority
> of some of the tasks and a PR may appear on the mailing list.
> Thanks,
> Alan.