[PATCH v7 15/17] examples/l3fwd-graph: introduce multicore dispatch worker model

Yan, Zhirun zhirun.yan at intel.com
Tue Jun 6 10:51:11 CEST 2023
Previous message (by thread): [PATCH v7 15/17] examples/l3fwd-graph: introduce multicore dispatch worker model
Next message (by thread): [PATCH v7 16/17] test/graph: add functional tests for mcore dispatch model
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk at gmail.com>
> Sent: Tuesday, June 6, 2023 1:55 PM
> To: Yan, Zhirun <zhirun.yan at intel.com>
> Cc: dev at dpdk.org; jerinj at marvell.com; kirankumark at marvell.com;
> ndabilpuram at marvell.com; stephen at networkplumber.org;
> pbhagavatula at marvell.com; Liang, Cunming <cunming.liang at intel.com>; Wang,
> Haiyue <haiyue.wang at intel.com>; mattias.ronnblom
> <mattias.ronnblom at ericsson.com>
> Subject: Re: [PATCH v7 15/17] examples/l3fwd-graph: introduce multicore
> dispatch worker model
> 
> On Tue, Jun 6, 2023 at 10:41 AM Yan, Zhirun <zhirun.yan at intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk at gmail.com>
> > > Sent: Monday, June 5, 2023 9:42 PM
> > > To: Yan, Zhirun <zhirun.yan at intel.com>
> > > Cc: dev at dpdk.org; jerinj at marvell.com; kirankumark at marvell.com;
> > > ndabilpuram at marvell.com; stephen at networkplumber.org;
> > > pbhagavatula at marvell.com; Liang, Cunming <cunming.liang at intel.com>;
> > > Wang, Haiyue <haiyue.wang at intel.com>; mattias.ronnblom
> > > <mattias.ronnblom at ericsson.com>
> > > Subject: Re: [PATCH v7 15/17] examples/l3fwd-graph: introduce
> > > multicore dispatch worker model
> > >
> > > On Mon, Jun 5, 2023 at 4:57 PM Zhirun Yan <zhirun.yan at intel.com> wrote:
> > > >
> > > > Add new parameter "model" to choose mcore dispatch or rtc model.
> > > > And in dispatch model, the node will affinity to worker core successively.
> > > >
> > > > Note:
> > > > RTE_GRAPH_MODEL_SELECT is set to GRAPH_MODEL_RTC by default.
> Must
> > > set
> > > > model the same as RTE_GRAPH_MODEL_SELECT If set it as rtc or mcore
> > > > dispatch explicitly. GRAPH_MODEL_MCORE_RUNTIME_SELECT means it
> > > > could choose by model in runtime.
> > > > Only support one RX node for mcore dispatch model in current
> > > > implementation.
> > > >
> > > > ./dpdk-l3fwd-graph  -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)"
> > > > -P --model="dispatch"
> > > >
> > > > Signed-off-by: Haiyue Wang <haiyue.wang at intel.com>
> > > > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > > > Signed-off-by: Zhirun Yan <zhirun.yan at intel.com>
> > > > ---
> > > >  examples/l3fwd-graph/main.c  | 231 +++++++++++++++++++++++++++++--
> ----
> > > >  lib/graph/rte_graph_worker.h |   3 +
> > > >  2 files changed, 196 insertions(+), 38 deletions(-)
> > > >
> > > > diff --git a/examples/l3fwd-graph/main.c
> > > > b/examples/l3fwd-graph/main.c index 5feeab4f0f..4ecc6c9af4 100644
> > > > --- a/examples/l3fwd-graph/main.c
> > > > +++ b/examples/l3fwd-graph/main.c
> > > > @@ -23,6 +23,12 @@
> > > >  #include <rte_cycles.h>
> > > >  #include <rte_eal.h>
> > > >  #include <rte_ethdev.h>
> > > > +#define GRAPH_MODEL_RTC 0 /* Run-to-completion model, set by
> default.
> > > > +*/ #define GRAPH_MODEL_MCORE_DISPATCH 1 /* Dispatch model. */
> > > #define
> > > > +GRAPH_MODEL_MCORE_RUNTIME_SELECT 2 /* Support to select model
> by
> > > */
> > > > +                                          /* parsing model in
> > > > +cmdline. */
> > >
> > > After moving model to graph->model, Can you check the performance.
> >
> > In my env, I test l3fwd-graph, I got the same throughput.(slight
> > improve could be treated as jitter) For graph_perf_autotest in test
> > app, there is slight drop (About 0.2% call, similar cycles/call) Can it be treated
> as jitter?
> 
> Most likely.
> Try in following in fasth path.
>  const ... model = graph->model;
> 

By default we set RTE_GRAPH_MODEL_SELECT == RTE_GRAPH_MODEL_RTC,
So get model is only in stats, not in fast path.

I found the root cause is coming from the additional unit test.

But actually, it should not be called.  All added UT is under graph_autotest, not in graph_perf_autotest.

It's strange if I destroy the cloned graph in testcase in the additional UT, can get same stats as we expected(even
better performance). A little more calls, objs and better cycles/call (28->27).

New:
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
|Node                           |calls          |objs           |realloc_count  |objs/call      |objs/sec(10E6) |cycles/call|
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+
|test_graph_perf_worker-0-0     |10286353       |2633306368     |1              |256.000        |2037.676032    |27.0000    |
|test_graph_perf_worker-1-0     |10286709       |2633397504     |1              |256.000        |2037.767168    |27.0000    |
|test_graph_perf_worker-2-0     |10286732       |2633403392     |1              |256.000        |2037.773056    |27.0000    |
|test_graph_perf_worker-3-0     |10286751       |2633408256     |1              |256.000        |2037.777920    |27.0000    |
|test_graph_perf_source-0       |10286774       |2633414144     |2              |256.000        |2037.783552    |27.0000    |
|test_graph_perf_sink-0         |10286791       |2633418496     |1              |256.000        |2037.788160    |27.0000    |
+-------------------------------+---------------+---------------+---------------+---------------+---------------+-----------+


> >
> > Old:
> > +-------------------------------+---------------+---------------+---------------+------------
> ---+---------------+-----------+
> > |Node                           |calls          |objs           |realloc_count  |objs/call
> |objs/sec(10E6) |cycles/call|
> > +-------------------------------+---------------+---------------+---------------+------------
> ---+---------------+-----------+
> > |test_graph_perf_worker-0-0     |10175176       |2604845056     |1
> |256.000        |2015.394304    |27.0000    |
> > |test_graph_perf_worker-1-0     |10175542       |2604938752     |1
> |256.000        |2015.488000    |28.0000    |
> > |test_graph_perf_worker-2-0     |10175565       |2604944640     |1
> |256.000        |2015.493888    |28.0000    |
> > |test_graph_perf_worker-3-0     |10175593       |2604951808     |1
> |256.000        |2015.501056    |27.0000    |
> > |test_graph_perf_source-0       |10175623       |2604959488     |2
> |256.000        |2015.508480    |27.0000    |
> > |test_graph_perf_sink-0         |10175642       |2604964352     |1
> |256.000        |2015.513600    |27.0000    |
> > +-------------------------------+---------------+---------------+---------------+------------
> ---+---------------+-----------+
> >
> > New:
> > +-------------------------------+---------------+---------------+---------------+------------
> ---+---------------+-----------+
> > |Node                           |calls          |objs           |realloc_count  |objs/call
> |objs/sec(10E6) |cycles/call|
> > +-------------------------------+---------------+---------------+---------------+------------
> ---+---------------+-----------+
> > |test_graph_perf_worker-0-0     |10154953       |2599667968     |1
> |256.000        |2010.960128    |27.0000    |
> > |test_graph_perf_worker-1-0     |10155316       |2599760896     |1
> |256.000        |2011.053056    |27.0000    |
> > |test_graph_perf_worker-2-0     |10155338       |2599766528     |1
> |256.000        |2011.058688    |28.0000    |
> > |test_graph_perf_worker-3-0     |10155357       |2599771392     |1
> |256.000        |2011.063552    |28.0000    |
> > |test_graph_perf_source-0       |10155394       |2599780864     |2
> |256.000        |2011.072768    |27.0000    |
> > |test_graph_perf_sink-0         |10155422       |2599788032     |1
> |256.000        |2011.080192    |27.0000    |
> > +-------------------------------+---------------+---------------+---------------+------------
> ---+---------------+-----------+
> >
> > > This may not be needed for l3fwd
> > >
> > Do you mean graph->model?
> 
> Yes.
> 
> >
> > > or if there is not much code duplication,
> > >
> > > Do the following remove the limitation,  #define
> > > RTE_GRAPH_MODEL_SELECT RTE_GRAPH_MODEL_RTC.
> > >
> > > graph_main_loop change to graph_main_rtc_loop
> > >
> > >  #define RTE_GRAPH_MODEL_SELECT GRAPH_MODEL_MCORE_DISPATCH
> > >
> > > graph_main_loop change to graph_main_mcore_loop
> > >
> > > Select the following based on runtime option
> > >         /* Launch per-lcore init on every worker lcore */
> > >         rte_eal_mp_remote_launch(graph_main_rtc_loop, NULL, SKIP_MAIN);
> or
> > >         rte_eal_mp_remote_launch(graph_main_mcore_loop, NULL,
> > > SKIP_MAIN);
> > >
> >
> > We want to 1. Use same API (rte_graph_walk()) for diff models.
> > 2. no performance drop for rtc (use RTE_GRAPH_MODEL_SELECT in compile
> > time)
> >
> > If I understand correctly, I need remove graph->model and only use
> > RTE_GRAPH_MODEL_SELECT to select models?
> >
> > And change it as
> > graph_main_rtc_loop()
> > {
> >   While(1)
> >     rte_graph_walk_rtc()
> > }
> >
> > But actually, I think graph->model is need, especially for config
> > stage and for runtime config If set RTE_GRAPH_MODEL_SELECT_RUNTIME.
> 
> Yes. Agree. If there is no MAJOR performance issues lets use
> RTE_GRAPH_MODEL_SELECT_RUNTIME for l3fwd.
> 

Thanks, there is no major performance issues. I will keep to use current
scheme.

> > We need the model type to decide to alloc workqueue and use
> > RTE_GRAPH_MODEL_SELECT to choose the walk.
> >
> >
> > > >         memset(&rewrite_data, 0, sizeof(rewrite_data));
> > > >         rewrite_len = sizeof(rewrite_data); diff --git
> > > > a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> > > > index 541c373cb1..19b4c1514f 100644
> > > > --- a/lib/graph/rte_graph_worker.h
> > > > +++ b/lib/graph/rte_graph_worker.h
> > > > @@ -26,6 +26,9 @@ __rte_experimental  static inline void
> > > > rte_graph_walk(struct rte_graph *graph)  {
> > > > +#define RTE_GRAPH_MODEL_RTC 0
> > > > +#define RTE_GRAPH_MODEL_MCORE_DISPATCH 1
> > >
> > > No need for duplicate enum. Please remove enum make  this as in
> > > public header file.
> > >
> > Yes, it will cause no defined warnings.
> > Thanks for your comments.
> > I will remove enum and define model type macros in public header. And
> > also change the related structs/APIs.
> 
> Also add a comment in RTE_GRAPH_MODEL_MCORE_DISPATCH, If adding new
> entry, then update graph_is_valid API.
> 
Got it. Thanks very much.

> >
> > >
> > > > +
> > >
> > > Add comment here, On how  application uses this, aka.  before
> > > inlcuding the worker header file #define RTE_GRAPH_MODEL_SELECT
> > > RTE_GRAPH_MODEL_RTC.
> > > Please change the text as needed.
> > Yes, I will add comment and add the usage in document.
> >
> > >
> > >
> > > >  #if !defined(RTE_GRAPH_MODEL_SELECT) ||
> RTE_GRAPH_MODEL_SELECT ==
> > > RTE_GRAPH_MODEL_RTC
> > > >         rte_graph_walk_rtc(graph);  #elif RTE_GRAPH_MODEL_SELECT
> > > > ==
> > > RTE_GRAPH_MODEL_MCORE_DISPATCH
> > > > --
> > > > 2.37.2
> > > >
Previous message (by thread): [PATCH v7 15/17] examples/l3fwd-graph: introduce multicore dispatch worker model
Next message (by thread): [PATCH v7 16/17] test/graph: add functional tests for mcore dispatch model
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list