[dpdk-dev] 回复: [PATCH v1 1/2] app/test: remove unnecessary barriers for ring stress test

Feifei Wang Feifei.Wang2 at arm.com
Mon Feb 1 10:07:54 CET 2021


Hi, everyone

Sorry for that there may be a problem in the e-mail format. 
Please see the picture according to the following link:
https://ibb.co/SQ7yGfW

Best Regards
Feifei

> -----邮件原件-----
> 发件人: Feifei Wang
> 发送时间: 2021年2月1日 16:49
> 收件人: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; Stephen
> Hemminger <stephen at networkplumber.org>
> 抄送: Ananyev, Konstantin <konstantin.ananyev at intel.com>;
> dev at dpdk.org; nd <nd at arm.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>; nd <nd at arm.com>
> 主题: 回复: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary
> barriers for ring stress test
> 
> Sorry, a mistake happens in the picture, after Wrk_cmd == WRK_CMD_RUN,
> it should be a rmb rather than wmb.
> 
> > -----邮件原件-----
> > 发件人: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> > 发送时间: 2021年1月30日 9:24
> > 收件人: Stephen Hemminger <stephen at networkplumber.org>
> > 抄送: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Feifei Wang
> > <Feifei.Wang2 at arm.com>; dev at dpdk.org; nd <nd at arm.com>; Ruifeng
> Wang
> > <Ruifeng.Wang at arm.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> > 主题: RE: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary
> > barriers for ring stress test
> >
> > <snip>
> >
> > > >
> > > > >
> > > > > > >
> > > > > > > Hi Feifei,
> > > > > > >
> > > > > > > >
> > > > > > > > The variable "wrk_cmd" is a signal to control threads from
> > > > > > > > running and stopping. When worker lcores load "wrk_cmd ==
> > > > > WRK_CMD_RUN",
> > > > > > > > they
> > > > > > > start
> > > > > > > > running and when worker lcores load "wrk_cmd ==
> > > > > > > > WRK_CMD_STOP",
> > > > > > > they
> > > > > > > > stop.
> > > > > > > >
> > > > > > > > For the wmb in test_mt1, no storing operations must keep
> > > > > > > > the order after storing "wrk_cmd". Thus the wmb is
> unnecessary.
> > > > > > >
> > > > > > > I think there is a bug in my original code, we should do
> > > > > > > smp_wmb()
> > > > > > > *before* setting wrk_cmd, not after:
> > > > > > >
> > > > > > >         /* launch on all workers */
> > > > > > >         RTE_LCORE_FOREACH_WORKER(lc) {
> > > > > > >                 arg[lc].rng = r;
> > > > > > >                 arg[lc].stats = init_stat;
> > > > > > >                 rte_eal_remote_launch(test, &arg[lc], lc);
> > > > > > >         }
> > > > > > >
> > > > > > >         /* signal worker to start test */
> > > > > > > +      rte_smp_wmb();
> > > > > > >         wrk_cmd = WRK_CMD_RUN;
> > > > > > > -       rte_smp_wmb();
> > > > > > >
> > > > > > >         usleep(run_time * US_PER_S);
> > > > > > >
> > > > > > >
> > > > > > > I still think we'd better have some synchronisation here.
> > > > > > > Otherwise what would prevent compiler and/or cpu to update
> > > > > > > wrk_cmd out of order (before _init_ phase is completed)?
> > > > > > > We probably can safely assume no reordering from the
> > > > > > > compiler here, as we have function calls straight before and
> > > > > > > after 'wrk_cmd =
> > > > > WRK_CMD_RUN;'
> > > > > > > But for consistency and easier maintenance, I still think it
> > > > > > > is better to have something here, after all it is not
> > > > > > > performance critical
> > > pass.
> > > > > > Agree that this is not performance critical.
> > > > > >
> > > > > > This is more about correctness (as usually people refer to
> > > > > > code to understand the concepts). You can refer to video [1].
> > > > > > Essentially, the pthread_create has 'happens-before' behavior.
> > > > > > i.e. all the memory operations before the pthread_create are
> > > > > > visible to the new
> > > thread.
> > > > > > The
> > > > > > rte_smp_rmb() barrier in the thread function is not required
> > > > > > as it reads the
> > > > > data that was set before the thread was launched.
> > > > >
> > > > > rte_eal_remote_launch() doesn't call pthread_create().
> > > > > All it does -  updates global variable (lcore_config) and
> > > > > writes/reads to/from the pipe.
> > > > >
> > > > Thanks for the reminder ☹
> > > > I think rte_eal_remote_launch and rte_eal_wait_lcore need to
> > > > provide
> > > behavior similar to pthread_launch and pthread_join respectively.
> > > >
> > > > There is use of rte_smp_*mb in those functions as well. Those need
> > > > to be fixed
> > > first and then look at these.
> > >
> > > Looks like you want __atomic_thread_fence() here.
> > >
> > In the rte_eal_remote_launch case, all the memory operations before
> > the API call need to be visible to the worker. If this is the only
> > requirement, we can use the function pointer as the guard variable and
> > use store-release. In the eal_thread_loop function we could do
> > load-acquire on the function pointer.
> >
> > I do not think that there is a requirement to ensure that the memory
> > operations after the API call do not happen before the worker thread
> > starts running the function (As there is no guarantee on when the
> > worker thread will run. If the main thread needs to know if the worker
> > thread is running explicit hand-shaking needs to happen).
> >
> > The rte_eal_wait_lcore API needs to ensure that the memory operations
> > in the worker are visible to the main. rte_eal_wait_lcore and
> > eal_thread_loop are synchronizing using lcore_config[worker_id].state.
> > I need to understand what else 'state' is used for. If there are no
> > issues, we can do a store-release on 'state' in eal_thread_loop and a load-
> acquire in rte_eal_wait_lcore.
> >
> > So, we do not have to use the __atomic_thread_fence.
> >



More information about the dev mailing list