[dpdk-dev] Re: [PATCH v1 1/2] app/test: remove unnecessary barriers for ring stress test

Feifei Wang Feifei.Wang2 at arm.com
Mon Feb 1 09:48:44 CET 2021


Sorry, there is a mistake in the picture: after wrk_cmd == WRK_CMD_RUN, it should be an rmb rather than a wmb.
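
To make the intended pairing concrete, here is a minimal standalone sketch. It is not the ring stress test code: it uses plain pthreads, and C11 release/acquire fences stand in for rte_smp_wmb()/rte_smp_rmb() (they are at least as strong as store-store and load-load ordering). The names shared_arg and run_barrier_demo are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the test's names; not the DPDK code. */
enum { WRK_CMD_STOP, WRK_CMD_RUN };

static atomic_int wrk_cmd = WRK_CMD_STOP;
static int shared_arg;	/* plain data initialised before the signal */

static void *worker(void *out)
{
	/* spin until the main thread signals RUN */
	while (atomic_load_explicit(&wrk_cmd, memory_order_relaxed) !=
			WRK_CMD_RUN)
		;
	/* read side (rmb stand-in): order the flag load before later loads */
	atomic_thread_fence(memory_order_acquire);
	*(int *)out = shared_arg;
	return NULL;
}

int run_barrier_demo(void)
{
	pthread_t t;
	int seen = 0;

	atomic_store(&wrk_cmd, WRK_CMD_STOP);
	shared_arg = 0;
	pthread_create(&t, NULL, worker, &seen);

	/* init phase happens after the worker already exists */
	shared_arg = 42;
	/* write side (wmb stand-in): init stores are ordered before the flag */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&wrk_cmd, WRK_CMD_RUN, memory_order_relaxed);

	pthread_join(t, NULL);
	return seen;
}
```

Calling run_barrier_demo() returns the value the worker observed in shared_arg. The barrier on the main side sits before the store to wrk_cmd, and the barrier on the worker side sits after the load of wrk_cmd.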

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: January 30, 2021 9:24
> To: Stephen Hemminger <stephen at networkplumber.org>
> Cc: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Feifei Wang
> <Feifei.Wang2 at arm.com>; dev at dpdk.org; nd <nd at arm.com>; Ruifeng
> Wang <Ruifeng.Wang at arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> Subject: RE: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary barriers
> for ring stress test
> 
> <snip>
> 
> > >
> > > >
> > > > > >
> > > > > > Hi Feifei,
> > > > > >
> > > > > > >
> > > > > > > > The variable "wrk_cmd" is a signal that controls when the
> > > > > > > > worker threads run and stop. When worker lcores load
> > > > > > > > "wrk_cmd == WRK_CMD_RUN", they start running, and when
> > > > > > > > they load "wrk_cmd == WRK_CMD_STOP", they stop.
> > > > > > > >
> > > > > > > > For the wmb in test_mt1, there are no store operations
> > > > > > > > after the store to "wrk_cmd" whose order must be kept.
> > > > > > > > Thus the wmb is unnecessary.
> > > > > >
> > > > > > > I think there is a bug in my original code: we should do
> > > > > > > smp_wmb() *before* setting wrk_cmd, not after:
> > > > > >
> > > > > >         /* launch on all workers */
> > > > > >         RTE_LCORE_FOREACH_WORKER(lc) {
> > > > > >                 arg[lc].rng = r;
> > > > > >                 arg[lc].stats = init_stat;
> > > > > >                 rte_eal_remote_launch(test, &arg[lc], lc);
> > > > > >         }
> > > > > >
> > > > > >         /* signal worker to start test */
> > > > > > +      rte_smp_wmb();
> > > > > >         wrk_cmd = WRK_CMD_RUN;
> > > > > > -       rte_smp_wmb();
> > > > > >
> > > > > >         usleep(run_time * US_PER_S);
> > > > > >
> > > > > >
> > > > > > > I still think we'd better have some synchronisation here.
> > > > > > > Otherwise, what would prevent the compiler and/or CPU from
> > > > > > > updating wrk_cmd out of order (before the _init_ phase is
> > > > > > > completed)?
> > > > > > > We can probably safely assume no reordering from the compiler
> > > > > > > here, as we have function calls straight before and after
> > > > > > > 'wrk_cmd = WRK_CMD_RUN;'.
> > > > > > > But for consistency and easier maintenance, I still think it
> > > > > > > is better to have something here; after all, it is not a
> > > > > > > performance-critical path.
> > > > > > Agree that this is not performance critical.
> > > > > >
> > > > > > This is more about correctness (as people usually refer to code
> > > > > > to understand the concepts). You can refer to video [1].
> > > > > > Essentially, pthread_create has 'happens-before' behavior,
> > > > > > i.e. all the memory operations before the pthread_create call
> > > > > > are visible to the new thread. The rte_smp_rmb() barrier in the
> > > > > > thread function is not required, as it reads the data that was
> > > > > > set before the thread was launched.
> > > >
> > > > > rte_eal_remote_launch() doesn't call pthread_create().
> > > > > All it does is update a global variable (lcore_config) and
> > > > > write to/read from the pipe.
> > > >
> > > > Thanks for the reminder ☹
> > > > I think rte_eal_remote_launch and rte_eal_wait_lcore need to provide
> > > > behavior similar to pthread_create and pthread_join respectively.
> > > >
> > > > There is use of rte_smp_*mb in those functions as well. Those need
> > > > to be fixed first, and then we can look at these.
> >
> > Looks like you want __atomic_thread_fence() here.
> >
> In the rte_eal_remote_launch case, all the memory operations before the
> API call need to be visible to the worker. If this is the only requirement, we
> can use the function pointer as the guard variable and use store-release. In
> the eal_thread_loop function we could do load-acquire on the function
> pointer.
> 
> I do not think that there is a requirement to ensure that the memory
> operations after the API call do not happen before the worker thread starts
> running the function. There is no guarantee on when the worker thread
> will run; if the main thread needs to know whether the worker thread is
> running, explicit hand-shaking needs to happen.
> 
> The rte_eal_wait_lcore API needs to ensure that the memory operations in
> the worker are visible to the main. rte_eal_wait_lcore and eal_thread_loop
> are synchronizing using lcore_config[worker_id].state. I need to understand
> what else 'state' is used for. If there are no issues, we can do a store-release
> on 'state' in eal_thread_loop and a load-acquire in rte_eal_wait_lcore.
> 
> So, we do not have to use the __atomic_thread_fence.
> 
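
For reference, the store-release / load-acquire idea described above could look roughly like the sketch below. This is only an illustration using plain pthreads and C11 atomics, not the EAL code: struct worker_slot, worker_loop, add_one and run_launch_demo are hypothetical names, and the slot is a much simplified stand-in for lcore_config. The function pointer guards the launch direction and a separate 'done' flag guards the wait direction.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*lcore_fn_t)(void *);

/* Hypothetical, simplified per-worker slot; not DPDK's lcore_config. */
struct worker_slot {
	_Atomic(lcore_fn_t) f;	/* guard variable: NULL means nothing to run */
	void *arg;		/* plain data published by the store-release */
	int ret;		/* plain data published by the worker */
	atomic_int done;	/* guard variable for completion */
};

static struct worker_slot slot;

static void *worker_loop(void *unused)
{
	lcore_fn_t f;

	(void)unused;
	/* load-acquire: once f is seen, arg written before it is visible */
	while ((f = atomic_load_explicit(&slot.f,
			memory_order_acquire)) == NULL)
		;
	slot.ret = f(slot.arg);
	/* store-release: ret is visible to whoever load-acquires done */
	atomic_store_explicit(&slot.done, 1, memory_order_release);
	return NULL;
}

static int add_one(void *arg)
{
	return *(int *)arg + 1;
}

int run_launch_demo(void)
{
	pthread_t t;
	int input, ret;

	slot.f = NULL;
	slot.done = 0;
	pthread_create(&t, NULL, worker_loop, NULL);

	/* "remote launch": write the argument, then publish the function */
	input = 41;
	slot.arg = &input;
	atomic_store_explicit(&slot.f, add_one, memory_order_release);

	/* "wait lcore": load-acquire on the completion flag */
	while (!atomic_load_explicit(&slot.done, memory_order_acquire))
		;
	ret = slot.ret;

	pthread_join(t, NULL);
	return ret;
}
```

Calling run_launch_demo() returns the worker's result. No standalone fence is used in either direction; the ordering comes from the release store and acquire load on the two guard variables.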


