[PATCH 2/2] ethdev: fix race condition in fast-path ops setup
Ruifeng Wang
Ruifeng.Wang at arm.com
Wed Feb 22 10:41:01 CET 2023
> -----Original Message-----
> From: fengchengwen <fengchengwen at huawei.com>
> Sent: Wednesday, February 22, 2023 9:07 AM
> To: Stephen Hemminger <stephen at networkplumber.org>; Ruifeng Wang <Ruifeng.Wang at arm.com>
> Cc: Ashok Kaladi <ashok.k.kaladi at intel.com>; jerinj at marvell.com; thomas at monjalon.net;
> Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; dev at dpdk.org;
> s.v.naga.harish.k at intel.com; erik.g.carrillo at intel.com; abhinandan.gujjar at intel.com;
> stable at dpdk.org; nd <nd at arm.com>
> Subject: Re: [PATCH 2/2] ethdev: fix race condition in fast-path ops setup
>
> On 2023/2/22 1:00, Stephen Hemminger wrote:
> > On Tue, 21 Feb 2023 07:24:19 +0000
> > Ruifeng Wang <Ruifeng.Wang at arm.com> wrote:
> >
> >>> -----Original Message-----
> >>> From: fengchengwen <fengchengwen at huawei.com>
> >>> Sent: Monday, February 20, 2023 2:58 PM
> >>> To: Ashok Kaladi <ashok.k.kaladi at intel.com>; jerinj at marvell.com;
> >>> thomas at monjalon.net
> >>> Cc: dev at dpdk.org; s.v.naga.harish.k at intel.com;
> >>> erik.g.carrillo at intel.com; abhinandan.gujjar at intel.com;
> >>> stable at dpdk.org; Ruifeng Wang <Ruifeng.Wang at arm.com>
> >>> Subject: Re: [PATCH 2/2] ethdev: fix race condition in fast-path ops
> >>> setup
> >>>
> >>> On 2023/2/20 14:08, Ashok Kaladi wrote:
> >>>> If ethdev enqueue or dequeue function is called during
> >>>> eth_dev_fp_ops_setup(), it may get pre-empted after setting the
> >>>> function pointers, but before setting the pointer to port data.
> >>>> In this case the newly registered enqueue/dequeue function will use
> >>>> dummy port data and end up in seg fault.
> >>>>
> >>>> This patch moves the updation of each data pointers before updating
> >>>> corresponding function pointers.
> >>>>
> >>>> Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate
> >>>> structure")
> >>>> Cc: stable at dpdk.org
> >
> > Why is something calling enqueue/dequeue when device is not fully started.
> > A correctly written application would not call rx/tx burst until after
> > ethdev start had finished.
>
> Please refer the eb0d471a894 (ethdev: add proactive error handling mode), when driver
> recover itself, the application may still invoke enqueue/dequeue API.
>
> >
> > Would something like this work better?
> >
> > Note: there is another bug in current code. The check for link state
> > interrupt and link_ops could return -ENOTSUP and leave device in indeterminate state.
> > The check should be done before calling PMD.
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 0266cc82acb6..d6c163ed85e7 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1582,6 +1582,14 @@ rte_eth_dev_start(uint16_t port_id)
> > return 0;
> > }
> >
> > + if (dev->data->dev_conf.intr_conf.lsc == 0 &&
> > + dev->dev_ops->link_update == NULL) {
> > + RTE_ETHDEV_LOG(INFO,
> > + "Device with port_id=%"PRIu16" link update not supported\n",
> > + port_id);
> > + return -ENOTSUP;
> > + }
> > +
> > ret = rte_eth_dev_info_get(port_id, &dev_info);
> > if (ret != 0)
> > return ret;
> > @@ -1591,9 +1599,7 @@ rte_eth_dev_start(uint16_t port_id)
> > eth_dev_mac_restore(dev, &dev_info);
> >
> > diag = (*dev->dev_ops->dev_start)(dev);
> > - if (diag == 0)
> > - dev->data->dev_started = 1;
> > - else
> > + if (diag != 0)
> > return eth_err(port_id, diag);
> >
> > ret = eth_dev_config_restore(dev, &dev_info, port_id); @@ -1611,16
> > +1617,18 @@ rte_eth_dev_start(uint16_t port_id)
> > return ret;
> > }
> >
> > - if (dev->data->dev_conf.intr_conf.lsc == 0) {
> > - if (*dev->dev_ops->link_update == NULL)
> > - return -ENOTSUP;
> > - (*dev->dev_ops->link_update)(dev, 0);
> > - }
> > -
> > /* expose selection of PMD fast-path functions */
> > eth_dev_fp_ops_setup(rte_eth_fp_ops + port_id, dev);
> >
> > + /* ensure state is set before marking device ready */
> > + rte_smp_wmb();
Without a read barrier at the reader side (rte_eth_rx_burst), the wmb here may not fulfill the required data sync.
One solution is to change eth_dev_fp_ops_reset. Replacing dummy_eth_rx_burst with rte_eth_pkt_burst_dummy and
not touch rxq.data/txq.data. By doing this, the sync requirement between pkt_burst handle and qdata can be removed.
Because rte_eth_pkt_burst_dummy doesn't work on any data.
The downside is loss of error log and stack dump when dummy handle is called.
> > +
> > rte_ethdev_trace_start(port_id);
> > +
> > + /* Update current link state */
> > + if (dev->data->dev_conf.intr_conf.lsc == 0)
> > + (*dev->dev_ops->link_update)(dev, 0);
> > +
> > return 0;
> > }
> >
> >
> > .
> >
More information about the stable
mailing list