[PATCH] failsafe: fix segfault on hotplug event

Konstantin Ananyev konstantin.ananyev at huawei.com
Fri Nov 18 17:31:34 CET 2022



Hi Luc,
 
> > Hi Konstantin,
> >
> > > It is not recommended way to update rte_eth_fp_ops[] contents directly.
> > > There are eth_dev_fp_ops_setup()/ eth_dev_fp_ops_reset() that supposed
> > > to be used for that.
> >
> > Good to know. I see another fix that was made in a different PMD that
> > does exactly the same thing:
> >
> > https://github.com/DPDK/dpdk/commit/bcd68b68415172815e55fc67cf3947c0433baf74
> >
> > CC'ing the authors for awareness.
> >
> > > About the fix itself - while it might help till some extent,
> > > I think it will not remove the problem completely.
> > > There still remain a race-condition between rte_eth_rx_burst() and failsafe_eth_rmv_event_callback().
> > > Right now DPDK doesn't support switching PMD fast-ops functions (or updating rxq/txq data)
> > > on the fly.
> >
> > Thanks for the information. This is very helpful.
> >
> > Are you saying that the previous code also had that same race
> > condition?

Yes, I believe so. 

> It was only updating the rte_eth_dev structure, but I
> > assume the problem would have been the same since rte_eth_rx_burst()
> > in DPDK versions <=20 use the function pointers in rte_eth_dev, not
> > rte_eth_fp_ops.
> >
> > Can you think of a possible solution to this problem? I'm happy to
> > provide a patch to properly fix the problem. Having your guidance
> > would be extremely helpful.
> >
> > Thanks!
> 
> Changing burst mode on a running device is not safe because
> of lack of locking and/or memory barriers.
> 
> Would have been better to not to do this optimization.
> Just have one rx_burst/tx_burst function and look at what
> ever conditions are present there.

I think Stephen is right - within DPDK it is just not possible to switch RX/TX
function on the fly (without some external synchronization).
So the safe way is to always use safe version of RX/TX call.
I personally don't think such few extra checks will affect performance that much.

As another nit: inside failsafe rx_burst functions it is probably better not to access dev->data->rx_queues directly,
but call rte_eth_rx_burst(sdev->sdev_port_id, ...); instead.
Same for TX.

Thanks
Konstantin

 




More information about the stable mailing list