[dpdk-dev] [PATCH] ethdev: fix device state on close

Gaëtan Rivet gaetan.rivet at 6wind.com
Fri Aug 18 11:52:50 CEST 2017


On Thu, Aug 17, 2017 at 06:04:27AM +0000, Shahaf Shuler wrote:
> Wednesday, August 16, 2017 6:26 PM, Gaëtan Rivet:
> > > Even though it is reasonable for driver to call the
> > rte_eth_dev_port_release, I still think the ethdev layer should protect from
> > such bad behavior from the application side.
> > > It is more robust than counting on the different PMD to implement such
> > release.
> > >
> > 
> > The ethdev layer does so in the rte_eth_dev_detach function, which is
> > currently the only way to detach a device from the ethdev level.
> > 
> > `rte_eth_dev_detach` setting the device state seems to me to be a crutch
> > against badly implemented drivers. If I was nitpicky I would actually remove it
> > and ideally everyone should enforce the call of rte_eth_dev_release_port
> > from device removal functions when reviewing PMD implementations.
> > 
> > The hotplug API is available to applications and the only way to have
> > consistent devices states after calling rte_eal_hotplug_remove is to have
> > drivers using a hook in the ethdev layer allowing to clean-up resources upon
> > release. While it is trivial in its current state, such entry-point in the ethdev
> > layer will be useful and should be kept and enforced IMO.
> > 
> > I'm thinking about the 16-bit portid scheduled for v17.11, which implies an
> > arbitrary number of port available. This would imply a dynamic allocation of
> > rte_eth_devices, which would *need* such release hook to be available.
> > Well the API should not be designed from conjectures or speculations of
> > course, but I think it should be useful and there is no reason to remove it.
> > 
> > Going further, I find it dangerous to have two points where the port is
> > semantically released (device state set to UNUSED). If the API of the port
> > release changes, we now have two points where we need to enforce the
> > changes. While trivial concerning an enum, it could become more complex /
> > dangerous if this veered toward memory management.
> 
> Those are valid concerns, and you convinced me the RTE_ETH_DEV_UNUSED cannot be set after device close.
> 
> I still think the ethdev layer missing protection against driver calls (other than detach) following a device close. The API not allows, but the ethdev should enforce it.
> Considering some driver memset to 0 their control structs after device close, with no protection such bad calls might lead to segfault, which is not what we want even under bad behavior. 
> 
> One could suggest to protect it inside the driver by replacing the vtable of the driver ops, however it will introduce a lot of duplicated code which can easily be avoided if ethdev would not pass such
> Calls after device close.
> 
> So, how about adding another state which will indicate the device is closed, cannot be used, yet still attached? 
> 

It's feasible. It might be useful. Adding finer device states seems
logical to me, I made those explicit in the fail-safe.

However, in the fail-safe I had the need to make automatic the handling
of devices, meaning that these device states had to be properly
abstracted and parseable.

Adding finer device states in ethdev:

Pros:

  - Easier to follow API and to spot usage mistakes.
  - Increasing the rigidity of the API might make apparent mistakes from
    PMD implementations. Helps PMD maintainers.

Cons:

  - Some of those "mistakes" might be due to hardware-specific
    limitations that cannot be bypassed. I'm thinking about the
    flow-isolation implementation in MLX4 that had specific requirements
    about when it could be enabled, that would actually be forbidden by
    this proposal as it had to be done before rte_eth_dev_configure.
  - Handling special cases could quickly become a mess of edge-cases
    with workarounds here and there. Making the API evolve would imply
    untangling this mess at times.

I'm considering there a full-blown device state implementation. I know
you are only proposing adding an intermediate device state allowing to
protect the user from using a closed device.

But this proposal should be examined on the direction it would lead us
into. It might be simpler for example to introduce a flag "dev_closed"
in rte_eth_dev_data (along dev_started) for example, and checking only
on this in the relevant functions.

I don't have a strong opinion about the finer-grained device states.
I only wanted to lay out what I saw as possible longer-term effects of
choosing this solution and a possible alternative.

-- 
Gaëtan Rivet
6WIND


More information about the dev mailing list