[dpdk-dev] [PATCH] ethdev: fix device state on close
Gaëtan Rivet
gaetan.rivet at 6wind.com
Fri Aug 18 11:52:50 CEST 2017
On Thu, Aug 17, 2017 at 06:04:27AM +0000, Shahaf Shuler wrote:
> Wednesday, August 16, 2017 6:26 PM, Gaëtan Rivet:
> > > Even though it is reasonable for driver to call the
> > rte_eth_dev_port_release, I still think the ethdev layer should protect from
> > such bad behavior from the application side.
> > > It is more robust than counting on the different PMD to implement such
> > release.
> > >
> >
> > The ethdev layer does so in the rte_eth_dev_detach function, which is
> > currently the only way to detach a device from the ethdev level.
> >
> > `rte_eth_dev_detach` setting the device state seems to me to be a crutch
> > against badly implemented drivers. If I was nitpicky I would actually remove it
> > and ideally everyone should enforce the call of rte_eth_dev_release_port
> > from device removal functions when reviewing PMD implementations.
> >
> > The hotplug API is available to applications and the only way to have
> > consistent devices states after calling rte_eal_hotplug_remove is to have
> > drivers using a hook in the ethdev layer allowing to clean-up resources upon
> > release. While it is trivial in its current state, such entry-point in the ethdev
> > layer will be useful and should be kept and enforced IMO.
> >
> > I'm thinking about the 16-bit portid scheduled for v17.11, which implies an
> > arbitrary number of port available. This would imply a dynamic allocation of
> > rte_eth_devices, which would *need* such release hook to be available.
> > Well the API should not be designed from conjectures or speculations of
> > course, but I think it should be useful and there is no reason to remove it.
> >
> > Going further, I find it dangerous to have two points where the port is
> > semantically released (device state set to UNUSED). If the API of the port
> > release changes, we now have two points where we need to enforce the
> > changes. While trivial concerning an enum, it could become more complex /
> > dangerous if this veered toward memory management.
>
> Those are valid concerns, and you convinced me the RTE_ETH_DEV_UNUSED cannot be set after device close.
>
> I still think the ethdev layer missing protection against driver calls (other than detach) following a device close. The API not allows, but the ethdev should enforce it.
> Considering some driver memset to 0 their control structs after device close, with no protection such bad calls might lead to segfault, which is not what we want even under bad behavior.
>
> One could suggest to protect it inside the driver by replacing the vtable of the driver ops, however it will introduce a lot of duplicated code which can easily be avoided if ethdev would not pass such
> Calls after device close.
>
> So, how about adding another state which will indicate the device is closed, cannot be used, yet still attached?
>
It's feasible. It might be useful. Adding finer device states seems
logical to me, I made those explicit in the fail-safe.
However, in the fail-safe I had the need to make automatic the handling
of devices, meaning that these device states had to be properly
abstracted and parseable.
Adding finer device states in ethdev:
Pros:
- Easier to follow API and to spot usage mistakes.
- Increasing the rigidity of the API might make apparent mistakes from
PMD implementations. Helps PMD maintainers.
Cons:
- Some of those "mistakes" might be due to hardware-specific
limitations that cannot be bypassed. I'm thinking about the
flow-isolation implementation in MLX4 that had specific requirements
about when it could be enabled, that would actually be forbidden by
this proposal as it had to be done before rte_eth_dev_configure.
- Handling special cases could quickly become a mess of edge-cases
with workarounds here and there. Making the API evolve would imply
untangling this mess at times.
I'm considering there a full-blown device state implementation. I know
you are only proposing adding an intermediate device state allowing to
protect the user from using a closed device.
But this proposal should be examined on the direction it would lead us
into. It might be simpler for example to introduce a flag "dev_closed"
in rte_eth_dev_data (along dev_started) for example, and checking only
on this in the relevant functions.
I don't have a strong opinion about the finer-grained device states.
I only wanted to lay out what I saw as possible longer-term effects of
choosing this solution and a possible alternative.
--
Gaëtan Rivet
6WIND
More information about the dev
mailing list