[dpdk-dev] [PATCH v6 4/6] ethdev: adjust APIs removal error report

Matan Azrad matan at mellanox.com
Sat Jan 20 20:04:21 CET 2018


Hi all

From: Thomas Monjalon, Friday, January 19, 2018 8:17 PM
> 19/01/2018 19:13, Ferruh Yigit:
> > On 1/19/2018 5:54 PM, Thomas Monjalon wrote:
> > > 19/01/2018 17:19, Ferruh Yigit:
> > >> On 1/18/2018 6:10 PM, Matan Azrad wrote:
> > >>> From: Ferruh Yigit, Thursday, January 18, 2018 7:31 PM
> > >>>> This patch updates *all* ethdev public APIs to add if device is
> > >>>> removed check?
> > >>>
> > >>> Yes.
> > >>>
> > >>>> And each check goes to ethdev is_removed() dev_ops to ask if dev
> > >>>> is removed.
> > >>> Probably, if the REMOVED state setted in will not call device
> is_remove.
> > >>>
> > >>>> These must be better way of doing this, am I missing something.
> > >>>
> > >>> Suggest.
> > >>
> > >> With a silly analogy, this is like a blind person asking each time
> > >> if he is dead before talking to other person.

Just to accurate the analogy:)
This is like a blind person(application,ethdev) using its guide dog(ethdev device), every time the dog refuses to take action (error occurred), the blind person asks if the dog can be a guide dog anymore(removal error). 

> > >> At first glance I can think of a kind of watchdog timer can be
> > >> implemented in ethdev layer. It provides periodic checks and if
> > >> device is dead it calls the registered user callback function.
> > >>
> > >> This method presented as synchronous method but not triggered from
> > >> side where event happens, I mean not triggered from PMD but from
> application.
> > >> So does application doing polling continuously if device is dead?
> > >> Or if application is relying this patch to add a check in each API,
> > >> what happens if device removed during data processing, will app rely on
> asynchronous method?
> > >
> > > We cannot put a mutex on hardware removal :) So we have to live with
> > > errors du to removal.
> > > If we are trying to configure a removed device, the error will be
> > > not related to the root cause.
> > > This patch is just trying to improve the situation by returning an
> > > appropriate error code if removal can be detected.
> > > Note: the check is run only if there is an error and if the removal
> > > is not already detected.
> > >
> > > If I understand well, you prefer relying only on asynchronous
> > > hotplug events? Even if they come really late?
> >
> > I think asynchronous hotplug events are better approach, but if they
> > are late I understand you need a solution.
> >
> > I assume issue is when device is removed, application can make API
> > calls until it detects the removal.
> >
> > For that case what do you think instead of ethdev abstraction layer
> > doing a translation, define a specific error type and return it from
> > PMDs to indicate that error is related to the missing device and
> > application will know device is no more there? And consistently use that
> error type in all PMDs.
> 
> Yes it is also a good solution.
> I think Matan proposed to integrate it in ethdev to avoid duplicating the
> same mechanism in every PMDs.
> 

Yes, as a lot of ethdev API code pieces do.

> > Or what do you think above suggested flexible watchdog timer
> > implementation in ethdev, so applications may be informed about missing
> device faster.
> 
> I think the watchdog would compete with hotplug events.
> So I prefer not integrating one more asynchronous mechanism for the same
> purpose.
> If we want polling, it can be an option to enable in EAL hotplug.

Konstantin wrote in another thread:
>+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
>+
>+	dev = &rte_eth_devices[port_id];
>+
>+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);

> I'd says these 2 checks have to be swapped.

Konstantin, Please explain why.



More information about the dev mailing list