[dpdk-dev,v3,1/7] ethdev: fix port data reset timing

Message ID 1516293317-30748-2-git-send-email-matan@mellanox.com
State: Superseded, archived
Delegated to: Ferruh Yigit

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  fail     apply patch file failure

Commit Message

Matan Azrad Jan. 18, 2018, 4:35 p.m. UTC
The rte_eth_dev_data structure is allocated per ethdev port and is used
internally to access the port's data.

rte_eth_dev_attach_secondary() tries to find the port identifier by
comparing the rte_eth_dev_data name field, and may return the identifier
of an invalid port if that port was released by the primary process,
because the port release API doesn't reset the port data.

So it is better to reset the port data at release time instead of at
allocation time.

Move the port data reset to the port release API.

Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process model")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_ether/rte_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
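A minimal sketch of the stale-lookup problem described in the commit message, assuming a heavily simplified model: a fixed-size array stands in for the shared rte_eth_dev_data array, and the "secondary" looks ports up purely by name comparison. All `sketch_*` names are hypothetical and are not DPDK APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of the per-port data array shared by the
 * primary and secondary processes. None of these names are DPDK APIs. */
#define SKETCH_MAX_PORTS 4

struct sketch_port_data {
	char name[64];
	uint16_t port_id;
};

static struct sketch_port_data sketch_shared_data[SKETCH_MAX_PORTS];

/* Primary process: record the name of a newly allocated port. */
static void sketch_allocate(uint16_t port_id, const char *name)
{
	assert(port_id < SKETCH_MAX_PORTS);
	snprintf(sketch_shared_data[port_id].name,
		 sizeof(sketch_shared_data[port_id].name), "%s", name);
	sketch_shared_data[port_id].port_id = port_id;
}

/* Primary process: release a port. reset_on_release == 0 models the old
 * release behaviour (data left as-is); 1 models the behaviour after this patch. */
static void sketch_release(uint16_t port_id, int reset_on_release)
{
	assert(port_id < SKETCH_MAX_PORTS);
	if (reset_on_release)
		memset(&sketch_shared_data[port_id], 0,
		       sizeof(sketch_shared_data[port_id]));
}

/* Secondary process: find a port by name comparison only.
 * Returns the port index, or -1 if no slot carries that name. */
static int sketch_find_by_name(const char *name)
{
	for (int i = 0; i < SKETCH_MAX_PORTS; i++)
		if (strcmp(sketch_shared_data[i].name, name) == 0)
			return i;
	return -1;
}
```

Releasing a port without the reset leaves its name behind, so the name-based lookup still returns that port after the primary has released it; with the reset done at release time, the lookup for the released port fails as intended.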
  

Comments

Thomas Monjalon Jan. 18, 2018, 5 p.m. UTC | #1
18/01/2018 17:35, Matan Azrad:
> [...]

Acked-by: Thomas Monjalon <thomas@monjalon.net>
  
Ananyev, Konstantin Jan. 19, 2018, 12:38 p.m. UTC | #2
> -----Original Message-----
> From: Matan Azrad [mailto:matan@mellanox.com]
> Sent: Thursday, January 18, 2018 4:35 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; stable@dpdk.org
> Subject: [PATCH v3 1/7] ethdev: fix port data reset timing
> 
> [...]
> ---
>  lib/librte_ether/rte_ethdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 7044159..156231c 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -204,7 +204,6 @@ struct rte_eth_dev *
>  		return NULL;
>  	}
> 
> -	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>  	eth_dev = eth_dev_get(port_id);
>  	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
>  	eth_dev->data->port_id = port_id;
> @@ -252,6 +251,7 @@ struct rte_eth_dev *
>  	if (eth_dev == NULL)
>  		return -EINVAL;
> 
> +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
>  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> 
>  	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_DESTROY, NULL);
> --
> 1.8.3.1

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
  
Ferruh Yigit March 5, 2018, 11:24 a.m. UTC | #3
On 1/18/2018 4:35 PM, Matan Azrad wrote:
> [...]
> @@ -252,6 +251,7 @@ struct rte_eth_dev *
>  	if (eth_dev == NULL)
>  		return -EINVAL;
>  
> +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));

Hi Matan,

What most of the vdev release paths do is:

eth_dev = rte_eth_dev_allocated(...);
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);
rte_eth_dev_release_port(eth_dev);

Since eth_dev->data has already been freed at that point, the memset() on it
in rte_eth_dev_release_port() will be a problem.

We don't run the remove path, which is why we didn't hit the issue, but this
seems to be a problem for all virtual PMDs.
Also rte_eth_dev_pci_release() looks problematic now.

Can you please check the issue?
  
Matan Azrad March 5, 2018, 2:52 p.m. UTC | #4
Hi

From: Ferruh Yigit, Sent: Monday, March 5, 2018 1:24 PM
> On 1/18/2018 4:35 PM, Matan Azrad wrote:
> > [...]
> > +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> 
> Hi Matan,
> 
> What most of the vdev release path does is:
> 
> eth_dev = rte_eth_dev_allocated(...)
> rte_free(eth_dev->data->dev_private);
> rte_free(eth_dev->data);
> rte_eth_dev_release_port(eth_dev);
> 
> Since eth_dev->data freed, memset() it in rte_eth_dev_release_port() will
> be problem.
> 
> We don't run remove path that is why we didn't hit the issue but this seems
> problem for all virtual PMDs.


Yes, it is a problem and should be fixed.
For vdevs that use their own private rte_eth_dev_data, the remove order can be:
	private_data = eth_dev->data;
	rte_free(eth_dev->data->dev_private);
	rte_eth_dev_release_port(eth_dev); /* The last operation working on the ethdev structure. */
	rte_free(private_data);

> Also rte_eth_dev_pci_release() looks problematic now.

Yes. Again, rte_eth_dev_release_port() should be the last operation working on the ethdev structure.

So all the vdevs and the rte_eth_dev_pci_release() function need to be fixed.

Any comments?
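The remove order proposed above can be sketched in plain C. This is a simplified model, not driver code: the `sketch_*` types and functions are hypothetical stand-ins for rte_eth_dev, rte_eth_dev_data and rte_eth_dev_release_port(), and malloc()/free() stand in for rte_malloc()/rte_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for rte_eth_dev / rte_eth_dev_data;
 * these are not the real DPDK definitions. */
struct sketch_dev_data {
	uint16_t port_id;
	void *dev_private;
};

struct sketch_eth_dev {
	struct sketch_dev_data *data;
	int in_use;
};

/* Models the patched rte_eth_dev_release_port(): it writes to *dev->data,
 * so the data block must still be allocated when this is called. */
static int sketch_release_port(struct sketch_eth_dev *dev)
{
	if (dev == NULL || dev->data == NULL)
		return -1;
	memset(dev->data, 0, sizeof(*dev->data));
	dev->in_use = 0;
	return 0;
}

/* vdev remove path in the proposed order: free the private area, release
 * the port as the last operation on the ethdev, then free the data block
 * through a pointer saved beforehand. */
static int sketch_vdev_remove(struct sketch_eth_dev *dev)
{
	struct sketch_dev_data *data;
	int ret;

	assert(dev != NULL && dev->data != NULL);
	data = dev->data;               /* saved: dev is not touched after release */
	free(data->dev_private);
	ret = sketch_release_port(dev); /* memset() still sees valid memory */
	free(data);                     /* release no longer needs the data block */
	return ret;
}
```

In this model, release clears the contents of the data block but not the `dev->data` pointer itself, so saving the pointer in a local is not strictly required; it simply keeps the ethdev untouched once it has been released.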
  
Ferruh Yigit March 5, 2018, 3:06 p.m. UTC | #5
On 3/5/2018 2:52 PM, Matan Azrad wrote:
> HI
> 
> From: Ferruh Yigit, Sent: Monday, March 5, 2018 1:24 PM
>> On 1/18/2018 4:35 PM, Matan Azrad wrote:
>> [...]
> 
> Yes, it is a problem and should be fixed:
> For vdevs which use private rte_eth_dev_data the remove order can be:
> 	private_data = eth_dev->data;
> 	rte_free(eth_dev->data->dev_private);
> 	rte_eth_dev_release_port(eth_dev); /* The last operation working on ethdev structure. */
> 	rte_free(private_data);

Do we need to save "private_data"?

  
Matan Azrad March 5, 2018, 3:12 p.m. UTC | #6
Hi Ferruh

From: Ferruh Yigit, Sent: Monday, March 5, 2018 5:07 PM
> On 3/5/2018 2:52 PM, Matan Azrad wrote:
> > [...]
> > For vdevs which use private rte_eth_dev_data the remove order can be:
> > 	private_data = eth_dev->data;
> > 	rte_free(eth_dev->data->dev_private);
> > 	rte_eth_dev_release_port(eth_dev); /* The last operation working on ethdev structure. */
> > 	rte_free(private_data);
> 
> Do we need to save "private_data"?

Just to emphasize that the eth_dev structure should no longer be used after rte_eth_dev_release_port().
Maybe in the future rte_eth_dev_release_port() will zero the eth_dev structure too :)

  
Ferruh Yigit March 27, 2018, 10:37 p.m. UTC | #7
On 3/5/2018 3:12 PM, Matan Azrad wrote:
> Hi Ferruh
> 
> From: Ferruh Yigit, Sent: Monday, March 5, 2018 5:07 PM
>> On 3/5/2018 2:52 PM, Matan Azrad wrote:
>> [...]
>> Do we need to save "private_data"?
> 
> Just to emphasis that eth_dev structure should not more be available after rte_eth_dev_release_port().
> Maybe in the future rte_eth_dev_release_port() will zero eth_dev structure too :)

Hi Matan,

This is a reminder about this issue; it would be nice to fix it in this release.

  
Matan Azrad March 28, 2018, 12:07 p.m. UTC | #8
Hi Ferruh

> From: Ferruh Yigit, Wednesday, March 28, 2018 1:38 AM
> On 3/5/2018 3:12 PM, Matan Azrad wrote:
> > [...]
> 
> Hi Matan,
> 
> Reminder of this issue, it would be nice to fix in this release.
> 


The private rte_eth_dev_data issue should be fixed by the following patch:
https://dpdk.org/dev/patchwork/patch/35632/

As for the rte_eth_dev_pci_release() function, I'm going to send a fix.

  
Ferruh Yigit March 30, 2018, 10:39 a.m. UTC | #9
On 3/28/2018 1:07 PM, Matan Azrad wrote:
> Hi Ferruh
> 
>> From: Ferruh Yigit, Wednesday, March 28, 2018 1:38 AM
>> On 3/5/2018 3:12 PM, Matan Azrad wrote:
>> [...]
> 
> Regarding the private rte_eth_dev_data, it should be fixed in the next thread:
> https://dpdk.org/dev/patchwork/patch/35632/
> 
> Regarding the rte_eth_dev_pci_release() function: I'm going to send a fix.

Thanks for the patch, Matan.

But rte_eth_dev_release_port() is still broken because of this change: please
check _rte_eth_dev_callback_process(), which uses dev->data->port_id after the
memset() has already cleared it.
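A hypothetical reduction of the problem described here, assuming the release order of the v3 diff (memset() first, then the DESTROY callbacks): the callback helper reads the port ID from the data block that was just cleared, so every DESTROY event reports port 0. The `sketch_*` names are illustrative stand-ins, not the real ethdev code or callback API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the real DPDK structures or callback API. */
struct sketch_dev_data {
	uint16_t port_id;
};

struct sketch_eth_dev {
	struct sketch_dev_data *data;
	int in_use;
};

/* Port ID seen by the application's DESTROY callback (UINT16_MAX = none yet). */
static uint16_t sketch_reported_port = UINT16_MAX;

static void sketch_on_destroy(uint16_t port_id)
{
	sketch_reported_port = port_id;
}

/* Like _rte_eth_dev_callback_process(), this reads the port ID from dev->data. */
static void sketch_callback_process(struct sketch_eth_dev *dev)
{
	sketch_on_destroy(dev->data->port_id);
}

/* Release order from the v3 patch: reset the data, then run the callbacks. */
static void sketch_release_port_v3(struct sketch_eth_dev *dev)
{
	assert(dev != NULL && dev->data != NULL);
	memset(dev->data, 0, sizeof(*dev->data));
	dev->in_use = 0;
	sketch_callback_process(dev); /* port_id has already been zeroed here */
}
```

Releasing a device whose port ID is 5 makes the callback observe 0, which matches the symptom reported later in the thread.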

  
Ferruh Yigit April 19, 2018, 11:07 a.m. UTC | #10
On 3/30/2018 11:39 AM, Ferruh Yigit wrote:
> On 3/28/2018 1:07 PM, Matan Azrad wrote:
>> [...]
> 
> Thanks Matan for the patch,
> 
> But rte_eth_dev_release_port() is still broken because of this change, please
> check _rte_eth_dev_callback_process() which uses dev->data->port_id.

Hi Matan,

Any update on this?
As mentioned above, rte_eth_dev_release_port() is still broken.

Thanks,
ferruh

  
Matan Azrad April 25, 2018, 12:16 p.m. UTC | #11
Hi all

From: Ferruh Yigit, Thursday, April 19, 2018 2:08 PM
> > But rte_eth_dev_release_port() is still broken because of this change,
> > please check _rte_eth_dev_callback_process() which uses dev->data->port_id.


The issue is that a DESTROY callback always gets port_id=0, regardless of the destroyed port's ID.

Let's discuss the fix:

There are 2 possible meanings for the DESTROY event:

1. The device is going to be destroyed soon (shortly after the callbacks are called).
	The user may assume that the device structure still holds valid data when the callback runs,
	and thus may use it.
	The fix here is to move the callback to the start of the function,
	when the data field is still valid.

2. The device was already destroyed (shortly before the callbacks are called).
	The user should assume that the device structure holds no valid data when the callback runs,
	and thus should not use it.
	The issue here:
	_rte_eth_dev_callback_process() assumes that the data field is always valid,
	but in this case it is not, because the device was already destroyed.
	Possible fixes:
	1. Always keep data->port_id valid.
	2. Keep data->port_id valid only for the _rte_eth_dev_callback_process() call.
	3. Change the _rte_eth_dev_callback_process() argument from "struct rte_eth_dev *dev" to "uint16_t port_id".
		a. All the calls to this internal API need to be changed.

I vote for 2.1.
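Options 1 and 2.1 from the list above could look roughly like the following. This is a sketch on hypothetical `sketch_*` stand-ins, not the real ethdev code: the callback helper reads the port ID from `dev->data`, as _rte_eth_dev_callback_process() does, and each release variant orders its steps so that the helper sees the right ID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the real DPDK structures or callback API. */
struct sketch_dev_data {
	uint16_t port_id;
	char name[32];
};

struct sketch_eth_dev {
	struct sketch_dev_data *data;
	int in_use;
};

/* Port ID seen by the application's DESTROY callback (UINT16_MAX = none yet). */
static uint16_t sketch_reported_port = UINT16_MAX;

static void sketch_on_destroy(uint16_t port_id)
{
	sketch_reported_port = port_id;
}

/* Reads the port ID from dev->data, like _rte_eth_dev_callback_process(). */
static void sketch_callback_process(struct sketch_eth_dev *dev)
{
	sketch_on_destroy(dev->data->port_id);
}

/* Option 1: run the DESTROY callbacks first, while the data is still valid. */
static void sketch_release_port_opt1(struct sketch_eth_dev *dev)
{
	assert(dev != NULL && dev->data != NULL);
	sketch_callback_process(dev);
	memset(dev->data, 0, sizeof(*dev->data));
	dev->in_use = 0;
}

/* Option 2.1: reset the data but keep data->port_id valid, then notify. */
static void sketch_release_port_opt21(struct sketch_eth_dev *dev)
{
	uint16_t port_id;

	assert(dev != NULL && dev->data != NULL);
	port_id = dev->data->port_id;
	memset(dev->data, 0, sizeof(*dev->data));
	dev->data->port_id = port_id; /* everything else, e.g. name, stays cleared */
	dev->in_use = 0;
	sketch_callback_process(dev);
}
```

In this model, option 2.1 still clears the name, so a name-based lookup like the one in rte_eth_dev_attach_secondary() would not match the released port even though its port_id field stays set.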


What do you think?

Matan.
  
Ori Kam April 25, 2018, 12:30 p.m. UTC | #12
Hi

I vote for 2.3.
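Option 2.3 changes the processing helper to take the port ID instead of the device pointer. A sketch under the same kind of hypothetical `sketch_*` stand-ins (not the real ethdev code): the ID is captured before the reset, so the helper never reads the cleared data block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the real DPDK structures or callback API. */
struct sketch_dev_data {
	uint16_t port_id;
};

struct sketch_eth_dev {
	struct sketch_dev_data *data;
	int in_use;
};

/* Port ID seen by the application's DESTROY callback (UINT16_MAX = none yet). */
static uint16_t sketch_reported_port = UINT16_MAX;

static void sketch_on_destroy(uint16_t port_id)
{
	sketch_reported_port = port_id;
}

/* Option 2.3: the processing helper takes a port ID instead of a device
 * pointer, so it never dereferences the (already reset) data block. */
static void sketch_callback_process_by_id(uint16_t port_id)
{
	sketch_on_destroy(port_id);
}

static void sketch_release_port_opt23(struct sketch_eth_dev *dev)
{
	uint16_t port_id;

	assert(dev != NULL && dev->data != NULL);
	port_id = dev->data->port_id; /* captured before the reset */
	memset(dev->data, 0, sizeof(*dev->data));
	dev->in_use = 0;
	sketch_callback_process_by_id(port_id);
}
```

The data block ends up fully reset here; the cost, as noted in Matan's list, is that every caller of the internal API has to change.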

Ori

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matan Azrad
> Sent: Wednesday, April 25, 2018 3:16 PM
> To: Ferruh Yigit <ferruh.yigit@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>; Gaetan Rivet <gaetan.rivet@6wind.com>; Jingjing
> Wu <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Bruce
> Richardson <bruce.richardson@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; stable@dpdk.org; Olga Shern
> <olgas@mellanox.com>
> Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/7] ethdev: fix port data
> reset timing
>
> Hi all
>
> From: Ferruh Yigit, Thursday, April 19, 2018 2:08 PM
> > > But rte_eth_dev_release_port() is still broken because of this
> > > change, please check _rte_eth_dev_callback_process() which uses
> > > dev->data->port_id.
>
> The issue is that a DESTROY callback always gets port_id=0, regardless of the destroyed port id.
>
> Let's discuss the fix:
>
> There are 2 options for the meaning of the DESTROY event:
>
> 1. The device is going to be destroyed in the future (shortly after the callbacks are called).
> 	The user may assume that the device structure still holds valid data at callback time,
> 	and therefore may use it.
> 	The fix here is to move the callback to the start of the function,
> 	while the data field is still valid.
>
> 2. The device was already destroyed in the past (shortly before the callbacks are called).
> 	The user should assume that the device structure holds no valid data at callback time,
> 	and therefore should not use it.
> 	The issue here:
> 	_rte_eth_dev_callback_process() assumes the data field is valid all the time,
> 	but in this case it is not, because the device was already destroyed.
> 	Optional fixes:
> 	1. Always keep data->port_id valid.
> 	2. Keep data->port_id valid only for the _rte_eth_dev_callback_process() call.
> 	3. Change the _rte_eth_dev_callback_process() argument from "struct rte_eth_dev *dev" to "uint16_t port_id".
> 		a. All the calls to this internal API need to be changed.
>
> I vote for 2.1.
>
> What do you think?
>
> Matan.
  
Ferruh Yigit April 25, 2018, 12:54 p.m. UTC | #13
On 4/25/2018 1:16 PM, Matan Azrad wrote:
> Hi all
> 
> From: Ferruh Yigit, Thursday, April 19, 2018 2:08 PM
>>> But rte_eth_dev_release_port() is still broken because of this change,
>>> please check _rte_eth_dev_callback_process() which uses dev->data->port_id.
> 
> The issue is that a DESTROY callback always gets port_id=0, regardless of the destroyed port id.
>
> Let's discuss the fix:
>
> There are 2 options for the meaning of the DESTROY event:
>
> 1. The device is going to be destroyed in the future (shortly after the callbacks are called).
> 	The user may assume that the device structure still holds valid data at callback time,
> 	and therefore may use it.
> 	The fix here is to move the callback to the start of the function,
> 	while the data field is still valid.
>
> 2. The device was already destroyed in the past (shortly before the callbacks are called).
> 	The user should assume that the device structure holds no valid data at callback time,
> 	and therefore should not use it.
> 	The issue here:
> 	_rte_eth_dev_callback_process() assumes the data field is valid all the time,
> 	but in this case it is not, because the device was already destroyed.
> 	Optional fixes:
> 	1. Always keep data->port_id valid.
> 	2. Keep data->port_id valid only for the _rte_eth_dev_callback_process() call.
> 	3. Change the _rte_eth_dev_callback_process() argument from "struct rte_eth_dev *dev" to "uint16_t port_id".
> 		a. All the calls to this internal API need to be changed.
>
> I vote for 2.1.
> 
> 
> What do you think?

What is the concern with 1? It is easy to implement.

It may also be better because, if the callback is called after the device is destroyed, there is
no guarantee (or locking) that the same port won't be reused; rte_eth_dev_data could be updated
in the middle of the callback function, no?
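The reuse hazard can be illustrated deterministically by simulating the "other thread" with a direct call inside the callback. In this sketch `allocate_port()`, `destroy_callback()` and `release_port_then_notify()` are hypothetical simplifications, not DPDK functions, and the interleaving is forced rather than a real race.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4

/* Hypothetical, simplified stand-ins for rte_eth_dev_data / rte_eth_dev. */
struct eth_dev_data {
	char name[64];
	uint16_t port_id;
};

enum eth_dev_state { DEV_UNUSED, DEV_ATTACHED };

struct eth_dev {
	struct eth_dev_data *data;
	enum eth_dev_state state;
};

static struct eth_dev_data port_data[MAX_PORTS];
static struct eth_dev ports[MAX_PORTS];

/* Hands out the first unused slot, like a simplified port allocator. */
static struct eth_dev *allocate_port(const char *name)
{
	for (uint16_t i = 0; i < MAX_PORTS; i++) {
		if (ports[i].state == DEV_UNUSED) {
			ports[i].data = &port_data[i];
			memset(ports[i].data, 0, sizeof(port_data[i]));
			snprintf(ports[i].data->name,
				 sizeof(ports[i].data->name), "%s", name);
			ports[i].data->port_id = i;
			ports[i].state = DEV_ATTACHED;
			return &ports[i];
		}
	}
	return NULL;
}

/* Name the DESTROY callback observed in dev->data. */
static char seen_name[64];

/* A DESTROY callback that reads dev->data; the allocate_port() call
 * simulates another thread creating a port while the callback runs. */
static void destroy_callback(struct eth_dev *dev)
{
	allocate_port("net_new"); /* simulated concurrent allocation */
	snprintf(seen_name, sizeof(seen_name), "%s", dev->data->name);
}

/* Release that frees the slot before notifying. */
static void release_port_then_notify(struct eth_dev *dev)
{
	assert(dev != NULL);
	memset(dev->data, 0, sizeof(*dev->data));
	dev->state = DEV_UNUSED;
	destroy_callback(dev);
}
```

After allocating "net_old" in slot 0 and releasing it, the callback sees "net_new": the freed slot was handed out again while the callback was still using the device pointer.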

> 
> Matan.
  
Matan Azrad April 25, 2018, 2:01 p.m. UTC | #14
Hi Ferruh

From: Ferruh Yigit, Wednesday, April 25, 2018 3:54 PM
> On 4/25/2018 1:16 PM, Matan Azrad wrote:
> > Hi all
> >
> > From: Ferruh Yigit, Thursday, April 19, 2018 2:08 PM
> >>> But rte_eth_dev_release_port() is still broken because of this
> >>> change, please check _rte_eth_dev_callback_process() which uses
> >>> dev->data->port_id.
> >
> > The issue is that a DESTROY callback always gets port_id=0, regardless of the destroyed port id.
> >
> > Let's discuss the fix:
> >
> > There are 2 options for the meaning of the DESTROY event:
> >
> > 1. The device is going to be destroyed in the future (shortly after the callbacks are called).
> > 	The user may assume that the device structure still holds valid data at callback time,
> > 	and therefore may use it.
> > 	The fix here is to move the callback to the start of the function,
> > 	while the data field is still valid.
> >
> > 2. The device was already destroyed in the past (shortly before the callbacks are called).
> > 	The user should assume that the device structure holds no valid data at callback time,
> > 	and therefore should not use it.
> > 	The issue here:
> > 	_rte_eth_dev_callback_process() assumes the data field is valid all the time,
> > 	but in this case it is not, because the device was already destroyed.
> > 	Optional fixes:
> > 	1. Always keep data->port_id valid.
> > 	2. Keep data->port_id valid only for the _rte_eth_dev_callback_process() call.
> > 	3. Change the _rte_eth_dev_callback_process() argument from "struct rte_eth_dev *dev" to "uint16_t port_id".
> > 		a. All the calls to this internal API need to be changed.
> >
> > I vote for 2.1.
> >
> > What do you think?
>
> What is the concern with 1? It is easy to implement.
>
Yes, but 2.1 and 2.2 are also easy.

> It may also be better because, if the callback is called after the device is
> destroyed, there is no guarantee (or locking) that the same port won't be
> reused; rte_eth_dev_data could be updated in the middle of the callback
> function, no?
>

Good point!

I think we must guarantee that the same port id cannot be allocated while the callbacks are running.
I also prefer not to call the callbacks inside the critical section.

So maybe calling them before taking the lock is better.
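One way to satisfy both constraints is sketched below: run the DESTROY callbacks before taking the lock, while the slot is still marked as in use, and only then free the slot and reset its data under the lock. All names are hypothetical simplifications, and the pthread mutex is an assumption standing in for whatever lock protects port allocation in the real code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4

/* Hypothetical, simplified stand-ins for rte_eth_dev_data / rte_eth_dev. */
struct eth_dev_data {
	char name[64];
	uint16_t port_id;
};

enum eth_dev_state { DEV_UNUSED, DEV_ATTACHED };

struct eth_dev {
	struct eth_dev_data *data;
	enum eth_dev_state state;
};

static struct eth_dev_data port_data[MAX_PORTS];
static struct eth_dev ports[MAX_PORTS];
/* Assumption: a pthread mutex stands in for the port allocation lock. */
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static uint16_t last_destroyed_port = UINT16_MAX;

/* Allocation only hands out slots marked DEV_UNUSED, under the lock. */
static struct eth_dev *allocate_port(const char *name)
{
	struct eth_dev *dev = NULL;

	pthread_mutex_lock(&port_lock);
	for (uint16_t i = 0; i < MAX_PORTS; i++) {
		if (ports[i].state == DEV_UNUSED) {
			dev = &ports[i];
			dev->data = &port_data[i];
			memset(dev->data, 0, sizeof(*dev->data));
			snprintf(dev->data->name, sizeof(dev->data->name),
				 "%s", name);
			dev->data->port_id = i;
			dev->state = DEV_ATTACHED;
			break;
		}
	}
	pthread_mutex_unlock(&port_lock);
	return dev;
}

static void callback_process_destroy(struct eth_dev *dev)
{
	assert(dev != NULL && dev->data != NULL);
	last_destroyed_port = dev->data->port_id;
}

/* Assumes the calling thread owns dev while releasing it. */
static int release_port(struct eth_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;

	/* 1. Notify outside the critical section. The slot is still
	 *    DEV_ATTACHED, so allocate_port() cannot reuse it yet and
	 *    dev->data stays valid for the callbacks. */
	if (dev->state != DEV_UNUSED)
		callback_process_destroy(dev);

	/* 2. Free the slot and reset its data under the lock. */
	pthread_mutex_lock(&port_lock);
	dev->state = DEV_UNUSED;
	memset(dev->data, 0, sizeof(*dev->data));
	pthread_mutex_unlock(&port_lock);
	return 0;
}
```

Releasing the port in slot 1 reports id 1 to the callback, and only afterwards can allocate_port() hand slot 1 out again.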
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 7044159..156231c 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -204,7 +204,6 @@  struct rte_eth_dev *
 		return NULL;
 	}
 
-	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
 	eth_dev = eth_dev_get(port_id);
 	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
 	eth_dev->data->port_id = port_id;
@@ -252,6 +251,7 @@  struct rte_eth_dev *
 	if (eth_dev == NULL)
 		return -EINVAL;
 
+	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
 	eth_dev->state = RTE_ETH_DEV_UNUSED;
 
 	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_DESTROY, NULL);