app/testpmd: fix closing softnic port before ethdev ports

Message ID 20230309144249.1199517-1-yogesh.jangra@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series app/testpmd: fix closing softnic port before ethdev ports |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS

Commit Message

Yogesh Jangra March 9, 2023, 2:42 p.m. UTC
  SoftNIC runs on the sevice core, it uses the resources from the testpmd
application. When we run quit command, the testpmd application stops
ethdev ports first, SoftNIC will try to access the port and sometimes
that result in segmentation error.This fix will first close the SoftNIC port.

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Signed-off-by: Kamalakannan R <kamalakannan.r@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 app/test-pmd/testpmd.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
  

Comments

Stephen Hemminger March 9, 2023, 4:31 p.m. UTC | #1
On Thu,  9 Mar 2023 14:42:49 +0000
Yogesh Jangra <yogesh.jangra@intel.com> wrote:

> +		/*
> +		 * SoftNIC runs on the sevice core, it uses the resources from
> +		 * the testpmd application. When we run quit command, the testpmd
> +		 * application stops ethdev ports first, SoftNIC will try to
> +		 * access the port and sometimes that result in segmentation
> +		 * error. So first closing the SoftNIC port.
> +		 */
> +		RTE_ETH_FOREACH_DEV(pt_id) {
> +			if (!strcmp(ports[pt_id].dev_info.driver_name, "net_softnic")) {
> +				stop_port(pt_id);
> +				close_port(pt_id);
> +			}
> +		}
> +

NAK
No driver specific hacks please.

Instead fix the driver design or bug please.
  
Cristian Dumitrescu March 9, 2023, 5:19 p.m. UTC | #2
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, March 9, 2023 4:31 PM
> To: Jangra, Yogesh <yogesh.jangra@intel.com>
> Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> <harshad.suresh.narayane@intel.com>
> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> 
> On Thu,  9 Mar 2023 14:42:49 +0000
> Yogesh Jangra <yogesh.jangra@intel.com> wrote:
> 
> > +		/*
> > +		 * SoftNIC runs on the sevice core, it uses the resources from
> > +		 * the testpmd application. When we run quit command, the
> testpmd
> > +		 * application stops ethdev ports first, SoftNIC will try to
> > +		 * access the port and sometimes that result in segmentation
> > +		 * error. So first closing the SoftNIC port.
> > +		 */
> > +		RTE_ETH_FOREACH_DEV(pt_id) {
> > +			if (!strcmp(ports[pt_id].dev_info.driver_name,
> "net_softnic")) {
> > +				stop_port(pt_id);
> > +				close_port(pt_id);
> > +			}
> > +		}
> > +
> 
> NAK
> No driver specific hacks please.
> 
> Instead fix the driver design or bug please.

Hi Stephen,

This is not a Soft NIC driver-specific hack, this is required for working around some of the ethdev drivers that don't implement the stop() API correctly and free up the device queues or some other internal resources on stop() instead of close().

The Soft NIC is a meta-device that sits on top of other "physical" ethdev devices, so when the Soft NIC device continues to poll the queues of those physical devices after their queues have been freed, the Soft NIC will get a segfault. This fix is required to protect against this sort of incorrect driver behavior by simply stopping the Soft NIC devices first.

We already have several driver specific branches in the test-pmd for e.g. LAG or virtual devices; IMO this small change falls in the same category and it should get accepted.

Please let us know if this makes sense to you?

Regards,
Cristian
  
Cristian Dumitrescu March 9, 2023, 7:08 p.m. UTC | #3
> -----Original Message-----
> From: Jangra, Yogesh <yogesh.jangra@intel.com>
> Sent: Thursday, March 9, 2023 2:43 PM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Jangra, Yogesh
> <yogesh.jangra@intel.com>; R, Kamalakannan <kamalakannan.r@intel.com>;
> Suresh Narayane, Harshad <harshad.suresh.narayane@intel.com>
> Subject: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> 
> SoftNIC runs on the sevice core, it uses the resources from the testpmd
> application. When we run quit command, the testpmd application stops
> ethdev ports first, SoftNIC will try to access the port and sometimes
> that result in segmentation error.This fix will first close the SoftNIC port.
> 
> Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
> Signed-off-by: Kamalakannan R <kamalakannan.r@intel.com>
> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
>  app/test-pmd/testpmd.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 0032696608..aa831b2389 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -3767,6 +3767,21 @@ pmd_test_exit(void)
>  #endif
>  	if (ports != NULL) {
>  		no_link_check = 1;
> +
> +		/*
> +		 * SoftNIC runs on the sevice core, it uses the resources from
> +		 * the testpmd application. When we run quit command, the
> testpmd
> +		 * application stops ethdev ports first, SoftNIC will try to
> +		 * access the port and sometimes that result in segmentation
> +		 * error. So first closing the SoftNIC port.
> +		 */
> +		RTE_ETH_FOREACH_DEV(pt_id) {
> +			if (!strcmp(ports[pt_id].dev_info.driver_name,
> "net_softnic")) {
> +				stop_port(pt_id);
> +				close_port(pt_id);
> +			}
> +		}
> +
>  		RTE_ETH_FOREACH_DEV(pt_id) {
>  			printf("\nStopping port %d...\n", pt_id);
>  			fflush(stdout);
> --
> 2.25.1

Adding Aman and Yuying, the test-pmd maintainers.

Aman and Yuying, can you please review this patch when you get a chance, thank you!
  
Cristian Dumitrescu March 9, 2023, 7:09 p.m. UTC | #4
> -----Original Message-----
> From: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Sent: Thursday, March 9, 2023 5:20 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; Jangra, Yogesh
> <yogesh.jangra@intel.com>
> Cc: dev@dpdk.org; R, Kamalakannan <kamalakannan.r@intel.com>; Suresh
> Narayane, Harshad <harshad.suresh.narayane@intel.com>
> Subject: RE: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> 
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, March 9, 2023 4:31 PM
> > To: Jangra, Yogesh <yogesh.jangra@intel.com>
> > Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
> > Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> > <harshad.suresh.narayane@intel.com>
> > Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> >
> > On Thu,  9 Mar 2023 14:42:49 +0000
> > Yogesh Jangra <yogesh.jangra@intel.com> wrote:
> >
> > > +		/*
> > > +		 * SoftNIC runs on the sevice core, it uses the resources from
> > > +		 * the testpmd application. When we run quit command, the
> > testpmd
> > > +		 * application stops ethdev ports first, SoftNIC will try to
> > > +		 * access the port and sometimes that result in segmentation
> > > +		 * error. So first closing the SoftNIC port.
> > > +		 */
> > > +		RTE_ETH_FOREACH_DEV(pt_id) {
> > > +			if (!strcmp(ports[pt_id].dev_info.driver_name,
> > "net_softnic")) {
> > > +				stop_port(pt_id);
> > > +				close_port(pt_id);
> > > +			}
> > > +		}
> > > +
> >
> > NAK
> > No driver specific hacks please.
> >
> > Instead fix the driver design or bug please.
> 
> Hi Stephen,
> 
> This is not a Soft NIC driver-specific hack, this is required for working around
> some of the ethdev drivers that don't implement the stop() API correctly and
> free up the device queues or some other internal resources on stop() instead of
> close().
> 
> The Soft NIC is a meta-device that sits on top of other "physical" ethdev devices,
> so when the Soft NIC device continues to poll the queues of those physical
> devices after their queues have been freed, the Soft NIC will get a segfault. This
> fix is required to protect against this sort of incorrect driver behavior by simply
> stopping the Soft NIC devices first.
> 
> We already have several driver specific branches in the test-pmd for e.g. LAG or
> virtual devices; IMO this small change falls in the same category and it should
> get accepted.
> 
> Please let us know if this makes sense to you?
> 
> Regards,
> Cristian

Adding Aman and Yuying, the test-pmd maintainers, to this conversation.
  
Stephen Hemminger March 9, 2023, 8:22 p.m. UTC | #5
On Thu, 9 Mar 2023 17:19:59 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, March 9, 2023 4:31 PM
> > To: Jangra, Yogesh <yogesh.jangra@intel.com>
> > Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
> > Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> > <harshad.suresh.narayane@intel.com>
> > Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> > 
> > On Thu,  9 Mar 2023 14:42:49 +0000
> > Yogesh Jangra <yogesh.jangra@intel.com> wrote:
> >   
> > > +		/*
> > > +		 * SoftNIC runs on the sevice core, it uses the resources from
> > > +		 * the testpmd application. When we run quit command, the  
> > testpmd  
> > > +		 * application stops ethdev ports first, SoftNIC will try to
> > > +		 * access the port and sometimes that result in segmentation
> > > +		 * error. So first closing the SoftNIC port.
> > > +		 */
> > > +		RTE_ETH_FOREACH_DEV(pt_id) {
> > > +			if (!strcmp(ports[pt_id].dev_info.driver_name,  
> > "net_softnic")) {  
> > > +				stop_port(pt_id);
> > > +				close_port(pt_id);
> > > +			}
> > > +		}
> > > +  
> > 
> > NAK
> > No driver specific hacks please.
> > 
> > Instead fix the driver design or bug please.  
> 
> Hi Stephen,
> 
> This is not a Soft NIC driver-specific hack, this is required for working around some of the ethdev drivers that don't implement the stop() API correctly and free up the device queues or some other internal resources on stop() instead of close().
> 
> The Soft NIC is a meta-device that sits on top of other "physical" ethdev devices, so when the Soft NIC device continues to poll the queues of those physical devices after their queues have been freed, the Soft NIC will get a segfault. This fix is required to protect against this sort of incorrect driver behavior by simply stopping the Soft NIC devices first.
> 
> We already have several driver specific branches in the test-pmd for e.g. LAG or virtual devices; IMO this small change falls in the same category and it should get accepted.
> 
> Please let us know if this makes sense to you?
> 
> Regards,
> Cristian

If the application has to do this then something is wrong with the architecture.


You aren't propagating state changes through in a safe manner.
Other applications will have same issue.

If LAG or virtual devices have similar problems then a generic solution is needed.
  
Singh, Aman Deep March 10, 2023, 9:09 a.m. UTC | #6
On 3/10/2023 1:52 AM, Stephen Hemminger wrote:
> On Thu, 9 Mar 2023 17:19:59 +0000
> "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
>
>>> -----Original Message-----
>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>> Sent: Thursday, March 9, 2023 4:31 PM
>>> To: Jangra, Yogesh <yogesh.jangra@intel.com>
>>> Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
>>> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
>>> <harshad.suresh.narayane@intel.com>
>>> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
>>>
>>> On Thu,  9 Mar 2023 14:42:49 +0000
>>> Yogesh Jangra <yogesh.jangra@intel.com> wrote:
>>>    
>>>> +		/*
>>>> +		 * SoftNIC runs on the sevice core, it uses the resources from
>>>> +		 * the testpmd application. When we run quit command, the
>>> testpmd
>>>> +		 * application stops ethdev ports first, SoftNIC will try to
>>>> +		 * access the port and sometimes that result in segmentation
>>>> +		 * error. So first closing the SoftNIC port.
>>>> +		 */
>>>> +		RTE_ETH_FOREACH_DEV(pt_id) {
>>>> +			if (!strcmp(ports[pt_id].dev_info.driver_name,
>>> "net_softnic")) {
>>>> +				stop_port(pt_id);
>>>> +				close_port(pt_id);
>>>> +			}
>>>> +		}
>>>> +
>>> NAK
>>> No driver specific hacks please.
>>>
>>> Instead fix the driver design or bug please.
>> Hi Stephen,
>>
>> This is not a Soft NIC driver-specific hack, this is required for working around some of the ethdev drivers that don't implement the stop() API correctly and free up the device queues or some other internal resources on stop() instead of close().
>>
>> The Soft NIC is a meta-device that sits on top of other "physical" ethdev devices, so when the Soft NIC device continues to poll the queues of those physical devices after their queues have been freed, the Soft NIC will get a segfault. This fix is required to protect against this sort of incorrect driver behavior by simply stopping the Soft NIC devices first.
>>
>> We already have several driver specific branches in the test-pmd for e.g. LAG or virtual devices; IMO this small change falls in the same category and it should get accepted.
>>
>> Please let us know if this makes sense to you?
>>
>> Regards,
>> Cristian
> If the application has to do this then something is wrong with the architecture.
>
>
> You aren't propagating state changes through in a safe manner.
> Other applications will have same issue.
>
> If LAG or virtual devices have similar problems then a generic solution is needed.

At exit, there is call to stop_packet_forwarding(). Shouldn't that stop SoftNic from polling of the queues ?
  
Ferruh Yigit March 10, 2023, noon UTC | #7
On 3/9/2023 5:19 PM, Dumitrescu, Cristian wrote:
> 
> 
>> -----Original Message-----
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> Sent: Thursday, March 9, 2023 4:31 PM
>> To: Jangra, Yogesh <yogesh.jangra@intel.com>
>> Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
>> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
>> <harshad.suresh.narayane@intel.com>
>> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
>>
>> On Thu,  9 Mar 2023 14:42:49 +0000
>> Yogesh Jangra <yogesh.jangra@intel.com> wrote:
>>
>>> +		/*
>>> +		 * SoftNIC runs on the sevice core, it uses the resources from
>>> +		 * the testpmd application. When we run quit command, the
>> testpmd
>>> +		 * application stops ethdev ports first, SoftNIC will try to
>>> +		 * access the port and sometimes that result in segmentation
>>> +		 * error. So first closing the SoftNIC port.
>>> +		 */
>>> +		RTE_ETH_FOREACH_DEV(pt_id) {
>>> +			if (!strcmp(ports[pt_id].dev_info.driver_name,
>> "net_softnic")) {
>>> +				stop_port(pt_id);
>>> +				close_port(pt_id);
>>> +			}
>>> +		}
>>> +
>>
>> NAK
>> No driver specific hacks please.
>>
>> Instead fix the driver design or bug please.
> 
> Hi Stephen,
> 
> This is not a Soft NIC driver-specific hack, this is required for working around some of the ethdev drivers that don't implement the stop() API correctly and free up the device queues or some other internal resources on stop() instead of close().
> 

Why not fix the misbehaving drivers, instead of working around for
softnic, as Stephen suggested?

Is there a list of problematic drivers?

> The Soft NIC is a meta-device that sits on top of other "physical" ethdev devices, so when the Soft NIC device continues to poll the queues of those physical devices after their queues have been freed, the Soft NIC will get a segfault. This fix is required to protect against this sort of incorrect driver behavior by simply stopping the Soft NIC devices first.
> 
> We already have several driver specific branches in the test-pmd for e.g. LAG or virtual devices; IMO this small change falls in the same category and it should get accepted.
> 
> Please let us know if this makes sense to you?
> 
> Regards,
> Cristian
  
Cristian Dumitrescu March 10, 2023, 1:45 p.m. UTC | #8
> -----Original Message-----
> From: Singh, Aman Deep <aman.deep.singh@intel.com>
> Sent: Friday, March 10, 2023 9:09 AM
> To: Stephen Hemminger <stephen@networkplumber.org>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>
> Cc: Jangra, Yogesh <yogesh.jangra@intel.com>; dev@dpdk.org; R,
> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> <harshad.suresh.narayane@intel.com>
> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> 
> 
> On 3/10/2023 1:52 AM, Stephen Hemminger wrote:
> > On Thu, 9 Mar 2023 17:19:59 +0000
> > "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
> >
> >>> -----Original Message-----
> >>> From: Stephen Hemminger <stephen@networkplumber.org>
> >>> Sent: Thursday, March 9, 2023 4:31 PM
> >>> To: Jangra, Yogesh <yogesh.jangra@intel.com>
> >>> Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>;
> R,
> >>> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> >>> <harshad.suresh.narayane@intel.com>
> >>> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev
> ports
> >>>
> >>> On Thu,  9 Mar 2023 14:42:49 +0000
> >>> Yogesh Jangra <yogesh.jangra@intel.com> wrote:
> >>>
> >>>> +		/*
> >>>> +		 * SoftNIC runs on the sevice core, it uses the resources from
> >>>> +		 * the testpmd application. When we run quit command, the
> >>> testpmd
> >>>> +		 * application stops ethdev ports first, SoftNIC will try to
> >>>> +		 * access the port and sometimes that result in segmentation
> >>>> +		 * error. So first closing the SoftNIC port.
> >>>> +		 */
> >>>> +		RTE_ETH_FOREACH_DEV(pt_id) {
> >>>> +			if (!strcmp(ports[pt_id].dev_info.driver_name,
> >>> "net_softnic")) {
> >>>> +				stop_port(pt_id);
> >>>> +				close_port(pt_id);
> >>>> +			}
> >>>> +		}
> >>>> +
> >>> NAK
> >>> No driver specific hacks please.
> >>>
> >>> Instead fix the driver design or bug please.
> >> Hi Stephen,
> >>
> >> This is not a Soft NIC driver-specific hack, this is required for working around
> some of the ethdev drivers that don't implement the stop() API correctly and
> free up the device queues or some other internal resources on stop() instead of
> close().
> >>
> >> The Soft NIC is a meta-device that sits on top of other "physical" ethdev
> devices, so when the Soft NIC device continues to poll the queues of those
> physical devices after their queues have been freed, the Soft NIC will get a
> segfault. This fix is required to protect against this sort of incorrect driver
> behavior by simply stopping the Soft NIC devices first.
> >>
> >> We already have several driver specific branches in the test-pmd for e.g. LAG
> or virtual devices; IMO this small change falls in the same category and it should
> get accepted.
> >>
> >> Please let us know if this makes sense to you?
> >>
> >> Regards,
> >> Cristian
> > If the application has to do this then something is wrong with the architecture.
> >
> >
> > You aren't propagating state changes through in a safe manner.
> > Other applications will have same issue.
> >
> > If LAG or virtual devices have similar problems then a generic solution is
> needed.
> 
> At exit, there is call to stop_packet_forwarding(). Shouldn't that stop SoftNic
> from polling of the queues ?

Hi Aman,

Unfortunately not. Calling the stop_packet_forwarding() will not stop the core running the Soft NIC, as Soft NIC is running on a service core, and the service cores are not handled by this function.

Moreover, a Soft NIC device is not taking over the entire service core; the same service core can also run other services e.g. Soft NIC for device A, telemetry, stats, Soft NIC for device B, etc). Therefore, stopping the service core associated with a Soft NIC device will also incorrectly result in stopping all the other services running on the same service core.

Stopping the service for a Soft NIC device is done by the stop() function, which is exactly what we are trying to do with this short patch.

Regards,
Cristian
  
David Marchand March 10, 2023, 1:47 p.m. UTC | #9
On Fri, Mar 10, 2023 at 1:00 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >> NAK
> >> No driver specific hacks please.
> >>
> >> Instead fix the driver design or bug please.
> >
> > Hi Stephen,
> >
> > This is not a Soft NIC driver-specific hack, this is required for working around some of the ethdev drivers that don't implement the stop() API correctly and free up the device queues or some other internal resources on stop() instead of close().
> >
>
> Why not fix the misbehaving drivers, instead of working around for
> softnic, as Stephen suggested?

+1
  
Cristian Dumitrescu March 10, 2023, 1:47 p.m. UTC | #10
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, March 10, 2023 12:00 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Jangra, Yogesh <yogesh.jangra@intel.com>;
> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>
> Cc: dev@dpdk.org; R, Kamalakannan <kamalakannan.r@intel.com>; Suresh
> Narayane, Harshad <harshad.suresh.narayane@intel.com>
> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> 
> On 3/9/2023 5:19 PM, Dumitrescu, Cristian wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger <stephen@networkplumber.org>
> >> Sent: Thursday, March 9, 2023 4:31 PM
> >> To: Jangra, Yogesh <yogesh.jangra@intel.com>
> >> Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
> >> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> >> <harshad.suresh.narayane@intel.com>
> >> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev
> ports
> >>
> >> On Thu,  9 Mar 2023 14:42:49 +0000
> >> Yogesh Jangra <yogesh.jangra@intel.com> wrote:
> >>
> >>> +		/*
> >>> +		 * SoftNIC runs on the sevice core, it uses the resources from
> >>> +		 * the testpmd application. When we run quit command, the
> >> testpmd
> >>> +		 * application stops ethdev ports first, SoftNIC will try to
> >>> +		 * access the port and sometimes that result in segmentation
> >>> +		 * error. So first closing the SoftNIC port.
> >>> +		 */
> >>> +		RTE_ETH_FOREACH_DEV(pt_id) {
> >>> +			if (!strcmp(ports[pt_id].dev_info.driver_name,
> >> "net_softnic")) {
> >>> +				stop_port(pt_id);
> >>> +				close_port(pt_id);
> >>> +			}
> >>> +		}
> >>> +
> >>
> >> NAK
> >> No driver specific hacks please.
> >>
> >> Instead fix the driver design or bug please.
> >
> > Hi Stephen,
> >
> > This is not a Soft NIC driver-specific hack, this is required for working around
> some of the ethdev drivers that don't implement the stop() API correctly and
> free up the device queues or some other internal resources on stop() instead of
> close().
> >
> 
> Why not fix the misbehaving drivers, instead of working around for
> softnic, as Stephen suggested?
> 
> Is there a list of problematic drivers?
> 

Ferruh, I think this is not a reasonable request. We don't have the expertise to fix all drivers, not the hardware to test all drivers.

> > The Soft NIC is a meta-device that sits on top of other "physical" ethdev
> devices, so when the Soft NIC device continues to poll the queues of those
> physical devices after their queues have been freed, the Soft NIC will get a
> segfault. This fix is required to protect against this sort of incorrect driver
> behavior by simply stopping the Soft NIC devices first.
> >
> > We already have several driver specific branches in the test-pmd for e.g. LAG
> or virtual devices; IMO this small change falls in the same category and it should
> get accepted.
> >
> > Please let us know if this makes sense to you?
> >
> > Regards,
> > Cristian
  
David Marchand March 10, 2023, 1:49 p.m. UTC | #11
On Fri, Mar 10, 2023 at 2:48 PM Dumitrescu, Cristian
<cristian.dumitrescu@intel.com> wrote:
> > Why not fix the misbehaving drivers, instead of working around for
> > softnic, as Stephen suggested?
> >
> > Is there a list of problematic drivers?
> >
>
> Ferruh, I think this is not a reasonable request. We don't have the expertise to fix all drivers, not the hardware to test all drivers.

Then, please report the issues.
I am also NAKing this horror.
  
Ferruh Yigit March 10, 2023, 1:58 p.m. UTC | #12
On 3/10/2023 1:47 PM, Dumitrescu, Cristian wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, March 10, 2023 12:00 PM
>> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Jangra, Yogesh <yogesh.jangra@intel.com>;
>> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
>> <yuying.zhang@intel.com>
>> Cc: dev@dpdk.org; R, Kamalakannan <kamalakannan.r@intel.com>; Suresh
>> Narayane, Harshad <harshad.suresh.narayane@intel.com>
>> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
>>
>> On 3/9/2023 5:19 PM, Dumitrescu, Cristian wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>>> Sent: Thursday, March 9, 2023 4:31 PM
>>>> To: Jangra, Yogesh <yogesh.jangra@intel.com>
>>>> Cc: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; R,
>>>> Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
>>>> <harshad.suresh.narayane@intel.com>
>>>> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev
>> ports
>>>>
>>>> On Thu,  9 Mar 2023 14:42:49 +0000
>>>> Yogesh Jangra <yogesh.jangra@intel.com> wrote:
>>>>
>>>>> +		/*
>>>>> +		 * SoftNIC runs on the sevice core, it uses the resources from
>>>>> +		 * the testpmd application. When we run quit command, the
>>>> testpmd
>>>>> +		 * application stops ethdev ports first, SoftNIC will try to
>>>>> +		 * access the port and sometimes that result in segmentation
>>>>> +		 * error. So first closing the SoftNIC port.
>>>>> +		 */
>>>>> +		RTE_ETH_FOREACH_DEV(pt_id) {
>>>>> +			if (!strcmp(ports[pt_id].dev_info.driver_name,
>>>> "net_softnic")) {
>>>>> +				stop_port(pt_id);
>>>>> +				close_port(pt_id);
>>>>> +			}
>>>>> +		}
>>>>> +
>>>>
>>>> NAK
>>>> No driver specific hacks please.
>>>>
>>>> Instead fix the driver design or bug please.
>>>
>>> Hi Stephen,
>>>
>>> This is not a Soft NIC driver-specific hack, this is required for working around
>> some of the ethdev drivers that don't implement the stop() API correctly and
>> free up the device queues or some other internal resources on stop() instead of
>> close().
>>>
>>
>> Why not fix the misbehaving drivers, instead of working around for
>> softnic, as Stephen suggested?
>>
>> Is there a list of problematic drivers?
>>
> 
> Ferruh, I think this is not a reasonable request. We don't have the expertise to fix all drivers, not the hardware to test all drivers.
> 

Please don't make it over dramatic ;), this is not about having
expertise in all drivers or having their hardware to test.

You claim some drivers does free up their resources on stop() and
continue to polling from them cause segfault. Action is move resource
free from stop() to close().

And my intention was not request a fix from you, if you have any
particular misbehaving drivers, I can facilitate a fix from those driver
maintainers.
Eventually drivers freeing resources in stop() is a bigger problem and
can hit other applications too, this is not just testpmd problem.

>>> The Soft NIC is a meta-device that sits on top of other "physical" ethdev
>> devices, so when the Soft NIC device continues to poll the queues of those
>> physical devices after their queues have been freed, the Soft NIC will get a
>> segfault. This fix is required to protect against this sort of incorrect driver
>> behavior by simply stopping the Soft NIC devices first.
>>>
>>> We already have several driver specific branches in the test-pmd for e.g. LAG
>> or virtual devices; IMO this small change falls in the same category and it should
>> get accepted.
>>>
>>> Please let us know if this makes sense to you?
>>>
>>> Regards,
>>> Cristian
>
  
Cristian Dumitrescu March 10, 2023, 2:36 p.m. UTC | #13
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, March 10, 2023 1:49 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Ferruh Yigit <ferruh.yigit@amd.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Jangra, Yogesh <yogesh.jangra@intel.com>;
> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; dev@dpdk.org; R, Kamalakannan
> <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> <harshad.suresh.narayane@intel.com>
> Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> 
> On Fri, Mar 10, 2023 at 2:48 PM Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com> wrote:
> > > Why not fix the misbehaving drivers, instead of working around for
> > > softnic, as Stephen suggested?
> > >
> > > Is there a list of problematic drivers?
> > >
> >
> > Ferruh, I think this is not a reasonable request. We don't have the expertise to
> fix all drivers, not the hardware to test all drivers.
> 
> Then, please report the issues.
> I am also NAKing this horror.
> 
> 
> --
> David Marchand

Hi David,

Your ideas on how to fix this issue would be way more helpful than your NAK😉

Can you please explain why this fix is a problem? We have device X that sits on top of devices A, B, C, ... who are created and started before device X, to me it makes sense to stop device X first before stopping A, B, C, ...

Thanks,
Cristian
  
David Marchand March 10, 2023, 2:39 p.m. UTC | #14
On Fri, Mar 10, 2023 at 3:37 PM Dumitrescu, Cristian
<cristian.dumitrescu@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Friday, March 10, 2023 1:49 PM
> > To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> > Cc: Ferruh Yigit <ferruh.yigit@amd.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; Jangra, Yogesh <yogesh.jangra@intel.com>;
> > Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> > <yuying.zhang@intel.com>; dev@dpdk.org; R, Kamalakannan
> > <kamalakannan.r@intel.com>; Suresh Narayane, Harshad
> > <harshad.suresh.narayane@intel.com>
> > Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports
> >
> > On Fri, Mar 10, 2023 at 2:48 PM Dumitrescu, Cristian
> > <cristian.dumitrescu@intel.com> wrote:
> > > > Why not fix the misbehaving drivers, instead of working around for
> > > > softnic, as Stephen suggested?
> > > >
> > > > Is there a list of problematic drivers?
> > > >
> > >
> > > Ferruh, I think this is not a reasonable request. We don't have the expertise to
> > fix all drivers, not the hardware to test all drivers.
> >
> > Then, please report the issues.
> > I am also NAKing this horror.
> >
> >
> > --
> > David Marchand
>
> Hi David,
>
> Your ideas on how to fix this issue would be way more helpful than your NAK😉

Well sorry, but Ferruh request was reasonable to me.
Trying to drop the ball made me a bit angry.


>
> Can you please explain why this fix is a problem? We have device X that sits on top of devices A, B, C, ... who are created and started before device X, to me it makes sense to stop device X first before stopping A, B, C, ...

There are other drivers like the failsafe driver that sit on top of
other devices.
I'd recommend looking at what it does.
  
Thomas Monjalon March 10, 2023, 2:58 p.m. UTC | #15
10/03/2023 15:36, Dumitrescu, Cristian:
> From: David Marchand <david.marchand@redhat.com>
> > On Fri, Mar 10, 2023 at 2:48 PM Dumitrescu, Cristian
> > <cristian.dumitrescu@intel.com> wrote:
> > > > Why not fix the misbehaving drivers, instead of working around for
> > > > softnic, as Stephen suggested?
> > > >
> > > > Is there a list of problematic drivers?
> > > >
> > >
> > > Ferruh, I think this is not a reasonable request. We don't have the expertise to
> > fix all drivers, not the hardware to test all drivers.
> > 
> > Then, please report the issues.
> > I am also NAKing this horror.
> > 
> > --
> > David Marchand
> 
> Hi David,
> 
> Your ideas on how to fix this issue would be way more helpful than your NAK😉
> 
> Can you please explain why this fix is a problem? We have device X that sits on top of devices A, B, C, ... who are created and started before device X, to me it makes sense to stop device X first before stopping A, B, C, ...

Ferruh has already explained it well: it is not a testpmd problem.
The problem you describe can happen in any application,
so we need to fix the root cause.
If there are problems in some drivers, please tell us which one
and we'll look at how to fix them.
  
Stephen Hemminger March 10, 2023, 4:44 p.m. UTC | #16
On Fri, 10 Mar 2023 13:58:52 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> >>
> >> Why not fix the misbehaving drivers, instead of working around for
> >> softnic, as Stephen suggested?
> >>
> >> Is there a list of problematic drivers?
> >>  
> > 
> > Ferruh, I think this is not a reasonable request. We don't have the expertise to fix all drivers, not the hardware to test all drivers.
> >   
> 
> Please don't make it over dramatic ;), this is not about having
> expertise in all drivers or having their hardware to test.
> 
> You claim some drivers does free up their resources on stop() and
> continue to polling from them cause segfault. Action is move resource
> free from stop() to close().
> 
> And my intention was not request a fix from you, if you have any
> particular misbehaving drivers, I can facilitate a fix from those driver
> maintainers.
> Eventually drivers freeing resources in stop() is a bigger problem and
> can hit other applications too, this is not just testpmd problem.

Lets all work together to resolve this.
I and others are willing to review and fix drivers you identify as problematic.
If this is a common problem, ideally we can update CI infrastructure to
test shutdown and resource issues more thoroughly via ASAN builds etc.
  
Yogesh Jangra March 17, 2023, 7:11 a.m. UTC | #17
-----Original Message-----
From: Stephen Hemminger <stephen@networkplumber.org>
Sent: Friday, March 10, 2023 10:15 PM
To: Ferruh Yigit <ferruh.yigit@amd.com>
Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Jangra, Yogesh <yogesh.jangra@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>; dev@dpdk.org; R, Kamalakannan <kamalakannan.r@intel.com>; Suresh Narayane, Harshad <harshad.suresh.narayane@intel.com>
Subject: Re: [PATCH] app/testpmd: fix closing softnic port before ethdev ports



On Fri, 10 Mar 2023 13:58:52 +0000

Ferruh Yigit <ferruh.yigit@amd.com<mailto:ferruh.yigit@amd.com>> wrote:



> >>

> >> Why not fix the misbehaving drivers, instead of working around for

> >> softnic, as Stephen suggested?

> >>

> >> Is there a list of problematic drivers?

> >>

> >

> > Ferruh, I think this is not a reasonable request. We don't have the expertise to fix all drivers, not the hardware to test all drivers.

> >

>

> Please don't make it over dramatic ;), this is not about having

> expertise in all drivers or having their hardware to test.

>

> You claim some drivers does free up their resources on stop() and

> continue to polling from them cause segfault. Action is move resource

> free from stop() to close().

>

> And my intention was not request a fix from you, if you have any

> particular misbehaving drivers, I can facilitate a fix from those

> driver maintainers.

> Eventually drivers freeing resources in stop() is a bigger problem and

> can hit other applications too, this is not just testpmd problem.



Lets all work together to resolve this.

I and others are willing to review and fix drivers you identify as problematic.

If this is a common problem, ideally we can update CI infrastructure to test shutdown and resource issues more thoroughly via ASAN builds etc.







Hi Stephen,



I am getting segmentation error while stopping only the physical port .

I run testpmd application using Soft NIC driver. I used one physical port and Soft NIC is running on the top of that.



In my scenario, I started the traffic on the port and when I stop the port I am getting segmentation error in rte_eth_rx_burst api.

This shows the order of stopping the port is casing the issue. As stopping the port is not handling the service thread stop.



Please refer screenshot for reference.



[cid:image001.png@01D958CD.BE580600]



Thanks & Regards,

Yogesh
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 0032696608..aa831b2389 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3767,6 +3767,21 @@  pmd_test_exit(void)
 #endif
 	if (ports != NULL) {
 		no_link_check = 1;
+
+		/*
+		 * SoftNIC runs on the sevice core, it uses the resources from
+		 * the testpmd application. When we run quit command, the testpmd
+		 * application stops ethdev ports first, SoftNIC will try to
+		 * access the port and sometimes that result in segmentation
+		 * error. So first closing the SoftNIC port.
+		 */
+		RTE_ETH_FOREACH_DEV(pt_id) {
+			if (!strcmp(ports[pt_id].dev_info.driver_name, "net_softnic")) {
+				stop_port(pt_id);
+				close_port(pt_id);
+			}
+		}
+
 		RTE_ETH_FOREACH_DEV(pt_id) {
 			printf("\nStopping port %d...\n", pt_id);
 			fflush(stdout);