[v2] net/pcap: enable data path on secondary

Message ID 20181112165109.33296-1-qi.z.zhang@intel.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Headers
Series [v2] net/pcap: enable data path on secondary

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Qi Zhang Nov. 12, 2018, 4:51 p.m. UTC
  Private vdev on secondary is never supported by the new shared
device mode but pdump still relies on a private pcap PMD on secondary.
The patch enables pcap PMD's data path on secondary so that pdump can
work as usual.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Yufeng Mo <yufengx.mo@intel.com>

---

v2:
- fix init issue when try to dump to an iface.

 drivers/net/pcap/rte_eth_pcap.c | 94 +++++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 36 deletions(-)
  

Comments

Ferruh Yigit Nov. 13, 2018, 4:56 p.m. UTC | #1
On 11/12/2018 4:51 PM, Qi Zhang wrote:
> Private vdev on secondary is never supported by the new shared
> device mode but pdump still relies on a private pcap PMD on secondary.
> The patch enables pcap PMD's data path on secondary so that pdump can
> work as usual.

It would be great if you described the problem a little more.

Private vdev was the approach previously, when pdump was developed; now, with
shared device mode on virtual devices, the pcap data path in the secondary is not working.

What exactly is not working (please correct me if I am wrong):
When the secondary adds a virtual device, the related data is transferred to the
primary, which creates the device and shares it back with the secondary.
When the pcap device is created in the primary, the pcap handlers (pointers) are
process-local and are not valid in the secondary process. This breaks the secondary.

So we can't directly share the pcap handlers; we need to create a new set of
handlers for the secondary. That is what you are doing in this patch, although I
have some comments, please check below.

Since there is a single storage for pcap handlers that the primary and secondary
share, and they can't share the handlers, you can't make both the primary and
secondary data paths work. Freeing the handlers is another concern. What is
needed is `rte_eth_dev->process_private`, which has been added in this release.
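
For illustration, a minimal self-contained sketch of the `process_private` idea — toy structs standing in for `rte_eth_dev` and the pcap handles, not real DPDK code: the shared device data is visible to both processes, while each process attaches its own handle set.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (not the real DPDK structs): the shared part of an ethdev
 * lives in memory visible to both processes; process_private holds
 * pointers that are only valid in the process that created them. */
struct pcap_handles {      /* stand-in for pcap_t * / pcap_dumper_t * */
	void *rx_pcap;
	void *tx_dumper;
};

struct eth_dev {
	int port_id;           /* shared: same value in both processes */
	void *process_private; /* per-process: each process sets its own */
};

/* Each process opens its own pcap handles instead of reusing pointers
 * that were created (and are only valid) in the primary. */
static void attach_process_private(struct eth_dev *dev)
{
	struct pcap_handles *h = malloc(sizeof(*h));
	h->rx_pcap = h;   /* placeholder for a real pcap open call */
	h->tx_dumper = h; /* placeholder for a real dumper open call */
	dev->process_private = h;
}
```

With this layout, the primary's handles are never overwritten by the secondary: each side dereferences only its own `process_private`.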

> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> Tested-by: Yufeng Mo <yufengx.mo@intel.com>

<...>

> @@ -934,6 +935,10 @@ pmd_init_internals(struct rte_vdev_device *vdev,
>  	 */
>  	(*eth_dev)->dev_ops = &ops;
>  
> +	/* store a copy of devargs for secondary process */
> +	strlcpy(internals->devargs, rte_vdev_device_args(vdev),
> +			ETH_PCAP_ARG_MAXLEN);

Why do we need to handle this at the PMD level?

Why isn't the secondary probe getting the devargs? Can't we fix this at the EAL level?
It can be OK to work around it in the PMD given the timing in the release cycle,
but in the long term I think this should be fixed in the EAL.

<...>

> @@ -1122,23 +1126,37 @@ pmd_pcap_probe(struct rte_vdev_device *dev)
>  	start_cycles = rte_get_timer_cycles();
>  	hz = rte_get_timer_hz();
>  
> -	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		kvlist = rte_kvargs_parse(rte_vdev_device_args(dev),
> +				valid_arguments);
> +		if (kvlist == NULL)
> +			return -EINVAL;
> +		if (rte_kvargs_count(kvlist, ETH_PCAP_IFACE_ARG) == 1)
> +			nb_rx_queue = 1;
> +		else
> +			nb_rx_queue =
> +				rte_kvargs_count(kvlist,
> +					ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
> +		nb_tx_queue = 1;

This part is wrong. The pcap PMD supports multiple queues; you can't hardcode the
number of queues. Also, why does it ignore the `rx_iface` argument?
This is just hacking the driver for a specific use case, breaking others.

> +		ret = pmd_init_internals(dev, nb_rx_queue,
> +				nb_tx_queue, &eth_dev);

I think it is not required to move pmd_init_internals() here.
This can be done more simply; I will send a draft patch as a reply to this mail as a
possible solution.
But again, that can't be the final solution; we need to use `process_private`.

<...>
  
Thomas Monjalon Nov. 13, 2018, 5:14 p.m. UTC | #2
Just a quick comment:
There are probably some ideas to take from what was done for tap.


13/11/2018 17:56, Ferruh Yigit:
> On 11/12/2018 4:51 PM, Qi Zhang wrote:
> > Private vdev on secondary is never supported by the new shared
> > device mode but pdump still relies on a private pcap PMD on secondary.
> > The patch enables pcap PMD's data path on secondary so that pdump can
> > work as usual.
> 
> It would be great if you described the problem a little more.
> 
> Private vdev was the way previously, when pdump developed, now with shared
> device mode on virtual devices, pcap data path in secondary is not working.
> 
> What exactly not working is (please correct me if I am wrong):
> When secondary adds a virtual device, related data transferred to primary and
> primary creates the device and shares device back with secondary.
> When pcap device created in primary, pcap handlers (pointers) are process local
> and they are not valid for secondary process. This breaks secondary.
> 
> So we can't directly share the pcap handlers, but need to create a new set of
> handlers for secondary, that is what you are doing in this patch, although I
> have some comments, please check below.
> 
> Since there is single storage for pcap handlers that primary and secondary
> shares and they can't share the handlers, you can't make both primary and
> secondary data path work. Also freeing handlers is another concern. What is
> needed is `rte_eth_dev->process_private` which has been added in this release.
> 
> > 
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > Tested-by: Yufeng Mo <yufengx.mo@intel.com>
> 
> <...>
> 
> > @@ -934,6 +935,10 @@ pmd_init_internals(struct rte_vdev_device *vdev,
> >  	 */
> >  	(*eth_dev)->dev_ops = &ops;
> >  
> > +	/* store a copy of devargs for secondary process */
> > +	strlcpy(internals->devargs, rte_vdev_device_args(vdev),
> > +			ETH_PCAP_ARG_MAXLEN);
> 
> Why we need to cover this in PMD level?
> 
> Why secondary probe isn't getting devargs? Can't we fix this in eal level?
> It can be OK to workaround in the PMD taking account of the time of the release,
> but for long term I think this should be fixed in eal.
> 
> <...>
> 
> > @@ -1122,23 +1126,37 @@ pmd_pcap_probe(struct rte_vdev_device *dev)
> >  	start_cycles = rte_get_timer_cycles();
> >  	hz = rte_get_timer_hz();
> >  
> > -	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > +		kvlist = rte_kvargs_parse(rte_vdev_device_args(dev),
> > +				valid_arguments);
> > +		if (kvlist == NULL)
> > +			return -EINVAL;
> > +		if (rte_kvargs_count(kvlist, ETH_PCAP_IFACE_ARG) == 1)
> > +			nb_rx_queue = 1;
> > +		else
> > +			nb_rx_queue =
> > +				rte_kvargs_count(kvlist,
> > +					ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
> > +		nb_tx_queue = 1;
> 
> This part is wrong. pcap pmd supports multi queue, you can't hardcode the number
> of queues. Also for Tx why it ignores `rx_iface` argument?
> This is just hacking the driver for a specific use case breaking others.
> 
> > +		ret = pmd_init_internals(dev, nb_rx_queue,
> > +				nb_tx_queue, &eth_dev);
> 
> I think it is not required to move pmd_init_internals() here.
> This can be done simpler, I will send a draft patch as a reply to this mail for
> possible solution.
> But again that can't be final solution, we need to use `process_private`
> 
> <...>
>
  
Qi Zhang Nov. 13, 2018, 6:27 p.m. UTC | #3
First, apologies for doing this in a rush; I was under some pressure to make pdump work in 18.11.
I agree there is a lot that needs improving, but the strategy here is to make it work quietly and not break anything else :)
Some comments added inline.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, November 13, 2018 9:15 AM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v2] net/pcap: enable data path on secondary
> 
> Just a quick comment:
> There are probably some ideas to take from what was done for tap.

> 
> 
> 13/11/2018 17:56, Ferruh Yigit:
> > On 11/12/2018 4:51 PM, Qi Zhang wrote:
> > > Private vdev on secondary is never supported by the new shared
> > > device mode but pdump still relies on a private pcap PMD on secondary.
> > > The patch enables pcap PMD's data path on secondary so that pdump
> > > can work as usual.
> >
> > It would be great if you described the problem a little more.
> >
> > Private vdev was the way previously, when pdump developed, now with
> > shared device mode on virtual devices, pcap data path in secondary is not
> working.
> >
> > What exactly not working is (please correct me if I am wrong):
> > When secondary adds a virtual device, related data transferred to
> > primary and primary creates the device and shares device back with
> secondary.
> > When pcap device created in primary, pcap handlers (pointers) are
> > process local and they are not valid for secondary process. This breaks
> secondary.
> >
> > So we can't directly share the pcap handlers, but need to create a new
> > set of handlers for secondary, that is what you are doing in this
> > patch, although I have some comments, please check below.
> >
> > Since there is single storage for pcap handlers that primary and
> > secondary shares and they can't share the handlers, you can't make
> > both primary and secondary data path work. Also freeing handlers is
> > another concern. What is needed is `rte_eth_dev->process_private` which
> has been added in this release.

You are right, we should prevent the handlers opened in the primary from being corrupted during probe in the secondary.
Now I see this problem in pcap; as an example, internals->tx_queue[i].dumper/pcap is shared but will be overwritten by the secondary. We should fix this by using process_private.

> >
> > >
> > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > Tested-by: Yufeng Mo <yufengx.mo@intel.com>
> >
> > <...>
> >
> > > @@ -934,6 +935,10 @@ pmd_init_internals(struct rte_vdev_device
> *vdev,
> > >  	 */
> > >  	(*eth_dev)->dev_ops = &ops;
> > >
> > > +	/* store a copy of devargs for secondary process */
> > > +	strlcpy(internals->devargs, rte_vdev_device_args(vdev),
> > > +			ETH_PCAP_ARG_MAXLEN);
> >
> > Why we need to cover this in PMD level?
> >
> > Why secondary probe isn't getting devargs? Can't we fix this in eal level?
> > It can be OK to workaround in the PMD taking account of the time of
> > the release, but for long term I think this should be fixed in eal.

Yes, this is a workaround for a quick fix.
Ideally the secondary process should not have to deal with devargs; it should just attach.
It is also better to parse the devargs in only one process (the primary); the parsed result could be stored as an intermediate result in shared memory (for example, internal->nb_rx_queue_required), so the secondary process doesn't need to parse it again.
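
As a toy, self-contained sketch of this parse-once idea (the struct and field names such as nb_rx_queue_required are illustrative, not the driver's actual layout):

```c
#include <assert.h>
#include <string.h>

/* Toy model: the primary parses the devargs string once and stores the
 * result (e.g. required queue counts) in the shared `internals`; the
 * secondary just reads the stored values instead of re-parsing. */
struct pmd_internals {          /* imagined to live in shared memory */
	unsigned int nb_rx_queue_required;
	unsigned int nb_tx_queue_required;
};

static void primary_parse_devargs(const char *devargs,
		struct pmd_internals *internals)
{
	/* crude stand-in for rte_kvargs_parse(): count "rx_pcap=" keys */
	unsigned int n = 0;
	const char *p = devargs;

	while ((p = strstr(p, "rx_pcap=")) != NULL) {
		n++;
		p += 8; /* skip past the matched "rx_pcap=" key */
	}
	internals->nb_rx_queue_required = n;
	internals->nb_tx_queue_required = 1;
}

static unsigned int secondary_nb_rx(const struct pmd_internals *internals)
{
	/* no re-parsing in the secondary: it trusts the stored result */
	return internals->nb_rx_queue_required;
}
```
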
> >
> > <...>
> >
> > > @@ -1122,23 +1126,37 @@ pmd_pcap_probe(struct rte_vdev_device
> *dev)
> > >  	start_cycles = rte_get_timer_cycles();
> > >  	hz = rte_get_timer_hz();
> > >
> > > -	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > +		kvlist = rte_kvargs_parse(rte_vdev_device_args(dev),
> > > +				valid_arguments);
> > > +		if (kvlist == NULL)
> > > +			return -EINVAL;
> > > +		if (rte_kvargs_count(kvlist, ETH_PCAP_IFACE_ARG) == 1)
> > > +			nb_rx_queue = 1;
> > > +		else
> > > +			nb_rx_queue =
> > > +				rte_kvargs_count(kvlist,
> > > +					ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
> > > +		nb_tx_queue = 1;
> >
> > This part is wrong. pcap pmd supports multi queue, you can't hardcode
> > the number of queues. Also for Tx why it ignores `rx_iface` argument?
> > This is just hacking the driver for a specific use case breaking others.

Previously, nb_tx_queue and nb_rx_queue were decided by pcaps.num_of_queue and dumpers.num_of_queue.
I just can't figure out a way to get more than 1 queue during probe; look at the code below.

If ETH_PCAP_IFACE_ARG

	pcaps.num_of_queue = 1;
	dumpers.num_of_queue = 1;

else

	is_rx_pcap = rte_kvargs_count(kvlist, ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
	pcaps.num_of_queue = 0;
	if (is_rx_pcap) {
		ret = rte_kvargs_process(kvlist, ETH_PCAP_RX_PCAP_ARG,
				&open_rx_pcap, &pcaps);
		// pcaps.num_of_queue = 1;
	} else {
		ret = rte_kvargs_process(kvlist, NULL,
				&rx_iface_args_process, &pcaps);
		// pcaps.num_of_queue = 0;
	}

	is_tx_pcap = rte_kvargs_count(kvlist, ETH_PCAP_TX_PCAP_ARG) ? 1 : 0;
	dumpers.num_of_queue = 0;

	if (is_tx_pcap)
		ret = rte_kvargs_process(kvlist, ETH_PCAP_TX_PCAP_ARG,
				&open_tx_pcap, &dumpers);
		// dumpers.num_of_queue = 1
	else
		ret = rte_kvargs_process(kvlist, ETH_PCAP_TX_IFACE_ARG,
				&open_tx_iface, &dumpers);
		// dumpers.num_of_queue = 1

That's the same logic I applied. Did I miss something? Could you explain this in more detail?

Thanks
Qi

> >
> > > +		ret = pmd_init_internals(dev, nb_rx_queue,
> > > +				nb_tx_queue, &eth_dev);
> >
> > I think it is not required to move pmd_init_internals() here.
> > This can be done simpler, I will send a draft patch as a reply to this
> > mail for possible solution.
> > But again that can't be final solution, we need to use
> > `process_private`
> >
> > <...>
> >
> 
> 
> 
>
  
Ferruh Yigit Nov. 13, 2018, 6:43 p.m. UTC | #4
On 11/13/2018 6:27 PM, Zhang, Qi Z wrote:
> First, apologies to make this in rush since I was somehow under pressure to make pdump works in 18.11.
> I agree there is lot of things need to improve, but the strategy here is to make it work quietly and not break anything else :) 
> add some comments inline.
> 
>> -----Original Message-----
>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>> Sent: Tuesday, November 13, 2018 9:15 AM
>> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
>> Cc: dev@dpdk.org; Lin, Xueqin <xueqin.lin@intel.com>
>> Subject: Re: [PATCH v2] net/pcap: enable data path on secondary
>>
>> Just a quick comment:
>> There are probably some ideas to take from what was done for tap.
> 
>>
>>
>> 13/11/2018 17:56, Ferruh Yigit:
>>> On 11/12/2018 4:51 PM, Qi Zhang wrote:
>>>> Private vdev on secondary is never supported by the new shared
>>>> device mode but pdump still relies on a private pcap PMD on secondary.
>>>> The patch enables pcap PMD's data path on secondary so that pdump
>>>> can work as usual.
>>>
>>> It would be great if you described the problem a little more.
>>>
>>> Private vdev was the way previously, when pdump developed, now with
>>> shared device mode on virtual devices, pcap data path in secondary is not
>> working.
>>>
>>> What exactly not working is (please correct me if I am wrong):
>>> When secondary adds a virtual device, related data transferred to
>>> primary and primary creates the device and shares device back with
>> secondary.
>>> When pcap device created in primary, pcap handlers (pointers) are
>>> process local and they are not valid for secondary process. This breaks
>> secondary.
>>>
>>> So we can't directly share the pcap handlers, but need to create a new
>>> set of handlers for secondary, that is what you are doing in this
>>> patch, although I have some comments, please check below.
>>>
>>> Since there is single storage for pcap handlers that primary and
>>> secondary shares and they can't share the handlers, you can't make
>>> both primary and secondary data path work. Also freeing handlers is
>>> another concern. What is needed is `rte_eth_dev->process_private` which
>> has been added in this release.
> 
> You are right, we should prevent handler be opened in primary be corrupted during probe at secondary.
> Now, I see this problem in pcap , as an example: internals->tx_queue[i].dumper/pcap is shared but will be overwritten at secondary, we should fix them by use process_private, 
> 
>>>
>>>>
>>>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>>>> Tested-by: Yufeng Mo <yufengx.mo@intel.com>
>>>
>>> <...>
>>>
>>>> @@ -934,6 +935,10 @@ pmd_init_internals(struct rte_vdev_device
>> *vdev,
>>>>  	 */
>>>>  	(*eth_dev)->dev_ops = &ops;
>>>>
>>>> +	/* store a copy of devargs for secondary process */
>>>> +	strlcpy(internals->devargs, rte_vdev_device_args(vdev),
>>>> +			ETH_PCAP_ARG_MAXLEN);
>>>
>>> Why we need to cover this in PMD level?
>>>
>>> Why secondary probe isn't getting devargs? Can't we fix this in eal level?
>>> It can be OK to workaround in the PMD taking account of the time of
>>> the release, but for long term I think this should be fixed in eal.
> 
> Yes this is the workaround for quick fix.
> Ideally secondary process should not take care of devargs, it just attach.
> And it's better to only parse devargs on one process ( primary process), the parsed result could be stored to intermediate result in shared memory,(examples, internal->nb_rx_queue_required) so secondary process don't need to parse it again.
>>>
>>> <...>
>>>
>>>> @@ -1122,23 +1126,37 @@ pmd_pcap_probe(struct rte_vdev_device
>> *dev)
>>>>  	start_cycles = rte_get_timer_cycles();
>>>>  	hz = rte_get_timer_hz();
>>>>
>>>> -	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>>>> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>>>> +		kvlist = rte_kvargs_parse(rte_vdev_device_args(dev),
>>>> +				valid_arguments);
>>>> +		if (kvlist == NULL)
>>>> +			return -EINVAL;
>>>> +		if (rte_kvargs_count(kvlist, ETH_PCAP_IFACE_ARG) == 1)
>>>> +			nb_rx_queue = 1;
>>>> +		else
>>>> +			nb_rx_queue =
>>>> +				rte_kvargs_count(kvlist,
>>>> +					ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
>>>> +		nb_tx_queue = 1;
>>>
>>> This part is wrong. pcap pmd supports multi queue, you can't hardcode
>>> the number of queues. Also for Tx why it ignores `rx_iface` argument?
>>> This is just hacking the driver for a specific use case breaking others.
> 
> Previously the nb_tx_queue and nb_rx_queue is decided by pcaps.num_of_queue and dumpers.num_of_queues.
> I just can't figure out a way that we can have more than 1 queue during probe, look at below code.
> 
> If ETH_PCAP_IFACE_ARG
> 
> 	pcaps.num_of_queue = 1;
> 	dumpers.num_of_queue = 1;
> 
> else
> 	is_rx_pcap = rte_kvargs_count(kvlist, ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
>         pcaps.num_of_queue = 0;
> 	if (is_rx_pcap) {
>                 ret = rte_kvargs_process(kvlist, ETH_PCAP_RX_PCAP_ARG,
>                                 &open_rx_pcap, &pcaps);
> 
> 				// pcaps.num_of_queue = 1;
>         } else {
>                 ret = rte_kvargs_process(kvlist, NULL,
>                                 &rx_iface_args_process, &pcaps);
> 				// pcaps.num_of_queue = 0;
>         }
> 
> 		is_tx_pcap = rte_kvargs_count(kvlist, ETH_PCAP_TX_PCAP_ARG) ? 1 : 0;
>         dumpers.num_of_queue = 0;
> 
>         if (is_tx_pcap)
>                 ret = rte_kvargs_process(kvlist, ETH_PCAP_TX_PCAP_ARG,
>                                 &open_tx_pcap, &dumpers);
> 				// dumpers.num_of_queue = 1
>         else
>                 ret = rte_kvargs_process(kvlist, ETH_PCAP_TX_IFACE_ARG,
>                                 &open_tx_iface, &dumpers);
> 				// dumpers.num_of_queue = 1
> 
> That's the same logic I applied, did I missed something, would you explain more for this?

ETH_PCAP_IFACE_ARG is the "iface=xxx" usage; both Rx and Tx use the same interface,
and because of an implementation limitation it only supports 1 queue.

rx_pcap, rx_iface and rx_iface_in support multiple queues, by providing them
multiple times. For example, "rx_pcap=q1.pcap,rx_pcap=q2.pcap,rx_pcap=q3.pcap" will
create 3 Rx queues, each with its own .pcap file. The same is valid for Tx.

rte_kvargs_process() calls the callback function once per argument provided, so if
an argument is provided multiple times, it will call the same callback multiple
times; that is why 'num_of_queue' is increased in the callback functions.

At a high level, pmd_pcap_probe() first parses the arguments and creates pcap
handlers based on them, and only as a last step creates the ethdev using this
information. I am for keeping this logic; doing something different for the
secondary can cause issues in edge cases that are not obvious at first look.

> 
> Thanks
> Qi
> 
>>>
>>>> +		ret = pmd_init_internals(dev, nb_rx_queue,
>>>> +				nb_tx_queue, &eth_dev);
>>>
>>> I think it is not required to move pmd_init_internals() here.
>>> This can be done simpler, I will send a draft patch as a reply to this
>>> mail for possible solution.
>>> But again that can't be final solution, we need to use
>>> `process_private`
>>>
>>> <...>
>>>
>>
>>
>>
>>
>
  
Qi Zhang Nov. 13, 2018, 7:18 p.m. UTC | #5
> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Tuesday, November 13, 2018 10:44 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>
> Cc: dev@dpdk.org; Lin, Xueqin <xueqin.lin@intel.com>
> Subject: Re: [PATCH v2] net/pcap: enable data path on secondary
> 
> On 11/13/2018 6:27 PM, Zhang, Qi Z wrote:
> > First, apologies to make this in rush since I was somehow under pressure to
> make pdump works in 18.11.
> > I agree there is lot of things need to improve, but the strategy here
> > is to make it work quietly and not break anything else :) add some
> comments inline.
> >
> >> -----Original Message-----
> >> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >> Sent: Tuesday, November 13, 2018 9:15 AM
> >> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z
> >> <qi.z.zhang@intel.com>
> >> Cc: dev@dpdk.org; Lin, Xueqin <xueqin.lin@intel.com>
> >> Subject: Re: [PATCH v2] net/pcap: enable data path on secondary
> >>
> >> Just a quick comment:
> >> There are probably some ideas to take from what was done for tap.
> >
> >>
> >>
> >> 13/11/2018 17:56, Ferruh Yigit:
> >>> On 11/12/2018 4:51 PM, Qi Zhang wrote:
> >>>> Private vdev on secondary is never supported by the new shared
> >>>> device mode but pdump still relies on a private pcap PMD on
> secondary.
> >>>> The patch enables pcap PMD's data path on secondary so that pdump
> >>>> can work as usual.
> >>>
> >>> It would be great if you described the problem a little more.
> >>>
> >>> Private vdev was the way previously, when pdump developed, now with
> >>> shared device mode on virtual devices, pcap data path in secondary
> >>> is not
> >> working.
> >>>
> >>> What exactly not working is (please correct me if I am wrong):
> >>> When secondary adds a virtual device, related data transferred to
> >>> primary and primary creates the device and shares device back with
> >> secondary.
> >>> When pcap device created in primary, pcap handlers (pointers) are
> >>> process local and they are not valid for secondary process. This
> >>> breaks
> >> secondary.
> >>>
> >>> So we can't directly share the pcap handlers, but need to create a
> >>> new set of handlers for secondary, that is what you are doing in
> >>> this patch, although I have some comments, please check below.
> >>>
> >>> Since there is single storage for pcap handlers that primary and
> >>> secondary shares and they can't share the handlers, you can't make
> >>> both primary and secondary data path work. Also freeing handlers is
> >>> another concern. What is needed is `rte_eth_dev->process_private`
> >>> which
> >> has been added in this release.
> >
> > You are right, we should prevent handler be opened in primary be
> corrupted during probe at secondary.
> > Now, I see this problem in pcap , as an example:
> > internals->tx_queue[i].dumper/pcap is shared but will be overwritten
> > at secondary, we should fix them by use process_private,
> >
> >>>
> >>>>
> >>>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> >>>> Tested-by: Yufeng Mo <yufengx.mo@intel.com>
> >>>
> >>> <...>
> >>>
> >>>> @@ -934,6 +935,10 @@ pmd_init_internals(struct rte_vdev_device
> >> *vdev,
> >>>>  	 */
> >>>>  	(*eth_dev)->dev_ops = &ops;
> >>>>
> >>>> +	/* store a copy of devargs for secondary process */
> >>>> +	strlcpy(internals->devargs, rte_vdev_device_args(vdev),
> >>>> +			ETH_PCAP_ARG_MAXLEN);
> >>>
> >>> Why we need to cover this in PMD level?
> >>>
> >>> Why secondary probe isn't getting devargs? Can't we fix this in eal level?
> >>> It can be OK to workaround in the PMD taking account of the time of
> >>> the release, but for long term I think this should be fixed in eal.
> >
> > Yes this is the workaround for quick fix.
> > Ideally secondary process should not take care of devargs, it just attach.
> > And it's better to only parse devargs on one process ( primary process), the
> parsed result could be stored to intermediate result in shared
> memory,(examples, internal->nb_rx_queue_required) so secondary process
> don't need to parse it again.
> >>>
> >>> <...>
> >>>
> >>>> @@ -1122,23 +1126,37 @@ pmd_pcap_probe(struct rte_vdev_device
> >> *dev)
> >>>>  	start_cycles = rte_get_timer_cycles();
> >>>>  	hz = rte_get_timer_hz();
> >>>>
> >>>> -	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> >>>> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> >>>> +		kvlist = rte_kvargs_parse(rte_vdev_device_args(dev),
> >>>> +				valid_arguments);
> >>>> +		if (kvlist == NULL)
> >>>> +			return -EINVAL;
> >>>> +		if (rte_kvargs_count(kvlist, ETH_PCAP_IFACE_ARG) == 1)
> >>>> +			nb_rx_queue = 1;
> >>>> +		else
> >>>> +			nb_rx_queue =
> >>>> +				rte_kvargs_count(kvlist,
> >>>> +					ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
> >>>> +		nb_tx_queue = 1;
> >>>
> >>> This part is wrong. pcap pmd supports multi queue, you can't
> >>> hardcode the number of queues. Also for Tx why it ignores `rx_iface`
> argument?
> >>> This is just hacking the driver for a specific use case breaking others.
> >
> > Previously the nb_tx_queue and nb_rx_queue is decided by
> pcaps.num_of_queue and dumpers.num_of_queues.
> > I just can't figure out a way that we can have more than 1 queue during
> probe, look at below code.
> >
> > If ETH_PCAP_IFACE_ARG
> >
> > 	pcaps.num_of_queue = 1;
> > 	dumpers.num_of_queue = 1;
> >
> > else
> > 	is_rx_pcap = rte_kvargs_count(kvlist, ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
> >         pcaps.num_of_queue = 0;
> > 	if (is_rx_pcap) {
> >                 ret = rte_kvargs_process(kvlist,
> ETH_PCAP_RX_PCAP_ARG,
> >                                 &open_rx_pcap, &pcaps);
> >
> > 				// pcaps.num_of_queue = 1;
> >         } else {
> >                 ret = rte_kvargs_process(kvlist, NULL,
> >                                 &rx_iface_args_process, &pcaps);
> > 				// pcaps.num_of_queue = 0;
> >         }
> >
> > 		is_tx_pcap = rte_kvargs_count(kvlist, ETH_PCAP_TX_PCAP_ARG) ?
> 1 : 0;
> >         dumpers.num_of_queue = 0;
> >
> >         if (is_tx_pcap)
> >                 ret = rte_kvargs_process(kvlist,
> ETH_PCAP_TX_PCAP_ARG,
> >                                 &open_tx_pcap, &dumpers);
> > 				// dumpers.num_of_queue = 1
> >         else
> >                 ret = rte_kvargs_process(kvlist,
> ETH_PCAP_TX_IFACE_ARG,
> >                                 &open_tx_iface, &dumpers);
> > 				// dumpers.num_of_queue = 1
> >
> > That's the same logic I applied, did I missed something, would you explain
> more for this?
> 
> ETH_PCAP_IFACE_ARG is "iface=xxx" usage, both Rx and Tx use the same
> interface, because of implementation limitation it only supports 1 queue.
> 
> rx_pcap, rx_iface, rx_iface_in supports multiple queues, by providing them
> multiple time. Like "rx_pcap=q1.pcap,rx_pcap=q2.pcap,rx_pcap=q3.pcap"
> will create 3 Rx queues each having their own .pcap file. Same is valid for Tx.
> 
> rte_kvargs_process() calls callback function per argument provided, so if an
> argument provided multiple times, it will call same callback multiple times,
> that is why 'num_of_queue' increased in callback functions.

Ok, this is what I missed; I didn't notice that pcap devargs can take advantage of using the same key multiple times.

BTW, I also noticed it may not be necessary to estimate the queue number and call pmd_init_internals before opening any pcap file or iface, as you mentioned.
The reason I had this is that I previously worried a pcap file could be locked by one process, preventing another process from accessing it, so I thought about adding an "owner" devarg to make sure only one process accesses the files. After I figured out this is not necessary, the queue estimation part should be removed, rolling back to the simple solution you suggested.

Thanks for the help!
Qi

> 
> In high-level, pmd_pcap_probe() first parses the arguments and creates pcap
> handlers based on arguments, later as a last thing creates ethdev using these
> information. I am for keeping this logic, doing something different for
> secondary can cause issues in edge cases not obvious at first look.
> 
> >
> > Thanks
> > Qi
> >
> >>>
> >>>> +		ret = pmd_init_internals(dev, nb_rx_queue,
> >>>> +				nb_tx_queue, &eth_dev);
> >>>
> >>> I think it is not required to move pmd_init_internals() here.
> >>> This can be done simpler, I will send a draft patch as a reply to
> >>> this mail for possible solution.
> >>> But again that can't be final solution, we need to use
> >>> `process_private`
> >>>
> >>> <...>
> >>>
> >>
> >>
> >>
> >>
> >
  

Patch

diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index 7bbe72e25..ad4fe0bf7 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -81,6 +81,7 @@  struct pmd_internals {
 	int if_index;
 	int single_iface;
 	int phy_mac;
+	char devargs[ETH_PCAP_ARG_MAXLEN];
 };
 
 struct pmd_devargs {
@@ -892,19 +893,19 @@  static int
 pmd_init_internals(struct rte_vdev_device *vdev,
 		const unsigned int nb_rx_queues,
 		const unsigned int nb_tx_queues,
-		struct pmd_internals **internals,
 		struct rte_eth_dev **eth_dev)
 {
 	struct rte_eth_dev_data *data;
 	unsigned int numa_node = vdev->device.numa_node;
+	struct pmd_internals *internals;
 
 	PMD_LOG(INFO, "Creating pcap-backed ethdev on numa socket %d",
 		numa_node);
 
 	/* reserve an ethdev entry */
-	*eth_dev = rte_eth_vdev_allocate(vdev, sizeof(**internals));
+	*eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*internals));
 	if (!(*eth_dev))
-		return -1;
+		return -ENODEV;
 
 	/* now put it all together
 	 * - store queue data in internals,
@@ -912,21 +913,21 @@  pmd_init_internals(struct rte_vdev_device *vdev,
 	 * - point eth_dev_data to internals
 	 * - and point eth_dev structure to new eth_dev_data structure
 	 */
-	*internals = (*eth_dev)->data->dev_private;
+	internals = (*eth_dev)->data->dev_private;
 	/*
 	 * Interface MAC = 02:70:63:61:70:<iface_idx>
 	 * derived from: 'locally administered':'p':'c':'a':'p':'iface_idx'
 	 * where the middle 4 characters are converted to hex.
 	 */
-	(*internals)->eth_addr = (struct ether_addr) {
+	internals->eth_addr = (struct ether_addr) {
 		.addr_bytes = { 0x02, 0x70, 0x63, 0x61, 0x70, iface_idx++ }
 	};
-	(*internals)->phy_mac = 0;
+	internals->phy_mac = 0;
 	data = (*eth_dev)->data;
 	data->nb_rx_queues = (uint16_t)nb_rx_queues;
 	data->nb_tx_queues = (uint16_t)nb_tx_queues;
 	data->dev_link = pmd_link;
-	data->mac_addrs = &(*internals)->eth_addr;
+	data->mac_addrs = &internals->eth_addr;
 
 	/*
 	 * NOTE: we'll replace the data element, of originally allocated
@@ -934,6 +935,10 @@  pmd_init_internals(struct rte_vdev_device *vdev,
 	 */
 	(*eth_dev)->dev_ops = &ops;
 
+	/* store a copy of devargs for secondary process */
+	strlcpy(internals->devargs, rte_vdev_device_args(vdev),
+			ETH_PCAP_ARG_MAXLEN);
+
 	return 0;
 }
 
@@ -1022,10 +1027,11 @@  eth_pcap_update_mac(const char *if_name, struct rte_eth_dev *eth_dev,
 }
 
 static int
-eth_from_pcaps_common(struct rte_vdev_device *vdev,
-		struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues,
-		struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues,
-		struct pmd_internals **internals, struct rte_eth_dev **eth_dev)
+eth_from_pcaps_common(struct pmd_devargs *rx_queues,
+			const unsigned int nb_rx_queues,
+			struct pmd_devargs *tx_queues,
+			const unsigned int nb_tx_queues,
+			struct pmd_internals *internals)
 {
 	unsigned int i;
 
@@ -1035,12 +1041,8 @@  eth_from_pcaps_common(struct rte_vdev_device *vdev,
 	if (tx_queues == NULL && nb_tx_queues > 0)
 		return -1;
 
-	if (pmd_init_internals(vdev, nb_rx_queues, nb_tx_queues, internals,
-			eth_dev) < 0)
-		return -1;
-
 	for (i = 0; i < nb_rx_queues; i++) {
-		struct pcap_rx_queue *rx = &(*internals)->rx_queue[i];
+		struct pcap_rx_queue *rx = &internals->rx_queue[i];
 		struct devargs_queue *queue = &rx_queues->queue[i];
 
 		rx->pcap = queue->pcap;
@@ -1049,7 +1051,7 @@  eth_from_pcaps_common(struct rte_vdev_device *vdev,
 	}
 
 	for (i = 0; i < nb_tx_queues; i++) {
-		struct pcap_tx_queue *tx = &(*internals)->tx_queue[i];
+		struct pcap_tx_queue *tx = &internals->tx_queue[i];
 		struct devargs_queue *queue = &tx_queues->queue[i];
 
 		tx->dumper = queue->dumper;
@@ -1062,17 +1064,17 @@  eth_from_pcaps_common(struct rte_vdev_device *vdev,
 }
 
 static int
-eth_from_pcaps(struct rte_vdev_device *vdev,
+eth_from_pcaps(struct rte_vdev_device *vdev, struct rte_eth_dev *eth_dev,
 		struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues,
 		struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues,
 		int single_iface, unsigned int using_dumpers)
 {
 	struct pmd_internals *internals = NULL;
-	struct rte_eth_dev *eth_dev = NULL;
 	int ret;
 
-	ret = eth_from_pcaps_common(vdev, rx_queues, nb_rx_queues,
-		tx_queues, nb_tx_queues, &internals, &eth_dev);
+	internals = eth_dev->data->dev_private;
+	ret = eth_from_pcaps_common(rx_queues, nb_rx_queues,
+		tx_queues, nb_tx_queues, internals);
 
 	if (ret < 0)
 		return ret;
@@ -1099,7 +1101,6 @@  eth_from_pcaps(struct rte_vdev_device *vdev,
 	else
 		eth_dev->tx_pkt_burst = eth_pcap_tx;
 
-	rte_eth_dev_probing_finish(eth_dev);
 	return 0;
 }
 
@@ -1108,12 +1109,15 @@  pmd_pcap_probe(struct rte_vdev_device *dev)
 {
 	const char *name;
 	unsigned int is_rx_pcap = 0, is_tx_pcap = 0;
-	struct rte_kvargs *kvlist;
+	struct rte_kvargs *kvlist = NULL;
 	struct pmd_devargs pcaps = {0};
 	struct pmd_devargs dumpers = {0};
-	struct rte_eth_dev *eth_dev;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals = NULL;
+	unsigned int nb_rx_queue;
+	unsigned int nb_tx_queue;
 	int single_iface = 0;
-	int ret;
+	int ret = 0;
 
 	name = rte_vdev_device_name(dev);
 	PMD_LOG(INFO, "Initializing pmd_pcap for %s", name);
@@ -1122,23 +1126,37 @@  pmd_pcap_probe(struct rte_vdev_device *dev)
 	start_cycles = rte_get_timer_cycles();
 	hz = rte_get_timer_hz();
 
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		kvlist = rte_kvargs_parse(rte_vdev_device_args(dev),
+				valid_arguments);
+		if (kvlist == NULL)
+			return -EINVAL;
+		if (rte_kvargs_count(kvlist, ETH_PCAP_IFACE_ARG) == 1)
+			nb_rx_queue = 1;
+		else
+			nb_rx_queue =
+				rte_kvargs_count(kvlist,
+					ETH_PCAP_RX_PCAP_ARG) ? 1 : 0;
+		nb_tx_queue = 1;
+		ret = pmd_init_internals(dev, nb_rx_queue,
+				nb_tx_queue, &eth_dev);
+		if (ret != 0)
+			goto free_kvlist;
+	} else {
 		eth_dev = rte_eth_dev_attach_secondary(name);
 		if (!eth_dev) {
 			PMD_LOG(ERR, "Failed to probe %s", name);
-			return -1;
+			ret = -ENODEV;
+			goto free_kvlist;
 		}
-		/* TODO: request info from primary to set up Rx and Tx */
 		eth_dev->dev_ops = &ops;
 		eth_dev->device = &dev->device;
-		rte_eth_dev_probing_finish(eth_dev);
-		return 0;
+		internals = eth_dev->data->dev_private;
+		kvlist = rte_kvargs_parse(internals->devargs, valid_arguments);
+		if (kvlist == NULL)
+			return -EINVAL;
 	}
 
-	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
-	if (kvlist == NULL)
-		return -1;
-
 	/*
 	 * If iface argument is passed we open the NICs and use them for
 	 * reading / writing
@@ -1202,8 +1220,12 @@  pmd_pcap_probe(struct rte_vdev_device *dev)
 		goto free_kvlist;
 
 create_eth:
-	ret = eth_from_pcaps(dev, &pcaps, pcaps.num_of_queue, &dumpers,
-		dumpers.num_of_queue, single_iface, is_tx_pcap);
+	ret = eth_from_pcaps(dev, eth_dev, &pcaps, pcaps.num_of_queue,
+			&dumpers, dumpers.num_of_queue,
+			single_iface, is_tx_pcap);
+	if (ret != 0)
+		goto free_kvlist;
+	rte_eth_dev_probing_finish(eth_dev);
 
 free_kvlist:
 	rte_kvargs_free(kvlist);