[dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF

Lu, Wenzhuo wenzhuo.lu at intel.com
Tue May 17 03:11:17 CEST 2016


Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, May 16, 2016 8:01 PM
> To: Lu, Wenzhuo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
> 
> Hi Wenzhuo,
> 
> On 05/04/2016 11:10 PM, Wenzhuo Lu wrote:
> > When the physical link is down and recover later, the VF link cannot
> > recover until the user stop and start it manually.
> > This patch implements the automatic recovery of VF port.
> > The automatic recovery bases on the link up/down message received from
> > PF. When VF receives the link up/down message, it will replace the
> > RX/TX and operation functions with fake ones to stop RX/TX and any
> > future operation. Then reset the VF port.
> > After successfully resetting the port, recover the RX/TX and operation
> > functions.
> >
> > Signed-off-by: Wenzhuo Lu <wenzhuo.lu at intel.com>
> >
> > [...]
> >
> > +void
> > +ixgbevf_dev_link_up_down_handler(struct rte_eth_dev *dev) {
> > +	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +	struct ixgbe_adapter *adapter =
> > +		(struct ixgbe_adapter *)dev->data->dev_private;
> > +	int diag;
> > +	uint32_t vteiam;
> > +
> > +	/* Only one working core need to performance VF reset */
> > +	if (rte_spinlock_trylock(&adapter->vf_reset_lock)) {
> > +		/**
> > +		 * When fake rec/xmit is replaced, working thread may is running
> > +		 * into real RX/TX func, so wait long enough to assume all
> > +		 * working thread exit. The assumption is it will spend less
> > +		 * than 100us for each execution of RX and TX func.
> > +		 */
> > +		rte_delay_us(100);
> > +
> > +		do {
> > +			dev->data->dev_started = 0;
> > +			ixgbevf_dev_stop(dev);
> > +			rte_delay_us(1000000);
> 
> If I understand well, ixgbevf_dev_link_up_down_handler() is called by
> ixgbevf_recv_pkts_fake() on a dataplane core. It means that the core that
> acquired the lock will loop during 100us + 1sec at least.
> If this core was also in charge of polling other queues of other ports, or timers,
> many packets will be dropped (even with a 100us loop). I don't think it is
> acceptable to actively wait inside a rx function.
> 
> I think it would avoid many issues to delegate this work to the application,
> maybe by notifying it that the port is in a bad state and must be restarted. The
> application could then properly stop polling the queues, and stop and restart the
> port in a separate thread, without bothering the dataplane cores.
Thanks for the comments.
Yes, you're right. I wrongly assumed that every queue is handled by one core.
But that is certainly not always the case; we cannot tell how users will deploy their systems.

I plan to update this patch set. The solution now is to first let users choose whether they want this
auto-reset feature. If so, we will install another set of rx/tx functions that take a lock, so we
can stop the rx/tx of the bad ports.
We will also provide a reset API for users. The APPs should call this API from their management thread or similar,
which means the APPs must guarantee thread safety for the API.
So there are 2 things:
1. Lock the rx/tx to stop them on behalf of the users.
2. Provide a reset API for users, so that every NIC can do its own job. APPs need not worry about the
differences between NICs.

Of course, it's not *automatic* any more. The reason is that DPDK doesn't guarantee thread safety, so the operations
have to be left to the APPs, which must guarantee thread safety themselves.

And if the users choose not to use the auto-reset feature, we will leave this work to the APP :)
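
To make the two pieces concrete, here is a minimal sketch in plain C11, not the actual ixgbe code. Everything in it is an assumption for illustration: struct port, port_rx_burst_locked(), port_reset_begin() and port_reset_end() are hypothetical names, and the real patch may look quite different. The point it shows is that the rx function only tries the lock and returns 0 packets when a reset is in progress, so a dataplane core never waits, while the reset itself runs in the APP's management thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-port state; not a real DPDK structure. */
struct port {
	atomic_flag lock;    /* held during rx and during a reset */
	unsigned int resets; /* number of completed resets */
	uint16_t pending;    /* stand-in for packets waiting in the ring */
};

/*
 * Locked rx: if the lock is already held (for example by a reset),
 * report 0 packets and return at once, so the polling core can move
 * on to its other queues instead of waiting.
 */
static uint16_t
port_rx_burst_locked(struct port *p, uint16_t max_pkts)
{
	uint16_t n;

	if (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		return 0;

	n = (uint16_t)(p->pending < max_pkts ? p->pending : max_pkts);
	p->pending = (uint16_t)(p->pending - n);

	atomic_flag_clear_explicit(&p->lock, memory_order_release);
	return n;
}

/*
 * Reset API, meant to be called from the APP's management thread only.
 * Spinning here is acceptable because no dataplane core runs this.
 */
static void
port_reset_begin(struct port *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* wait for an in-flight rx burst to finish */
}

static void
port_reset_end(struct port *p)
{
	/* The NIC specific stop/start work would go here. */
	p->pending = 0;
	p->resets++;
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}
```

The management thread would call port_reset_begin(), do the NIC specific work, then call port_reset_end(). In between, every call to port_rx_burst_locked() on that port returns 0, which is how the rx/tx of the bad port gets stopped without any delay loop on a dataplane core.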

> 
> 
> Regards,
> Olivier

