[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

Neil Horman nhorman at tuxdriver.com
Fri Sep 26 15:40:14 CEST 2014


On Fri, Sep 26, 2014 at 12:37:54PM +0000, Wodkowski, PawelX wrote:
> > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > deletion to the callback()?
> > That was the thought, yes.
> > 
> > > I think it is doable - but I don't see any real advantage with that approach.
> > > Yes, code will become a bit simpler, as  we'll have one point when we remove
> > alarm from the list.
> > Yes, that would be the advantage, that the code would be much simpler.
> > 
> > > But from other side, imagine such simple test-case:
> > >
> > > for (i = 0; i < 0x100000; i++) {
> > >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > > }
> > >
> > > We'll endup with 1M of cancelled, but still not removed entries in the
> > alarm_list.
> > > With current implementation that means - few MBs of wasted memory,
> > Thats correct, and the tradeoff to choose between.  Do you want simpler code
> > that is easier to maintain, or do you want a high speed cancel and set
> > operation.  I'm not aware of all the use cases, but I have a hard time seeing
> > a use case in which the in-flight alarm list grows unboundedly large, which in
> > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > believe that there are use cases which I'm not aware of.
> > 
> > > plus very slow set() and cancel(), as they'll  have to traverse all entries in the
> > list.
> > > And all that - for empty from user perspective alarm_list
> > > So I still prefer Michal's way.
> > > After all, it doesn't look that complicated to me.
> > Except that the need for Michals fix arose from the fact that we have two free
> > locations that might both get called depending on the situation.  Thats what I'm
> > trying to address here, the complexity itself, rather than the fix (which I
> > agree is perfectly valid).
> > 
> > > BTW, any particular reason you are so negative about pthread_self()?
> > >
> > Nothing specifically against it (save for its inverted return code sense, which
> > made it difficult for me to parse when reviewing).  Its more the complexity
> > itself in the alarm cancel and callback routine that I was looking at.  Given
> > that the origional bug happened because an cancel in a callback might produce a
> > double free, I wanted to fix it by simpifying the code, not adding conditions
> > which make it more complex.
> > 
> > You know, looking at it, something else just occured to me.  I think this could
> > all be fixed without the cancel flag or the pthread_self assignments.  What if
> > we simply removed the alarm from the list before we called the callback in
> > rte_eal_alarm_callback()?  That way any cancel operation called from within the
> > callback would fail, as it wouldn't appear on the list, and the callback
> > operation would be the only freeing entity?  That would let you still have a
> > fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> > below
> > 
> > 
> > > >
> > > > It also seems like the alarm api as a whole could use some improvement.
> > The
> > > > way its written right now, theres no way to refer to a specific alarm (i.e.
> > > > cancelation relies on the specification of a function and data pointer, which
> > > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > > > handle to a unique timer instance that can be store by a caller and used to
> > > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > > subsystems model timers.
> > >
> > > Yeh,  alarm API looks a bit unusual.
> > > Though, I suppose that's subject for another patch/discussion :)
> > >
> > Yes, agreed :)
> > 
> 
> Please read quoted message bellow:
> 
> > >
> > >
> > > His solution *does* eliminate race condition too.
> > >
> > I applied his patch. And here is the problem
> > 1	rte_spinlock_lock(&alarm_list_lk);
> > 2	while ((ap = LIST_FIRST(&alarm_list)) !=NULL &&
> > 3			gettimeofday(&now, NULL) == 0 &&
> > 4			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> > 5						ap->time.tv_usec <=
> > now.tv_usec))){
> > 6		ap->executing |= ALARM_EXECUTING;
> > 7		if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > 8				rte_spinlock_unlock(&alarm_list_lk);
> > 9                           //another thread: rte_alarm_cancel called, mark this timer
> > canceled and exit ( THE RACE)
> > 10				ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> > (THE RACE)
> > 11
> > 12				rte_spinlock_lock(&alarm_list_lk);
> > 13		}
> > 14
> > 15		rte_spinlock_lock(&alarm_list_lk);
> > 16		LIST_REMOVE(ap, next);
> > 17		rte_free(ap);
> > 18	}
> > 
> > Imagine
> > 
> > Thread 1:                                         Thread2
> > Execute eal_alarm_callback
> > Lock list at 1                                   rte_alarm_cancel -> block on spinlock
> > ....
> > Realease lock at line 8                  rte_alarm_cancel -> resumes execution -> it
> > mark this timer canceled
> > ap->cb_fn is called at line 10       rte_alarm_cancel exits reporting all alarms are
> > canceled and not executing (which is not true!)
> >     rte_alarm_set is called
> >     to rearm itself (THE RACE)
> > 
> > He only mark timer to * do not execute* but does not assure that it is not
> > executing while canceling.
> > Race condition is made by unlocking at line 8 and our solution workarounds this
> > by looping until all alarms finish execution then cancel what left after callback
> > finish (*including rearmed alarms*).
> > Real elimination of the race would be by using recursive locking when *other
> > thread can't* access the list *while callback* is running but callback *itself can
> > by using recursive locking*.
> > 
> > Maybe I don't see something obvious? :)

I think you're missing the fact that your patch doesn't do what you assert above
either :)

First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
alarm implementation, because theres no ability to refer to a specific alarm
from the callers perspective.  When you call rte_eal_alarm_set you get a new
alarm every time.  So I don't really see a race there.  It might not be exactly
the behavior you want, but its not a race, becuase you're not modifying an alarm
in the middle of execution, you're just creating a new alarm, which is safe.

There is a race in what you describe above, insofar as its possible that you
might call rte_eal_alarm_cancel and return without having canceled all the
matching alarms.  I don't see any clear documentation on what the behavior is
supposed to be, but if you want to ensure that all matching alarms are cancelled
or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
API parlance, thats usually denoted as a cancel_sync operation).

For that race condition, you're correct, my patch doesn't address it, I see that
now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
within a callback function, then, by definition, you can't wait on the
completion of the active alarm, because thats a deadlock.  Its a necessecary
evil, I grant you, but it means that you can't be guaranteed the cancelled and
complete (cancel_sync) behavior that you want, at least not with the current
api.  If you want that behavior, you need to do one of two things:

1) Modify the api to allow callers to individually reference timer instances, so
that when cancelling, we can return an appropriate return code to indicate to
the caller that this alarm is in-progress.  That way you can guarantee the
caller that the specific alarm that you cancelled is either complete and cancelled
or currently executing.  Add an api to expicitly wait on a referenced alarm as
well.  This allows developers to know that, when executing an alarm callback, an
-ECURRENTLYEXECUTING return code is ok, because they are in the currently
executing context.


2) Understand that the guarantee can't be met, and change nothing, because you
consider it ok for an in-progress alarm to be ignored when getting cancelled.


Neil



More information about the dev mailing list