[dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles

Andrew Rybchenko arybchenko at solarflare.com
Sat Apr 28 11:38:48 CEST 2018


On 04/25/2018 01:36 PM, Anatoly Burakov wrote:
> The original implementation used flock() locks, but was later
> switched to using fcntl() locks for page locking, because
> fcntl() locks allow locking parts of a file, which is useful
> for single-file segments mode, where locking the entire file
> isn't as useful because we still need to grow and shrink it.
>
> However, according to fcntl()'s Ubuntu manpage [1], the semantics of
> fcntl() locks have a giant oversight:
>
>    This interface follows the completely stupid semantics of System
>    V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>    locks associated with a file for a given process are removed
>    when any file descriptor for that file is closed by that process.
>    This semantic means that applications must be aware of any files
>    that a subroutine library may access.
>
> Basically, closing *any* fd referring to a file that has an fcntl()
> lock on it (which we do, because we don't want to leak fds) will drop
> that lock completely.
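
For reference, a minimal standalone sketch of the pitfall described
above (not DPDK code; the path and flow are made up) - closing an
unrelated fd to the same file silently releases the fcntl() lock:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        int fd1 = open("/tmp/lock-demo", O_RDWR | O_CREAT, 0600);
        int fd2 = open("/tmp/lock-demo", O_RDWR); /* e.g. opened by a library */

        struct flock fl = {
                .l_type = F_WRLCK,
                .l_whence = SEEK_SET,
                .l_start = 0,
                .l_len = 0, /* 0 == lock the whole file */
        };
        fcntl(fd1, F_SETLK, &fl); /* exclusive lock taken through fd1 */

        close(fd2); /* POSIX: this drops the lock held via fd1, too */

        /* Another process can now take the lock, even though fd1 is
         * still open and we never unlocked it. */
        return 0;
}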
>
> So, in this commit, we will be reverting to using flock() locks
> everywhere. However, that still leaves the problem of locking parts
> of a memseg list file in single file segments mode, and we will be
> solving it by creating a separate lock file for each page and
> tracking those with flock().
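
A rough sketch of the per-page lockfile idea (the helper name and
lockfile naming scheme are illustrative, not necessarily what the
patch itself does):

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

static int lock_page(const char *dir, int list_idx, int seg_idx)
{
        char path[PATH_MAX];
        int fd;

        snprintf(path, sizeof(path), "%s/lock_%d_%d", dir, list_idx, seg_idx);
        fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
                return -1;
        /* shared lock: any number of processes may use the page at once */
        if (flock(fd, LOCK_SH) < 0) {
                close(fd);
                return -1;
        }
        return fd; /* kept open for as long as this process uses the page */
}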
>
> We will also be removing all of this tailq business and replacing it
> with a simple array - saving a few bytes is not worth the extra
> hassle of dealing with pointers and potential memory allocation
> failures. We also remove the tailq lock, since it is not needed -
> these fd lists are per-process, and within a given process, only one
> thread ever handles access to hugetlbfs.
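
The flat bookkeeping could look something like this (field names are
my own, for illustration only):

struct fd_list {
        int memseg_list_fd; /* fd of the memseg list file, or -1 */
        int *fds;           /* one lockfile fd per page, -1 if unused */
        int len;            /* number of pages in this memseg list */
};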
>
> So, the first process to allocate a segment will create a lockfile
> and put a shared lock on it. When we're shrinking the page file, we
> will try to take out a write lock on that lockfile, which will fail
> if any other process is holding onto it as well. This way, we can
> tell whether it is safe to shrink the segment file. Also, if no
> other locks are found in the lock list for a given memseg list, the
> memseg list fd is automatically closed.
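
The shrink test described above could be sketched as follows (again,
illustrative rather than the literal patch code):

#include <errno.h>
#include <sys/file.h>

static int can_remove_page(int lock_fd)
{
        /* try to take the lock exclusively, without blocking */
        if (flock(lock_fd, LOCK_EX | LOCK_NB) < 0) {
                if (errno == EWOULDBLOCK)
                        return 0; /* another process still holds a shared lock */
                return -1; /* unexpected error */
        }
        return 1; /* we were the last user; the page can be removed */
}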
>
> One other thing to note is that, according to the flock() Ubuntu
> manpage [2], upgrading a lock from shared to exclusive is implemented
> by dropping and reacquiring the lock, which is not atomic and thus
> would have created race conditions. So, when attempting to perform
> operations in hugetlbfs, we will take out a write lock on the
> hugetlbfs directory, so that only one process at a time can perform
> hugetlbfs operations.
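
The directory-level serialization might be sketched like this (path
handling and error reporting are simplified; illustrative only):

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

static int lock_hugedir(const char *hugedir)
{
        int fd = open(hugedir, O_RDONLY); /* directories open read-only */
        if (fd < 0)
                return -1;
        if (flock(fd, LOCK_EX) < 0) { /* blocks until no one else holds it */
                close(fd);
                return -1;
        }
        return fd; /* release with flock(fd, LOCK_UN) and close() */
}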
>
> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
> Fixes: 2a04139f66b4 ("eal: add single file segments option")
> Cc: anatoly.burakov at intel.com
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> Acked-by: Bruce Richardson <bruce.richardson at intel.com>

We have a problem with this changeset when the EAL option -m or
--socket-mem is used.
EAL initialization hangs just after "EAL: Probing VFIO support...".
strace shows the process blocked in flock(7, LOCK_EX
List of file descriptors:
# ls /proc/25452/fd -l
total 0
lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages

I guess the problem is that there are two file descriptors open on
/dev/hugepages (6 and 7), and the flock() on the second one hangs
waiting for the lock already taken through the first.
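
A minimal standalone reproduction of what I suspect is happening -
flock() locks belong to the open file description, so a second
independent open() of the same path followed by flock(LOCK_EX) blocks
forever against the lock held through the first fd, even within a
single process:

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>

int main(void)
{
        /* two independent open()s create two open file descriptions */
        int fd1 = open("/dev/hugepages", O_RDONLY);
        int fd2 = open("/dev/hugepages", O_RDONLY);

        flock(fd1, LOCK_EX); /* succeeds */
        flock(fd2, LOCK_EX); /* blocks forever, waiting on fd1's lock */
        printf("never reached\n");
        return 0;
}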

Any ideas on how to solve it?

Andrew.

