[dpdk-dev] Analysis for "lpm/lpm6: fix missing free of rules_tbl and lpm"

Christian Ehrhardt christian.ehrhardt at canonical.com
Fri Mar 4 14:28:18 CET 2016


Hi everybody,

I created a fix for the issue named in the subject, which will hit the
mailing list soon, but I considered it important to send this mail ahead.
All of this analysis has no place in the patch description, but it helps to
understand what was going on and why.
The follow-up patch will have the title "lpm/lpm6: fix missing free of
rules_tbl and lpm".

I ran into issues with the lpm6 autotest failing for me.

Looking at it I saw many errors like these:
Error at line 679:
ERROR: LPM Test tests6[i]: FAIL
LPM: LPM memory allocation failed
[...]

It turned out that 2500M of memory would have been enough, but that couldn't
be the solution.
With some debugging it eventually boiled down to
find_suitable_element(heap, size, flags, align, bound) not finding any
space, while earlier allocations of the same size had succeeded.

Note: Along the way I found the use-after-free for which I submitted a patch
this morning.
I expected a leak, but valgrind wasn't too helpful. That was to be expected,
as I guess this is more of an internal leak/fragmentation in the structures
than a real leak.

Thinking of a leak / fragmentation, I broke up the loop in test_lpm6 and ran
the tests in segments:
- 1-end: failing at 13 and following as reported
- 13-end: working
- skipping some
... (you get the idea)
A bit like bisecting :-)
It turned out that idx 2 (=> test2) was very important, but not the only
source of the issue.

This particular test does iterative allocation and free with slightly
changed config (a bit smaller) each time.

It always failed at the 22nd allocation via rte_lpm6_create, and all later
ones failed too.
It really is just this inner loop:
for (i = 0; i < 100; i++) {
        config.max_rules = MAX_RULES - 100 + i;
        printf("INFO: %s - allocating for %d rules (%d/100)\n", __func__,
                config.max_rules, i);
        lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
        TEST_LPM_ASSERT(lpm != NULL);
        rte_lpm6_free(lpm);
}

But while we see "LPM: LPM memory allocation failed" the following
assertion doesn't trigger.
NOTE: that is what was fixed by my patch this morning.

The failing alloc is for the rules tables:
rte_zmalloc_socket -> rte_malloc_socket -> malloc_heap_alloc ->
find_suitable_element with sizes usually at or close to "18000000".
That is ~17MB each; as it fails at alloc 22, a leak would amount to ~374M for
these alone.
So as a ballpark estimate, a leak or a fragmenting consumption is a
reasonable assumption.
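The ballpark can be checked with a few lines of C. This is only a sketch of
the arithmetic: the size of 18000000 bytes and the count of 22 are the values
observed above, while the helper names leaked_bytes and to_mib are made up
for illustration. Using exact MiB it comes out a little under 380 MiB, which
is in line with the rough figure above.

```c
#include <stdint.h>

/* Hypothetical helper: total bytes lost if each of `allocs` allocations
 * of `bytes_each` bytes is never returned to the heap. */
static uint64_t leaked_bytes(uint64_t allocs, uint64_t bytes_each)
{
        return allocs * bytes_each;
}

/* Hypothetical helper: convert bytes to whole MiB, rounded down. */
static uint64_t to_mib(uint64_t bytes)
{
        return bytes / (1024u * 1024u);
}
```

For example, to_mib(leaked_bytes(22, 18000000)) gives the total for the 22
rules tables in MiB.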

Reporting heap->alloc_count in find_suitable_element proved that it was
exhausting the pool.
One can see that the alloc_count is always increasing.
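The effect can be reproduced with a toy model. The following is a sketch and
not DPDK code: toy_heap, toy_alloc, toy_free and toy_create_free_cycles are
hypothetical names, and the heap is reduced to a byte budget plus an
allocation counter. Each cycle allocates a small object and a large table;
if the table is not freed, the counter keeps growing until a request of the
same size as before no longer fits.

```c
#include <stddef.h>

/* Toy stand-in for a fixed-size heap: a byte budget and a counter. */
struct toy_heap {
        size_t capacity;      /* total bytes available */
        size_t used;          /* bytes currently allocated */
        unsigned alloc_count; /* number of live allocations */
};

/* Returns 0 on success, -1 if the request does not fit. */
static int toy_alloc(struct toy_heap *h, size_t size)
{
        if (size > h->capacity - h->used)
                return -1;
        h->used += size;
        h->alloc_count++;
        return 0;
}

/* Gives `size` bytes back to the toy heap. */
static void toy_free(struct toy_heap *h, size_t size)
{
        h->used -= size;
        h->alloc_count--;
}

/* Runs up to `cycles` create/free cycles. Returns the 1-based cycle in
 * which an allocation first fails, or 0 if all cycles succeed. When
 * free_table is 0, the table is leaked in every cycle. */
static unsigned toy_create_free_cycles(struct toy_heap *h, unsigned cycles,
                                       size_t obj_size, size_t table_size,
                                       int free_table)
{
        unsigned i;

        for (i = 1; i <= cycles; i++) {
                if (toy_alloc(h, obj_size) != 0)
                        return i;
                if (toy_alloc(h, table_size) != 0) {
                        toy_free(h, obj_size);
                        return i;
                }
                if (free_table)
                        toy_free(h, table_size);
                toy_free(h, obj_size);
        }
        return 0;
}
```

With a capacity of 100, an object of 1 and a table of 10, the leaking variant
fails in a later cycle with alloc_count still above zero, while the variant
that frees the table runs through all cycles.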

Then I realized that the assignment that eventually fails is this:
lpm->rules_tbl = (struct rte_lpm6_rule *)rte_zmalloc_socket(NULL,
        (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
but there is never a free for that pointer, as a quick search shows:
  grep -Hrn -C 3 rules_tbl * | grep free

So I found that in rte_lpm6_free
- lpm might not be freed if it didn't find a te (tailq entry)
- lpm->rules_tbl was never freed
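The general shape of such a fix can be sketched outside DPDK. This is not the
actual patch: toy_lpm, toy_rule, toy_lpm_create, toy_lpm_free and live_allocs
are hypothetical names, and plain calloc/free stand in for
rte_zmalloc_socket/rte_free. The point is only that the free routine releases
the separately allocated rules table as well as the parent object.

```c
#include <stdlib.h>

/* Hypothetical rule entry; the layout is for illustration only. */
struct toy_rule {
        unsigned char ip[16];
        unsigned char depth;
};

/* Hypothetical parent object owning a separately allocated table. */
struct toy_lpm {
        unsigned max_rules;
        struct toy_rule *rules_tbl;
};

/* Counts allocations that have not been freed yet (for checking only). */
static int live_allocs;

static struct toy_lpm *toy_lpm_create(unsigned max_rules)
{
        struct toy_lpm *lpm = calloc(1, sizeof(*lpm));

        if (lpm == NULL)
                return NULL;
        live_allocs++;

        lpm->rules_tbl = calloc(max_rules, sizeof(*lpm->rules_tbl));
        if (lpm->rules_tbl == NULL) {
                /* Do not leak the parent when the table cannot be had. */
                free(lpm);
                live_allocs--;
                return NULL;
        }
        live_allocs++;

        lpm->max_rules = max_rules;
        return lpm;
}

static void toy_lpm_free(struct toy_lpm *lpm)
{
        if (lpm == NULL)
                return;
        /* Free the nested table first, then the object that owns it. */
        free(lpm->rules_tbl);
        live_allocs--;
        free(lpm);
        live_allocs--;
}
```

Running the same kind of create/free loop as in the test against this sketch
should leave live_allocs at zero after every iteration.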

As I said a patch will follow soon.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

