[PATCH 2/3] mem: fix ASan shadow for remapped memory segments

Burakov, Anatoly anatoly.burakov at intel.com
Tue Apr 26 14:54:36 CEST 2022


On 21-Apr-22 2:18 PM, Burakov, Anatoly wrote:
> On 21-Apr-22 10:37 AM, David Marchand wrote:
>> On Wed, Apr 20, 2022 at 4:47 PM Burakov, Anatoly
>> <anatoly.burakov at intel.com> wrote:
>>>
>>> On 15-Apr-22 6:31 PM, David Marchand wrote:
>>>> When releasing some memory, the allocator can choose to return some
>>>> pages to the OS. At the same time, this memory was poisoned in ASan
>>>> shadow. Doing the latter made it impossible to remap this same page
>>>> later.
>>>> On the other hand, without this poison, the OS would pagefault in any
>>>> case for this page.
>>>>
>>>> Remove the poisoning for unmapped pages.
>>>>
>>>> Bugzilla ID: 994
>>>> Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan")
>>>> Cc: stable at dpdk.org
>>>>
>>>> Signed-off-by: David Marchand <david.marchand at redhat.com>
>>>> ---
>>>>    lib/eal/common/malloc_elem.h |  4 ++++
>>>>    lib/eal/common/malloc_heap.c | 12 +++++++++++-
>>>>    2 files changed, 15 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h
>>>> index 228f178418..b859003722 100644
>>>> --- a/lib/eal/common/malloc_elem.h
>>>> +++ b/lib/eal/common/malloc_elem.h
>>>> @@ -272,6 +272,10 @@ old_malloc_size(struct malloc_elem *elem)
>>>>
>>>>    #else /* !RTE_MALLOC_ASAN */
>>>>
>>>> +static inline void
>>>> +asan_set_zone(void *ptr __rte_unused, size_t len __rte_unused,
>>>> +     uint32_t val __rte_unused) { }
>>>> +
>>>>    static inline void
>>>>    asan_set_freezone(void *ptr __rte_unused, size_t size __rte_unused) { }
>>>>
>>>> diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
>>>> index 6c572b6f2c..5913d9f862 100644
>>>> --- a/lib/eal/common/malloc_heap.c
>>>> +++ b/lib/eal/common/malloc_heap.c
>>>> @@ -860,6 +860,7 @@ malloc_heap_free(struct malloc_elem *elem)
>>>>        size_t len, aligned_len, page_sz;
>>>>        struct rte_memseg_list *msl;
>>>>        unsigned int i, n_segs, before_space, after_space;
>>>> +     bool unmapped_pages = false;
>>>>        int ret;
>>>>        const struct internal_config *internal_conf =
>>>>                eal_get_internal_configuration();
>>>> @@ -999,6 +1000,13 @@ malloc_heap_free(struct malloc_elem *elem)
>>>>
>>>>                /* don't care if any of this fails */
>>>>                malloc_heap_free_pages(aligned_start, aligned_len);
>>>> +             /*
>>>> +              * Clear any poisoning in ASan for the associated pages so that
>>>> +              * next time EAL maps those pages, the allocator can access
>>>> +              * them.
>>>> +              */
>>>> +             asan_set_zone(aligned_start, aligned_len, 0x00);
>>>> +             unmapped_pages = true;
>>>>
>>>>                request_sync();
>>>>        } else {
>>>> @@ -1032,7 +1040,9 @@ malloc_heap_free(struct malloc_elem *elem)
>>>>
>>>>        rte_mcfg_mem_write_unlock();
>>>>    free_unlock:
>>>> -     asan_set_freezone(asan_ptr, asan_data_len);
>>>> +     /* Poison memory range if belonging to some still mapped pages. */
>>>> +     if (!unmapped_pages)
>>>> +             asan_set_freezone(asan_ptr, asan_data_len);
>>>>
>>>>        rte_spinlock_unlock(&(heap->lock));
>>>>        return ret;
>>>
>>> I suspect the patch should be a little more complicated than that. When
>>> we unmap pages, we don't necessarily unmap the entire malloc element; it
>>> could be that we have a freed allocation like so:
>>>
>>> | malloc header | free space | unmapped space | free space | next malloc header |
>>>
>>> So, I think the freezone should be set from asan_ptr till aligned_start,
>>> and then from (aligned_start + aligned_len) till (asan_ptr +
>>> asan_data_len). Does that make sense?
>>
>> (btw, I get a bounce for Zhihong's mail address, is he not working at
>> Intel anymore?)
>>
>> To be honest, I don't understand if we can get to this situation :-)
>> (especially the free space after the unmapped region).
>> But I guess you mean something like (on top of current patch):
>>
>> @@ -1040,9 +1040,25 @@ malloc_heap_free(struct malloc_elem *elem)
>>
>>          rte_mcfg_mem_write_unlock();
>>   free_unlock:
>> -       /* Poison memory range if belonging to some still mapped pages. */
>> -       if (!unmapped_pages)
>> +       if (!unmapped_pages) {
>>                  asan_set_freezone(asan_ptr, asan_data_len);
>> +       } else {
>> +               /*
>> +                * We may be in a situation where we unmapped pages like this:
>> +                * malloc header | free space | unmapped space | free space | malloc header
>> +                */
>> +               void *free1_start = asan_ptr;
>> +               void *free1_end = aligned_start;
>> +               void *free2_start = RTE_PTR_ADD(aligned_start, aligned_len);
>> +               void *free2_end = RTE_PTR_ADD(asan_ptr, asan_data_len);
>> +
>> +               if (free1_start < free1_end)
>> +                       asan_set_freezone(free1_start,
>> +                               RTE_PTR_DIFF(free1_end, free1_start));
>> +               if (free2_start < free2_end)
>> +                       asan_set_freezone(free2_start,
>> +                               RTE_PTR_DIFF(free2_end, free2_start));
>> +       }
>>
>>          rte_spinlock_unlock(&(heap->lock));
>>          return ret;
>>
> 
> Something like that, yes. I will have to think through this a bit more, 
> especially in light of your func_reentrancy splat :)
> 

So, the reason the splat in the func_reentrancy test happens is as follows: the 
above patch is mostly correct (I have a different one, but it does the same 
thing), yet incomplete. What happens is that when we add new memory, we 
integrate it into our existing malloc heap, which triggers 
`malloc_elem_join_adjacent_free()`. That merge writes into the old header 
space of the element being joined, which may still be marked as "freed" in 
the ASan shadow. So, again our internal allocator ends up tripping ASan.
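
To illustrate the failing write, here is a rough sketch only (not the actual 
heap code; the helper name is made up and the element pointer is a placeholder):

/*
 * The new memseg's element gets merged with its free neighbour. The merge
 * updates the neighbour's header (e.g. its size and list pointers), but
 * that header may still be poisoned as "freed" in the ASan shadow, so this
 * allocator-internal store is reported as a use-after-free.
 */
static void
join_with_neighbour(struct malloc_elem *neighbour, size_t extra_len)
{
	neighbour->size += extra_len;	/* <-- ASan splat: write into poisoned area */
}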

To properly fix this, we need to answer the following question: what is the 
goal of having ASan support in DPDK? Is it there to catch bugs *in the 
allocator*, or can we just trust that our allocator code is correct and only 
concern ourselves with user-allocated areas? It seems like the best way to 
address this issue would be to avoid triggering ASan checks for certain 
allocator-internal actions: that way, we don't need to care what the 
allocator itself does, just what user code does. IIRC there is a compiler 
attribute that disables ASan checks for a specific function: perhaps we 
could just wrap certain accesses in that and be done with it?
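
For example, a minimal sketch of the idea (MALLOC_NO_ASAN and the helper are 
made-up names, not existing DPDK code; both GCC and Clang accept the 
no_sanitize_address function attribute, and RTE_MALLOC_ASAN is the build 
macro already used in malloc_elem.h):

#include <stddef.h>

#ifdef RTE_MALLOC_ASAN
#define MALLOC_NO_ASAN __attribute__((no_sanitize_address))
#else
#define MALLOC_NO_ASAN
#endif

/*
 * Allocator-internal copy: instrumentation is disabled for this function,
 * so the compiler-generated accesses in its body are not checked against
 * the ASan shadow. A memcpy() call here would still be checked, because
 * ASan intercepts libc memcpy itself, hence the plain byte loop.
 */
static void MALLOC_NO_ASAN
malloc_internal_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len-- > 0)
		*d++ = *s++;
}

The allocator could then funnel its metadata writes through such helpers, 
without losing ASan coverage for user-visible buffers.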

What do you think?

-- 
Thanks,
Anatoly

