ptmalloc2 internals

What is the heap?

I had decided to write this due to my lack of organization and my plethora of notes I had taken
during these past few months. I wanted to organize them into one big writeup so I could piece things
together, and I decided that while I was at it I might as well throw it on here :)

Table of contents:

1. Overview

2. Implementation

3. Internals

4. Exploitation

Overview

The heap, at its highest level, is global dynamic memory. This means it will dynamically interact with the system to request memory for our precious programs to use. This is important for us, as local variables on the stack are deallocated once their stack frame is torn down.

The heap, on the other hand, allows us to store massive amounts of data in a global memory space, away from the lifetime of any variable on the stack. We also cannot interact with it directly: we are returned a pointer to the allocated chunk, instead of using relative offsets from rsp and rbp.

In this explanation, I will only be talking about glibc's ptmalloc, which stands for POSIX thread aware malloc. This is the newer descendant of dlmalloc (Doug Lea's malloc). In essence, the difference between the two is optimization and threading.

The functions that we will be dealing with mainly are:

malloc()    - allocates memory and returns a pointer to it
free()      - deallocates the memory that the provided pointer points to

Implementation

There are several reasons to reach for dynamic memory, from building data structures to needing more memory than the stack can provide. It should be noted that the heap is significantly slower than local allocation on the stack, so it is more efficient (and safer) to use local variables where they suffice.

Here I will be explaining the basics of how malloc, calloc, realloc, and free work, and the scenarios we can use them in. A data structure I will be detailing here is the linked list, which will become extremely important later on when we get into the internals of ptmalloc, so make sure you really understand the concept.

The normal convention in which malloc is called is:

type *ptr = (type*)malloc(size);

We pick whichever pointer type we want, and allocate however many bytes we need. (In C the cast is optional, since malloc returns a void*.)

Also, just a little note: you can pass malloc any size you like, but the chunk that ptmalloc actually carves out will be rounded up to a multiple of 0x10 (16 bytes) on 64-bit systems. Here is an example of how to use malloc within an application:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    char *ptr = malloc(0x10);     // 0x10 == 16 in decimal
    printf("%p\n", (void*)ptr);
}

First, we declare our pointer with char *ptr and allocate 0x10 bytes, which is 16 in decimal. Next, we print the pointer malloc() returned. Let's see the result:

0x5617ac1632a0

This is the address of our chunk in the heap. ptmalloc will only return a pointer to the USABLE memory we have allocated; the extra bytes are for metadata (I will explain this thoroughly in Internals). So, if malloc returns a pointer to our allocated chunk, how do we use free?

Let's use the same program, with just two more lines:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    char *ptr = malloc(0x10);     // 0x10 == 16 in decimal
    printf("%p\n", (void*)ptr);
    free(ptr);
    printf("Pointer has been freed: %p\n", (void*)ptr);
}

Now if we look at the output, we can see that it is exactly the same address:

0x55b9c2e3e2a0
Pointer has been freed: 0x55b9c2e3e2a0

This is another attack vector for us. Obviously this is an odd place to mention exploitation within the implementation section, so I will keep things brief and just try to help you write safe code. What we have here is a "dangling pointer", or a potential Use After Free vulnerability.

The chunk we previously malloc'd has now been freed, but our pointer to it still remains. Remember to clear pointers after freeing them, since dangling pointers can lead to devastating consequences.

Anyways, let's see a real use of the heap within a data structure:

#include <stdio.h>
#include <stdlib.h>
struct node {
    int data;
    struct node* next;
};

int main() {
    struct node* first = malloc(sizeof(struct node));
    struct node* second = malloc(sizeof(struct node));
    struct node* third = malloc(sizeof(struct node));

    first->data=1;
    first->next=second;

    second->data=2;
    second->next=third;

    third->data=3;
    third->next=NULL;

    printf("First: %d -> Second: %d -> Third: %d\n",
                                            first->data,
                                            second->data,
                                            third->data);
    struct node* tmp;
    while(first!=NULL) {
        tmp=first;
        first=first->next;
        free(tmp);
    }
    return 0;
}

Anyhow, let's take a look at this and explain how dynamic memory is implemented within this data structure. We first create the node struct; this is essentially the "template" for each of our nodes:

struct node {
    int data;
    struct node* next;
};

As we can see, it holds a data variable and a pointer to the next node. The data integer will hold whatever we want it to, since this is still a list we want to make use of. Next, we have the heap operations:

struct node* first = malloc(sizeof(struct node));
struct node* second = malloc(sizeof(struct node));
struct node* third = malloc(sizeof(struct node));

These create 3 nodes from our node struct. Each pointer points to an allocated chunk whose size is sizeof(struct node), which makes sure we don't waste any memory when allocating structs. Now first, second, and third all point to locations on the heap.

These are allocated on dynamic memory, which means this list can grow, or shrink. That is the upside to using linked lists, as their size is not statically set at compile time. They are allowed to grow however they please.

Next, we initialize the data and the next pointer of each node:

first->data=1;
first->next=second;

second->data=2;
second->next=third;

third->data=3;
third->next=NULL;

Let's take a look at our first initialization of data.
The '->' operator in C is shorthand for dereferencing a pointer and then using the '.' operator.

The difference between

first.data=1;

and

first->data=1;

is that '.' directly accesses a member of a struct, while '->' first dereferences the pointer and then accesses the member: first->data is equivalent to (*first).data. Since first is a pointer here, only the arrow form is valid.
Next, we can see that it makes our first node point to the second node:

first->next=second;

We have created a "linked list". We continue this process until the next pointer of our third struct is NULL. This terminates our linked list, so we do not iterate past the end.

After we initialize our linked list, we output the list. A smarter way to output this would be to traverse it with a loop.

Our next snippet of code:

struct node* tmp;
while(first!=NULL) {
    tmp=first;
    first=first->next;
    free(tmp);
}

This piece of code demonstrates a valid use of our free() function. There is much more built into free() that we will explain in the Internals module.

It checks whether our first node is NULL. Then:

1. it saves our first node into tmp
2. first is advanced to first->next, the next node within the linked list
3. it frees tmp, our pointer to the old first node's allocation on the heap

It then repeats the process until there are no nodes left in the list.

Now that I have explained free() and malloc(), I hope you understand them. If not, I am sorry for failing you, but hopefully these sources won't:

https://www.geeksforgeeks.org/data-structures/linked-list/

https://www.youtube.com/watch?v=njTh_OwMljA

https://www.learn-c.org/en/Linked_lists

Internals

In this internals module, I will give my best shot at explaining everything about dynamic memory allocators, from caching and optimization to threading support.

Before we get started on the internals of the functions that allow us to play with dynamic memory, lets first learn the layout of the heap.

low addresses
 |  +---------+ <-- chunk 1
 |  |  data   | <-- metadata
 |  +---------+ <-- heap (grows toward higher addresses)
 |  | AAAA    |
 |  |    BBBB |
 V  +---------+
    +---------+
 Λ  |   rbp   |
 |  +---------+
 |  |         |
 |  |         | <-- stack (grows toward lower addresses)
 |  |         |
 |  |         |
 |  +---------+
    |   rsp   |
    +---------+
high addresses

The stack grows toward lower addresses, while the heap grows toward higher addresses. I know this may seem strange, since the chart shows the heap growing down the page and the stack growing up, but that is only because the diagram puts low addresses at the top: moving down the page means moving to higher addresses, so the heap's downward arrow and the stack's upward arrow are both correct.

Think of the stack as an upside-down bucket: still a LIFO data structure, just a bucket facing downwards instead of one facing upwards.

Gravity is your worst enemy when studying a concept like the stack!!

Overview

Malloc Internals

The malloc algorithm, the process malloc takes to allocate your precious dynamic memory:

1. If there is an exact size match within tcache, use that cached chunk.
2. If the request is very large, use mmap() to request memory directly from the operating system.
3. If the request is small enough, check the fastbins (simple singly linked chunks) for an exact fit.
4. Otherwise, consolidate the fastbins and move their chunks into the unsorted bin.
5. Take chunks off of the unsorted list and sort them into small/large bins, reusing an exact match if one turns up along the way.
6. If the request is "large", search the large bins for the smallest chunk that fits.
7. If chunks still exist within the fastbins, consolidate and repeat.
8. Failing all of that, split the request off of the top chunk (the "wilderness"), growing the heap with brk if needed.

Free Internals

The free algorithm, the process free() takes to find, cache, and deallocate memory from the heap:

1. If there is room in the matching tcache bin, cache the chunk and return.
2. If the chunk is small enough, place it within a fastbin.
3. If the chunk was allocated with mmap, munmap it.
4. Otherwise, consolidate the chunk with any free neighbors and put it into the unsorted bin.
5. If the chunk is large enough, consolidate the fastbins and see if there is room to give memory back to the system.

That was a little overview of the process both of them take to allocate and release memory. It is OK if you do not understand them at first; I will go through and explain each keyword mentioned.

Metadata

What is heap metadata?

When we call malloc, we allocate a chunk, which not only contains memory for us to read from and write to, but also associated metadata that allows the other heap functions to work with it. This is how free understands how many bytes to release.

We have already encountered metadata: when you free() an allocation, the start of that freed memory is reused for tcache metadata to build the tcache linked list. This is called per-chunk metadata.

What is a chunk? The address malloc returns is not the address ptmalloc actually tracks. ptmalloc tracks the chunk address, but returns a pointer to the USABLE portion of that chunk on the heap.

Chunk sizes need to be multiples of 0x10 (on 64-bit systems), which means ptmalloc may pad the chunk out to a size higher than what was requested.

So when you call malloc, it will always return a pointer to a memory region greater than or equal to the size you requested. This is because the chunk stores metadata within itself, and that metadata needs space. Here is the glibc source for what a chunk contains:

struct malloc_chunk {
    INTERNAL_SIZE_T     mchunk_prev_size;
    INTERNAL_SIZE_T     mchunk_size;

    struct malloc_chunk* fd;
    struct malloc_chunk* bk;

    struct malloc_chunk* fd_nextsize;
    struct malloc_chunk* bk_nextsize;
};
typedef struct malloc_chunk* mchunkptr;

mchunk_prev_size and mchunk_size contain sizes for free(), or for caching purposes which we will get into later. For now, all you need to know is that mchunk_prev_size saves the size of the previous chunk, if it is free, and mchunk_size keeps the size of the current chunk.

The fd and bk pointers are peculiar in the sense that, after the chunk on the heap has been freed, they are initialized to form a doubly linked list. These pointers only exist while the chunk is not in use, i.e. after it has been freed.

It also resides within the usable allocation that is returned by malloc, which CAN cause some security problems. We will elaborate more in the exploitation module.

We also have flags, which are set in the lower 3 bits (the least significant bits) of the size field. Here is a diagram to help visualize the chunk metadata:

   chunk -> +---------------------------------------------------------------+
            |size of previous chunk if unallocated(P clear) mchunk_prev_size|
            +---------------------------------------------------------------+ flags for mchunk_size:
            |         Size of chunk, in bytes mchunk_size             |2|1|0| bit 0: PREV_INUSE
      mem-> +---------------------------------------------------------------+ bit 1: IS_MMAPPED
            |             User data starts here...                          . bit 2: NON_MAIN_ARENA
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
            +---------------------------------------------------------------+

nextchunk-> +---------------------------------------------------------------+
            |             (size of chunk, but used for application data)    |
            +---------------------------------------------------------------+
            |             Size of next chunk, in bytes                |2|1|0|
            +---------------------------------------------------------------+
            |                                                               |
            :                           etc...                              :
            .                                                               .

P (PREV_INUSE): 0 when previous chunk (not the previous chunk in the linked list, but the one directly
before it in memory) is free (and hence the size of previous chunk is stored in the first field).
The very first chunk allocated has this bit set. If it is 1, then we cannot determine the size of the
previous chunk.

M (IS_MMAPPED): The chunk was obtained through mmap. The other two bits are ignored. mmapped chunks are
neither in an arena, nor adjacent to a free chunk.

A (NON_MAIN_ARENA): 0 for chunks in the main arena. Each thread spawned receives its own arena and for
those chunks, this bit is set.

These flags we will elaborate on later as well.

Tcache

What is tcache?

Thread local caching, in ptmalloc (POSIX thread aware malloc), is utilized to speed up repeated small
allocations within the bounds of a single thread. It is a bin that stores recently freed chunks.
The tcache consists of singly linked lists, in which each entry points to the next, while also
pointing back to the main tcache table.

This is an important optimization layer for ptmalloc. Outside of tcache, ptmalloc's thread awareness
mostly amounts to giving threads arenas and locking them so they do not corrupt each other, which
costs speed on hot allocation paths. A per-thread cache that needs no locking at all is where tcache
comes in.

A little joke i hear tossed around is that optimization, like tcache, leads to security bugs, which leads to forcing security checks within malloc, which leads to slower but secure code, which leads to application developers complaining about overhead and speed, which leads to vulnerable optimization implementations within malloc.

The process in which tcache is used is this: when you free a chunk, it will first go into tcache, if it is small enough (by default, tcache bins cover chunk sizes up to 0x410 bytes). If the matching tcache bin is full or unavailable, a sufficiently small chunk will be put into a fastbin instead.

If the fastbin doesn't apply either, because it is unavailable or the chunk is larger than fastbins allow, the chunk is thrown into the unsorted bin. This is a doubly linked list of chunks that have been recently freed and not yet sorted into better fitting bins.

This is used to catch quick freed-chunk reuse scenarios. If we attempt to malloc a chunk and there is no exact match found within the tcache or fastbins, the next place malloc will look is the unsorted bin.

If there is no match there, chunks from the unsorted bin are sorted into either the doubly linked "small bins", for chunks under 1024 bytes (0x400) on 64-bit systems, or the doubly linked "large bins", for chunks of 0x400 bytes and up.

Tcache's structure consists of the following code:

typedef struct tcache_perthread_struct {
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

typedef struct tcache_entry {
    struct tcache_entry *next;
    struct tcache_perthread_struct *key;
} tcache_entry;

tcache_perthread_struct is the main tcache table into which we insert separate tcache entries. counts keeps count of how many freed chunks of each size are cached, and entries keeps track of the head of each bin's list of tcache entries.

Our entries resemble our linked list, as the struct will keep track of the next node whilst also keeping track of the main tcache_perthread_struct table.

Let's do a little challenge to visualize how tcache works. We have the following pseudocode, which allocates chunks in a semi-random order:

int main() {
    a = malloc(16);
    b = malloc(16);
    c = malloc(32);
    d = malloc(48);
    e = malloc(32);
    f = malloc(32);

    // breakpoint

    free(b);
    free(a);
    free(f);
    free(e);
    free(c);
    free(d);
}
         +-----+-----+-----+-----+
counts:  |16: 0|32: 0|48: 0|64: 0|
         |     |     |     |     |
         +-----+-----+-----+-----+-----------+----------+
entries: |16:        |32:        | 48:       |64:       |
         |           |           |           |          |
         +-----------+-----------+-----------+----------+

When we first allocate each of our chunks on the heap with malloc, there will still be nothing within tcache. This is due to the fact that tcache only comes into play AFTER we have freed a chunk. It will "cache" that chunk and store it for potential later use.

Let's look at the aftermath of the first free:

free(b)
         +-----+-----+-----+-----+
counts:  |16: 1|32: 0|48: 0|64: 0|
         |     |     |     |     |
         +-----+-----+-----+-----+-----------+----------+
entries: |16: &b     |32:        | 48:       |64:       |
         |           |           |           |          |
         +-----------+-----------+-----------+----------+

Let's take a look at the next free:

free(b)
free(a)
         +-----+-----+-----+-----+
counts:  |16: 2|32: 0|48: 0|64: 0|
         |     |     |     |     |
         +-----+-----+-----+-----+-----------+----------+
entries: |16: &b &a  |32:        | 48:       |64:       |
         |           |           |           |          |
         +-----------+-----------+-----------+----------+

Now, a and b are the same allocation size, so this is the result in the tcache: 2 counts of 16-byte allocations, and 2 references to our allocations in the tcache entries. This is how tcache works: it caches each chunk after it has been freed. When malloc is called, it first looks through the tcache entries to see if there is an exact match.

Normally this would not be of much help, but in a threaded environment it works wonders, as "thread local cache" is extremely efficient at what it was made to do, obviously.

Bins

Here, I will talk about another caching mechanism within ptmalloc called bins. tcache is essentially a bin as well, except it only caches recently freed chunks, per thread.

Bins are essentially linked lists of chunks; their purpose is to cache and organize the heap. Let's explain what each bin's purpose is within our precious dynamic allocator.

there are 4 types of bins:

Fast bin
Unsorted bin
Small bin
Large bin

The purpose of the fastbin is to hold small chunks that have been freed. These are simpler chunks than the rest; for example, fastbins are singly linked, so only the fd pointer is used and bk does not exist within a fastbin chunk.

The next bin is the unsorted bin, which acts as a staging area for chunks before they are sorted into the small and large bins. Our small bins act as FIFO queues and work only with small chunks. The last bin is the large bin, obviously there to store and cache large chunks.

As we mentioned before, malloc will first look through tcache to find an exact match for the request, but what if it doesn't find any? It will then look through the fastbins, which is where small chunks get cached when tcache is unavailable.

As previously stated, tcache is only for smaller allocations; the default cutoff is a chunk size of 0x410 bytes. If the request is too large for brk to satisfy within the bounds of our wilderness, the top chunk which tells us how much space is left on the heap, we use the mmap syscall to directly request memory from the operating system. If the allocation fits within the wilderness, malloc will go on to look through the bins.

The default maximum size for a chunk to be represented by a fastbin is 0x80 bytes on 64-bit systems. A freed chunk too large for a fastbin is placed in the unsorted bin. Malloc later takes chunks off the unsorted linked list and moves them into the small/large bins, and if the allocation request is large enough, it searches the large bins for the smallest chunk that fits.

The purpose of each of our bins is performance: application developers demand minimal overhead and tip-top speed in order to stay satisfied, so our dynamic memory allocator developers need to get creative with how to implement safe optimization of the heap.

wilderness

not done yet, very simple so will probably finish soon

arenas

not done yet, complex and we will have to get into how threading works with POSIX

Exploitation

Let me start off with this disclaimer: THERE ARE SO MANY TECHNIQUES IN WHICH YOU CAN EXPLOIT THE HEAP. It will take me a few months before I can practice and understand them all. This might get you thinking: if there are so many heap vulnerabilities, why aren't there 0days being dropped left and right?

This is due to these techniques requiring a fair amount of control over heap operations, and only within specific scenarios. The environment, program, and glibc version must all be perfectly compliant with a technique's requirements before any exploitation can take place. That, coupled with the fact that there are so many of them, has made me fully admit that heap exploitation is the most exciting thing I've ever attempted to learn.

That being said, I will link a plethora of amazing resources on heap exploitation, like how2heap, Phrack, and many more. There is no way I will be able to cover them all, but I will be attempting to explain the techniques and vulnerabilities that I understand fairly well. This ensures that I don't say anything stupid that will confuse you as well. Anyways, let's get on with the explanation.

and im not done with it either :(
