
Why is the stack so fast?

  1. the stack is also known as: program stack, call stack, execution stack.

  2. a program (executable) must ask the OS for memory. The OS allocates it, but if the running program tries to access memory it shouldn’t, the OS terminates it with a segmentation fault (core dumped).

  3. external fragmentation: memory exists, but we still run out of usable space due to bad allocation patterns.

    (figure: external fragmentation)

  4. requesting more memory from the OS takes time; hence heap allocation carries a performance cost.

  5. if main memory (RAM) is exhausted, the OS pages data out to storage (swap space, part of the virtual-memory system), but storage is thousands of times slower than RAM.

    (figure: comparison of speeds between different types of memory)

  6. even fetching data from main memory has a noticeable cost, hence we have CPU caches (different from registers).
  7. if data is present in cache → cache hit; otherwise → cache miss (CPU stalls while data is fetched from main memory).
  8. hence keeping data compact and contiguous is beneficial to maximize cache hits (this is locality: spatial + temporal).

    (figure: cache and locality)

  9. the size of the preallocated stack region depends on the compiler / OS / platform.

    (figure: size of stack in different compilers / platforms)

might seem small, but
1 MB = 1,048,576 bytes → 1,048,576 / 8 = 131,072 64-bit integers.

  1. the stack is a contiguous memory region:
  2. beginning of stack = stack origin
  3. the stack pointer (SP) keeps track of the top (yes, it usually grows downwards).
  4. the stack pointer must be stored somewhere → it lives in a CPU register.

  5. stack allocation is extremely cheap:

  6. allocation = adjust stack pointer (SP +=/-= size)
  7. no OS call, no searching for free blocks
  8. heap allocation goes through the allocator (and sometimes the OS): metadata, bookkeeping, and fragmentation-handling strategies → slower.
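The cost difference above shows up directly in code. In this sketch (function names are illustrative), the stack buffer is reserved by the compiler adjusting SP at function entry, essentially for free, while the heap buffer requires a `malloc` call into the allocator, which searches for a free block, writes bookkeeping metadata, and may ask the OS for more pages.

```c
#include <stdlib.h>

/* Illustrative comparison: same work, different allocation strategy. */

int sum_on_stack(void) {
    int buf[256];                    /* reserved by one SP adjustment at entry */
    for (int i = 0; i < 256; i++) buf[i] = i;
    int s = 0;
    for (int i = 0; i < 256; i++) s += buf[i];
    return s;                        /* freed automatically when the frame pops */
}

int sum_on_heap(void) {
    int *buf = malloc(256 * sizeof *buf);  /* allocator search + bookkeeping,
                                              possibly a syscall for more pages */
    if (!buf) return -1;
    for (int i = 0; i < 256; i++) buf[i] = i;
    int s = 0;
    for (int i = 0; i < 256; i++) s += buf[i];
    free(buf);                       /* heap also needs explicit deallocation */
    return s;
}
```

Benchmarking these two in a hot loop is a simple way to observe the gap, though compilers may optimize small fixed-size `malloc` calls in some cases.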

  9. when a function is called:

  10. a stack frame is pushed onto the stack.

    (figure: stack frame)

  11. it contains:
    • local variables
    • function arguments (depending on ABI)
    • saved registers
    • return address (return-link)
  12. the starting point of the frame must be tracked so that after function completion, the stack pointer can be reset correctly.

  13. in recursion, if there is no base case:

  14. new stack frames keep getting added
  15. eventually, the stack runs out of space → stack overflow.

  16. each thread has its own stack:

  17. pros: no synchronization needed for local variables
  18. cons: fixed size per thread, memory overhead
  19. this is one of the reasons shared data structures usually live on the heap.

nice video for reference by Core Dumped