Memory Management

Memory Hierarchy

┌─────────────┐  Fastest, smallest, most expensive
│  Registers   │  ~0.3 ns, ~KB
├─────────────┤
│  L1 Cache    │  ~1 ns, ~64 KB
├─────────────┤
│  L2 Cache    │  ~4 ns, ~256 KB
├─────────────┤
│  L3 Cache    │  ~12 ns, ~8 MB
├─────────────┤
│  Main Memory │  ~100 ns, ~16-64 GB
│  (DRAM)      │
├─────────────┤
│  SSD         │  ~100 μs, ~1 TB
├─────────────┤
│  HDD         │  ~10 ms, ~4 TB
└─────────────┘  Slowest, largest, cheapest

Virtual Memory

Virtual memory gives each process the illusion of its own private, contiguous address space, even though physical memory is shared and fragmented.

Key Benefits

Isolation — Processes can't access each other's memory
Abstraction — Programs don't need to know physical addresses
Overcommit — Total virtual memory can exceed physical RAM (using disk swap)
Sharing — Libraries can be mapped into multiple processes (shared pages)

Address Translation

Virtual Address → [MMU + Page Table] → Physical Address

┌──────────────────┬───────────┐
│  Virtual Page #   │  Offset   │
└──────────────────┴───────────┘
         │
         ▼
   ┌──────────┐
   │Page Table │
   │  Entry    │ → Frame # (physical)
   └──────────┘
         │
         ▼
┌──────────────────┬───────────┐
│  Physical Frame # │  Offset   │
└──────────────────┴───────────┘

Paging

The OS divides virtual and physical memory into fixed-size pages (typically 4 KB).

Page Table Entry (PTE)

Field	Purpose
Frame number	Physical frame address
Present/Valid bit	Is page in physical memory?
Dirty bit	Has page been modified?
Referenced bit	Has page been accessed recently?
Protection bits	Read/Write/Execute permissions

Multi-Level Page Tables

Problem: A flat page table for 64-bit addresses would be enormous.

Solution: Hierarchical page tables — only allocate entries for used address ranges.

4-Level Page Table (x86-64):
Virtual Address (48 bits used):
┌─────┬─────┬─────┬─────┬────────────┐
│PML4 │ PDP │ PD  │ PT  │   Offset   │
│9 bit│9 bit│9 bit│9 bit│  12 bits   │
└──┬──┴──┬──┴──┬──┴──┬──┴────────────┘
   │     │     │     │
   ▼     ▼     ▼     ▼
  L4 →  L3 →  L2 →  L1 → Physical Frame

TLB (Translation Lookaside Buffer)

A hardware cache for recent virtual-to-physical translations.

Aspect	Detail
Hit time	~1 ns
Miss penalty	~10-100 ns (page table walk)
Typical size	64–1024 entries
Hit rate	>99% for most workloads

TLB Flush: On context switch between processes, the TLB may be flushed (expensive). Tagged TLBs (ASID) avoid full flushes.

Page Replacement Algorithms

When physical memory is full and a new page is needed, one must be evicted.

Algorithm	Description	Optimal?	Practical?
OPT	Replace page not used for longest future time	Yes	No (requires future knowledge)
FIFO	Replace oldest page	No	Simple but Belady's anomaly
LRU	Replace least recently used	Near-optimal	Expensive to track perfectly
Clock (Second Chance)	Circular FIFO with reference bits	Good	Practical LRU approximation
LFU	Replace least frequently used	Good for some workloads	Doesn't adapt to changes

Clock Algorithm (Second Chance)

Pages in circular buffer with reference bit:
┌─R1──R1──R0──R1──R0──R1─┐
│  A    B    C    D    E    F  │  ← circular
└─────────────────────────┘
         ↑ clock hand

On page fault:
1. Check page at clock hand
2. If reference bit = 1: set to 0, advance hand
3. If reference bit = 0: EVICT this page, load new page here
4. Repeat until victim found

Segmentation

Divides memory into variable-sized segments based on logical divisions (code, stack, heap, data).

Aspect	Paging	Segmentation
Unit size	Fixed (4KB)	Variable
Fragmentation	Internal (wasted space in page)	External (gaps between segments)
View	Physical division	Logical division
Modern use	Primary mechanism	Minimal (x86-64 uses flat segments)

Practical: malloc and Memory Allocation

Process Memory Layout (Linux):
┌──────────────────┐ High addresses
│     Stack ↓      │ Local vars, return addresses
├──────────────────┤
│                  │
│   (free space)   │
│                  │
├──────────────────┤
│     Heap ↑       │ malloc/new allocations
├──────────────────┤
│   BSS Segment    │ Uninitialized globals (zeroed)
├──────────────────┤
│   Data Segment   │ Initialized globals
├──────────────────┤
│   Text Segment   │ Machine code (read-only)
└──────────────────┘ Low addresses

Allocator Strategies

Strategy	How	Trade-off
First Fit	Scan from start, use first block that fits	Fast, but external fragmentation
Best Fit	Find smallest block that fits	Less waste, slower
Worst Fit	Find largest block	Leaves large holes (bad in practice)
Buddy System	Split/merge power-of-2 blocks	Fast coalescing, internal fragmentation
Slab Allocator	Pre-allocate objects of same size	Excellent for kernel objects

Key Takeaways

Virtual memory provides isolation, abstraction, and overcommit — foundational to modern OSes
Paging with multi-level page tables is the standard — TLB makes it fast
Page replacement matters under memory pressure — LRU approximations (Clock) are standard
Stack vs Heap: Stack is LIFO, fast, auto-managed. Heap is flexible, manual/GC-managed, fragmentation-prone