Why Indirection Is the Price of Polymorphism, With Assembly You Can Read


 

Polymorphism lets you call methods through an interface without knowing the concrete type at compile time. The CPU and the ABI do not understand interfaces, traits, or virtuals. They only understand bytes, fixed register widths, and stack slots. That mismatch is the root cause of why runtime polymorphism uses indirection. In this extended version I will show concrete assembly flavored examples so you can see exactly where size knowledge matters, how vtables and fat pointers are used, and what gets passed in registers or on the stack. I will use x86-64 System V ABI conventions where practical. Exact assembly will vary by compiler and flags, but the patterns are stable. The goal is to make the invisible visible.

A tiny ABI checklist before we start

The ABI decides how arguments are passed and where return values land. On x86-64 System V:

  • Integer or pointer arguments are passed in RDI, RSI, RDX, RCX, R8, R9, then the stack.

  • Floating arguments use XMM0..XMM7.

  • The caller aligns RSP to a 16 byte boundary before call.

  • The callee must preserve RBX, RBP, R12..R15, and restore RSP.

  • By value objects must have a known compile time size so the caller can move the right number of bytes into registers or onto the stack.

If the compiler cannot know a type size, it cannot reserve space, spill, copy, or return it by value in ABI compliant code. That is why we pass a fixed size handle when the concrete type is unknown.

Example 1. C struct by value vs pointer

C code:

typedef struct {
    long a;
    long b;
} Pair;

long sum_pair(Pair p) { return p.a + p.b; }
long sum_pair_ptr(const Pair* p) { return p->a + p->b; }

Clang -O2 emits something like:

# long sum_pair(Pair p)
# ABI knows Pair is 16 bytes. With two longs, the compiler can pass in RDI and RSI or pack in registers.
sum_pair:
    lea     rax, [rdi + rsi]     # add p.a and p.b that arrived in regs due to known layout
    ret

# long sum_pair_ptr(const Pair* p)
sum_pair_ptr:
    mov     rax, qword ptr [rdi]       # p->a
    add     rax, qword ptr [rdi + 8]   # p->b
    ret

Key point. The by value call only works because the size and field layout of Pair is known at compile time. The compiler can place fields into registers or copy the right byte count. If Pair had unknown size, there would be no legal instruction sequence. A pointer works because it is always 8 bytes here.
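The same contrast can be checked from Rust. This is a minimal sketch, not compiler output: I mirror Pair with #[repr(C)] and mark the functions extern "C" so the layout and calling convention match the C version. Those attributes are my assumptions for the illustration.

```rust
use std::mem::size_of;

// Mirrors the C struct; #[repr(C)] is assumed so the field layout matches C.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pair {
    pub a: i64,
    pub b: i64,
}

// By value: legal only because size_of::<Pair>() is a compile-time constant.
pub extern "C" fn sum_pair(p: Pair) -> i64 {
    p.a + p.b
}

// By pointer: the argument is one machine word whatever it points at.
pub extern "C" fn sum_pair_ptr(p: &Pair) -> i64 {
    p.a + p.b
}

fn main() {
    let p = Pair { a: 2, b: 40 };
    println!("size_of::<Pair>()  = {}", size_of::<Pair>());
    println!("size_of::<&Pair>() = {}", size_of::<&Pair>());
    println!("{} {}", sum_pair(p), sum_pair_ptr(&p));
}
```

The by value function compiles because the first size is a constant the compiler can see. The pointer version would compile for any pointee.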

Example 2. C++ virtual call through vtable and why by value fails

C++ code:

struct Animal { virtual ~Animal() {} virtual int speak() const = 0; };
struct Dog : Animal { long bark; int speak() const override { return 1; } };

int call_speak(const Animal& a) { return a.speak(); }

A plausible Itanium ABI style layout is:

  • Object memory starts with a vptr at offset 0. vptr points at the vtable.

  • The dog object memory: [vptr][bark:8 bytes].

A typical -O2 call site for call_speak ends up like:

# int call_speak(Animal const& a)
call_speak:
    mov     rax, qword ptr [rdi]     # load vptr from object at address in RDI
    mov     rax, qword ptr [rax + 16]# load function pointer from vtable slot for speak
    jmp     rax                      # tail call or "call rax" then "ret"

Notes:

  • The reference parameter is just a pointer in RDI. Fixed size, ABI friendly.

  • call_speak reads the vptr, then reads a code pointer from a fixed table slot, then jumps through it. Neither call_speak nor the CPU needs to know whether the object is a Dog or a Cat, or how big it is.

What if you try to pass Animal by value?

int bad(Animal a); // illegal, cannot instantiate abstract class

Even if the class were not abstract, passing by value would cause slicing. The caller would only copy the Animal base subobject, which has a different notion of size than any derived class. The ABI would still need the number of bytes to copy, which is only well defined for the base subobject. You would lose derived state and end up with wrong behavior. This is why dynamic dispatch is paired with references or pointers to base. The rule follows from the size requirement.
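The size mismatch behind slicing can be observed directly. The following Rust sketch is my own illustration rather than a translation of the C++ above. Cat and handle_and_payload are made-up names. It puts two concrete types behind the same kind of handle and reports both the handle size and the payload size. The payload size is the number a by value copy would need, and a base typed parameter does not have it.

```rust
use std::mem::{size_of, size_of_val};

trait Speak {
    fn speak(&self) -> i32;
}

#[allow(dead_code)]
struct Dog {
    bark: i64, // 8 bytes of state
}

#[allow(dead_code)]
struct Cat {
    lives: [i64; 9], // 72 bytes of state (hypothetical second type)
}

impl Speak for Dog {
    fn speak(&self) -> i32 {
        1
    }
}

impl Speak for Cat {
    fn speak(&self) -> i32 {
        2
    }
}

// Returns (size of the handle, size of the value behind it).
// The second number is looked up at runtime through the vtable.
fn handle_and_payload(x: &dyn Speak) -> (usize, usize) {
    (size_of_val(&x), size_of_val(x))
}

fn main() {
    let d = Dog { bark: 1 };
    let c = Cat { lives: [0; 9] };
    println!("handle type is {} bytes", size_of::<&dyn Speak>());
    println!("dog: {:?}, speaks {}", handle_and_payload(&d), d.speak());
    println!("cat: {:?}, speaks {}", handle_and_payload(&c), c.speak());
}
```

The handle size is the same for both calls, while the payload size changes with the concrete type.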

Example 3. Rust borrowed trait object and method call lowering

Rust code:

trait Speak { fn speak(&self) -> i32; }

struct Dog { bark: i64 }
impl Speak for Dog { fn speak(&self) -> i32 { 1 } }

fn call_speak(x: &dyn Speak) -> i32 { x.speak() }

fn demo() -> i32 {
    let d = Dog { bark: 42 };
    call_speak(&d)
}

Important runtime shape:

  • &dyn Speak is a fat pointer of two machine words.

    • data: pointer to Dog value

    • vtable: pointer to the Speak vtable for Dog

A typical lowering for call_speak looks like:

# rustc does not promise a stable ABI for Rust functions, so read this as typical output on x86-64 Linux.
# A &dyn Speak argument is commonly passed as two scalars in two registers.
# Assume RDI = data ptr, RSI = vtable ptr.

call_speak:
    # load function pointer from vtable
    mov     rax, qword ptr [rsi + SPEAK_OFFSET]   # method fn pointer
    # self for a &self method is the data pointer, and it is already in RDI
    jmp     rax                                    # tail call into the method

At the call site in demo, the compiler materializes the fat pointer:

demo:
    # allocate Dog on stack; 24 rather than 16 so RSP is 16 byte aligned at the call
    sub     rsp, 24
    mov     qword ptr [rsp], 42               # d.bark
    # set up fat pointer args for &dyn Speak
    lea     rdi, [rsp]                        # data pointer to Dog
    lea     rsi, [rip + VTABLE_FOR_DOG_SPEAK] # vtable pointer: the table's address, not its first entry
    call    call_speak
    add     rsp, 24
    ret

No heap was required. The fat pointer has a fixed size. The ABI is satisfied because the callee sees two known size machine words and knows how to use the vtable to find the code pointer. The unsized part is the underlying dyn Speak, which never moves by value.
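Both claims are easy to check without reading assembly. In this sketch the helper data_address is a name I made up. It casts the trait object pointer down to a thin pointer, which discards the vtable word and keeps the data word. If the fat pointer really borrows the stack value, that address must equal the address of d.

```rust
use std::mem::size_of;

trait Speak {
    fn speak(&self) -> i32;
}

#[allow(dead_code)]
struct Dog {
    bark: i64,
}

impl Speak for Dog {
    fn speak(&self) -> i32 {
        1
    }
}

fn call_speak(x: &dyn Speak) -> i32 {
    x.speak()
}

// Hypothetical helper: drop the vtable word and keep only the data word.
fn data_address(x: &dyn Speak) -> usize {
    x as *const dyn Speak as *const () as usize
}

// Returns (does the data word point at the local?, result of the call).
fn demo() -> (bool, i32) {
    let d = Dog { bark: 42 };
    let thin = &d as *const Dog as usize;
    (data_address(&d) == thin, call_speak(&d))
}

fn main() {
    println!("&Dog       = {} bytes", size_of::<&Dog>());
    println!("&dyn Speak = {} bytes", size_of::<&dyn Speak>());
    println!("{:?}", demo());
}
```

The borrowed trait object is twice the size of a plain reference, and its data word is the address of the local variable.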

Example 4. Rust Box dyn Trait for owned polymorphism

Rust code:

fn own_and_call(x: Box<dyn Speak>) -> i32 { x.speak() }

A Box<dyn Speak> is an owning fat pointer of two machine words: a pointer to the heap allocation that holds the data, and a pointer to the vtable. The vtable pointer lives in the handle, not inside the allocation. At call time the callee receives two words of known size, loads the method pointer from the vtable, then makes an indirect call:

own_and_call:
    # Assume RDI = heap data pointer, RSI = vtable pointer, as with &dyn Speak
    mov     rcx, qword ptr [rsi + SPEAK_OFFSET]    # code pointer
    call    rcx                                     # self is the heap data address, already in RDI
    # Elided here: saving the result and both pointers across the call, and stack alignment.
    # When the Box is dropped, the callee calls the drop glue from the vtable to run the right destructor,
    # then frees the allocation using the size and alignment stored in the vtable.
    ret

The reason Box shows up is ownership with unknown size. The handle itself has a known size. The destructor is selected through the vtable at drop time.
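A quick way to see the handle sizes is to ask the compiler. In this sketch make_speaker is a hypothetical helper I added to show the owned value leaving the frame that created it. That is the situation a borrowed reference cannot cover.

```rust
use std::mem::size_of;

trait Speak {
    fn speak(&self) -> i32;
}

#[allow(dead_code)]
struct Dog {
    bark: i64,
}

impl Speak for Dog {
    fn speak(&self) -> i32 {
        1
    }
}

// Hypothetical helper: the Dog outlives this frame, so it must be owned.
fn make_speaker() -> Box<dyn Speak> {
    Box::new(Dog { bark: 42 })
}

fn own_and_call(x: Box<dyn Speak>) -> i32 {
    x.speak()
} // x is dropped here through the vtable, then the allocation is freed

fn main() {
    println!("Box<Dog>       = {} bytes", size_of::<Box<Dog>>());
    println!("Box<dyn Speak> = {} bytes", size_of::<Box<dyn Speak>>());
    println!("{}", own_and_call(make_speaker()));
}
```

Box<Dog> is one word because the type is known. Box<dyn Speak> is two words because the vtable pointer travels with the handle.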

Example 5. Go interface as a two word descriptor

Go code:

type Speaker interface { Speak() int }

type Dog struct{ bark int64 }
func (d *Dog) Speak() int { return 1 }

func CallSpeak(s Speaker) int { return s.Speak() }

func Demo() int {
    var d Dog
    return CallSpeak(&d)
}

Go’s interface value is two machine words:

  • itab or type pointer that carries method table and type identity

  • data pointer that points at the concrete value or a copy

A reasonable pseudo assembly for CallSpeak:

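# The Go compiler uses its own calling convention rather than System V, so the register names here are illustrative.
# Assume the interface arrives in two registers, RDI=itab and RSI=data, for clarity.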
CallSpeak:
    mov     rax, qword ptr [rdi + SPEAK_SLOT]   # method code pointer
    mov     rdi, rsi                            # receiver in first arg register
    call    rax
    ret

At the call site:

Demo:
    sub     rsp, 16
    mov     qword ptr [rsp], 0               # d.bark
    lea     rsi, [rsp]                       # data pointer points to local Dog
    lea     rdi, [rip + ITAB_FOR_PTR_DOG_TO_SPEAKER]  # address of the method table for *Dog
    call    CallSpeak
    add     rsp, 16
    ret

Escape analysis decides whether d must move to the heap. The interface representation itself has a fixed size, so the ABI is always satisfied.

Example 6. Why the compiler must know by value size at the call site

Consider a pretend interface type I with unknown size and a function:

// imaginary C-like
int f(I x); // wants by value interface

To call f, the caller must:

  • reserve space for arguments

  • copy x into argument registers or stack slots

  • adjust RSP by a known constant

  • restore RSP after call

If the caller does not know the byte size of x, it cannot:

  • compute the stack frame layout

  • generate the correct number of mov instructions

  • honor the red zone and alignment rules

The compiler needs a byte count before it can emit any copy into the argument area. With a known size it emits a fixed sequence of moves such as mov, movups, or vmovdqu, or a loop with a constant trip count. rep movsb can copy a runtime count taken from RCX, but that does not rescue the call: the calling convention has no slot that tells the callee how many bytes arrived, so caller and callee still have to agree on the size ahead of time. The ABI does not let one side pick a size at runtime while the other side assumed a different one.

Concrete example, caller side lowering when size is known:

# void g(Big b) with Big being 64 bytes
# System V classifies aggregates larger than 16 bytes as MEMORY: the caller copies all 64 bytes
# into the stack argument area, none of it travels in integer registers.
# Assume RSI = address of the source Big and RSP is 16 byte aligned before this sequence.
sub     rsp, 64
movups  xmm0, xmmword ptr [rsi]         # b[0..15]
movups  xmm1, xmmword ptr [rsi + 16]    # b[16..31]
movups  xmm2, xmmword ptr [rsi + 32]    # b[32..47]
movups  xmm3, xmmword ptr [rsi + 48]    # b[48..63]
movups  xmmword ptr [rsp],      xmm0
movups  xmmword ptr [rsp + 16], xmm1
movups  xmmword ptr [rsp + 32], xmm2
movups  xmmword ptr [rsp + 48], xmm3
call    g
add     rsp, 64

If Big had unknown size, none of those constants exist. The compiler cannot substitute a symbol like SIZE(Big) at runtime.
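In source terms, "the constants exist" means the size is usable anywhere a compile time constant is required. The sketch below is my own: Big, BIG_SIZE, and to_bytes are illustrative names. It uses the size of a 64 byte struct as an array length, which is only legal for constants.

```rust
use std::mem::size_of;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Big {
    pub words: [u64; 8], // 64 bytes
}

// size_of is a const fn, so the result can be used as a constant.
pub const BIG_SIZE: usize = size_of::<Big>();

// The return type's length is fixed by the type, before the program runs.
pub fn to_bytes(b: Big) -> [u8; BIG_SIZE] {
    let mut out = [0u8; BIG_SIZE];
    for (i, w) in b.words.iter().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&w.to_le_bytes());
    }
    out
}

fn main() {
    let bytes = to_bytes(Big { words: [1; 8] });
    println!("BIG_SIZE = {}, copied {} bytes", BIG_SIZE, bytes.len());
}
```

Replace Big with a type whose size is not known at compile time, such as dyn Speak, and neither the constant nor the by value parameter compiles.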

Example 7. Virtual destructor path in C++ and delete through base

C++ code:

struct Base { virtual ~Base() {} virtual int go() const = 0; };
struct D : Base { int x; int go() const override { return x; } };

int run_and_delete(Base* p) {
    int r = p->go();
    delete p;        // must call D::~D if p points to D
    return r;
}

The destructor call is resolved at runtime through the vtable:

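run_and_delete:
    push    rbp
    push    rbx
    push    rax                            # third push keeps RSP 16 byte aligned at the calls
    mov     rbx, rdi                       # keep p in a callee saved register, RDI does not survive calls
    mov     rax, qword ptr [rdi]           # vptr
    call    qword ptr [rax + GO_OFF]       # p->go()
    mov     ebp, eax                       # save result in callee saved
    mov     rax, qword ptr [rbx]           # reload vptr for destructor
    mov     rdi, rbx
    call    qword ptr [rax + DTOR_OFF]     # deleting destructor: runs D::~D if that is the dynamic type, then frees
    mov     eax, ebp
    add     rsp, 8
    pop     rbx
    pop     rbp
    ret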

Key point. The base pointer has fixed size. The callee finds the right destructor through the vtable, and that destructor knows the size of the complete object, so the right cleanup runs and the right number of bytes is freed. None of this would work with a by value Base, which would carry only the base subobject and lose the derived part. The delete works because the handle stays a pointer.

Example 8. Rust trait object drop glue mirrors C++ destructor logic

Rust code:

trait Run { fn go(&self) -> i32; }
struct D { x: i32 }
impl Run for D { fn go(&self) -> i32 { self.x } }

fn run_and_drop(p: Box<dyn Run>) -> i32 {
    let r = p.go();
    r
} // drop occurs here

At drop, the compiler uses the vtable drop glue:

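# conceptual
run_and_drop:
    # RDI = heap data pointer, RSI = vtable pointer (the two words of Box<dyn Run>)
    push    r15
    push    r14
    push    rbx                                  # three pushes also leave RSP 16 byte aligned
    mov     rbx, rdi                             # keep data pointer across calls
    mov     r14, rsi                             # keep vtable pointer across calls
    # method call
    call    qword ptr [r14 + GO_OFF]             # receiver = data pointer, already in RDI
    mov     r15d, eax                            # save result
    # drop glue
    mov     rdi, rbx
    call    qword ptr [r14 + DROP_OFF]           # runs D's destructor in place
    # free the allocation; size and alignment come from the vtable
    mov     rdi, rbx
    mov     rsi, qword ptr [r14 + SIZE_OFF]
    mov     rdx, qword ptr [r14 + ALIGN_OFF]
    call    DEALLOC                              # placeholder for the allocator's free routine
    mov     eax, r15d
    pop     rbx
    pop     r14
    pop     r15
    ret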

Again, everything works because the handle is a fixed size pointer and the vtable provides type specific behavior.
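To watch the drop dispatch happen, give the types observable destructors. This is a sketch under my own assumptions: the log field, the second type E, and demo are additions for the illustration. run_and_drop only ever sees Box<dyn Run>, yet the destructor that runs is the one for the concrete type.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

trait Run {
    fn go(&self) -> i32;
}

struct D {
    x: i32,
    log: Log, // added so the destructor leaves a trace
}

struct E {
    log: Log, // hypothetical second implementor
}

impl Run for D {
    fn go(&self) -> i32 {
        self.x
    }
}

impl Run for E {
    fn go(&self) -> i32 {
        0
    }
}

impl Drop for D {
    fn drop(&mut self) {
        self.log.borrow_mut().push("D dropped");
    }
}

impl Drop for E {
    fn drop(&mut self) {
        self.log.borrow_mut().push("E dropped");
    }
}

fn run_and_drop(p: Box<dyn Run>) -> i32 {
    p.go()
} // drop occurs here, selected through the vtable

// Returns both results plus the destructor trace, in order.
fn demo() -> (i32, i32, Vec<&'static str>) {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let a = run_and_drop(Box::new(D { x: 7, log: Rc::clone(&log) }));
    let b = run_and_drop(Box::new(E { log: Rc::clone(&log) }));
    let events = log.borrow().clone();
    (a, b, events)
}

fn main() {
    println!("{:?}", demo());
}
```

The body of run_and_drop names neither D nor E. The only way it can reach the right Drop impl is the vtable pointer in the handle.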

Example 9. Go method call through interface with inlining disabled

Go code:

type R interface { Run() int }

type T struct{ x int }
func (t *T) Run() int { return t.x }

func F(r R) int { return r.Run() }

A plausible assembly sketch when not inlined:

F:
    # RDI = itab pointer
    # RSI = data pointer
    mov     rax, qword ptr [rdi + RUN_SLOT] # method code pointer for (*T).Run
    mov     rdi, rsi                        # receiver
    call    rax
    ret

The pattern matches Rust and C++ because the constraint is the same. Pass a fixed size descriptor, read a function pointer from a table, indirect call it.

Example 10. Why static dispatch needs no indirection and what the assembly looks like

Rust generic function with static dispatch:

trait Speak { fn speak(&self) -> i32; }

fn call_speak_generic<T: Speak>(t: &T) -> i32 { t.speak() }

fn demo() -> i32 {
    let d = Dog { bark: 1 };
    call_speak_generic(&d)
}

Monomorphization produces a concrete function for Dog:

# call_speak_generic::<Dog>
call_speak_generic_Dog:
    # direct jump to a known symbol: no vtable load, no indirect branch
    # in practice the optimizer usually inlines Dog::speak here instead
    jmp     Dog_speak

Because the compiler knows the exact type, it can emit a direct call or inline the body. No indirect call, no table lookups, no fat pointer required. This shows that indirection is a property of not knowing the concrete type at compile time, not a property of the method call idea itself.
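The two dispatch styles can sit side by side in one program. In this sketch call_speak_dyn and the direct binding are names I introduced. Naming the instantiation call_speak_generic::<Dog> yields an ordinary function pointer whose argument is a thin &Dog, while the dyn version takes the fat pointer.

```rust
trait Speak {
    fn speak(&self) -> i32;
}

#[allow(dead_code)]
struct Dog {
    bark: i64,
}

impl Speak for Dog {
    fn speak(&self) -> i32 {
        1
    }
}

// Static dispatch: one copy of this function is generated per concrete T.
fn call_speak_generic<T: Speak>(t: &T) -> i32 {
    t.speak()
}

// Dynamic dispatch: one copy total, target looked up through the vtable.
fn call_speak_dyn(t: &dyn Speak) -> i32 {
    t.speak()
}

fn demo() -> (i32, i32) {
    let d = Dog { bark: 1 };
    // The monomorphized instance is a plain function over &Dog.
    let direct: fn(&Dog) -> i32 = call_speak_generic::<Dog>;
    (direct(&d), call_speak_dyn(&d))
}

fn main() {
    println!("{:?}", demo());
}
```

Both calls return the same value. The difference is only in how the callee address is found, which is the point of the example.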

Example 11. Type erasure with small buffer optimization in C++

Many value like polymorphic containers fake a fixed size using an inline buffer and a vtable like control block.

Sketch:

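#include <new>      // placement new
#include <utility>  // std::move

struct Fun {
    void* obj;
    int (*call)(void*);
    void (*destroy)(void*);
    alignas(16) unsigned char buf[32]; // inline storage

    template<class F>
    Fun(F f) {
        // small, suitably aligned callables live in buf; everything else goes to the heap
        if constexpr (sizeof(F) <= sizeof(buf) && alignof(F) <= 16) {
            obj = new (buf) F(std::move(f));
            call = [](void* p) -> int { return (*static_cast<F*>(p))(); };
            destroy = [](void* p){ static_cast<F*>(p)->~F(); };
        } else {
            obj = new F(std::move(f));
            call = [](void* p) -> int { return (*static_cast<F*>(p))(); };
            destroy = [](void* p){ delete static_cast<F*>(p); };
        }
    }

    // obj may point into buf, so a memberwise copy would dangle; keep the sketch non-copyable
    Fun(const Fun&) = delete;
    Fun& operator=(const Fun&) = delete;

    ~Fun(){ destroy(obj); }
    int operator()(){ return call(obj); }
};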

Assembly effects:

  • The wrapper itself has fixed size. ABI is satisfied.

  • Calls always go through an indirect function pointer call.

  • Small objects avoid heap and live in the inline buffer.

  • Large ones allocate, but the handle is still fixed size.

This shows that when you cannot make the callee type known at compile time, you can still force the caller side to have a fixed size wrapper and push indirection inside the wrapper.
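The same wrapper can be hand rolled in Rust, which makes the three fixed size words explicit. This is a simplified sketch of the idea, not a port of the C++ above: it always heap allocates and leaves out the inline buffer, and the names call_impl, destroy_impl, and invoke are mine.

```rust
// A fixed-size, type-erased callable: one data pointer plus two code pointers.
pub struct Fun {
    obj: *mut (),
    call: unsafe fn(*mut ()) -> i32,
    destroy: unsafe fn(*mut ()),
}

// Monomorphized per F; this is where the erased type is recovered.
unsafe fn call_impl<F: FnMut() -> i32>(p: *mut ()) -> i32 {
    // Safety: p was produced from a Box<F> in Fun::new and is still live.
    let f = unsafe { &mut *(p as *mut F) };
    f()
}

unsafe fn destroy_impl<F>(p: *mut ()) {
    // Safety: p came from Box::into_raw on a Box<F> and is released exactly once.
    drop(unsafe { Box::from_raw(p as *mut F) });
}

impl Fun {
    pub fn new<F: FnMut() -> i32 + 'static>(f: F) -> Fun {
        Fun {
            obj: Box::into_raw(Box::new(f)) as *mut (),
            call: call_impl::<F>,
            destroy: destroy_impl::<F>,
        }
    }

    pub fn invoke(&mut self) -> i32 {
        // Safety: obj and call were set together for the same F.
        unsafe { (self.call)(self.obj) }
    }
}

impl Drop for Fun {
    fn drop(&mut self) {
        // Safety: obj is owned by this wrapper and has not been freed.
        unsafe { (self.destroy)(self.obj) }
    }
}

fn main() {
    let mut n = 0;
    let mut counter = Fun::new(move || {
        n += 1;
        n
    });
    println!("{} {}", counter.invoke(), counter.invoke());
    println!("size_of::<Fun>() = {}", std::mem::size_of::<Fun>());
}
```

Whatever closure goes in, Fun stays three words, so it can be passed and returned by value. In everyday Rust, Box<dyn FnMut() -> i32> does this job with the compiler generating the table.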

Example 12. The impossible call site without known size

Imagine a function taking a truly unknown by value object:

# Pseudocode, not legal C
int process(Unknown x);

Caller would have to do:

sub     rsp, ???          # unknown amount
rep movsb                 # copy ??? bytes from source to argument area
call    process
add     rsp, ???          # unknown amount

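Adjusting RSP by a runtime amount is not the obstacle by itself, since alloca style code already does that. The obstacle is the contract between separately compiled functions: process expects its argument at fixed offsets with a fixed size. If the size lived only in a register the caller happened to pick, the callee would not know where the argument ends, where the next one begins, or which bytes it owns. An ABI could define a hidden size parameter, but then the value is effectively passed as a pointer plus metadata, which is the fat pointer again. This is why no mainstream ABI supports such a parameter passing mode for general code. The model is always fixed size values or pointer like references.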
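Rust slices are a handy illustration of what languages do instead. The type [u8] has no compile time size, so it cannot be a by value parameter, but &[u8] is a fixed two word handle that carries the length next to the pointer. The sketch below is my own example, and the function name process only echoes the pseudocode above.

```rust
use std::mem::{size_of, size_of_val};

// [u8] is unsized, so it can only be passed behind a handle.
// The handle stores the length, which is how the callee learns the size.
fn process(x: &[u8]) -> usize {
    size_of_val(x)
}

fn main() {
    let small = [1u8, 2, 3];
    let large = vec![0u8; 1000];
    println!("handle = {} bytes", size_of::<&[u8]>());
    println!("payloads = {} and {} bytes", process(&small), process(&large));
}
```

The callee handles 3 bytes or 1000 bytes through the same signature because the size travels as data inside a fixed size argument.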

Connecting the dots

Every example above reduces to the same fixed idea. The ABI requires that the size and layout of by value parameters and returns are known to the compiler. Interfaces, traits, and abstract bases hide the concrete type, which hides layout and size. That breaks the ABI unless we shift to passing a fixed size descriptor. C++ uses pointers or references to base plus vtables. Rust uses fat pointers to trait objects and vtables and adds ownership through Box for unsized values. Go uses a two word interface value with a type pointer and a data pointer. When you want value like semantics without knowing the concrete type, you put a fixed size wrapper around the variable sized thing and forward through function pointers. All of these are different clothing on the same rule.

Short answers to common confusions

  • Do you always need the heap? No. Borrowed references to polymorphic objects use stack or existing storage. Heap shows up when you need ownership of unknown size or lifetimes that outlive the current frame.

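  • Is indirection a performance problem? The direct cost is small: one extra pointer read and an indirect branch per call. The larger cost is often the lost optimization, because the compiler cannot inline a call whose target it does not know. Many workloads tolerate both. Static dispatch removes these costs if you can accept monomorphization and the extra code it generates.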

  • Could a different architecture avoid this? You would need a hardware and ABI model that can pass runtime sized opaque values while preserving stack invariants and interop. Mainstream CPUs and ABIs do not support this.

One last walk through the chain

Polymorphism hides type. Hidden type hides layout. Hidden layout hides size. Hidden size breaks the calling convention. To restore the convention we pass a fixed size handle and use a method table to recover behavior. If we also need ownership, we store the unknown sized object behind the handle in a heap or arena. The assembly listings above are the concrete footprints of that story.
