Why Indirection Is the Price of Polymorphism, With Assembly You Can Read
Polymorphism lets you call methods through an interface without knowing the concrete type at compile time. The CPU and the ABI do not understand interfaces, traits, or virtuals. They only understand bytes, fixed register widths, and stack slots. That mismatch is the root cause of why runtime polymorphism uses indirection. In this extended version I will show concrete assembly flavored examples so you can see exactly where size knowledge matters, how vtables and fat pointers are used, and what gets passed in registers or on the stack. I will use x86-64 System V ABI conventions where practical. Exact assembly will vary by compiler and flags, but the patterns are stable. The goal is to make the invisible visible.
A tiny ABI checklist before we start
The ABI decides how arguments are passed and where return values land. On x86-64 System V:
- Integer or pointer arguments are passed in RDI, RSI, RDX, RCX, R8, R9, then the stack.
- Floating arguments use XMM0..XMM7.
- The caller aligns RSP to a 16 byte boundary before call.
- The callee must preserve RBX, RBP, R12..R15, and restore RSP.
- By value objects must have a known compile time size so the caller can move the right number of bytes into registers or onto the stack.
If the compiler cannot know a type size, it cannot reserve space, spill, copy, or return it by value in ABI compliant code. That is why we pass a fixed size handle when the concrete type is unknown.
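To make the size requirement concrete, here is a minimal Rust sketch. Rust is used only for illustration, the `Pair` type is a stand-in with two 64-bit fields, and the pointer width assumes a 64-bit target. It shows that the size of an ordinary type is a constant the compiler can use while compiling, for instance as an array length, and that a reference to it is one machine word.

```rust
use std::mem::size_of;

// Hypothetical type for illustration: two 64-bit fields with C layout.
#[repr(C)]
struct Pair {
    a: i64,
    b: i64,
}

// The size is a compile-time constant, so it can be used as an array length.
const PAIR_BYTES: usize = size_of::<Pair>();

// By value: the compiler knows exactly how many bytes to move.
fn sum_pair(p: Pair) -> i64 {
    p.a + p.b
}

// By reference: only a pointer-sized handle is passed.
fn sum_pair_ref(p: &Pair) -> i64 {
    p.a + p.b
}

fn main() {
    let scratch = [0u8; PAIR_BYTES];
    println!("Pair is {} bytes, scratch buffer is {} bytes", PAIR_BYTES, scratch.len());
    println!("&Pair is {} bytes", size_of::<&Pair>());
    println!("{}", sum_pair(Pair { a: 1, b: 2 }));
    println!("{}", sum_pair_ref(&Pair { a: 3, b: 4 }));
}
```

Both functions compute the same thing; what differs is what crosses the call boundary, and in both cases its size is fixed before the program runs.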
Example 1. C struct by value vs pointer
C code:
typedef struct {
long a;
long b;
} Pair;
long sum_pair(Pair p) { return p.a + p.b; }
long sum_pair_ptr(const Pair* p) { return p->a + p->b; }
Clang -O2 emits something like:
# long sum_pair(Pair p)
# Pair is 16 bytes and made of two integer eightbytes, so the ABI passes p.a in RDI and p.b in RSI.
sum_pair:
lea rax, [rdi + rsi] # add p.a and p.b that arrived in regs due to known layout
ret
# long sum_pair_ptr(const Pair* p)
sum_pair_ptr:
mov rax, qword ptr [rdi] # p->a
add rax, qword ptr [rdi + 8] # p->b
ret
Key point. The by value call only works because the size and field layout of Pair are known at compile time. The compiler can place fields into registers or copy the right byte count. If Pair had unknown size, the compiler could not pick registers or a byte count, so there would be no ABI compliant instruction sequence to emit. A pointer works because it is always 8 bytes here.
Example 2. C++ virtual call through vtable and why by value fails
C++ code:
struct Animal { virtual ~Animal() {} virtual int speak() const = 0; };
struct Dog : Animal { long bark; int speak() const override { return 1; } };
int call_speak(const Animal& a) { return a.speak(); }
A plausible Itanium ABI style layout is:
- Object memory starts with a vptr at offset 0. vptr points at the vtable.
- The dog object memory: [vptr][bark:8 bytes].
A typical -O2 call site for call_speak ends up like:
# int call_speak(Animal const& a)
call_speak:
mov rax, qword ptr [rdi] # load vptr from object at address in RDI
mov rax, qword ptr [rax + 16]# load function pointer from vtable slot for speak
jmp rax # tail call or "call rax" then "ret"
Notes:
- The reference parameter is just a pointer in RDI. Fixed size, ABI friendly.
- call_speak reads the vptr, then reads a code pointer from a fixed table slot, then indirect jumps. The CPU never needs to know Dog vs Cat size here.
What if you try to pass Animal by value?
int bad(Animal a); // illegal, cannot instantiate abstract class
Even if the class were not abstract, passing by value would cause slicing. The caller would only copy the Animal base subobject, which has a different notion of size than any derived class. The ABI would still need the number of bytes to copy, which is only well defined for the base subobject. You would lose derived state and end up with wrong behavior. This is why dynamic dispatch is paired with references or pointers to base. The rule follows from the size requirement.
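The vptr-at-offset-0 layout can be modeled by hand. The following is a rough sketch in Rust, not what any C++ compiler literally emits: `AnimalVtable`, `Animal`, `Dog`, `DOG_VTABLE`, and `new_dog` are hypothetical names for illustration, and the one-slot table ignores the destructor entries a real vtable would carry. Unlike the C++ example, `speak` returns the derived field so that you can see derived state being reached through a base pointer.

```rust
// Hand-rolled model of an object with an embedded vptr (illustrative names only).

#[repr(C)]
struct AnimalVtable {
    speak: fn(*const Animal) -> i32, // one method slot
}

#[repr(C)]
struct Animal {
    vptr: &'static AnimalVtable, // offset 0, like a C++ vptr
}

#[repr(C)]
struct Dog {
    base: Animal, // base subobject first: a *const Dog is also a valid *const Animal
    bark: i64,
}

fn dog_speak(this: *const Animal) -> i32 {
    // Safety (assumed by this sketch): DOG_VTABLE is only installed in a Dog,
    // so `this` really points at a whole Dog.
    let dog = unsafe { &*(this as *const Dog) };
    dog.bark as i32
}

static DOG_VTABLE: AnimalVtable = AnimalVtable { speak: dog_speak };

fn new_dog(bark: i64) -> Dog {
    Dog { base: Animal { vptr: &DOG_VTABLE }, bark }
}

// The caller sees only a thin pointer: load vptr, load slot, indirect call.
fn call_speak(a: *const Animal) -> i32 {
    // Safety (assumed): `a` points at a live object that starts with an Animal.
    let vptr = unsafe { (*a).vptr };
    (vptr.speak)(a)
}

fn main() {
    let d = new_dog(7);
    let base_ptr = &d as *const Dog as *const Animal;
    println!("{}", call_speak(base_ptr));
}
```

`call_speak` never learns the size of `Dog`. Copying only `size_of::<Animal>()` bytes out of a `Dog` would keep the vptr but drop `bark`, which is the slicing problem described above.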
Example 3. Rust borrowed trait object and method call lowering
Rust code:
trait Speak { fn speak(&self) -> i32; }
struct Dog { bark: i64 }
impl Speak for Dog { fn speak(&self) -> i32 { 1 } }
fn call_speak(x: &dyn Speak) -> i32 { x.speak() }
fn demo() -> i32 {
let d = Dog { bark: 42 };
call_speak(&d)
}
Important runtime shape:
- &dyn Speak is a fat pointer of two machine words:
  - data: pointer to the Dog value
  - vtable: pointer to the Speak vtable for Dog
A typical lowering for call_speak looks like:
# Rust does not promise a stable ABI for its own functions, but on x86-64 rustc typically
# passes the two words of a fat pointer directly in two registers.
# Assume RDI = data ptr, RSI = vtable ptr for clarity.
call_speak:
# load function pointer from vtable; in current rustc the method slots follow the drop, size, and align entries
mov rax, qword ptr [rsi + SPEAK_OFFSET] # method fn pointer
# the first argument of a &self method is the data pointer, which is already in RDI
jmp rax
At the call site in demo, the compiler materializes the fat pointer:
demo:
# allocate Dog on stack; 24 bytes keeps RSP 16 byte aligned at the call
sub rsp, 24
mov qword ptr [rsp + 8], 42 # d.bark
# set up fat pointer args for &dyn Speak
lea rdi, [rsp + 8] # data pointer to Dog
lea rsi, [rip + VTABLE_FOR_DOG_SPEAK] # vtable pointer: the address of the table, not its contents
call call_speak
add rsp, 24
ret
No heap was required. The fat pointer has a fixed size. The ABI is satisfied because the callee sees two known size machine words and knows how to use the vtable to find the code pointer. The unsized part is the underlying dyn Speak, which never moves by value.
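A small sketch can confirm the shape described here. It assumes a 64-bit target and redefines `Speak` and `Dog` so that the block stands alone; `inspect` is a hypothetical helper, and `speak` returns the field only so that the result is visibly tied to the value. `std::mem::size_of_val` on a `&dyn Speak` reads the concrete size from the vtable at run time.

```rust
use std::mem::{size_of, size_of_val};

trait Speak {
    fn speak(&self) -> i32;
}

struct Dog {
    bark: i64,
}

impl Speak for Dog {
    fn speak(&self) -> i32 {
        self.bark as i32
    }
}

// Returns the result of the dynamic call plus the concrete size,
// which is looked up through the vtable half of the fat pointer.
fn inspect(x: &dyn Speak) -> (i32, usize) {
    (x.speak(), size_of_val(x))
}

fn main() {
    let d = Dog { bark: 42 }; // lives on the stack, no heap allocation
    println!("&dyn Speak handle: {} bytes", size_of::<&dyn Speak>());
    println!("&Dog thin reference: {} bytes", size_of::<&Dog>());
    println!("{:?}", inspect(&d));
}
```

The handle is twice the width of a thin reference, and the callee can still recover both the behavior and the size of the hidden type.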
Example 4. Rust Box dyn Trait for owned polymorphism
Rust code:
fn own_and_call(x: Box<dyn Speak>) -> i32 { x.speak() }
A Box<dyn Speak> has the same two word shape as &dyn Speak: a data pointer, here to a heap allocation, and a vtable pointer. The vtable pointer lives in the handle, not in the allocation. What Box adds is ownership: when the box goes out of scope it must run the right destructor and free the allocation. At call time the callee receives two words of known size. The method call loads the code pointer from the vtable, then indirect calls it:
# conceptual; assume RDI = data ptr (heap), RSI = vtable ptr
own_and_call:
mov rbx, rdi # keep data ptr for the drop (a real function saves RBX and R14 first)
mov r14, rsi # keep vtable ptr for the drop
call qword ptr [rsi + SPEAK_OFFSET] # self = heap data address, already in RDI
# dropping the Box follows: call the drop glue found in the vtable on the data (RBX),
# then free the allocation using the size and align entries stored in the same vtable (R14)
ret
The reason Box shows up is ownership with unknown size. The handle itself has a known size. The destructor is selected through the vtable at drop time.
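As a quick check of the owned case, the sketch below measures the handle and the payloads separately. It assumes a 64-bit target and current rustc behavior; `Small`, `Large`, and `payload_sizes` are hypothetical names for illustration. Two concrete types of very different sizes sit behind handles of identical size, which is what lets them share one `Vec`.

```rust
use std::mem::{size_of, size_of_val};

trait Speak {
    fn speak(&self) -> i32;
}

struct Small(u8);        // 1 byte payload
struct Large([u64; 16]); // 128 byte payload

impl Speak for Small {
    fn speak(&self) -> i32 {
        self.0 as i32
    }
}

impl Speak for Large {
    fn speak(&self) -> i32 {
        self.0.len() as i32
    }
}

fn own_and_call(x: Box<dyn Speak>) -> i32 {
    x.speak()
} // x is dropped here; the destructor is found through the vtable

// Every element is the same fixed-size handle; payload sizes come from the vtables.
fn payload_sizes() -> Vec<usize> {
    let zoo: Vec<Box<dyn Speak>> = vec![Box::new(Small(3)), Box::new(Large([0; 16]))];
    zoo.iter().map(|b| size_of_val(&**b)).collect()
}

fn main() {
    println!("Box<dyn Speak> handle: {} bytes", size_of::<Box<dyn Speak>>());
    println!("payload sizes: {:?}", payload_sizes());
    println!("{}", own_and_call(Box::new(Small(3))));
}
```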
Example 5. Go interface as a two word descriptor
Go code:
type Speaker interface { Speak() int }
type Dog struct{ bark int64 }
func (d *Dog) Speak() int { return 1 }
func CallSpeak(s Speaker) int { return s.Speak() }
func Demo() int {
var d Dog
return CallSpeak(&d)
}
Go’s interface value is two machine words:
- itab or type pointer that carries the method table and type identity
- data pointer that points at the concrete value or a copy
A reasonable pseudo assembly for CallSpeak:
# Assume Go passes the interface in two registers, RDI=itab and RSI=data, for clarity.
# (Go's own register ABI uses a different register order, starting with RAX and RBX.)
CallSpeak:
mov rax, qword ptr [rdi + SPEAK_SLOT] # method code pointer
mov rdi, rsi # receiver in first arg register
call rax
ret
At the call site:
Demo:
sub rsp, 16
mov qword ptr [rsp], 0 # d.bark
lea rsi, [rsp] # data pointer points to local Dog
lea rdi, [rip + ITAB_FOR_PTR_DOG_TO_SPEAKER] # address of the method table for *Dog
call CallSpeak
add rsp, 16
ret
Escape analysis decides whether d must move to the heap. The interface representation itself has a fixed size, so the ABI is always satisfied.
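The two word descriptor can be built by hand to see that nothing about it is magic. This is a rough model written in Rust rather than Go, and it is not how the Go runtime is implemented: `Itab`, `Speaker`, `DOG_ITAB`, and `from_dog` are hypothetical names, and the `type_name` string merely stands in for real type identity.

```rust
use std::marker::PhantomData;

// Hand-rolled, hypothetical model of a two-word interface value.
struct Itab {
    type_name: &'static str,     // stands in for type identity
    speak: fn(*const ()) -> i64, // one method slot
}

#[derive(Clone, Copy)]
struct Speaker<'a> {
    itab: &'static Itab,          // word 1: method table and type identity
    data: *const (),              // word 2: pointer to the concrete value
    _borrow: PhantomData<&'a ()>, // zero-sized; ties the handle to the borrowed value
}

struct Dog {
    bark: i64,
}

fn dog_speak(data: *const ()) -> i64 {
    // Safety (assumed by this sketch): DOG_ITAB is only paired with pointers to Dog.
    let d = unsafe { &*(data as *const Dog) };
    d.bark
}

static DOG_ITAB: Itab = Itab { type_name: "Dog", speak: dog_speak };

impl<'a> Speaker<'a> {
    fn from_dog(d: &'a Dog) -> Speaker<'a> {
        Speaker { itab: &DOG_ITAB, data: d as *const Dog as *const (), _borrow: PhantomData }
    }
}

// Load the slot from the table, pass the data word as the receiver.
fn call_speak(s: Speaker<'_>) -> i64 {
    (s.itab.speak)(s.data)
}

fn main() {
    let d = Dog { bark: 5 };
    let s = Speaker::from_dog(&d);
    println!("{} says {}", s.itab.type_name, call_speak(s));
}
```

The handle stays two words no matter how large the concrete type is, so passing it by value is always well defined.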
Example 6. Why the compiler must know by value size at the call site
Consider a pretend interface type I with unknown size and a function:
// imaginary C-like
int f(I x); // wants by value interface
To call f, the caller must:
- reserve space for arguments
- copy x into argument registers or stack slots
- adjust RSP by a known constant
- restore RSP after call
If the caller does not know the byte size of x, it cannot:
- compute the stack frame layout
- generate the correct number of mov instructions
- honor the red zone and alignment rules
The obstacle is not the copy itself. x86 can copy a runtime number of bytes: rep movsb takes its count from RCX. The obstacle is the contract. Which registers hold the argument, at which stack offsets it lives, and how far RSP moves are constants baked into the code of both caller and callee, and the ABI derives them from the type. When the size is known you see fixed width moves such as mov or movups, repeated a fixed number of times. When the size is only known at run time, the two sides have nothing to agree on. The ABI does not let one side pick a size at runtime while the other side assumed a different size.
Concrete example, caller side lowering when size is known:
# void g(Big b) with Big being 64 bytes
# System V classifies structs larger than 16 bytes as MEMORY: the caller copies all 64 bytes into the stack argument area
# Assume RSI = address of the source Big and RSP is 16 byte aligned before this sequence
sub rsp, 64
movups xmm0, xmmword ptr [rsi] # b[0..15]
movups xmm1, xmmword ptr [rsi + 16] # b[16..31]
movups xmm2, xmmword ptr [rsi + 32] # b[32..47]
movups xmm3, xmmword ptr [rsi + 48] # b[48..63]
movups xmmword ptr [rsp], xmm0
movups xmmword ptr [rsp + 16], xmm1
movups xmmword ptr [rsp + 32], xmm2
movups xmmword ptr [rsp + 48], xmm3
call g
add rsp, 64
If Big had unknown size, none of those constants exist. The compiler cannot substitute a symbol like SIZE(Big) at runtime.
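Rust encodes this rule in its type system, which makes it easy to observe. In stable Rust a parameter of an unsized type such as `[u8]` or `dyn Trait` is rejected at compile time, while a reference to it is accepted because the reference has a fixed size. The sketch below, with the hypothetical helper `payload_bytes` and an assumed 64-bit target, shows one handle type carrying payloads whose sizes are only known at run time.

```rust
use std::mem::{size_of, size_of_val};

// `fn payload_bytes(data: [u8])` would not compile: the size of [u8] is not
// known at compile time. The reference &[u8] is a fixed-size (pointer, length) pair.
fn payload_bytes(data: &[u8]) -> usize {
    size_of_val(data) // read from the length stored in the handle
}

fn main() {
    let short = [1u8, 2, 3];
    let long = vec![0u8; 1000];
    println!("&[u8] handle: {} bytes", size_of::<&[u8]>());
    println!("payloads: {} and {} bytes", payload_bytes(&short), payload_bytes(&long));
}
```

The same function body serves a 3 byte and a 1000 byte payload because only the handle crosses the call boundary.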
Example 7. Virtual destructor path in C++ and delete through base
C++ code:
struct Base { virtual ~Base() {} virtual int go() const = 0; };
struct D : Base { int x; int go() const override { return x; } };
int run_and_delete(Base* p) {
int r = p->go();
delete p; // must call D::~D if p points to D
return r;
}
The destructor call is resolved at runtime through the vtable:
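run_and_delete:
push r14
push rbx
push rax # scratch push that keeps RSP 16 byte aligned for the calls
mov rbx, rdi # keep p in a callee saved register; RDI does not survive a call
mov rax, qword ptr [rdi] # vptr
call qword ptr [rax + GO_OFF] # function pointer for go
mov r14d, eax # save result in a callee saved register
mov rdi, rbx # this = p
mov rax, qword ptr [rbx] # reload vptr for destructor
call qword ptr [rax + DTOR_OFF] # deleting destructor: runs D::~D if that is the dynamic type, then frees
mov eax, r14d
add rsp, 8
pop rbx
pop r14
ret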
Key point. The base pointer has fixed size. The vtable leads to the right destructor, which runs the derived cleanup and releases the right number of bytes. A Base passed by value could not do this, because the copy would hold only the base subobject and the derived size would be lost. The delete works because the handle stays a pointer.
Example 8. Rust trait object drop glue mirrors C++ destructor logic
Rust code:
trait Run { fn go(&self) -> i32; }
struct D { x: i32 }
impl Run for D { fn go(&self) -> i32 { self.x } }
fn run_and_drop(p: Box<dyn Run>) -> i32 {
let r = p.go();
r
} // drop occurs here
At drop, the compiler uses the vtable drop glue:
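# conceptual; the exact code depends on the rustc version
# Assume RDI = data pointer (heap), RSI = vtable pointer: the two words of Box<dyn Run>
run_and_drop:
push r15
push r14
push rbx # three pushes also leave RSP 16 byte aligned for the calls
mov rbx, rdi # keep data pointer across calls
mov r14, rsi # keep vtable pointer across calls
# method call
call qword ptr [r14 + GO_OFF] # go fn ptr; receiver = data pointer, already in RDI
mov r15d, eax # save result
# drop glue
mov rdi, rbx
call qword ptr [r14 + DROP_OFF] # runs D's destructor in place
# free the allocation with the size and align recorded in the vtable
mov rdi, rbx
mov rsi, qword ptr [r14 + SIZE_OFF]
mov rdx, qword ptr [r14 + ALIGN_OFF]
call DEALLOC # the allocator's free routine
mov eax, r15d
pop rbx
pop r14
pop r15
ret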
Again, everything works because the handle is a fixed size pointer and the vtable provides type specific behavior.
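The drop path can be observed without reading assembly. The following sketch adds a `Drop` implementation to `D` that bumps a counter; the `DROPS` counter and the `drops_after_one_call` helper are hypothetical additions for illustration. `run_and_drop` only ever sees `Box<dyn Run>`, yet the destructor of `D` runs exactly once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times D's destructor has run (illustrative instrumentation).
static DROPS: AtomicUsize = AtomicUsize::new(0);

trait Run {
    fn go(&self) -> i32;
}

struct D {
    x: i32,
}

impl Run for D {
    fn go(&self) -> i32 {
        self.x
    }
}

impl Drop for D {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// This function does not know the concrete type behind the box.
fn run_and_drop(p: Box<dyn Run>) -> i32 {
    p.go()
} // p is dropped here: destructor found through the vtable, then the memory is freed

// Returns the call result and how many destructors ran during the call.
fn drops_after_one_call() -> (i32, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    let r = run_and_drop(Box::new(D { x: 9 }));
    (r, DROPS.load(Ordering::SeqCst) - before)
}

fn main() {
    println!("{:?}", drops_after_one_call());
}
```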
Example 9. Go method call through interface with inlining disabled
Go code:
type R interface { Run() int }
type T struct{ x int }
func (t *T) Run() int { return t.x }
func F(r R) int { return r.Run() }
A plausible assembly sketch when not inlined:
F:
# RDI = itab pointer
# RSI = data pointer
mov rax, qword ptr [rdi + RUN_SLOT] # method code pointer for (*T).Run
mov rdi, rsi # receiver
call rax
ret
The pattern matches Rust and C++ because the constraint is the same. Pass a fixed size descriptor, read a function pointer from a table, indirect call it.
Example 10. Why static dispatch needs no indirection and what the assembly looks like
Rust generic function with static dispatch:
trait Speak { fn speak(&self) -> i32; }
fn call_speak_generic<T: Speak>(t: &T) -> i32 { t.speak() }
fn demo() -> i32 {
let d = Dog { bark: 1 };
call_speak_generic(&d)
}
Monomorphization produces a concrete function for Dog:
# call_speak_generic::<Dog>
call_speak_generic_Dog:
# direct call: no vtable lookup and no loads, the target is a link time constant
# real code will usually inline Dog::speak entirely
jmp Dog_speak
Because the compiler knows the exact type, it can emit a direct call or inline the body. No indirect call, no table lookups, no fat pointer required. This shows that indirection is a property of not knowing the concrete type at compile time, not a property of the method call idea itself.
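For a side by side comparison, here is a short sketch with both dispatch styles over the same trait. `Cat`, `call_static`, and `call_dynamic` are hypothetical names for illustration. The generic version can report the concrete type name because the type is known when the function is compiled; the exact string returned by `std::any::type_name` is not guaranteed, so treat it as diagnostic output only.

```rust
trait Speak {
    fn speak(&self) -> i32;
}

struct Dog;
struct Cat;

impl Speak for Dog {
    fn speak(&self) -> i32 {
        1
    }
}

impl Speak for Cat {
    fn speak(&self) -> i32 {
        2
    }
}

// Static dispatch: the compiler generates one copy per concrete T,
// so the call target and the type name are known at compile time.
fn call_static<T: Speak>(t: &T) -> (i32, &'static str) {
    (t.speak(), std::any::type_name::<T>())
}

// Dynamic dispatch: one function body, target chosen through the vtable at run time.
fn call_dynamic(t: &dyn Speak) -> i32 {
    t.speak()
}

fn main() {
    println!("{:?}", call_static(&Dog));
    println!("{:?}", call_static(&Cat));
    println!("{} {}", call_dynamic(&Dog), call_dynamic(&Cat));
}
```

Both styles return the same answers. The difference is in when the target is chosen and in how many copies of the calling function exist in the binary.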
Example 11. Type erasure with small buffer optimization in C++
Many value like polymorphic containers fake a fixed size using an inline buffer and a vtable like control block.
Sketch:
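#include <new>      // placement new
#include <utility>  // std::move
struct Fun {
    void* obj;
    int (*call)(void*);
    void (*destroy)(void*);
    alignas(16) unsigned char buf[32]; // inline storage
    template<class F>
    Fun(F f) {
        if constexpr (sizeof(F) <= sizeof(buf) && alignof(F) <= 16) {
            // small and suitably aligned: construct in the inline buffer
            obj = new (buf) F(std::move(f));
            call = [](void* p){ return (*static_cast<F*>(p))(); };
            destroy = [](void* p){ static_cast<F*>(p)->~F(); };
        } else {
            // too large for the buffer: fall back to the heap
            obj = new F(std::move(f));
            call = [](void* p){ return (*static_cast<F*>(p))(); };
            destroy = [](void* p){ delete static_cast<F*>(p); };
        }
    }
    // obj may point into buf, so a memberwise copy would dangle
    Fun(const Fun&) = delete;
    Fun& operator=(const Fun&) = delete;
    ~Fun(){ destroy(obj); }
    int operator()(){ return call(obj); }
};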
Assembly effects:
- The wrapper itself has fixed size. ABI is satisfied.
- Calls always go through an indirect function pointer call.
- Small objects avoid heap and live in the inline buffer.
- Large ones allocate, but the handle is still fixed size.
This shows that when you cannot make the callee type known at compile time, you can still force the caller side to have a fixed size wrapper and push indirection inside the wrapper.
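The same wrapper idea can be sketched in Rust. This is a simplified, hypothetical version rather than a production design: it supports only the inline case and panics instead of falling back to the heap, and the names `Fun`, `Buf`, `call_impl`, and `destroy_impl` are illustrative. The 32 byte, 16 byte aligned buffer matches the numbers in the C++ sketch.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

// Inline storage: 32 bytes, 16-byte aligned.
#[repr(align(16))]
struct Buf([MaybeUninit<u8>; 32]);

// Fixed-size callable wrapper; inline storage only in this sketch.
pub struct Fun {
    buf: Buf,
    call_fn: fn(*const u8) -> i32,
    destroy_fn: fn(*mut u8),
    _not_send: PhantomData<*mut ()>, // the erased closure may not be Send or Sync
}

// One copy of each helper is generated per closure type F;
// the wrapper stores only their addresses.
fn call_impl<F: Fn() -> i32>(p: *const u8) -> i32 {
    // Safety (assumed): p points at a live F written by Fun::new.
    let f = unsafe { &*(p as *const F) };
    f()
}

fn destroy_impl<F>(p: *mut u8) {
    // Safety (assumed): p points at a live F and this runs exactly once, from Drop.
    unsafe { ptr::drop_in_place(p as *mut F) }
}

impl Fun {
    pub fn new<F: Fn() -> i32 + 'static>(f: F) -> Fun {
        assert!(
            size_of::<F>() <= 32 && align_of::<F>() <= 16,
            "closure does not fit the inline buffer"
        );
        let mut buf = Buf([MaybeUninit::uninit(); 32]);
        // Move the closure into the buffer; from here on its type is erased.
        unsafe { ptr::write(buf.0.as_mut_ptr() as *mut F, f) };
        Fun {
            buf,
            call_fn: call_impl::<F>,
            destroy_fn: destroy_impl::<F>,
            _not_send: PhantomData,
        }
    }

    pub fn call(&self) -> i32 {
        (self.call_fn)(self.buf.0.as_ptr() as *const u8)
    }
}

impl Drop for Fun {
    fn drop(&mut self) {
        (self.destroy_fn)(self.buf.0.as_mut_ptr() as *mut u8)
    }
}

fn main() {
    let k = 41;
    let f = Fun::new(move || k + 1);
    println!("{} (wrapper is {} bytes)", f.call(), size_of::<Fun>());
}
```

Because Rust moves values with a plain byte copy, the wrapper does not need the self pointer that the C++ version keeps in `obj`; the buffer address is recomputed on every call.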
Example 12. The impossible call site without known size
Imagine a function taking a truly unknown by value object:
# Pseudocode, not legal C
int process(Unknown x);
Caller would have to do:
sub rsp, ??? # unknown amount
rep movsb # copy ??? bytes from source to argument area
call process
add rsp, ??? # unknown amount
Because ??? is not a compile time constant, the compiler cannot encode the frame prolog and epilog. Even if it put the size in a register, the ABI would break because callee and caller would disagree on stack frame layout and who owns which bytes. This is why no mainstream ABI supports such a parameter passing mode for general code. The model is always fixed size values or pointer like references.
Connecting the dots
Every example above reduces to the same fixed idea. The ABI requires that the size and layout of by value parameters and returns are known to the compiler. Interfaces, traits, and abstract bases hide the concrete type, which hides layout and size. That breaks the ABI unless we shift to passing a fixed size descriptor. C++ uses pointers or references to base plus vtables. Rust uses fat pointers to trait objects and vtables and adds ownership through Box for unsized values. Go uses a two word interface value with a type pointer and a data pointer. When you want value like semantics without knowing the concrete type, you put a fixed size wrapper around the variable sized thing and forward through function pointers. All of these are different clothing on the same rule.
Short answers to common confusions
- Do you always need the heap? No. Borrowed references to polymorphic objects use stack or existing storage. Heap shows up when you need ownership of unknown size or lifetimes that outlive the current frame.
- Is indirection a performance problem? Usually a small one. The call costs one or two extra dependent loads and an indirect branch, and the compiler generally cannot inline through it. Many workloads tolerate it. Static dispatch removes these costs if you can accept monomorphization.
- Could a different architecture avoid this? You would need a hardware and ABI model that can pass runtime sized opaque values while preserving stack invariants and interop. Mainstream CPUs and ABIs do not support this.
One last walk through the chain
Polymorphism hides type. Hidden type hides layout. Hidden layout hides size. Hidden size breaks the calling convention. To restore the convention we pass a fixed size handle and use a method table to recover behavior. If we also need ownership, we store the unknown sized object behind the handle in a heap or arena. The assembly listings above are the concrete footprints of that story.
