notes

Tail-call optimization

Optimizing tail recursion is trivial, because the stack is reused, but general TCO requires additional cleanup or resizing.

In assembly, TCO can be implemented by replacing a call followed by a ret with a jmp. There may be call frame setup between the two instructions, which would need to be adjusted.

Implementations

LLVM

LLVM has the -tailcallelim pass and optimizes tail calls and sibling calls.

It extends the basic algorithm to move trivial instructions from between the call and return and rewrites associative expressions to use an accumulator (as in naïve factorial or Fibonacci). It also performs it on functions that return void, the callee’s result, or a run-time constant on all exits, or if it can prove that callees don’t access the caller stack frame. TCO is only performed for the fastcc, GHC, HiPE, tailcc, and swifttailcc calling conventions.

MLIR

The Func dialect has func.call, func.call_indirect, and func.return, and the LLVM dialect has llvm.call and llvm.return, but their docs make no mention of tail calls. Presumably, MLIR leans on LLVM’s TCO.

WebAssembly

WebAssembly has the regular call instructions, call, call_indirect, and call_ref as well as their proposed tail-call counterparts, return_call, return_call_indirect, and return_call_ref, which unwind the frame before performing the call. The callee does not necessarily need to be the same as the caller and can have a different type, so is more general than tail self-recursive calls.

V8

V8 has implemented this proposal.

Turbofan (V8’s optimizing compiler) handles the stack and register manipulations for reusing the stack space by generating a list of moves that are semantically executed in parallel. The “gap resolver” then orders moves such that all sources are read before being overwritten, using temporary registers or stack slots to break cycles.

Liftoff (V8’s baseline compiler) does not optimize tail calls. It instead updates the stack frame as for a regular call, then shifts it downwards to discard the caller frame.

Wasmtime

Wasmtime details their implementation in an RFC.

Rust

In RFC 3407, it is proposed that Rust use the already-reserved become keyword to denote a tail call. It is used opt-in, in place of a return, and its argument must be a call. Tail calls to and from closures will not be supported, due to the high implementation effort.

Zig

Zig uses separate @call syntax to specify modifiers, such as to control tail calls or inlining.

JVM

The JVM eliminates tail-recursive calls, but not general tail calls or mutual tail recursion.

GraalVM

The Graal compiler has internal support for tail calls and limited support for Truffle languages.

Others