
Rust, from first principles

March 24, 2026

Rust began as a personal project by Graydon Hoare in 2006, became Mozilla Research-sponsored in 2009, and reached its 1.0 release in May 2015. The language was designed from the outset as a response to the memory safety failures endemic in C and C++: dangling pointers, use-after-free vulnerabilities, and data races that escape detection in large codebases. The stated goal was to deliver the same degree of systems-level control as C, with safety guaranteed by the type system rather than by programmer discipline or a garbage collector. That safety guarantee follows from a precise mathematical discipline: the type system is a substructural logic, the borrow checker is a constraint solver over region variables, and the async runtime is a state machine over poll-based future types. This post derives all of these from first principles, from the structural rules of logic to the operational semantics of an abstract machine, from affine ownership to the Pin invariant, from the serde data model to the no_std embedded contract. Each step follows from the one before.

1. Why Rust Needs a Formal Account

1.1. The Problem C and C++ Left Unsolved

Memory safety failures in C and C++ are the dominant source of security vulnerabilities across the industry. The NSA/CISA "The Case for Memory Safe Roadmaps" report (2023) documented that approximately 70% of CVEs in major codebases, including those of Microsoft, Google Android, and Chrome, trace to memory safety issues.[1] The canonical failure modes are a short list: dangling pointers (accessing memory that has been freed), use-after-free (a specialisation with particularly severe security implications, since freed memory is often reallocated and may contain attacker-controlled data), buffer overflows (reading or writing past the end of an allocation), and data races (two threads concurrently writing to the same location without synchronisation, producing an indeterminate programme state).

Consider the minimal C example:

// C: no safety net
char *p = malloc(8);
free(p);
char c = *p;  // undefined behaviour: use-after-free

The C compiler emits this without complaint. The standard calls the behaviour undefined, meaning the compiler is permitted to assume it never occurs and may optimise under that assumption, sometimes producing security vulnerabilities that did not exist in a naive reading of the code. There is no runtime check, no trap, no diagnostic. The programme is simply wrong, and the wrongness may not manifest until an attacker controls the freed memory.

Rust's type system makes all of these a compile-time impossibility in safe code. The compiler rejects the programme before it runs, with no runtime overhead. The rest of this post explains why, starting from the mathematical structures that underlie the type system.
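As a point of contrast, the safe-Rust counterpart of the C fragment above cannot be written at all. A minimal sketch, with the rejected line left as a comment so the programme compiles:

```rust
fn main() {
    let b = Box::new([0u8; 8]); // heap allocation, owned by `b`
    drop(b);                    // deallocation: ownership moves into drop()
    // let c = b[0];            // error[E0382]: borrow of moved value: `b`
    println!("the freed allocation is unreachable: the commented line does not compile");
}
```

The deallocation consumes the owner, so no path in the programme can reach the freed memory afterwards; the compiler, not the runtime, rules out the use-after-free.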

1.2. Rust in the Language Landscape: Go, Haskell, and the Safety Spectrum

To understand what Rust achieves, it is useful to locate it precisely among its contemporaries.

C and C++ offer maximum control. Memory management is entirely manual: the programmer allocates, manages, and frees every resource. There is no safety enforcement at the language level. The programmer is responsible for every invariant, and decades of CVEs show the cost of this arrangement. C++ adds RAII and smart pointers as conventions, but these are not enforced by the type system in any deep sense.

Go achieves safety through a garbage collector and a runtime scheduler. There is no manual memory management and no possibility of dangling pointers, because the collector ensures that memory is never freed while a reference to it remains live. The cost is a hidden runtime: a garbage collector that introduces non-deterministic pauses, a memory model that does not permit control over allocation layout, and an M:N green thread scheduler that sits between the programme and the OS. Go trades control for simplicity, and for many applications that is a sound trade.

Haskell achieves safety through purity and a strong type system. The IO monad separates effectful computation from pure computation, and lazy evaluation means values are only computed when demanded. There is no mutable state by default, which eliminates entire classes of error. The cost is a runtime with a garbage collector, no systems-level control over memory layout, and a lazy evaluation strategy that makes performance reasoning subtle. Haskell is excellent for writing correct programmes; it is not designed for writing programmes that control hardware.

Rust occupies a different position: safety and control, simultaneously. There is no garbage collector, no runtime scheduler, and no hidden allocations. Safety is enforced entirely at compile time by the type system. The programmer retains full control over memory layout and execution order. The trade-off is that the type system is more demanding to satisfy.

The safety spectrum runs from "enforce nothing at the language level" (C) through "enforce at runtime via a collector and a scheduler" (Go) to "enforce at compile time via a type system" (Rust, Haskell). Rust is unique in combining compile-time enforcement with zero-runtime cost.

1.3. What "Safe" Means Operationally

The word "safe" is used loosely in conversation. For this post we need a precise definition.

Definition (Memory safety).

A language implementation is memory-safe if it guarantees, for all well-typed programmes, that: (1) no read occurs from uninitialised memory, (2) no pointer is dereferenced after the pointee's lifetime has ended, and (3) no two threads concurrently write to the same memory location without synchronisation.[2]

Rust's type system provides all three guarantees in safe code. Uninitialised memory cannot be read because the compiler requires definite assignment before a variable is used. Dangling pointer dereferences are impossible because the borrow checker enforces that every reference's lifetime is contained within the lifetime of the referent. Data races are impossible because the type system separates exclusive mutable access (&mut T) from shared read-only access (&T) and enforces the invariant that an exclusive mutable reference never coexists with any other reference to the same data.

Remark (unsafe blocks).

The unsafe keyword creates a bounded region in which the programmer re-assumes the proof obligation for memory safety. It does not disable the borrow checker; it merely unlocks four additional capabilities: dereferencing raw pointers, calling unsafe functions, accessing mutable statics, and implementing unsafe traits. The unsafety is strictly local and auditable. Every unsafe block is a claim by the programmer that the invariants hold by reasoning outside the type system, and that claim can be reviewed in isolation.
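A minimal sketch of that boundary: creating a raw pointer is safe, while dereferencing it is one of the four unlocked capabilities and requires an unsafe block.

```rust
fn main() {
    let x: i32 = 42;
    let p: *const i32 = &x;  // creating a raw pointer is safe
    // Dereferencing requires `unsafe`: the programmer, not the type
    // system, asserts here that `p` is valid, aligned, and live.
    let y = unsafe { *p };
    assert_eq!(y, 42);
    println!("read {y} through a raw pointer");
}
```

The unsafe block is the auditable unit: a reviewer checks the claim that `p` is valid at this point, and nothing outside the block needs re-examination.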

2. The Type System as a Logic

2.1. Substructural Logics: Linear, Affine, Relevant

Classical propositional logic is governed by three structural rules in addition to its logical axioms. Weakening permits unused hypotheses: from a proof of B using hypothesis set Γ, we may conclude B in any larger context Γ, A, even if A is never used. Contraction permits hypotheses to be duplicated: if we can prove B with two copies of A, then one copy suffices, because we can always make two. Exchange permits hypotheses to be reordered freely. Together these rules mean that hypotheses behave like timeless facts: they are free, copyable, and order-independent.

Substructural logics drop one or more of these structural rules, and in doing so they acquire the power to reason about resources, which are consumed by use, cannot be freely duplicated, and may be order-sensitive.

Definition (Linear logic).

Linear logic (Girard, 1987)[3] drops both weakening and contraction. Every hypothesis must be used exactly once. The logical connectives split into two families: multiplicatives (⊗, ⅋) for combining resources that coexist, and additives (⊕, &) for choosing between resources. A proof in linear logic is a description of how resources flow through a computation, with no resource created or destroyed outside the explicit rules.

Definition (Affine logic).

Affine logic drops contraction but retains weakening. Every hypothesis is used at most once: it may be discarded (weakening is permitted) but not duplicated (contraction is forbidden). Affine logic models resources that can be abandoned but not copied.

Definition (Relevant logic).

Relevant logic drops weakening but retains contraction. Every hypothesis must be used at least once: it cannot be ignored, but it may be used multiple times. Relevant logic models resources that must be accounted for but are inexhaustible once present.

Remark (Structural rules summary).
Logic     | Weakening | Contraction | Use constraint
Linear    | No        | No          | Exactly once
Affine    | Yes       | No          | At most once
Relevant  | No        | Yes         | At least once
Classical | Yes       | Yes         | Any number

The Rust ownership system operates in the affine column: values may be dropped (weakening) but not duplicated (no contraction). The Copy trait restores contraction for a specific class of types, returning them to classical behaviour.

2.2. Ownership as Affine Types

Every Rust value has exactly one owner at any point in time. When a non-Copy value is assigned to a new binding or passed to a function, ownership is moved: the source binding becomes inaccessible and the compiler rejects any subsequent use of it. The value is dropped, and its memory freed, when the owner's scope ends. This is the affine discipline realised in a systems language: every value is consumed at most once as an owner. It can be borrowed any number of times, but a move is a one-way transfer.

Remark (Ownership as an affine type discipline).

The Rust ownership system is a realisation of the affine fragment of linear logic in a systems programming language, where the "consumption" of a value corresponds to its eventual deallocation. Moving a value is the act of consuming the affine resource. Dropping a value (allowing it to go out of scope without moving it) is the exercise of weakening: the resource is abandoned rather than transferred.

The comparison with C++ is instructive. std::unique_ptr approximates affine ownership through RAII and a deleted copy constructor: only one pointer owns the heap allocation at a time, and the destructor runs exactly once. But this invariant is a convention enforced by the class interface, not the type system. A programmer can std::move a unique_ptr incorrectly, pass a raw pointer extracted from it elsewhere, or write code that logically violates uniqueness in ways that no static analysis catches without specialised tooling. Rust enforces these constraints statically, for every heap-allocated type, without exception in safe code.

Copy types (integers, booleans, f64, fixed-size arrays of Copy types) are exempt from the affine discipline. They implement the Copy trait, which signals to the compiler that bitwise duplication is semantically sound. Assignment copies rather than moves. This is the contraction rule selectively restored for types where it is safe to do so.
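A short illustration of the two disciplines side by side, with the move-violating line left as a comment so the programme compiles:

```rust
fn main() {
    let a: i32 = 5;
    let b = a;           // i32 is Copy: assignment duplicates the bits
    println!("{a} {b}"); // both bindings remain usable (contraction restored)

    let s = String::from("hi");
    let t = s;           // String is not Copy: ownership moves to `t`
    // println!("{s}");  // error[E0382]: borrow of moved value: `s`
    println!("{t}");
}
```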

2.3. The Curry-Howard Correspondence

The Curry-Howard correspondence (Howard, 1980)[4] is one of the most productive structural insights in theoretical computer science. It identifies propositions with types and proofs with programmes. A function of type A → B is simultaneously a programme that maps values of type A to values of type B, and a proof that the proposition A implies the proposition B: applying the function to a proof (value) of A yields a proof (value) of B.

The correspondence extends to the logical connectives. Products (A × B, encoded as structs in Rust) correspond to conjunction: a value of type (A, B) is a proof of both A and B. Coproducts (A + B, encoded as enums) correspond to disjunction: a value of type enum E { Left(A), Right(B) } is a proof of A ∨ B, carrying the witness for whichever disjunct holds. The unit type (), which has exactly one value, corresponds to the trivially true proposition. The empty type ! (Rust's "never" type), which has no values, corresponds to falsehood: from a value of type ! any type can be produced, which is the Curry-Howard image of ex falso quodlibet.
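A sketch of this dictionary in Rust. The Either enum and absurd function are illustrative names of ours, and std::convert::Infallible (an empty enum) stands in for the not-yet-stable never type:

```rust
use std::convert::Infallible;

// Coproduct = disjunction: a value of the enum witnesses one disjunct.
#[allow(dead_code)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

// Ex falso quodlibet: a type with no values proves falsehood, so from
// it anything follows. The match needs no arms to be exhaustive.
#[allow(dead_code)]
fn absurd<T>(never: Infallible) -> T {
    match never {}
}

fn main() {
    let e: Either<i32, &str> = Either::Left(3);
    let msg = match e {
        Either::Left(n) => format!("left: {n}"),   // witness for the left disjunct
        Either::Right(s) => format!("right: {s}"), // witness for the right
    };
    assert_eq!(msg, "left: 3");
    println!("{msg}");
}
```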

Extending the correspondence to resource-sensitive types: a value of an affine type is a proof of an affine proposition, one that can be used at most once. Consuming the value (moving it) is the act of using the proof. The linear logic connectives then govern how resources flow through the programme, and the type checker's enforcement of ownership is the proof-checker's enforcement of the proof rules.

Remark (The ? operator as a proof rule).

The ? operator in Rust (early return on Err) is the Curry-Howard image of the disjunction elimination rule with a fixed error channel. Given a value of type Result<T, E> (a proof of T ∨ E), the ? operator either extracts the T (the left disjunct holds) or returns from the enclosing function with the E (the right disjunct holds, and control exits immediately). This is realised via the Try trait, which abstracts the desugaring.
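A small worked example of the rule; the function double is a hypothetical name of ours:

```rust
use std::num::ParseIntError;

// `?` performs disjunction elimination with a fixed error channel:
// Ok(n) extracts the left disjunct, Err(e) returns early with the right.
fn double(s: &str) -> Result<i32, ParseIntError> {
    let n = s.parse::<i32>()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(double("21"), Ok(42));       // left disjunct: computation continues
    assert!(double("not a number").is_err()); // right disjunct: early return
    println!("ok");
}
```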

2.4. Comparison: Haskell's LinearTypes vs. Rust Ownership

GHC 9.0 introduced the LinearTypes extension,[5] which adds multiplicity-annotated function arrows to Haskell: a %1 -> b means the function consumes its argument exactly once (the linear arrow), and a %Many -> b (multiplicity ω) means it may consume the argument any number of times (the ordinary arrow). This is a direct, explicit realisation of linear logic in a mature production language, and it enables Haskell to provide the same kind of resource-tracking guarantees in annotated code.

The key difference from Rust is one of defaults. In Haskell, linearity is an opt-in annotation on function types, layered on top of a language whose default is classical: values can be used any number of times, and the programmer selectively opts into linear discipline where precision is needed. The type checker enforces multiplicity constraints within annotated functions, but the surrounding ecosystem is unaffected. In Rust, affine ownership is the default for all heap-allocated values. The programmer opts out by deriving Copy or calling .clone() explicitly. The burden of proof runs in opposite directions.

Both systems realise the same mathematical structure, the affine (Rust) and linear (Haskell LinearTypes) fragments of linear logic, instantiated in a production compiler. Rust prioritises safety-by-default in a systems context, where the cost of an accidental copy or use-after-free is measured in security vulnerabilities. Haskell prioritises expressiveness and positions linearity as a precision tool, because its primary concern is correctness of functional programmes rather than control of hardware resources.

2.5. RustCell: Use-After-Move Rejection

The affine discipline is not an abstract claim. The compiler enforces it concretely on every non-Copy type. The following programme is rejected at compile time because s is moved into _t on the second line, and the affine resource is consumed; the third line attempts to use a resource that no longer exists.

fn main() {
  let s = String::from("hello");
  let _t = s;          // s is moved into _t
  println!("{s}");     // error[E0382]: borrow of moved value: `s`
}

Affine type enforcement: the move makes s unavailable at its use site

The error message is precise: error[E0382]: borrow of moved value: `s`. The compiler's dataflow analysis tracks ownership through every control-flow path and reports the exact location where the already-moved value is used. There is no analogous check in C or C++. In Go and Java, values are garbage-collected and can be used freely after assignment. Only a language with an affine type system can produce this rejection, and it produces it without any runtime overhead.

3. Lifetimes and the Region Calculus

3.1. Region Variables and Constraints

Every reference in Rust carries a lifetime annotation, written 'a, 'b, and so on. A lifetime is a region variable: it stands for some contiguous span of the programme's execution during which a reference is valid. The borrow checker is a constraint solver that generates and discharges a system of lifetime constraints. Rather than tracking individual heap addresses at runtime (as a garbage collector does), the borrow checker reasons statically about the spans of validity, which are represented as sets of programme points, and checks at compile time that no reference is ever used outside the span it is valid for.

Definition (Lifetime (region variable)).

A lifetime 'a is a region variable denoting a set of programme points. A reference &'a T is valid precisely at the programme points in 'a. The outlives relation 'a: 'b (read: 'a outlives 'b) means every programme point in 'b is also in 'a: formally, 'a ⊇ 'b.

Definition (Lifetime constraint system).

For a programme to be accepted by the borrow checker, the compiler generates a set of constraints of the form 'a: 'b (region inclusion) and 'a ⊇ live(x) (the lifetime must cover the live range of value x). The borrow checker is satisfied when all constraints have a simultaneous solution under the subset partial order on regions.

The compiler solves this system using a least fixed-point iteration, starting each region at the empty set and growing it via union until a fixed point is reached. Least fixed-point semantics are significant here: by choosing the minimal region that satisfies all constraints, the compiler assigns each borrow the shortest possible live range, which maximises the set of well-formed programmes accepted, rejecting only those for which no valid assignment of regions exists. This formulation was formalised in the Non-Lexical Lifetimes RFC (RFC 2094).[6]
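The classic example of a constraint the annotations express; longest is the textbook name, not part of any library:

```rust
// The annotation `'a` states the constraint the solver must discharge:
// the returned reference is valid only where both inputs are, so the
// result's region must be contained in the regions of `x` and `y`.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

fn main() {
    let a = String::from("longer string");
    let b = String::from("short");
    let r = longest(&a, &b); // both borrows live long enough: constraints solvable
    assert_eq!(r, "longer string");
    println!("{r}");
}
```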

3.2. Non-Lexical Lifetimes (NLL) and the Borrow Checker as Constraint Solver

Before NLL (pre-Rust 2018 edition), lifetimes were lexically scoped: a borrow lasted until the end of the enclosing block, even if the borrowed value was not used after an earlier point in that block. This produced a class of spurious rejections where the programmer's intent was clear and the programme was semantically sound, but the lexical lifetime of a borrow overlapped with a subsequent operation that the compiler conservatively refused. NLL, stabilised in Rust 2018 via RFC 2094,[6] eliminates this class of false positives by computing lifetimes as the liveness of the borrow: a region is the set of programme points at which the reference may actually be used.

Formally, NLL computes liveness on the control-flow graph (CFG) of the function's MIR (mid-level intermediate representation). MIR is a simplified, explicitly typed IR produced after desugaring and name resolution, with explicit basic blocks and explicit borrow annotations. A reference is live at a programme point if there exists a path from that point to a future use of the reference. The borrow checker then checks, for each pair of potentially conflicting borrows, that their regions do not overlap at any programme point. Mutable and shared borrows are in conflict if they alias the same memory; two mutable borrows are in conflict unconditionally if they alias.
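As a concrete illustration, a programme that NLL accepts but lexical lifetimes would have rejected:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];      // shared borrow of `v` begins here
    println!("{first}");    // last use: under NLL the borrow's region ends here
    v.push(4);              // mutable borrow is fine: the regions do not overlap
    assert_eq!(v.len(), 4); // pre-NLL, the shared borrow would have lasted to the
}                           // end of the block and push() would have been rejected
```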

Theorem (NLL soundness (informal)).

If the NLL borrow checker accepts a programme, then no two simultaneously live borrows alias the same memory with at least one being mutable. This theorem is proved formally in the RustBelt framework by Jung et al. (POPL 2018)[7] via a separation logic model in which the ownership discipline is encoded as an invariant on the resource algebra.

The practical implication is that the borrow checker is a dataflow analysis over the programme's CFG, computing the minimal region assignments that satisfy all constraints, and checking whether those assignments are consistent. When the checker rejects a programme, there is a genuine semantic reason: the constraints have no solution.

3.3. Variance, Covariance, Contravariance

Subtyping for lifetimes is induced by the outlives relation: if 'a outlives 'b, written 'a: 'b, then &'a T is a subtype of &'b T, because a reference that is valid for the longer span is certainly valid for the shorter one. Subtyping for type constructors depends on how they interact with the subtyping of their parameters, which is the concept of variance.

Definition (Variance).

A type constructor F is covariant in a parameter T if T ≤ T′ implies F(T) ≤ F(T′); contravariant if the ordering reverses, so T ≤ T′ implies F(T′) ≤ F(T); and invariant if neither implication holds in general. For lifetimes, the subtyping order 'a ≤ 'b is defined by the outlives relation: 'a: 'b.

The variance of Rust's built-in type constructors is determined by the compiler from their structure. The key entries are:

Type        | Variance in lifetime 'a | Variance in T
&'a T       | covariant               | covariant
&'a mut T   | covariant               | invariant
*const T    | n/a                     | covariant
*mut T      | n/a                     | invariant
fn(T) -> U  | n/a                     | contravariant in T, covariant in U
Cell<T>     | n/a                     | invariant

The invariance of &'a mut T in T is non-obvious and worth deriving. Suppose &'a mut T were covariant in T, so a &'a mut &'static str could be used where a &'a mut &'b str is expected, with 'b shorter than 'static. Through the &'a mut &'b str reference, one could write a short-lived 'b string slice into the slot. But the slot was originally typed as &'static str, so any code holding the original &'a mut &'static str would subsequently read back a reference it believes to be 'static, which is in fact only 'b. This is the canonical variance-extension example described in the Rustonomicon,[16] and it shows that covariance for mutable references would allow lifetimes to shrink through indirection, producing dangling references.
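The argument can be made concrete. In this sketch (set is an illustrative function of ours), the commented call is exactly the coercion that invariance forbids:

```rust
// If `&mut &'static str` coerced to `&mut &'a str` for a short 'a, this
// function could store a short-lived reference into a 'static slot.
fn set<'a>(slot: &mut &'a str, value: &'a str) {
    *slot = value;
}

fn main() {
    let mut s: &'static str = "static";
    {
        let owned = String::from("short-lived");
        // Invariance forces 'a = 'static here, so `&owned` is rejected:
        // set(&mut s, &owned); // error[E0597]: `owned` does not live long enough
        let _ = &owned;
    }
    set(&mut s, "still static"); // fine: both arguments are 'static
    assert_eq!(s, "still static");
    println!("{s}"); // `s` never dangles
}
```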

3.4. Comparison: Go's Escape Analysis and GC vs. Rust's Static Lifetimes

Go achieves the same high-level guarantee (no dangling pointers, no use-after-free) through a completely different mechanism: escape analysis and a tracing garbage collector. The Go compiler performs escape analysis to determine, at compile time, whether a value's address escapes its allocating function. If it does, the value is heap-allocated and the collector manages its lifetime. If it does not, the value lives on the stack and is reclaimed automatically when the function returns.

The trade-off between the two approaches is a trade-off between where the cost is paid. Go pays no compile-time annotation cost: there are no lifetime variables, no borrow checker constraints, and no need for the programmer to reason explicitly about validity spans. The cost is paid at runtime. The garbage collector runs concurrently with the programme, periodically pausing all goroutines (the stop-the-world phase) to trace the live heap. Even with a low-latency collector, this introduces non-deterministic pause times. Escape analysis is also conservative: values may be heap-allocated when stack allocation would have been possible, and the programmer has no recourse. Memory layout is opaque to the programmer.

Rust pays the cost entirely at compile time. The borrow checker statically verifies every lifetime constraint before the programme runs. At runtime, there is no collector, no scheduler overhead from GC cooperation points, and no hidden allocation. The programmer provides lifetime annotations where necessary to help the constraint solver, but these annotations are checkable evidence, not comments. Both approaches are sound in the sense that neither permits dangling pointer dereferences in code written to the respective language's safety rules. The choice is a point on a three-way design triangle: annotation cost (Rust: higher), runtime cost (Go: higher), and degree of control over memory layout (Rust: complete, Go: limited).

3.5. RustCell: Dangling Reference Rejection

The lifetime constraint system rejects programmes where a reference would outlive its referent. The following example illustrates the canonical form: a function attempts to return a reference to a local variable, which is dropped at the end of the function's scope. The borrow checker determines that the returned reference's lifetime cannot be satisfied because the referent does not outlive the function call, and rejects the programme with a missing lifetime specifier error.

fn first_word(s: &str) -> &str {
  &s[..s.find(' ').unwrap_or(s.len())]
}

fn dangle() -> &str {         // error[E0106]: missing lifetime specifier
  let s = String::from("hello world");
  first_word(&s)             // s is dropped at end of this scope
}                              // but the returned reference would outlive it

fn main() {
  println!("{}", dangle());
}

Lifetime constraint violation: the returned reference outlives its referent

The compiler cannot find a lifetime annotation that would make this programme well-typed, because s is a local variable whose lifetime ends when dangle returns, but the return type &str must outlive the call site. The constraint has no solution, so the programme is rejected. In C, the analogous code compiles without warning and produces a dangling pointer at runtime.
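For completeness, a sketch of the standard repair: borrow from the caller's data so the constraint system has a solution (no_dangle is our illustrative name):

```rust
fn first_word(s: &str) -> &str {
    &s[..s.find(' ').unwrap_or(s.len())]
}

// Elided, the signature is fn no_dangle<'a>(s: &'a str) -> &'a str:
// the returned slice borrows from the caller's data, which outlives
// the call, so the lifetime constraints are satisfiable.
fn no_dangle(s: &str) -> &str {
    first_word(s)
}

fn main() {
    let s = String::from("hello world"); // owned by main, outlives the call
    assert_eq!(no_dangle(&s), "hello");
    println!("{}", no_dangle(&s));
}
```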

4. Operational Semantics and the MiniRust Model

4.1. Small-Step Operational Semantics

Type systems give a static account of what programmes are: they classify expressions into types and enforce structural invariants. Operational semantics gives a dynamic account of what programmes do: it describes the step-by-step execution of a programme on an abstract machine. The standard tool for this is a small-step reduction relation ⟶ on configurations, where each step models one primitive computation. A programme's execution is a sequence of configurations linked by reduction steps, terminating in a final value or diverging.

Definition (MiniRust configuration).

A MiniRust configuration is a triple (H, S, e) where H is the heap (a partial function from addresses to typed values), S is the call stack (a list of frames, each a partial function from variable names to addresses), and e is the expression currently being evaluated. MiniRust is the formal operational model for Rust developed by Ralf Jung,[17] designed to be the authoritative reference for what Rust programmes mean.

Several representative reduction rules illustrate how ownership manifests operationally. For variable lookup, the rule is (H, S, x) ⟶ (H, S, H[S.top(x)]): x is resolved by looking up its address in the top stack frame and retrieving the value at that address from the heap. For a move operation, moving the value of variable x into variable y removes the entry for x from the current frame and creates a new entry for y. The heap cell at x's address is marked invalid (or the entry is simply removed from the frame's domain), so any subsequent lookup of x would be an error. This is the operational realisation of the affine discipline: the move is a one-way transfer, and the source is invalidated. For drop, when a variable goes out of scope, the value's destructor (Drop::drop) is called, the memory at the variable's address is freed, and the frame entry is removed. This is deterministic and immediate: there is no deferred reclamation.
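These rules can be caricatured in a few lines of Rust itself. The Machine struct below is a toy sketch of ours, not MiniRust, but it exhibits the same move-invalidation behaviour: the frame maps names to addresses, the heap maps addresses to values, and a move deletes the source binding so later lookups fail.

```rust
use std::collections::HashMap;

struct Machine {
    heap: HashMap<usize, String>,        // H: address -> value
    frame: HashMap<&'static str, usize>, // S.top: name -> address
}

impl Machine {
    // Variable lookup: resolve the name to an address, then read the heap.
    fn lookup(&self, x: &str) -> Option<&String> {
        self.frame.get(x).and_then(|addr| self.heap.get(addr))
    }
    // Move x into y: transfer the address and invalidate the source name.
    fn mov(&mut self, x: &'static str, y: &'static str) {
        let addr = self.frame.remove(x).expect("use of moved value");
        self.frame.insert(y, addr);
    }
}

fn main() {
    let mut m = Machine { heap: HashMap::new(), frame: HashMap::new() };
    m.heap.insert(0, String::from("hello"));
    m.frame.insert("x", 0);
    m.mov("x", "y");
    assert!(m.lookup("x").is_none());        // the source is invalidated
    assert_eq!(m.lookup("y").unwrap(), "hello");
    println!("move transfers ownership; the source name is gone");
}
```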

Theorem (Type safety (Progress and Preservation)).

For any well-typed λRust configuration (H, S, e), either e is a value (progress: evaluation has reached a final result), or there exists a configuration (H′, S′, e′) such that (H, S, e) ⟶ (H′, S′, e′) and (H′, S′, e′) is well-typed (preservation: the reduction step produces a configuration that satisfies the type invariants). This result is formally proved in the RustBelt framework by Jung et al. (POPL 2018)[7] using an Iris-based separation logic in Coq, which provides machine-checked proofs of progress and preservation for a substantial fragment of safe Rust. MiniRust[17] is a more recent and separate project designed to serve as a complete operational specification; the RustBelt proof is for the λRust core calculus.

The formal model of Rust's operational semantics serves as the specification against which compiler optimisations are judged: an optimisation is valid if and only if it preserves the observable behaviour of every programme under the abstract machine's semantics. Without a formal model, the boundary between defined and undefined behaviour is informal and contested.

4.2. Stacked Borrows: the Aliasing Discipline

The borrow checker provides a static aliasing discipline, enforced at compile time. But the question of what aliasing patterns are permissible also has an operational answer: given that the programmer can write unsafe code that bypasses the borrow checker, what invariant must hold at runtime for the code to be considered well-defined? Stacked Borrows, developed by Jung et al. (POPL 2020),[8] provides this operational model. It defines, for each memory location, a dynamic invariant that tracks the current aliasing state, and it is implemented in the Miri interpreter, which can detect violations at runtime on concrete executions.

Definition (Stacked Borrows).

Each memory location maintains a borrow stack: an ordered list of items, each being either a Unique tag (representing a live mutable reference to that location) or a SharedReadOnly tag (representing a live shared reference). When a mutable reference is created, a new Unique tag is pushed onto the stack. When that reference is accessed, all items above it are invalidated (popped), because the access asserts exclusive ownership of the location at that point. When a shared reference is created, a SharedReadOnly tag is pushed above the existing Unique tag. Any subsequent write access through the Unique tag would attempt to pop the SharedReadOnly items above it; the model treats this as undefined behaviour, because the shared reference was derived from the mutable one and must remain valid.

This model formalises the intuition behind the borrow checker's rules. Two simultaneous mutable borrows are unsound because the second mutable borrow's Unique tag sits above the first's on the stack. Accessing the first borrow after the second exists would pop the second's tag, invalidating it. In Miri, this is reported as undefined behaviour. The operational model makes the violation detectable on concrete executions, complementing the static compile-time check.
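A concrete programme exhibiting the discipline. The UB line is commented out so the example compiles and runs; running the uncommented version under Miri (cargo miri run) would report the tag violation:

```rust
fn main() {
    let mut x = 0i32;
    let p = &mut x as *mut i32; // a Unique tag for `p` is pushed
    let r = &mut x;             // re-borrowing through `x` pops `p`'s tag
    *r = 1;                     // exclusive access through the live tag: fine
    // unsafe { *p = 2; }       // accepted by the compiler, but UB under
                                // Stacked Borrows: `p`'s tag is no longer
                                // on the stack, and Miri flags the access
    let _ = p;
    assert_eq!(x, 1);
    println!("x = {x}");
}
```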

The comparison with C++ is instructive. C++ has a strict aliasing rule: objects of distinct types may not alias the same memory (with explicit exceptions such as char and std::byte). Violations are undefined behaviour, but the rule is enforced only by the abstract machine model and compilers are free to assume it holds when optimising. There is no runtime enforcement mechanism analogous to Stacked Borrows, and no standard tool equivalent to Miri for detecting violations dynamically. Stacked Borrows is therefore a more precise and operationally checkable aliasing discipline than C++'s strict aliasing rule.

4.3. Undefined Behaviour as a Semantic Gap

Undefined behaviour is the most consequential concept in the semantics of systems languages, and it is routinely misunderstood. It does not mean "the programme crashes" or "an error is reported": a crash is only one of many possible outcomes, and none is guaranteed. The precise meaning is an execution to which the abstract machine's semantics assigns no defined meaning.

Definition (Undefined behaviour).

Undefined behaviour (UB) is any execution that falls outside the domain of the abstract machine's semantics. The compiler is permitted to assume that UB never occurs in a well-formed programme. Consequently, any code path that would lead to UB may be pruned, reordered, or transformed in ways that are locally sound under the no-UB assumption but globally surprising. UB produces no trap, no signal, and no diagnostic. The programme's execution simply has no defined meaning from the point of UB onwards.

Rust's UB surface in safe code is minimal. The Rust Reference[2] enumerates the sources: data races (concurrent unsynchronised writes to the same location), dereferencing a null or dangling raw pointer, violating the aliasing rules with raw pointers (the Stacked Borrows invariant, detectable by Miri), producing an invalid value of a type via transmute or pointer casts (for instance, a bool with a bit pattern other than 0 or 1), and calling a function through a pointer with a mismatched ABI or signature. In safe code, the type system prevents all of these: raw pointers cannot be dereferenced without unsafe, data races are prevented by the Send and Sync traits, and transmute is an unsafe function.

The contrast with C++ is stark. C++'s UB catalogue is substantially larger: signed integer overflow, left-shifting into the sign bit, reading from uninitialised memory (any variable not explicitly initialised), violations of the strict aliasing rule, null pointer dereference, out-of-bounds pointer arithmetic (not merely dereference), lifetime extension violations through incorrect use of references, and many corner cases in the object model. In practice, a large fraction of C++ programmes contain latent UB that is masked by current compiler behaviour but may be exposed by a future optimisation. Rust restricts UB to unsafe blocks. (Integer overflow, sometimes cited as an exception, is not UB in Rust at all: debug builds panic, and release builds wrap with defined two's-complement semantics.) A programme with no unsafe blocks has a UB surface that is essentially empty for the correctness invariants that matter most in practice.

4.4. RustCell: Two Simultaneous Mutable Borrows (Borrow-Checker Rejection)

The borrow checker's enforcement of exclusive mutable access is a theorem of the type system: no well-typed safe Rust programme can have two simultaneously live mutable references aliasing the same location. The following programme is the minimal demonstration of this constraint being violated and rejected at compile time.

fn main() {
  let mut v = vec![1, 2, 3];
  let r1 = &mut v;
  let r2 = &mut v;  // error[E0499]: cannot borrow `v` as mutable more than once
  r1.push(4);
  r2.push(5);
}

Borrow-checker rejection: two simultaneous mutable borrows of v

The compiler's error is precise: error[E0499]: cannot borrow v as mutable more than once at a time. The NLL analysis determines that r1 is live at the point where r2 is created (because r1 is used on the line after r2's creation), so the regions of the two borrows overlap, violating the constraint that no two mutable borrows alias simultaneously. In the Stacked Borrows model, creating r2 would push a new Unique tag above r1's tag, and subsequently accessing r1 would invalidate r2's tag, producing undefined behaviour. The borrow checker catches this statically, before any code runs, with no runtime overhead.

5. Traits, Generics, and Type-Class Semantics

5.1. Traits as Bounded Universal Quantification

A trait bound T: Trait expresses bounded universal quantification: for all types T satisfying the Trait interface. The bound is a formal statement about the set of types that may be substituted at a call site, and the compiler enforces it by requiring a proof of satisfaction, an impl block, before accepting the substitution.

Definition (Trait).

A trait defines an interface: a set of associated types and method signatures. Formally, a trait T\mathcal{T} is a record of typed declarations {m1:σ1,,mk:σk}\{ m_1 : \sigma_1, \ldots, m_k : \sigma_k \} where each σi\sigma_i is a type scheme that may mention Self (the implementing type) and the associated types of T\mathcal{T}.[9]

Definition (Trait bound and impl witness).

A trait bound T: Trait is a bounded universal quantification over all types T that possess an implementation of Trait. An impl Trait for Type block is a witness: it provides concrete definitions for every method and associated type declared by Trait, and the compiler accepts it as proof that Type satisfies the bound. A generic function fn f<T: Trait>(...) is a function parametrised over all types T for which such a witness exists.

Coherence is the property that at most one impl exists per (trait, type) pair in a compiled programme. This prevents the situation where two different crates provide conflicting implementations, which would make trait dispatch ambiguous. The orphan rule enforces coherence at the crate boundary: an impl Trait for Type is permitted only if at least one of Trait or Type is defined in the current crate. A crate that defines neither may not provide an impl between them. This rule ensures that the set of impls for any (trait, type) pair is determined by a unique crate, so no two crates can conflict.

5.2. Monomorphisation vs. Dynamic Dispatch

Generic functions over trait bounds are compiled by monomorphisation: the compiler emits one specialised copy of the function for each concrete type that appears at a call site. The result is machine code identical to hand-written specialised code. There is no abstraction penalty at runtime.

Remark (Vtable layout for dyn Trait).

A dyn Trait value is a fat pointer: a pair of a data pointer (pointing to the value) and a vtable pointer (pointing to a statically allocated table of function pointers). The vtable contains one function pointer per method declared in Trait, arranged in a fixed, deterministic order. Calling a method through dyn Trait dereferences the vtable pointer and calls the appropriate function pointer. This is the same layout used by C++ virtual dispatch: a pointer to a virtual function table. The overhead is one pointer dereference per call, a potential branch misprediction on the indirect call, and an instruction-cache miss if the callee is cold; the indirection also prevents inlining.

Rust makes the dispatch choice explicit in the type. A bound T: Trait or an argument-position impl Trait in a function signature produces static dispatch via monomorphisation. A dyn Trait type produces dynamic dispatch via a fat pointer and vtable. C++ has the same distinction (templates for static, virtual for dynamic) but does not encode it in the type: the programmer must remember to mark methods as virtual and callers have no syntactic indication of which regime applies. Go interfaces are always dynamically dispatched: every interface value is a fat pointer, and there is no way to request static dispatch. Rust's encoding makes the cost model explicit at the type level.

5.3. Higher-Kinded Patterns via GATs

Generic Associated Types (GATs, stabilised in Rust 1.65) allow type constructors as associated items in a trait. This enables higher-kinded patterns without requiring first-class higher-kinded types (HKT) in the language. The canonical example is a lending iterator, which yields items whose lifetimes are tied to the borrowing of the iterator itself:

trait LendingIterator {
    type Item<'a> where Self: 'a;
    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

Here, Item is not a concrete type but a type constructor parametrised by a lifetime 'a. Without GATs, this pattern cannot be expressed: there is no way to make the associated type depend on a lifetime introduced at the method level. With GATs, the trait can express that the item borrows from the iterator, enabling zero-copy iteration over data structures without heap allocation.

Rust does not have first-class HKT. The kind * -> * required to abstract over a type constructor F uniformly is not representable in the trait system. GATs are a targeted workaround: they solve the specific patterns that arise most often in practice (lifetime-parametrised associated types, size-parametrised associated types) without introducing the full complexity of a higher-kinded type system.

5.4. Comparison: Haskell Type Classes and Go Interfaces

Three systems address ad-hoc polymorphism with substantially different trade-offs.

Haskell type classes are nominal: a type must be declared an instance of a class before class methods can be used with it. Instance selection happens at compile time, but the implementation is dictionary-passing style: the compiler selects and threads the appropriate dictionary implicitly, and each method call dispatches through the dictionary at runtime unless specialisation removes the indirection. Haskell has first-class HKT: Functor f and Monad m range over type constructors of kind * -> *. Rust cannot express the Functor abstraction directly because it lacks the kind * -> * in its trait system. Haskell's coherence is weaker: orphan instances, defined in a module that owns neither the class nor the type, are permitted with a warning, and silent incoherence can arise when two libraries define conflicting orphan instances that a third library combines.

Go interfaces use structural typing. A type implicitly satisfies an interface if it provides the required methods, with no explicit declaration. There is no impl block and no orphan rule. Every interface value is always dynamically dispatched via a fat pointer: there is no opt-in to static dispatch. This is simpler to use but forces all interface abstraction to pay the vtable cost.

Rust traits are nominal (explicit impl required), static by default (monomorphised), and dynamic on explicit opt-in (dyn Trait). The coherence and orphan rules prevent conflicting impls from reaching a linked programme: any conflict is a compile-time error in the crate that introduces it. The trade-off is that Rust detects conflicting implementations at compile time, Haskell defers detection to the point where conflicting orphan instances come into scope together and otherwise relies on convention, and Go avoids the problem by using structural typing with no global coherence to maintain.

5.5. RustCell: Static vs. Dynamic Dispatch

The following programme uses the same Display trait via two paths: a generic function that is monomorphised at each call site, and a function that accepts a dyn Display fat pointer. Both print the same output. The compiler emits two specialised copies of print_static (one for i32, one for &str) and one copy of print_dynamic that dispatches via the vtable at runtime.

use std::fmt::Display;

// Static dispatch: monomorphised, zero-cost
fn print_static<T: Display>(val: T) {
  println!("static:  {val}");
}

// Dynamic dispatch: fat pointer + vtable
fn print_dynamic(val: &dyn Display) {
  println!("dynamic: {val}");
}

fn main() {
  print_static(42_i32);
  print_static("hello");
  print_dynamic(&42_i32);
  print_dynamic(&"hello");
}

Static dispatch (monomorphised) vs. dynamic dispatch (vtable pointer): same interface, different cost

6. Categorical Structures

Sections 2 and 3 established that Rust's type system is a substructural logic and that lifetimes are region variables in a constraint system. Category theory is the language in which these structures are stated at their most precise, and in which their connections to each other become derivable rather than asserted. The objects of the type category are types; the morphisms are functions; composition is function composition; the identity morphism at each type is the identity function. This is not a metaphor. Every design decision in Rust's standard library that governs how types compose has an exact categorical explanation: why match must cover all constructors, why fold is the canonical operation on recursive types, why Iterator::zip and futures::future::join have fundamentally different semantics from and_then, and why the Copy bound exists as a separate trait rather than a compiler default. The subsections below develop these correspondences in order: initial algebras for ADTs (§6.1), terminal coalgebras for iterators (§6.2), natural transformations and parametricity for generic APIs (§6.3), the Kleisli category for the ? operator and async/await (§6.4), and symmetric monoidal categories for the ownership discipline (§6.5).

6.1. ADTs as Initial Algebras

Algebraic data types have a precise categorical interpretation as initial algebras of polynomial functors. A polynomial functor is built from constants, the identity, products, and coproducts, and its initial algebra is, up to isomorphism, the least fixed point: the smallest type closed under the functor's constructors from which a unique structure-preserving map (a fold) exists to any other algebra of the same functor. The relevance to Rust is immediate: the compiler's requirement that match be exhaustive, and that every ADT have a finite case analysis, is exactly the statement that the programmer is constructing the unique catamorphism out of an initial algebra.

Definition (Initial algebra of a functor).

Let F:SetSetF : \mathbf{Set} \to \mathbf{Set} be a functor. An FF-algebra is a pair (A,α)(A, \alpha) where AA is a set and α:F(A)A\alpha : F(A) \to A is the structure map. The initial FF-algebra is the algebra (μF,in)(\mu F, \mathsf{in}) such that for every FF-algebra (A,α)(A, \alpha) there exists a unique algebra homomorphism (catamorphism) α:μFA\llbracket \alpha \rrbracket : \mu F \to A.[9]

Under this framing, Option<T> is the initial algebra of the functor F(X)=1+TF(X) = 1 + T, the coproduct of the unit type and a fixed type TT. Its least fixed point is exactly None | Some(T): a value is either nothing (the 11 summand) or a wrapped value (the TT summand). Similarly, Result<T, E> is the initial algebra of G(X)=T+EG(X) = T + E, the coproduct of TT and EE (both Option<T> and Result<T, E> are non-recursive types, so their least fixed point is the type itself; the initial-algebra framework is most illuminating for recursive types such as List<T>, where the fixed-point computation is non-trivial). Structs are products of their field types (record types, the categorical product). Enums are coproducts of their variant types (sum types, the categorical coproduct).

Remark (Pattern matching as catamorphism).

Pattern matching on an ADT is the unique fold (catamorphism) out of its initial algebra. A match expression on Option<T> that provides a branch for None and a branch for Some(v) is exactly the unique FF-algebra homomorphism from μF\mu F to the result type. The existence and uniqueness of this homomorphism is the universal property of the initial algebra. This correspondence makes exhaustive pattern matching not merely a convenience but a categorical necessity: to consume an initial algebra, one must provide exactly one handler per constructor.

Theorem (Lambek's lemma).

The structure map in:F(μF)μF\mathsf{in} : F(\mu F) \to \mu F of the initial FF-algebra is an isomorphism. Its inverse out:μFF(μF)\mathsf{out} : \mu F \to F(\mu F) is the unique catamorphism from (μF,in)(\mu F, \mathsf{in}) to the algebra (F(μF),F(in))(F(\mu F), F(\mathsf{in})).

Proof. Step 1 (the homomorphism condition for out\mathsf{out}). The pair (F(μF),F(in))(F(\mu F), F(\mathsf{in})) is an FF-algebra with structure map F(in):F(F(μF))F(μF)F(\mathsf{in}) : F(F(\mu F)) \to F(\mu F). By initiality of (μF,in)(\mu F, \mathsf{in}), there exists a unique FF-algebra homomorphism out:(μF,in)(F(μF),F(in))\mathsf{out} : (\mu F, \mathsf{in}) \to (F(\mu F), F(\mathsf{in})), satisfying the homomorphism condition

outin  =  F(in)F(out)  =  F(inout),\mathsf{out} \circ \mathsf{in} \;=\; F(\mathsf{in}) \circ F(\mathsf{out}) \;=\; F(\mathsf{in} \circ \mathsf{out}),

where the second equality is functoriality of FF.

Step 2 (inout=idμF\mathsf{in} \circ \mathsf{out} = \mathsf{id}_{\mu F}). An FF-algebra endomorphism h:(μF,in)(μF,in)h : (\mu F, \mathsf{in}) \to (\mu F, \mathsf{in}) must satisfy hin=inF(h)h \circ \mathsf{in} = \mathsf{in} \circ F(h). For h=inouth = \mathsf{in} \circ \mathsf{out}:

(inout)in  =  in(outin)  =  inF(inout)(\mathsf{in} \circ \mathsf{out}) \circ \mathsf{in} \;=\; \mathsf{in} \circ (\mathsf{out} \circ \mathsf{in}) \;=\; \mathsf{in} \circ F(\mathsf{in} \circ \mathsf{out})

where the last equality uses the homomorphism condition for out\mathsf{out}. So inout\mathsf{in} \circ \mathsf{out} is an FF-algebra endomorphism. Since idμF\mathsf{id}_{\mu F} is also one, initiality (which asserts a unique homomorphism from (μF,in)(\mu F, \mathsf{in}) to any algebra) forces inout=idμF\mathsf{in} \circ \mathsf{out} = \mathsf{id}_{\mu F}.

Step 3 (outin=idF(μF)\mathsf{out} \circ \mathsf{in} = \mathsf{id}_{F(\mu F)}). Applying the homomorphism condition and the result of Step 2:

outin  =  F(inout)  =  F(idμF)  =  idF(μF).\mathsf{out} \circ \mathsf{in} \;=\; F(\mathsf{in} \circ \mathsf{out}) \;=\; F(\mathsf{id}_{\mu F}) \;=\; \mathsf{id}_{F(\mu F)}.

Both compositions are identities, so in\mathsf{in} is an isomorphism with two-sided inverse out\mathsf{out}.

For Rust types, Lambek's lemma is the statement that every recursive type can be unrolled one step without loss of information. The concrete list functor F(X)=1+Z×XF(X) = 1 + \mathbb{Z} \times X makes this precise: its initial algebra has structure map in\mathsf{in} sending the left summand to Nil and the right summand (n,xs)(n, xs) to Cons(n, xs), while out\mathsf{out} is the inverse operation that peels off the outermost constructor. One checks directly: in(out(Nil))=Nil\mathsf{in}(\mathsf{out}(\mathtt{Nil})) = \mathtt{Nil}, and in(out(Cons(n,xs)))=Cons(n,xs)\mathsf{in}(\mathsf{out}(\mathtt{Cons}(n, xs))) = \mathtt{Cons}(n, xs) for any n,xsn, xs, confirming inout=idμF\mathsf{in} \circ \mathsf{out} = \mathsf{id}_{\mu F}; the reverse, outin=idF(μF)\mathsf{out} \circ \mathsf{in} = \mathsf{id}_{F(\mu F)}, follows by the homomorphism condition and the Step 2 uniqueness argument. The isomorphism μFF(μF)\mu F \cong F(\mu F) for F(X)=1+T×XF(X) = 1 + T \times X is witnessed by the fold and unfold of a linked list: every list is either empty or a head-tail pair, and this decomposition is lossless and invertible. For Option<T> (a non-recursive fixed point where μFF(μF)=1+T\mu F \cong F(\mu F) = 1 + T holds by type isomorphism), the result is immediate. For the recursive case, Lambek's lemma is the categorical reason why structural recursion terminates: each recursive call operates on a strictly smaller algebra, by applying out\mathsf{out} once to peel off the outermost constructor. The dual statement is that the observation map out:νFF(νF)\mathsf{out} : \nu F \to F(\nu F) of the terminal FF-coalgebra is also an isomorphism. The stream type from §6.2 satisfies this, and it is why coinductive proofs over streams are permitted: each proof step applies out\mathsf{out} once to observe one element, and the process continues productively.

6.2. Iterator as a Coalgebra

Where ADTs are initial algebras (inductive, built up from constructors), iterators are coalgebras (coinductive, torn apart by destructors). A coalgebra of a functor FF is a pair (S,c)(S, c) where SS is the state space and c:SF(S)c : S \to F(S) is the observation map.

Definition (Stream coalgebra and the Iterator trait).

The Rust Iterator trait is a coalgebra of the functor F(X)=Option(Item,X)F(X) = \mathtt{Option}\langle(\mathtt{Item}, X)\rangle. The state type is Self. The observation map is fn next(&mut self) -> Option<Self::Item>, which takes the current state and either produces a pair of an item and the next state (wrapped in Some) or signals exhaustion (returns None). This is the stream coalgebra: a potentially infinite sequence torn down one element at a time.[11]

Haskell lazy lists are also a coalgebra of the same functor, but they are memoised via thunks: each unevaluated tail is a heap-allocated closure. This imposes one heap allocation per unconsumed element in a lazy spine. Rust iterators are zero-cost: the compiler represents the iterator state in registers or on the stack, and the adapter chain is compiled to a single tight loop with no allocation. The coalgebraic structure is the same; the operational cost is not.

6.3. Natural Transformations and Parametricity

A natural transformation η:FG\eta : F \Rightarrow G between functors F,G:CDF, G : \mathbf{C} \to \mathbf{D} is a family of morphisms ηA:F(A)G(A)\eta_A : F(A) \to G(A), one for each object AA in C\mathbf{C}, satisfying the naturality condition: for every morphism f:ABf : A \to B in C\mathbf{C},

G(f)ηA=ηBF(f).G(f) \circ \eta_A = \eta_B \circ F(f).

In Rust, any fully parametric function of type T.FTGT\forall T.\, F\langle T\rangle \to G\langle T\rangle for functor-shaped generic types is a natural transformation, by Reynolds's parametricity theorem.[10] The naturality condition is not a proof obligation: it is a free theorem. Because generic functions are erased to concrete code per instantiation, and have no runtime mechanism to inspect which TT is in use, they cannot behave differently for different choices of TT. Any behaviour distinguishing two instantiations would require either a runtime type tag (impossible in safe Rust without explicit trait machinery) or casting through unsafe.

Theorem (Parametricity implies naturality).

Let FF and GG be functor-shaped generic types with map implementations satisfying the functor laws. Any function η\eta of type T.FTGT\forall T.\, F\langle T\rangle \to G\langle T\rangle that does not use unsafe or runtime type inspection satisfies the naturality square for all functions f:ABf : A \to B.

Proof. Fix f:ABf : A \to B. Define the graph relation RfA×BR_f \subseteq A \times B by (a,b)Rf    b=f(a)(a, b) \in R_f \iff b = f(a). Reynolds's abstraction theorem assigns a relation F^(Rf)FA×FB\widehat{F}(R_f) \subseteq F\langle A\rangle \times F\langle B\rangle to each type constructor FF by structural induction on the type. For a covariant functor, the relational lifting is

(x,y)F^(Rf)        y=F(f)(x).(x, y) \in \widehat{F}(R_f) \;\iff\; y = F(f)(x).

The abstraction theorem states that any parametric term of type T.FTGT\forall T.\, F\langle T\rangle \to G\langle T\rangle preserves these relations: if (x,y)F^(Rf)(x, y) \in \widehat{F}(R_f) then (ηA(x),ηB(y))G^(Rf)(\eta_A(x), \eta_B(y)) \in \widehat{G}(R_f).

Instantiate with xFAx \in F\langle A\rangle and y=F(f)(x)y = F(f)(x), which satisfies (x,y)F^(Rf)(x, y) \in \widehat{F}(R_f) by definition. Relation preservation gives (ηA(x),ηB(F(f)(x)))G^(Rf)(\eta_A(x),\, \eta_B(F(f)(x))) \in \widehat{G}(R_f), which unpacks to ηB(F(f)(x))=G(f)(ηA(x))\eta_B(F(f)(x)) = G(f)(\eta_A(x)). Since xFAx \in F\langle A\rangle was arbitrary, this is the naturality condition at every point of FAF\langle A\rangle.

The exclusion of unsafe and runtime type inspection is necessary: std::any::TypeId::of::<T>() observes the identity of TT at runtime and invalidates the relational interpretation, as a term branching on TT can map FAF\langle A\rangle to GAG\langle A\rangle and FBF\langle B\rangle to GBG\langle B\rangle via different code paths, breaking relation preservation. In safe Rust without specialisation, no such observation is possible, so the naturality square holds for all terms of the stated type.

The parametric endomorphisms of Option<T> in safe Rust are exactly two: the identity (pass through every value unchanged) and the constant-None map (discard any value). These are also the only candidates up to naturality: the identity satisfies the naturality square trivially, and const None satisfies it because both sides of the square evaluate to None regardless of ff. No third parametric endomorphism exists, because producing a Some from a None would require conjuring a value of an unknown type TT, which safe Rust cannot do. This exhaustive case analysis, rather than just the abstract proof, is why parametricity gives a complete classification of the natural transformations between common generic types.

The functor laws for Option::map are: identity, map(id)=id\mathtt{map}(\mathsf{id}) = \mathsf{id}, and composition, map(gf)=map(g)map(f)\mathtt{map}(g \circ f) = \mathtt{map}(g) \circ \mathtt{map}(f). These are semantic invariants: the compiler does not verify them, and a map implementation violating either will produce surprising behaviour when composed with generic adapters that depend on them.

The monad unit ηT:TOptionT\eta_T : T \to \mathtt{Option}\langle T\rangle (defined as Some) and the multiplication μT:OptionOptionTOptionT\mu_T : \mathtt{Option}\langle\mathtt{Option}\langle T\rangle\rangle \to \mathtt{Option}\langle T\rangle (defined as flatten, i.e., opt.and_then(|x| x)) are both natural transformations. One verifies μTOption(Option(f))=Option(f)μS\mu_T \circ \mathtt{Option}(\mathtt{Option}(f)) = \mathtt{Option}(f) \circ \mu_S directly from the definition: flatten does not inspect the value of TT, so the order in which map and flatten are applied is irrelevant. Together, (Option,η,μ)(\mathtt{Option}, \eta, \mu) is a monad in the categorical sense: a monoid in the monoidal category of endofunctors on Rust\mathbf{Rust}, with multiplication given by the natural transformation μ\mu and unit by η\eta.

6.4. The Kleisli Category

Given a monad (M,η,μ)(M, \eta, \mu) on a category C\mathbf{C}, the Kleisli category CM\mathbf{C}_M has the same objects as C\mathbf{C} and morphisms AMBA \to_M B defined as maps AM(B)A \to M(B) in C\mathbf{C}. The identity morphism at AA is the unit ηA:AM(A)\eta_A : A \to M(A), and Kleisli composition of f:AMBf : A \to_M B with g:BMCg : B \to_M C is

(gMf)(a)=μC(M(g)(f(a)))=f(a)>>=g.(g \circ_M f)(a) = \mu_C(M(g)(f(a))) = f(a) \mathbin{\texttt{>>=}} g.
Definition (Kleisli composition and the monad laws).

The monad laws are exactly the category axioms for CM\mathbf{C}_M: left identity (ηMf=f\eta \circ_M f = f), right identity (fMη=ff \circ_M \eta = f), and associativity ((hMg)Mf=hM(gMf)(h \circ_M g) \circ_M f = h \circ_M (g \circ_M f)). In Rust, these read: Ok(a).and_then(f) == f(a) (left identity), f(a).and_then(Ok) == f(a) (right identity), and (f(a).and_then(g)).and_then(h) == f(a).and_then(|b| g(b).and_then(h)) (associativity).

For Result,E\mathtt{Result}\langle -, E\rangle, a Kleisli morphism AResultBA \to_{\mathtt{Result}} B is a fallible computation AResultB,EA \to \mathtt{Result}\langle B, E\rangle. The ? operator chains such morphisms via Kleisli composition: a sequence of ?-terminated expressions is an unambiguous morphism in CResult\mathbf{C}_{\mathtt{Result}}, and the associativity law guarantees that the chain's result is independent of how it is parenthesised. The ergonomic justification for ? is therefore not merely "less boilerplate" but a statement that the programmer is working in a well-defined category whose composition law exactly captures the semantics they want.

The async case admits an analogous interpretation. A morphism AasyncBA \to_{\mathtt{async}} B is a function returning impl Future<Output = B>. Kleisli composition is sequenced .await: the state machine generated by the compiler (§7.1) is the concrete realisation of this composition at the level of the abstract machine, where each await point is a Kleisli composition step with the implicit monad being the executor-driven Future monad (Tokio is one such executor). Because Rust lacks higher-kinded types (§5.3), this monad cannot be named at the type level, but its operational structure (identity, sequencing, associativity) is fully present.

6.5. Symmetric Monoidal Categories and Linear Types

The structural rules of substructural logic (§2.1) correspond precisely to properties of the monoidal structure on the type category. A symmetric monoidal category (C,,I)(\mathbf{C}, \otimes, I) equips a category with a bifunctor :C×CC\otimes : \mathbf{C} \times \mathbf{C} \to \mathbf{C}, a unit object II, coherence isomorphisms αA,B,C:(AB)C    A(BC)\alpha_{A,B,C} : (A \otimes B) \otimes C \xrightarrow{\;\sim\;} A \otimes (B \otimes C), λA:IA    A\lambda_A : I \otimes A \xrightarrow{\;\sim\;} A, ρA:AI    A\rho_A : A \otimes I \xrightarrow{\;\sim\;} A, and a symmetry σA,B:AB    BA\sigma_{A,B} : A \otimes B \xrightarrow{\;\sim\;} B \otimes A, satisfying Mac Lane's pentagon and triangle coherence conditions.

Definition (Rust's affine monoidal type category).

The affine fragment of Rust's type system forms a symmetric monoidal category in which objects are types, morphisms are move-consuming functions, the tensor product ABA \otimes B is the pair type (A,B)(A, B), the unit II is ()(), and the symmetry is the swap isomorphism (a,b)(b,a)(a, b) \mapsto (b, a). The affine discipline corresponds to the presence of discard maps δA:AI\delta_A : A \to I (the drop function) for every type, but the absence of diagonal maps ΔA:AAA\Delta_A : A \to A \otimes A for non-Copy types.

The dichotomy between Copy and non-Copy types is a categorical statement. The Copy trait identifies a full sub-category RustCopyRust\mathbf{Rust}_{\mathtt{Copy}} \hookrightarrow \mathbf{Rust} in which the diagonal map ΔA:AAA\Delta_A : A \to A \otimes A exists for every object, making the monoidal structure cartesian on this sub-category. A cartesian monoidal category is one in which the tensor product is the categorical product with projection maps; this is precisely the situation for primitive scalars and types composed entirely of them. Outside RustCopy\mathbf{Rust}_{\mathtt{Copy}}, for types owning heap resources, file descriptors, or mutex guards, no diagonal exists. The borrow checker enforces this at compile time: the morphism ΔA\Delta_A simply does not typecheck.

A lax monoidal functor (F,ϕ,ε)(F, \phi, \varepsilon) from (C,C,IC)(\mathbf{C}, \otimes_C, I_C) to (D,D,ID)(\mathbf{D}, \otimes_D, I_D) consists of a functor FF equipped with a natural transformation ϕA,B:F(A)DF(B)F(ACB)\phi_{A,B} : F(A) \otimes_D F(B) \to F(A \otimes_C B) and a unit map ε:IDF(IC)\varepsilon : I_D \to F(I_C), subject to associativity and unit coherence conditions. This is the categorical signature of the Applicative type class: an applicative functor is precisely a lax monoidal functor. In Rust, Iterator::zip witnesses the lax monoidal structure of the iterator functor: the transformation ϕ\phi combines two independent iterators into an iterator of pairs without imposing a causal ordering. futures::future::join is the analogous structure for the Future functor: two independent futures may execute concurrently, and their results are combined once both resolve. The categorical distinction between zip/join (lax monoidal, no sequencing) and and_then/.await (Kleisli, sequential) is the precise formulation of the concurrency/sequencing boundary that async Rust makes syntactically explicit.

6.6. Monad-Like Patterns Without HKT

Rust cannot abstract over a generic Monad m constraint because the trait system lacks the kind * -> * required to write trait Monad: for<T> Functor<T> in a type-safe way. Instead, each concrete type (Option, Result, Future) implements its own and_then (bind) method independently. The monadic structure is present and fully usable: it is simply not abstractable over an arbitrary Monad parameter.

The ? operator is syntactic sugar for the monadic bind pattern applied to fallible types. For Result<T, E> in a function returning Result<U, E>, the desugaring is:

expr?

becomes:

match expr {
    Ok(v)  => v,
    Err(e) => return Err(e.into()),
}

For Option<T>, the analogous desugaring returns None on the absent case. The generalisation to arbitrary types is provided by the Try trait (currently unstable), which abstracts the short-circuiting behaviour without requiring a full Monad abstraction.

Remark (? as specialised monadic bind).

The ? operator is the same compositional intent as Haskell's >>= (bind), specialised to Result and Option. In Haskell, do { v <- expr; rest } desugars to expr >>= \v -> rest for any Monad instance. Rust's ? achieves the same effect for the two most important monadic types in systems programming, with no runtime overhead and no heap allocation, but without the ability to abstract over the monad parameter.

6.7. Comparison: Haskell do-notation and Monadic IO

Haskell's do-notation desugars to >>= (bind) and >> (sequence) for any type with a Monad instance. IO a is the monad for effectful computation: a value of type IO a is an action that, when executed by the runtime, produces a value of type a. Any type with a Monad instance can use do-notation, including user-defined types: parsers, state machines, error accumulators, probability distributions. The abstraction is truly general.

Rust's ? is specialised to Result, Option, and types implementing the Try trait. It cannot be used with an arbitrary user-defined monad. The trade-off is directional. Haskell's approach is more general: a single abstraction (Monad) covers all compositional fallible or effectful computations. Rust's approach is more explicit: each type defines its own and_then, and the compiler emits the same machine code as a hand-written sequence of match expressions and early returns, with no allocation. Both achieve the goal of composing fallible computations without deeply nested match expressions. The difference is that Haskell's generalisation enables polymorphic libraries (a parser combinator written generically over any monad), while Rust's specialisation ensures that the cost model is always transparent.

6.8. RustCell: Iterator Adapter Chain

The following programme demonstrates the stream coalgebra in action. Each adapter (filter, map, take, fold) wraps the previous iterator in a new state type. The compiler fuses all adapters into a single loop: no intermediate vectors are allocated, no heap is touched, and the generated code is equivalent to a hand-written for loop with a running accumulator.

fn main() {
  // Coalgebra in action: a chain of zero-cost iterator adapters.
  // The compiler fuses all adapters into a single loop with no allocation.
  let result: i64 = (1_i64..=100)
      .filter(|n| n % 2 == 0)       // keep evens: 2, 4, 6, ...
      .map(|n| n * n)                // square: 4, 16, 36, ...
      .take(10)                      // first 10 squared evens
      .fold(0, |acc, n| acc + n);    // sum

  println!("{result}");  // 1540 (4+16+36+64+100+144+196+256+324+400)
}

Iterator adapter chain (filter, map, take, fold): compiled to a single zero-allocation loop

7. Async Rust and Tokio

7.1. Futures as State Machines

An async fn is desugared by the compiler into an anonymous struct that implements the Future trait. Each .await point inside the function becomes a variant in an internal state-machine enum representing the suspension point where the future yielded.

Definition (The Future trait).

The Future trait is defined as:

pub trait Future {
  type Output;
  fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

Poll<T> has two variants: Poll::Ready(T), returned when the computation is complete, and Poll::Pending, returned when the computation has not yet completed and the future has arranged for the executor to be notified via the Waker stored in cx. A Waker is a handle to a task scheduler: calling waker.wake() instructs the executor to re-poll the future.[13]

The desugaring makes suspension explicit at every step. A function containing two .await points is compiled into an enum with (at least) three variants: the state before the first .await, the state between the two .await points (holding any locals that are live across the first suspension), and the state after the second .await. Each call to poll advances the state machine to the next suspension point or, if the final computation has completed, returns Poll::Ready. No OS thread is blocked between suspension points. The entire state of the computation lives in the generated enum, which is heap-allocated only when the future is boxed or spawned onto an executor, and consumes only as much memory as the live locals at each suspension point require.

7.2. The Waker Protocol

Definition (The Waker protocol).

When Future::poll returns Poll::Pending, the future is contractually required to have stored (cloned) the Waker from the supplied Context<'_>. The future must then arrange for waker.wake() to be called when the awaited resource becomes available. The executor will not re-poll the future until wake() is called. This contract ensures that the executor never spins (busy-waiting on an unready future) and never drops a future that has arranged a wakeup.[13]

In practice, when an underlying I/O source becomes ready (a socket is readable, detected by epoll on Linux or kqueue on macOS), the OS notifies Tokio's reactor, which looks up the Waker registered for that I/O source and calls waker.wake(). This re-queues the waiting task on a worker thread's run queue. No OS thread is blocked during the wait. The cost of suspending and resuming a task is a single heap dereference and a queue push, not a kernel context switch.
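Both halves of the contract can be exercised by hand with nothing but std. The YieldOnce type and the noop_waker helper below are illustrative constructions, not library items: the future returns Pending once, fulfilling the contract by waking itself immediately, and Ready(42) on the second poll.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Illustrative future: Pending on the first poll, Ready(42) on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // safe: YieldOnce is Unpin (no self-references)
        if this.yielded {
            Poll::Ready(42)
        } else {
            this.yielded = true;
            // Contract: before returning Pending, arrange for wake() to be
            // called. Here we wake immediately, so a real executor would
            // re-queue the task at once.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A do-nothing Waker, sufficient for polling by hand without an executor.
fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let mut fut = YieldOnce { yielded: false };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut pinned = Pin::new(&mut fut); // allowed because YieldOnce is Unpin

    assert_eq!(pinned.as_mut().poll(&mut cx), Poll::Pending);
    assert_eq!(pinned.as_mut().poll(&mut cx), Poll::Ready(42));
    println!("second poll: Ready(42)");
}
```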

7.3. Pin and Structural Pinning

A Future state machine may contain self-referential pointers: an internal reference to a local variable that is itself stored in the same state-machine struct. If the struct were moved (copied to a new address), that internal reference would point to the old location and become dangling. Pin<P> addresses this precisely.

Definition (Unpin and the Pin invariant).

Unpin is a marker trait. A type that implements Unpin is safe to move even when it is accessed through a Pin pointer, because it makes no self-referential use of its own address. Most types in Rust implement Unpin automatically. Compiler-generated async state machines do not implement Unpin, because they may contain self-referential fields across .await points.

The Pin<P> invariant: if P is a pointer type whose pointee type T does not implement Unpin, then once a value of type T is placed behind a Pin<P>, it will not be moved for the lifetime of the pin. Concretely, Pin<Box<T>> means "the T at this address will not move until the Box is dropped."[16]

The poll signature takes self: Pin<&mut Self> precisely because the executor must be able to call poll repeatedly on the same future without moving it between calls. A future that is not Unpin can only be polled through a pinned pointer, making it a compile-time error to move the future after it has been polled for the first time.
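The constraint can be exercised directly. The async block below compiles to a !Unpin state machine, so Pin::new(&mut fut) would be rejected; it must be placed behind Box::pin before poll can be invoked. The noop_waker helper is an illustrative stand-in for a real executor's waker, included only to make the snippet self-contained.

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal do-nothing Waker so we can call poll by hand.
fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn poll_to_completion() -> i32 {
    // The async block's state machine is !Unpin; Box::pin moves it to the
    // heap and pins it there, upholding the Pin invariant.
    let mut fut = Box::pin(async { 40 + 2 });

    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    // as_mut() reborrows the pin as Pin<&mut _>, the type poll requires.
    // With no .await points, the first poll runs straight to completion.
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("no suspension points"),
    }
}

fn main() {
    println!("Ready({})", poll_to_completion());
}
```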

7.4. Tokio's Work-Stealing Scheduler

Tokio uses an M:N threading model: M OS threads (one per logical CPU core by default) run N async tasks, where N can be millions. Each worker thread maintains a local run queue of ready tasks. When a thread's local queue is empty, it steals tasks from the back of another thread's queue. This is the same work-stealing algorithm used by the Rayon data-parallelism library and by the Go runtime scheduler. It achieves near-linear scaling with core count and eliminates the need for a global lock on the run queue.[12]

A Tokio task has zero stack overhead. Its state is a single heap-allocated Future struct, the state machine described in section 7.1, consuming only as much memory as the live locals at the current suspension point. There is no per-task stack, no stack-growth check, and no guard page. Spawning a million Tokio tasks requires on the order of megabytes, not gigabytes.

The macro #[tokio::main] expands to an ordinary fn main that builds the multi-threaded Runtime, spawns the worker threads, and calls block_on(async { ... }), blocking the main OS thread until the async entry point returns Poll::Ready.

7.5. Comparison: Go Goroutines and the Go Runtime Scheduler

Go goroutines and Tokio tasks solve the same problem (M:N concurrency on top of OS threads) through two different designs. Three differences are worth examining.

Stack allocation. A goroutine begins with a minimum 2 KB stack that grows dynamically as needed, up to a default limit of 1 GB. Growth is managed by the Go runtime: when a goroutine's stack overflows, the runtime allocates a new, larger stack and copies the goroutine's frames to it. A Tokio task has no stack. Its state is entirely contained in the Future enum on the heap, sized at compile time to hold the live locals at each .await point. Zero stack management, zero copying.

Scheduling discipline. Go has been fully preemptive since Go 1.14: the runtime can suspend a goroutine asynchronously (via a signal delivered at a safe point), even in a tight loop that makes no function calls and performs no I/O. Tokio is cooperative: a task runs until it returns Poll::Pending. A task that computes intensively without ever awaiting blocks its worker thread for the duration of the computation, starving other tasks on that thread. The implication is that CPU-bound work in Tokio must be offloaded to a dedicated blocking thread pool (tokio::task::spawn_blocking) rather than running directly on the async executor.

Visibility of asynchrony. In Go, every blocking call (channel receive, network I/O, a time.Sleep) is transparently a goroutine yield: the programmer writes synchronous-looking code and the runtime handles the scheduling. In Rust, every yield is an explicit .await. The asynchronous structure of the programme is visible in the type system: an async fn returns a Future, and that Future must be driven by an executor. Every suspension point is named.

Both designs are valid. Go's is simpler to use: no async/await syntax, no Future types, no Pin. Rust's is zero-overhead (no hidden stack allocation, no runtime stack management) and auditable (every suspension point is explicit and reviewable).

7.6. Polybius: Async Order Execution over WebSockets

The Polybius trading engine uses Tokio for all network I/O. Each market feed runs as a tokio::spawn-ed task that maintains a persistent WebSocket connection to Polymarket's CLOB, parsing order-book updates and writing into shared state via Arc<RwLock<State>>. The design uses tokio::sync::RwLock for read-heavy shared state and tokio::sync::Mutex for write-critical sections, deliberately avoiding an actor model in favour of explicit lock-based coordination. The execution path is a direct instance of the Waker protocol operating in a production context: when a WebSocket frame arrives, the OS notifies Tokio's reactor via epoll, which calls waker.wake() on the waiting task, which is re-queued on a worker thread's run queue and re-polled to completion.

7.7. RustCell: async/await with Tokio

The following programme demonstrates the core async pattern: a producer task is spawned with ownership of the sender half of an mpsc channel (transferred via move into the async block), while the main task consumes messages from the receiver half using rx.recv().await. The main task suspends at each .await, allowing the scheduler to drive the producer task forward, and resumes when the channel delivers a message. The programme exits when the channel is closed (the sender is dropped at the end of the producer task), causing rx.recv() to return None and the while let loop to terminate.

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
  let (tx, mut rx) = mpsc::channel::<String>(8);

  // Spawn a producer task
  tokio::spawn(async move {
      for i in 0..3 {
          tx.send(format!("message {i}")).await.unwrap();
      }
  });

  // Consume on the main task
  while let Some(msg) = rx.recv().await {
      println!("{msg}");
  }
}

Tokio async/await: producer task + mpsc channel receive

8. Performance: Zero-Cost Abstractions and the FFI Bridge

8.1. What "Zero-Cost" Means Formally

Bjarne Stroustrup articulated the zero-cost abstraction principle as: "What you don't use, you don't pay for. And further: What you do use, you couldn't hand-code any better."[18] In Rust, this principle has a precise operational meaning in terms of monomorphisation and LLVM IR.

Remark (The zero-cost abstraction principle).

A generic abstraction is zero-cost if, after monomorphisation and inlining by the compiler, the generated machine code is identical to hand-written specialised code. As a concrete instance: the generic function fn add<T: std::ops::Add<Output=T>>(a: T, b: T) -> T { a + b } instantiated at i64 produces LLVM IR identical to the non-generic fn add_i64(a: i64, b: i64) -> i64 { a + b }. The generic version has precisely zero overhead: the compiler erases the type parameter and emits the same integer addition instruction.

The zero-cost principle applies to safe, statically dispatched code paths. Two explicit exceptions apply. First, unsafe blocks transfer the proof obligation for memory safety to the programmer; the compiler trusts the programmer's claims and may not be able to optimise as aggressively. Second, dynamic dispatch (dyn Trait) is explicitly not zero-cost: every call through a fat pointer incurs a vtable lookup (one indirect branch) and prevents inlining, at the cost of runtime flexibility. Choosing between impl Trait (monomorphised, zero-cost) and dyn Trait (vtable, runtime cost) is therefore a deliberate performance decision.
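A minimal sketch of that decision (the function names add and render are illustrative): add is monomorphised per instantiation and compiles to a direct, inlinable call, while every call inside render goes through a vtable fat pointer.

```rust
use std::fmt::Display;
use std::ops::Add;

// Monomorphised: each instantiation compiles to the same machine code as a
// hand-written, type-specific function.
fn add<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}

// Dynamic dispatch: `items` holds fat pointers, and each to_string call is
// an indirect branch through the vtable, with no cross-call inlining.
fn render(items: &[&dyn Display]) -> Vec<String> {
    items.iter().map(|d| d.to_string()).collect()
}

fn main() {
    assert_eq!(add(2_i64, 3), 5);       // instantiates add::<i64>
    assert_eq!(add(2.5_f64, 0.5), 3.0); // instantiates add::<f64>

    let xs: Vec<&dyn Display> = vec![&1_i32, &"two", &3.0_f64];
    assert_eq!(render(&xs), vec!["1", "two", "3"]);
    println!("{:?}", render(&xs));
}
```

The generic version costs nothing at runtime; the dyn version buys the ability to mix concrete types in one collection at the price of one indirect branch per call.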

8.2. LLVM IR and the Compilation Pipeline

rustc lowers Rust source through a sequence of intermediate representations before producing machine code. Each IR stage serves a distinct purpose:

  1. AST: the surface syntax after parsing, representing the programme as a tree of syntactic forms.
  2. HIR (high-level IR): produced after macro expansion, name resolution, and type-checking. Generic bounds are resolved here.
  3. MIR (mid-level IR): a control-flow graph of typed basic blocks with explicit borrow annotations. The borrow checker and lifetime analysis operate entirely on MIR. Monomorphisation also occurs at this stage.
  4. LLVM IR: produced from MIR after optimisation passes specific to Rust. LLVM then applies its own optimisation pipeline: inlining, loop-invariant code motion, auto-vectorisation, dead code elimination, and others.
  5. Machine code: architecture-specific assembly, emitted by LLVM's back end for the target triple.

Remark (MIR as a control-flow graph).

MIR makes the control-flow structure of a programme explicit. A simple loop that accumulates a sum is represented as two basic blocks: bb0 initialises the accumulator and index, then jumps unconditionally to bb1. Block bb1 performs one iteration (read v[i], add to sum, increment i), then branches: if i < len, jump back to bb1; otherwise jump to bb2 (the exit). The borrow checker's task is to verify, at every edge in this graph, that the aliasing and lifetime invariants are satisfied. Because the CFG is explicit, the analysis is decidable and fast.
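The loop described above, written out in source form; the bb labels in the comments are the conceptual blocks of the remark (the actual MIR contains additional blocks for bounds checks and drops), and the graph can be inspected with rustc --emit=mir or the Playground's MIR view.

```rust
// The accumulation loop whose MIR CFG is, conceptually, a three-block graph:
//   bb0: sum = 0, i = 0, jump to bb1
//   bb1: if i < v.len() { sum += v[i]; i += 1; repeat bb1 } else jump to bb2
//   bb2: return sum
fn sum(v: &[i64]) -> i64 {
    let mut sum = 0;
    let mut i = 0;
    while i < v.len() {
        sum += v[i];
        i += 1;
    }
    sum
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
    println!("{}", sum(&[1, 2, 3, 4]));
}
```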

The MIR representation is also where Rust's safety guarantees are formally checked. A programme that passes the MIR borrow checker is, by the RustBelt soundness theorem[7] (proved for the λRust core calculus that models Rust's ownership discipline), guaranteed to be memory-safe at every reachable programme point.

8.3. rayon and Data-Parallelism

rayon provides a fork-join, work-stealing parallel iterator that mirrors the sequential iterator API exactly. A computation expressed as v.iter().filter(...).map(...).reduce(...) becomes parallel by replacing .iter() with .par_iter(). The parallelism is entirely behind the API boundary: rayon divides the iterator into chunks, distributes them across a thread pool using work-stealing deques (each thread maintains a deque and steals from the back of another thread's queue when its own is empty), and joins the results. This is the same work-stealing algorithm used by Tokio's multi-threaded scheduler, applied to CPU-bound rather than I/O-bound work.

The pathwise project (an open-source Rust SDE solver published on PyPI, described in §8.5) uses rayon internally to parallelise Monte Carlo path batches across CPU cores. On a 4-core machine, this produces an 8 to 15x speedup over a pure NumPy implementation, with no change to the Python-facing API.

8.4. PyO3 and maturin

PyO3 provides safe Rust bindings to the CPython C API.[14] A Rust function annotated #[pyfunction] is compiled into a CPython extension module: a native shared object (.so on Linux, .pyd on Windows) that Python loads at import time. maturin is the build tool that orchestrates this pipeline: it compiles the Rust crate, links against libpython, and produces a Python wheel.[15] A pip install of that wheel drops the native extension into the Python environment with no further configuration.

The performance advantage is direct. Python calls the Rust function via the C ABI: one pointer dereference into the extension module's function table, followed by native Rust execution at full machine speed. There is no interpretation layer and no serialisation overhead for numeric types (PyO3 converts Python floats and integers to Rust primitives at the boundary, not by marshalling through strings or JSON).

The PyO3 0.21+ API replaced the deprecated &PyModule argument type with &Bound<'_, PyModule>. The Bound type makes the relationship between the Rust reference and the Python GIL explicit in the type system, enabling PyO3 to enforce correct GIL handling at compile time rather than at runtime.

Compared to the two alternatives: hand-written C extensions achieve the same mechanism but require substantial boilerplate (PyArg_ParseTuple, reference counting, method table definitions) and are entirely unsafe; ctypes performs FFI at the Python level and is considerably slower per call, because argument and return-value conversion is performed dynamically by the interpreter at every invocation rather than by the compiled, type-specific glue code that PyO3 generates.

8.5. pathwise: SDE Solver as a Python Extension

pathwise is an open-source Rust SDE solver that exposes EulerMaruyama and Milstein path simulators as Python classes via PyO3. A user calls pathwise.euler(mu=0.05, sigma=0.2, x0=100.0, dt=0.01, n=1000) from Python and receives a NumPy array of simulated path values. The Rust implementation uses rayon internally to parallelise batch path simulation across CPU cores, achieving an 8 to 15x speedup over a pure NumPy implementation on a 4-core machine (measured benchmark). This is the pattern described in §8.4 operating in a real project: Python provides the ergonomic calling convention, Rust provides the compute, and PyO3 bridges the two with a single function-table indirection.

8.6. inferCNAsc: Copy Number Alteration Inference

inferCNAsc (published on PyPI) is an open-source Rust crate for inferring chromosomal copy number alterations (CNAs) from single-cell RNA-sequencing (scRNA-seq) expression matrices. It illustrates a structural pattern absent from pathwise: Cargo feature flag gating of the entire PyO3 dependency, so that the Rust library builds and ships independently of any Python ABI, and the Python bindings are compiled only when the python feature is explicitly enabled.

The pipeline is a sequential three-stage computation over a cells × genes expression matrix represented as ndarray::Array2<f64>. The first stage, smooth_expression, applies a sliding-window mean within each chromosome, resetting at chromosome boundaries to prevent smoothing across genomically discontinuous regions. The second stage, find_cnas, computes z-scores per gene across cells and thresholds them, producing two indicator matrices for gains and losses. The third stage, assign_cnas_to_cells, applies a run-length scan over consecutive flagged genes on the same chromosome, merging them into CnaRecord structs that record the chromosome, genomic coordinates, flagged gene names, and alteration direction.

At the Python boundary, the numpy PyO3 extension handles zero-copy array bridging: PyReadonlyArray2<'py, f64> borrows a NumPy array without copying its data, and Array2<f64>::into_pyarray_bound(py) transfers ownership of the output array back to Python. A type-system boundary constraint arises for the indicator matrices: the numpy crate (0.22) does not implement IntoPyArray for Array2<bool> directly, so find_cnas emits Array2<u8> (values 0 or 1) across the FFI boundary, with the expectation that Python callers cast via .astype(bool).

The feature-flag design applies the same structural idiom as the alloc feature in cartan-manifolds (§10.2), but on a different axis: rather than gating heap-allocation types, it gates an entire foreign-language ABI. Without the python feature, cargo build produces a standard Rust library with no link-time dependency on libpython. With it, maturin compiles the crate into a Python wheel via the same pipeline described in §8.4. Every #[pyfunction] wrapper and the module registration function are uniformly guarded by #[cfg(feature = "python")], making the boundary explicit and auditable in the same sense as an unsafe block: a clearly demarcated region where the proof obligations for correct behaviour across a foreign calling convention rest entirely with the programmer.

8.7. RustCell (non-runnable): PyO3 Module Structural Example

The snippet below shows the structural pattern. #[pyfunction] marks a Rust function as directly callable from Python. #[pymodule] registers functions in the Python module object. wrap_pyfunction! generates the glue code that bridges the Python and Rust calling conventions: it produces a PyCFunction descriptor that the CPython runtime can invoke, handling argument parsing and return-value conversion automatically.

// This snippet is structural only and cannot run on the Playground.
// PyO3 requires a Python interpreter and the maturin build pipeline.
// Uses the PyO3 0.21+ API: Bound<'_, PyModule> replaces the deprecated &PyModule.
use pyo3::prelude::*;

#[pyfunction]
fn euler(mu: f64, sigma: f64, x0: f64, dt: f64, n: usize) -> Vec<f64> {
  // ... SDE integration ...
  vec![x0; n]  // placeholder
}

#[pymodule]
fn pathwise(m: &Bound<'_, PyModule>) -> PyResult<()> {
  m.add_function(wrap_pyfunction!(euler, m)?)?;
  Ok(())
}
Not executable

PyO3 module structure: #[pyfunction] exposed via #[pymodule] (not runnable on the Playground)

9. Serialisation and the serde Data Model

9.1. The serde Data Model

serde defines a universal intermediate representation between Rust types and serialisation formats.[19] This intermediate representation, called the serde data model, consists of 29 types organised into two conceptual groups. The first group covers primitives: the boolean, signed and unsigned integers from 8 to 64 bits, 32- and 64-bit floats, char, string, and byte sequences. The second group covers composite containers: option (presence or absence of a value), unit types (zero-sized structs, unit enum variants, the unit value itself), newtype wrappers (single-field structs and newtype enum variants), sequences, tuples and tuple structs, tuple enum variants, maps, named structs, and named enum variants. Every Rust type that serde can handle maps to exactly one of these 29 forms.

Definition (The serde data model).

The serde data model is a set of 29 abstract types that forms the universal intermediate representation for Rust serialisation. A Serializer is a mapping from the data model to a concrete wire format (JSON, TOML, Bincode, MessagePack, and others). A Deserializer is the inverse mapping from a concrete format back to the data model. Because every format communicates through this shared IR, a Rust type that implements Serialize and Deserialize is automatically compatible with every format that provides a conforming Serializer or Deserializer, without any format-specific code in the type itself.

9.2. Derive Macros: Zero-Cost Serialisation

Annotating a struct or enum with #[derive(Serialize, Deserialize)] instructs the Rust compiler to expand a complete, type-specific implementation of the Serialize and Deserialize traits at compile time. There is no reflection and no runtime type information (RTTI) involved. The generated code calls the serialiser's methods directly, naming each field by its string representation and dispatching to the exact data-model type for each field's Rust type. The result is code that is structurally identical to what a programmer would write by hand, making this the zero-cost abstraction principle from §8.1 applied directly to serialisation: the derive macro adds no overhead whatsoever to the hot path.

Remark (Derive macros as compile-time code generation).

#[derive(Serialize)] is a procedural macro (proc-macro) that runs as part of compilation. It receives the token stream of the annotated type definition and emits a token stream containing a complete, type-specific impl Serialize block. No generic dispatch, no vtable, and no type-erased container is present in the output. The generated impl names every field explicitly, calls the serialiser's typed methods (e.g., serialize_f64, serialize_str), and does so in the same order and with the same calling convention that a hand-written impl would use. All generic overhead is eliminated before the LLVM IR stage.

9.3. Custom Serialisers and Deserialise-with Patterns

serde provides three mechanisms for overriding the default derived behaviour. The #[serde(with = "module")] attribute delegates both serialisation and deserialisation for a field to free functions defined in a named module, which must expose serialize and deserialize functions with the expected signatures. For finer control, #[serde(serialize_with = "fn")] and #[serde(deserialize_with = "fn")] allow per-field overrides of one direction only, leaving the other direction to the derived implementation. For types that cannot be derived at all (for example, a duration value encoded on the wire as an ISO 8601 string such as "PT1H30M"), serde provides the Visitor pattern: the programmer implements a stateful Visitor struct that receives a sequence of events from the format driver (one event per token, analogous to a SAX parser for XML) and accumulates them into the target Rust value. The Visitor approach handles arbitrarily complex wire representations without sacrificing type safety.

9.4. Comparison: C/C++ and Go

C/C++: nlohmann/json uses runtime polymorphism and type erasure: values are stored in a variant container and dispatched at runtime. protobuf requires a separate schema file (.proto) and a dedicated code-generation step before any C++ code can be compiled; the schema and the C++ struct are two artefacts that must be kept in sync by the programmer. Manual struct serialisation with memcpy is entirely unsafe: it relies on platform-specific layout assumptions, breaks on any struct with padding, and is undefined behaviour if alignment is incorrect. None of these approaches provide compile-time type-safe serialisation without either extra tooling or unsafe code.

Go: encoding/json uses the reflect package to introspect field names, types, and struct tags at runtime. No code generation is required, which simplifies the toolchain, but the reflection path is slower than compile-time dispatch, and the mechanism requires that all serialised fields be exported (capitalised). Type safety in Go's JSON layer is enforced at runtime, not compile time: a type mismatch surfaces as an error value or a silent zero-value assignment during json.Unmarshal, not as a compilation failure.

Rust/serde: compile-time code generation via proc-macro derive, no reflection, no schema file required for JSON or TOML, no unsafe code required by the user, and no exported-field constraint. The generated code is fully inlined and optimised by LLVM, resulting in zero overhead on the hot path. Format support is open-ended: any crate that implements Serializer or Deserializer against the serde data model is immediately compatible with all existing types.

9.5. RustCell: serde_json Round-Trip

The demo below shows all three phases of serde in action. #[derive(Serialize, Deserialize)] generates all serialisation and deserialisation code for Point at compile time, with no runtime type lookup. serde_json::to_string maps the struct through the serde data model to a JSON string via the serde_json Serializer. serde_json::from_str runs the inverse path through the Deserializer and reconstructs the struct. The final assert_eq! verifies that the round-trip preserves the value exactly, demonstrating that the derive-generated paths are inverses of one another with no information loss.

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct Point {
  x: f64,
  y: f64,
  label: String,
}

fn main() {
  let p = Point { x: 1.5, y: -2.3, label: String::from("origin") };

  // Struct -> JSON string (derive-generated Serialize path)
  let json = serde_json::to_string(&p).unwrap();
  println!("{json}");

  // JSON string -> Struct (derive-generated Deserialize)
  let p2: Point = serde_json::from_str(&json).unwrap();
  assert_eq!(p, p2);
  println!("round-trip OK: {p2:?}");
}

serde_json round-trip: struct to JSON string and back via the derive-generated path

10. Rust in Constrained Environments: no_std and Embedded

10.1. The no_std Contract

Adding #![no_std] to a crate removes the implicit link against Rust's standard library. This is the mechanism by which Rust programmes can target bare-metal microcontrollers, operating system kernels, and other environments where no OS is present to provide the services that std depends on.

What is lost: heap allocation (Box, Vec, String, and all types that allocate dynamically), OS abstractions (threads, file I/O, networking), and, on toolchains before Rust 1.81 (which stabilised core::error::Error), the Error trait. What remains: the full Rust language, the core crate (iterators, Option, Result, all primitive types, the fmt infrastructure), and, optionally, the alloc crate if a global allocator is provided.

Two additional obligations arise for a bare no_std binary. The programmer must provide a #[panic_handler]: a function that takes a &PanicInfo and diverges (-> !), replacing the default panic runtime. If heap allocation is desired, the programmer must also provide a #[global_allocator]: a static value that implements the GlobalAlloc trait, supplying alloc and dealloc for the runtime.

The analogous contract in C bare-metal is to avoid all calls to malloc, free, and the C standard library's I/O functions. The difference is enforcement: in C, nothing prevents an accidental #include <stdio.h> followed by a printf call from compiling silently and then failing at link time or at runtime on a target with no libc. In Rust, #![no_std] is a compile-time directive: any use std::... is a compile error immediately, catching the mistake before it reaches the linker.

Definition (The no_std contract).

A Rust crate annotated with #![no_std] links against core (and, if declared, alloc) but not against std. The crate therefore has no access to OS-backed services: no heap allocator, no thread primitives, no file system, and no network stack, unless these are provided explicitly by the programmer through a #[global_allocator], a platform-specific runtime crate, or direct hardware register access. A no_std binary must additionally define a #[panic_handler]. The boundary is enforced at compile time: any reference to an item in std that does not exist in core or alloc is a type error.

10.2. The alloc Crate

The alloc crate is a subset of std that provides heap-using types (Box, Vec, String, Arc, Rc, BTreeMap, and others) without requiring a full OS. To use it in a no_std crate, the programmer declares extern crate alloc (or exposes it via a Cargo feature) and provides a #[global_allocator] that implements GlobalAlloc. On embedded targets, the allocator is typically backed by a fixed-size static byte array, for example via the embedded-alloc crate, which implements a simple heap within a region of RAM chosen at link time.

cartan-manifolds follows this pattern precisely. Types that require heap allocation, namely Grassmann, Spd, and Corr, are gated behind #[cfg(feature = "alloc")]. When the alloc feature is disabled, those types are not compiled, and the crate remains usable on targets with no heap. When the feature is enabled, the types become available. This allows a single codebase to serve both embedded and hosted environments without any code duplication or conditional branching at runtime.

10.3. libm

In a std-linked binary, functions such as f64::sin, f64::sqrt, and f64::exp delegate to the platform's math library (libm on Linux, compiler-rt intrinsics on LLVM targets). In a no_std environment, no such library is available. The libm crate provides pure-Rust implementations of the C99 floating-point math functions with no OS dependency, making trigonometric and transcendental functions available to no_std crates. cartan-core lists libm as an unconditional dependency and calls its functions (libm::sin, libm::cos, libm::sqrt, and others) wherever float math is required outside of std.

10.4. cartan: Geometry on the Edge of no_std

cartan-core is genuinely no_std-compatible. Its crate root carries #![cfg_attr(not(feature = "std"), no_std)], uses libm for all float math, and gates any heap-using code behind the alloc feature. The pattern #![cfg_attr(not(feature = "std"), no_std)] is the standard idiom for conditional no_std: the crate compiles with std by default for ergonomics and test support, and becomes no_std when the std feature is disabled via cargo build --no-default-features. The current limitation is in cartan-manifolds: it depends on nalgebra with features = ["std"], because nalgebra's RealField trait (required for all matrix operations) is only available under the std feature in its current release. The path to full embedded support would require either a custom matrix type or a no_std-compatible RealField implementation in nalgebra. The architecture is ready; the dependency is not yet.

10.5. RustCell (non-runnable): no_std Structural Example

The snippet below shows the cfg_attr idiom in its minimal form. The crate is no_std when the std feature is disabled. libm::sin replaces the OS math library for the exponential normalisation calculation on the sphere. The #[panic_handler] is the minimal obligation that a no_std binary must fulfil: without it, the crate will not link on a bare-metal target. The Playground cannot execute this snippet because it has no linker configuration for a bare binary; it is included here as a structural illustration of the pattern used in cartan-core.

// This snippet is structural only. A bare no_std binary requires a
// panic_handler and linker configuration that the Playground cannot provide.
//
// The cfg_attr pattern mirrors cartan-core exactly: the crate is no_std
// when the "std" feature is disabled, and std-linked otherwise.
#![cfg_attr(not(feature = "std"), no_std)]
#![no_main]

use core::panic::PanicInfo;

// libm provides sin and other float math without linking the OS math library.
fn sphere_exp_norm(v_norm: f64) -> f64 {
  // Taylor expansion for small v_norm; libm::sin for large values.
  if v_norm < 1e-7 {
      1.0 - v_norm * v_norm / 6.0
  } else {
      libm::sin(v_norm) / v_norm
  }
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
  loop {}
}

no_std + libm: the cfg_attr pattern used in cartan-core for embedded targets

11. Network I/O and Zero-Cost Buffering

11.1. Packets, Sockets, and the Buffering Problem

A network socket is, in the OS model, a file descriptor backed by a kernel receive queue. When a UDP datagram arrives at the NIC, the driver DMAs the packet bytes into a kernel buffer and appends a metadata record to a per-socket receive queue. When the user process calls recv(), the kernel copies the datagram bytes into the user-supplied buffer, removes the record from the queue, and returns the byte count. If no datagram is available and the socket is non-blocking, the call returns immediately with WouldBlock; the async Tokio layer (§7.2) arranges for epoll to notify the executor when the socket becomes readable again, so no thread is blocked during the wait.

Three structural properties of UDP shape the buffering problem. First, discreteness: recv() delivers exactly one datagram per call. There is no API to read half a datagram or two datagrams atomically; the kernel's receive queue is a sequence of discrete, bounded chunks. Second, unknown size at call time: the application must supply a buffer large enough to hold the largest possible datagram before calling recv(). If the buffer is too small, the OS silently truncates the datagram and the excess bytes are permanently lost. The maximum UDP payload is 65,507 bytes (the IP maximum minus 20-byte IP header minus 8-byte UDP header), though Ethernet-constrained networks cap practical datagrams at 1,472 bytes (1,500-byte MTU minus headers). Third, arrival unpredictability: datagrams arrive at timestamps determined by the network, not by the application. A high-frequency publisher (a market data feed, a sensor bus, a game server) can burst thousands of datagrams in a millisecond. If the application processes each datagram synchronously before calling recv() again, the kernel queue fills and the OS begins silently dropping packets. Packet loss in this scenario is self-inflicted: the bottleneck is the application's processing loop, not the network.

The buffering goal follows directly from these three constraints. Given a sequence of datagrams of variable, unknown size arriving at an unpredictable rate, store them in a fixed, pre-allocated memory region with constant-time insert and remove, without copying bytes more than once, and without any per-datagram heap allocation. A naive implementation, allocating a fresh Vec<u8> per received datagram, satisfies correctness but incurs an allocator round-trip on every receive and every consume. Under load this is measurable: the allocator must maintain free lists, acquire locks, and occasionally trigger the OS to map new pages. For a 10 Gbps network link sustaining 8 million datagrams per second, an allocation per datagram imposes millions of allocator calls per second on what should be a memory-copy pipeline.

The solution is a ring buffer: a statically allocated circular array in which the producer (the receive loop) and the consumer (the processing loop) advance independent pointers around the same contiguous memory, wrapping modulo the capacity. No allocation occurs after startup; the memory is fixed. Insert and remove are constant time. The only remaining question is how to handle the two complications that a naive ring buffer cannot address: variable-length datagrams (a plain ring buffer is typed at a fixed element type, not at variable-length byte sequences) and fallible I/O (the receive may fail, and the buffer must remain consistent). These two complications are precisely what the designs explored in this section resolve.

11.2. Ring Buffers: Structure and Correctness

A ring buffer of capacity $N$ over a type $T$ is a triple $(h, \ell, B)$ where $B : [N] \to T$ is the backing array, $h \in [N]$ is the head index (the position of the oldest unconsumed element), and $\ell \in \{0, \ldots, N\}$ is the current occupancy. The tail index (the first free slot) is $(h + \ell) \bmod N$. This representation is equivalent to the two-pointer $(h, t)$ representation, but it resolves the classical ambiguity: in the two-pointer form, $h = t$ holds both when the buffer is empty and when it is full, so implementations must sacrifice one slot (declaring the buffer full at $N - 1$ elements) to keep the two states distinguishable. In the $(h, \ell)$ representation, empty is $\ell = 0$ and full is $\ell = N$, with no wasted slot.

Theorem (Ring wrapping invariant).

Let $(h, \ell, B)$ be a ring buffer of capacity $N \geq 1$. For any $k \in \{0, \ldots, \ell - 1\}$, the index of the $k$-th occupied element (counting from the head) is $(h + k) \bmod N$. The buffer is nonempty iff $\ell > 0$. A push of one element writes to index $(h + \ell) \bmod N$ (the tail) and then increments $\ell$ by 1. A pop decrements $\ell$ by 1 and advances $h$ by 1 modulo $N$.

Proof. The invariant holds initially (empty buffer, $h = 0$, $\ell = 0$). Suppose it holds before a push. The new element is written to the tail index $(h + \ell) \bmod N$, and $\ell$ is incremented. For $k < \ell$ (old elements), the index $(h + k) \bmod N$ is unchanged. For $k = \ell$ (the new element), the index is $(h + \ell) \bmod N$ as specified. The invariant is preserved. For a pop, $h$ advances to $(h + 1) \bmod N$ and $\ell$ decrements. For $k \in \{0, \ldots, \ell - 2\}$, the new index of the $k$-th element is $(h + 1 + k) \bmod N$, which equals the old $(h + k + 1) \bmod N$: each element that was at position $k + 1$ is now at position $k$ under the new head, consistent with the invariant at the new $(h', \ell') = ((h + 1) \bmod N, \ell - 1)$.

Theorem (FIFO ordering).

A ring buffer with the operations defined above implements a first-in-first-out queue: elements are dequeued in the same order they were enqueued.

Proof. Denote the sequence of enqueue operations as writing values $v_1, v_2, \ldots, v_m$ at times $t_1 < t_2 < \cdots < t_m$. By the wrapping invariant, at any point in time the occupied slots are exactly those at indices $(h + k) \bmod N$ for $k = 0, \ldots, \ell - 1$, in the order they were written. The oldest element (smallest enqueue time among those not yet consumed) is always at index $h$, since each pop advances $h$ by 1 and decrements $\ell$ by 1 without disturbing any other element. Therefore the element returned by each pop is the least recently enqueued element among those remaining, establishing FIFO ordering.

A representative implementation declares the buffer as:

pub struct Ring<'a, T> {
    head: u32,
    len:  u32,
    buffer: &'a mut [T],
}

The lifetime 'a ties the buffer borrow to the caller's scope: the ring does not own its backing memory and performs no heap allocation. The backing slice is typically a Box<[T]> or a stack-allocated array allocated once at startup. All subsequent operations are pure arithmetic on head and len with index computations modulo buffer.len().

The following implementation uses an owning Vec for clarity on the Playground (the production version borrows a pre-allocated slice). It demonstrates the wrapping invariant via get(k), forces the array boundary crossing by popping two elements before refilling, and ends with a drain that serves as the coalgebra morphism into a finite stream:

struct Ring<T> {
  buf:  Vec<Option<T>>,
  head: usize,
  len:  usize,
}

impl<T> Ring<T> {
  fn new(capacity: usize) -> Self {
      let buf = (0..capacity).map(|_| None).collect();
      Ring { buf, head: 0, len: 0 }
  }
  fn capacity(&self) -> usize { self.buf.len() }
  fn is_full(&self)  -> bool  { self.len == self.capacity() }
  fn is_empty(&self) -> bool  { self.len == 0 }

  fn push(&mut self, val: T) -> bool {
      if self.is_full() { return false; }
      let tail = (self.head + self.len) % self.capacity();
      self.buf[tail] = Some(val);
      self.len += 1;
      true
  }

  fn pop(&mut self) -> Option<T> {
      if self.is_empty() { return None; }
      let val = self.buf[self.head].take();
      self.head = (self.head + 1) % self.capacity();
      self.len -= 1;
      val
  }

  // k-th element from the head — directly encodes the wrapping invariant
  fn get(&self, k: usize) -> Option<&T> {
      if k >= self.len { return None; }
      self.buf[(self.head + k) % self.capacity()].as_ref()
  }

  // Drain: the unique coalgebra morphism Ring<T> -> Vec<T> (finite stream prefix)
  fn drain(mut self) -> Vec<T> {
      let mut out = Vec::with_capacity(self.len);
      while let Some(v) = self.pop() { out.push(v); }
      out
  }
}

fn main() {
  let mut ring: Ring<i32> = Ring::new(4);

  ring.push(10); ring.push(20); ring.push(30);
  assert_eq!(ring.get(0), Some(&10));
  assert_eq!(ring.get(1), Some(&20));
  assert_eq!(ring.get(2), Some(&30));

  // Pop two: head advances, next pushes wrap past the array boundary
  assert_eq!(ring.pop(), Some(10));
  assert_eq!(ring.pop(), Some(20));
  ring.push(40); ring.push(50); ring.push(60);
  assert!(ring.is_full());

  // Drain proves FIFO holds across the wrap boundary
  let out = ring.drain();
  assert_eq!(out, vec![30, 40, 50, 60]);
  println!("drain (forced wrap): {:?}", out);
  println!("Ring invariants hold.");
}

Ring<T>: wrapping invariant, forced boundary crossing, and FIFO drain as coalgebra morphism

Remark (Ring<T> as a finite coalgebra).

The pair $(\mathtt{Ring}\langle T\rangle,\, \mathsf{pop})$, where $\mathsf{pop}(r) = \mathtt{None}$ if $r.\mathsf{len} = 0$ and $\mathsf{pop}(r) = \mathtt{Some}((r.\mathsf{head\_elem}, r'))$ otherwise (with $r'$ the ring after advancing the head), is a coalgebra of the functor $F(X) = \mathtt{Option}\langle(T, X)\rangle$, the same functor whose terminal coalgebra is the stream type from §6.2. Every $F$-coalgebra admits a unique coalgebra morphism into the terminal coalgebra; here that morphism is the drain function that repeatedly applies $\mathsf{pop}$ until the buffer empties, producing a finite prefix of the stream. The ring is a finite, bounded approximation to the infinite stream coalgebra, and it is exactly this finiteness that makes it useful as a buffer: an infinite stream accumulates without bound, while the ring's occupancy never exceeds $N$, with push rejecting new elements once the buffer is full.

11.3. Views as Linear Borrows

The variable-length receive problem requires a more subtle interface than a plain push. The application must hand the OS a pointer into the ring's free region, let the kernel write directly into that region (eliminating one copy), and only then tell the ring how many bytes were written. If the interface exposed a mutable slice and required the caller to return the written count separately, nothing would prevent the caller from forgetting to commit, leaving the ring's occupancy count out of sync. The view types resolve this by encoding the protocol in the type system.

Definition (View types as capability tokens).

The view types form a capability hierarchy. RingView<'a, T> holds a shared reference (&[T]) into the occupied region: multiple views can coexist, and none can mutate. RingMutView<'a, 'b, T> holds an exclusive reference (&mut T) into a single occupied slot: it grants mutation of an existing element but cannot extend the buffer's length. RingManyMutView<'a, 'b, T> holds an exclusive reference to the largest contiguous free region at the tail: it grants mutation of uninitialised slots and, via set_modified(n), commits a length increment of $n$ to the ring.

Theorem (View uniqueness).

At any instant, at most one RingManyMutView exists for a given Ring. Moreover, while a RingManyMutView is live, no other reference (shared or exclusive) to the ring's interior can be created.

Proof. RingManyMutView is constructed by a method taking &mut Ring<T>. By the aliasing rules of §2.1, an exclusive borrow of the ring for the duration of the view's lifetime is required, which the borrow checker enforces. A second &mut Ring<T> borrow cannot be created while the first is live (the diagonal map $\Delta : R \to R \otimes R$ does not exist for a non-Copy type, as discussed in §6.5). Therefore no second view can be constructed. A shared borrow &Ring<T> would also conflict with the outstanding exclusive borrow, so it too is rejected at compile time. The proof is a direct application of the borrow checker's aliasing invariant: the view is the unique holder of the exclusive reference for as long as it lives.

The correspondence with linear logic (§2.1) is exact. A RingManyMutView is a linear resource: it must be consumed exactly once (either by calling set_modified to commit, or by dropping it to roll back the pre-allocation). Rust's ownership system enforces this without any runtime check. The region of the backing array that the view covers is, in the language of §6.5, the unique morphism $\mu F \to A$ from the ring's current state to the caller's write destination: it cannot be shared because the ring's wrapping position depends on knowing exactly how much was written, a fact that only the caller (the recv() call) determines.

The following example distils the pattern to its essence. push_with accepts a closure over the free tail of a byte ring. The closure receives exclusive access to the write window; on Ok(n) exactly $n$ bytes are committed; on Err the ring state is unchanged. The borrow checker enforces that no second reference to the ring's interior exists while the closure holds the slice, making rollback automatic and zero-cost:

struct ByteRing {
  buf:  Vec<u8>,
  head: usize,
  len:  usize,
}

impl ByteRing {
  fn new(capacity: usize) -> Self {
      ByteRing { buf: vec![0u8; capacity], head: 0, len: 0 }
  }
  fn capacity(&self) -> usize { self.buf.len() }

  // Closure owns the write window exclusively. Ok(n) commits n bytes.
  // Err leaves the ring unchanged — no explicit rollback needed.
  fn push_with<E, F>(&mut self, f: F) -> Result<usize, E>
  where
      F: FnOnce(&mut [u8]) -> Result<usize, E>,
  {
      let cap    = self.capacity();
      let tail   = (self.head + self.len) % cap;
      let window = (cap - self.len).min(cap - tail);
      let result = f(&mut self.buf[tail..tail + window]);
      if let Ok(n) = result { self.len += n; }
      result
  }

  fn pop_bytes(&mut self, n: usize) -> Vec<u8> {
      let cap = self.capacity();
      let out = (0..n).map(|i| self.buf[(self.head + i) % cap]).collect();
      self.head = (self.head + n) % cap;
      self.len -= n;
      out
  }
}

fn main() {
  let mut ring = ByteRing::new(16);

  // Successful write: closure fills the slice and commits 5 bytes
  let n = ring.push_with::<&str, _>(|buf| {
      buf[..5].copy_from_slice(b"hello");
      Ok(5)
  }).unwrap();
  println!("committed {} bytes; ring.len = {}", n, ring.len);
  assert_eq!(ring.len, 5);

  // Failed write: Err propagates, ring is untouched
  let err: Result<usize, &str> = ring.push_with(|_buf| Err("WouldBlock"));
  assert!(err.is_err());
  assert_eq!(ring.len, 5); // unchanged — automatic rollback
  println!("after failed write: ring.len = {} (unchanged)", ring.len);

  // Second successful write
  ring.push_with::<&str, _>(|buf| { buf[..5].copy_from_slice(b"world"); Ok(5) }).unwrap();

  let p0 = ring.pop_bytes(5);
  let p1 = ring.pop_bytes(5);
  println!("p0 = {:?}, p1 = {:?}",
           std::str::from_utf8(&p0).unwrap(),
           std::str::from_utf8(&p1).unwrap());
}

ByteRing::push_with: closure-owned write window, commit on Ok, automatic rollback on Err

11.4. The UnorderedBuffer: Phase-Commit as Kleisli Composition

The UnorderedBuffer lifts the ring from a typed element store to a variable-length byte store by maintaining two rings in tandem: a data ring of u8 holding raw datagram bytes, and a metadata ring of packet descriptors each carrying a byte offset, a byte length, and a padding count. The metadata ring is indexed by packet number; the data ring is indexed by byte position. A reader reconstructs datagram $k$ by looking up its descriptor in the metadata ring and slicing the data ring at the recorded offset and length.

The receive operation follows a three-phase commit protocol. In the first phase (allocation), the buffer checks whether the data ring's free tail has room for the maximum possible datagram. If not, it pads the remaining tail bytes to force the next allocation to the head, ensuring that all datagram bytes are contiguous (no packet straddles the wrap boundary). In the second phase (receive), the buffer hands a RingManyMutView over the allocated region to the caller via a closure, which calls socket.recv_from(view.data_mut()) and reports the number of bytes actually written via view.set_modified(n). In the third phase (commit), if and only if the receive succeeded, the metadata ring appends a descriptor for the new packet and the data ring advances its tail by $n$ bytes.

Theorem (Phase-commit as Kleisli chain).

Let $E$ be an error type. Define:

  • $\mathsf{alloc} : () \to \mathtt{Result}\langle\mathtt{View}, E\rangle$ as the allocation step (always succeeds unless the buffer is full),
  • $\mathsf{recv} : \mathtt{View} \to \mathtt{Result}\langle(\mathtt{View}, \mathsf{usize}), E\rangle$ as the receive step (may fail with an I/O error),
  • $\mathsf{commit} : (\mathtt{View}, \mathsf{usize}) \to \mathtt{Result}\langle(), E\rangle$ as the metadata commit (infallible once the receive succeeds).

The composed operation $\mathsf{alloc}() \mathbin{\texttt{>>=}} \mathsf{recv} \mathbin{\texttt{>>=}} \mathsf{commit}$ in the Kleisli category $\mathbf{Rust}_{\mathtt{Result}}$ satisfies: (i) if any step returns $\mathtt{Err}(e)$, the chain short-circuits with $\mathtt{Err}(e)$ and no subsequent step executes; (ii) the metadata ring is updated if and only if both $\mathsf{alloc}$ and $\mathsf{recv}$ succeed; (iii) the observable packet sequence seen by a concurrent reader is unchanged unless the full chain returns $\mathtt{Ok}(())$.

Proof. By the Kleisli composition law of §6.4, $(g \circ_M f)(a) = f(a) \mathbin{\texttt{>>=}} g$, the chain associates as $\mathsf{alloc}() \mathbin{\texttt{>>=}} (\lambda v.\, \mathsf{recv}(v) \mathbin{\texttt{>>=}} \mathsf{commit})$. Property (i) is the monad law for Result: Err(e).and_then(f) = Err(e) for any $f$, so a failure in any step propagates without calling subsequent steps. This is the left-zero law of the Result monad. Property (ii) follows because $\mathsf{commit}$ is only reached if $\mathsf{alloc}$ returned $\mathtt{Ok}$ (providing the view) and $\mathsf{recv}$ returned $\mathtt{Ok}$ (providing the byte count); the metadata ring's $\ell$ is incremented only inside $\mathsf{commit}$. Property (iii) follows from property (ii): a concurrent reader traverses the metadata ring's occupied slots, whose count changes only at the $\mathsf{commit}$ step. Because $\mathsf{commit}$ is not reached on failure, the reader observes no new packet. No lock is required to enforce this: the sequencing is enforced by the type system via Kleisli composition and by the view uniqueness theorem above, which prevents any reader from holding a reference to the in-progress write region simultaneously.

The following programme makes the three-phase structure concrete. Each phase is its own function returning Result; the receive method chains them with ?. A simulated WouldBlock short-circuits at phase 2, leaving the committed packet count unchanged, and the programme prints the committed packets in the order they arrived:

#[derive(Debug)]
enum NetError { BufferFull, WouldBlock }

struct PacketBuffer {
  data:      Vec<u8>,
  write_pos: usize,
  packets:   Vec<(usize, usize)>, // (offset, len)
}

impl PacketBuffer {
  fn new(capacity: usize) -> Self {
      PacketBuffer { data: vec![0u8; capacity], write_pos: 0, packets: Vec::new() }
  }

  // Phase 1: reserve space — the only phase that checks buffer capacity
  fn alloc(&self, max_size: usize) -> Result<usize, NetError> {
      if self.write_pos + max_size > self.data.len() {
          return Err(NetError::BufferFull);
      }
      Ok(self.write_pos)
  }

  // Phase 2: write into the reserved region — simulates recv_from
  fn recv_into(&mut self, offset: usize, payload: Option<&[u8]>)
      -> Result<(usize, usize), NetError>
  {
      let bytes = payload.ok_or(NetError::WouldBlock)?;
      self.data[offset..offset + bytes.len()].copy_from_slice(bytes);
      Ok((offset, bytes.len()))
  }

  // Phase 3: commit the descriptor — only reached if both prior phases succeeded
  fn commit(&mut self, offset: usize, len: usize) -> Result<(), NetError> {
      self.write_pos = offset + len;
      self.packets.push((offset, len));
      Ok(())
  }

  // The ? chain: alloc >>= recv_into >>= commit
  fn receive(&mut self, payload: Option<&[u8]>) -> Result<(), NetError> {
      let offset     = self.alloc(1472)?;
      let (off, len) = self.recv_into(offset, payload)?;
      self.commit(off, len)?;
      Ok(())
  }

  fn read(&self, i: usize) -> &str {
      let (off, len) = self.packets[i];
      std::str::from_utf8(&self.data[off..off + len]).unwrap()
  }
}

fn main() {
  let mut buf = PacketBuffer::new(4096);

  buf.receive(Some(b"market-tick-001")).unwrap();
  println!("packet 0: {}", buf.read(0));

  // WouldBlock: ? short-circuits at phase 2, commit is never reached
  let err = buf.receive(None);
  assert!(matches!(err, Err(NetError::WouldBlock)));
  assert_eq!(buf.packets.len(), 1); // unchanged
  println!("after WouldBlock: {} packet(s) committed (unchanged)", buf.packets.len());

  buf.receive(Some(b"market-tick-002")).unwrap();
  buf.receive(Some(b"market-tick-003")).unwrap();
  println!("packet 1: {}", buf.read(1));
  println!("packet 2: {}", buf.read(2));
  println!("Total committed: {} — left-zero law holds.", buf.packets.len());
}

Three-phase commit as a ? chain: alloc >>= recv_into >>= commit, with WouldBlock short-circuit

The ? operator from §6.4 is the concrete syntax for this chain. The closure passed to push_with uses ? to propagate any I/O error from recv_from, and the enclosing function uses ?? (double-propagation through the closure's own Result and the outer function's Result) to surface errors to the caller. The result is that the three-phase protocol is expressed in three lines of application code, the type system guarantees commit-on-success and no-commit-on-failure, and the generated machine code is identical to a hand-written match chain with no allocation and no overhead.

The abstract structures from §6 are all present and load-bearing in these twenty lines of API: the ring buffer's state is a finite $F$-coalgebra (§6.2), the view types are linear propositions enforced by the affine monoidal structure (§6.5), and the three-phase commit is a Kleisli chain (§6.4) whose short-circuit property is the left-zero law of the Result monad. None of these properties were designed into the API by appeal to category theory; they are consequences of the type system's rules, which the designer followed for independent engineering reasons. The categorical vocabulary makes the correspondence precise enough to prove, rather than merely to observe.

References

  1. National Security Agency / Cybersecurity and Infrastructure Security Agency. The Case for Memory Safe Roadmaps. November 2023.
  2. The Rust Reference. doc.rust-lang.org/reference. Accessed 2026.
  3. Girard, J.-Y. "Linear Logic." Theoretical Computer Science 50(1):1-101, 1987.
  4. Howard, W.A. "The formulae-as-types notion of construction." In To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic Press, 1980.
  5. GHC LinearTypes extension. GHC User's Guide. ghc.gitlab.haskell.org. Accessed 2026.
  6. Matsakis, N.D. and Turon, A. RFC 2094: Non-Lexical Lifetimes. rust-lang.github.io/rfcs/2094-nll. 2017.
  7. Jung, R., Jourdan, J.-H., Krebbers, R., and Dreyer, D. "RustBelt: Securing the Foundations of the Rust Programming Language." POPL 2018.
  8. Jung, R., Dang, H.-H., Kang, J., and Dreyer, D. "Stacked Borrows: An Aliasing Model for Rust." POPL 2020.
  9. Pierce, B.C. Types and Programming Languages. MIT Press, 2002.
  10. Wadler, P. "Propositions as Types." Communications of the ACM 58(12):75-84, 2015.
  11. Jacobs, B. Introduction to Coalgebra: Towards Mathematics of States and Observation. Cambridge University Press, 2017.
  12. Tokio contributors. Tokio: An asynchronous Rust runtime. tokio.rs. Accessed 2026.
  13. Rust Language Team. Asynchronous Programming in Rust. rust-lang.github.io/async-book. Accessed 2026.
  14. PyO3 contributors. PyO3 documentation. pyo3.rs. Accessed 2026.
  15. maturin contributors. maturin documentation. maturin.rs. Accessed 2026.
  16. The Rust Programming Language Team. The Rustonomicon: The Dark Arts of Unsafe Rust. doc.rust-lang.org/nomicon. Accessed 2026.
  17. Jung, R. MiniRust specification. github.com/RalfJung/minirust. Accessed 2026.
  18. Stroustrup, B. The Design and Evolution of C++. Addison-Wesley, 1994.
  19. serde contributors. serde documentation. serde.rs. Accessed 2026.
