(These notes are adapted from a lecture I gave for a studentrun class on type theory. See also the pdf version)
The purpose of this text is to provide a highlevel overview of the motivations behind and basic theory of a subsumptive subtyping semantics over a simplytyped lambda calculus similar to ML. In the interest of brevity, we will elide a formal discussion of syntax forms, dynamics, etc, and simply discuss what properties this system should have if embedded into ML. As such, you should have some basic knowledge of ML and the formalities of MLlike typesystems.
Disclaimer: The few “proofs” given are not completely rigorous. The biggest elephant in the room is that we have not formally defined the behavior of our system, and instead nebulously pretend that we’ve grafted things onto OCaml (or SML). The sections marked as “proof” are intended as sketches that may be superimposed onto a formal system with similar rules. Please contact me if you spot an error; a major reason I set out to write this was to confirm to myself that I properly understand these concepts.
Motivation
Typesafe Lists
Consider the function hd : 'a list > 'a
that returns the first element of a
list. Typically, this function is only defined on nonempty lists, raising some
form of error when given an empty input. One way around this is to instead use
a type such as hd_opt : 'a list > 'a option
, which forces the programmer to
handle the empty case explicitly. However, this is no better than pattern
matching on the list originally, and indeed only serves to delay the
boilerplate in cases in which the list is provably nonempty. For example,
consider the following (contrived) example:
let v : int list = map foo [1,2,3,4] in
match hd_option v with
 Some x > do_something x
 None > assert false
It is a feature of MLinspired languages that, in cases like this, we generally
have no choice but to pattern match and think about what to do when
hd_option v
is None
. In this case, however, we know that the value v
is
nonempty – map f
is
a function returning a list of the same length as its input, so using this
function on [1,2,3,4]
gives a list of length exactly 4. However, this fact
isn’t actually recorded in the type of map
anywhere – it is opaque to the
typesystem. Because of this, the typesystem cannot prove that v
is nonempty,
and so the “safe” thing to do is pattern match and explicitly mark the
extraneous case as unreachable. As you might expect, this boilerplate becomes
annoying very quickly, and so nearly every standard library for these languages
will provide unsafe function wrappers for this pattern (Option.valOf
in
OCaml, for example). But in using these functions, we remove the safety net
provided by the compiler and typesystem, and invite runtime exceptions that
such constructs are designed to avoid (see: option types vs pervasive use of
null
in other languages).
Instead, it would be ideal to restrict the type of hd
to only act on lists
that are inhabited, so the compiler will throw an error if we attempt to
blindly take the first element of a potentially empty list. By lifting the
information about whether a list is empty into the typesystem, we can then
allow the typechecker to construct proofs that a given invocation of hd
is
definitely safe (if we adjust other functions to match, of course).
Attempt 1: Generalized Algebraic Datatypes
A somewhat commonlyseen ``solution’’ to this issue that you might see uses GADTs to attach type information to the particular constructor. For example:
type empty
type nonempty
type ('a, _) list =
 Nil : ('a, empty) list
 (::) : 'a * ('a, 'b) list > ('a, nonempty) list
Then, we can note that map
does not change the emptiness of its argument by
typeannotating it like so:
val map : ('a > 'b) > ('a, 'e) list > ('b, 'e) list
where, because the output type 'e
must be the same as the input type 'e
, we
know that if the input list is nonempty
, the output must also be nonempty
.
This works great if we know that the output emptiness is fixed, or is the same
as the input list. However, we begin to run into issues where the relationship
between the input and output emptiness is not constant or identity. For
example, how would we encode the type of append
?
val append: ('a, 'e1) list > ('a, 'e2) list > ('a, ???) list
In fact, we do know something about the output type in this case – the
output of append
is empty
iff both inputs are. However, MLstyle
typesystems lack support for typelevel lambdas, and so we have no way to
express this type without employing more trickery.
Even worse, what about functions where the output has no relation to the input?
val filter: ('a > bool) > ('a, 'e) list > ('a, ???) list
Here, we’re completely up the creek – there’s no way for us to know whether the filtered list is empty.
One way around this is to use
higherranked polymorphism and
continuation passing style to ``capture’’ the output. By making the type of
filter
look like
val filter: ('a > bool) > ('a, 'e) list >
(forall b . ('a, b) list > 'c) >
'c
we can force consumers of this function to provide a handler that is able to handle both cases. The extra function argument at the end is the “continuation” that consumes the list output by filter.
This is stylistically jarring, however, and MLstyle languages generally don’t support or require jumping through some hoops to encode rank N types.
Attempt 2: Sum Types
The fundamental problem with append
and filter
in the previous example is
that, in different cases, these functions must return different types. This
is generally forbidden^{1}, and so we’re stuck.
On the other hand, we have a mechanism by which we can join two disparate types – sum types! A generic list is either an empty list, or an inhabited list. So we come up with
type 'a emptylist = nil
and 'a inhablist = (::) of 'a * 'a list
and 'a list = E of 'a emptylist
 N of 'a inhablist
and so we can express functions like filter
by returning an 'a list
,
whereas functions like hd
must take in 'a inhablist
as input.
This is a lot of code for a relatively simple idea, though! In addition to the mutually recursive types, here’s an example of using this type:
let rec map f L =
match L with
 E nil > E nil
 N (x :: xs) > N (f x :: map f xs)
Notice the extra uses of E
and N
. This adds clutter without really adding
meaning, and gets increasingly cumbersome as we write more complex functions.
In addition, you may have noticed that this doesn’t even solve the original
problem! When we call map f [1,2,3,4]
, we get back a value of type list
,
which doesn’t tell us that the list is nonempty!
Really, what we want is a way to express that a nonempty list is the same as a regular list, but with some extra information (particularly, that the list is nonempty).
The Subtyping Relation
Let us introduce a new judgment into our typesystem. Let \(\tau_1 <: \tau_2\) (pronounced “extends”, as in “\(\tau_1\) extends \(\tau_2\)”) be a judgment claiming pretty much exactly what we said above, that a value of type \(\tau_1\) is equivalent to a value of type \(\tau_2\), with some more specific information attached. This leads to the following typing rule:
\[\frac{\Gamma \vdash e : \tau_1\ \ \ \tau_1 <: \tau_2}{\Gamma \vdash e : \tau_2}\]This is known as the subsumption rule, where we allow terms to automatically
“upcast” from a subtype to a supertype. Using our intuition about what <:
means, this should make sense – if a \(\tau_1\) is a \(\tau_2\) with some extra
information, then we should be able to recover the \(\tau_2\) by discarding that
extra specificity.
Another important property to maintain with this relation is its transitivity. In fact, if we are only concerned with typing specific expressions, then this property is irrelevant, as we can already show that \(e:\tau_1\), \(\tau_1 <: \tau_2\) and \(\tau_2 <: \tau_3\) implies \(e:\tau_3\).
Exercise: Give a derivation showing this.
When discussing the types themselves, however, it is often convenient to add a rule stating this directly:
\[\frac{\tau_1 <: \tau_2\ \ \ \tau_2 <: \tau_3}{\tau_1 <: \tau_3}\]To solve our list problem, then, we want 'a nlist
to be a subtype of
'a list
, and restructure all our functions accordingly. Before we address
this directly, however, we need to explore how, exactly, this relation should
behave.
Structural Subtyping
In any particular subtyped calculus, we might consider baking in forms of
semantic subtyping, where the relationship between two given types is based
on the fundamental properties of our primitive types. For example, convention
claims that \(\mathbb{N}\), the set of natural numbers, is a subset of
\(\mathbb{R}\), the set of reals. Then, assuming we have analogous types nat
and float
in our language (using floating point numbers to approximate the
reals), we might decide that nat <: float
^{2}.
However, in the absence of such specificity about our basic types, let us instead examine what properties might be derived structurally, examining only the form of the types themselves.
Products
Consider the following two types:
type t1 = { l1 : a type t2 = { l1 : a
; l2 : b ; l2 : b
} ; l3 : c
}
(note that we’re using labeled products, otherwise known as “records” instead of standard tuples, because it makes the semantics much clearer)
In keeping with our intuition that “the subtype is the same as the supertype
but with more information”, we notice that any value of type t2
has all the
fields of t1
with the same types, but also has some data l3
. We thus add
a “widthsubtyping” rule for records:
saying that a record \(\rho_1\) is a subtype of another record \(\rho_2\) when \(\rho_1\) contains all the same fields+types as \(\rho_2\).
Sums
Similarly, suppose we have the following two labeled variants:
type t1 = L1 of a type t2 = L1 of a
 L2 of b  L2 of b
 L3 of c
Which direction should the subtyping go? We might claim that a value of type
t2
contains “extra” information compared to t1
by noting that t2
has an
extra variant, so a t2
is a t1
but may be different. However, this is
wrong, as demonstrated by the following snippet:
match v with
 L1 x > do_something x
 L2 y > do_something_else y
If t2 <: t1
truly, then it would be valid for v
to be of the form L3 z
,
as this is certainly of type t2
and therefore type t1
. But this would be
disastrous, as the match statement does not specify what to do in the L3
case!
We conclude by noting that the other direction is fine –
match v with
 L1 x > do_something_1 x
 L2 y > do_something_2 y
 L3 z > do_something_3 z
If v : t1
, then we know that it is restricted to be an L1
or L2
variant,
so this match statement can never get stuck. The “extra information” gained
from having a t1
over a t2
, then, can be viewed as the information that
a t1
is not an L3
variant! So t1 <: t2
.
Another way to view this is that, in many ways, sum types are “flipped around” compared to product types. Categorically speaking, we claim that sums and products are dual to each other (in fact, you’ll often find sums referred to as “coproducts”, to make this relationship explicit). It makes sense, then, that a “bigger” sum is related to a “smaller” sum in the opposite of the way that bigger and smaller products relate.
Variance
The next topic to consider relates to our type constructors. The exact question we will be addressing is
When is \(\tau_1\ t <: \tau_2\ t\), for a particular type constructor \(t\)?
To answer this question, we typically must examine not only the internals of the constructor \(t\), but also the relationship between \(\tau_1\) and \(\tau_2\). The way that the relationship between \(\tau_1\) and \(\tau_2\) affects the relationship between \(\tau_1\ t\) and \(\tau_2\ t\) is known as variance.
Lists
As we began this discussion by talking about lists, it seems only right that we
consider first the variance properties of the (idealized) type 'a list
.
The naive answer might be that \(\tau_1\ list <: \tau_2\ list\) when \(\tau_1 <: \tau_2\). This is actually the correct thing to do – if \(\tau_1 <: \tau_2\), then we can view all elements of a \(\tau_1\ list\) as a \(\tau_2\), thus giving us a \(\tau_2 \ list\), so we might gain the rule
\[\frac{\tau_1 <: \tau_2}{\tau_1\ list <: \tau_2\ list}\]In fact, we can generalize this to “containers” in general.
Theorem (Covariance of sources) If there exists a function of most general type \(\forall a.(a\ t \rightarrow a)\) that returns a value for at least one input, then \(\tau_1\ t <: \tau_2\ t\) implies \(\tau_1 <: \tau_2\).
Proof.
Let \(f, v, t, \tau_1, \tau_2\) be such that \(f: \forall a.(a\ t \rightarrow a)\), \(v: \tau_1\ t\) and \(\tau_1\ t <: \tau_2\ t\).
By subsumption, \(v: \tau_2\ t\), so \(f\ v : \tau_2\).
Alternatively, \(f\ v : \tau_1\).
Because \(f\) is universally quantified, it cannot rely on any properties of \(\tau_1\) or \(\tau_2\). But then, \(f\ v\) is arbitrary. In particular, any object of type \(\tau_1\) can be written as \(f\ v\) for an appropriate \(v\).
But \(f\ v\) is the same object regardless of whether \(v\) is viewed as a \(\tau_1\ t\) or a \(\tau_2\ t\), so we must be able to view arbitrary objects of type \(\tau_1\) as objects of type \(\tau_2\).
\[\square\]Functions
Functions are a strange case, because a function arrow really has two type variables, the input and the output. We’ll examine these one at a time.
Consider types \(\tau \rightarrow \tau_1\) and \(\tau \rightarrow \tau_2\), where \(\tau\) is fixed. If \(\tau\) is a type inhabited with an object \(\sigma\), then certainly \(\lambda f : f \sigma\) is a function of type \(\forall a . ((\tau \rightarrow a) \rightarrow a)\), so by the above we know that function arrows should be covariant in their second argument. In fact, by a slightly modified argument, we can show this even when \(\tau\) is the empty type \(\bot\) (hint: suppose that there existed some function of type \((\bot \rightarrow \tau_1) \rightarrow \rho\) and consider what behavior this function could possibly have).
The first argument, on the other hand, actually flips this behavior. If we have a function of type \(\tau_1 \rightarrow \tau\), we’re eventually going to want to call it on a value of type \(\tau_1\) (formally we might say that we are allowed to eliminate this function by passing it an input). For this to be safe, we must know that any subtype of the type \(\tau_1 \rightarrow \tau\) must be able to safely handle a \(\tau_1\). But this means that subtypes of \(\tau_1 \rightarrow \tau\) must have supertypes of \(\tau_1\) on the left (as otherwise they might perform an operation that is only implemented by the subtype). This gives us the rule
\[\frac{\tau_2 <: \tau_1}{\tau_1 \rightarrow \tau <: \tau_2 \rightarrow \tau}\]This phenomenon, in which we reverse the relationship between \(\tau_1\) and \(\tau_2\) to \(\tau_1\ t\) and \(\tau_2\ t\) is known as contravariance.
By duality, we might expect to be able to produce a similar theorem to the one given above, perhaps by the existence of a function of type \(\forall a. (a \rightarrow a\ t)\). Unfortunately, this doesn’t quite work – the function \(\lambda x.[x]\) has type \(\forall a.(a \rightarrow a\ list)\), but lists are definitely not contravariant.
Proposition (Contravariance of sinks) If there exists a type \(c\) such that there exists a function of type \(f\) such that \(f : \forall a . (a \rightarrow a\ t \rightarrow c)\), then \(\tau_1\ t <: \tau_2\ t\) implies \(\tau_2 <: \tau_1\).
This is marked as a proposition, rather than a theorem, because it is false as stated. For example, the function \(\lambda \__1.\lambda \__2.()\) has type \(\forall a . \forall b . (a \rightarrow b \rightarrow unit)\), which of course specializes to \(\forall a . (a \rightarrow a\ t \rightarrow c)\) for any \(t\). It may be enough to assert that \(f\) “meaningfully uses” its arguments, but this is difficult to formalize. That said, I’m not actually sure that this is true either – if you can prove it (even semiinformally in our MLlike world), please let me know!
Mutable References
It turns out that mutable state (important to imperative programming) interacts in an interesting way with subtyping variance. Consider the types \(\tau_1\ ref\) and \(\tau_2\ ref\), referring to the types of mutable reference cells.
Notice that the dereference function !
has type \(\forall a . (a\ ref
\rightarrow a)\), so by covariance of sources (I promise I will explain the
terms “source” and “sink” shortly), we have that \(\tau_1\ ref <: \tau_2
\ ref\) implies \(\tau_1 <: \tau_2\).
However, \(\tau_1 <: \tau_2\) is not sufficient to conclude that \(\tau_1\ ref
<: \tau_2\ ref\). Let A <: B
be types, and consider the following code:
let magic (r: A ref) (v: B) : A =
(r : B ref) < v;
!r
Remember that the rule of subsumption means that annotating r
as type B ref
is a noop (just a hint to the compiler). But !r
(= v
) is definitely of
type A
, as r : A ref
, even though v
is already of type B
! The only way
this is safe is if B <: A
.
In fact, we need both directions to assert that \(\tau_1\ ref <: \tau_2\ ref\). This can only happen if \(\tau_1\) and \(\tau_2\) are the same type up to some definition of “same” (reordering of labels, etc)^{3}. This property is known as invariance.
One way to think of covariance and contravariance is to discuss what you can
do with an 'a t
. If having an 'a t
possibly gives you an 'a
, it
is called a source, and is therefore covariant by the proof above. Otherwise,
if an 'a t
consumes an 'a
, then it is called a sink, which is
contravariant using the same reasoning as with function arguments. However,
references are both a source (by dereferencing) and a sink (by writing a
value to the cell). We don’t run into this issue with immutable containers
because growing a container necessarily changes the type of the (resulting)
container, but writing a value to a ref cell doesn’t retroactively change its
type.
Bivariance
Thus far, we’ve seen examples of both covariant and contravariant type constructors, along with a type constructor that is both (invariant). You may then ask whether there exists a type of variance that is neither – that is, if there exists some \(t\) such that \(\tau_1\ t <: \tau_2\ t\) regardless of the relation between \(\tau_1\) and \(\tau_2\).
In fact, there is! This is the case when an object of type \(\tau\ t\) is entirely unrelated to objects of type \(\tau\). This is most often seen with phantom type variables intended to aid typechecking. Such types are known as bivariant in the given input.
We can show that this is the only possibility for a type to be bivariant by appealing to the source theorem and sink proposition above. Regarding the latter case, if \(\tau_1\ t <: \tau_2\ t\) always, then we can sink any value into a value of type \(\tau\ t\) (for any \(\tau\)!). This can only possibly be safe if sinking a value is a noop.
Exercise: Provide a similar argument to the above without the source theorem to show that a bivariant type constructor cannot be a source. (Hint: let \(\tau = \bot\))
Bounded Quantification
You may notice that, after all that, we still haven’t solved the problem we
initially set out to! Even if we construct a type 'a nlist
such that 'a
nlist <: 'a list
, we still can’t tell the typechecker that map
only
returns an nlist
if given an nlist
!
The solution is to adjust how we approach polymorphism. Previously, MLstyle languages supported only unbounded quantification in their types. Polymorphism was allornothing; either you had a specific type or you must be generic over all possible types. This is denoted via the implicit \(\forall\) seen in polymorphic ML types. However, now we have interesting things to say about types and how they relate to each other, which in turn enrichens our type language.
In particular, we can now have types of the form \(\forall\{t : t <: s\}.\tau\) and \(\forall\{t : s <: t\}.\tau\), where we can upper or lowerbound the range of types we are quantifying over.
This allows us to solve our list problem relatively elegantly via a combination
of subtyping and GADTs by declaring types \(unknown\), \(nonempty <: unknown\) and
\(empty <: unknown\). Then map
can have type \(\forall\{e : e <: unknown\} .
\forall a . ((a \rightarrow b) \rightarrow (a,e) list \rightarrow (b,e) list)\).
Exercise: Show that you can express the type of append
under this scheme.
You may assume any reasonable extension of the syntax to, say, provide an upper
and lower bound at the same time, or to allow multiple bounds to apply at once.
Many modernday subtyping schemes allow quantification in this way. For
example, in Java, there exists a type List<? extends T>
for \(\forall\{t:t <:
T\}.t\ list\) and List<? super T>
for \(\forall\{t:T <: t\}.t\ list\). However,
adding expressiveness to a typelevel language often leads to trouble. In fact,
typechecking a system with bounded quantification in both directions is
undecidable.
References
[1] Pierce, Benjamin C. Types and Programming Languages. The MIT Press, 2002.
[2] Reynolds, John C. “Design of the Programming Language Forsythe.” Algollike Languages, 28 June 1996.
[3] Grigore, Radu. “Java Generics Are Turing Complete.” ACM SIGPLAN Notices, vol. 52, no. 1, 1 Jan. 2017, pp. 73–85.

There are exceptions stemming from type shenanigans (oftentimes via GADTs!), but this isn’t one of them. ↩

This may seem alarming –
nat
is certainly not a subset offloat
on any real computer architecture! We will gloss over this issue with regards to subsumption for now, as it is largely inconsequential without discussion of syntax forms and dynamic semantics. Attempting to address this leads to “coercive” subtyping, where casts must be made explicit. ↩ 
We should be careful that we don’t claim that \(\tau_1 <: \tau_2\) and \(\tau_2 <: \tau_1\) implies \(\tau_1 \cong \tau_2\) (isomorphism) unduly, as isomorphism often works strangely with subsumptive subtyping. For example, \((a,(b,c)) \cong ((a,b),c)\), but in these cases the term \(v.0\) is ambiguous. The difference is technical and difficult to reason about without formally defining semantics. ↩