(These notes are adapted from a lecture I gave for a student-run class on type theory. See also the pdf version)
The purpose of this text is to provide a high-level overview of the motivations behind and basic theory of a subsumptive subtyping semantics over a simply-typed lambda calculus similar to ML. In the interest of brevity, we will elide a formal discussion of syntax forms, dynamics, etc., and simply discuss what properties this system should have if embedded into ML. As such, you should have some basic knowledge of ML and the formalities of ML-like typesystems.
Disclaimer: The few "proofs" given are not completely rigorous. The biggest elephant in the room is that we have not formally defined the behavior of our system, and instead nebulously pretend that we've grafted things onto OCaml (or SML). The sections marked as "proof" are intended as sketches that may be superimposed onto a formal system with similar rules. Please contact me if you spot an error; a major reason I set out to write this was to confirm to myself that I properly understand these concepts.
Motivation
Typesafe Lists
Consider the function hd : 'a list -> 'a that returns the first element of a list. Typically, this function is only defined on non-empty lists, raising some form of error when given an empty input. One way around this is to instead use a function such as hd_opt : 'a list -> 'a option, which forces the programmer to handle the empty case explicitly. However, this is no better than pattern matching on the list originally, and indeed only serves to delay the boilerplate in cases in which the list is provably non-empty. For example, consider the following (contrived) snippet:
let v : int list = map foo [1,2,3,4] in
match hd_opt v with
  Some x -> do_something x
| None -> assert false
It is a feature of ML-inspired languages that, in cases like this, we generally have no choice but to pattern match and think about what to do when hd_opt v is None. In this case, however, we know that the value v is non-empty -- map f is a function returning a list of the same length as its input, so using this function on [1,2,3,4] gives a list of length exactly 4. However, this fact isn't actually recorded in the type of map anywhere -- it is opaque to the typesystem. Because of this, the typesystem cannot prove that v is non-empty, and so the "safe" thing to do is pattern match and explicitly mark the extraneous case as unreachable. As you might expect, this boilerplate becomes annoying very quickly, and so nearly every standard library for these languages will provide unsafe function wrappers for this pattern (Option.valOf in SML and Option.get in OCaml, for example). But in using these functions, we remove the safety net provided by the compiler and typesystem, and invite the runtime exceptions that such constructs are designed to avoid (see: option types vs pervasive use of null in other languages).
Instead, it would be ideal to restrict the type of hd to only act on lists that are inhabited, so the compiler will throw an error if we attempt to blindly take the first element of a potentially empty list. By lifting the information about whether a list is empty into the typesystem, we can then allow the typechecker to construct proofs that a given invocation of hd is definitely safe (if we adjust other functions to match, of course).
Attempt 1: Generalized Algebraic Datatypes
A commonly-seen "solution" to this issue uses GADTs to attach emptiness information to each constructor. For example:
type empty
type nonempty
type ('a, _) list =
  | Nil : ('a, empty) list
  | (::) : 'a * ('a, 'b) list -> ('a, nonempty) list
Then, we can note that map does not change the emptiness of its argument by type-annotating it like so:
val map : ('a -> 'b) -> ('a, 'e) list -> ('b, 'e) list
where, because the output type 'e must be the same as the input type 'e, we know that if the input list is nonempty, the output must also be nonempty.
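To make this concrete, here is a minimal OCaml sketch of the idea (the name xlist is hypothetical, chosen to avoid clashing with the built-in list):
type empty
type nonempty

type ('a, 'e) xlist =
  | Nil : ('a, empty) xlist
  | Cons : 'a * ('a, _) xlist -> ('a, nonempty) xlist

(* hd only accepts lists whose index proves non-emptiness; the Nil
   case is statically unreachable and may be omitted entirely. *)
let hd : type a. (a, nonempty) xlist -> a = function
  | Cons (x, _) -> x

(* map preserves the emptiness index 'e, as in the annotation above. *)
let rec map : type a b e. (a -> b) -> (a, e) xlist -> (b, e) xlist =
  fun f l ->
    match l with
    | Nil -> Nil
    | Cons (x, xs) -> Cons (f x, map f xs)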
This works great if we know that the output emptiness is fixed, or is the same as the input list's. However, we begin to run into issues where the relationship between the input and output emptiness is not constant or identity. For example, how would we encode the type of append?
val append: ('a, 'e1) list -> ('a, 'e2) list -> ('a, ???) list
In fact, we do know something about the output type in this case -- the output of append is empty iff both inputs are. However, ML-style typesystems lack support for type-level lambdas, and so we have no way to express this type without employing more trickery.
Even worse, what about functions where the output has no relation to the input?
val filter: ('a -> bool) -> ('a, 'e) list -> ('a, ???) list
Here, we're completely up the creek -- there's no way for us to know whether the filtered list is empty.
One way around this is to use higher-ranked polymorphism and continuation passing style to "capture" the output. By making the type of filter look like
val filter: ('a -> bool) -> ('a, 'e) list ->
            (forall b . ('a, b) list -> 'c) -> 'c
we can force consumers of this function to provide a handler that is able to handle both cases. The extra function argument at the end is the "continuation" that consumes the list output by filter.
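In OCaml, one way to encode this rank-2 type is a record with a polymorphic field; here is a sketch against the hypothetical xlist type from earlier:
(* The field run is universally quantified over the emptiness index,
   playing the role of the forall in the type above. *)
type ('a, 'c) consumer = { run : 'e. ('a, 'e) xlist -> 'c }

let rec filter : type a c e.
    (a -> bool) -> (a, e) xlist -> (a, c) consumer -> c =
  fun pred l k ->
    match l with
    | Nil -> k.run Nil
    | Cons (x, xs) ->
        if pred x
        then filter pred xs { run = (fun ys -> k.run (Cons (x, ys))) }
        else filter pred xs k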
This is stylistically jarring, however, and ML-style languages generally either don't support rank-N types or require jumping through some hoops (like the record above) to encode them.
Attempt 2: Sum Types
The fundamental problem with append and filter in the previous example is that, in different cases, these functions must return different types. This is generally forbidden1, and so we're stuck.
On the other hand, we have a mechanism by which we can join two disparate types -- sum types! A generic list is either an empty list, or an inhabited list. So we come up with
type 'a emptylist = nil
and 'a inhablist = (::) of 'a * 'a list
and 'a list = E of 'a emptylist
            | N of 'a inhablist
and so we can express functions like filter by returning an 'a list, whereas functions like hd must take in 'a inhablist as input.
This is a lot of code for a relatively simple idea, though! Beyond the mutually recursive types themselves, every use of the type must thread the extra constructors; here's map, for example:
let rec map f l =
  match l with
  | E nil -> E nil
  | N (x :: xs) -> N (f x :: map f xs)
Notice the extra uses of E and N. This adds clutter without really adding meaning, and gets increasingly cumbersome as we write more complex functions. In addition, you may have noticed that this doesn't even solve the original problem! When we call map f [1,2,3,4], we get back a value of type int list, which doesn't tell us that the list is non-empty!
Really, what we want is a way to express that a non-empty list is the same as a regular list, but with some extra information (particularly, that the list is non-empty).
The Subtyping Relation
Let us introduce a new judgment into our typesystem. Let \(\tau_1 <: \tau_2\) (pronounced "extends", as in "\(\tau_1\) extends \(\tau_2\)") be a judgment claiming pretty much exactly what we said above, that a value of type \(\tau_1\) is equivalent to a value of type \(\tau_2\), with some more specific information attached. This leads to the following typing rule:
\[\frac{\Gamma \vdash e : \tau_1\ \ \ \tau_1 <: \tau_2}{\Gamma \vdash e : \tau_2}\]
This is known as the subsumption rule, where we allow terms to automatically "upcast" from a subtype to a supertype. Using our intuition about what <: means, this should make sense -- if a \(\tau_1\) is a \(\tau_2\) with some extra information, then we should be able to recover the \(\tau_2\) by discarding that extra specificity.
Another important property to maintain with this relation is transitivity. In fact, if we are only concerned with typing specific expressions, then an explicit transitivity rule is unnecessary, as we can already show that \(e:\tau_1\), \(\tau_1 <: \tau_2\) and \(\tau_2 <: \tau_3\) implies \(e:\tau_3\).
Exercise: Give a derivation showing this.
When discussing the types themselves, however, it is often convenient to add a rule stating this directly:
\[\frac{\tau_1 <: \tau_2\ \ \ \tau_2 <: \tau_3}{\tau_1 <: \tau_3}\]
To solve our list problem, then, we want 'a nlist to be a subtype of 'a list, and to re-structure all our functions accordingly. Before we address this directly, however, we need to explore how, exactly, this relation should behave.
Structural Subtyping
In any particular subtyped calculus, we might consider baking in forms of semantic subtyping, where the relationship between two given types is based on the fundamental properties of our primitive types. For example, convention claims that \(\mathbb{N}\), the set of natural numbers, is a subset of \(\mathbb{R}\), the set of reals. Then, assuming we have analogous types nat and float in our language (using floating point numbers to approximate the reals), we might decide that nat <: float2.
However, in the absence of such specificity about our basic types, let us instead examine what properties might be derived structurally, examining only the form of the types themselves.
Products
Consider the following two types:
type t1 = { l1 : a ; l2 : b }
type t2 = { l1 : a ; l2 : b ; l3 : c }
(note that we're using labeled products, otherwise known as "records", instead of standard tuples, because it makes the semantics much clearer)
In keeping with our intuition that "the subtype is the same as the supertype but with more information", we notice that any value of type t2 has all the fields of t1 with the same types, but also has some data l3. We thus add a "width-subtyping" rule for records:
\[\frac{}{\{ \ell_1 : \tau_1, \dots, \ell_n : \tau_n, \dots \} <: \{ \ell_1 : \tau_1, \dots, \ell_n : \tau_n\}}\]
saying that a record \(\rho_1\) is a subtype of another record \(\rho_2\) when \(\rho_1\) contains all the same fields+types as \(\rho_2\).
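OCaml's records don't actually support width subtyping, but its structural object types do; here is a small sketch of the rule in action (note that OCaml makes the upcast explicit with :>, where our subsumptive system would be silent):
type t1 = < l1 : int ; l2 : string >
type t2 = < l1 : int ; l2 : string ; l3 : bool >

let v2 : t2 =
  object
    method l1 = 42
    method l2 = "hello"
    method l3 = true
  end

(* Width subtyping: t2 <: t1, witnessed by an explicit coercion. *)
let v1 : t1 = (v2 :> t1)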
Sums
Similarly, suppose we have the following two labeled variants:
type t1 = L1 of a
        | L2 of b

type t2 = L1 of a
        | L2 of b
        | L3 of c
Which direction should the subtyping go? We might claim that a value of type t2 contains "extra" information compared to t1 by noting that t2 has an extra variant, so a t2 is a t1 but may be different. However, this is wrong, as demonstrated by the following snippet:
match v with
| L1 x -> do_something x
| L2 y -> do_something_else y
If t2 <: t1 truly, then it would be valid for v to be of the form L3 z, as this is certainly of type t2 and therefore of type t1. But this would be disastrous, as the match statement does not specify what to do in the L3 case!
We conclude by noting that the other direction is fine --
match v with
| L1 x -> do_something_1 x
| L2 y -> do_something_2 y
| L3 z -> do_something_3 z
If v : t1, then we know that it is restricted to be an L1 or L2 variant, so this match statement can never get stuck. The "extra information" gained from having a t1 over a t2, then, can be viewed as the information that a t1 is not an L3 variant! So t1 <: t2.
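OCaml's polymorphic variants exhibit exactly this relation; a small sketch (again with an explicit coercion standing in for subsumption):
type t1 = [ `L1 of int | `L2 of string ]
type t2 = [ `L1 of int | `L2 of string | `L3 of bool ]

(* The smaller sum coerces into the bigger one: t1 <: t2. *)
let upcast (v : t1) : t2 = (v :> t2)

(* A match over t1 need not mention `L3 at all. *)
let show (v : t1) : string =
  match v with
  | `L1 _ -> "L1"
  | `L2 _ -> "L2"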
Another way to view this is that, in many ways, sum types are "flipped around" compared to product types. Categorically speaking, we claim that sums and products are dual to each other (in fact, you'll often find sums referred to as "coproducts", to make this relationship explicit). It makes sense, then, that a "bigger" sum is related to a "smaller" sum in the opposite of the way that bigger and smaller products relate.
Variance
The next topic to consider relates to our type constructors. The exact question we will be addressing is
When is \(\tau_1\ t <: \tau_2\ t\), for a particular type constructor \(t\)?
To answer this question, we typically must examine not only the internals of the constructor \(t\), but also the relationship between \(\tau_1\) and \(\tau_2\). The way that the relationship between \(\tau_1\) and \(\tau_2\) affects the relationship between \(\tau_1\ t\) and \(\tau_2\ t\) is known as variance.
Lists
As we began this discussion by talking about lists, it seems only right that we first consider the variance properties of the (idealized) type 'a list.
The naive answer might be that \(\tau_1\ list <: \tau_2\ list\) when \(\tau_1 <: \tau_2\). This is actually the correct thing to do -- if \(\tau_1 <: \tau_2\), then we can view all elements of a \(\tau_1\ list\) as a \(\tau_2\), thus giving us a \(\tau_2 \ list\), so we might gain the rule
\[\frac{\tau_1 <: \tau_2}{\tau_1\ list <: \tau_2\ list}\]
In fact, we can generalize this to arbitrary "containers".
Theorem (Covariance of sources) If there exists a function of most general type \(\forall a.(a\ t \rightarrow a)\) that returns a value for at least one input, then \(\tau_1\ t <: \tau_2\ t\) implies \(\tau_1 <: \tau_2\).
Proof.
Let \(f, v, t, \tau_1, \tau_2\) be such that \(f: \forall a.(a\ t \rightarrow a)\), \(v: \tau_1\ t\) and \(\tau_1\ t <: \tau_2\ t\).
By subsumption, \(v: \tau_2\ t\), so \(f\ v : \tau_2\).
Alternatively, \(f\ v : \tau_1\).
Because \(f\) is universally quantified, it cannot rely on any properties of \(\tau_1\) or \(\tau_2\). But then, \(f\ v\) is arbitrary. In particular, any object of type \(\tau_1\) can be written as \(f\ v\) for an appropriate \(v\).
But \(f\ v\) is the same object regardless of whether \(v\) is viewed as a \(\tau_1\ t\) or a \(\tau_2\ t\), so we must be able to view arbitrary objects of type \(\tau_1\) as objects of type \(\tau_2\).
\[\square\]
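OCaml's built-in list behaves this way; a sketch using polymorphic variants for the subtype pair (the names small and big are illustrative):
type small = [ `A ]
type big = [ `A | `B ]

let xs : small list = [ `A; `A ]

(* Covariance of list: small <: big gives small list <: big list. *)
let ys : big list = (xs :> big list)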
Functions
Functions are a strange case, because a function arrow really has two type variables, the input and the output. We'll examine these one at a time.
Consider types \(\tau \rightarrow \tau_1\) and \(\tau \rightarrow \tau_2\), where \(\tau\) is fixed. If \(\tau\) is a type inhabited by an object \(\sigma\), then certainly \(\lambda f . f\ \sigma\) is a function of type \(\forall a . ((\tau \rightarrow a) \rightarrow a)\), so by the above we know that function arrows should be covariant in their second argument. In fact, by a slightly modified argument, we can show this even when \(\tau\) is the empty type \(\bot\) (hint: suppose that there existed some function of type \((\bot \rightarrow \tau_1) \rightarrow \rho\) and consider what behavior this function could possibly have).
The first argument, on the other hand, actually flips this behavior. If we have a function of type \(\tau_1 \rightarrow \tau\), we're eventually going to want to call it on a value of type \(\tau_1\) (formally we might say that we are allowed to eliminate this function by passing it an input). For this to be safe, we must know that any subtype of the type \(\tau_1 \rightarrow \tau\) must be able to safely handle a \(\tau_1\). But this means that subtypes of \(\tau_1 \rightarrow \tau\) must have supertypes of \(\tau_1\) on the left (as otherwise they might perform an operation that is only implemented by the subtype). This gives us the rule
\[\frac{\tau_2 <: \tau_1}{\tau_1 \rightarrow \tau <: \tau_2 \rightarrow \tau}\]
This phenomenon, in which the relationship between \(\tau_1\ t\) and \(\tau_2\ t\) reverses the relationship between \(\tau_1\) and \(\tau_2\), is known as contravariance.
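We can sketch this in OCaml, too (names illustrative): a handler for a bigger sum also serves as a handler for a smaller one.
type small = [ `A ]
type big = [ `A | `B ]

(* small <: big, so (big -> unit) <: (small -> unit). *)
let handle_big : big -> unit = function `A | `B -> ()
let handle_small : small -> unit = (handle_big :> small -> unit)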
By duality, we might expect to be able to produce a similar theorem to the one given above, perhaps by the existence of a function of type \(\forall a. (a \rightarrow a\ t)\). Unfortunately, this doesn't quite work -- the function \(\lambda x.[x]\) has type \(\forall a.(a \rightarrow a\ list)\), but lists are definitely not contravariant.
Proposition (Contravariance of sinks) If there exist a type \(c\) and a function \(f\) such that \(f : \forall a . (a \rightarrow a\ t \rightarrow c)\), then \(\tau_1\ t <: \tau_2\ t\) implies \(\tau_2 <: \tau_1\).
This is marked as a proposition, rather than a theorem, because it is false as stated. For example, the function \(\lambda \__1.\lambda \__2.()\) has type \(\forall a . \forall b . (a \rightarrow b \rightarrow unit)\), which of course specializes to \(\forall a . (a \rightarrow a\ t \rightarrow c)\) for any \(t\). It may be enough to assert that \(f\) "meaningfully uses" its arguments, but this is difficult to formalize. That said, I'm not actually sure that this is true either -- if you can prove it (even semi-informally in our ML-like world), please let me know!
Mutable References
It turns out that mutable state (important to imperative programming) interacts in an interesting way with subtyping variance. Consider the types \(\tau_1\ ref\) and \(\tau_2\ ref\), referring to the types of mutable reference cells.
Notice that the dereference function ! has type \(\forall a . (a\ ref \rightarrow a)\), so by covariance of sources (I promise I will explain the terms "source" and "sink" shortly), we have that \(\tau_1\ ref <: \tau_2\ ref\) implies \(\tau_1 <: \tau_2\).
However, \(\tau_1 <: \tau_2\) is not sufficient to conclude that \(\tau_1\ ref <: \tau_2\ ref\). Let A <: B be types, and consider the following code:
let magic (r : A ref) (v : B) : A =
  (r : B ref) := v;
  !r
Remember that the rule of subsumption means that annotating r as type B ref is a no-op (just a hint to the compiler). But !r (= v) is definitely of type A, as r : A ref, even though v is only known to be of type B! The only way this is safe is if B <: A.
In fact, we need both directions to assert that \(\tau_1\ ref <: \tau_2\ ref\). This can only happen if \(\tau_1\) and \(\tau_2\) are the same type up to some definition of "same" (reordering of labels, etc)3. This property is known as invariance.
One way to think of covariance and contravariance is to discuss what you can do with an 'a t. If having an 'a t possibly gives you an 'a, it is called a source, and is therefore covariant by the proof above. Otherwise, if an 'a t consumes an 'a, then it is called a sink, which is contravariant using the same reasoning as with function arguments. However, references are both a source (by dereferencing) and a sink (by writing a value to the cell). We don't run into this issue with immutable containers because growing a container necessarily changes the type of the (resulting) container, but writing a value to a ref cell doesn't retroactively change its type.
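OCaml surfaces this trichotomy directly as variance annotations on abstract types; a brief sketch:
module type VARIANCE = sig
  type +'a source   (* covariant: an 'a source can yield 'a values  *)
  type -'a sink     (* contravariant: an 'a sink consumes 'a values *)
  type 'a cell      (* unannotated: invariant, like 'a ref          *)
end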
Bivariance
Thus far, we've seen examples of both covariant and contravariant type constructors, along with a type constructor that is both (invariant). You may then ask whether there exists a type of variance that is neither -- that is, if there exists some \(t\) such that \(\tau_1\ t <: \tau_2\ t\) regardless of the relation between \(\tau_1\) and \(\tau_2\).
In fact, there is! This is the case when an object of type \(\tau\ t\) is entirely unrelated to objects of type \(\tau\). This is most often seen with phantom type variables intended to aid typechecking. Such types are known as bivariant in the given input.
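Here is a minimal OCaml sketch of a phantom parameter (all names hypothetical). The parameter 'access never occurs in the representation, so it says nothing about the underlying value:
type readonly
type readwrite

(* 'access is a phantom: it appears only on the left-hand side. *)
type ('access, 'a) guarded = Guarded of 'a

let make (x : 'a) : (readwrite, 'a) guarded = Guarded x

(* Because 'access is phantom, re-tagging is just re-wrapping. *)
let freeze (Guarded x : (readwrite, 'a) guarded) : (readonly, 'a) guarded =
  Guarded x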
We can show that this is the only possibility for a type to be bivariant by appealing to the source theorem and sink proposition above. Regarding the latter case, if \(\tau_1\ t <: \tau_2\ t\) always, then we can sink any value into a value of type \(\tau\ t\) (for any \(\tau\)!). This can only possibly be safe if sinking a value is a no-op.
Exercise: Provide a similar argument to the above without the source theorem to show that a bivariant type constructor cannot be a source. (Hint: let \(\tau = \bot\))
Bounded Quantification
You may notice that, after all that, we still haven't solved the problem we initially set out to solve! Even if we construct a type 'a nlist such that 'a nlist <: 'a list, we still can't tell the typechecker that map only returns an nlist if given an nlist!
The solution is to adjust how we approach polymorphism. Previously, ML-style languages supported only unbounded quantification in their types. Polymorphism was all-or-nothing: either you had a specific type, or you had to be generic over all possible types. This is denoted via the implicit \(\forall\) seen in polymorphic ML types. However, now we have interesting things to say about types and how they relate to each other, which in turn enriches our type language.
In particular, we can now have types of the form \(\forall\{t : t <: s\}.\tau\) and \(\forall\{t : s <: t\}.\tau\), where we can upper- or lower-bound the range of types we are quantifying over.
This allows us to solve our list problem relatively elegantly via a combination of subtyping and GADTs by declaring types \(unknown\), \(nonempty <: unknown\) and \(empty <: unknown\). Then map can have type \(\forall\{e : e <: unknown\} . \forall a . \forall b . ((a \rightarrow b) \rightarrow (a,e)\ list \rightarrow (b,e)\ list)\).
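As an aside, OCaml's polymorphic-variant row bounds give a limited flavor of this: [< ...] acts as an upper bound on a row variable and [> ...] as a lower bound. A sketch:
(* Upper bound: accepts any variant type below [ `Empty | `Nonempty ]. *)
let describe : [< `Empty | `Nonempty ] -> string = function
  | `Empty -> "empty"
  | `Nonempty -> "nonempty"

(* Lower bound: the result can be used at any supertype of [ `Nonempty ]. *)
let tag : unit -> [> `Nonempty ] = fun () -> `Nonempty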
Exercise: Show that you can express the type of append under this scheme. You may assume any reasonable extension of the syntax to, say, provide an upper and lower bound at the same time, or to allow multiple bounds to apply at once.
Many modern-day subtyping schemes allow quantification in this way. For example, in Java, there exists a type List<? extends T> for \(\forall\{t:t <: T\}.t\ list\) and List<? super T> for \(\forall\{t:T <: t\}.t\ list\). However, adding expressiveness to a type-level language often leads to trouble. In fact, typechecking a system with bounded quantification in both directions is undecidable.
1. There are exceptions stemming from type shenanigans (oftentimes via GADTs!), but this isn't one of them.↩︎
2. This may seem alarming -- nat is certainly not a subset of float on any real computer architecture! We will gloss over this issue with regards to subsumption for now, as it is largely inconsequential without discussion of syntax forms and dynamic semantics. Attempting to address this leads to "coercive" subtyping, where casts must be made explicit.↩︎
3. We should be careful that we don't claim that \(\tau_1 <: \tau_2\) and \(\tau_2 <: \tau_1\) implies \(\tau_1 \cong \tau_2\) (isomorphism) unduly, as isomorphism often works strangely with subsumptive subtyping. For example, \((a,(b,c)) \cong ((a,b),c)\), but in these cases the term \(v.0\) is ambiguous. The difference is technical and difficult to reason about without formally defining semantics.↩︎