The Great Theorem Prover Challenge I

I was having a discussion with a coworker over lunch the other day about code reasoning and informal correctness (things like “it feels correct” and “it passes a very extensive array of tests”). At some point, the conversation strayed to the relative merits of imperative vs functional paradigms, and which one “feels” easier, as it always does. As a fairly loud proponent of functional-styled programming, I, of course, made the standard arguments about the heap and global state, etc, and was rebuffed by “the overhead of juggling abstractions”.

Now, I won’t get into why I think the response is bullshit (tl;dr if you have to “juggle” an abstraction, it’s not really a very good abstraction), but it did leave me somewhat at a loss as to how we were defining “feeling” easy. It’s a very fuzzy, subjective term, and not really one I’m prepared to go into at any depth.

This brings us to the subject of formal verification. One argument for powerful and expressive static typesystems is to give strong compile-time guarantees, which can aid in correctness. For example, in Python, if we create a function that operates primarily on integers, it may not work correctly or even just blow up if given, say, a string. This introduces a vector for bugs if we don’t properly check the return type of some other function, and can cause crashes or (worse) subtly wrong behavior. However, with a static typesystem, we can know that the function is never called on anything but a number – because otherwise the function wouldn’t typecheck, and the code wouldn’t compile at all!

Formal verification is taking this idea to its logical conclusion – what if we encode all correctness properties inherent to the code in some way that can be checked alongisde the code at compile time? Well, the first thing you’ll find is that you start running into Rice’s Theorem (or more generally, the halting problem), but you also find that there are a surprising number of things that can be done and ways to do them.

A few months ago, I found Hillel Wayne’s Great Theorem Prover Showdown on reddit, and found the included discussion quite interesting. After that argument with my friend, I decided that I’d also bite the bullet and try it for myself.

“After all,” I thought, “it’s functional programming. We do proofs like that for the entire semester in 15-150; how hard can it be to write an equivalent in a theorem proving language?”

Now, before I go any further, let me outline my previous personal experience with formal verification:

Note that none of the above even involved touching a theorem prover (the closest I had gotten was brute-forcing some logic problems using the Z3 Haskell bindings). Not that I’m about to let that stop me. So let’s dive into Leftpad, the first challenge. I chose to approach this with Liquid Haskell because it seemed to be the simpler of the two (Idris being the other option) to just pick up and get running.

The Challenge

You can follow my progress in this repository.

Our goal is to replicate the classic leftPad function that pads a string to a desired length.

The first thing we need to do is write the code itself. Most people attempting this challenge did some variant on a replicate solution, but I opted for a more naturally-recursive version (this ended up being a good decision).

leftPad :: Int -> a -> [a] -> [a]
leftPad n x xs = leftPad' k x xs
  where k = max 0 (n - size xs)

leftPad' 0 _ xs = xs
leftPad' n x xs = x : leftPad' (n-1) x xs

Now, let’s get onto actually proving this “correct”. Everyone with a hat in this game (including me!) will talk your ears off about how “correctness” doesn’t really mean anything without a “specification” to guide it. In this case, we’ve been helpfully provided an informal spec, but not really anything by way of a formalized contract to follow. That’s fine, part of our job is to turn informal specs into formal ones.

The specification for LeftPad, as given is:

Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing.

Ooookay. That’s a little vague. Breaking it down, we start by looking at the inputs – “given a padding character, a string, and a total length”. This gives us our input types:

A First Try

{-@ leftPad :: Int -> a -> [a] -> [a]
  @-}

It’s a start, but it’s not particularly useful. To really leverage the power of LH, we need to actually use our types to prove something! The point of leftPad is to ensure that a string is the correct length, so let’s try to prove that:

{-@ reflect leftPad @-}
{-@ leftPad :: n:Int -> x:a -> xs:[a] ->
      result:{ [a] |
        size result = max n (size xs)
      }
  @-}

What this means is that the result of leftPad will be of length n or the same length of the original string, whichever is larger. We also reflect our function, so it becomes a proper term in the logic.

According to some people, this is enough. The function is “obviously” correct. Let’s test that assumption – does Liquid Haskell think that we’re done?

Error: Liquid Type Mismatch

33 | leftPad n x xs = leftPad' k x xs
                       ^^^^^^^^^^^^^^^

    Inferred type
      VV : {v : [a] | [SNIPPED: big mess] }

    not a subtype of Required type
      VV : {VV : [a] | Main.size VV == Main.max n (Main.size xs)}

    In Context
      xs : {v : [a] | Main.size v >= 0 && len v >= 0}

      n : {n : Int | n >= 0}

      x : a

Error: Liquid Type Mismatch

44 | leftPad' n x xs = x : leftPad' (n-1) x xs
                                     ^^^

    Inferred type
      VV : {v : Int | v == ?b - (1 : int)}

    not a subtype of Required type
      VV : {VV : Int | VV < ?b && VV >= 0}

    In Context
      ?b : Int

(if you just tried this for yourself, you may need to also define size and max and reflect them into the logic)

Okay, there are two errors here. The first is that big mess of types on the function we actuallly annotated. I’ll get there in a bit. The second error, however, seems to be saying that our types are wrong for leftPad' as well! What gives?

It turns out that LH is actually smart enough to recognize that the function won’t terminate if given an input n < 0. Because we like our programs to be decidable, this lets LH decide that the type of the first argument to leftPad' should be positive, as seen by the VV >= 0. In fact, a similar argument can be used to show an upper bound as well! Notice that our only base case is n = 0. This means that, in order to make progress, we should have n decrease every time – thus VV < ?b. This seems like a less hard rule than the previous, but I’m not really familiar enough to say what would happen if we had multiple axes (or even just a nonstandard termination condition). However, we have no conditions on the original input, and there’s nothing stopping us from just passing in a negative n. Thus the type mismatch.

The crux of the issue is that LH can’t (or won’t try to) understand what leftPad' is doing, or what theorems it upholds. We can fix that by adding some manual annotations of our own:

{-@ reflect leftPad' @-}
{-@ leftPad' :: n:Nat -> x:a -> xs:[a] ->
      result:{[a] |
        size result = n + size xs
      }
  @-}

The only new thing is Nat, which means n:{Int | n >= 0}1.

We run it through liquid and… it’s safe! Wait, what?

This blew my mind the first time I saw this. The only effort really expended by me was to tell Liquid Haskell what my functions should do, and it was able to tell me with no2 other information!

The key here is induction, the idea that properties hold over certain operations. If that seems a little abstract, let’s walk it through.

First, let’s make sure that the function is even well-formed. We have

leftPad n x xs = leftPad' k x xs
  where k = max 0 (n - size xs)

max probably has a type like x:Int -> y:Int -> r:{Int | r >= x && r >= y} 3. This means that max 0 i is a subtype of r:{Int | r >= 0} automatically… which means that k must have type Nat, and is thus a valid input to leftPad'! x and xs have no special conditions, so those just pass through.

Onto leftPad'. We know that, given valid inputs, our result must satisfy the type r:{[a] | size r = n + size xs}. For our base case, we find that

leftPad' 0 x xs = xs

so r = xs in this case. We also have n = 0, so we want to show that

size xs = 0 + size xs

which is a no-brainer.

Next, our inductive step.

leftPad' n x xs = x : leftPad' (n-1) x xs

For this to be well-typed, we need to know a few things, namely that n-1 is a Nat, or that n-1 >= 0. Luckily, we already know that n >= 0 because we’re given n:Nat as a precondition. But wait, we also know that n != 0, because we pattern matched on it earlier! Thus, n > 0 for sure, so n - 1 >= 0 is fine.

Next, we can use our “inductive hypothesis” – we can assume that an recursive uses of leftPad' are well-typed. That means, in this case, that we know that leftPad' (n-1) x xs returns something of type r:{[a] | size r = n-1 + size xs}. We want to exhibit a term of type r:{[a] | size r = n + size xs}. Well, by the definition of size, x:l has type r:{[a] | size r = 1 + size l}, so our returned expression x : leftPad' (n-1) x xs should have type r:{[a] | size r = 1 + size (leftPad' (n-1) x xs) }… which is r:{[a] | size r = 1 + (n-1) + size xs}, or exactly what we wanted.

All of this reasoning, by the way, was decided by LH with no prompting from me. The only thing I did was provide the code and the types, the typechecking happened automagically. The phrase “obviously correct” has a bit more weight after that.

Not so Obvious

You may notice that we haven’t actually fulfilled the spec given by Hillel. For example, a function that took a random n elements from the input would also fulfill the spec we’ve given so far. We need to also encode that the original string is padded with the pad character. The way this is formalized is as follows:

Given a list xs, fill element x and a length n, for any index i,

  • If i < n - size xs, the list element should be x.
  • Otherwise, the list element should be the corresponding element of xs

We can represent this in Haskell by introducing a separate predicate as follows:

{-@ reflect leftPadElt @-}
{-@ leftPadElt :: n:Int -> x:a -> xs:[a] -> i:{Nat | i < n} -> a @-}
leftPadElt :: Int -> a -> [a] -> Int -> a
leftPad Elt n x xs i
  | i < k = x
  | otherwise = x !! (i-k)
  where k = max 0 (n - size xs)

Because LH doesn’t have “forall” quantification (this leads to a nasty thing called “the halting problem”), we can’t really encode something like

“For all i from 0 to n, leftPad n x xs !! i == leftPadElt n x xs i

into the type of leftPad itself. Instead, we need to show this by demonstrating a term of type

n:Int -> x:a -> xs:[a] -> i:{Nat | i < n} -> {leftPad n x xs !! i == leftPadElt n x xs i}

So, how do we do that? Well, any proof with no actually associated “data” (like the list elements) can be seen as having a refinement on the type (), a term that fulfills all the conditions we need. If that’s a little hard to follow, don’t worry – it will (hopefully!) make more sense in a moment.

There are two parts to the elements definition, one for the first k elements and one for the last size xs elements, so it makes sense to split our “proof” into two parts. Let’s take a look at the first part.

{-@ thmLeftPadA :: n:Nat -> x:a -> xs:[a] -> i:{Nat | i < n} ->
      { leftPad' n x xs !! i == x }
  @-}

Notice that I’m using leftPad' here instead of leftPad. That’s because we’ve already shown (defined) leftPad to be a special case of leftPad', so it makes sense to prove something about the easier function to reason about.

So what would a term like this look like? Well, we can start with something simple: by definition, leftPad' n x xs = x:[something]4. But wait, that means leftPad' n x xs !! 0 == x always… so the result of thmLeftPadA n xs 0 already has the right type.

thmLeftPadA :: Int -> a -> [a] -> Int -> ()
thmLeftPadA n x xs 0 = ()

What about the other case? Same as any other proof by induction!

thmLeftPadA :: Int -> a -> [a] -> Int -> ()
thmLeftPadA n x xs 0 = ()
thmLeftPadA n x xs i = thmLeftPadA (n-1) x xs (i-1)

By the same reasoning as before, we know that i > 0. We actually know n > 0 as well, because the type i:{Nat | i < 0} is uninhabited. Thus the recursive call to thmLeftPadA is well-founded. This gives us a term of type {leftPad' (n-1) x xs !! (i-1) == x}. Again by definition, leftPad' n x xs is equal to x:leftPad' (n-1) x xs and (x:xs) !! i is xs !! i-1, which means that this is, in fact, equivalent to a term of the type we want!

The second part looks like this (the derivation is left to the reader):

{-@ thmLeftPadB ::
      n:Nat -> x:a -> xs:[a] ->
      i:{Nat | i >= n && i < n + size xs} ->
        { leftPad' n x xs !! i == xs !! (i-n) }
  @-}
thmLeftPadB :: Int -> a -> [a] -> Int -> ()
thmLeftPadB 0 x xs i = ()
thmLeftPadB n x xs i = thmLeftPadB (n-1) x xs (i-1)

All that’s left is to string these two parts together into our final proof:

{-@ thmLeftPad :: n:Int -> x:a -> xs:[a] -> i:{Nat | i < n} ->
      { leftPad n x xs !! i == leftPadElt n x xs i }
  @-}
thmLeftPad :: Int -> a -> [a] -> Int -> ()
thmLeftPad n x xs i
  | i < k = thmLeftPadA k x xs i
  | otherwise = thmLeftPadB k x xs i
  where k = max 0 (n - size xs)

And there you have it! We have a formally verified leftPad.

  1. I’m a computer scientist. Of course 0 is a natural number. 

  2. Okay, with a little extra. I did have to define size and max

  3. Because I just let LH infer it for me, I don’t actually know for sure. 

  4. Technically, n = 0 is the exception, but that actually violates the precondition on thmLeftPadA