Views: Pattern Matching Arrays as Lists

For performance reasons, we sometimes find ourselves implementing the interface of a more basic structure in terms of a much more complicated one. In functional programming languages, this often comes up in the context of needing to build a structure through mutation and then "seal" it for further processing as an immutable object. While this often comes with great performance gains, the encapsulation needed to hide the internal mutability or unsaftey, or even the nature of the interior mutability itself, can deprive us of the ability to meaningfully pattern match on the results. Views and dependent pattern matching can overcome this limitation by allowing us to redefine the notion of pattern matching for a type.

Whether or not the efficiency difference actually matters for a given usecase, working with arrays is almost always more efficient than working with the FP-favorite linked lists. This is unfortunate for us functional programmers, arrays don't play nearly as well with the rest of our toys as lists do. When we encounter a usecase where the efficiency gains can not be ignored 1, be it dealing with very large quantities of data, or the mutating algorithm for the task just having a way better time signature, it's common to write at least some part of the codebase in terms of mutable arrays.

Once we have our mutable array in its final state, we are faced with a choice. We can either convert it to a list, wasting a potentially very large number of precious cycles, or continue passing it around as an array2, losing pattern matching and all the other things that make linked lists nice to deal with, at least from the functional programmers perspective. Idris, by way of Epigram, gives us a new tool, views and dependent pattern matching3, that allow us to have our cake and eat it too, lettings us keep passing our structure around as an array, but interacting with it as if it was a list.

To keep things simple, and only reference types that are in the somewhat sparse Idris standard library, we'll implement our own wrapper around a mutable IO array that "seals" it into an immutable collection. To keep ourselves honest, we'll open up a new namespace for this part of the article so we can encapsulate our structure:

Note

The namespace will encapsulate everything up to, but not including, the first non-intended block of code as if it were in its own module.

namespace IArray

The structure of this wrapper is pretty simple, containing only the IOArray being wrapped. To keep things interesting, and flex the type system a bit more, we will index the type of our wrapper with the length of the array. Worry not, however, we will still commit all manner of horrors and unsafePeformIO4 under the hood as we forge ahead:

  export
  record IArray (n : Nat) (t : Type) where
    constructor MkIArray
    arr : IOArray t

The functions for working with the built in IOArray type in Idris return Maybes, encoding the result of the bounds checks performed by those operations 5. Since we are indexing the type of our wrapper by the length of the contained array, we can rely on type checking to ensure all indexing is within bounds and soundly assume that none of our calls to IOArray functions will return Nothing. We'll make a little unsafe function for unconditionally converting a Maybe a to an a:

Tip

The %unsafe pragma is purely cosmetic, it just instructs the compiler to highlight calls to this function with a different color to draw the programmers attention.

  %unsafe
  unwrap : Maybe a -> a
  unwrap Nothing = assert_total $ idris_crash "Oops all nothings"
  unwrap (Just x) = x

What is an array without the usual constant time indexing function? As indexing doesn't mutate the array, we can safely hide the IO under the hood. We will also accept an erased auto implicit proof that the provided index is in bounds to exclude the possibility of an out-of-bounds access:

  export
  index : (i : Nat) -> (arr : IArray n a) -> {auto 0 _ : i `LT` n} -> a
  index i arr = unwrap . unsafePerformIO $ readArray arr.arr (cast i)

With careful use of linear types, we could provide a rich, safe interface for building these arrays without leaking the fact that they use IO under the hood6, but that is beyond the scope of this article. Instead, for the sake of having examples to work with, we'll provide a few functions for generating sample arrays:

  ||| Convert a `Vect` into an IArray
  export
  fromVect : {n : Nat} -> (xs : Vect n a) -> IArray n a
  fromVect xs = unsafePerformIO fromVectIO
    where
      loadVect : (arr : IOArray a) -> (idx : Nat) -> (ys : Vect m a) -> IO ()
      loadVect arr idx [] = pure ()
      loadVect arr idx (y :: ys) = do
        _ <- writeArray arr (cast idx) y
        loadVect arr (S idx) ys
      fromVectIO : IO (IArray n a)
      fromVectIO = do
        arr <- newArray (cast n)
        loadVect arr 0 xs
        pure $ MkIArray arr
  
  ||| Create an `IArray` containing all the `Nat`s less than or equal to `n`
  export
  range : (n : Nat) -> IArray (S n) Nat
  range n = unsafePerformIO rangeIO
    where
      buildRange : (arr : IOArray Nat) -> (idx : Nat) -> IO ()
      buildRange arr 0 = do
        _ <- writeArray arr 0 0
        pure ()
      buildRange arr (S k) = do
        _ <- writeArray arr (cast (S k)) (S k)
        buildRange arr k
      rangeIO : IO (IArray (S n) Nat)
      rangeIO = do
        arr <- newArray (cast (S n))
        buildRange arr (S n)
        pure $ MkIArray arr

We will also need a few functions for handling arrays in a list-like manner to write our view later. We'll start with a head and a tail function with the same general semantics as their list counterparts, an empty function for constructing the empty array, and an arrayCons function that adds a new element to the front of an array much like :: on lists:

Note

We won't actually be calling empty or arrayCons once we get into the later parts of this article, we only actually need their type signatures for later, but their implementations are provided for the sake of completeness.

  export
  head : IArray (S n) a -> a
  head x = unsafePerformIO headIO
    where
      headIO : IO a
      headIO = do
        val <- readArray x.arr 0
        pure $ unwrap val
  
  export
  tail : {n : Nat} -> IArray (S n) a -> IArray n a
  tail {n} x = unsafePerformIO tailIO
    where
      copyContents : (newArr : IOArray a) -> (idx : Nat) -> IO ()
      copyContents newArr 0 = pure ()
      copyContents newArr (S k) = do
        val <- map unwrap $ readArray x.arr (cast (S k))
        _ <- writeArray newArr (cast k) val
        copyContents newArr k
      tailIO : IO (IArray n a)
      tailIO = do
        newArr <- newArray (cast n)
        copyContents newArr n
        pure $ MkIArray newArr
  
  export
  empty : IArray 0 a 
  empty = unsafePerformIO $ do 
    arr <- newArray 0
    pure $ MkIArray arr
  
  export
  arrayCons : {n : Nat} -> (head : a) -> (tail : IArray n a) -> IArray (S n) a
  arrayCons head tail = unsafePerformIO $ do 
    newArr <- newArray (cast (S n))
    copyContents newArr n
    pure $ MkIArray newArr
    where
      copyContents : (newArr : IOArray a) -> (idx : Nat) -> IO ()
      copyContents newArr 0 = do
        val <- map unwrap $ readArray tail.arr 0
        _ <- writeArray newArr 1 val
        pure ()
      copyContents newArr (S k) = do
        val <- map unwrap $ readArray tail.arr (cast (S k))
        _ <- writeArray newArr (cast (S (S k))) val
        pure ()

Its also common to need to know the lengths of arrays, and while the IArray type is indexed over the length of the contained array, the length may be erased, and we may need to compute it at runtime. We'll provide not just the length function, but a companion, unsafely implemented, proof that the result of the length function corresponds to the length argument to the type constructor:

  export
  length : IArray n a -> Nat
  length x = cast (max x.arr)
  
  export
  lengthCorrect : (0 xs : IArray n a) -> length xs = n
  lengthCorrect _ = believe_me (the (1 = 1) Refl)

We can now exit our namespace and demonstrate that our immutable encapsulated array works as expected:

Unit Test
index 5 (range 10) == 5
Unit Test
head (fromVect ['a', 'b', 'c']) == 'a'
Unit Test
head (tail (fromVect ['a', 'b', 'c'])) == 'b'
Unit Test
length (range 10) == 11

A View is essentially a data structure that provides us with an alternate notion of pattern matching for another data structure. For a given type t, views over that type generally have a type constructor of form t -> Type, a type indexed over a specific value of t, and the data type itself essentially provides proof that the given value of type t can be broken apart in a specific way, allowing you to use the with rule7 to pattern match based on arbitrary, user defined information about the data instead of just its nominal structure.

One commonly used view form the standard library is AsList from Data.String. In Idris, String is an implementation defined bag of bytes with array-like internal structure, much like our example here. This has numerous benefits over the traditional functional programming representations based around linked lists of characters, but makes traditional pattern matching based algorithms annoying to implement. AsList bridges this gap by allowing us to process strings as if they were lazily linked lists8.

Consider the following example of AsList which, someone redundantly, converts a String to a List Char:

Tip

Notice how in the recursive call, instead of simply calling stringToList str, we instead write stringToList str | xs. The vertical pipe in the expression side of the match indicates that the item on the right hand side of the pipe, in this case xs, should be used for value of the right hand side of the with statement in the recursive call instead of recomputing it from scratch.

stringToList : String -> List Char
stringToList str with (asList str)
  stringToList "" | [] = []
  stringToList (strCons c str) | (c :: xs) = c :: stringToList str | xs

The function asList has type asList : (str : String) -> AsList str, producing a value of the AsList view type indexed by the value of the input string. Functions such as this one which produce values of view types are referred to as covering functions. Note how the values on the left hand side of the statements in the with block are a literal and a function call, the (dependent) type of the right hand side has given us information about the form of values in the left hand side that goes beyond simply identifying which constructor was used to produce that value.

The view we'll be implementing for IArray will be very similar to AsList, so we'll name it accordingly. We'll also overload the list constructors to make the redefined pattern matching feel as list-like as possible:

data ArrayAsList : IArray n a -> Type where
  Nil : ArrayAsList IArray.empty
  (::) : {xs : IArray n a} 
    -> (head : a) -> (tail : Lazy (ArrayAsList xs)) -> ArrayAsList (arrayCons head xs)

In this definition, [] translates to a proof that the indexed array is the empty array, and x :: xs translates to a proof that the indexed array can be built by calling arrayCons on the value x and the array xs. We need not actually call arrayCons to make use of this type, only establish that the value could be produced with an arrayCons invocation.

The covering function for ArrayAsList is pretty straight forward, if the array has length zero, return Nil, and if it has a non-zero length, separate out the head and tail of the array and cons the head onto the result of calling arrayAsList on the tail. We'll implement two versions, one where the length parameter of the type is known, and another where it is erased and recovered at runtime:

Warning

Since the head, tail, and empty operations are defined using unsafePerformIO under the hood, they are opaque to the type checker, and we have to use the hideously unsafe believe_me function, which has type believe_me : a -> b. believe_me essentially disables the type checker, functioning much like transmute does in Rust, and will cause serious undefined behavior if used carelessly.

This isn't how you would usually write views, normally some aspect of the structure would allow you to actually prove the properties in question, but is representative of writing views for FFI and IO based types. Additionally, it serves to really drive home the point that views allow pattern matching over arbitrary facts about the type, and not just structural facts.

arrayAsList' : {n: Nat} -> (xs : IArray n a) -> ArrayAsList xs
arrayAsList' {n = 0} xs = 
  believe_me (the (ArrayAsList (the (IArray 0 a) IArray.empty)) Nil)
arrayAsList' {n = (S k)} xs = 
  let rest : Lazy (ArrayAsList (tail xs)) = Delay (arrayAsList' (tail xs))
  in believe_me $ head xs :: rest

arrayAsList : (xs : IArray n a) -> ArrayAsList xs
arrayAsList xs = 
  rewrite sym $ lengthCorrect xs in arrayAsList' (convert xs)
  where
    convert : (ys : IArray m a) -> IArray (length ys) a
    convert ys = rewrite lengthCorrect ys in ys

Now that we have our nice arrayAsList covering function, lets show it off. We'll start with a simple use case just to get a feel for it, converting an array to a list:

arrayToList : (xs : IArray n a) -> List a
arrayToList xs with (arrayAsList xs)
  arrayToList IArray.empty | [] = []
  arrayToList (arrayCons head xs) | (head :: tail) = 
    head :: (arrayToList xs | tail)

Much like our example with String's AsList view, this is somewhat redundant, but lets us get a feel for using the view, and shows the equivalence to pattern matching on lists quite well.

We can also demonstrate that it works as expected:

Unit Test
arrayToList (fromVect ["Idris", "is", "fun", "!"]) == ["Idris", "is", "fun", "!"]

To show off a slightly more in depth application of our new view, we can use it to implement the Foldable interface, the basic abstraction over collections that can be consumed one element at a time, in a fairly idiomatic way.

First, we'll get the implementation of null out of the way, since that just needs to compare the length against 0:

arrayNull : IArray n a -> Bool
arrayNull xs = length xs == 0

More interesting is foldr, which consumes collections starting from the end, otherwise known as the "right hand side" due to the left-to-right writing convention. foldr on lists can be thought of as replacing the conses (::) with the provided function, and replacing the terminator (Nil/[]) with the provided accumulator value. Our ArrayAsList view allows us to cleanly translate that concept over to arrays, despite arrays having neither conses nor terminators. Note how this mirrors arrayToList above with f in place of :: on the expression side of the match:

arrayFoldr : (f : e -> acc -> acc) -> acc -> IArray n e -> acc
arrayFoldr f x xs with (arrayAsList xs)
  arrayFoldr f x IArray.empty | [] = x
  arrayFoldr f x (arrayCons head xs) | (head :: tail) = 
    head `f` (arrayFoldr f x xs | tail)

To demonstrate that arrayFoldr works as expected, we can use it to convert an array to a list:

Unit Test
arrayFoldr (::) [] (range 4) == [0, 1, 2, 3, 4]

Even though foldr is sufficient to provide an implementation of Foldable, we will also consider foldl, which consumes elements starting from the beginning, or the left hand side, of the collection. foldl on lists can also be thought of as replacing the conses (::) with the provided function, but because the list is being consumed head to tail the order of the arguments is flipped, and instead of replacing the end of the list with the provided accumulator, the start is replaced. ArrayAsList also allows us to naturally carry over this logic to arrays:

arrayFoldl : (f : acc -> e -> acc) -> acc -> IArray n e -> acc
arrayFoldl f x xs with (arrayAsList xs)
  arrayFoldl f x IArray.empty | [] = x
  arrayFoldl f x (arrayCons head xs) | (head :: tail) = 
    arrayFoldl f (f x head) xs | tail

When :: is left-folded over a list, it reverses the list. We can do the same to an array to demonstrate that our arrayFoldl works as expected:

Unit Test
arrayFoldl (flip (::)) [] (fromVect ['a', 'b', 'c', 'd', 'e']) == ['e', 'd', 'c', 'b', 'a']

We can now put together our implementation of Foldable for IArray:

Foldable (IArray n) where
  null = arrayNull
  foldr = arrayFoldr
  foldl = arrayFoldl

And test drive it by using the standard library toList function, which is implemented in terms of Foldable:

Unit Test
toList (fromVect ["yet", "another", "array", "of", "things"]) == ["yet", "another", "array", "of", "things"]

While making non-list structures behave as if they were lists is certainly an interesting and very practical application of views, they can do a lot more for us. We'll explore just a few examples of how views can reframe problems.

The Divides view from Data.Primitives.Views allows us to reframe division into pattern matching, resulting in either a "divide by zero" case, or multiplication plus a remainder:

Tip

When using Idris's interactive editing functionality, when using the case-split operation on the right hand side of the with block, the compiler will automatically perform the destructuring for you, in this case rewriting numerator as ((denominator * div) + rem).

exampleDivides : (numerator, denominator : Integer) -> Maybe Integer
exampleDivides numerator denominator with (divides numerator denominator)
  exampleDivides numerator 0 | DivByZero = Nothing
  exampleDivides ((denominator * div) + rem) denominator | (DivBy div rem prf) = 
    Just div

You may have noticed that Divides provides us with a total view of division, explicitly handling the divide by zero case. Another upside of the Divides view is that it allows us to ignore the divide by zero case when the divisor is known to be non-zero, without losing the totality of the operation:

isEven : (x : Integer) -> Bool
isEven x with (x `divides` 2)
  isEven ((2 * div) + rem) | (DivBy div rem prf) = rem == 0

This also works when the divisor isn't know at compile time, but we have a proof that it isn't zero at runtime:

Tip

In the DivByZero branch, the prf argument is revealed to have type 0 = 0 -> Void, which is an impossible situation.

Since a value of 0 = 0 is easily constructed, we can pass it to the prf function to 'realize' a value of Void (no Voids are actually created, since this is an impossible situation). This can then be passed into absurd, which has type signature Uninhabited a => a -> b, and is basically beleive_me, but what if it is actually completely safe because it's impossible to actually call.

absurd is commonly used to, as the docs refer to it, "discharge proof obligations". Basically, it gets us out of having to implement a branch when we have proof we can't actually be in that branch, even when that fact is a bit too non obvious for the impossible keyword to work.

nonZeroDivides : (x, y : Integer) -> (prf : Not (y = 0)) -> Integer
nonZeroDivides x y prf with (x `divides` y)
  nonZeroDivides x 0 prf | DivByZero = absurd $ prf (the (0 = 0) Refl)
  nonZeroDivides ((y * div) + rem) y prf | (DivBy div rem prf1) = div

The SplitRec view allows us to pattern match on a list by splitting it into a left half and a right half, preserving the ordering of the elements, and doing so recursively. This can be used to, for example, implement merge sort in a rather ergonomic way:

Note

++ is the normal list concatenation operation. For a given list xs, a value of the SplitRec xs type with the form SplitRecPair lefts rights _ _ is a proof that the original list xs can be produced as the result of evaluating lefts ++ rights.

mergeSort : Ord a => (xs : List a) -> List a
mergeSort xs with (splitRec xs)
  mergeSort [] | SplitRecNil = []
  mergeSort [x] | (SplitRecOne x) = [x]
  mergeSort (lefts ++ rights) | (SplitRecPair lefts rights lrec rrec) = 
    merge (mergeSort lefts | lrec) (mergeSort rights | rrec)
  where
    merge : (ys, zs : List a) -> List a
    merge [] [] = []
    merge [] (z :: zs) = z :: zs
    merge (y :: ys) [] = (y :: ys)
    merge (y :: ys) (z :: zs) = 
      if y < z
        then y :: (merge ys (z :: zs))
        else z :: (merge (y :: ys) zs)

And we can see that it works as expected:

Unit Test
mergeSort [4, 1, 5, 3, 2] == [1, 2, 3, 4, 5]

The VList view allows recursion on the middle of a list, separating out both the first and the last element in each step:

isPalindrome : Eq a => List a -> Bool
isPalindrome xs with (vList xs)
  isPalindrome [] | VNil = True
  isPalindrome [x] | VOne = True
  isPalindrome (x :: (xs ++ [y])) | (VCons rec) = 
    if x == y
      then isPalindrome xs | rec
      else False

For the sake of demonstrating isPalindrome, we'll revisit String's AsList, using it to make a function that converts all the characters in a string to lowercase and skips over periods and spaces:

prepareString : String -> List Char
prepareString str with (asList str)
  prepareString "" | [] = []
  prepareString (strCons c str) | (c :: x) = 
    if isYes $ isElem c [' ', '.']
      then prepareString str | x
      else toLower c :: (prepareString str | x)

And stringing them together for a demonstration:

Unit Test
isPalindrome (prepareString "Rat Star.") && (not . isPalindrome . prepareString) "Palindrome."

  1. A practical inevetiability in any sufficently large codebase↩︎

  2. Possibly sealing it into an immutable array if the language in use has good support for that↩︎

  3. https://idris2.readthedocs.io/en/latest/tutorial/views.html↩︎

  4. While it still takes great care to use unsafePerformIO in a sound manner, unlike in Haskell, laziness by default doesn't come around to bite us in Idris, being a strict-by-default language after all. Using unsafePerformIOin Idris in a sound manner is more comparable to, say, unsafe in Rust, tricky, but doable.↩︎

  5. We could avoid this by using the primitive array operations, which assume bounds checking has already been performed, but PrimIO is a story for another day↩︎

  6. https://github.com/stefan-hoeck/idris2-array↩︎

  7. https://idris2.readthedocs.io/en/latest/tutorial/views.html↩︎

  8. Lazy lists in functional programming languages are largely equivlant to iterators in procedural programming languages. The AsList view is morally equivlant to .chars() from Rust, for instance.↩︎