Row Polymorphic Programming

Sometimes, especially when dealing with business logic, we have to handle data that comes in from the real world, or elsewhere, naturally very messily typed. This leads to nasty type signatures and messes of macros or code generation to build data structures from a schema. Row polymorphism can help by abstracting over the fields contained in a record type in the type signature, letting us define records based on data, concatenate records together, and define functions that are generic across any record containing the required fields, all without macros.

This article is inspired by my work on implementing the Brown Benchmark for Table Types[1] for my Structures[2] library, an update for which will be released soon, providing a much more comprehensive collection of types for row-polymorphic programming.

At time of writing, the Wikipedia page for row polymorphism starts with:

In programming language type theory, row polymorphism is a kind of polymorphism that allows one to write programs that are polymorphic on row types such as record types and polymorphic variants.

leading into some text dense with mathematical notation that doesn't do much to explain what any of this looks like in practice. And what the hell does "polymorphic on row types" even mean?

Consider a basic record in Idris as an example:

record ImACat where
  constructor MkCat
  name : String
  age : Nat
  meows_a_lot : Bool

This certainly seems like it might fit the bill: it says record right there, and the Wikipedia snippet did mention records. But what are these "row types"?

"Row Type" and "Record Type" are sometimes used interchangeably in this corner of type theory. While "record" may be a somewhat more intuitive name for someone approaching this from the perspective of many popular programming languages, we will mostly be using the name "Row Type", as it will make more sense in the tabular data context that will serve as framing for this article.

Imagine some tabular data, say a table in a database or a spreadsheet. Tabular data is composed of a 2D grid of rows and columns, typically with each column having a name and potentially a type constraint on the values in that column. Consider this example:

Cats
name → String    age → Nat    meows_a_lot → Bool
Cricket          11           False
Cookie-Dough     4            True
Mango            0            False
Creampuff        5            True

"Row Types" are constructions for describing individual rows in such a structure[3]. They are parameterized by a list of (label, Type) pairs, represented by the header of the table and referred to as a schema. We can see that the labels and types in our Cats table are identical to the fields of our ImACat record from earlier. As such, each row in the table can be thought of as an instance of the ImACat record, and the table itself as a list of such records.

That dependence on a schema, a piece of data describing the structure of the type, makes row types a bit more flexible, as well as a bit more complicated. We can perform the usual operations, like accessing a given column in a row by name, analogous to field access like calling name on a value of ImACat. More interestingly, we can also perform new operations that enable a sort of limited, type-safe reflection over row types, such as selecting a specific set of columns from a row to produce a new row type, or concatenating the schemas of two rows to create a new type expressing the result of joining the rows.
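These operations have rough analogues in mainstream structurally typed languages. As a hedged TypeScript sketch (the Cat type and joinRows function here are invented for illustration), selecting columns corresponds to the built-in Pick type, concatenating two schemas to an intersection type, and joining two rows to a plain object merge:

```typescript
// A row type described by its (label, type) pairs:
type Cat = { name: string; age: number; meows_a_lot: boolean };

// Selecting a subset of columns produces a new row type:
type CatSummary = Pick<Cat, "name" | "age">;

// Concatenating two schemas corresponds to an intersection type,
// and joining two rows is an object merge the compiler tracks:
function joinRows<A extends object, B extends object>(a: A, b: B): A & B {
  return { ...a, ...b };
}

const cricket: Cat = { name: "Cricket", age: 11, meows_a_lot: false };
const tagged = joinRows(cricket, { id: 1 }); // type: Cat & { id: number }
```

Idris has no built-in equivalent of these operations, so the rest of this article builds them explicitly from a schema value.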

Imagine you are helping your friend move and they need to figure out what furniture will fit in what rooms. The previous occupant of their new place has graciously provided a table containing the length and width of each room, and their furniture supplier has provided a table detailing the lengths, widths, and heights of the bounding cuboids for each piece of furniture:

Rooms
room → String    length → Double    width → Double
Bedroom          3.65               3.35
Living Room      4.87               3.35
Dining Room      2.44               2.44
Kitchen          3.35               2.44
Bathroom         3.65               2.44

Furniture
item → String    length → Double    width → Double    height → Double
Dresser          1.00               0.60               0.76
Bed              2.00               1.50               0.50
Nightstand       0.60               0.48               0.75
Dinner Table     1.80               0.90               0.80
TV Stand         1.93               0.42               0.90
Coffee Table     1.81               0.81               0.46

One thing that's going to be critical to determine is how much floor space is left in a room after the furniture's been placed in it. After all, a room isn't of much use if there's not even enough free floor space to stand in it. You notice that both of these tables contain the information needed to calculate the available area and the footprints of the furniture: the length and width columns.

It's tempting to define a Room data type and a Furniture data type containing each table's respective columns as fields, and describe the area/footprint via an Area interface. However, this approach has several drawbacks. It requires implementing the interface manually for each data type[4], can hide certain categories of error, such as inadvertently using the wrong field to calculate the area, and is potentially quite fragile if the structure of the tables changes on us or we need to add more tables to our system.

We can do better. With our parameterized row types we can write an area function that is defined to work on any row type that has a length and a width column of type Double. This is the essence of row polymorphism, writing code that's generic not over what interfaces a type implements, but instead is generic over what fields the type contains.

While row polymorphism does come with the relative downside of losing proof-of-intentionality[5], so to speak, it also comes with some major upsides. Two columns in two different tables having the same names and types doesn't always mean that they are semantically equivalent, but it's often a pretty good bet, especially when they come from related sources. More importantly, thinking about data types in terms of what values they contain rather than what behaviors they implement is often a much better way to think about messy real-world data that was laid out with a dynamically typed language in mind on a good day, and not laid out with computers in mind at all on a bad day.

Like so many things in programming, it's a trade-off. Low-level systems programming, where almost all the data you are working with was generated by the computer, often by systems you also control, isn't going to get much benefit from row-polymorphic programming. Other tasks, like frontend work, business logic, and video games, quite frequently deal with extremely messy data that has only a loosely defined shape and doesn't want to fit neatly into a type hierarchy, leaving you with a lot to gain from row polymorphism. In the former category, you are more likely to feel the loss of semantic guarantees, needlessly weakening your type system and tripping over yourself more than you would without it. In the messier situations, you are instead likely to save time otherwise spent fighting the type system to fit your data into it, while also running into the downsides of row polymorphism much less frequently.

Row polymorphism is also a fantastic way to think about assigning types to code that was originally written for a dynamically typed programming language[6]. In my personal experience, when an experienced programmer who really seems like they should know better makes weird-sounding arguments about how type systems decrease their productivity and don't prevent enough bugs to be worth it, they usually turn out to be complaining about the lack of row polymorphism (or the closely related structural subtyping) in popular statically typed languages, just without having the technical vocabulary for it.
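As a concrete taste of what this looks like in a structurally typed language, here is a hedged TypeScript sketch (reusing values from the tables above): a single area function accepts any record with numeric length and width fields, regardless of what other fields the record carries.

```typescript
// A function constrained only by the fields it uses, not by a nominal type.
function area(row: { length: number; width: number }): number {
  return row.length * row.width;
}

// Records with different "schemas" both satisfy the structural constraint:
const bedroom = { room: "Bedroom", length: 3.65, width: 3.35 };
const dresser = { item: "Dresser", length: 1.0, width: 0.6, height: 0.76 };

area(bedroom); // extra "room" field is simply ignored
area(dresser); // extra "item" and "height" fields are ignored too
```

The Idris version we build below achieves the same shape of guarantee with an explicit inclusion proof instead of structural subtyping.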

One of the core features of row polymorphism is that the labels and types of the columns are determined by a piece of data, a list-like structure containing pairs of labels and types. In languages built specifically to be row polymorphic, this data describing the column structure is often implicit. Idris, however, isn't built to be row polymorphic. Its type system just happens to be powerful enough that we can build it on our own, so we will have to build an explicit schema composed of our own pairs of labels and types.

A more developed library would use bespoke types for most of these components, but for the sake of keeping things concise, we'll use regular ol' lists as much as possible.

To make things just a little bit cleaner, we'll define our own data type for the pairs, which we will call Header, and an infix constructor for it, (:>):

Note

Remember that the Idris concept called a record is a distinct concept, and Idris records are not the row types we are going to be building.

public export
record Header where
  constructor MkHeader
  label : String
  type : Type

export infixr 8 :>
public export
(:>) : String -> Type -> Header
(:>) str x = MkHeader str x

For this article, we will keep the schema simple and just use a basic list of headers:

public export
Schema : Type
Schema = List Header

We will also need a way to compute the column types from our schema, turning a list of headers into a list of types. This amounts to mapping the type accessor over our list of Headers, but we will implement this with a bespoke function to avoid negative interactions with proof search resulting from the standard library's implementation of map:

Tip

This function has an UpperCamelCase name because it is intended to be called in type signatures. This is both conventional, and reduces ambiguity as it can't be confused for a lowercase, implicitly defined, argument name.

public export
ColumnTypes : Schema -> List Type
ColumnTypes [] = []
ColumnTypes (h :: hs) = h.type :: ColumnTypes hs

A row is a list of the same length as the schema where every list element has a type determined by the header in the same position of the schema:

Tip

HList is a standard library type with type constructor List Type -> Type. Each type in the list of types determines the type of the value in that position of the list. Notice this also constrains the length of the list: it must be exactly the same as the length of the list of types. HList can be thought of as a more primitive version of the Row type we are building.

public export
Row : Schema -> Type
Row hs = HList (ColumnTypes hs)
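If HList is unfamiliar, a rough analogue (hedged; the correspondence is not exact) is a TypeScript tuple type, where the type at each position constrains the value at that position and the overall length is fixed:

```typescript
// Rough analogue of Row / HList: a tuple type fixes both the type at
// each position and the number of positions.
type CatRow = [string, number, boolean]; // name, age, meows_a_lot

const cricket: CatRow = ["Cricket", 11, false];
// ["Cricket", 11] and ["Cricket", 11, false, 0] would both be type errors.
```

The analogy only goes so far: the Idris Row is indexed by a schema we can compute over with ordinary functions like ColumnTypes.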

We will also want a way of indexing the values in a row by their columns, instead of by their positions. We only want this function to be callable when the row actually has that column, so we also require an inclusion proof as an auto-implicit, so the compiler can generate it for us when possible:

Tip

A value of type Elem x xs is a proof that the value x is included in the list xs, which also encodes its position in the list. The type signature of the prf argument thus reads "h is an element of hs with known position in hs"

namespace RowIndex
  export
  index : {hs : Schema} -> (h : Header) -> (r : Row hs) -> {auto prf : h `Elem` hs} -> h.type
  index {hs = (_ :: hs)} h (x :: xs) {prf = Here} = x
  index {hs = (_ :: hs)} h (y :: ys) {prf = (There x)} = index h ys

A table is a list of rows all sharing the same schema, and, with our new tools, can easily be defined as such:

public export
Table : Schema -> Type
Table hs = List (Row hs)

Now that we have a way to describe the type signature and structure of a table, let's go ahead and translate our previous table examples into code:

Rooms : Schema
Rooms = ["room" :> String, "length" :> Double, "width" :> Double]

rooms : Table Rooms
rooms = 
  [ ["Bedroom",     3.65, 3.35]
  , ["Living Room", 4.87, 3.35]
  , ["Dining Room", 2.44, 2.44]
  , ["Kitchen",     3.35, 2.44]
  , ["Bathroom",    3.65, 2.44]
  ]

Furniture : Schema
Furniture = ["item" :> String, "length" :> Double, "width" :> Double, "height" :> Double]

furniture : Table Furniture 
furniture = 
  [ ["Dresser",      1.00, 0.60, 0.76] 
  , ["Bed",          2.00, 1.50, 0.50]
  , ["Nightstand",   0.60, 0.48, 0.75]
  , ["Dinner Table", 1.80, 0.90, 0.80]
  , ["TV Stand",     1.93, 0.42, 0.90]
  , ["Coffee Table", 1.81, 0.81, 0.46]
  ]

With our tables defined, it's time to revisit our area function. The schema will be an implicit variable in the type signature, and we'll constrain it such that both the length and width fields are elements of the schema. These constraints allow us to simply call the index function with the given fields on the provided row and have the guarantee that if our call type checks, we'll get a value back, no need for Maybe here[7]:

Tip

In a type signature, x => is equivalent to {auto _ : x} ->

area : {hs : Schema} -> "length" :> Double `Elem` hs => "width" :> Double `Elem` hs => 
  Row hs -> Double
area x = 
  let length = ("length" :> Double) `index` x
      width = ("width" :> Double) `index` x
  in length * width

Without any further implementation, we can now call this function on both our room and furniture tables and it works as expected:

Note

In this specific context, we do have to explicitly provide the schema to the area function to disambiguate, but this can be avoided with more careful design and should be considered more of an artifact of the limited implementation in this article than an intrinsic ergonomics concern with this approach to programming in Idris.

Unit Test
map (area {hs = Rooms}) rooms == [12.2275, 16.314500000000002, 5.9536, 8.174, 8.905999999999999]
Unit Test
map (area {hs = Furniture}) furniture == [0.6, 3.0, 0.288, 1.62, 0.8106, 1.4661000000000002]

Just being able to access the fields in a row-polymorphic way is pretty useful, but it's also not really that impressive. We have much more interesting capabilities at our disposal, like adding new fields. As a first example, we can take a function that consumes a row and produces a new value, along with a name for the new value, and use that to add a new column to each row in the table:

buildColumn : {hs : Schema} 
  -> (f : Row hs -> t) -> (name : String) -> Table hs -> Table (hs ++ [name :> t]) 
buildColumn f name table = map buildColumnRow table
  where
    addToEnd : {name' : String} -> {hs' : Schema} 
      -> t -> Row hs' -> Row (hs' ++ [name' :> t])
    addToEnd {hs' = []} x [] = [x]
    addToEnd {hs' = (y :: ys)} x (x' :: xs) = x' :: addToEnd x xs
    buildColumnRow : {name' : String} -> Row hs -> Row (hs ++ [name' :> t])
    buildColumnRow row = addToEnd (f row) row
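For comparison, the same idea can be sketched in TypeScript (a hypothetical counterpart with invented names; for simplicity it hard-codes the new column's name to footprint rather than taking it as an argument):

```typescript
// Hypothetical TypeScript counterpart: append a computed "footprint"
// column to every row, reflecting the widened schema in the return type.
function buildColumn<T extends object, V>(
  rows: T[],
  f: (row: T) => V
): (T & { footprint: V })[] {
  return rows.map((row) => ({ ...row, footprint: f(row) }));
}

const rooms = [
  { room: "Kitchen", length: 3.35, width: 2.44 },
  { room: "Bathroom", length: 3.65, width: 2.44 },
];
const withArea = buildColumn(rooms, (r) => r.length * r.width);
```

The return type records the widened schema, just as the Idris version returns Table (hs ++ [name :> t]).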

With our new tool, we can make new tables derived from our previous ones, such as adding an area column to our previous two tables:

roomsWithArea : Table (Rooms ++ ["footprint" :> Double])
roomsWithArea = buildColumn {hs = Rooms} (area {hs = Rooms}) "footprint" rooms
Unit Test
roomsWithArea ==
  [ ["Bedroom",     3.65, 3.35, 12.2275]
  , ["Living Room", 4.87, 3.35, 16.314500000000002]
  , ["Dining Room", 2.44, 2.44, 5.9536]
  , ["Kitchen",     3.35, 2.44, 8.174]
  , ["Bathroom",    3.65, 2.44, 8.905999999999999]
  ]
furnitureWithArea : Table (Furniture ++ ["footprint" :> Double])
furnitureWithArea = buildColumn {hs = Furniture} (area {hs = Furniture}) "footprint" furniture
Unit Test
furnitureWithArea ==
  [ ["Dresser",      1.0,  0.6,  0.76, 0.6]
  , ["Bed",          2.0,  1.5,  0.5, 3.0]
  , ["Nightstand",   0.6,  0.48, 0.75, 0.288]
  , ["Dinner Table", 1.8,  0.9,  0.8, 1.62]
  , ["TV Stand",     1.93, 0.42, 0.9, 0.8106]
  , ["Coffee Table", 1.81, 0.81, 0.46, 1.4661000000000002]
  ]

We can also implement a function that takes only certain specified fields from a table, analogous to an SQL SELECT statement. This function only makes sense to apply when all the fields in the new schema are also in the old schema, so we take an auto-implicit argument containing a proof of inclusion for each element of the new schema, which we will use to index into the input table. select makes sense to define both for individual rows and whole tables, so we create two different versions of it in two different namespaces to allow for overloading:

Tip

All is a more general version of HList. All p xs can be thought of as being equivalent to HList (map p xs), but in the Idris standard library, HList xs is actually an alias for All id xs.

namespace SelectRow
  public export
  select : {hs : Schema} -> Row hs -> (is : Schema) -> {auto allIn : All (`Elem` hs) is}
    -> Row is
  select r [] {allIn = []} = []
  select r (x :: xs) {allIn = (prf :: rest)} =
    let value = x `index` r
    in value :: select r xs

namespace SelectTable
  public export
  select : {hs : Schema} -> Table hs -> (is : Schema) -> {auto allIn : All (`Elem` hs) is}
    -> Table is
  select xs is = map (\r => select r is) xs

Let's give this a test drive. We only really wanted to know how big the rooms are[8], and now that we have the areas calculated, we can discard the length and width fields:

roomAreas : Table ["room" :> String, "footprint" :> Double]
roomAreas = 
  select {hs = Rooms ++ ["footprint" :> Double]}
    roomsWithArea ["room" :> String, "footprint" :> Double]
Unit Test
roomAreas ==
  [ ["Bedroom",     12.2275]
  , ["Living Room", 16.314500000000002]
  , ["Dining Room", 5.9536]
  , ["Kitchen",     8.174]
  , ["Bathroom",    8.905999999999999]
  ]

One interesting thing to note about select is that you don't actually need to know the complete schema of the table it is called on for your call to type check. This property allows select calls to keep working correctly after columns have been added to tables, to be reused on tables that have had some sort of join operation applied to them, and to work across entire families of related tables.
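That property can be illustrated in TypeScript as well (a hedged sketch with invented names): a generic select constrains only the keys it extracts, so a call written against a narrow row keeps type checking when the row gains extra columns:

```typescript
// Hypothetical generic select: only the requested keys need to exist.
function select<T extends object, K extends keyof T>(row: T, keys: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const k of keys) {
    out[k] = row[k];
  }
  return out;
}

// The same call works whether or not the row carries extra columns:
const slim = { room: "Bedroom", footprint: 12.2275 };
const wide = { room: "Bedroom", length: 3.65, width: 3.35, footprint: 12.2275 };

select(slim, ["room", "footprint"]);
select(wide, ["room", "footprint"]); // extra columns don't break the call
```

Here the constraint K extends keyof T plays roughly the role of the All (`Elem` hs) proof in the Idris version.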

Implementing types as Rows doesn't lock us out of more traditional programming methods. Say, for instance, that while interacting with someone else's code, it turns out we actually do have to provide area information through a HasArea interface:

interface HasArea t where
  getArea : t -> Double

Not only can we implement this for a specific Row type, we can implement it generically for any Row that has the appropriate fields in one shot:

[rowHasArea]
{hs : Schema} -> All (`Elem` hs) ["length" :> Double, "width" :> Double] 
  => HasArea (Row hs) where
  getArea row = 
    let [length, width] = select row ["length" :> Double, "width" :> Double]
    in length * width

Note

This code is a bit needlessly ugly; the implicit arguments could be made amenable to proof search by using bespoke types instead of generalist standard library types, but that would have made this article a lot longer than it needed to be.

Unit Test
map (getArea @{rowHasArea {hs = Rooms}}) rooms == 
    map head 
      (select {hs = Rooms ++ ["footprint" :> Double]} roomsWithArea ["footprint" :> Double])

I've made a number of simplifications compared to my implementation in Structures[9], and compared to implementations in other languages, to keep the implementation side concise. If Structures hasn't been updated yet by the time you are reading this, you can check out the status at the time I was writing this article here. I also recommend checking out the Brown Benchmark for Table Types for a more complete treatment of the topic.

To summarize a few of the changes relative to Structures:


  [1] https://github.com/brownplt/B2T2

  [2] https://git.sr.ht/~thatonelutenist/Structures

  [3] Though row types need not live in tables; they can also be used effectively on their own.

  [4] At best, we might be able to get away with something like a derive macro, but that's still something we have to invoke manually.

  [5] Generally speaking, with "traditional" data types, an interface typically requires some manual intervention from the programmer to be implemented for a given type. This provides a level of assurance that the implementation over that data type is intentional, and thus likely semantically valid.

  [6] For instance, TypeScript's structural subtyping is a very good approximation of row polymorphism, and I believe this has contributed just as much to its success as a JavaScript replacement as its backwards compatibility has. A statically typed language that could run existing JavaScript code, but made it a battle to assign types to that existing code, wouldn't be very useful as a transition language.

  [7] In other languages, fields in row types may be considered implicitly nullable. In Structures, I provide explicit control over the nullability of fields in the schema.

  [8] Because that's totally all the information you could ever need to determine what will fit in a room, right?

  [9] https://git.sr.ht/~thatonelutenist/Structures