Refinement types for input validation
Most applications need to deal with user input in some capacity; data submitted via a web form, incoming JSON payloads, command line arguments, and so on. This data needs to be deserialized into a format that the application can understand, and it may also need to be validated—for example, checking that a text input looks like a valid email address.
In dynamic languages like Python or Ruby we can use meta-programming to abstract away a lot of the boilerplate needed to deal with user input. Here’s an example of a Django form, taken from the official Django documentation1:
from django import forms
class ContactForm(forms.Form):
subject = forms.CharField(max_length=100)
message = forms.CharField(widget=forms.Textarea)
sender = forms.EmailField()
cc_myself = forms.BooleanField(required=False)
From this definition, Django can automatically derive:
- A HTML form, with inputs that correspond to the form’s properties
- Validation for the different input fields
- Given a matching “model object”, it can also derive a constructor that gets called with the validated form data
This is very convenient, but hard to recreate in a statically typed functional programming language (at least the ones in wide use today). The closest we can get is usually some kind of template or reflection-based serialization / deserialization, which comes with a major caveat: We can only perform this translation for types that have a natural mapping to the input data format, e.g JSON. This introduces two major problems:
- Validation becomes distinct from deserialization; this is problematic if, for example, we want to treat missing values and invalid values in a uniform way.
- We are limited to deserializing “primitive” data types
The latter problem is exacerbated by the fact that functional programmers are quite “type-happy”; instead of modelling data with primitives like strings and enums, they usually reach for sum types or newtype wrappers. For example, given that an input was indeed a valid email address, we may wish to promote it to an Email
type:
type Email = Email of string
let parseEmail input =
if Regex.IsMatch(input, ".+@.+")
then Ok (Email input)
else Error $"{input} is not a valid email address"
Another common example is deseralizing enums:
type House =
| Hufflepuff
| Gryffindor
| Ravenclaw
| Slytherin
let parseHouse input =
match input with
| "hufflepuff" -> Ok Hufflepuff
| "gryffindor" -> Ok Gryffindor
| "ravenclaw" -> Ok Ravenclaw
| "slytherin" -> Ok Slytherin
| other -> Error $"{input} is not a valid house"
In both of these cases, we have to wrap the value in a Result
type, as we have to handle the cases where we get an invalid input. As you can imagine, this can get quite tedious. Is there a way to remove some of this boilerplate, and potentially recover some of the features from the Django example?
Refinement Types
A refinement type is a type with an attached predicate that holds for all instances of that type. For example, a refined “list” type might be “all lists of length > 1”. The paper Refinement Types for ML shows how to extend the ML type system to support refinement types. The approach is quite elegant—at least if I’ve understood the paper correctly!
A refinement type consists of:
- A specific series of constructor instantiations
- An “or” operator that lets us choose between these series of constructors.
To give a concrete example, let’s define some refinement types for our own List
data type. Given that a List
has the following type:
type List a =
| nil
| Cons of a * List
The refinement type for the singleton list would then be:
Cons (a, nil)
The refinement type for non-empty lists:
Cons (a, List)
The refinement type for lists with at most two elements:
Cons (a, Cons (a, nil)) or
Cons (a, nil) or
nil
More formally, we can see that these operators lets us define types as regular languages—ie, we can think of refinement types as a form of regular expressions! Thus, we have that a refinement type B is a subtype of type A if the regular language described by B fits in A — eg a list of two elements is a subtype of the lists of at most two elements, which in turn is a subtype of the List
type.
The relative simplicity of regular languages means that we don’t need to use a constraint solver to check sub typing relations, and we don’t need to impose the same restrictions on the language as required by dependent types. As an aid to the programmer, language support could be added for common dependent type definitions, such as limiting the length of a “collection” type or enforcing that a string matches a given regular expression.
Input validation
While the properties described above are useful in and of themselves, the fact that we can describe refinements in the type system means we can include these checks when doing automatic deserialization! Let’s say we recreate the Django form above as a record with refinement types:
type input = {
subject: string with length < 100
message: string
sender: string matching “.+@.+”
cc_myself: bool
}
If we change the definition of Email
to use a refinement type, we no longer need the smart constructor:
type Email = Email of string matching “.+@.+”
Since the deserialized input.sender
field has been validated by the runtime to match the email regular expression, and has been given a matching refinement type, we can simply do:
let sender = Email input.sender
Just like in Django, the runtime can output an error message to be displayed on the frontend if input.sender
does not match the email regex.
Similarly, if a type is limited to a set of values, eg
type houseInput =
"hufflepuff" or
"gryffindor" or
"ravenclaw" or
"slytherin"
… the front-end can render these alternatives in an appropriate widget, eg a drop-down. Again, because the set of possible values is limited, we can write an exhaustive match statement without having to cover the “wildcard” case:
match houseInput with
| "hufflepuff" -> Hufflepuff
| "gryffindor" -> Gryffindor
| "ravenclaw" -> Ravenclaw
| "slytherin" -> Slytherin
Pretty nice, right?