OCaml for the Skeptical: User-Defined Types

We've seen OCaml's type syntax because the interpreter uses it to tell you the type of every value, but due to type inference we typically don't have to write a type expression ourselves. But actually there are several reasons to use the type sublanguage (as you'll see when we come to modules). Since types are also something you can manipulate in OCaml, there is a mechanism for naming them as well. This is done with the keyword type. The simplest kind of type definition is an alias for a type that already exists. Suppose that we like int list's so much that we want to give them a shorter name. A type alias like this is easy to define:

    # type ilist = int list;;
    type ilist = int list
    #

Since OCaml doesn't automatically start calling every int list an ilist:

    # type ilist = int list;;
    type ilist = int list
    # [1;2;3];;
    - : int list = [1; 2; 3]
    #

(they are different types!) you'll just have to trust me for now that there are good reasons to want to do this.

A much more interesting use of type is to name your own user-defined types (in fact, you must name them in order to use them: in general anonymous user-defined types aren't allowed). While the ilist above is indeed a user-defined type, here I'm talking about variant types, which are what are sometimes called sum types or union types or disjoint unions. This is a type where each value is either one type, or another (or another, etc). The constructor for variant types is | and it's like an or for types. There's a restriction, however: you can't just or-together types:

    # type intorstring = int | string;;
    Syntax error
    #

Each branch of the union (each disjunct) needs to have a constructor defined for it. This is to prevent type ambiguities. If OCaml allowed the above definition, then what would be the type of 12? It could be a plain old int, or a new-fangled intorstring: there's no way to tell. And unambiguous types is what make static type inference work. So instead you have to define this type like so:

    # type intorstring = Int of int | String of string;;
    type intorstring = Int of int | String of string
    #

The constructor names Int and String are arbitrary and certainly don't have to have anything to do with the names of the types they construct, but they do have to begin with an initial uppercase letter.

To construct a value of type intorstring, you apply one of the constructors to a value of the appropriate type:

    # Int 12;;
    - : intorstring = Int 12
    # String "foo";;
    - : intorstring = String "foo"
    # [Int 12; String "foo"];;
    - : intorstring list = [Int 12; String "foo"]
    #

Note that the last example illustrates another way you can have a list containing elements of "different" types, even though OCaml lists are homogenous (of course, the whole point is that this is a list of homogenous type: all the elements are of type intorstring).

So, we have solved the ambiguity problem mentioned above: 12 is clearly an int, while Int 12 is clearly an intorstring.

Nullary Constructors

It's quite useful to define constructors that don't take an argument; these can be used to define what are often called enumerated types in other languages. Suppose you're writing a program to play a card game:

    # type suit = Club | Diamond | Heart | Spade
      type value = Jack | Queen | King | Ace | Num of int
      type card = Card of value * suit
      type hand = card list
      ;;
    type suit = Club | Diamond | Heart | Spade
    type value = Jack | Queen | King | Ace | Num of int
    type card = Card of value * suit
    type hand = card list
    # Card (Ace, Spade);;
    - : card = Card (Ace, Spade)
    # ([Card(Ace, Spade); Card(Num 7, Heart)]:hand);;
    - : hand = [Card (Ace, Spade); Card (Num 7, Heart)]
    #

Note: here I am doing something quite unusual in OCaml: explicitly specifying the type of a value by saying ([Card(Ace, Spade); Card(Num 7, Heart)]:hand), because otherwise [Card(Ace, Spade); Card(Num 7, Heart)] would be of type card list. The syntax for specifying the type of a value is (expr:type) — the parens are necessary.

Parameterized or Polymorphic Types

All the types we've defined so far have been what are called monomorphic types: they have only one shape. But OCaml also lets us define polymorphic types, which you can think of as types that are parameterized across some other type(s).

A good example is a traditional Lisp-style list type: a Lisp list is either nil (empty), or it is a cons, which is a pair consisting of some value (of any type) — traditionally called the car — representing the first element of the list, and another list — traditionally called the cdr — representing the rest of the list. Note that this type is defined recursively: a cons consists of a value of some type and a value which is (recursively) a cons.

Even though OCaml's built-in lists are implemented exactly this way, let's implement our own version. The only difference between our lists and Lisp's lists is that, since Lisp is dynamically typed, its lists are heterogenous i.e. one list may contain elements of different types, and our lists, being statically typed, must be homogenous. We'll start by defining a monomorphic list type, say, lists of int's.

    # type intlist = Nil | Cons of int * intlist;;
    type intlist = Nil | Cons of int * intlist
    # Cons(1,Nil);;
    - : intlist = Cons(1, Nil)
    # let rec length list =
	  match list with
	    Nil -> 0
	  | Cons(_,tl) -> 1 + length tl
	;;
    val length : intlist -> int = <fun>
    # length (Cons(1,Cons(2,Nil)));;
    - : int = 2
    #

This is well and good for lists of integers, but if we also wanted lists of strings, we'd need to define a different type:

    # type stringlist = Nil | Cons of string * stringlist;;
    type stringlist = Nil | Cons of string * stringlist
    #

and define a new version of the above length function (and any other intlist functions we had defined), and this would be very annoying, since they would only be trivially different! This is where polymorphic types come in: instead of intlist's and stringlist's, we will define one polymorphic list type, with a type parameter to stand in for the type of the list elements. Remember, type parameters are written 'a, 'b, etc. We only need one:

    # type 'a mylist = Nil | Cons of 'a * 'a mylist;;
    type 'a mylist = Nil | Cons of 'a * 'a mylist
    # Cons(1,Cons(2,Nil));;
    - : int mylist = Cons (1, Cons (2, Nil))
    # Cons("foo",Cons("bar",Nil));;
    - : string mylist = Cons ("foo", Cons ("bar", Nil))
    # let rec length list =
	match list with
	  Nil -> 0
	| Cons(_,tl) -> 1 + length tl
      ;;
    val length : 'a mylist -> int = <fun>
    # length (Cons(1,Cons(2,Nil)));;
    - : int = 2
    # length (Cons("foo",Cons("bar",Nil)));;
    - : int = 2
    #

You'll notice that, since length computes its result without having anything to do with the list elements per se, it's polymorphic definition is exactly the same as the intlist version.