OCaml for the Skeptical: The Language of Types

OCaml lets you write expressions in two distinct realms: the realm of values (which is the realm of ordinary programming), and the realm of types. Part of learning OCaml is learning the names of the built-in types, the algebra that allows you to use type operators or type constructors to combine existing general types into specialized types, and how to express completely new user-defined types.

Since OCaml uses type inference to deduce the types of the expressions you write, you don't initially need to write down types in your code. But you see these types (or signatures) right from the beginning, because the interactive top-level tells you not just the value, but the type of every expression you evaluate. So it's good to understand the language of types right from the beginning, so that you understand what OCaml is telling you about your programs.

Most importantly, when you make type errors in your programs, you need to understand types so that you can understand the error messages!

How do you ask the OCaml top-level for the type of some value, say 12? You just evaluate an expression that results in that value (the simplest such expression being 12 itself) and OCaml tells you not only the value but the type:

    # 12;;
    - : int = 12
    #

That's type (int) = value (12). Since functions are values, you can ask OCaml the type of a function by evaluating an expression that results in that function, the simplest such expression being either a lambda expression, or, if the function is bound to a name, just the name. (To evaluate a function is not to apply it to some argument!) When you do this, OCaml displays the type and value of the function. Consider the Boolean function not, the list length function List.length, and the function that increments an integer by one (given as a lambda):

    # not;;  
    - : bool -> bool = <fun>
    # List.length;;
    - : 'a list -> int = <fun>
    # fun n -> n + 1;;
    - : int -> int = <fun>
    #

There are several interesting things to note. First, why is the value of the function given as <fun>? The answer is, because the value of a function is compiled machine code^[1], which might be very bulky and wouldn't make much sense anyway if displayed on the screen. So, OCaml just displays <fun> to stand in for it.

Next, we see that function types are indicated by expressions of the form domain -> range. This is explained in detail in Defining and Applying Functions.

Finally, we see that the domain of List.length is given as 'a list. What does this mean?

Type Variables and Polymorphism

Type expressions can contain type variables, just as ordinary (value) expressions can contain variables. In ordinary expressions, these variables stand for values, and in type expressions they stand for types. Type variables are written as a letter following a single-quote, e.g. 'a, 'b, 'c, etc. (I pronounce these as the nearest Greek letter, e.g. 'a as alpha, 'b as beta, etc.; I don't know how common this is in the OCaml community but it's common in books and papers about functional programming.)

The OCaml list type is actually not a type per se, but rather a postfix type-operator that constructs concrete (monomorphic) list types. For example, the list [1;2;3] is of type int list, while the list ["foo";"bar";"baz"] is of type string list. So in the type expression int list, the list type-constructor is applied to the type int to yield the type int list.

Now, if there were only monomorphic list types, then you couldn't have a function that would compute the length of any list: you'd have to define an int_list_length function, and a string_list_length function, and so on ad infinitum.

But OCaml lets you write polymorphic functions that work for any type of list. The problem for a statically-typed language is: what is the type of such a function? Well, as we saw above, the domain of the built-in List.length functions is 'a list; the type variable 'a can be replaced with any type, so List.length works for any type of list: int list, string list, (int -> int) list, etc.

Some functions are polymorphic in more than one place: they use several different type variables. Within a given type expression, all type variables must be used consistently. So all the 'a's must refer to the same type, and all the 'b's must refer to the same type. (These two types might be the same.) This is exactly like simple value expressions, such as a + b / (2 * a), where each a must be the same value, and the b can be the same value or a different one.

For example, List.map is of type ('a -> 'b) -> 'a list -> 'b list. If you count the arrows (the highest-level arrows!) you can see that this is a binary function that takes a function as its first argument, and a list as its second argument, and it returns a list. The type of its list parameter is 'a list, meaning it can be a list of any type. The domain of the (unary — count its arrows) function argument is also 'a. This means that domain of the function must be the same as the type of the list elements! The range of the function is 'b, meaning it is allowed to (but needn't necessarily) return a value of a different type. Finally, the domain of List.map is 'b list, meaning that the type of the elements of the result list must be the same as the type returned by the function argument. This makes perfect sense, since List.map applies it's function argument to each element of its list argument and returns a list of the results:

    # String.length "foo";;
    - : int = 3
    # ["hello";"sailor!"];;
    - : string list = ["hello"; "sailor!"]
    # List.map String.length ["hello";"sailor!"];;
    - : int list = [5; 7]
    #

Footnotes

In the interactive top-level, the value of a function is compiled abstract machine code (sometimes called byte code), rather than native code.