OCaml for the Skeptical

OCaml lets you write expressions in two distinct realms: the realm of *values* (which
is the realm of ordinary programming), and the realm of *types*. Part of learning
OCaml is learning the names of the built-in types, the algebra that allows you to use
*type operators* or *type constructors* to combine existing *general* types into
specialized types, and how to express completely new user-defined types.

Since OCaml uses *type inference* to deduce the types of the expressions you write,
you don't initially need to write down types in your code. But you *see* these types
(or *signatures*) right from the beginning, because the interactive top-level tells
you not just the value, but the type of every expression you evaluate. So it's good to
understand the language of types right from the beginning, so that you understand what
OCaml is telling you about your programs.

Most importantly, when you make type errors in your programs, you need to understand types so that you can understand the error messages!

How do you ask the OCaml top-level for the type of some value, say `12`? You just
evaluate an expression that results in that value (the simplest such expression being
`12` itself) and OCaml tells you not only the value but the type:

# 12;; - : int = 12 #

That's type (`int`) = value (`12`). Since functions are values, you can ask OCaml
the type of a function by evaluating an expression that results in that function, the
simplest such expression being either a lambda expression, or, if the function is bound
to a name, just the name. (To evaluate a function is *not* to apply it to some
argument!) When you do this, OCaml displays the type and value of the function.
Consider the Boolean function `not`, the list length function `List.length`, and
the function that increments an integer by one (given as a lambda):

# not;; - : bool -> bool = <fun> # List.length;; - : 'a list -> int = <fun> # fun n -> n + 1;; - : int -> int = <fun> #

There are several interesting things to note. First, why is the value of the function
given as `<fun>`? The answer is, because the value of a function is compiled
machine code^{[1]}, which might be very bulky and wouldn't make
much sense anyway if displayed on the screen. So, OCaml just displays `<fun>`
to stand in for it.

Next, we see that function types are indicated by expressions of the form ` domain
-> range`. This is explained in detail in Defining and Applying Functions.

Finally, we see that the domain of `List.length` is given as `'a list`. What does
this mean?

Type expressions can contain *type variables*, just as ordinary (value) expressions
can contain variables. In ordinary expressions, these variables stand for values, and
in type expressions they stand for types. Type variables are written as a letter
following a single-quote, e.g. `'a`, `'b`, `'c`, etc. (I pronounce these as the
nearest Greek letter, e.g. `'a` as *alpha*, `'b` as *beta*, etc.; I don't know
how common this is in the OCaml community but it's common in books and papers about
functional programming.)

The OCaml `list` type is actually not a type per se, but rather a postfix type-operator
that constructs concrete (monomorphic) list types. For example, the list `[1;2;3]`
is of type `int list`, while the list `["foo";"bar";"baz"]` is of type `string
list`. So in the type expression `int list`, the `list` type-constructor is
applied to the type `int` to yield the type `int list`.

Now, if there were only monomorphic list types, then you couldn't have a function that
would compute the length of *any* list: you'd have to define an `int_list_length`
function, and a `string_list_length` function, and so on ad infinitum.

But OCaml lets you write *polymorphic* functions that work for any type of list. The
problem for a statically-typed language is: what is the type of such a function? Well,
as we saw above, the domain of the built-in `List.length` functions is `'a list`;
the type variable `'a` can be replaced with any type, so `List.length` works for
any type of list: `int list`, `string list`, `(int -> int) list`, etc.

Some functions are polymorphic in more than one place: they use several different type
variables. Within a given type expression, all type variables must be used
consistently. So all the `'a`'s must refer to the same type, and all the `'b`'s
must refer to the same type. (These two types might be the same.) This is exactly like
simple value expressions, such as `a + b / (2 * a)`, where each `a` must be the
same value, and the `b` can be the same value or a different one.

For example, `List.map` is of type `('a -> 'b) -> 'a list -> 'b list`.
If you count the arrows (the highest-level arrows!)
you can see that this is a binary function that takes a function as its first argument,
and a list as its second argument, and it returns a list. The type of its list
parameter is `'a list`, meaning it can be a list of any type. The domain of the
(unary — count its arrows) function argument is also `'a`. This means that
domain of the function must be the same as the type of the list elements! The range of
the function is `'b`, meaning it is allowed to (but needn't necessarily) return a
value of a different type. Finally, the domain of `List.map` is `'b list`,
meaning that the type of the elements of the result list must be the same as the type
returned by the function argument. This makes perfect sense, since `List.map`
applies it's function argument to each element of its list argument and returns a list
of the results:

# String.length "foo";; - : int = 3 # ["hello";"sailor!"];; - : string list = ["hello"; "sailor!"] # List.map String.length ["hello";"sailor!"];; - : int list = [5; 7] #