OCaml for the Skeptical
U of C Library DLDC Informal OCaml Class

The Syntax of Definitions

Definitions are ways to give names to things: values, types, exceptions, modules, classes, etc.

let — Naming Values

OCaml has one uniform construct for giving names to values: the let definition. There are two major let variants: the global definition and the local definition. The global definition is just syntactic sugar for one particular use of the local definition, so let's consider the latter first.

The (simplest) let syntax is:

    let name = expr1 in expr2

and the semantics are: give expr1 the name name within expr2. The name (because it's a value name) needs to begin with a lowercase letter or an underscore. That's it: all you need to know about naming values in OCaml! Some examples:

    # let n = 2 in n * n;;
    - : int = 4
    # let list = ["foo";"bar";"baz"] in List.hd list;;
    - : string = "foo"
    #

These two examples don't illustrate why you'd want to use let's. Suppose you're about to write a complex expression that contains repeated instances of some expensive (in CPU cycles) subexpression, or just a subexpression that's complex, tedious to write (and thus to maintain), and easy to make typos in when you type it repeatedly. Such an expression is a candidate for expr1 in a let:

    let x = long complex expensive expression in [x,x;x,x]
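Here is a minimal sketch of that idea. The function sum_of_squares and the names grid and s are invented for illustration; they stand in for whatever expensive expression you actually have. The outer definitions are written in source-file form (a let without in, explained under Global Definitions) only so that the result has a name you can inspect.

```ocaml
(* Hypothetical stand-in for a long, complex, expensive expression. *)
let sum_of_squares = fun xs -> List.fold_left (fun acc x -> acc + x * x) 0 xs

(* The expensive call is written once, named s, and then reused in the body. *)
let grid =
  let s = sum_of_squares [1; 2; 3] in
  [s, s; s, s]
```

Without the let you would have to type sum_of_squares [1; 2; 3] four times, with four chances for a typo.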

Here are some details. The crucial thing to understand is that name is only defined in expr2; that's why we call this a local definition: the name is local to the body of the let.

The next thing to understand is that the entire let definition is itself an expression. The value of a let definition is the value of its body (i.e. expr2). You can therefore use a let anywhere OCaml is expecting an expression.

    # 40 = (let n = 2 in n + n) * 10;;
    - : bool = true
    #

Notice the parentheses around the let; parens are not part of the syntax of let — they're part of the syntax of expressions and we can use them around any expression to change the precedence of operators or to disambiguate. We need them here only to indicate that we don't mean n + n * 10 which would parenthesize as n + (n * 10) (because * has a higher precedence than +); without the parens, the value of the whole expression would be different:

    # let n = 2 in n + n * 10;;
    - : int = 22
    #

This is because a let definition is like a syntactically fancy operator that takes three operands (name, expr1 and expr2) and has a lower precedence than the arithmetic operators. You can also say this (somewhat more ambiguously) as: the body of a let extends "as far as possible". If you are confused about this, you can religiously put parens around the body of your let, e.g. let n = 2 in (n + n) (OCaml even allows you to use begin and end anywhere in place of parens, e.g. let n = 2 in begin n + n end), but you will look like a tyro if you don't get used to the precedence rules.
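The three spellings discussed above can be put side by side. This is only a sketch: the names unparenthesized, parenthesized and with_begin_end are made up so that each result can be inspected.

```ocaml
(* The body extends as far as possible, so the * 10 is inside the let body:
   this is 2 + (2 * 10). *)
let unparenthesized = let n = 2 in n + n * 10

(* Parentheses close the let early, so the * 10 applies to the whole let:
   this is (2 + 2) * 10. *)
let parenthesized = (let n = 2 in n + n) * 10

(* begin ... end does the same job as the parentheses. *)
let with_begin_end = begin let n = 2 in n + n end * 10
```

The first evaluates to 22 and the other two to 40.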

One of the places that OCaml expects an expression is as the body of a let (i.e. expr2). This means that you can nest your let's. This is fundamental to OCaml programming. Suppose you want to bind two variables and use them together. OCaml doesn't need any special syntax to express this: just nest your let's:

    # let a = 2 in
        let b = 3 in
          a + b
      ;;
    - : int = 5
    #

To understand this, just work from the inside out. The innermost let's body is the expression a + b. To evaluate this, we need to know the value of a and the value of b and add them together. To find these values we look for the nearest surrounding binding, which is that established by let b = 3; that gives us a value for b. But we still need a value for a. So we expand our search and find that the entire inner let is itself within the body of another let which establishes the binding let a = 2; now we have all the values we need to evaluate the expression and get 5.

This brings up a deep topic that is fundamental to the understanding of any programming language: scope rules. OCaml has full nested scope: nested let's establish bindings that are all visible in the innermost let body (unless they are shadowed — see below). This was the norm in programming languages from about 1960 until the C programming language became popular and more or less disallowed it. Many languages (especially scripting languages) defined after C followed this "innovation". So for example Python[1] and Tcl[2] both have a two-level scope rule: there is only the innermost local scope and the outermost global scope; there are no bindings in between.

So what's shadowing? If a name has a previous binding (in an outer scope) and you bind that same name again, the old binding is shadowed inside the let, and only the new binding is visible. But the old binding isn't destroyed: once we have evaluated the let and we are back outside of it, the old binding is again visible. (This is exactly how function parameters and local variables work in most languages: the scope of local variables is the function body, they shadow globals of the same name, and once the function terminates any like-named globals are again visible.)

    # let a = 1 in
        a + let b = 2 in
              let a = 3 in
                a + b
      ;;
    - : int = 6
    #

Notice that we refer to (as opposed to bind) the name a twice. In the first case its value is 1, because only the first binding of a is active at that point. In the second case, the value of a is 3. The value of a doesn't change: they are two completely different names (that happen to be spelled exactly the same)!
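To see that the old binding really does come back once the inner let is finished, here is a small sketch in which a is used after the shadowing let rather than before it. The names result and inner are invented for the example.

```ocaml
let result =
  let a = 1 in
  (* Inside the parenthesized let, the new a (3) shadows the old one. *)
  let inner = (let a = 3 in a * 10) in
  (* Back outside it, a is 1 again. *)
  a + inner
```

Here inner is 30 and result is 31, not 33: the outer a was never touched.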

Another place that OCaml expects an expression is in expr1 i.e. the value that you bind to the name in your let. This means that you can use a let for expr1 just as you can for expr2. This is not as exotic as it might seem, though my examples are somewhat pointless.

    # let a = let b = 2 in b + b * b in
        a + a
      ;;
    - : int = 12
    #

Note that there's no need for parens, but since this is not very idiomatic, I would use them to make things clearer:

    # let a = (let b = 2 in b + b * b) in
        a + a
      ;;
    - : int = 12
    #

Global Definitions

All the let definitions we've seen so far have been local definitions, but OCaml also supports global definitions. To understand these, you need to think ahead and picture the structure of a complete OCaml program. A complete program is really just one expression, which computes the "value" you're interested in (of course, since OCaml isn't a purely functional language, you may really be interested in a side-effect of the expression, which might be printing, the execution of a GUI, or the months-long running of a web server, say). Due to the substantial size of real-world programs, this one expression is typically a let expression, and in fact is typically a big deeply nested let expression. Even in a fairly small program you may be nesting 25 or 50 let's.

This nesting can get fairly tedious (even if you don't indent your nested let's) so OCaml has some syntactic sugar for this: the global definition. A global definition is just a let without an in and hence no body. But if there were really no body for the name to be defined in, there would be no point in defining the name. Really, the body of a global definition is implicit and extends to the end of your program file (or interactive session), except where temporarily shadowed. So this sequence of global definitions:

    # let a = 12;;
    val a : int = 12
    # let b = 5;;
    val b : int = 5
    # let c = a + 2 * b;;
    val c : int = 22
    # c;;
    - : int = 22
    #

(again, double semicolons are only necessary in an interactive session) is exactly the same as:

    # let a = 12 in
        let b = 5 in
          let c = a + 2 * b in
            c
      ;;
    - : int = 22
    #

except that after the local definition above, none of a, b, or c is defined, whereas after the global definitions, all three remain defined.
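In a source file, as opposed to the interactive session, the same two forms might look like the sketch below. The name nested is invented here so that the value of the nested form can be kept; note that no double semicolons are needed.

```ocaml
(* Global definitions: each name stays defined to the end of the file. *)
let a = 12
let b = 5
let c = a + 2 * b

(* The same computation as one nested local definition. The a, b and c
   bound inside it are local and vanish once nested has its value. *)
let nested =
  let a = 12 in
  let b = 5 in
  let c = a + 2 * b in
  c
```

Both c and nested are 22.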

Another difference: global definitions don't have a value: they are only definitions, not expressions! Note the difference here where I try to use both forms inside a list:

    # [let n = 2 in n];;
    - : int list = [2]
    # [let n = 2];;
    Syntax error
    #

let rec — Recursive let

In interactive sessions with the top-level, you frequently type a sequence of global definitions and redefine bindings as you go (this makes more sense with function definitions than with my silly examples!):

    # let a = 12;;
    val a : int = 12
    # (* oops, I meant the list containing 12! *) 
      let a = [12];;
    val a : int list = [12]
    # (* oops I meant the list containing 12 and 8! *)
      let a = [12; 8] 
      ;;
    val a : int list = [12; 8]
    #

(You might think that this should be a type error, since a's type seems to change: first it's int, then int list. But remember that global definitions are just syntactic sugar for nested let's, so the above is the same as: let a = 12 in let a = [12] in let a = [12; 8] in ..., and as we pointed out above, those three a's are completely different, unrelated names!)
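One way to convince yourself that the a's are unrelated is to capture the first one in a function before redefining it. This is just a sketch; add_a, thirteen and current_a are names I made up for it.

```ocaml
let a = 12

(* add_a refers to the a that is visible here: the int 12. *)
let add_a = fun n -> n + a

(* A new, unrelated a with a different type. The old one is shadowed,
   not changed. *)
let a = [12; 8]

(* add_a still sees the first a. *)
let thirteen = add_a 1
let current_a = a
```

After this, thirteen is 13 and current_a is [12; 8]. If the redefinition had changed the original a, add_a would no longer even type-check.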

Sometimes you even redefine a binding in terms of the previous value of the binding:

    # (* oops I meant the tuple containing that list and the string "foo" *)
      let a = a, "foo"
      ;;
    val a : int list * string = ([12; 8], "foo")
    #

This last example brings up an interesting question: why does that a in the tuple refer to the previous value of a? Why doesn't it refer, recursively, to itself? The answer is that OCaml doesn't allow recursion in data values[3] but only in function definitions, and I guess recursive function definitions were deemed to be less common than interactive redefinitions of functions in terms of previous versions. So non-recursive let is the default. (This strikes me as a strange default for a functional language, though.)

If you want a recursive let you have only to type let rec instead. We'll see this used with functions below.
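As a small preview, here is a sketch contrasting the two. The functions fact and step, and the result names, are made up for illustration.

```ocaml
(* With rec, the fact on the right-hand side is the function being defined. *)
let rec fact = fun n -> if n <= 1 then 1 else n * fact (n - 1)
let fact_5 = fact 5

(* Without rec, the step on the right-hand side is the PREVIOUS binding. *)
let step = fun n -> n + 1
let step = fun n -> step (step n)   (* calls the old step twice *)
let stepped = step 0
```

Here fact_5 is 120 and stepped is 2. If you drop the rec from fact, OCaml complains that fact is unbound (unless some earlier fact happens to be in scope, in which case that one is silently used).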

Naming Functions

Functions are values in OCaml, so since you already know how to write a function value as a lambda expression with fun, and since now you know how to bind values to names with let, you already know how to name functions: let inc = fun n -> n + 1. However, as you'd expect with a functional language, there's a lot more to defining functions than this. For more details see Defining and Applying Functions.
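A minimal sketch of the above, together with the most common shorthand. The names inc_sugar, three and also_three are invented for the example.

```ocaml
(* A function value bound to a name, exactly like any other value. *)
let inc = fun n -> n + 1

(* The usual shorthand: the parameter moves to the left of the = sign. *)
let inc_sugar n = n + 1

let three = inc 2
let also_three = inc_sugar 2
```

The two definitions have the same type, int -> int, and behave identically.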

type — Naming Types

The type definition for giving names to types is described in User-Defined Types.

exception — Naming Exceptions

The exception definition for giving names to exceptions is described in Exception Handling.

Footnotes

  1. At least, this was true of Python v1.x, when I last used it; Python has since gained nested scopes (PEP 227: optional in v2.1, standard from v2.2).
  2. Yes, I'm aware that Tcl has commands that allow access to intermediate scopes, but this direct access to the interpreter's data structures is very different from what's traditionally thought of as scope.
  3. Actually, OCaml does allow recursive data values under certain restricted conditions; see Recursive Values.