OCaml for the Skeptical: Pattern Matching

Pattern matching comes up in several places in OCaml: as a powerful control structure combining a multi-armed conditional, unification, data destructuring and variable binding; as a shortcut way of defining functions by case analysis; and as a way of handling exceptions.

The pattern matching control structure is match expr with pattern-matching. The pattern-matching part of the match is a sequence of clauses, each one of the form: pattern -> expr, separated by vertical bars (|). The clauses are processed in order, and only the expr of first matching clause is evaluated. The value of the entire match expression is the value of the expr of the matching clause; if it's possible for no clause to match, your match is said to be non-exhaustive and when a match fails it raise the exception Match_failure. Sometimes, the parser can tell in advance, at compile-time, that a match is non-exhaustive, and will give you a warning (see below), but it's not possible to detect this in all cases. The syntax of patterns is explained in the manual.

Simplifying somewhat, patterns are either constants, like 12 or "foo" or [] (the empty list constant), or pattern expressions built from constructors like :: (the list constructor) or [1;2] (using the [...] syntax as a constructor). Variables can also be used as wildcards to match parts of data structures, and these variables will be bound to the matching parts and are available in the expr part of the pattern match.

The simplest example is probably the use of match as a multiway conditional like C's switch. Suppose we want to replace this series of nested if's:

    # let f x = 
	if x = "foo"
	then "it is foo"
	else if x = "bar"
	then "bar is the one"
	else if x = "zap"
	then "totally different"
	else "our default: " ^ x
      ;;
    val f : string -> string = <fun>
    # f "hello";;
    - : string = "our default: hello"
    # f "bar";;
    - : string = "bar is the one"
#

with a more compact and readable match:

    # let f x =
	match x with
	  "foo" -> "it is foo"
	| "bar" -> "bar is the one"
	| "zap" -> "totally different"
	| _     -> "our default: " ^ x
      ;;
    val f : string -> string = <fun>
    # f "hello";;
    - : string = "our default: hello"
    # f "bar";;
    - : string = "bar is the one"
    #

Note our use of the magic variable _ as a wildcard; we don't need to use the variable in the expr part of that pattern since we can just as well use x.

More interesting is to use match with complex data structures. In most languages, when you define a data structure, you need to define a set of helper functions to manipulate that data structure, both for ease-of-use and information hiding. (In an O-O language, you might do this by defining the data structure as a class and the helper functions as methods.) These functions classically include a set of predicates to distinguish between the different disjuncts of a union (e.g. Lisp's null to tell the empty list from non-empty lists) and a set of selectors to extract the components of compound data (e.g. Lisp's car and cdr to select the head and tail of a cons (non-empty list)). You can define such functions in OCaml (and they're often useful) but more often you just use pattern matching.

Here's a function from the List module (slightly modified) that implements list membership; that is, it does a linear search for an item in a list, returning true if the item is found and false otherwise:

    # let rec mem x list =
	match list with 
	  [] -> false
	| hd::tl -> hd = x || mem x tl
      ;;
    val mem : 'a -> 'a list -> bool = <fun>
    # mem 3 [1;2;3;4;5];;
    - : bool = true
    # mem 12 [1;2;3;4;5];;
    - : bool = false
    #

The first pattern is the constant []; since this indicates an empty list, we return false: no x can be a component of the empty list. The second pattern is based on the list constructor ::, and uses a variable for each of the constructor's arguments: hd for the head of the list (some value of type 'a), and tl for the tail of the list (a value of type 'a list). (The variable names are completely arbitrary. This pattern will match any non-empty list, and in so doing, bind hd to the head of the list and tl to the tail in the pattern's expr. The expr in this case is a Boolean or: either hd = x, in which case x is indeed a member of the list and we return true, or else we return the result of testing the membership of x in the tail of the list, via a recursive call.

This function illustrates programming by case analysis: we use a pattern match that's structured just like the structure of the data itself. In the case of lists, every list is either the empty list or consists of a head (of some type 'a) and a tail, which is an 'a list. Via case analysis, some functions practically write themselves.

Patterns can also incorporate conditions or guards that impose restrictions on the match; these guards are introduced by the keyword when with the syntax: pattern when expr1 -> expr2. expr1 is a Boolean expression (it can reference variables bound by the pattern) and the entire clause matches only if the guard is true. Here's a rewrite of the mem function using a guard:

    # let rec mem x list =
	match list with 
	  []                -> false
	| hd::_ when hd = x -> true
        | _::tl             -> mem x tl
      ;;
    val mem : 'a -> 'a list -> bool = <fun>
    #

Non-Exhaustive Pattern-Matches

OCaml can sometimes detect non-exhaustive patterns and warn you about them. Let's define a function hd to return the head of a list; such a function is undefined on an empty list, so we leave that case out:

    # let hd list =
	match list with
	  hd::_ -> hd
      ;;
    Warning: this pattern-matching is not exhaustive.
    Here is an example of a value that is not matched:
    []
    val hd : 'a list -> 'a = <fun>
#

We could just not worry about this, feeling that if the user tries to take the head of an empty list, that's their lookout:

    # hd [];;
    Exception: Match_failure ("", 16, 2).
    #

But a better approach is to always write exhaustive pattern matches and supply a more precise exception for the otherwise-missing cases:

    # let hd list = 
	match list with
	  [] -> raise (Invalid_argument "hd is not defined on []")
	| hd::_ -> hd
      ;;
    val hd : 'a list -> 'a = <fun>
    # hd [];;
    Exception: Invalid_argument "hd is not defined on []".
#

Pattern Matching in Lambdas

OCaml has syntactic sugar that allows you to do pattern matching directly in lambda expressions. This is never necessary, because you can always ignore it and use match directly, but it can lead to more concise code. You can do this with the function keyword; the syntax is: function pattern-matching and it's exactly equivalent to fun x -> match x with pattern-matching. It saves you from pointlessly naming a function parameter and then just immediately matching on it. It can of course be used as a lambda but you most often see it used in function definitions. Here are two equivalent function definitions:

    let rec length list =
      match list with
	[] -> 0
      | _::tl -> 1 + length tl

    let rec length = function
	[] -> 0
      | _::tl -> 1 + length tl

These functions compute the length of a list. With the latter, you can tell that's it's a function of one argument because function only defines unary functions. You can also tell that the argument is intended to be of type list by examining the patterns.

You can also use function to define functions of more than one argument, but only when you want to use the pattern matching on the last parameter (and you can only do the pattern matching on that one parameter; of course, you can freely apply match to other parameters in any of the function clauses). Here's the two-argument mem function recast with function:

    # let rec mem x = function
	  [] -> false
	| hd::tl -> hd = x || mem x tl
      ;;
    val mem : 'a -> 'a list -> bool = <fun>
    #

You can tell this is a function of two arguments by counting the named parameters (in this case, just x) and adding one for the anonymous parameter implied by function.

If you find the above to be confusing, it is just syntactic sugar and you can stick to explicit match's. But one idiom involving pattern matching in function definitions particularly useful. A one-armed pattern match can be used for its de-structuring ability to pick apart data. Assuming in this case that we don't care about the non-exhaustive pattern warning, we could define our hd function more compactly:

    let hd (x::_) = x

This is syntactic sugar on top of syntactic sugar! Any definition of the form let name = function pattern -> expr (when there's only one clause in the pattern match) can be replaced with let name pattern = expr.