OCaml for the Skeptical: Preliminary Syntax Issues

There are some issues of OCaml syntax that it's best to be aware of from the start. Some of these details won't make sense to you yet; just try to absorb the generalities for now.

Whitespace, Indentation and Line Endings

Back in the old days of programming, the earliest languages had issues with lines in source code. Typically, each assembler instruction had to be on a separate line, and one instruction couldn't be split across two lines either. FORTRAN worried about not just line boundaries but column boundaries! BASIC required you to number your lines!

Fortunately, after 1960 or so, languages got over these fiddly concerns. Languages like Algol 60 and Lisp let you break lines wherever you liked; the beginnings and endings of things were indicated by keywords and by the parser's innate knowledge of the language's syntax.

Then, with the advent of so-called "scripting languages", much of this freedom was lost! Simplistic lexers and parsers resulted in languages that worried about line boundaries again. People put up with typing continuation characters (typically a backslash) at the end of a line to help their pathetic parser figure out that their expression wasn't finished. A new dark age began.

Fortunately, OCaml is old-school when it comes to these issues. Outside of string and character literals (of course), you can make liberal use of whitespace, indent your code (or not) as you like, and type expressions like this:

    # 16 +
      3
	      * 
	 8
      ;;
    - : int = 40
    #

if you like. It's a syntactic renaissance!

Comments

OCaml uses the Pascal-style comment syntax. Comments begin with (* and end with *). For example:

    (* this is a comment *)

Advantages of this syntax include:

- comments can go anywhere on a line, even in the middle of an expression, e.g. 2 + (* addition *) 3
- comments can span multiple lines, without repeating a comment marker on each line
- comments can nest, which makes it easy to comment out blocks of code that already contain comments

Disadvantages are that it reminds me of Pascal...
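Here is a small sketch of those points in a compiled source file. The names five, area and product are made up for illustration. It also shows one wrinkle of this comment syntax that's worth knowing early: because an opening parenthesis immediately followed by an asterisk starts a comment, the multiplication operator has to be written with spaces, as ( * ).

```ocaml
(* A comment can sit in the middle of an expression. *)
let five = 2 + (* addition *) 3

(* Comments nest, so a block that already contains a comment can be
   commented out wholesale:

let area r =
  let pi = 3.0 (* a crude approximation *) in
  pi *. r *. r
*)
let area r =
  let pi = 4.0 *. atan 1.0 in
  pi *. r *. r

(* The multiplication operator needs spaces inside the parentheses;
   without them, the parenthesis and asterisk would open a comment. *)
let product = ( * ) 6 7
```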

OCaml Names

There are many things in OCaml that can be named (values, types, modules, classes, etc.). All names are case-sensitive, and case is also used in some instances to distinguish different classes of names. Names of values (integers, strings, lists, functions and so on) must begin with a lowercase letter (or an underscore). Type names, record field names, and class and method names also begin with a lowercase letter. Variant constructors, exception names, and module names must begin with an uppercase letter. I'll mention these restrictions as each one comes up, so you don't need to memorize them all now. See Names in the OCaml Manual for a complete list.
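A minimal sketch of the case rules, using hypothetical names (shape, point, Negative_size, Geometry) invented for the purpose. Spelling any of them the other way around, say type Shape or exception negative_size, is rejected by the compiler. Classes and methods are left out here.

```ocaml
(* The type name is lowercase; its variant constructors are uppercase. *)
type shape = Circle of float | Square of float

(* Record field names are lowercase. *)
type point = { x : float; y : float }

(* Exception names start with an uppercase letter. *)
exception Negative_size of float

(* Module names start with an uppercase letter; the value names
   defined inside the module are lowercase. *)
module Geometry = struct
  let pi = 4.0 *. atan 1.0

  let area s =
    match s with
    | Circle r when r < 0.0 -> raise (Negative_size r)
    | Circle r -> pi *. r *. r
    | Square side -> side *. side
end

(* An ordinary value name: lowercase. *)
let origin = { x = 0.0; y = 0.0 }
```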

Besides these restrictions, names can consist of letters, digits, underscores (_), and single quotes ('). I particularly like using single quotes as trailing primes in names, and you will frequently see OCaml'ers name local helper functions this way: if, for example, the function is named fac, a local helper function might be named fac'.
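Here is one way the fac/fac' habit might look in practice. The accumulator-passing helper is just an illustration of the naming style, not the only way to write a factorial.

```ocaml
(* Factorial whose tail-recursive helper is named fac' ("fac prime").
   acc carries the product computed so far; k counts down to 1. *)
let fac n =
  let rec fac' acc k =
    if k <= 1 then acc
    else fac' (acc * k) (k - 1)
  in
  fac' 1 n

let () = Printf.printf "fac 5 = %d\n" (fac 5)
```

Because fac' is defined inside fac, the primed name never leaks out; callers only ever see fac.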

You can also use certain symbols as names by writing them within parentheses when defining them. For example, the string concatenation operator is infix ^; you can define infix + to concatenate two strings like so:

    # let (+) a b = a^b;;
    val ( + ) : string -> string -> string = <fun>
    # "foo" + "bar";;
    - : string = "foobar"
    #

even though + normally means integer addition (this definition shadows that old meaning of + from here on!). Not all symbols can be used this way; see Names in the OCaml Manual for details. Don't fret about this: you're never required to use symbols for names!
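Redefining + at the top level, as in the transcript, changes its meaning for everything that follows. A gentler sketch confines the new meaning with let ... in. It also shows inventing a brand-new operator rather than reusing an existing one; +^ is a hypothetical name chosen for this example, and its precedence follows from its first character, +.

```ocaml
(* Shadow ( + ) only inside one expression: the let ... in limits the
   new meaning to the body that follows it. *)
let greeting =
  let ( + ) a b = a ^ b in
  "foo" + "bar"

(* Outside that expression, + is still integer addition. *)
let sum = 16 + 3

(* A new infix operator: joins two strings with a space between them. *)
let ( +^ ) a b = a ^ " " ^ b
let spaced = "foo" +^ "bar"
```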

There is one name that is special: _. This name can be bound to a value (in a let definition or a pattern match), but it can never be evaluated (that is, used to get that value back). It's kind of like /dev/null in the Unix file system: you can write data into it, but you can never read it back. It's not even syntactically legal to use the name _ in a position where it would be evaluated:

    # let _ = 2 + 3;;
    - : int = 5
    # if false then _ + 1 else 0;;
    Syntax error
    #

The utility of this is that you frequently (especially in pattern matches) need to give a name where you aren't interested in the value (in a let definition, this might be because you're evaluating an expression only for its side effect). Using _ in these cases tells both the compiler and the reader that the value is deliberately being thrown away: the compiler won't warn about an unused variable, and discarding the value may save memory and even CPU cycles in some cases. Since _ can never be evaluated, you can safely bind it repeatedly in the same context if you have more than one value to ignore, e.g. let _ = ... and _ = ... and _ = ... in ....
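A short sketch of these uses of _. The names first_of_pair, middle_is_zero, counter and bump are hypothetical, invented for the example. Note that OCaml does not promise an evaluation order for the bindings in let ... and ..., so the example only relies on both of them running.

```ocaml
(* In a pattern, _ matches a component we don't care about. *)
let first_of_pair pair =
  match pair with
  | (x, _) -> x

(* _ may appear several times in one pattern; a real variable may not
   (a pattern such as (x, 0, x) is rejected). *)
let middle_is_zero triple =
  match triple with
  | (_, 0, _) -> true
  | _ -> false

(* Hypothetical helper: has a side effect and also returns a value. *)
let counter = ref 0
let bump () =
  incr counter;
  !counter

(* Call bump twice purely for its side effect, discarding the int it
   returns. Binding _ twice in one let ... and ... is allowed because
   _ is not a real variable. Writing "bump (); ..." instead would
   typically draw a warning that a non-unit result is being ignored. *)
let () =
  let _ = bump () and _ = bump () in
  ()
```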

Definition Before Use

In OCaml, you must define names before you use (reference) them. So if you write a function foo that calls a function bar, you must define bar before you define foo. This applies in the interactive top-level, within individual source files, and to the order in which you link separately compiled source files (though there are tools, such as ocamldep, that can figure the latter out for you automatically). Issues of mutual recursion are covered in Defining and Applying Functions.
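A minimal illustration using foo and bar from the paragraph above, with made-up bodies. It also shows a consequence of this rule that can surprise newcomers: defining bar again later creates a new binding and does not change the bar that foo already refers to.

```ocaml
(* bar has to come first, because foo refers to it. Had foo been
   written above bar, compilation would stop with an "Unbound value
   bar" error. *)
let bar x = x * 2

let foo x = bar x + 1

(* A later definition of bar is a new, separate binding. foo keeps
   using the bar that was in scope when foo itself was defined, so
   foo 10 is still 2 * 10 + 1. *)
let bar x = x * 100
```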