Module Kwmacro

module Kwmacro: sig .. end
Simple parameterized macros for text strings.
Author(s): Keith Waclena
See the Tutorial below.

exception Undef of string
exception raised when an undefined macro name is encountered
type 'a func = 'a -> string list -> string 
the type of the function that implements a macro

The first parameter is arbitrary auxiliary data passed at expansion time via the Kwmacro.Make.expand function; the second parameter is the macro's parameter list as parsed by the Kwmacro.Surface.parse function from the call point (N.B.: the zeroth element is the macro name); the result is the string used as the value of the macro expansion.

An example would be this macro that uppercases its single parameter (and ignores its auxiliary data); note the error checking code, which can be easily abstracted out:

fun _ ->
  function
    | _::p::[] -> String.uppercase_ascii p
    | self::_ -> failwith (Printf.sprintf "%s: wrong number of parameters" self)
    | [] -> assert false (* implies bug in Kwmacro.Surface.parse implementation *)
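
One way the error checking might be abstracted out is sketched below. The helper name with_arity is hypothetical (it is not part of Kwmacro), and the func type is redeclared locally only so that the sketch compiles on its own.

```ocaml
(* Local stand-in for Kwmacro.func so the sketch is self-contained. *)
type 'a func = 'a -> string list -> string

(* Hypothetical helper: [with_arity n body] builds a macro that accepts
   exactly [n] parameters (not counting the macro name) and passes only
   those parameters to [body]; any other count fails with a message that
   names the macro. *)
let with_arity (n : int) (body : 'a -> string list -> string) : 'a func =
  fun aux argv ->
    match argv with
    | [] -> assert false (* the parser always supplies the macro name *)
    | self :: params ->
      let got = List.length params in
      if got = n then body aux params
      else
        failwith
          (Printf.sprintf "%s: expected %d parameter(s), got %d" self n got)

(* The uppercasing macro, now free of arity checks.  [List.hd] is safe
   here because [with_arity 1] guarantees exactly one parameter. *)
let upper aux argv =
  with_arity 1 (fun _ params -> String.uppercase_ascii (List.hd params)) aux argv
```

The macro body then only sees parameter lists of the right length, and every macro built this way reports arity errors in the same format.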

module type Surface = sig .. end
The type of surface syntax modules.
module type Lookup = sig .. end
The type of macro lookup modules.
module Make: 
functor (S : Surface) ->
functor (L : Lookup) -> sig .. end
Functor to implement a macro expander from a given surface syntax and lookup module.

Convenience lookup module implementations

Choose an appropriate one based on performance considerations.

module Assoc: sig .. end
lookup module using association lists
module Map: sig .. end
lookup module using applicative Maps
module Hashtbl: sig .. end
lookup module using imperative Hash tables.

Tutorial

Use of Kwmacro involves several steps:

  1. decide on a surface syntax for your macro calls (in other words, what do macro calls look like?) and implement a module of type Surface implementing it
  2. implement a module of type Lookup to organize your macros by name (or, more likely, choose one of the provided ones)
  3. apply the Kwmacro.Make functor to these two modules to get an expand function
  4. implement one or more macros and store them in one or more lookup tables
  5. start expanding!
I should say that this module is much harder to explain than it is to use (or than it was to implement).
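
To see how the five steps fit together, here is a stdlib-only sketch that re-creates the moving parts in miniature. Everything in it is an assumption made for illustration: the signatures are inferred from this tutorial rather than copied from Kwmacro, the use of Undef to signal a missing name is a guess, and the `@name:param` surface syntax (module At) is invented to avoid needing a regexp library.

```ocaml
(* Stand-ins for the Kwmacro declarations; the real ones may differ. *)
type 'a func = 'a -> string list -> string
exception Undef of string

module type Surface = sig
  val parse : string -> string list
  val substitute : (string -> string) -> string -> string
end

module type Lookup = sig
  type 'a table
  val create : (string * 'a func) list -> 'a table
  val lookup : 'a table -> string -> 'a func
end

(* Assumed shape of the functor: parse each call, look up the macro by
   name, fall back to [default] if the name is unknown, then apply it. *)
module Make (S : Surface) (L : Lookup) = struct
  let expand (default : 'a func) (table : 'a L.table) (aux : 'a) str =
    let expand_call call =
      match S.parse call with
      | [] -> invalid_arg "Surface.parse returned an empty list"
      | name :: _ as argv ->
        let f = try L.lookup table name with Undef _ -> default in
        f aux argv
    in
    S.substitute expand_call str
end

(* Step 1: a toy surface syntax.  A call is a space-delimited word such
   as "@up:hello"; ':' separates the name from the parameters. *)
module At : Surface = struct
  let is_call w = String.length w > 1 && w.[0] = '@'
  let parse call =
    String.split_on_char ':' (String.sub call 1 (String.length call - 1))
  let substitute subst str =
    String.split_on_char ' ' str
    |> List.map (fun w -> if is_call w then subst w else w)
    |> String.concat " "
end

(* Step 2: an association-list lookup module. *)
module Assoc : Lookup = struct
  type 'a table = (string * 'a func) list
  let create alist = alist
  let lookup table name =
    try List.assoc name table with Not_found -> raise (Undef name)
end

(* Step 3: apply the functor. *)
module M = Make (At) (Assoc)

(* Step 4: define a macro and store it in a table. *)
let up _ = function
  | [ _; p ] -> String.uppercase_ascii p
  | _ -> "BAD CALL"

let defs = Assoc.create [ ("up", up) ]

(* Step 5: expand.  Unknown macros are echoed back with a "?" prefix. *)
let default _ argv = "?" ^ String.concat ":" argv
let result = M.expand default defs () "say @up:hello to @who"
(* result = "say HELLO to ?who" *)
```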

A Realistic Example

1. Implementing a Surface Syntax

Surface syntaxes generally need to have the following characteristics:

  1. allow recognizable macro calls that won't conflict with the "normal" text in the strings to be expanded
  2. make the name of the macro distinct in each call
  3. allow for multiple parameters per call, with various macros taking varying numbers of parameters per macro, and some macros taking optional or variadic parameters
  4. make it clear where each parameter begins and ends
Syntaxes that I've used have included:

(The latter being "processing instructions" in SGML or XML.)

Note that if you use spaces as parameter separators, then you may have trouble with spaces in parameter values. In general, simple regular expression-based Surface.parse implementations may fall down if you need arbitrarily complex parameter values.

For this example, let's assume a surface syntax that looks like C function calls. So in this string:

Hello foo(1,bar) and (don't worry) zap()

the parenthesized (don't worry) isn't a macro call, but foo(1,bar) is one with two parameters and zap() is one with no parameters.

Typically you'll use a regular expression module like Pcre or Str to handle your parsing and substituting, but this is by no means required; you could use ocamllex or a hand-written string parser. We'll use Pcre in our example.

  module C : Surface =
  struct
    let rex = Pcre.regexp ~study:true ~flags:[`CASELESS] "([a-z0-9]+)\\((.*?)\\)"
    let parse call =
      let subs = Pcre.exec ~rex call in
      let name = Pcre.get_substring subs 1 in
      let parms = Pcre.get_substring subs 2 in
      if parms = "" then [name] else name :: Kwstring.split "," parms
    let substitute subst str = Pcre.substitute ~rex ~subst str
  end
  

Our regexp rex is a weak attempt at matching vaguely C-style "function calls". Note that the regexp doesn't worry about distinguishing the parameters from one another; it leaves that for the parse function. The regexp just uses a non-greedy subexpression, which means that nested macro calls aren't going to work (regexps are bad at nesting, and any attempt at nested macro calls will probably need an alternative implementation technique). We do, however, exploit Pcre's support for parenthesized sub-expressions to distinguish between the macro name and the bulk of the parameters.

Our parse extracts the two sub-expressions and then uses Kwstring.split to separate the parameters between commas. Note that there's no support for quoting commas in parameter values!
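
For readers without Pcre or Kwstring at hand, roughly the same parse can be sketched with the standard library alone. This is an illustration, not the module's implementation: it assumes the call text has the exact shape name(p1,p2,...) and, like the original, it has no way to quote a comma.

```ocaml
(* Stdlib-only sketch of a parse for calls shaped like "name(p1,p2)".
   Returns the macro name followed by its parameters. *)
let parse (call : string) : string list =
  let n = String.length call in
  match String.index_opt call '(' with
  | Some i when call.[n - 1] = ')' ->
    let name = String.sub call 0 i in
    (* Text strictly between the first '(' and the final ')'. *)
    let inner = String.sub call (i + 1) (n - i - 2) in
    if inner = "" then [ name ]
    else name :: String.split_on_char ',' inner
  | _ -> invalid_arg ("parse: not a macro call: " ^ call)
```

Note that a parameter containing a comma is silently split in two, which is the limitation described above.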

The substitute function does all the heavy lifting. Its first parameter is a function of type string -> string that takes a macro call in textual form and returns the expansion of that call. This function is actually implemented by Kwmacro.Make, using Surface.parse. The substitute function itself simply assumes it will be invoked with such a function; its job is to find every occurrence of a macro call in the given string and replace each one with its expansion (whatever it is).

This is exactly what Pcre.substitute does, given a suitable regexp.
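
To make the contract of substitute concrete, here is a hand-written scanner that does the same job for this syntax without a regexp library. It is a sketch under simplifying assumptions (a call is a run of ASCII letters or digits immediately followed by '(' and extending to the next ')'), and it may differ from the regexp in corner cases.

```ocaml
let is_name_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')

(* Copy [str] to a buffer, replacing each "name(...)" with [subst] applied
   to the full text of the call. *)
let substitute (subst : string -> string) (str : string) : string =
  let n = String.length str in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if is_name_char str.[i] then begin
      (* Find the end of the run of name characters starting at i. *)
      let j = ref i in
      while !j < n && is_name_char str.[!j] do incr j done;
      (* It is a call only if '(' follows directly and a ')' exists. *)
      let close =
        if !j < n && str.[!j] = '(' then String.index_from_opt str !j ')'
        else None
      in
      match close with
      | Some k ->
        Buffer.add_string buf (subst (String.sub str i (k - i + 1)));
        go (k + 1)
      | None ->
        Buffer.add_string buf (String.sub str i (!j - i));
        go !j
    end else begin
      Buffer.add_char buf str.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

For example, with a subst that wraps each call in angle brackets, the string "Hello foo(1,bar) and (don't worry) zap()" becomes "Hello <foo(1,bar)> and (don't worry) <zap()>".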

It's worth noting that, if you are using Pcre, the implementation of substitute is almost always going to be exactly as in this example. Likewise there will typically only be one paradigmatic example of substitute if you're using Str.

2. Implementing a Lookup Module

Rather than implement a Lookup module, we'll just pick one of the provided ones. You're already familiar with the time/space trade-offs of association lists, applicative maps from the Map module, and hash tables from the Hashtbl module, so choose appropriately. I like to use Kwmacro.Assoc (association lists) for most cases, since the number of macro definitions is usually quite small.
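
If you do want to write your own, a lookup module over applicative maps might look something like the sketch below. The shape of the module (a table type plus create and lookup) follows this tutorial; the name MapLookup, the locally redeclared func and Undef, and the choice to raise Undef for a missing name are assumptions.

```ocaml
(* Local stand-ins so the sketch compiles on its own. *)
type 'a func = 'a -> string list -> string
exception Undef of string

module StringMap = Map.Make (String)

(* Hypothetical lookup module backed by the stdlib Map functor. *)
module MapLookup = struct
  type 'a table = 'a func StringMap.t

  (* Build the table from an a-list of (name, implementation) pairs;
     later entries override earlier ones with the same name. *)
  let create alist =
    List.fold_left
      (fun m (name, f) -> StringMap.add name f m)
      StringMap.empty alist

  let lookup table name =
    match StringMap.find_opt name table with
    | Some f -> f
    | None -> raise (Undef name)
end
```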

3. Applying the Kwmacro.Make Functor

We obtain our M.expand function like so:

module M = Kwmacro.Make (C) (Kwmacro.Assoc)

4. Implementing Macro Definitions and Storing Them in Lookup Tables

One of our macros will be used to do variable expansion. It will take one parameter, a variable name, and expand it from the environment passed as auxiliary data in the form of an a-list.

  let syntax self parms =  (* error function for invalid macro calls *)
    Printf.sprintf "BAD MACRO SYNTAX: %s(%s)" self (String.concat "," parms)

  let var env = function
    | self::varname::[] -> (try List.assoc varname env with Not_found -> "UNKNOWN")
    | self::parms -> syntax self parms
    | [] -> assert false
  

Note that, in general, each macro definition ought to handle errors in its invocation. Whether to raise an exception or return an error message as the expansion is up to you.

One more macro will expand to the current time with a fixed format (a fancier version might take a strftime format as a parameter).

  let time env = function
    | self::[] ->
      let tm = Unix.localtime (Unix.time ()) in
        Printf.sprintf "%d:%d" tm.Unix.tm_hour tm.Unix.tm_min
    | self::parms -> syntax self parms
    | [] -> assert false
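
Note that "%d:%d" does not pad its fields, so four minutes past eleven at night prints as 23:4. If padded output is preferred, the formatting step could be pulled out along these lines (format_hm is a hypothetical helper, kept free of Unix so it can be tried in isolation):

```ocaml
(* Format an hour and minute as zero-padded HH:MM. *)
let format_hm (hour : int) (minute : int) : string =
  Printf.sprintf "%02d:%02d" hour minute
```

The time macro would then call format_hm tm.Unix.tm_hour tm.Unix.tm_min instead of formatting inline.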
  

Note that, since they are going in the same Lookup table, these two definitions must both take the same type of auxiliary data (even though time doesn't use it).

Finally, we create the lookup table defs from an alist mapping macro names to functions:

  let defs = Kwmacro.Assoc.create ["time", time; "var", var]
  

5. Start Expanding!

Before we can start expanding, we need a suitable default function: we'll let ours raise an error:

  let default _ = function
    | self::_ -> failwith (Printf.sprintf "unknown macro: %s" self)
    | [] -> assert false
  

Now let's define our auxiliary data as an a-list from the process's environment:

  let aux =
   Array.fold_left
     (fun acc binding ->
       match Kwstring.split "=" binding with
       | [] -> assert false
       | name::rest -> (name, String.concat "=" rest)::acc)
     [] (Unix.environment ())
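
The same a-list can be built without Kwstring by splitting each binding at its first "=" only, which also avoids re-joining values that themselves contain "=". A stdlib-only sketch (both function names are hypothetical; the environment is passed in as an array so the code can be tried without Unix):

```ocaml
(* Split "NAME=value" at the first '='; a binding with no '=' gets an
   empty value. *)
let binding_of_string (s : string) : string * string =
  match String.index_opt s '=' with
  | None -> (s, "")
  | Some i ->
    (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))

(* Turn an environment array (as returned by Unix.environment) into an
   a-list.  The fold reverses the order, as in the version above. *)
let alist_of_env (env : string array) : (string * string) list =
  Array.fold_left (fun acc b -> binding_of_string b :: acc) [] env
```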
  

And finally, let's do an expansion:

  # M.expand default defs aux "hello it's time() on var(HOSTNAME)";;
  - : string = "hello it's 23:4 on jfcl.lib.uchicago.edu"
  #
  

Bogus macro names (i.e. those not in the lookup table) will call default and so raise an exception:

  # M.expand default defs aux "bad macro bogus()";;
  

Expanding One Fixed String

This example is intended to illustrate exactly what the functions in your modules are doing by giving minimal (and hence contrived) implementations.

1. Implementing a Surface Syntax

For our syntax, we'll just consider a fixed, parameterless string as the sole macro call: FOO. Any occurrences of FOO in the text will be replaced with a different value. We'll be using BAR as our expansion, but this doesn't come up until the next section. Surface implementations are solely concerned with the syntax of macro calls, not with what they expand to.

  module Foo : Surface =
  struct
    let rex = Pcre.regexp ~study:true "FOO"
    let parse _ = ["FOO"]
    let substitute subst str = Pcre.substitute ~rex ~subst str
  end
  

We define a trivial regexp to match the macro call FOO.

The parse function takes one parameter, the matched text of a given macro call as a string, and returns a list of the parsed-out parameters of the macro call (which you can think of as its argv): the list is always of length > 0, because the zeroth parameter is the name of the macro (though the name could legitimately be the empty string).

A macro call typically has some sort of syntax for passing macro parameters, but not in this example, so our parse function ignores its parameter (which, as the regexp shows, will always be "FOO") and just returns a minimal argv, the list containing the name of the macro. In this example, it doesn't matter what the name is, but we'll make it "FOO" anyway. A more robust version of parse would be:

let parse call = assert (call = "FOO"); [call]

Our substitute function is the same as in our realistic example, because we're using Pcre.

2. Implementing a Lookup Module

For the sake of example, we'll (artificially) implement a Lookup module that exploits the highly restrictive nature of our (fixed, parameterless) macro calls.

  module Table : Lookup =
  struct
    type 'a table = 'a Kwmacro.func
    let create _ = fun _ _ -> "BAR"
    let lookup table _ = table
  end
  

The 'a table type is usually some data structure containing macro names mapped to 'a Kwmacro.func's. The way we've defined Foo.parse, the only macro name that we'll ever get is "FOO", so we can use the most trivial mapping of all: a single 'a Kwmacro.func.

The create function takes an a-list expressing the mapping from macro names to 'a Kwmacro.func's, but ours can just ignore this parameter and hard-wire our single macro implementation.

A macro implementation is an 'a Kwmacro.func, i.e. a function of two parameters: the first, auxiliary data passed to Make.expand, and the second, a parsed argv parameter list for the macro call. In our restricted example, our macro implementation can just ignore both of these parameters, and simply return our replacement string. We'll return "BAR".

The lookup function normally takes an 'a table and a macro name and returns the 'a Kwmacro.func that implements that macro, but in our case, the 'a table is our single 'a Kwmacro.func, so we can ignore the macro name and just return the table.

3. Applying the Kwmacro.Make Functor

Now that we have our Surface and Lookup implementations, we can pass them to the Kwmacro.Make functor to obtain our expand function:

module M = Kwmacro.Make (Foo) (Table)

4. Implementing Macro Definitions and Storing Them in Lookup Tables

Due to the restricted nature of our example, this step is already accomplished!

5. Start Expanding!

Now we can expand FOO macros in strings to replace the FOO's with BAR's.

The expand function takes several parameters: the first is a default function (an 'a Kwmacro.func) to call when an unknown macro is encountered. This assumes that there's a syntactic way of recognizing macro calls with different names, but in our example the only macro call is FOO, so we can't encounter any other names. As a result, our default function can be:

(fun _ _ -> assert false)

expand's second parameter is a lookup table of macro implementations. It's often useful to be able to switch between different collections of macros for different situations, but we'll just use our only possible collection, the unvarying return value of Table.create.

The third parameter of expand is auxiliary data. This is data that may vary dynamically and be independent of any given macro implementation. For example, it might be some CGI data from a web application (like a conglomeration of GET / POST parameters, the referer URI, etc.), some data suitable for debugging, rows of results from an SQL query, etc. This parameter means that the macro implementations don't have to use global variables for such info. But again, in our example, we're ignoring this.
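
As an illustration of auxiliary data that is richer than unit, here is a sketch of two macros that read from a per-request record instead of from global variables. The request type and its field names are invented for this example; the error function mirrors the one from the realistic example.

```ocaml
(* Hypothetical per-request data; the field names are invented. *)
type request = { user : string; referer : string }

(* Error function for invalid macro calls. *)
let syntax self parms =
  Printf.sprintf "BAD MACRO SYNTAX: %s(%s)" self (String.concat "," parms)

(* Both macros take no parameters and simply project a field out of the
   auxiliary data they are handed at expansion time. *)
let whoami (req : request) = function
  | [ _ ] -> req.user
  | self :: parms -> syntax self parms
  | [] -> assert false

let referer (req : request) = function
  | [ _ ] -> req.referer
  | self :: parms -> syntax self parms
  | [] -> assert false
```

Because both macros take the same request type, they can share one lookup table, and each expansion can be given a different record.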

The final parameter is the string in which to perform the macro expansions. In our case, in order to be interesting it needs to contain some FOO's.

M.expand (fun _ _ -> assert false) (Table.create []) () "hello FOO how are FOO?"

The result is: "hello BAR how are BAR?"

A Handy Trick

A handy trick is to take a page from sed's book, and allow the parameter separator to be determined on a per-call basis, as with sed's s/// command, which can also be written e.g. s;;; if the parameters include slashes. While this is perhaps not as good as a clean quoting scheme, it's easy to achieve with regexps.
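
A parse in this style can also be written without regexps. The sketch below assumes a hypothetical call syntax in which the macro name is a run of ASCII letters or digits and the very next character, whatever it is, becomes the parameter separator for that call.

```ocaml
let is_name_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')

(* Parse "name<sep>p1<sep>p2..." where <sep> is whichever character
   follows the name.  A bare name parses as a call with no parameters. *)
let parse (call : string) : string list =
  let n = String.length call in
  let rec name_end i =
    if i < n && is_name_char call.[i] then name_end (i + 1) else i
  in
  let k = name_end 0 in
  if k >= n then [ call ]
  else
    let sep = call.[k] in
    let rest = String.sub call (k + 1) (n - k - 1) in
    String.sub call 0 k :: String.split_on_char sep rest
```

With this, link/a/b and link;http://x/y;home are both valid calls, and the second can carry slashes in its first parameter because it chose ";" as its separator.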