module Kwmacro: sig ..end
exception Undef of string
type 'a func = 'a -> string list -> string
The first parameter is arbitrary auxiliary data passed at expansion time via the Kwmacro.Make.expand function; the second parameter is the macro's parameter list as parsed from the call point by the Kwmacro.Surface.parse function (N.B.: the zeroth element is the macro name); the result is the string used as the value of the macro expansion.
An example would be this macro that uppercases its single parameter (and ignores its auxiliary data); note the error-checking code, which can easily be abstracted out:
fun _ ->
  function
  | [_; p] -> String.uppercase_ascii p (* String.uppercase in OCaml versions before 4.03 *)
  | self :: _ -> failwith (Printf.sprintf "%s: expected exactly one parameter" self)
  | [] -> assert false (* implies a bug in the Kwmacro.Surface.parse implementation *)
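The error checking above can be factored into a wrapper. Here is a minimal sketch, assuming a local copy of the func type so that it compiles without Kwmacro. Note that with_arity is a hypothetical helper name and is not part of Kwmacro:

```ocaml
(* Local stand-in for Kwmacro.func so this sketch compiles on its own. *)
type 'a func = 'a -> string list -> string

(* Hypothetical helper, not part of Kwmacro: accept calls with exactly [n]
   parameters and hand just those parameters (argv minus the name) to [body];
   any other count fails with a message naming the macro. *)
let with_arity (n : int) (body : 'a -> string list -> string) : 'a func =
  fun aux argv ->
    match argv with
    | [] -> invalid_arg "with_arity: empty argv (a Surface.parse bug)"
    | self :: params ->
      let got = List.length params in
      if got = n then body aux params
      else failwith (Printf.sprintf "%s: expected %d parameter(s), got %d" self n got)

(* The uppercasing macro again, with the checking factored out. *)
let upcase aux argv =
  with_arity 1 (fun _ params -> String.uppercase_ascii (List.hd params)) aux argv
```

Each macro body now only has to deal with well-formed parameter lists. The price is that the error message is generic rather than macro-specific.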
module type Surface = sig ..end
module type Lookup = sig ..end
module Make:
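The following three modules are ready-made Lookup implementations, backed by association lists, applicative maps and hash tables respectively. Choose an appropriate one based on performance considerations.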
module Assoc: sig ..end
module Map: sig ..end
module Hashtbl: sig ..end
Use of Kwmacro involves several steps:
1. Design a surface syntax for your macro calls and write a Surface module implementing it.
2. Write a Lookup module to organize your macros by name (or, more likely, choose one of the provided ones).
3. Apply the Kwmacro.Make functor to these two modules to get an expand function.
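Surface syntaxes generally need to have the following characteristics: macro calls must be recognizable in the surrounding text, and the macro name must be distinguishable from the parameters. For example:
{name,foo,bar}
<?name foo bar>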
Note that if you use spaces as parameter separators, then you may have trouble with spaces in parameter values. In general, simple regular-expression-based Surface.parse implementations may fall down if you need arbitrarily complex parameter values.
For this example, let's assume a surface syntax that looks like C function calls. So in this string:
Hello foo(1,bar) and (don't worry) zap()
the parenthesized (don't worry) isn't a macro call, but foo(1,bar) is one with two parameters, and zap() is one with no parameters.
Typically you'll use a regular expression module like Pcre or Str to handle your parsing and substituting, but this is by no means required; you could use ocamllex or a hand-written string parser. We'll use Pcre in our example.
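module C : Surface =
struct
  (* A name of letters/digits (any case) followed by a parenthesized parameter
     list; subexpression 1 is the name, subexpression 2 the raw parameters. *)
  let rex = Pcre.regexp ~study:true ~flags:[`CASELESS] "([a-z0-9]+)\\((.*?)\\)"
  (* Turn the text of one matched call into argv: the name, then the parameters. *)
  let parse call =
    let subs = Pcre.exec ~rex call in
    let name = Pcre.get_substring subs 1 in
    let parms = Pcre.get_substring subs 2 in
    if parms = "" then [name] else name :: (Kwstring.split "," parms)
  (* Replace every match of rex in str with subst applied to the matched text. *)
  let substitute subst str = Pcre.substitute ~rex ~subst str
end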
Our regexp rex is a weak attempt at matching vaguely C-style "function calls". Note that the regexp doesn't worry about distinguishing the parameters from one another; it leaves that for the parse function. The regexp just uses a non-greedy subexpression, which means that nested macro calls aren't going to work. For example, in foo(bar(1),2) the match stops at the first closing parenthesis, so foo would be called with the single parameter bar(1. Regexps are bad at nesting, and any attempt at nested macro calls will probably need an alternative implementation technique. We do, however, exploit Pcre's support for parenthesized subexpressions to distinguish between the macro name and the bulk of the parameters.
Our parse extracts the two subexpressions and then uses Kwstring.split to separate the parameters at the commas. Note that there's no support for quoting commas in parameter values!
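Since Pcre isn't required, here is a hypothetical stdlib-only version of this parse, for comparison. It makes several assumptions:

- The text it receives is exactly one matched call: a name, then "(", then the parameters, then ")".
- String.split_on_char stands in for Kwstring.split, whose behaviour on edge cases such as empty fields may differ.
- Like the Pcre version, it cannot quote commas.

```ocaml
(* Hypothetical stdlib-only counterpart of C.parse: given the text of one
   matched call such as "foo(1,bar)", return its argv. *)
let parse (call : string) : string list =
  let n = String.length call in
  match String.index_opt call '(' with
  | Some lp when call.[n - 1] = ')' ->
    let name = String.sub call 0 lp in
    (* The parameters sit between the first '(' and the final ')'. *)
    let inner = String.sub call (lp + 1) (n - lp - 2) in
    if inner = "" then [name] else name :: String.split_on_char ',' inner
  | _ -> invalid_arg ("parse: not a macro call: " ^ call)
```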
The substitute function does all the heavy lifting. Its first parameter is a function of type string -> string that takes a macro call in textual form and returns the expansion of that call. This function is actually implemented by Kwmacro.Make, using Surface.parse. The substitute function itself simply assumes it will be invoked with such a function. It needs to both find all the occurrences of macro calls in the given string and know how to replace them with their expansions (whatever they are). This is exactly what Pcre.substitute does, given a suitable regexp.
It's worth noting that, if you are using Pcre, the implementation of substitute is almost always going to be exactly as in this example. Likewise, there will typically be only one paradigmatic implementation of substitute if you're using Str.
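For instance, a Str counterpart of C's regexp and substitute might look like the sketch below. It assumes the str library that ships with OCaml is linked in. The regexp is our own Str rendering of the C-style syntax. Str has no non-greedy operator, so [^)]* plays the role of .*? here.

```ocaml
(* Requires the str library distributed with OCaml. Hypothetical Str
   version of the C-style call regexp; in Str syntax, unescaped ( and )
   are literal characters. *)
let rex = Str.regexp "[a-zA-Z0-9]+([^)]*)"

(* Str.global_substitute hands the callback the whole subject string, so
   Str.matched_string recovers the text of the current macro call. *)
let substitute (subst : string -> string) (str : string) : string =
  Str.global_substitute rex (fun s -> subst (Str.matched_string s)) str
```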
Lookup Module
Rather than implement a Lookup module, we'll just pick one of the provided ones. You're already familiar with the time/space trade-offs of association lists, applicative maps from the Map module, and hash tables from the Hashtbl module, so choose appropriately.
I like to use Kwmacro.Assoc (association lists) for most cases, since the number of macro definitions is usually quite small.
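If you do end up writing your own Lookup, its shape is small. The following sketch shows hypothetical association-list and hash-table implementations. Their interface (create builds a table from an a-list, lookup raises Not_found for unknown names) is our reading of how the modules are used in this document, not Kwmacro's documented signature:

```ocaml
(* Local stand-in for Kwmacro.func. *)
type 'a func = 'a -> string list -> string

(* Hypothetical association-list lookup: create is free, lookup is a linear
   scan, which is fine for the handful of macros most applications define. *)
module Assoc_sketch = struct
  type 'a table = (string * 'a func) list
  let create (defs : (string * 'a func) list) : 'a table = defs
  let lookup (table : 'a table) (name : string) : 'a func = List.assoc name table
end

(* Hypothetical hash-table lookup: more set-up work, constant-time lookup. *)
module Hashtbl_sketch = struct
  type 'a table = (string, 'a func) Hashtbl.t
  let create (defs : (string * 'a func) list) : 'a table =
    let t = Hashtbl.create (List.length defs) in
    (* Keep the first binding for each name, as List.assoc would. *)
    List.iter (fun (name, f) -> if not (Hashtbl.mem t name) then Hashtbl.add t name f) defs;
    t
  let lookup (table : 'a table) (name : string) : 'a func = Hashtbl.find table name
end
```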
Kwmacro.Make Functor
We obtain our M.expand function like so:
module M = Kwmacro.Make (C) (Kwmacro.Assoc)
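It may help to see roughly what Make does with the two modules. The sketch below is our own guess, not Kwmacro's source. It rests on two assumptions:

- The SURFACE and LOOKUP signatures are inferred from the examples in this document.
- lookup raises Not_found for unknown names, and that is when default gets called.

```ocaml
(* Local stand-in for Kwmacro.func. *)
type 'a func = 'a -> string list -> string

(* Assumed shapes of Make's two arguments, inferred from the C, Foo and
   Table modules in this document; the real signatures may differ. *)
module type SURFACE = sig
  val parse : string -> string list
  val substitute : (string -> string) -> string -> string
end

module type LOOKUP = sig
  type 'a table
  val create : (string * 'a func) list -> 'a table
  val lookup : 'a table -> string -> 'a func  (* assumed: Not_found if absent *)
end

(* Hypothetical Make-like functor: not Kwmacro's actual implementation. *)
module Make_sketch (S : SURFACE) (L : LOOKUP) = struct
  let expand (default : 'a func) (table : 'a L.table) (aux : 'a) (str : string) : string =
    (* The string -> string function that S.substitute calls on each match. *)
    let expand_call call =
      let argv = S.parse call in
      let macro =
        match argv with
        | name :: _ -> (try L.lookup table name with Not_found -> default)
        | [] -> default (* parse should never return [] *)
      in
      macro aux argv
    in
    S.substitute expand_call str
end
```

The closure passed to S.substitute is the string -> string function that substitute receives. It parses one call, finds the macro by name, and applies it to the auxiliary data and the argv.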
One of our macros will be used to do variable expansion. It will take one parameter, a variable name, and expand it from the environment passed as auxiliary data in the form of an a-list.
let syntax self parms = (* error function for invalid macro calls *)
  Printf.sprintf "BAD MACRO SYNTAX: %s(%s)" self (String.concat "," parms)
let var env = function
  | [_; varname] -> (try List.assoc varname env with Not_found -> "UNKNOWN")
  | self :: parms -> syntax self parms
  | [] -> assert false
Note that, in general, each macro definition ought to handle errors in its invocation. Whether to raise an exception or return an error message as the expansion is up to you.
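As a sketch of the two styles, here are two variants of var. Bad_macro_call is a hypothetical exception defined just for this example:

```ocaml
(* Hypothetical exception for this sketch only. *)
exception Bad_macro_call of string

(* Style 1: report the problem inside the expanded text. *)
let var_lenient (env : (string * string) list) = function
  | [_; varname] -> (try List.assoc varname env with Not_found -> "UNKNOWN")
  | self :: parms ->
    Printf.sprintf "BAD MACRO SYNTAX: %s(%s)" self (String.concat "," parms)
  | [] -> assert false

(* Style 2: abort the whole expansion and let the caller decide. *)
let var_strict (env : (string * string) list) = function
  | [_; varname] -> (try List.assoc varname env with Not_found -> "UNKNOWN")
  | self :: _ -> raise (Bad_macro_call self)
  | [] -> assert false
```

The lenient style keeps a page rendering even when one call is malformed. The strict style makes mistakes impossible to miss.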
One more macro will expand to the current time in a fixed format (a fancier version might take a strftime format as a parameter).
let time _env = function
  | [_] ->
    let tm = Unix.localtime (Unix.time ()) in
    (* "%d:%d" renders 23:04 as 23:4; "%02d:%02d" would zero-pad. *)
    Printf.sprintf "%d:%d" tm.Unix.tm_hour tm.Unix.tm_min
  | self :: parms -> syntax self parms
  | [] -> assert false
Note that, since they are going in the same Lookup table, these two definitions must both take the same type of auxiliary data (even though time doesn't use it).
Finally, we create the lookup table defs from an a-list mapping macro names to functions:
let defs = Kwmacro.Assoc.create ["time", time; "var", var]
Before we can start expanding, we need a suitable default function; we'll let ours raise an error:
let default _ = function
  | self :: _ -> failwith (Printf.sprintf "unknown macro: %s" self)
  | [] -> assert false
Now let's define our auxiliary data as an a-list from the process's environment:
let aux =
  Array.fold_left
    (fun acc binding ->
       match Kwstring.split "=" binding with
       | [] -> assert false
       | name :: rest -> (name, String.concat "=" rest) :: acc)
    [] (Unix.environment ())
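If you'd rather not depend on Kwstring here, you can split each binding at its first '=' with the standard library. The hypothetical alist_of_bindings below takes the array of bindings as a parameter. You would call it as alist_of_bindings (Unix.environment ()):

```ocaml
(* Hypothetical stdlib-only helper: split each "NAME=value" binding at its
   first '=', so values that themselves contain '=' survive intact. *)
let alist_of_bindings (bindings : string array) : (string * string) list =
  Array.fold_left
    (fun acc binding ->
       match String.index_opt binding '=' with
       | None -> (binding, "") :: acc (* no '=': treat as an empty value *)
       | Some i ->
         let name = String.sub binding 0 i in
         let value = String.sub binding (i + 1) (String.length binding - i - 1) in
         (name, value) :: acc)
    [] bindings
```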
And finally, let's do an expansion:
# M.expand default defs aux "hello it's time() on var(HOSTNAME)";;
- : string = "hello it's 23:4 on jfcl.lib.uchicago.edu"
Bogus macro names (i.e. those not in the lookup table) will call default and so raise an exception:
# M.expand default defs aux "bad macro bogus()";;
This example is intended to illustrate exactly what the functions in your modules are doing by giving minimal (and hence contrived) implementations.
For our syntax, we'll just consider a fixed, parameterless string as the sole macro call: FOO. Any occurrences of FOO in the text will be replaced with a different value. We'll be using BAR as our expansion, but this doesn't come up until the next section.
Surface implementations are solely concerned with the syntax of macro calls, not with what they expand to.
module Foo : Surface =
struct
let rex = Pcre.regexp ~study:true "FOO"
let parse _ = ["FOO"]
let substitute subst str = Pcre.substitute ~rex ~subst str
end
We define a trivial regexp to match the macro call FOO.
The parse function takes one parameter, the matched text of a given macro call as a string. It returns a list of the parsed-out parameters of the macro call, which you can think of as its argv. The list always has at least one element, because the zeroth parameter is the name of the macro (though the name could legitimately be the empty string).
A macro call typically has some sort of syntax for passing macro parameters, but not in this example. So our parse function ignores its parameter (which, as the regexp shows, will always be "FOO") and just returns a minimal argv: the list containing the name of the macro. In this example, it doesn't matter what the name is, but we'll make it "FOO" anyway. A more robust version of parse would be:
let parse call = assert (call = "FOO"); [call]
Our substitute function is the same as in our realistic example, because we're using Pcre.
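Because this macro call is a fixed string, a regexp library isn't really needed. As a minimal sketch, here is a hypothetical stdlib-only substitute with the same contract: it calls subst on each non-overlapping occurrence of the call text and splices in the result.

```ocaml
(* Hypothetical stdlib-only stand-in for Foo.substitute. *)
let substitute (subst : string -> string) (str : string) : string =
  let call = "FOO" in
  let n = String.length str and k = String.length call in
  let buf = Buffer.create n in
  let rec scan i =
    if i > n - k then
      (* Too few characters left for another call: copy the tail verbatim. *)
      Buffer.add_string buf (String.sub str i (n - i))
    else if String.sub str i k = call then begin
      Buffer.add_string buf (subst call);
      scan (i + k)
    end else begin
      Buffer.add_char buf str.[i];
      scan (i + 1)
    end
  in
  scan 0;
  Buffer.contents buf
```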
Lookup Module
For the sake of example, we'll (artificially) implement a Lookup module that exploits the highly restrictive nature of our (fixed, parameterless) macro calls.
module Table : Lookup =
struct
type 'a table = 'a Kwmacro.func
let create _ = fun _ _ -> "BAR"
let lookup table _ = table
end
The 'a table type is usually some data structure mapping macro names to 'a Kwmacro.func values. The way we've defined Foo.parse, the only macro name we'll ever get is "FOO", so we can use the most trivial mapping of all: a single 'a Kwmacro.func.
The create function takes an a-list expressing the mapping from macro names to 'a Kwmacro.func values, but ours can just ignore this parameter and hard-wire our single macro implementation.
A macro implementation is an 'a Kwmacro.func, i.e. a function of two parameters. The first is the auxiliary data passed to Make.expand; the second is the parsed argv parameter list for the macro call. In our restricted example, our macro implementation can just ignore both of these parameters and simply return our replacement string. We'll return "BAR".
The lookup function normally takes an 'a table and a macro name and returns the 'a Kwmacro.func that implements that macro. In our case, though, the 'a table is our single 'a Kwmacro.func, so we can ignore the macro name and just return the table.
Kwmacro.Make Functor
Now that we have our Surface and Lookup implementations, we can pass them to the Kwmacro.Make functor to obtain our expand function:
module M = Kwmacro.Make (Foo) (Table)
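Normally the next step would be to define your macros, but due to the restricted nature of our example (the single macro is hard-wired into Table.create), this step is already accomplished!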
Now we can expand FOO macros in strings to replace the FOOs with BARs.
The expand function takes several parameters. The first is a default function (an 'a Kwmacro.func) to call when an unknown macro is encountered. This assumes that there's a syntactic way of recognizing macro calls with different names. In our example, however, the only macro call is FOO, so we can't encounter any other names. As a result, our default function can be:
(fun _ _ -> assert false)
expand's second parameter is a lookup table of macro implementations. It's often useful to be able to switch between different collections of macros for different situations, but we'll just use our only possible collection, the unvarying return value of Table.create.
The third parameter of expand is auxiliary data. This is data that may vary dynamically and be independent of any given macro implementation. For example, it might be:

- some CGI data from a web application (like a conglomeration of GET/POST parameters, the referer URI, etc.);
- some data suitable for debugging;
- rows of results from an SQL query.

This parameter means that the macro implementations don't have to use global variables for such info. But again, in our example, we're ignoring this.
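To illustrate, here is a sketch of macros that read per-request data from the auxiliary parameter rather than from globals. The request record and its fields are hypothetical, and func is a local copy of the Kwmacro.func type:

```ocaml
(* Local stand-in for Kwmacro.func. *)
type 'a func = 'a -> string list -> string

(* Hypothetical auxiliary-data record for a web application. *)
type request = {
  params : (string * string) list; (* merged GET/POST parameters *)
  referer : string;
}

(* Expands param(NAME) to the value of a request parameter, or "" if absent. *)
let param : request func = fun req -> function
  | [_; name] -> (try List.assoc name req.params with Not_found -> "")
  | self :: _ -> Printf.sprintf "BAD MACRO SYNTAX: %s" self
  | [] -> assert false

(* Expands referer() to the referring URI. *)
let referer : request func = fun req -> function
  | [_] -> req.referer
  | self :: _ -> Printf.sprintf "BAD MACRO SYNTAX: %s" self
  | [] -> assert false
```

Every macro in one lookup table must take the same type of auxiliary data, so both macros here are request funcs. Each request then supplies its own value as expand's third argument.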
The final parameter is the string in which to perform the macro expansions. In our case, in order to be interesting, it needs to contain some FOOs.
M.expand (fun _ _ -> assert false) (Table.create []) () "hello FOO how are FOO?"
The result is: "hello BAR how are BAR?"
A handy trick is to take a page from sed's book and allow the parameter separator to be determined on a per-call basis. sed's s/// command works this way: it can also be written e.g. s;;; if the parameters include slashes. While this is perhaps not as good as a clean quoting scheme, it's easy to achieve with regexps.
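As a sketch of this trick, here is a hypothetical parse for the {name,foo,bar} syntax mentioned earlier. The first non-alphanumeric character after the name is the separator for that call only, the way the character after s is in sed. It handles only the parsing of a single, already-matched call:

```ocaml
let is_name_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')

(* Hypothetical parse: {name,foo,bar} uses ',' as its separator, while
   {name;a,b;c} uses ';' and so has the parameters "a,b" and "c". *)
let parse (call : string) : string list =
  let n = String.length call in
  if n < 2 || call.[0] <> '{' || call.[n - 1] <> '}' then
    invalid_arg ("parse: not a macro call: " ^ call);
  let body = String.sub call 1 (n - 2) in
  let m = String.length body in
  (* Find where the macro name ends. *)
  let rec name_end i = if i < m && is_name_char body.[i] then name_end (i + 1) else i in
  let e = name_end 0 in
  let name = String.sub body 0 e in
  if e = m then [name] (* no separator at all: a parameterless call *)
  else
    let sep = body.[e] in
    name :: String.split_on_char sep (String.sub body (e + 1) (m - e - 1))
```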