Module Restful

module Restful: sig .. end

Restful Web Framework

OCaml library code that makes it trivial to turn an application (which may itself be anywhere on the continuum from trivial to very complex) into a web service, emphasizing very rapid turnaround and highly flexible, inexpensive (single-file executable) deployment.

Restful is a high-level abstraction on top of Ocamlnet; this means you can escape from the Restful abstraction at any time and dive down into Ocamlnet to do things that aren't directly supported.

Restful mostly defines modules, so it's conventional to open Restful at the top of your code.

See the Rationale and Tutorial.

Author(s): Keith Waclena

module Types: sig .. end

Types defined by Restful.

val error : ?error:Types.error -> ('a, unit, string, 'b) Stdlib.format4 -> 'a

error fmt ...: raise a User_exn with an error message formatted as per Printf.

module Convert: sig .. end

Functions to convert QUERY_STRING parameters from strings to various other types.

module Param: sig .. end

Functions to process QUERY_STRING parameters.

module Valid: sig .. end

Handy functions to validate common types of CGI parameter values.

module Encode: sig .. end

Functions for encoding data.

module Url: sig .. end

Functions to manipulate the service's own URL.

module Pathinfo: sig .. end

Functions to manipulate PATH_INFO (including SCRIPT_NAME).

module Json: sig .. end

Functions for generating JSON data.

module Content: sig .. end

Functions for generating content (the output of the service).

module Robots: sig .. end

Functions for generating responses to requests for /robots.txt.

module Favicon: sig .. end

Functions for generating responses to requests for /favicon.ico.

module Config: sig .. end

Module for handling config files in refer format.

module Auth: sig .. end

module type SERVICE = sig .. end

The type of service modules; input signature of Make.

module Service: SERVICE  with type data = unit

Default SERVICE module suitable for Make.

module type LOG = sig .. end

Type of log modules; input signature of Restful.Error.

module Log: sig .. end

Predefined LOG modules and utility functions for writing new ones.

module type ERROR = sig .. end

The type of error modules; input signature of Make.

module Error: functor (Log : LOG) -> ERROR

Default ERROR module suitable for Make.

module Make: functor (Service : SERVICE) -> functor (Error : ERROR) -> sig .. end

Functor to realize a running service, supporting the run-time choice of several modes.

Tutorial

Table of Contents:

The Smallest Restful Service

"Hello, World!"

Accessing CGI Parameters

Accessing Optional Parameters

PATH_INFO

Handling Errors and Exceptions

Logging

Runtime Modes

Passing Extra Data to Main at Start-up

Handling /robots.txt requests

Handling /favicon.ico requests

Installation of Restful

Compiling Restful Applications

Rationale

The Smallest Restful Service

Here is the smallest possible Restful service:

open Restful
module R = Make (Service) (Error (Log.Default))
let () = R.main ()

It demonstrates that to compile a service application, you must instantiate a module (here called R) by applying the Restful.Make functor to a module of type SERVICE and a module of type ERROR. Then you simply call R.main () to run your service.
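
If functors are new to you, the shape of that instantiation can be sketched with nothing but the standard library. Everything below is a toy stand-in that I made up for illustration (the LOG, ERROR, and SERVICE signatures here are far smaller than Restful's real ones, and Quiet and Unimplemented are hypothetical names); only the nesting of the functor applications mirrors the example above.

```ocaml
(* Toy stand-ins for illustration only; these are NOT Restful's signatures. *)
module type LOG = sig
  val log : string -> unit
end

module type ERROR = sig
  val handle : exn -> string
end

module type SERVICE = sig
  val main : unit -> string
end

(* An error module parameterized on a log module, like Error (Log.Default). *)
module Error (L : LOG) : ERROR = struct
  let handle e =
    let msg = match e with Failure m -> m | e -> Printexc.to_string e in
    L.log msg;
    "500 " ^ msg
end

(* Make combines a service and an error module into something runnable. *)
module Make (S : SERVICE) (E : ERROR) = struct
  let main () = try S.main () with e -> E.handle e
end

(* A log module that discards everything. *)
module Quiet : LOG = struct
  let log _ = ()
end

(* A service that always fails, in the spirit of the default service. *)
module Unimplemented : SERVICE = struct
  let main () = failwith "Unimplemented service"
end

module R = Make (Unimplemented) (Error (Quiet))
```

In this toy, R.main () returns the error text produced by the error module, because the service raised an exception.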

This example uses default modules that Restful provides for you. Restful.Service defines a service that does nothing but generate a 500 server error:

    Content-type: text/plain
    Status: 500 Internal Server Error
    Cache-control: no-cache
    Pragma: no-cache
    Expires: Mon, 02 Nov 2009 18:16:30 +0000

    Unimplemented service

Nonetheless, this module is useful, as we will see in the next example.

Restful.Error is a functor that requires a module of type LOG. Here we instantiate it with Restful.Log.Default. The result is a perfectly usable, if not very sexy, error module that generates plain text error messages and logs in Common Log Format to stderr. It should be fine for actual RESTFUL services, but you might want to provide your own Restful.ERROR implementation for a more traditional CGI-style service (see Handling Errors and Exceptions for more information).

"Hello, World!"

Here is the simplest "useful" (i.e., non-erroneous) Restful service:

open Restful

module Service =
struct
  include Service
  let main mode data cgi =
    Content.write ~content_type:"text/plain" cgi "Hello, World!\n"
end

module R = Make (Service) (Error (Log.Default))

let () = R.main ()

It implements a parameterless, idempotent service that simply returns the string "Hello, World!" as plain text.

A module of type Restful.SERVICE needs to define several values. The only value that you are required to define is the main function that actually implements your service. You can get defaults (which are at least usable for development) by including the default Restful.Service module and then overriding whichever values you like.
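
The include-then-override idiom is plain OCaml and can be tried without Restful at all. In this minimal sketch, Default is a hypothetical two-value module standing in for Restful.Service (which defines more than this):

```ocaml
(* Hypothetical default module; Restful.Service defines more values than this. *)
module Default = struct
  let version = "unversioned"
  let main () = "Unimplemented service"
end

module Service = struct
  include Default                    (* inherit every default value *)
  let main () = "Hello, World!\n"    (* shadow only the one we care about *)
end
```

Service.version still comes from Default, while Service.main is the new definition; Default itself is unchanged.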

In the examples in this tutorial, your main function takes three parameters:

mode, which informs your service which Restful.Types.mode it is running in (if you are interested)
data, a parameter of arbitrary data (typically used to pass in data that you initialize once, at the time the service starts up, rather than once per request; see below)
cgi, an Ocamlnet Netcgi.cgi

The runtime argv of the process (cleaned up to remove the leading parameters that Restful itself requires, leaving only those that you have decided to use yourself) is not passed to main in these examples; it is received by init (see Passing Extra Data to Main at Start-up).

Your main can do anything you like, but typically it will call Restful.Content.write to generate content of some kind. The default content type is text/html, but you can override this in the optional content_type parameter and produce, e.g., application/json, image/png, or (as in this example) text/plain. Content.write takes two mandatory parameters:

the Ocamlnet Netcgi.cgi
a string, which ought to match your content_type

Finally, your function needs to return a Nethttp.http_status. Content.write returns its ~status parameter, so if your function ends with Content.write (as is typical), you don't need to do anything special.

After Installation of Restful, compile your app like so:

ocamlbuild minimal.byte

The result, run on the command line, is this:

    $ ./minimal.byte cgi
    Content-type: text/plain
    Cache-control: no-cache
    Pragma: no-cache
    Expires: Tue, 27 Oct 2009 19:48:55 +0000

    Hello, World!
    $

See Runtime Modes for general information on the various ways you can run your service.

Accessing CGI Parameters

Any realistic service will need to access parameters. You can do this the Ocamlnet way by directly manipulating the cgi object, but Restful provides a higher-level abstraction that may be sufficient in many cases. Here's a service that takes two mandatory parameters that must meet certain conditions; Restful validation functions (see Restful.Valid) are used in the parameter specification to enforce these conditions:

    open Restful

    module Service =
    struct
      include Service

      open Valid open Param

      let spec = [
	"id",   Mandatory, conjunction [numeric; atleast 1; atmost 10];
	"name", Mandatory, notblank;
      ]

      let main _ _ cgi =
	let ps = process cgi spec in
	  Content.write ~content_type:"text/plain" cgi
	    (Printf.sprintf "Hello, %s #%d!\n" (value ps "name") (Convert.int (value ps "id")));
    end

    module R = Make (Service) (Restful.Error (Restful.Log.Default))

    let () = R.main ()

Param.process takes the cgi object and the parameter specification spec, ensures that Mandatory parameters are present and that they meet the requirements of the validation functions, and then returns the values as an alist -- here, ps. The Param.value function is a handy way to get at Mandatory parameters in the alist.

The Convert module defines functions to convert data from strings to various OCaml data types. The difference between Convert.int and the standard int_of_string function is that Convert.int catches any error and converts it to a nicer 400 error message.
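
To make the difference concrete, here is a minimal sketch of the idea. It is not Convert.int's actual implementation: the User_exn exception declared here carries only a string (an assumption; the real Types.User_exn may carry more), and int_param and its message text are hypothetical.

```ocaml
(* Assumption: a user-level exception carrying just a message. *)
exception User_exn of string

(* Convert a parameter string to an int, turning a conversion failure into
   a user-level error instead of letting Failure escape. *)
let int_param s =
  match int_of_string_opt s with
  | Some n -> n
  | None -> raise (User_exn (Printf.sprintf "Invalid integer: '%s'" s))
```

An error module can then map User_exn to a 400 and everything else to a 500, which is the behavior described in Handling Errors and Exceptions.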

If you run this program and don't give it the correct parameters, you will get 400 errors, e.g.:

    Missing mandatory parameter 'id'
    Invalid value for 'id': 'foo'
    Invalid value for 'id': '12'
    Missing mandatory parameter 'name'
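
The validate-then-return-an-alist behavior can be sketched in a few lines. This is my own stdlib-only approximation, not Restful's code: I assume here that a validator is simply a string -> bool function and that the raw parameters arrive as an alist of strings, and the names only echo those used in the example above.

```ocaml
exception User_exn of string

type requirement = Mandatory | Optional

(* Assumption: a validator is a predicate on the raw string value. *)
let numeric s =
  let ok = ref (s <> "") in
  String.iter (fun c -> if c < '0' || c > '9' then ok := false) s;
  !ok

let atleast n s =
  match int_of_string_opt s with Some v -> v >= n | None -> false

let atmost n s =
  match int_of_string_opt s with Some v -> v <= n | None -> false

let notblank s = String.trim s <> ""

(* A value passes a conjunction only if it passes every validator. *)
let conjunction validators s = List.for_all (fun v -> v s) validators

(* Check each spec entry against the raw parameters; return an alist of
   string options, raising User_exn on the first problem found. *)
let process raw spec =
  List.map
    (fun (name, req, valid) ->
      match List.assoc_opt name raw, req with
      | None, Mandatory ->
          raise (User_exn
                   (Printf.sprintf "Missing mandatory parameter '%s'" name))
      | None, Optional -> (name, None)
      | Some v, _ when valid v -> (name, Some v)
      | Some v, _ ->
          raise (User_exn
                   (Printf.sprintf "Invalid value for '%s': '%s'" name v)))
    spec

let spec = [
  "id",   Mandatory, conjunction [numeric; atleast 1; atmost 10];
  "name", Mandatory, notblank;
]
```

With this sketch, process ["id", "7"; "name", "Eduardo"] spec yields both values wrapped in Some, while an id of 12 or a missing id raises User_exn with a message in the style shown above.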

Accessing Optional Parameters

The values in the alist returned by Param.process are actually string options; Mandatory parameters are guaranteed to be Some, but Optional parameters can be either Some or None. So you can't use Param.value as above with optional parameters (attempting to do so will yield a 500 error). You need to deal with your optional parameters in one of two ways: ordinary OCaml pattern matching, or default values.

Here we add an optional "age" parameter to the previous example, and handle it with pattern matching:

    open Restful

    module Service =
    struct
      include Restful.Service

      open Valid open Param

      let spec = [
	"id",   Mandatory, conjunction [numeric; atleast 1; atmost 10];
	"name", Mandatory, notblank;
	"age",  Optional,  conjunction [numeric; positive];
      ]

      let main _ _ cgi =
	let ps = process cgi spec in
	let comment = match optvalue ps "age" with
	  | None -> "how old are you?"
	  | Some n -> Printf.sprintf "you are %d years old." (Convert.int n)
	in
	  Content.write ~content_type:"text/plain" cgi
	    (Printf.sprintf "Hello, %s #%d, %s\n" (value ps "name") (Convert.int (value ps "id")) comment);
    end

    module R = Make (Service) (Restful.Error (Restful.Log.Default))

    let () = R.main ()

If this service is accessed with the QUERY_STRING ?id=9&name=Amber, the value returned will be

Hello, Amber #9, how old are you?

If, on the other hand, this service is accessed with the QUERY_STRING ?id=9&name=Amber&age=47, the value returned will be

Hello, Amber #9, you are 47 years old.

The Restful.Param.optvalue function simply extracts the named string option from the validated parameters list; it's just List.assoc with error handling to turn a Not_found exception into a 500 error.

Alternatively, you can use Restful.Param.value as long as you provide a ?default value, which will be returned for a missing Optional parameter.
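
The two accessors can be pictured as thin wrappers around List.assoc. The following is a sketch under my own assumptions, not the library's code: Server_error is a hypothetical exception standing in for whatever Restful raises to produce a 500, and the messages are invented.

```ocaml
(* Hypothetical stand-in for the exception that ends up as a 500. *)
exception Server_error of string

(* Fetch the string option for a parameter that appeared in the spec. *)
let optvalue ps name =
  match List.assoc_opt name ps with
  | Some v -> v
  | None -> raise (Server_error ("parameter not in spec: " ^ name))

(* Fetch a plain string; fall back to ?default for a missing value, and
   treat a missing value without a default as a programming error. *)
let value ?default ps name =
  match optvalue ps name, default with
  | Some v, _ -> v
  | None, Some d -> d
  | None, None -> raise (Server_error ("no value for parameter: " ^ name))

let ps = ["name", Some "Amber"; "age", None]
```

Here value ps "name" is "Amber", value ~default:"unknown" ps "age" is "unknown", and value ps "age" raises.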

PATH_INFO

There are two conventional ways to parameterize your services: via QUERY_STRING or via PATH_INFO (or a combination of the two). People typically prefer the look of PATH_INFO parameterization to distinguish sub-services. The Restful.Pathinfo module provides a Unix-process-like API to the PATH_INFO and a convenient dispatcher.

If you structure your URL API like so:

http://DOMAIN/.../SERVICE/SUBSERVICE[/PARM...][?QUERY_STRING]

you can use the Restful.Pathinfo module to structure your application.

However, the whole issue of PATH_INFO is fraught with peril (see Restful.Pathinfo.dispatch for a discussion), so I recommend you resist the cosmetic temptations and stick exclusively to QUERY_STRING parameters.

If you ignore my advice and decide to use Restful.Pathinfo.dispatch, then you must implement your sub-services as functions of type Restful.Pathinfo.subservice; within these functions you can of course also access QUERY_STRING parameters.

Here is a simple example:

    open Restful
    module Service : SERVICE =
    struct
      include Service
      let debug cgi argv0 argv =
	[
	  "scriptname: " ^ argv0;
	  "pathinfo: " ^ (String.concat "; " argv);
	  "";
	]
	|> String.concat "\n"
	|> Restful.Content.write ~content_type:"text/plain" cgi
      let hello cgi _ _ =
        Restful.Content.write ~content_type:"text/plain" cgi "Hello, World!\n"
      let goodbye cgi _ _ =
        Restful.Content.write ~content_type:"text/plain" cgi "Goodbye, World!\n"
      let main _ _ cgi =
        Pathinfo.dispatch "multi" ["hello",hello;"goodbye",goodbye;"debug",debug] cgi
    end

    module R = Restful.Make (Service) (Restful.Error (Restful.Log.Default))

    let () = R.main ()
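
The dispatching idea itself is simple enough to sketch without Ocamlnet. The code below is a hypothetical approximation: the sub-services here take only the remaining path components and return a string, whereas real Restful.Pathinfo.subservice functions also receive the cgi object and argv0, and I make no claim about how Pathinfo.dispatch reports unknown sub-services.

```ocaml
(* Split "/hello/a/b" into ["hello"; "a"; "b"], dropping empty components. *)
let components path_info =
  String.split_on_char '/' path_info |> List.filter (fun s -> s <> "")

(* Look up the first component in the table and pass it the rest. *)
let dispatch table path_info =
  match components path_info with
  | [] -> Error "no sub-service named in PATH_INFO"
  | name :: argv -> (
      match List.assoc_opt name table with
      | Some subservice -> Ok (subservice argv)
      | None -> Error ("unknown sub-service: " ^ name))

let table = [
  "hello", (fun _ -> "Hello, World!\n");
  "debug", (fun argv -> "pathinfo: " ^ String.concat "; " argv ^ "\n");
]
```

For example, dispatch table "/debug/a/b" selects the debug entry and hands it ["a"; "b"].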

Handling Errors and Exceptions

If you use Restful.Error, the default error-handling module, any exception raised in your main function will be caught and converted into a 500 error, except for Types.User_exn, which will be converted to a 400. Conversion errors in the Restful.Convert functions and validation failures detected by the Restful.Param functions are all raised as Types.User_exn. Misuses of Param.value and Param.optvalue, i.e., treating an Optional parameter as if it were Mandatory, generate 500 errors.

Simple error generation in your service module can be done by either calling Restful.error, which formats up an error message, Printf-style, and raises a User_exn, or by manually raising User_exn yourself.
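
A Printf-style raising function of this kind can be built on Printf.ksprintf. The sketch below is not Restful.error itself: it omits the optional ?error argument, and its User_exn carries only the formatted message, which is an assumption on my part. check_age is a hypothetical caller.

```ocaml
(* Assumption: the exception carries just the formatted message. *)
exception User_exn of string

(* Format the message as per Printf, then raise instead of returning. *)
let error fmt = Printf.ksprintf (fun msg -> raise (User_exn msg)) fmt

(* Hypothetical use inside a service: reject a nonsensical value. *)
let check_age n = if n < 0 then error "Invalid age: %d" n else n
```

Because the continuation always raises, the result type of error is polymorphic, so a call can stand in for a value of any type, as in check_age.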

The default Restful.Error module just generates simple text/plain error messages, which are probably adequate for RESTFUL services. If you're implementing a more elaborate web framework that users will see in a web browser, you can define your own error module, possibly using an HTML templating system.

Logging

The Restful.Error module is also the place to handle logging, as Restful.ERROR.handler is passed the final HTTP status and the CGI activation. The default Restful.Error implementation is parameterized on a Restful.LOG module. You can implement your own LOG module.

The default logging module, Restful.Log.Default, is extremely simple and can be used as an exemplar:

    module Default : LOG =
    struct
      let handler mode cgi status = match mode with
	| Http -> Restful.Log.clf_format status cgi |> prerr_endline
	| Cgi  -> ()
      let error fmt = Printf.ksprintf prerr_endline fmt
    end
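
If you write your own LOG module and want to keep Common Log Format, the line layout is host, identity, user, bracketed timestamp, quoted request line, status, and response size, with "-" for anything unknown. Here is a minimal formatter, assuming the caller has already extracted those fields as plain values; clf_line is a hypothetical helper, and I am not claiming it matches the exact output of Restful.Log.clf_format.

```ocaml
(* Build one Common Log Format line from already-extracted fields.
   The identity field is always "-"; user and bytes are optional. *)
let clf_line ~host ?user ~date ~request ~status ?bytes () =
  let user = match user with Some u -> u | None -> "-" in
  let bytes = match bytes with Some n -> string_of_int n | None -> "-" in
  Printf.sprintf "%s - %s [%s] \"%s\" %d %s"
    host user date request status bytes
```

A custom handler could pass such a line to prerr_endline, as the default module does, or append it to a file of your choosing.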

Runtime Modes

A Restful application supports several runtime modes, depending on how it is invoked. If invoked with no parameters, its behavior depends on the presence or absence of the mandatory CGI environment variables: if they are present, it runs as a proper CGI. This is why you can just drop it into a cgi-bin directory and expect it to work. If these env vars are absent (and the process has a controlling terminal), it runs in a CGI test mode, interactively prompting you for parameters, for example:

    $ ./params.byte
    This is a CGI program. You can now input arguments, every argument on a new
    line in the format name=value. The request method is fixed to GET, and cannot
    be changed in this mode. Consider using the command-line for more options.
    > id=7
    > name=Eduardo
    > (Got EOF)
    (Continuing the program)
    Content-type: text/plain
    Cache-control: no-cache
    Pragma: no-cache
    Expires: Wed, 28 Oct 2009 22:11:45 +0000

    Hello, Eduardo #7!

Otherwise, a Restful app has a subcommand-style command line, like openssl. The help subcommand generates a help message like this one (suppose your application is called "myapp"):

    Usage: myapp [SUBCOMMAND OPTIONS]

      myapp http CONFIGFILE ...

        run an HTTP server providing services

      myapp cgi [VAR=VALUE ...] [-- ...]

        testing mode

      myapp

        run as a CGI under some other HTTP server

      myapp version

        display version information

      myapp help [SUBCOMMAND]

        display help

Currently, the http mode requires you to provide a config file (see documentation here), but this will eventually be optional, with sensible defaults, the most common of which will be changeable via command-line options.
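
The subcommand-style command line in the usage message above boils down to a pattern match on argv. This sketch shows the general technique only; the invocation type and parse function are hypothetical and do not reflect how Restful.Make actually decides which mode to run.

```ocaml
(* Hypothetical classification of the command line (argv without argv0). *)
type invocation =
  | No_subcommand                (* real CGI, or the interactive test mode *)
  | Http of string list          (* config file plus any further arguments *)
  | Cgi_test of string list      (* VAR=VALUE settings and options *)
  | Version
  | Help of string option

let parse = function
  | [] -> Ok No_subcommand
  | "http" :: (_ :: _ as rest) -> Ok (Http rest)
  | "cgi" :: rest -> Ok (Cgi_test rest)
  | ["version"] -> Ok Version
  | ["help"] -> Ok (Help None)
  | ["help"; subcommand] -> Ok (Help (Some subcommand))
  | _ -> Error "usage"
```

Note that "http" with no config file falls through to the usage error, matching the current requirement that a config file be given.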

The cgi subcommand is a non-interactive way of running in CGI test mode, where you can set parameters and such on the command line:

    $ ./myapp.byte cgi id=7 name=Eduardo
    Content-type: text/plain
    Status: 200 OK
    Cache-control: no-cache
    Pragma: no-cache
    Expires: Mon, 02 Nov 2009 19:07:41 +0000

    Hello, Eduardo #7!

To set SCRIPT_NAME and PATH_INFO from the command line, use -prop PATH_INFO=/... and -prop SCRIPT_NAME=/....

    $ ./myapp.byte cgi -help
    This program expects a CGI environment. You can simulate such an environment
    by name=value command-line arguments. Furthermore, the following options
    are recognized:
    -get                    Set the method to GET (the default)
    -head                   Set the method to HEAD
    -post                   Set the method to POST enctype multipart/form-data
    -put file               Set the method to PUT and read this file
    -delete                 Set the method to DELETE
    -mimetype type          Set the MIME type for the next file argument(s) (default: text/plain)
    -filename path          Set the filename property for the next file argument(s)
    -filearg name=file      Specify a file argument whose contents are in the file
    -user name              Set REMOTE_USER to this name
    -prop name=value        Set the environment property
    -header name=value      Set the request header field
    -help                   Output this help
    --help  Display this list of options
    $

Passing Extra Data to Main at Start-up

Sometimes your application needs to initialize some data, just once, when it starts up. For example, suppose your app takes a config file as a command-line option, and this config file needs to be validated, parsed, and converted to a data structure for your Service.main to use. Obviously, you don't want to do all this every time Service.main is called (i.e. with each query), but rather just once when your service starts up. You could do this with global variables and option-types and conditionals, but a much cleaner way is to simply perform the initialization in the Restful.SERVICE.init function, the result of which is passed to your Restful.SERVICE.main in the data parameter. Your data can be arbitrarily complex.

The only trick to this is that you must declare the type of your data in the definition of your Service module. Up to now, we have been include'ing the default Restful.Service module and redefining only the main function; we have been inheriting Restful.Service's definition of the data type, which is:

type data = unit

Here's a simple example of a service that reports how long it has been running. The init function just returns the start-up time, and the service subtracts this from the current time:

    open Restful

    module Service =
    struct
      type data = float
      let version = []
      let init _ = Unix.time ()
      let main _ t1 cgi =
	Content.write ~content_type:"text/plain" cgi
	  (Printf.sprintf "I have been running for %g seconds.\n" (Unix.time () -. t1))
    end

    module R = Make (Service) (Error (Log.Default))

    let () =
      R.main ()

Here's an elaborate example where we parse a config file in Refer format (with a primary key field called "name") into a Map and pass that data structure to Service.main.

    open Restful

    module Service =
    struct
      type data = string * Kwrefer.refermap Kwrefer.M.t
      let version = []
      let init = function
	| [keyfield;filename] -> keyfield, Kwio.with_open_in_file (Kwrefer.to_mmap keyfield) filename
	| _                   -> failwith "usage"
      let main _ (keyfield,map) cgi =
	let ps = Param.process cgi ["key", Param.Mandatory, Valid.notblank] in
	let show map =
	  let each k vs acc = Printf.sprintf "%s: %s" k (String.concat "; " vs) :: acc in
	    String.concat "\n" (Kwrefer.M.fold each (Kwrefer.M.find (Param.value ps "key") map) [])
	in
	Content.write ~content_type:"text/plain" cgi (show map)
    end

    module R = Make (Service) (Error (Log.Default))

    let () = R.main ()
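
To see why the data parameter gives you initialize-once behavior, here is a toy model of the arrangement that runs without Restful. The SERVICE signature, the Make functor, and the serve function are all simplified inventions of mine (requests and responses are just strings); the point is only that init runs once and its result is threaded into every call of main.

```ocaml
(* Simplified, hypothetical signature: requests and responses are strings. *)
module type SERVICE = sig
  type data
  val init : string list -> data
  val main : data -> string -> string
end

module Make (S : SERVICE) = struct
  (* Run init once on argv, then answer every request with the result. *)
  let serve argv requests =
    let data = S.init argv in
    List.map (fun request -> S.main data request) requests
end

(* Counts calls to init so we can observe that it runs only once. *)
let init_calls = ref 0

module Greeter = struct
  type data = string
  let init = function
    | [greeting] -> incr init_calls; greeting
    | _ -> failwith "usage"
  let main greeting name = Printf.sprintf "%s, %s!" greeting name
end

module R = Make (Greeter)
```

Serving two requests with R.serve ["Hello"] ["Amber"; "Eduardo"] produces two greetings while init_calls ends up at 1.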

Handling /robots.txt requests

Restful makes it easy for your service to handle /robots.txt requests. Just pick the appropriate function from the Restful.Robots module, pass it your Service.main and you get back a new function that adds /robots.txt support to your function.

Here's an example that demonstrates how to assemble a robots.txt response algebraically:

    open Restful

    module Service =
    struct
      include Service
      let robots = [
	Robots.User_agent "*", Robots.Disallow "/cyberworld/map/", [
	  Robots.Disallow "/tmp/";
	  Robots.Disallow "/foo.html";
	];
	Robots.User_agent "cybermapper", Robots.Disallow "", [];
      ]
      let main' mode data cgi =
	Content.write ~content_type:"text/plain" cgi "Hello, World!\n"
      let main = Robots.some ~robots main'
    end

    module R = Make (Service) (Error (Log.Default))

    let () = R.main ()
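
For readers curious what "algebraically" buys you, the following sketch renders a structured description into robots.txt text. The record type and render function are hypothetical and deliberately simpler than the Robots.User_agent / Robots.Disallow values used above; I am not claiming this is the text Restful.Robots emits.

```ocaml
(* Hypothetical, simplified description of one robots.txt record. *)
type record = { user_agent : string; disallow : string list }

(* Render each record as a User-agent line followed by its Disallow lines,
   with a blank line between records. *)
let render records =
  let one r =
    String.concat "\n"
      (("User-agent: " ^ r.user_agent)
       :: List.map (fun path -> "Disallow: " ^ path) r.disallow)
  in
  String.concat "\n\n" (List.map one records) ^ "\n"

let robots = [
  { user_agent = "*";
    disallow = ["/cyberworld/map/"; "/tmp/"; "/foo.html"] };
  { user_agent = "cybermapper"; disallow = [""] };
]
```

An empty Disallow value, as in the second record, places no restriction on that user agent.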

Handling /favicon.ico requests

The Restful.Favicon.favicon function makes your Restful.SERVICE.main handle /favicon.ico requests, analogous to Restful.Robots and /robots.txt. Since the Favicon and Robots functions all return a new function of the same type as Restful.SERVICE.main, you can compose them to yield a service that handles both /favicon.ico and /robots.txt. Here's an example:

    open Restful

    module Service =
    struct
      include Service
      let main' mode data cgi =
	Content.write ~content_type:"text/plain" cgi "Hello, World!\n"
      let main = Robots.none (Favicon.favicon ~file:"/data/web/favicon.ico" main')
    end

    module R = Make (Service) (Error (Log.Default))

    let () = R.main ()
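
The reason these wrappers compose is that each one takes a main and returns a function of the same type. Here is a stripped-down model of that idea, assuming a handler is just a function from request path to response body; with_robots and with_favicon are hypothetical names, not the Restful.Robots or Restful.Favicon functions.

```ocaml
(* Assumption: a handler maps a request path to a response body. *)
type handler = string -> string

(* Answer /robots.txt ourselves; pass every other path to the next handler. *)
let with_robots ~body (next : handler) : handler = fun path ->
  if path = "/robots.txt" then body else next path

(* Answer /favicon.ico ourselves; pass every other path to the next handler. *)
let with_favicon ~icon (next : handler) : handler = fun path ->
  if path = "/favicon.ico" then icon else next path

let main' : handler = fun _ -> "Hello, World!\n"

(* Wrapping twice yields a handler that answers both special paths. *)
let main =
  with_robots ~body:"User-agent: *\nDisallow:\n"
    (with_favicon ~icon:"<icon bytes>" main')
```

The order of wrapping does not matter here because the two wrappers intercept different paths.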

Installation of Restful

Restful has two (direct) dependencies:

Ocamlnet
Kw

I highly recommend installing them via an ocamlfind-based package manager, such as GODI.

After installing the dependencies, untar restful and just do:

gmake install

Compiling Restful Applications

These instructions assume you are using an ocamlfind-based package manager.

To compile with ocamlbuild, use a command like this:

ocamlbuild -lflag -custom MYAPP.byte

or:

ocamlbuild -lflag -static MYAPP.native

You can of course use any other compilation technique you like (OCamlMakefile, hand-written Makefile, etc).

N.B. If you compile your application to native code and statically link it, or compile to byte code and custom-link it, you will have a single-file executable that does not require any libraries, nor even the OCaml runtime system, to run, and which will be immune to any changes to the environment (e.g., library upgrades).

Rationale

Restful is OCaml library code that makes it trivial to turn an application (which may itself be anywhere on the continuum from trivial to very complex) into a web service, emphasizing very rapid turnaround and highly flexible, inexpensive (single-file executable) deployment.

Actually, the framework has no bias towards RESTFUL services: it can also be used to build traditional CGIs or entire web sites.

Every application built using the framework -- a single-file executable -- is runnable in all the following modes, simply selected via command-line arguments (or the lack thereof):

Standalone HTTP server

Every application contains its own embedded web server that can serve up the service itself. Actually, a given embedded HTTP server implementation can serve up any number of distinct services. This HTTP server is a very high-performance one, completely competitive with Apache, and since the services it runs are linked into the application, there is none of the traditional CGI overhead (forking): the overhead is comparable to that of Apache's embedded mod_* interpreters such as mod_perl (except that this HTTP server and its services run at OCaml native-code speeds).

The HTTP server is highly configurable, supporting:

Logging and log files
Binding address / port
Virtual hosts
Host-based access control (allow / deny)
Ability to easily serve up static files from some document root
Preforking parameters, max loads, timeouts
Enabling and disabling of specific services from a config file

Traditional CGI

A service can also run as a traditional CGI under any HTTP server. This can be preferable to a standalone server in certain cases, e.g. services that require fancy authentication options (passwords, LDAP) that are easily handled by Apache in .htaccess. The idea is that a service could start out as a standalone HTTP server, and then suddenly require access restrictions, which can be trivially satisfied by literally mv'ing the executable to a CGI directory (and possibly setting up a redirect for the change).

Test Mode

A service can be invoked from the command line, passing all cgi pathinfo and query string parameters; url-encoding is handled automatically. Test output displays all HTTP headers.

Command line (pipes)

A service also runs on the command line; this allows services to be used in pipelines, tested under system monitors or cronjobs, etc. Not yet fully implemented.

High-performance, long-running CGI daemon protocols

Such as FastCGI, SCGI, and AJP. Not yet implemented.

The combination of modes available to a service makes it clonable, testable, and mobile from host to host, as needed.