module Kwrefer: sig .. end
Refer is an excellent low-noise, easy to edit, flat-file data format for non-recursive key-value data with repeating and optional fields. See Refer Data Format for more information.
Refer databases (collections of files of refer records) can be validated against a schema.
Parsed refer records are represented by this module as association lists (alists). The order of the fields in the record is preserved.
A refer database on an input channel is best processed with
Kwrefer.fold. It can also be processed as a stream of char (using
Stream.of_channel or Stream.of_string) with Kwrefer.records,
which converts the char stream to a stream of refer records. In
addition to List.assoc and friends, Kwlist.asplit and
Kwlist.coalesce may be useful.
You can convert an alist to a string in refer format with
Kwrefer.assemble; writing a bunch of alists out as a refer
database can be done via print_endline & assemble.
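As a minimal sketch of working with the alist representation, the following uses only the OCaml standard library. The sample record and the helpers all_values and format_record are hypothetical illustrations, not part of Kwrefer, and the "%key value" line layout is an assumption about refer syntax rather than the exact output of Kwrefer.assemble.

```ocaml
(* A parsed refer record as an alist; field order is preserved and the
   A field repeats. The data is invented for illustration. *)
let record = [ ("A", "Ann"); ("A", "Bob"); ("T", "Example Title") ]

(* List.assoc returns the value of the first matching field only. *)
let first_author = List.assoc "A" record

(* Hypothetical helper: every value of a repeating field, in record order. *)
let all_values key alist =
  List.filter_map (fun (k, v) -> if k = key then Some v else None) alist

(* Hypothetical formatter: one "%key value" line per field (assumed layout). *)
let format_record alist =
  String.concat "\n"
    (List.map (fun (k, v) -> Printf.sprintf "%%%s %s" k v) alist)

let () = print_endline (format_record record)
```

Because List.assoc stops at the first match, repeating fields need a helper along the lines of all_values.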
Author(s): Keith Waclena
See also
refer man page.
exception Syntax of string option * string
Syntax (location, explanation)
exception Err of int * string
Err (ln, line)
exception Invalid_record of string
val fold : ?err:(string -> int -> 'a -> string list -> 'a) ->
(int -> 'a -> (string * string) list -> 'a) -> 'a -> Kwchan.src -> 'a
fold ?err f init src: fold function f over the refer records in src.
The err function's first parameter is the invalid line of text
from the input src; the final parameter is the list of valid
lines accumulated so far (this parameter is probably not of
interest). Typically, you would report the first parameter as
context for an error message.
Raises Err upon syntax errors
val withoutln : ('a -> 'b -> 'c) -> 'a -> 'd -> 'b -> 'c
val records : ?loc:string -> char Stream.t -> (int * (string * string) list) Stream.t
records ?loc stream: convert a stream of characters to a stream of pairs of
(line number, parsed record (alist)).
The line number represents the beginning of the record in the stream of characters.
loc : location (typically filename) to be helpfully added to exceptions
val parse : ?loc:string -> string -> (string * string) list
parse ?loc str: parse a single refer record (a string) to an alist.
This function is relatively inefficient; Kwrefer.fold or Kwrefer.records
will be much faster for parsing entire files.
Returns the parsed record (alist)
loc : location (typically filename) to be helpfully added to exceptions
val assemble : (string * string) list -> string
assemble alist: format a refer record (a string) from an alist
val keys : ('a * 'b) list -> 'a list
keys alist: get all (unique) keys in record (alist)
val keycounts : ('a * 'b) list -> ('a * int) list
keycounts alist: return counts of keys in a parsed record.
val getall : 'a -> ('a * 'b) list -> 'b list
getall key alist: return list of all values in alist corresponding to key
A refer database is a collection of refer records, which can be represented externally as one or more refer files. These databases can be validated against a schema. Schemas associate sets of properties with fields.
A schema can be represented as a refer record; the compilation
functions below will parse such a record as a schema (performing
validation checks on the schema itself) and return the internal
representation as type Kwrefer.schema. You can of course build
compiled schemas from ad hoc strings or alists.
The validation functions below can then validate refer records against a compiled schema, returning a list of errors (if any).
In addition to a simple schema, applied to each record of a database, we support multischemas that can be applied to refer databases containing different types of records that are distinguishable by some field (called the schema key field or skey). This is not a normal KEY field because its values aren't unique across the database.
A multischema is itself a refer database consisting of multiple schemas, each identified by a true KEY field whose value is one of the skey values.
A keyed database is validated against a multischema by using each record's skey to lookup the appropriate schema to be used to validate that record.
Fields of schema records can't contain arbitrary values; they can only contain:
ENUM property takes a parameter)
%R author COMMENT schema key
%A KEY REQ COMMENT author's name
%D OPT UNIQ COMMENT author's dates: born-died
%B OPT UNIQ ENUM complete,select COMMENT state of bibliography
Validation occurs as follows. A simple schema or multi-schema
must be compiled. If compiled from a file, a simple schema must
be the only record in the file; each record of a multi-schema must
have an skey field as named by the ~skey parameter of the
compilation function; the skey fields of the multi-schema are KEYS
and must have unique values.
If validating with a simple schema, the schema is applied to each record of the database; if a multi-schema, then the schema record that matches the skey field of the database record is used.
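The choice of schema per record described above can be sketched as follows. This is an illustration only: the schema_set type, its constructors Single and Keyed, and the function schema_for are hypothetical and do not reflect Kwrefer's internal representation; a compiled schema is stood in for by a plain string label.

```ocaml
module SM = Map.Make (String)

(* Hypothetical stand-in for a compiled schema: just a label. *)
type schema = string

(* Single s: one schema applied to every record.
   Keyed (skey, table): skey names the schema key field and table maps
   each skey value to the schema used for records carrying that value. *)
type schema_set = Single of schema | Keyed of string * schema SM.t

(* Pick the schema for one record. None means the record has no skey
   field, or its skey value does not occur in the table. *)
let schema_for set alist =
  match set with
  | Single s -> Some s
  | Keyed (skey, table) -> (
      match List.assoc_opt skey alist with
      | None -> None
      | Some value -> SM.find_opt value table)
```

A None result in the keyed case corresponds to the situation the Skey validation error reports: the record's schema key value does not exist in the multischema.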
Most properties concern the presence or absence of fields, and their number, without regard to their values.
The fields of the record must accord with the properties of the corresponding field in the schema record. Each REQ field must occur in every database record; each UNIQ field can only occur once per record. OPT fields may be absent, and REP fields may occur multiple times.
Fields with an ENUM property place a constraint on the field value
(if the field is present): the field value must be one of the
possible values enumerated in the schema. For example, if the
schema contains ENUM foo,bar,baz for a field A, then each
value of each A field must be one of the simple strings foo,
bar, or baz. Any other value is illegal. UNIQ and REQ
properties control the presence of the fields as usual. ENUM
values can't contain spaces or commas.
Fields with a KEY property also place a constraint on the field value. The values of KEY fields must be unique across all records in the database. It is common for KEY fields to also be declared REQ and UNIQ, but this isn't mandatory. A KEY field that's not UNIQ allows a record to have multiple keys. A KEY field that's not REQ allows "anonymous" records.
Any fields not mentioned in the schema are ignored, unless
~strict validation is being used, in which case any unmentioned
fields generate Illegal errors.
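The per-record rules above (REQ, UNIQ, ENUM and strict mode) can be sketched with a simplified model. Everything here is an assumption made for illustration: the field_rule record, the problem type and its payloads, and the check function are hypothetical and are not Kwrefer's schema or validation_error types. KEY uniqueness is left out because it spans the whole database rather than one record.

```ocaml
(* Hypothetical per-field rule: required?, at most once?, allowed values. *)
type field_rule = { req : bool; uniq : bool; enum : string list option }

type problem =
  | Missing of string        (* REQ field absent *)
  | Repeat of string         (* UNIQ field occurs more than once *)
  | Enum of string * string  (* field, value outside the enumeration *)
  | Illegal of string        (* field not in schema; strict mode only *)

let check ?(strict = false) (schema : (string * field_rule) list) alist =
  let per_field (f, rule) =
    let occ = List.filter (fun (k, _) -> k = f) alist in
    let n = List.length occ in
    (if rule.req && n = 0 then [ Missing f ] else [])
    @ (if rule.uniq && n > 1 then [ Repeat f ] else [])
    @
    match rule.enum with
    | None -> []
    | Some allowed ->
        List.filter_map
          (fun (_, v) ->
            if List.mem v allowed then None else Some (Enum (f, v)))
          occ
  in
  (* Fields absent from the schema are ignored unless strict is set. *)
  let unknown =
    if strict then
      List.filter_map
        (fun (k, _) ->
          if List.mem_assoc k schema then None else Some (Illegal k))
        alist
    else []
  in
  List.concat (List.map per_field schema) @ unknown
```

In this sketch OPT and REP need no checks of their own: they are simply the absence of the REQ and UNIQ constraints.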
type prop =
  | OPT      (* optional field (mutually exclusive with REQ) *)
  | REQ      (* required field (mutually exclusive with OPT) *)
  | UNIQ     (* unique field (mutually exclusive with REP) *)
  | REP      (* repeatable field (mutually exclusive with UNIQ) *)
  | KEY      (* key field; values must be unique across entire database *)
  | ENUM     (* field value is restricted to enumeration *)
  | COMMENT  (* comment extending to EOL *)
type pprop
val prop_of_string : string -> prop
val string_of_prop : prop -> string
module PS: Kwset.S with type elt = pprop
module SS: Kwset.S with type elt = string
module SM: Kwmap.S with type key = string
module PM: Kwmap.S with type key = pprop
type validation_error =
  | Key of      (* KEY field has non-unique value *)
  | Illegal of  (* field not allowed by strict application of schema *)
  | Missing of  (* REQ field missing *)
  | Repeat of   (* UNIQ field repeated *)
  | Enum of     (* field value not in ENUM *)
  | Skey of     (* schema key value doesn't exist in multischema *)
string option's are ?loc's (e.g. filenames). int's are line
numbers.
val string_of_validation_error : validation_error -> string
type schemaerror =
  | REQOPT of     (* schema field has conflicting properties (REQ and OPT) *)
  | UNIQREP of    (* schema field has conflicting properties (UNIQ and REP) *)
  | BADPROP of    (* schema field has invalid property *)
  | TOOMANY of    (* simple schema has too many records (> 1) *)
  | INVALID of    (* multi-schema contains invalid schema record(s) *)
  | MANYSKEYS of  (* schema has too many schema key fields *)
int's are line numbers; strings are field names.
val string_of_schemaerror : string -> schemaerror -> string
string_of_schemaerror loc error: convert a schemaerror to a
string, assuming loc is the location (e.g. filename).
type schema
type multi
type schemas =
  | Simple of
  | Multi of
  | Bad of
val string_of_schemas : schemas -> string
val getref : ?def:'a list ->
prop * prop -> 'b -> ('b * 'a list) list -> 'a list
val compile_stream : ?loc:string -> ?skey:SM.key -> char Stream.t -> schemas
compile_stream ?loc ?skey stream: compile the schema on stream to internal form.
loc : location (typically filename)
skey : the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val compile_channel : ?loc:string ->
?skey:SM.key -> Pervasives.in_channel -> schemas
compile_channel ?loc ?skey channel: compile the schema file open on channel to internal form.
loc : location (typically filename)
skey : the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val compile_file : ?skey:SM.key -> string -> schemas
compile_file ?skey file: compile the schema file to internal form.
skey : the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val compile_string : ?loc:string -> ?skey:SM.key -> string -> schemas
compile_string ?loc ?skey string: compile the schema in string to internal form.
loc : location (typically filename)
skey : the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val validate : ?loc:string ->
?strict:bool ->
schemas ->
SS.t SM.t * validation_error list ->
int ->
(SM.key * SS.elt) list ->
SS.t SM.t * validation_error list
validate ?loc ?strict cschema (keys,errs) ln alist: validate the refer record in alist.
This is the lowest-level validation function. It validates one record at a time,
and is designed to be used with a suitable fold, in particular Kwrefer.fold. If you
partially apply it with a compiled schema cschema (and optionally any of loc
and strict) you have a function that can be passed directly to Kwrefer.fold:
Kwrefer.fold (validate ?loc ?strict (compile_file schemafile)) (SM.empty,[]) chan
The accumulator (keys,errs) is dual purpose: errs accumulates a list of validation errors,
if any. keys accumulates the values from the KEY fields of the database for
duplicate-checking; duplicate keys are reported as errors in errs, so you can discard
keys after the fold, or use it as a handy set of all key values.
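The dual-purpose accumulator can be illustrated with a standard-library sketch. This is not Kwrefer's validate: the function names step and run, the key_fields parameter, and the error triples are hypothetical, and reading the accumulator's map as "field name to set of values seen" is an assumption based on the type SS.t SM.t. A plain List.fold_left over (line number, alist) pairs stands in for a fold over a database.

```ocaml
module SS = Set.Make (String)
module SM = Map.Make (String)

(* Hypothetical step: key_fields lists the field names treated as KEY
   fields; each error is a (line number, field, duplicate value) triple. *)
let step key_fields (keys, errs) ln alist =
  List.fold_left
    (fun (keys, errs) (field, value) ->
      if not (List.mem field key_fields) then (keys, errs)
      else
        let seen = Option.value (SM.find_opt field keys) ~default:SS.empty in
        if SS.mem value seen then (keys, (ln, field, value) :: errs)
        else (SM.add field (SS.add value seen) keys, errs))
    (keys, errs) alist

(* Stand-in for folding over a database of (line number, record) pairs. *)
let run key_fields records =
  List.fold_left
    (fun acc (ln, alist) -> step key_fields acc ln alist)
    (SM.empty, []) records
```

After the fold, the first component holds every key value seen and the second holds the duplicates, which matches the "discard keys or reuse it as a set of all key values" advice above.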
loc : location (typically filename)
strict : whether or not to parse the database in strict mode
val validate_record : ?loc:string ->
?strict:bool ->
schemas ->
(SM.key * SS.elt) list -> validation_error list
validate_record ?loc ?strict cschema alist: validate the refer record in alist.
Convenience function to validate just one record.
loc : location (typically filename)
strict : whether or not to parse the database in strict mode
val validate_channel : ?strict:bool ->
schemas -> string -> Kwchan.src -> validation_error list
validate_channel ?strict cschema filename channel: validate the refer database open on channel.
filename is the name associated with channel, for error messages.
validate_channel, partially applied with a compiled schema and
optional ~strict, is suitable for use with
Kwio.with_open_in_file.
strict : whether or not to parse the database in strict mode
val validate_file : ?strict:bool -> schemas -> string -> validation_error list
validate_file ?strict cschema filename: validate the refer database in filename.
strict : whether or not to parse the database in strict mode
val validate_files : ?strict:bool ->
schemas -> string list -> validation_error list
validate_files ?strict cschema filenames: validate the refer database in filenames.
strict : whether or not to parse the database in strict mode
A keyed refer database is one in which every record has a key field.
A key field is a distinguished field, which is required
(REQ) and unique (UNIQ), and whose values are unique across
the database (KEY): i.e., specifying such a value identifies a unique
record in the database.
A multi-map is a Kwmap whose keys are strings (the keyfield
values) and whose values are refer records represented as
refermap's.
type refermap = string list SM.t
A refermap is a
Kwmap whose keys are strings (representing field names) and whose
values are lists of strings, representing the values of the possibly
repeating occurrences.
val bucket : ?err:('a list option ->
'b ->
('c * 'a) list -> ('a * ('c * 'a) list) list * ('c * 'a) list list) ->
'c ->
('a * ('c * 'a) list) list * ('c * 'a) list list ->
'b -> ('c * 'a) list -> ('a * ('c * 'a) list) list * ('c * 'a) list list
bucket ?empty ?toomany ?none ?err field acc ln alist: group records by a distinguishing field into buckets.
bucket field returns a function suitable for Kwrefer.fold that groups
records into equivalence classes based on the value of field.
The return value is a pair, consisting of a Kwrefer.SM map whose keys
are the unique values of field and whose values are lists of
those records (as alists), and a list of the remaining records
that had problems with field.
field is assumed to be a KEY field, i.e. UNIQ and REQ.
The remainder list consists of records that are missing field,
or whose field has no value or multiple (repeating) values.
The Kwrefer.unbucket function can conveniently raise an error if the
result contains any unbucketed records. Alternatively, you can
pass an err function that will be called with the offending
values option, a line number and the alist for each problematic
record.
err : general error function called for any of the above cases that don't have a specific function
exception Unbucketed of int
val unbucket : ?strict:bool -> 'a * 'b list -> 'a
unbucket ?strict pair: discard unbucketed remainder of result of Kwrefer.bucket
If strict = true (the default), an exception is raised.
Raises Unbucketed if remainder is not of length zero
Returns the bucket map
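The bucketing idea can be sketched with the standard library as below. This is a simplified illustration, not Kwrefer.bucket or Kwrefer.unbucket: the names bucket_step, keep_buckets and the exception Leftover are hypothetical, line numbers and the err callback are omitted, and the payload of Leftover (the number of leftover records) is this sketch's own choice.

```ocaml
module SM = Map.Make (String)

(* Hypothetical exception carrying the number of unbucketed records. *)
exception Leftover of int

(* Add one record to the bucket named by the single, non-empty value of
   field; records where field is missing, empty or repeated go to the
   remainder list instead. *)
let bucket_step field (buckets, rest) alist =
  match List.filter (fun (k, _) -> k = field) alist with
  | [ (_, v) ] when v <> "" ->
      let sofar = Option.value (SM.find_opt v buckets) ~default:[] in
      (SM.add v (sofar @ [ alist ]) buckets, rest)
  | _ -> (buckets, rest @ [ alist ])

(* Drop the remainder; in strict mode (the default) a non-empty
   remainder raises instead. *)
let keep_buckets ?(strict = true) (buckets, rest) =
  if strict && rest <> [] then raise (Leftover (List.length rest))
  else buckets
```

Used with a fold, for example List.fold_left (bucket_step "R") (SM.empty, []) records, this yields a pair of the bucket map and the problematic records.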
val to_mmap : ?loc:string ->
?validate:(?loc:string ->
SS.t SM.t * validation_error list ->
int ->
(SM.key * SS.elt) list ->
SS.t SM.t * validation_error list) ->
?map:refermap SM.t ->
string -> Pervasives.in_channel -> refermap SM.t
Kwrefer.bucket.
to_mmap ?validate ?map keyfield : function to convert a keyed refer database to multi-map.
refermap's
validate : optional validation function (Kwrefer.validate partially-applied with a compiled schema is suitable)
map : optional multi-map to populate (allows you to load several database files into one multi-map)
In the get*u functions, the fields are assumed to have been
validated as UNIQ; if they are not, the returned values are the
first occurrences of each field.
val get : string list -> refermap -> string list list
string list's)
val getu : string list -> refermap -> string list
string's)
val get1 : string -> refermap -> string list
string list)
val get1u : string -> refermap -> string
string list)
val get2 : string * string -> refermap -> string list * string list
string list's)
val get2u : string * string -> refermap -> string * string
string's)
val get3 : string * string * string ->
refermap -> string list * string list * string list
string list's)
val get3u : string * string * string -> refermap -> string * string * string
string's)
val get4 : string * string * string * string ->
refermap -> string list * string list * string list * string list
string list's)
val get4u : string * string * string * string ->
refermap -> string * string * string * string
string's)
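To illustrate the difference between the list-returning accessors and their "u" counterparts, here is a standard-library sketch of a refermap-like structure. The functions refermap_of_alist, get1_sketch and get1u_sketch are hypothetical; in particular, returning an empty list or None for an absent field is this sketch's choice, since the behavior of the real accessors on absent fields is not stated here.

```ocaml
module SM = Map.Make (String)

(* Same shape as the documented refermap: field name -> list of values. *)
type refermap = string list SM.t

(* Hypothetical conversion from an alist, keeping values in record order. *)
let refermap_of_alist alist : refermap =
  List.fold_left
    (fun m (k, v) ->
      let vs = Option.value (SM.find_opt k m) ~default:[] in
      SM.add k (vs @ [ v ]) m)
    SM.empty alist

(* All values of a field; an absent field gives [] in this sketch. *)
let get1_sketch field (m : refermap) =
  Option.value (SM.find_opt field m) ~default:[]

(* First occurrence only, as described for the get*u family; the option
   result for an absent field is an assumption of this sketch. *)
let get1u_sketch field m =
  match get1_sketch field m with [] -> None | v :: _ -> Some v
```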
To be written.