module Kwrefer : sig .. end
Refer is an excellent low-noise, easy to edit, flat-file data format for non-recursive key-value data with repeating and optional fields. See Refer Data Format for more information.
Refer databases (collections of files of refer records) can be validated against a schema.
Parsed refer records are represented by this module as association lists (alists). The order of the fields in the record is preserved.
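As a minimal illustration of the alist representation, the sketch below builds a record by hand and reads it with standard-library functions only. The field names and values are invented for the example, and all_values is a hypothetical helper, not part of Kwrefer.

```ocaml
(* A parsed refer record as an association list; field order is preserved.
   Field names and values are made up for illustration. *)
let record : (string * string) list =
  [ ("A", "Knuth, Donald"); ("T", "TAOCP"); ("K", "algorithms"); ("K", "programming") ]

(* List.assoc_opt returns the first value bound to a key, if any. *)
let first_title = List.assoc_opt "T" record

(* Hypothetical helper: collect every value of a repeating field, in record order. *)
let all_values key alist =
  List.filter_map (fun (k, v) -> if k = key then Some v else None) alist

let keywords = all_values "K" record
```

Because the alist keeps repeated keys, a plain List.assoc only ever sees the first occurrence; a filter over the whole list is needed for repeating fields.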
A refer database on an input channel is best processed with Kwrefer.fold. It can also be processed as a stream of char (using Stream.of_channel or Stream.of_string) with Kwrefer.records, which converts the char stream to a stream of refer records. In addition to List.assoc and friends, Kwlist.asplit and Kwlist.coalesce may be useful.
You can convert an alist to a string in refer format with Kwrefer.assemble; writing a bunch of alists out as a refer database can be done via print_endline & assemble.
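The following sketch shows roughly what turning alists back into refer text could look like. It is not Kwrefer.assemble: assemble_sketch and database_sketch are hypothetical names, and the sketch assumes single-line values, a "%KEY value" line per field, and blank lines between records.

```ocaml
(* Hypothetical stand-in for an assemble-like function: one "%KEY value"
   line per field. Assumes values contain no newlines. *)
let assemble_sketch (alist : (string * string) list) : string =
  alist
  |> List.map (fun (k, v) -> Printf.sprintf "%%%s %s" k v)
  |> String.concat "\n"

(* A database as text: records separated by one blank line (assumed layout). *)
let database_sketch (records : (string * string) list list) : string =
  records |> List.map assemble_sketch |> String.concat "\n\n"
```

The real function may treat multi-line values or empty fields differently; consult the implementation before relying on this layout.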
Author(s): Keith Waclena
See also: refer man page.
exception Syntax of string option * string
Syntax (location, explanation)
exception Err of int * string
Err (ln, line)
exception Invalid_record of string
val fold : ?err:(string -> int -> 'a -> string list -> 'a) ->
(int -> 'a -> (string * string) list -> 'a) -> 'a -> Kwchan.src -> 'a
fold ?err f init src: fold function f over the refer records in src.
The err function's first parameter is the invalid line of text from the input src; the final parameter is the list of valid lines accumulated so far (this parameter is probably not of interest). Typically, you would report the first parameter as context for an error message.
Raises Err upon syntax errors
val withoutln : ('a -> 'b -> 'c) -> 'a -> 'd -> 'b -> 'c
val records : ?loc:string -> char Stream.t -> (int * (string * string) list) Stream.t
records ?loc stream: convert a stream of characters to a stream of pairs of (line number, parsed record (alist)).
The line number represents the beginning of the record in the stream of characters.
loc: location (typically filename) to be helpfully added to exceptions
val parse : ?loc:string -> string -> (string * string) list
parse ?loc str: parse a single refer record (a string) to an alist.
This function is relatively inefficient; Kwrefer.fold or Kwrefer.records will be much faster for parsing entire files.
Returns the parsed record (alist)
loc: location (typically filename) to be helpfully added to exceptions
val assemble : (string * string) list -> string
assemble alist: format a refer record (a string) from an alist
val keys : ('a * 'b) list -> 'a list
keys alist: get all (unique) keys in record (alist)
val keycounts : ('a * 'b) list -> ('a * int) list
keycounts alist: return counts of keys in a parsed record.
val getall : 'a -> ('a * 'b) list -> 'b list
getall key alist: return list of all values in alist corresponding to key
A refer database is a collection of refer records, which can be represented externally as one or more refer files. These databases can be validated against a schema. Schemas associate sets of properties with fields.
A schema can be represented as a refer record; the compilation functions below will parse such a record as a schema (performing validation checks on the schema itself) and return the internal representation as type Kwrefer.schema. You can of course build compiled schemas from ad hoc strings or alists.
The validation functions below can then validate refer records against a compiled schema, returning a list of errors (if any).
In addition to a simple schema, applied to each record of a database, we support multischemas that can be applied to refer databases containing different types of records that are distinguishable by some field (called the schema key field or skey). This is not a normal KEY field because its values aren't unique across the database.
A multischema is itself a refer database consisting of multiple schemas, each identified by a true KEY field whose value is one of the skey values.
A keyed database is validated against a multischema by using each record's skey to look up the appropriate schema to be used to validate that record.
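The lookup step described here can be sketched with a standard-library map. Everything below is illustrative: schema_for and the lookup type are hypothetical names, and a plain string stands in for a compiled schema.

```ocaml
module SMap = Map.Make (String)

(* Outcome of choosing a schema for one record (hypothetical type). *)
type 'schema lookup =
  | Found of 'schema
  | No_skey_field            (* the record has no skey field at all *)
  | Unknown_skey of string   (* skey value not present in the multischema *)

(* Use the record's skey value to select the schema that should validate it. *)
let schema_for ~skey (multi : 'schema SMap.t) (record : (string * string) list) =
  match List.assoc_opt skey record with
  | None -> No_skey_field
  | Some v ->
    (match SMap.find_opt v multi with
     | Some s -> Found s
     | None -> Unknown_skey v)

(* Example multischema: skey value -> placeholder schema. *)
let multi = SMap.(empty |> add "author" "author-schema" |> add "book" "book-schema")
```

The Unknown_skey case corresponds in spirit to the Skey validation error listed later in this page.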
Fields of schema records can't contain arbitrary values; they can only contain:
ENUM property takes a parameter)
%R author COMMENT schema key
%A KEY REQ COMMENT author's name
%D OPT UNIQ COMMENT author's dates: born-died
%B OPT UNIQ ENUM complete,select COMMENT state of bibliography
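To make the shape of a schema field value concrete, here is a rough sketch that tokenizes one value, such as the text after %B in the example schema record, into a list of properties. The prop_sketch type and parse_props are hypothetical and independent of the module's own prop type. The sketch assumes properties are separated by spaces, that ENUM is followed by one comma-separated word, and that COMMENT consumes the rest of the line. It does not handle a schema key field such as %R, whose value starts with the skey value rather than a property.

```ocaml
(* Hypothetical property type for this sketch only. *)
type prop_sketch =
  | Opt | Req | Uniq | Rep | Key
  | Enum of string list   (* parameter: the allowed values *)
  | Comment of string     (* remainder of the line *)

let parse_props (value : string) : prop_sketch list =
  let words = String.split_on_char ' ' value |> List.filter (fun w -> w <> "") in
  let rec go acc = function
    | [] -> List.rev acc
    | "OPT" :: rest -> go (Opt :: acc) rest
    | "REQ" :: rest -> go (Req :: acc) rest
    | "UNIQ" :: rest -> go (Uniq :: acc) rest
    | "REP" :: rest -> go (Rep :: acc) rest
    | "KEY" :: rest -> go (Key :: acc) rest
    | "ENUM" :: values :: rest ->
      go (Enum (String.split_on_char ',' values) :: acc) rest
    | "COMMENT" :: rest ->
      (* Runs of spaces inside the comment are collapsed by the tokenizer. *)
      List.rev (Comment (String.concat " " rest) :: acc)
    | other :: _ -> failwith ("invalid property: " ^ other)
  in
  go [] words
```

A fuller version would also reject conflicting pairs such as REQ with OPT, which the schema errors documented below suggest the real compiler checks.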
Validation occurs as follows. A simple schema or multi-schema must be compiled. If compiled from a file, a simple schema must be the only record in the file; each record of a multi-schema must have an skey field as named by the ~skey parameter of the compilation function; the skey fields of the multi-schema are KEYS and must have unique values.
If validating with a simple schema, the schema is applied to each record of the database; if a multi-schema, then the schema record that matches the skey field of the database record is used.
Most properties concern the presence or absence of fields, and their number, without regard to their values.
The fields of the record must accord with the properties of the corresponding field in the schema record. Each REQ field must occur in every database record; each UNIQ field can only occur once per record. OPT fields may be absent, and REP fields may occur multiple times.
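The presence and count rules can be sketched as a single pass over a simplified schema. The types and check_counts below are hypothetical; in particular, the Missing and Repeat constructors here carry only a field name and are not the module's validation_error constructors.

```ocaml
type presence = Required | Optional   (* REQ / OPT *)
type count = Unique | Repeatable      (* UNIQ / REP *)

type problem = Missing of string | Repeat of string

(* schema: field name paired with its (presence, count) properties. *)
let check_counts (schema : (string * (presence * count)) list)
    (record : (string * string) list) : problem list =
  List.filter_map
    (fun (field, (presence, count)) ->
      (* Number of occurrences of this field in the record. *)
      let n = List.length (List.filter (fun (k, _) -> k = field) record) in
      if n = 0 && presence = Required then Some (Missing field)
      else if n > 1 && count = Unique then Some (Repeat field)
      else None)
    schema

let schema =
  [ ("A", (Required, Unique)); ("D", (Optional, Unique)); ("K", (Optional, Repeatable)) ]
```

Note that these checks never look at field values, only at how often each field occurs.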
Fields with an ENUM property place a constraint on the field value (if the field is present): the field value must be one of the possible values enumerated in the schema. For example, if the schema contains ENUM foo,bar,baz for a field A, then each value of each A field must be one of the simple strings foo, bar, or baz. Any other value is illegal. UNIQ and REQ properties control the presence of the fields as usual. ENUM values can't contain spaces or commas.
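The ENUM rule amounts to a membership test on each occurrence of the field. A minimal sketch, using the hypothetical helper enum_violations:

```ocaml
(* Return the values of [field] that are not in the allowed enumeration.
   Other fields are ignored; an absent field yields no violations. *)
let enum_violations ~field ~(allowed : string list) (record : (string * string) list) =
  List.filter_map
    (fun (k, v) -> if k = field && not (List.mem v allowed) then Some v else None)
    record
```

For instance, with allowed values foo, bar and baz for field A, a record with A fields foo and qux would yield the single violation qux.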
Fields with a KEY property also place a constraint on the field value. The values of KEY fields must be unique across all records in the database. It is common for KEY fields to also be declared REQ and UNIQ, but this isn't mandatory. A KEY field that's not UNIQ allows a record to have multiple keys. A KEY field that's not REQ allows "anonymous" records.
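Checking KEY uniqueness needs state that spans records, which is naturally expressed as a fold with an accumulator of keys seen so far plus errors found. The sketch below uses only the standard library; check_key and check_database are hypothetical names, and errors are reported here as (line number, duplicate value) pairs for simplicity.

```ocaml
module SSet = Set.Make (String)

(* One fold step: add this record's KEY values to [seen], recording
   any value that was already present as an error at line [ln]. *)
let check_key ~field (seen, errs) ln record =
  List.fold_left
    (fun (seen, errs) (k, v) ->
      if k <> field then (seen, errs)
      else if SSet.mem v seen then (seen, (ln, v) :: errs)
      else (SSet.add v seen, errs))
    (seen, errs) record

(* Fold over a whole database given as (line number, record) pairs. *)
let check_database ~field (records : (int * (string * string) list) list) =
  let seen, errs =
    List.fold_left (fun acc (ln, r) -> check_key ~field acc ln r) (SSet.empty, []) records
  in
  (SSet.elements seen, List.rev errs)
```

A record with several KEY fields contributes several keys, and a record with none contributes nothing, matching the "multiple keys" and "anonymous" cases described above.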
Any fields not mentioned in the schema are ignored, unless ~strict validation is being used, in which case any unmentioned fields generate Illegal errors.
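Strict mode can be pictured as one extra check per record. In this sketch, illegal_fields is a hypothetical helper and the schema is reduced to the list of field names it mentions.

```ocaml
(* Fields present in the record but not mentioned in the schema.
   In non-strict mode such fields are simply ignored. *)
let illegal_fields ~strict ~(known : string list) (record : (string * string) list) =
  if not strict then []
  else
    record
    |> List.filter_map (fun (k, _) -> if List.mem k known then None else Some k)
    |> List.sort_uniq compare   (* report each unknown field name once, sorted *)
```

Whether the real validator reports an unknown field once per record or once per occurrence is not stated here; the sketch reports it once.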
type prop =
  | OPT      (* optional field (mutually exclusive with REQ) *)
  | REQ      (* required field (mutually exclusive with OPT) *)
  | UNIQ     (* unique field (mutually exclusive with REP) *)
  | REP      (* repeatable field (mutually exclusive with UNIQ) *)
  | KEY      (* key field; values must be unique across entire database *)
  | ENUM     (* field value is restricted to enumeration *)
  | COMMENT  (* comment extending to EOL *)
type pprop
val prop_of_string : string -> prop
val string_of_prop : prop -> string
module PS : Kwset.S with type elt = pprop
module SS : Kwset.S with type elt = string
module SM : Kwmap.S with type key = string
module PM : Kwmap.S with type key = pprop
type validation_error =
  | Key of      (* KEY field has non-unique value *)
  | Illegal of  (* field not allowed by strict application of schema *)
  | Missing of  (* REQ field missing *)
  | Repeat of   (* UNIQ field repeated *)
  | Enum of     (* field value not in ENUM *)
  | Skey of     (* schema key value doesn't exist in multischema *)
string option's are ?loc's (e.g. filenames). int's are line numbers.
val string_of_validation_error : validation_error -> string
type schemaerror =
  | REQOPT of     (* schema field has conflicting properties (REQ and OPT) *)
  | UNIQREP of    (* schema field has conflicting properties (UNIQ and REP) *)
  | BADPROP of    (* schema field has invalid property *)
  | TOOMANY of    (* simple schema has too many records (> 1) *)
  | INVALID of    (* multi-schema contains invalid schema record(s) *)
  | MANYSKEYS of  (* schema has too many schema key fields *)
int's are line numbers; strings are field names.
val string_of_schemaerror : string -> schemaerror -> string
string_of_schemaerror loc error: convert a schemaerror to a string, assuming loc is the location (e.g. filename).
type schema
type multi
type schemas =
  | Simple of
  | Multi of
  | Bad of
val string_of_schemas : schemas -> string
val getref : ?def:'a list ->
prop * prop -> 'b -> ('b * 'a list) list -> 'a list
val compile_stream : ?loc:string -> ?skey:SM.key -> char Stream.t -> schemas
compile_stream ?loc ?skey stream: compile the schema on stream to internal form.
loc: location (typically filename)
skey: the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val compile_channel : ?loc:string -> ?skey:SM.key -> Pervasives.in_channel -> schemas
compile_channel ?loc ?skey channel: compile the schema file open on channel to internal form.
loc: location (typically filename)
skey: the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val compile_file : ?skey:SM.key -> string -> schemas
compile_file ?skey file: compile the schema file to internal form.
skey: the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val compile_string : ?loc:string -> ?skey:SM.key -> string -> schemas
compile_string ?loc ?skey string: compile the schema in string to internal form.
loc: location (typically filename)
skey: the name of the schema key field (if any); required for multi-schema; not allowed for simple schema
val validate : ?loc:string ->
?strict:bool ->
schemas ->
SS.t SM.t * validation_error list ->
int ->
(SM.key * SS.elt) list ->
SS.t SM.t * validation_error list
validate ?loc ?strict cschema (keys,errs) ln alist: validate the refer record in alist
This is the lowest-level validation function. It validates one record at a time, and is designed to be used with a suitable fold, in particular Kwrefer.fold. If you partially apply it with a compiled schema cschema (and optionally any of loc and strict) you have a function that can be passed directly to Kwrefer.fold:
Kwrefer.fold (validate ?loc ?strict (compile_file schemafile)) (SM.empty,[]) chan
The accumulator (keys,errs) is dual purpose: errs accumulates a list of validation errors, if any. keys accumulates the values from the KEY fields of the database for duplicate-checking; duplicate keys are reported as errors in errs, so you can discard keys after the fold, or use it as a handy set of all key values.
loc: location (typically filename)
strict: whether or not to parse the database in strict mode
val validate_record : ?loc:string ->
?strict:bool ->
schemas ->
(SM.key * SS.elt) list -> validation_error list
validate_record ?loc ?strict cschema alist: validate the refer record in alist
Convenience function to validate just one record.
loc: location (typically filename)
strict: whether or not to parse the database in strict mode
val validate_channel : ?strict:bool ->
schemas -> string -> Kwchan.src -> validation_error list
validate_channel ?strict cschema filename channel: validate the refer database open on channel.
filename is the name associated with channel, for error messages.
validate_channel, partially applied with a compiled schema and optional ~strict, is suitable for use with Kwio.with_open_in_file.
strict: whether or not to parse the database in strict mode
val validate_file : ?strict:bool -> schemas -> string -> validation_error list
validate_file ?strict cschema filename: validate the refer database in filename.
strict: whether or not to parse the database in strict mode
val validate_files : ?strict:bool ->
schemas -> string list -> validation_error list
validate_files ?strict cschema filenames: validate the refer database in filenames.
strict: whether or not to parse the database in strict mode
A keyed refer database is one in which every record has a key field.
A key field is a distinguished field, which is required (REQ) and unique (UNIQ), and whose values are unique across the database (KEY): i.e., specifying such a value identifies a unique record in the database.
A multi-map is a Kwmap whose keys are strings (the keyfield values) and whose values are refer records represented as refermap's.
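The two-level structure can be sketched with the standard library's Map in place of Kwmap. The names refermap_of_alist and mmap_of_records are hypothetical, and the sketch assumes a record's map holds each field's values in record order. Skipping records that lack the key field is a choice made for this sketch only.

```ocaml
module SMap = Map.Make (String)

(* Inner level: field name -> all values of that field, in record order. *)
let refermap_of_alist (alist : (string * string) list) : string list SMap.t =
  List.fold_left
    (fun m (k, v) ->
      SMap.update k (function None -> Some [ v ] | Some vs -> Some (v :: vs)) m)
    SMap.empty alist
  |> SMap.map List.rev   (* values were consed in reverse; restore order *)

(* Outer level: key field value -> the record that carries it. *)
let mmap_of_records ~keyfield (records : (string * string) list list) =
  List.fold_left
    (fun mm alist ->
      match List.assoc_opt keyfield alist with
      | Some key -> SMap.add key (refermap_of_alist alist) mm
      | None -> mm)   (* sketch only: records without the key field are skipped *)
    SMap.empty records
```

Looking up a record then becomes one map lookup by key, followed by one lookup by field name.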
type refermap = string list SM.t
A refermap is a Kwmap whose keys are strings (representing field names) and whose values are lists of strings, representing the values of the possibly repeating occurrences.
val bucket : ?err:('a list option ->
'b ->
('c * 'a) list -> ('a * ('c * 'a) list) list * ('c * 'a) list list) ->
'c ->
('a * ('c * 'a) list) list * ('c * 'a) list list ->
'b -> ('c * 'a) list -> ('a * ('c * 'a) list) list * ('c * 'a) list list
bucket ?empty ?toomany ?none ?err field acc ln alist: group records by a distinguishing field into buckets.
bucket field returns a function suitable for Kwrefer.fold that groups records into equivalence classes based on the value of field.
The return value is a pair, consisting of a Kwrefer.SM map whose keys are the unique values of field and whose values are lists of those records (as alists), and a list of the remaining records that had problems with field.
field is assumed to be a KEY field, i.e. UNIQ and REQ.
The remainder list consists of records that are missing field, or whose field has no value or multiple (repeating) values.
The Kwrefer.unbucket function can conveniently raise an error if the result contains any unbucketed records. Alternatively, you can pass an err function that will be called with the offending values option, a line number and the alist for each problematic record.
err: general error function called for any of the above cases that don't have a specific function
exception Unbucketed of int
val unbucket : ?strict:bool -> 'a * 'b list -> 'a
unbucket ?strict pair: discard unbucketed remainder of result of Kwrefer.bucket
If strict = true (the default), an exception is raised.
Raises Unbucketed if remainder is not of length zero
Returns the bucket map
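The bucketing and unbucketing steps might look roughly like the following standard-library sketch. All names ending in _sketch are hypothetical. The sketch uses a Map for the buckets and omits the line-number argument. The payload of Unbucketed_sketch is assumed to be the size of the remainder, since the meaning of the int carried by Unbucketed is not stated here.

```ocaml
module SMap = Map.Make (String)

exception Unbucketed_sketch of int   (* assumed payload: remainder length *)

(* One fold step. A record goes into the bucket named by the single,
   non-empty value of [field]; otherwise it joins the remainder list.
   Records within a bucket end up in reverse input order. *)
let bucket_sketch field (buckets, rest) alist =
  match List.filter_map (fun (k, v) -> if k = field then Some v else None) alist with
  | [ v ] when v <> "" ->
    let add = function None -> Some [ alist ] | Some rs -> Some (alist :: rs) in
    (SMap.update v add buckets, rest)
  | _ -> (buckets, alist :: rest)

(* Drop the remainder, raising first if [strict] and any record was left over. *)
let unbucket_sketch ?(strict = true) (buckets, rest) =
  match rest with
  | _ :: _ when strict -> raise (Unbucketed_sketch (List.length rest))
  | _ -> buckets
```

A typical use would fold bucket_sketch over the records starting from (SMap.empty, []) and pass the resulting pair to unbucket_sketch.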
val to_mmap : ?loc:string ->
?validate:(?loc:string ->
SS.t SM.t * validation_error list ->
int ->
(SM.key * SS.elt) list ->
SS.t SM.t * validation_error list) ->
?map:refermap SM.t ->
string -> Pervasives.in_channel -> refermap SM.t
Kwrefer.bucket.
to_mmap ?validate ?map keyfield: function to convert a keyed refer database to multi-map.
refermap's
validate: optional validation function (Kwrefer.validate partially-applied with a compiled schema is suitable)
map: optional multi-map to populate (allows you to load several database files into one multi-map)
In the get*u functions, the fields are assumed to have been validated as UNIQ; if they are not, the returned values are the first occurrences of each field.
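The difference between a plain accessor and its "u" counterpart can be sketched on a standard-library map from field names to value lists. The functions get1_sketch and get1u_sketch are hypothetical. How the real accessors treat an absent field is not documented here, so the empty list and empty string results below are assumptions of the sketch.

```ocaml
module SMap = Map.Make (String)

(* All values of a field; an absent field gives [] (assumption). *)
let get1_sketch field (m : string list SMap.t) : string list =
  match SMap.find_opt field m with Some vs -> vs | None -> []

(* "u" variant: only the first occurrence of the field.
   Returning "" for an absent field is an assumption of this sketch. *)
let get1u_sketch field m =
  match get1_sketch field m with v :: _ -> v | [] -> ""

let example = SMap.(empty |> add "K" ["a"; "b"] |> add "T" ["title"])
```

With a field that was not actually UNIQ, such as K above, the "u" variant silently returns the first value only.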
val get : string list -> refermap -> string list list
string list's)
val getu : string list -> refermap -> string list
string's)
val get1 : string -> refermap -> string list
string list)
val get1u : string -> refermap -> string
string list)
val get2 : string * string -> refermap -> string list * string list
string list's)
val get2u : string * string -> refermap -> string * string
string's)
val get3 : string * string * string -> refermap -> string list * string list * string list
string list's)
val get3u : string * string * string -> refermap -> string * string * string
string's)
val get4 : string * string * string * string -> refermap -> string list * string list * string list * string list
string list's)
val get4u : string * string * string * string -> refermap -> string * string * string * string
string's)
To be written.