Prelude.Email
Parse (partially) Email mboxes and messages.
These functions do simplistic parsing of RFC 5322 emails. For serious work, you'll need something like Ocamlnet or Mrmime.
The RFC specifies that emails use CRLF line terminators; when these functions return modified data from an email, the carriage returns are elided. The functions also work on emails that use LF terminators.
Example: extract all the subject headers from the mbox in mboxfile:
within Email.mbox mboxfile (* list of raw emails *)
|> map Email.split (* pairs of (headers,body) *)
|> map fst (* just raw headers *)
|> map Email.(parse >> nonums) (* parsed headers w/o line nums *)
|> map (Email.assocs "subject") (* subject headers *)
(mbox chan) is the list of raw email messages from the RFC 4155 mbox on chan.
Each raw email message is suitable for split.
(split str) is a pair (headers, body) where headers is a list of lines comprising all the raw email header lines in their original order, and body is a string representing the raw body of the email.
str must be a single complete RFC 5322 email message (not an mbox).
This function can also be used to parse a string that is a single ANVL record.
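The contract of split can be illustrated with a small standalone sketch. It is not the library's implementation: split_sketch and strip_cr are hypothetical names, and the sketch assumes that the header block ends at the first empty line and that carriage returns are dropped from the body as well as from the headers, which the real function may not do.

```ocaml
(* Hypothetical stand-in for split; not the library's implementation. *)

(* Drop one trailing carriage return, so CRLF and LF input behave alike. *)
let strip_cr line =
  let n = String.length line in
  if n > 0 && line.[n - 1] = '\r' then String.sub line 0 (n - 1) else line

(* Split a raw message at the first empty line: the lines before it are the
   header lines, and the lines after it are rejoined with "\n" as the body.
   Assumption: the body's line endings are normalized to LF. *)
let split_sketch (str : string) : string list * string =
  let lines = List.map strip_cr (String.split_on_char '\n' str) in
  let rec go acc = function
    | [] -> (List.rev acc, "")
    | "" :: rest -> (List.rev acc, String.concat "\n" rest)
    | line :: rest -> go (line :: acc) rest
  in
  go [] lines
```

A message with no empty line is treated here as all headers with an empty body; that is a choice made for the sketch, not documented behavior.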
(parse lines) is an alist representation of the email headers returned by split.
The keys are header field-names, and the values are pairs of (ln,value) where ln is the line number (1-based) of the header line, and value is the value trimmed of whitespace on both ends. See assocs for a typically more suitable lookup function, and nonums if you'd prefer to remove the line numbers from the alist.
(nonums alist) is the alist returned by parse with the line numbers removed.
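The shape of the data returned by parse and nonums can be sketched as follows. The names parse_sketch and nonums_sketch are hypothetical, and the sketch makes assumptions that the real functions may not share: a header line is split at its first colon, lines without a colon are skipped, field names keep their original case, and folded (continuation) header lines are not handled.

```ocaml
(* Hypothetical stand-ins for parse and nonums; not the library's code. *)

(* Turn raw header lines into an alist of (field-name, (line number, value)).
   Line numbers are 1-based and values are trimmed on both ends. *)
let parse_sketch (lines : string list) : (string * (int * string)) list =
  List.concat
    (List.mapi
       (fun i line ->
         match String.index_opt line ':' with
         | None -> [] (* assumption: lines without a colon are skipped *)
         | Some k ->
             let name = String.sub line 0 k in
             let value =
               String.sub line (k + 1) (String.length line - k - 1)
             in
             [ (name, (i + 1, String.trim value)) ])
       lines)

(* Remove the line numbers, leaving a plain (field-name, value) alist. *)
let nonums_sketch (alist : (string * (int * string)) list)
    : (string * string) list =
  List.map (fun (name, (_, value)) -> (name, value)) alist
```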
(assocs key alist) is the list of all the values (in the original order) associated with key in alist; the key lookup is case-insensitive.
It returns every match, compared case-insensitively, because mail header field names are case-insensitive and can repeat.
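A case-insensitive lookup that returns every match can be sketched in a few lines. assocs_sketch is a hypothetical name, and the sketch assumes ASCII case folding, which is adequate for RFC 5322 field names.

```ocaml
(* Hypothetical stand-in for assocs; not the library's implementation. *)

(* Return every value whose key equals [key], ignoring ASCII case,
   in the order in which the pairs appear in [alist]. *)
let assocs_sketch (key : string) (alist : (string * 'a) list) : 'a list =
  let key = String.lowercase_ascii key in
  List.filter_map
    (fun (k, v) -> if String.lowercase_ascii k = key then Some v else None)
    alist
```

By contrast, List.assoc_opt from the standard library would return only the first exact-case match, which is why it tends to be a poor fit for mail headers.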
(headers ?sep list) is a string representation of the header lines in list, as returned by (split >> fst).
This string is exactly the header block of the email given to split, except for the line endings, which are given by ?sep (default: "\r\n").
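The role of ?sep can be sketched with an optional argument and String.concat. headers_sketch is a hypothetical name, and whether the real function also terminates the final line with the separator is not stated above; this sketch assumes it does not.

```ocaml
(* Hypothetical stand-in for headers; not the library's implementation. *)

(* Join header lines with [sep], defaulting to the CRLF that RFC 5322 uses.
   Assumption: no separator is appended after the last line. *)
let headers_sketch ?(sep = "\r\n") (lines : string list) : string =
  String.concat sep lines
```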