Binary and text documents
Introduction
In Orbeon Forms, XPL and pipelines deal only with XML documents. This means that only pure XML infosets circulate between processor outputs and processor inputs in a pipeline. There is, however, often a need to handle non-XML data in pipelines, in particular:
Binary documents: any document that can be represented as a stream of bytes. In general this is true of any document, but some document formats are almost always represented this way: images, sounds, PDF documents, etc.
Text documents: any document that can be represented as a stream of characters. Some documents are better looked at this way, like plain text files, HTML files, and even the textual representation of XML.
Orbeon Forms addresses this question by defining two standard XML document formats to embed binary and text documents within an XML infoset. This solution has the benefit of keeping XPL simple by limiting it to pure XML infosets, while allowing XPL to conveniently manipulate any binary and text document.
Binary documents
A binary document consists of a root element named document containing character data encoded with Base64. The following attributes are supported:
xsi:type: mandatory, specifies the content as xs:base64Binary
content-type: optional, provides a content type which may be used by the consumer
last-modified: optional, provides a last modification date which may be used by the consumer
status-code: optional, provides a status code which may be used by the consumer
filename: optional, provides a file name which may be used by the consumer
disposition-type: [SINCE Orbeon Forms 2017.1] optional, used when filename is specified; either attachment (the default) if the browser should download the document, or inline if the browser should display the document inline
Example:
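As a minimal sketch, and not Orbeon Forms code, the following Python builds and reads back such a document with the standard library. The function names make_binary_document and read_binary_document are hypothetical, and binding the xsi and xs prefixes to the standard W3C XML Schema namespace URIs is an assumption.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional

# Standard W3C namespace URIs; binding them to xsi and xs is an assumption.
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XS = "http://www.w3.org/2001/XMLSchema"


def make_binary_document(data: bytes,
                         content_type: Optional[str] = None,
                         filename: Optional[str] = None) -> str:
    """Wrap raw bytes in a document root element as Base64 character data."""
    ET.register_namespace("xsi", XSI)
    root = ET.Element("document")
    root.set("xmlns:xs", XS)  # declare the xs prefix used in the xsi:type value
    root.set("{%s}type" % XSI, "xs:base64Binary")  # the mandatory attribute
    if content_type is not None:
        root.set("content-type", content_type)
    if filename is not None:
        root.set("filename", filename)
    root.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def read_binary_document(xml_text: str) -> bytes:
    """Decode the Base64 payload of a binary document back into bytes."""
    root = ET.fromstring(xml_text)
    if root.get("{%s}type" % XSI) != "xs:base64Binary":
        raise ValueError("not a binary document")
    return base64.b64decode(root.text or "")


if __name__ == "__main__":
    # The first eight bytes of a PNG file serve as a tiny binary payload.
    print(make_binary_document(b"\x89PNG\r\n\x1a\n", content_type="image/png"))
```

The optional last-modified, status-code and disposition-type attributes would be set the same way as content-type and filename; they are left out to keep the sketch short.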
NOTE: For the curious, the Base64 encoding is documented in RFC 2045. This encoding represents binary data by mapping it to a set of 64 ASCII characters.
Such documents are not meant to be read by users, in the same way that regular binary files are not meant to be examined by users. Binary documents are generated by Orbeon Forms processors, like the URL generator and converters. They are consumed by processors like the HTTP serializer, the Email processor, and converters.
Text documents
A text document consists of a root element named document containing the text. The following attributes are supported:
xsi:type: mandatory, specifies the content as xs:string
content-type: optional, provides a content type which may be used by the consumer
last-modified: optional, provides a last modification date which may be used by the consumer
Example:
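As a rough illustration, not taken from Orbeon Forms itself, this Python sketch produces and reads a text document with the standard library. The helper names make_text_document and read_text_document are hypothetical, and the namespace URIs bound to xsi and xs are assumed to be the standard W3C ones.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard W3C namespace URIs; binding them to xsi and xs is an assumption.
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XS = "http://www.w3.org/2001/XMLSchema"


def make_text_document(text: str, content_type: Optional[str] = None) -> str:
    """Wrap a character string in a document root element."""
    ET.register_namespace("xsi", XSI)
    root = ET.Element("document")
    root.set("xmlns:xs", XS)  # declare the xs prefix used in the xsi:type value
    root.set("{%s}type" % XSI, "xs:string")  # the mandatory attribute
    if content_type is not None:
        root.set("content-type", content_type)
    root.text = text  # markup characters are escaped during serialization
    return ET.tostring(root, encoding="unicode")


def read_text_document(xml_text: str) -> str:
    """Return the text carried by a text document."""
    root = ET.fromstring(xml_text)
    if root.get("{%s}type" % XSI) != "xs:string":
        raise ValueError("not a text document")
    return root.text or ""


if __name__ == "__main__":
    print(make_text_document("This is line one of the input document!", "text/plain"))
```

Because the text is ordinary XML character data, characters such as < and & are escaped when the document is serialized and restored when it is parsed.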
The content-type attribute may have a charset parameter providing a hint for the character encoding, for example:
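To make the role of the hint concrete, here is a sketch of how a consumer might use it when turning a text document into bytes. This is an assumption about consumer behavior, not the HTTP serializer's actual logic; charset_hint and serialize_text_document are hypothetical names, and the fallback to utf-8 is a choice made for this example.

```python
import codecs
import xml.etree.ElementTree as ET
from email.message import Message
from typing import Optional, Tuple


def charset_hint(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the charset parameter of a content-type value, or the default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(default)
    try:
        codecs.lookup(charset)  # ignore hints naming an unknown encoding
    except LookupError:
        return default
    return charset


def serialize_text_document(xml_text: str) -> Tuple[bytes, str]:
    """Encode the text with the hinted charset, falling back to utf-8."""
    root = ET.fromstring(xml_text)
    text = root.text or ""
    encoding = charset_hint(root.get("content-type"))
    try:
        return text.encode(encoding), encoding
    except UnicodeEncodeError:
        # The hint is not authoritative: utf-8 can represent any character.
        return text.encode("utf-8"), "utf-8"


if __name__ == "__main__":
    doc = ('<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
           ' xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string"'
           ' content-type="text/plain; charset=iso-8859-1">Caf\u00e9</document>')
    print(serialize_text_document(doc))
```

The sample document carries content-type="text/plain; charset=iso-8859-1", so the text is encoded as ISO-8859-1 unless it contains a character that encoding cannot represent.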
Because XML character data itself is represented in Unicode (in other words, it is designed to allow representing in the same document all the characters specified by the Unicode specification), there is no requirement for specifying character encoding in XML pipelines. However, when an XML infoset is read or written as a textual XML document, specifying a character encoding may be a useful hint. For example, a URL generator can, with this mechanism, communicate to an HTTP serializer the preferred character encoding obtained when the document was read. The serializer may then use that hint, but it is by no means authoritative.
In general, XML documents can be read and written using the utf-8 character encoding, which allows representing all the Unicode characters.
Unlike binary documents, text documents can easily be examined by users. They can also be easily manipulated by languages such as XSLT. Like binary documents, they are generated by Orbeon Forms processors, like the URL generator and converters. They are consumed by processors like the HTTP serializer, the Email processor, and converters.
Streaming
Processors can stream binary and text documents by issuing a number of short character SAX events. It is therefore possible to generate "infinitely" long binary and text documents with a constant amount of memory, assuming both the sender and the receiver of the document are able to perform streaming. This is the case for example of the URL generator and the HTTP serializer.
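The idea can be sketched with Python's standard SAX API. This is an illustration of the technique, not how Orbeon Forms processors are implemented: base64_chunks and Base64StreamingHandler are hypothetical names. The producer emits short Base64 strings, and the consumer decodes each character event as it arrives, holding at most three pending characters in memory.

```python
import base64
import io
import xml.sax


def base64_chunks(stream, raw_chunk_size=3 * 1024):
    """Yield short Base64 strings read from a buffered binary stream.

    The chunk size must be a multiple of 3 so that padding only appears
    in the last chunk. Assumes read() returns full chunks until end of stream.
    """
    if raw_chunk_size % 3 != 0:
        raise ValueError("raw_chunk_size must be a multiple of 3")
    while True:
        block = stream.read(raw_chunk_size)
        if not block:
            break
        yield base64.b64encode(block).decode("ascii")


class Base64StreamingHandler(xml.sax.ContentHandler):
    """Decode Base64 character events incrementally into a binary stream."""

    def __init__(self, out):
        super().__init__()
        self._out = out
        self._pending = ""  # fewer than 4 characters carried between events

    def characters(self, content):
        data = self._pending + "".join(content.split())  # drop whitespace
        usable = len(data) - len(data) % 4  # Base64 decodes in groups of 4
        self._out.write(base64.b64decode(data[:usable]))
        self._pending = data[usable:]

    def endDocument(self):
        if self._pending:
            raise ValueError("truncated Base64 data")


if __name__ == "__main__":
    payload = bytes(range(256)) * 40
    out = io.BytesIO()
    handler = Base64StreamingHandler(out)
    # Simulate a sender issuing many short character events.
    for chunk in base64_chunks(io.BytesIO(payload), raw_chunk_size=300):
        handler.characters(chunk)
    handler.endDocument()
    print(out.getvalue() == payload)
```

Neither side ever holds the whole document: the producer reads one block at a time, and the handler writes decoded bytes out as soon as a complete group of four Base64 characters is available.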