URL generator

Introduction

Generators are a special category of processors that have no XML data inputs, only outputs. They are generally used at the top of an XML pipeline to generate XML data from a Java object or other non-XML source.

The URL generator fetches a document from a URL and produces an XML output document. The supported protocols are http:, https:, and file:, as well as the Orbeon Forms resource protocol (oxf:). See Resource Managers for more information about the oxf: protocol.

Content type

The URL generator operates in several modes depending on the content type of the source document. The content type is determined according to the following priorities:

  1. Use the content type in the content-type element of the configuration if force-content-type is set to true.

  2. Use the content type set by the connection (for example, the content type sent with the document by an HTTP server), if any. Note that when using the oxf: or file: protocol, the connection content type is never available. When using the http: protocol, the connection content type may or may not be available depending on the configuration of the HTTP server.

  3. Use the content type in the content-type element of the configuration, if specified.

  4. Use application/xml.

In addition, it is possible to force the mode using the <mode> configuration element:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://example.org/resource</url>
      <mode>binary</mode>
    </config>
  </p:input>
  <p:output name="data" id="binary-document"/>
</p:processor>
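
As a further illustration, the following pipeline fragment is a minimal sketch that forces the text mode on a resource that would otherwise be read in XML mode. It assumes that `text` is the value accepted by the `<mode>` element for the text mode (the sections below name the modes xml, html, text, json, and binary), and the output id `note-as-text` is made up for the example.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <!-- An XML resource, read here as plain text instead of being parsed -->
      <url>oxf:/urlgen/note.xml</url>
      <!-- Assumption: "text" is the mode value for the text mode -->
      <mode>text</mode>
    </config>
  </p:input>
  <!-- Hypothetical output id -->
  <p:output name="data" id="note-as-text"/>
</p:processor>
```

With such a configuration, the content type selection algorithm above should no longer decide the mode, since the mode is given explicitly.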

XML mode

The XML mode is selected when:

  • the content type is text/xml, application/xml, or ends with +xml according to the selection algorithm above

  • the xml mode is forced using the <mode> configuration element

The generator fetches the specified URL and parses the XML document.

The following options are available:

  • validating:

    • if set to true, a validating parser (using a DTD) is used, otherwise a non-validating parser is used

    • default: false

  • handle-xinclude:

    • if set to true, handle XInclude inclusions during parsing

    • default: true

  • external-entities:

    • if set to true, external entities are processed

    • default: false

  • handle-lexical:

    • if set to true, propagate XML comments present in the input

    • default: true

Example:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>oxf:/urlgen/note.xml</url>
      <content-type>application/xml</content-type>
      <validating>true</validating>
      <handle-xinclude>false</handle-xinclude>
      <external-entities>false</external-entities>
      <handle-lexical>false</handle-lexical>
    </config>
  </p:input>
  <p:output name="data" id="xml"/>
</p:processor>

If the URL is an HTTP or HTTPS URL and the server returns a non-success status code, an exception is raised.

NOTE: The URL must point to a well-formed XML document. If it doesn't, an exception is raised.

NOTE: Be careful when setting external-entities to true, as non-trusted documents with external entities could be used by malicious users to inject content into your XML document.

HTML mode

The HTML mode is selected when:

  • the content type is text/html according to the selection algorithm above

  • the html mode is forced using the <mode> configuration element

In this mode, the URL generator uses HTML Tidy to transform HTML into XML. This feature is useful to later extract information from HTML using XPath.

Examples:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://www.cnn.com</url>
      <content-type>text/html</content-type>
      <tidy-options>
        <show-warnings>false</show-warnings>
        <quiet>true</quiet>
      </tidy-options>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>oxf:/html/example.html</url>
      <content-type>text/html</content-type>
      <force-content-type>true</force-content-type>
      <tidy-options>
        <show-warnings>false</show-warnings>
        <quiet>true</quiet>
      </tidy-options>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

The <tidy-options> part of the configuration in the two examples above is optional. However, by default quiet is set to false, which causes HTML Tidy to output messages to the console when it finds invalid HTML. To prevent this, add a <tidy-options> section to the configuration with quiet set to true.

Even if HTML Tidy has some tolerance for malformed HTML, you should use well-formed HTML whenever possible.

If the URL is an HTTP or HTTPS URL and the server returns a non-success status code, an exception is raised.

Text mode

The text mode is selected when:

  • the content type according to the selection algorithm above starts with text/ and is different from text/html or text/xml, for example text/plain

  • the text mode is forced using the <mode> configuration element

In this mode, the URL generator reads the input as a text file and produces an XML document containing the text read.

Example:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>oxf:/list.txt</url>
      <content-type>text/plain</content-type>
    </config>
  </p:input>
  <p:output name="data" id="text"/>
</p:processor>

Assume the input document contains the following text:

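This is line one of the input document!
This is line two of the input document!
This is line three of the input document!

The resulting document consists of a document root element containing the text according to the text document format. The following attributes are present:

  • xsi:type, set to xs:string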

  • content-type, if known

  • status-code, if the resource was retrieved through HTTP or HTTPS

<document xsi:type="xs:string" content-type="text/plain">
  This is line one of the input document! This is line two of the input document! This is line three of the input document!
</document>

NOTE: The URL generator performs streaming. It generates a stream of short character SAX events. It is therefore possible to generate an "infinitely" long document with a constant amount of memory, assuming the generator is connected to other processors that do not require storing the entire stream of data in memory, for example the SQL processor (with an appropriate configuration to stream BLOBs), or the HTTP serializer.

JSON mode

[SINCE Orbeon Forms 2016.2]

The JSON mode is selected when:

  • the content type is application/json according to the selection algorithm above

  • the json mode is forced using the <mode> configuration element

In this mode, the URL generator uses the XForms 2.0 conversion scheme to convert the incoming JSON content to XML.

[SINCE Orbeon Forms 2017.1]

In addition to the application/json mediatype, mediatypes of the form a/b+json are recognized.
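
A JSON source could be read with a configuration along the following lines. This is a sketch only: the URL and the output id are hypothetical, and `json` as the value of `<mode>` is inferred from the description above rather than taken from a documented example.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <!-- Hypothetical URL, for illustration only -->
      <url>http://example.org/data.json</url>
      <!-- Assumption: "json" is the mode value for the JSON mode -->
      <mode>json</mode>
    </config>
  </p:input>
  <!-- Hypothetical output id -->
  <p:output name="data" id="json-as-xml"/>
</p:processor>
```

Forcing the mode is mainly useful when the server does not send application/json or an a/b+json mediatype. When it does, the mode should be selected without the `<mode>` element.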

Binary mode

The binary mode is selected when:

  • the content type is neither one of the XML content types nor one of the text/* content types

  • the binary mode is forced using the <mode> configuration element

In this mode, the URL generator uses a Base64 encoding to transform binary content into XML according to the binary document format. For example:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>oxf:/my-image.jpg</url>
      <content-type>image/jpeg</content-type>
    </config>
  </p:input>
  <p:output name="data" id="image-data"/>
</p:processor>

The resulting document consists of a document root node containing character data encoded with Base64. The following attributes are present:

  • xsi:type, set to xs:base64Binary

  • content-type, if known

  • status-code, if the resource was retrieved through HTTP or HTTPS

<document xsi:type="xs:base64Binary" content-type="image/jpeg">
  /9j/4AAQSkZJRgABAQEBygHKAAD/2wBDAAQDAwQDAwQEBAQFBQQFBwsHBwYGBw4KCggLEA4R ... KKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooA//2Q==
</document>

NOTE: The URL generator performs streaming. It generates a stream of short character SAX events. It is therefore possible to generate an "infinitely" long document with a constant amount of memory, assuming the generator is connected to other processors that do not require storing the entire stream of data in memory, for example the SQL processor (with an appropriate configuration to stream BLOBs), or the HTTP serializer.

Character encoding

For text and XML, the character encoding is determined as follows:

  1. Use the encoding in the encoding element of the configuration if force-encoding is set to true.

  2. Use the encoding set by the connection (for example, the encoding sent with the document by an HTTP server), if any, unless ignore-connection-encoding is set to true (for XML documents, precedence is given to the connection encoding as per RFC 3023). Note that when using the oxf: or file: protocol, the connection encoding is never available. When using the http: protocol, the connection encoding may or may not be available depending on the configuration of the HTTP server. The encoding is specified along with the content type in the content-type header, for example:

     content-type: text/html; charset=iso-8859-1

  3. Use the encoding in the encoding element of the configuration, if specified.

  4. For XML, the character encoding is determined automatically by the XML parser.

  5. For text, including HTML: use the default of iso-8859-1.

When reading XML documents, the preferred method of determining the character encoding is to let either the connection or the XML parser auto detect the encoding. In some instances, it may be necessary to override the encoding. For this purpose, the force-encoding and encoding elements can be used to override this default behavior, for example:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>oxf:/urlgen/note.xml</url>
      <content-type>application/xml</content-type>
      <encoding>iso-8859-1</encoding>
      <force-encoding>true</force-encoding>
    </config>
  </p:input>
  <p:output name="data" id="xml"/>
</p:processor>

This use should be reserved for cases where it is known that a document specifies an incorrect encoding and it is not possible to modify the document.

HTML example:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://www.cnn.com</url>
      <content-type>text/html</content-type>
      <encoding>iso-8859-1</encoding>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

Note that only the following encodings are supported for HTML documents:

  • iso-8859-1

  • utf-8

Also note that use of the HTML <meta> tag to specify the encoding from within an HTML document is not supported.
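
For plain text, the rules above imply that a file read through the oxf: or file: protocol falls back to the default encoding unless the configuration says otherwise, since no connection encoding is available. The following sketch assumes that oxf:/list.txt is stored as UTF-8. The output id is hypothetical.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>oxf:/list.txt</url>
      <content-type>text/plain</content-type>
      <!-- Assumption: the file is UTF-8; without this element the default would apply -->
      <encoding>utf-8</encoding>
    </config>
  </p:input>
  <!-- Hypothetical output id -->
  <p:output name="data" id="utf8-text"/>
</p:processor>
```

Here force-encoding is not needed, because rule 2 cannot apply to the oxf: protocol and rule 3 therefore selects the configured encoding.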

HTTP headers

When retrieving a document from an HTTP server, you can optionally specify the headers sent to the server by adding one or more header elements, as illustrated in the example below:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://www.cnn.com</url>
      <content-type>text/html</content-type>
      <header>
        <name>User-Agent</name>
        <value>Mozilla/5.0</value>
      </header>
      <header>
        <name>Accept-Language</name>
        <value>en-us,fr-fr</value>
      </header>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

In addition, you can provide a list of space-separated header names with the forward-headers element. Any header listed is automatically forwarded if it exists in the incoming request:

<forward-headers>Authorization SM_USER</forward-headers>

Headers specified with the header element have precedence over forward-headers.
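
The two mechanisms can be combined. The sketch below assumes that forward-headers is a direct child of the config element, like header. The URL and output id are hypothetical.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <!-- Hypothetical URL, for illustration only -->
      <url>http://example.org/service</url>
      <content-type>application/xml</content-type>
      <!-- Explicit header: takes precedence over a forwarded header of the same name -->
      <header>
        <name>Accept-Language</name>
        <value>en-us</value>
      </header>
      <!-- Forwarded only if present in the incoming request -->
      <forward-headers>Authorization Accept-Language</forward-headers>
    </config>
  </p:input>
  <!-- Hypothetical output id -->
  <p:output name="data" id="response"/>
</p:processor>
```

In this sketch, Authorization is forwarded from the incoming request when it exists. Accept-Language is listed in both places, so the explicit value en-us should be sent, following the precedence rule above.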

Cache control

Local cache

It is possible to configure whether the URL generator caches documents locally in the Orbeon Forms cache. By default, it does. To disable caching, use the cache-control/use-local-cache element, for example:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://www.cnn.com</url>
      <content-type>text/html</content-type>
      <cache-control><use-local-cache>false</use-local-cache></cache-control>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

Using the local cache causes the URL generator to check if the document is in the Orbeon Forms cache first. If it is, its validity is checked with the protocol handler (looking at the last modified date for files, the last-modified header for http, etc.). If the cached document is valid, it is used. Otherwise, it is fetched and put in the cache.

When the local cache is disabled, the document is never revalidated and always fetched.

Conditional GET

Usually, the URL generator does forced GET requests. You can enable conditional GETs with the cache-control/conditional-get element.

When conditional-get is set to true, and if the URL generator finds a corresponding resource in its local cache, it sends a conditional HTTP GET using the If-Modified-Since header. If the server responds with a code 304, the URL generator uses the resource it holds in cache, following usual HTTP semantics.

Example of configuration:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://www.cnn.com</url>
      <content-type>text/html</content-type>
      <cache-control><conditional-get>true</conditional-get></cache-control>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

Relation to other settings:

  • When handle-xinclude is set to true, conditional-get is automatically overridden to false.

  • When conditional-get is set to true, use-local-cache is automatically overridden to true as well.
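
Because handle-xinclude defaults to true in XML mode, these rules suggest that a conditional GET on an XML document only takes effect when handle-xinclude is explicitly set to false. The following is a sketch under that reading. The URL and output id are hypothetical, and the relative order of the elements within config is an assumption.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <!-- Hypothetical URL, for illustration only -->
      <url>http://example.org/feed.xml</url>
      <content-type>application/xml</content-type>
      <!-- Defaults to true, which would override conditional-get to false -->
      <handle-xinclude>false</handle-xinclude>
      <cache-control>
        <!-- Also causes use-local-cache to be treated as true -->
        <conditional-get>true</conditional-get>
      </cache-control>
    </config>
  </p:input>
  <!-- Hypothetical output id -->
  <p:output name="data" id="feed"/>
</p:processor>
```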

Authentication

The simplest way to handle authentication is to embed user names and passwords in the URL:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://aUsername:aPassword@example.com</url>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

In that case, the default authentication parameters apply: preemptive authentication is used, which forces the HTTP basic scheme.

If you don't want to embed user names and passwords in URLs or need more control over authentication schemes, you can use an authentication element:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://example.com</url>
      <authentication>
        <username>aUsername</username>
        <password>aPassword</password>
        <preemptive>true|false</preemptive>
        <domain>an NTLM domain name</domain>
      </authentication>
    </config>
  </p:input>
  <p:output name="data" id="html"/>
</p:processor>

  • The username and password elements contain the credentials sent to the server.

  • When preemptive is set to false, the preemptive mode is switched off and the URL generator will use a basic or digest scheme as requested by the server.

  • When the domain element is present the NTLM authentication scheme is used with this domain name.
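
As a concrete variant of the template above, the following sketch switches off preemptive authentication so that the scheme requested by the server (basic or digest) is used. The credentials are placeholders and the output id is hypothetical.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>http://example.com</url>
      <authentication>
        <!-- Placeholder credentials -->
        <username>aUsername</username>
        <password>aPassword</password>
        <!-- Wait for the server's challenge instead of sending basic credentials up front -->
        <preemptive>false</preemptive>
      </authentication>
    </config>
  </p:input>
  <!-- Hypothetical output id -->
  <p:output name="data" id="protected-document"/>
</p:processor>
```

The domain element is omitted here, so NTLM should not be used.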

Relative URLs

URLs passed to the URL generator can be relative. For example, consider the following pipeline fragment declared in a file called oxf:/my-pipelines/backend/import.xpl:

<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <url>../../documents/claim.xml</url>
    </config>
  </p:input>
  <p:output name="data" id="file"/>
</p:processor>

In this case, the URL resolves to: oxf:/documents/claim.xml.
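
A relative URL without leading `../` segments behaves the same way. The sketch below assumes the same pipeline file, oxf:/my-pipelines/backend/import.xpl, and standard relative URL resolution against that location. The data/claim.xml path is hypothetical.

```xml
<p:processor name="oxf:url-generator">
  <p:input name="config">
    <config>
      <!-- Hypothetical path, resolved against the directory of import.xpl -->
      <url>data/claim.xml</url>
    </config>
  </p:input>
  <p:output name="data" id="file"/>
</p:processor>
```

Under these assumptions, the URL would resolve to oxf:/my-pipelines/backend/data/claim.xml.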
