XProc 2.0: An XML Pipeline Language

W3C Editor's Draft 13 April 2016 at 18:36 UTC (build 3)

This Version:: https://xquery.github.io/specification/langspec
Latest Version:: http://www.w3.org/TR/xproc20/
Editors:: Norman Walsh, MarkLogic Corporation <norman.walsh@marklogic.com>; Alex Milowski, Invited expert <alex@milowski.org>; Henry S. Thompson, University of Edinburgh <ht@inf.ed.ac.uk>
Repository:: This specification on GitHub; Report an issue
Changes:: Diff against current “status quo” draft; Commits for this specification

This document is also available in these non-normative formats: XML, automatic change markup from the previous draft courtesy of DeltaXML.

Abstract

This specification describes the syntax and semantics of XProc 2.0: An XML Pipeline Language, a language for describing operations to be performed on documents.

An XML Pipeline specifies a sequence of operations to be performed on documents. Pipelines generally accept documents as input and produce documents as output. Pipelines are made up of simple steps which perform atomic operations on documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Please report errors in this document by raising issues on the specification repository. Alternatively, you may report errors in this document to the public mailing list public-xml-processing-model-comments@w3.org (public archives are available).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 14 October 2005 W3C Process Document.

1 Introduction
2 Terminology
3 Understanding XProc 2.0
4 Step Chains
5 Inputs
- 5.1 URI inputs
- 5.2 Literal inputs
6 Output bindings
7 Step Declarations and Invocations
8 Block Expressions
9 Conditionals
10 Variables
11 Projections
12 Iteration
13 Replacement
14 Tee
15 Flow Declarations
16 XProc Modules
A Grammar

1 Introduction

An XProc Pipeline specifies a set of operations to be performed on a collection of input documents. Pipelines take documents as their input and produce documents as their output.

This document introduces an entirely new syntax for XProc:

It is a text-based syntax in UTF-8.
It is in some ways similar to XQuery; parts of the language may in fact end up being borrowed from or extensions of the XQuery 3.1 grammar.
It is inspired by XProc 1.0; it attempts to solve the problems outlined in the requirements document for XProc 2.0.
We hope that this design will present fewer and easier conceptual hurdles than the XML syntax.

XProc remains a data flow language. There are steps, indivisible black boxes of computation, connected together into a graph. The connections are the bindings between the output(s) of one step and the input(s) of another.

Step declarations (see Section 7, “Step Declarations and Invocations”) resemble function declarations. The use of a step resembles a function invocation. This has an important consequence: the inputs and outputs are both named and ordered. If the declaration for the xslt step has two inputs, source and stylesheet, declared in that order, then it is coherent to speak of them both by name and by ordinal position: the source input is first, the stylesheet input is second.

2 Terminology

All of the terminology in this document is subject to revision. Some of the terms are better and more clearly defined than others. We present a list of terms here so that you will recognize them when they first appear. Most are explained in more detail, but in this draft, not always at the point of first use.

Some terms, such as port, denote the same concept in XProc 2.0 that they did in XProc 1.0. For conciseness, we use those terms in this draft without further explanation.

pipeline

A pipeline is a collection of operations to be performed on a set of input documents. Each pipeline consists of steps and the connections between them. The connections may be strictly linear, they may branch (many steps connected to the output of a single step), or they may join (a single step connected to the outputs of many steps). In general, a pipeline is a directed acyclic graph. A pipeline can contain entirely disjoint subgraphs.

step

A step is a unit of operation in a pipeline. From the perspective of the pipeline in which it occurs, it is an atomic operation.

connection

A connection occurs between two steps where the output of one step is consumed by the input of another. A step may have zero or more connections; a step that does have connections may be connected to any number of other steps.

flow

A flow is a portion of the data flow graph that the pipeline represents. Writing programs in a text file is inherently linear while graphs are not. From a graph theoretic perspective, the graph is being cut into a set of flows. If you don’t have a background in graph theory, don’t worry about that bit; it’s not important that you understand it in those terms.

step chain

A step chain is a strictly linear sequence of steps. Step chains are the most significant unit of work in a pipeline. Syntactically, a step chain is represented with the arrow operator (i.e., -> or →, U+2192).

The left hand side of an arrow is always a step or an input binding. The right hand side of an arrow is always a step, an input binding for the next step, or a block expression.

input binding

Every step has an input binding; it determines what the step will read on each of its input ports. The simplest input binding is an item. A binding can be anonymous, in which case it binds to the first input port, or it can be named, in which case it binds to the named input port. Assuming that $in is a variable bound to a document:

$in → binds $in to the first input of the step on the right hand side of the arrow.
source=$in → binds $in to the source input of the step on the right hand side of the arrow.

If a step has more than one input, the set of inputs is enclosed in square brackets:

[$in,$style] → binds $in to the first input and $style to the second input of the step on the right hand side of the arrow.
[source=$in, stylesheet=$style] → binds $in to the source input and $style to the stylesheet input of the step on the right hand side of the arrow.

In an input binding, a literal quoted string is interpreted as a URI reference with the semantic that the URI is dereferenced and the document returned is used as the input.

"document.xml" → binds the contents of document.xml to the first input of the step on the right hand side of the arrow.

output binding

Steps have outputs that can be accessed by output bindings or implicitly via ordinal bindings.

In an input binding between two steps or a block expression, the output bindings of the preceding step are accessed ordinally: $1 is the first output binding, $2 is the second, etc.

Note

At the moment, it isn’t possible to refer to output bindings of the preceding step by name.

At the end of a step chain, the output binding operator (>> or ≫, U+226B) assigns outputs to variables:

≫ $out binds $out to the document sequence that appears on the first output port of the step it follows.
≫ result=$out binds $out to the result output of the step it follows.

If a step has more than one output, the set of outputs can be enclosed in square brackets:

≫ [$out,$chunks] binds $out to the document sequence that appears on the first output port of the step it follows and $chunks to the documents that appear on the second output port.
≫ [secondary=$chunks,result=$out] binds $out to the documents on the result output port and $chunks to the documents on the secondary output port.

variables

A variable, $varname, is a lexically scoped reference to a sequence of XDM items or an output port. The distinction is not significant in everyday usage; pipeline authors can always think of a variable as denoting a sequence of items. However, from an implementation perspective, an output port variable represents a connection in the graph to every step that references that variable. An analysis of input bindings and output bindings may allow a processor to build an entirely streaming implementation of (some) pipelines that never need to reify output port variables.

block expressions

A block expression is an inlined expression that can perform some relatively small amount of computation. It’s possible to describe the XProc flow language without block expressions; every block expression can be turned into a step that implements the expression. But that would be very tedious in practice.

Note

The expression language inside the block expression is imagined as a subset of XQuery 3.1. The exact nature of the subset is still being discussed. The canonical expression is an if/then/else test.

Block expressions occupy the position of a step in a step chain; consequently, they may consume the outputs of the immediately preceding step (if there is one), and they may produce outputs.

Within a block expression, the syntax of the output binding operator is extended slightly. The outputs of the block expression are referenced ordinally using an @-sign; ≫ @1 writes to the first output, ≫ @2 writes to the second output, etc.

3 Understanding XProc 2.0

A flow is a description of the data flow graph. On the left and right sides of a chain of steps, we use structuring and de-structuring to assign ports to variables. The result is a variable that (logically) denotes a sequence of items.

Steps have declarations but are not otherwise described within the flow. An implementation can associate a step signature with an implementation in its own domain language by matching function signatures.

Step invocations can be chained together by ordering them in a sequence connected by the arrow operator (i.e., -> or → U+2192). A step chain must fully specify the input port bindings along with any required options.

Let’s begin with an example pipeline. This is the “example 3” pipeline from the XProc 1.0 specification.

Example 1. An Example

xproc version = "2.0"; ❶

(: This example is from the XProc 1.0 specification (example 3). :)

 inputs $source as document-node(); ❷
outputs $result as document-node(); ❸

$source → { if (xs:decimal($1/*/@version) < 2.0) ❹
            then [$1,"v1schema.xsd"] → validate-with-xml-schema() ≫ @1 ❺
            else [$1,"v2schema.xsd"] → validate-with-xml-schema() ≫ @1}
        → [$1,"stylesheet.xsl"] → xslt() ❻
≫ $result ❼

❶: The declaration that begins an XProc 2.0 pipeline
❷: The pipeline inputs can be declared externally
❸: So can the outputs
❹: Inside this block $1 refers to the first input, in this case $source.
❺: Using @1 writes the validated result to the first output of this block expression
❻: The first (in this case only) output from the block expression is used as the first input to the xslt step.
❼: The final output binding writes the result of the pipeline to the $result output.

4 Step Chains

A step chain is a sequence of step invocations separated by the chain operator (i.e., -> or → U+2192). On the left of the chain operator is always either a preceding step or an input binding. On the right must be a step invocation, a block expression, or an optional output binding.

The simplest input binding is a single expression that evaluates to a sequence of one or more items. For example, the document(s) bound to $in can be an input binding for the XInclude step:

$in → xinclude()

If a step takes multiple inputs, the individual bindings must be surrounded by square brackets:

["document.xml", "style.xsl"] → xslt()

In a binding with multiple inputs, the first input is bound to the first input port (in declaration order), the second input to the second port, etc. If necessary, or for clarity, a binding may be preceded by a name assignment that explicitly names a port:

[source="document.xml", stylesheet="style.xsl"] → xslt()

If positional and named bindings are mixed, all positional bindings must precede the first named binding.
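The binding rules above can be modelled in a few lines of Python. This is a minimal sketch, not part of XProc. `bind_inputs` and `BindingError` are hypothetical names. Rejecting unknown port names, and rejecting a port that is bound twice, are assumptions that the text does not state.

```python
from __future__ import annotations

from typing import Optional, Sequence


class BindingError(Exception):
    """Raised when an input binding breaks one of the rules modelled here."""


def bind_inputs(ports: Sequence[str],
                bindings: Sequence[tuple[Optional[str], object]]) -> dict[str, object]:
    """Resolve a binding such as [$in, stylesheet="style.xsl"] to port names.

    `ports` lists the step's input ports in declaration order. Each binding
    is a (port_name, value) pair; port_name is None for a positional binding.
    """
    bound: dict[str, object] = {}
    seen_named = False
    for position, (name, value) in enumerate(bindings):
        if name is None:
            # Positional bindings must all precede the first named one,
            # so `position` is also the index of the declared port.
            if seen_named:
                raise BindingError("positional binding follows a named binding")
            if position >= len(ports):
                raise BindingError(f"the step has no input port #{position + 1}")
            name = ports[position]
        else:
            seen_named = True
            if name not in ports:  # assumption: unknown names are errors
                raise BindingError(f"the step has no input port named {name!r}")
        if name in bound:  # assumption: a port may be bound only once
            raise BindingError(f"port {name!r} is bound more than once")
        bound[name] = value
    return bound
```

For example, `bind_inputs(["source", "stylesheet"], [(None, "document.xml"), ("stylesheet", "style.xsl")])` binds both ports. Swapping the order of the two bindings raises `BindingError`, because the positional binding would then follow a named one.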

Steps produce some number of outputs on named ports. The outputs of a step invocation immediately preceding the chain operator are available as numbered inputs $1, $2, etc. whose order is the order of the output declarations on the step. For example, the xslt step has two output ports, result and secondary, declared in that order. Following an xslt step, $1 refers to the result port and $2 refers to the secondary port.

$in → xinclude() → [$1,"stylesheet.xsl"] → xslt()

A reference to an ordinal port that does not exist produces an empty sequence of documents.

Note

This is an explicit relaxation of the rules in XProc 1.0 where all bindings had to be composed statically, exactly, and perfectly. It facilitates the use of block expressions where the number of outputs may not always be the same. This explicitly relaxes the rule that all of the outputs from a conditional must be identical.

Note

It may be necessary to provide a function or other mechanism for testing at runtime if a reference to $3 (for example) is empty because the third output port produced an empty sequence or because there was no third output port.

If two steps are connected together without an intervening input binding, the implicit input binding is that the ports are connected ordinally:

→ [$1,$2,$3,…$n] →

So this flow:

$in → xinclude() → store("included.xml")

is equivalent to this one:

$in → xinclude() → [$1] → store("included.xml")
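As a hypothetical illustration, the ordinal references and the implicit binding can be modelled in Python. The model assumes that a step's outputs are represented as a list with one tuple of documents per output port, in declaration order. The function names are invented for this sketch.

```python
from __future__ import annotations


def ordinal(outputs: list[tuple[str, ...]], n: int) -> tuple[str, ...]:
    """Value of $n after a step whose output ports, in declaration order,
    carried `outputs`. An ordinal with no matching port is the empty sequence."""
    if 1 <= n <= len(outputs):
        return outputs[n - 1]
    return ()


def implicit_binding(outputs: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """The binding [$1, $2, ..., $n] assumed when one step is chained
    directly to the next with no explicit input binding."""
    return [ordinal(outputs, n) for n in range(1, len(outputs) + 1)]


# Hypothetical outputs of xslt(): one result document, nothing on secondary.
after_xslt = [("result.xml",), ()]
ordinal(after_xslt, 1)  # ('result.xml',)
ordinal(after_xslt, 2)  # (): the secondary port was empty
ordinal(after_xslt, 3)  # (): there is no third port
```

The last two calls return the same value. This is the ambiguity raised by the second note above: the caller cannot tell an empty port from a missing one.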

5 Inputs

Inputs…

5.1 URI inputs

A literal string in a port binding is a URI reference and the resource identified by the URI will be loaded and bound to the port.

"doc.xml" → xinclude()

An input can also be a sequence of documents enclosed in parentheses:

("d1.xml","d2.xml","d3.xml") → xinclude()

Expressions and literals may be mixed to produce new sequences:

($in,"doc.xml") → xinclude()

Step inputs can be combined:

[collection=($main,$secondary), query="query.xq"] → xquery()

and can be used in more complex expressions:

[$in,"stylesheet.xsl"] → xslt() → [($1,$2),"query.xq"] → xquery()

5.2 Literal inputs

A literal can be specified using a media-type specific data constructor. For example, a data constructor may construct a JSON object by including the object within the curly braces:

data "application/json" {
   {
      "name": "Alex",
      "favoriteColor": "orange"
   }
}

JSON array construction is also allowed:

data "application/json" { [ 1,2,3,4] }

An XML element may be constructed by embedding the literal within the curly braces:

data "application/xml" { <doc><title>A test</title></doc> }

An HTML element can be similarly constructed:

data "text/html" {
    <!DOCTYPE html>
    <html>
    <head><title>Template</title>
    <link type="text/css" href="style.css">
    </head>
    <body>…</body>
    </html>
}

Text may also be directly embedded:

data "text/plain" { "Now is the time for all good XProc …" }

Note

AVT expansion and curly brace escaping are unspecified here.

Processors are free to extend literal construction with the constraint that the format can be unambiguously embedded within curly braces.
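One way a processor might organize literal construction is a table that maps each media type to a parser. Such a table also makes extension straightforward. The Python sketch below is an assumption, not a specified mechanism. `CONSTRUCTORS`, `register`, and `construct` are hypothetical names. The body is assumed to arrive as the raw text between the braces. AVT expansion, brace escaping, text/plain, and text/html are not handled.

```python
from __future__ import annotations

import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable

# Hypothetical registry: media type -> parser for the text between the braces.
CONSTRUCTORS: dict[str, Callable[[str], Any]] = {
    "application/json": json.loads,
    "application/xml": ET.fromstring,
}


def register(media_type: str, parser: Callable[[str], Any]) -> None:
    """How a processor might add support for another media type."""
    CONSTRUCTORS[media_type] = parser


def construct(media_type: str, body: str) -> Any:
    """Evaluate data "<media_type>" { <body> } by dispatching on media type."""
    try:
        parser = CONSTRUCTORS[media_type]
    except KeyError:
        raise ValueError(f"no data constructor for {media_type!r}") from None
    return parser(body.strip())


# Example extension: CSV text becomes a list of rows.
register("text/csv", lambda body: list(csv.reader(io.StringIO(body))))
```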

6 Output bindings

The output binding operator (i.e., >> or ≫, U+226B) takes a step chain or port variable reference on the left hand side and binds its output to the right hand side (i.e., a port variable reference, a URI reference, or an output port ordinal such as @1). The output binding operator is used to construct more complex chains of data flows, store results, or write to output ports for returning results.

The symbol “≫” is evocative of the “append” operation familiar from many command-line systems. An output binding appends data to its right hand side in the sense that it causes data to be sent there and if several chains cause data to be sent to the same place, the effect will be logical appending.

The identity assignment is performed by simply binding the input to the output:

$in ≫ $out

As a result, all of the input on $in is sent to the output port $out as it flows through the graph.

A literal URI reference implies a document store:

"doc.xml" → xinclude() ≫ "included.xml"

In the case of implicit store, if the same output URI is used more than once, the result of sending a sequence there is implementation defined (e.g., the last document written).

If the outputs need to be referenced as inputs elsewhere, they can be assigned to variables:

$in → xinclude() ≫ $included
[$included,"schema.xsd"] → validate-with-xml-schema()
[$included,"stylesheet.xsl"] → xslt() ≫ [result=$out,secondary=$chunks]

Variables assigned in this way can be used like any other variable in expressions, but the implementation must enforce the following semantics:

Any reference to the variable must return all the documents written by all of the step chains that write to that variable.
All of the documents written by any single step chain must be adjacent in the resulting sequence and must be in the order written by the ultimate step in that chain.
Any referential circularity raises a static error.

For example, the following has two documents flowing through $included:

"doc1.xml" → xinclude() ≫ $included
"doc2.xml" → xinclude() ≫ $included
$included → validate-with-xml-schema()

The first two step chains are independent and the processor is free to run them in either order, or in parallel. However, what is passed to validate-with-xml-schema() when the $included variable is referenced must be all of the documents written by the first chain followed by all of the documents written by the second, or vice versa.
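The ordering guarantee can be modelled in Python. The function names are hypothetical. Each chain is represented by a function that returns the documents it writes. `collect` runs the chains in parallel and appends each chain's output as a unit, in whatever order the chains finish. `conforms` then checks a result against the rule by brute force over the possible chain orders.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import permutations
from typing import Callable, Sequence


def collect(chains: Sequence[Callable[[], list[str]]]) -> list[str]:
    """Value of a variable that several independent step chains write to.

    The chains run in parallel. Each chain's documents stay together and in
    the order that chain wrote them; the chains themselves are appended in
    whatever order they happen to finish.
    """
    result: list[str] = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(chain) for chain in chains]
        for future in as_completed(futures):
            result.extend(future.result())
    return result


def conforms(value: list[str], per_chain: Sequence[list[str]]) -> bool:
    """True if `value` is every chain's output concatenated in some chain
    order (exponential brute force, for illustration only)."""
    return any(value == [doc for chunk in order for doc in chunk]
               for order in permutations(per_chain))
```

A result that interleaves documents from different chains, such as `["a1", "b", "a2"]` for chains writing `["a1", "a2"]` and `["b"]`, fails the check.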

The names of output ports can be omitted in which case the assignments are taken in declaration order. For example, the XSLT step declares the result port first and the secondary port second. An explicit set of bindings:

[source=$in,stylesheet="stylesheet.xsl"] → xslt() ≫ [result=$out,secondary=$chunks]

can be shortened to:

[$in,"stylesheet.xsl"] → xslt() ≫ [$out,$chunks]

Within any context, every declared output port has an unnamed ordinal. Some expressions (e.g. block expressions) have implicitly declared output ports.

These ordinals can be referenced as @1, @2, etc.

$in → { if ($1/doc/cheese='cheddar')
        then consume() ≫ @1
        else reject() ≫ @1 }
     ≫ $out

7 Step Declarations and Invocations

All steps are declared as external procedures with any number of named inputs and outputs.

step my:computation()
 inputs $source as document-node(),
outputs $result as xs:int*;

Steps are always declared with a qualified name. When they are invoked, a default namespace may be assumed by the processor.

Steps may have any number of options that can be optional and defaulted:

step p:xslt(
  $initial-mode as xs:string?,
  $template-name as xs:string?,
  $output-base-uri as xs:string?,
  $parameters as map()? = (),
  $version as xs:string = "2.0"
)
   inputs $source as document-node()+,
          $stylesheet as document-node()
   outputs $result as document-node()?,
           $secondary as document-node()*;

All required options must be listed first in the declaration.

Option values are specified on invocation. Unnamed option values are matched to options in declaration order; once a value is given by name, all subsequent values must also be given by name.

For example:

xslt("toc",$output-base-uri=base-uri($source))

invokes the xslt step with the option value "toc" for $initial-mode and an explicitly named value for $output-base-uri, but does not specify a value for $template-name. $parameters and $version take their default values, the empty sequence and "2.0" respectively.
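Option matching can be sketched in Python as follows. `OptionDecl`, `check_declaration`, and `bind_options` are hypothetical names. Based on the $template-name example, an option that has neither a default nor a `required` flag is assumed to be simply absent. Rejecting an option that is given both by position and by name is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Sequence

NO_DEFAULT = object()  # sentinel: the declaration gives no default value


@dataclass(frozen=True)
class OptionDecl:
    name: str
    required: bool = False
    default: Any = NO_DEFAULT


def check_declaration(decls: Sequence[OptionDecl]) -> None:
    """All required options must be listed before any optional ones."""
    seen_optional = False
    for decl in decls:
        if decl.required and seen_optional:
            raise ValueError(f"required option ${decl.name} follows an optional one")
        seen_optional = seen_optional or not decl.required


def bind_options(decls: Sequence[OptionDecl], positional: Sequence[Any],
                 named: dict[str, Any]) -> dict[str, Any]:
    """Match unnamed values in declaration order, then named values by name;
    options still unset take their default, if they have one."""
    if len(positional) > len(decls):
        raise ValueError("too many unnamed option values")
    values = {decl.name: value for decl, value in zip(decls, positional)}
    declared = {decl.name for decl in decls}
    for name, value in named.items():
        if name not in declared:
            raise ValueError(f"no option named ${name}")
        if name in values:
            raise ValueError(f"option ${name} is given more than once")
        values[name] = value
    for decl in decls:
        if decl.name in values:
            continue
        if decl.default is not NO_DEFAULT:
            values[decl.name] = decl.default
        elif decl.required:
            raise ValueError(f"required option ${decl.name} has no value")
    return values


# The p:xslt options declared above; treating none as required is an assumption.
XSLT = [
    OptionDecl("initial-mode"),
    OptionDecl("template-name"),
    OptionDecl("output-base-uri"),
    OptionDecl("parameters", default=()),
    OptionDecl("version", default="2.0"),
]
```

Under these assumptions, `bind_options(XSLT, ["toc"], {"output-base-uri": base})` yields values for `initial-mode`, `output-base-uri`, `parameters`, and `version`, and none for `template-name`.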

8 Block Expressions

A step chain may contain a block expression. A block expression always has an ordinal set of inputs and outputs. The inputs are assigned from the context of the expression in the chain. The outputs are assigned based on the flow contained within the expression.

A block expression is enclosed within a set of curly brackets and contains any number of step chains or other statements.

9 Conditionals

A conditional may be placed within a step chain when surrounded by curly brackets:

$in → { if ($1/*/@version eq "v1.0")
        then [$1,"crummy.xsl"] → xslt() ≫ @1
        else [$1,"better.xsl"] → xslt() ≫ @1 }
    ≫ $out

When the if/then/else expression is evaluated, its test acts as a guard on the flows contained within the then and else clauses: only one of those flows will execute.

The outputs of the block are completely determined by the flows executed. If they do not append any output to the ordinal outputs of the block expression, the expression will not have any output. That is, there is no implicit chaining of outputs.

Note

What functions are available in the test conditional? Can I use last or position for example?

10 Variables

Within curly bracketed expressions, a let clause may be used to bind variables to values:

$in → {
   let $version := xs:int($1/*/@version) {
      if ($version < 2)
      then [$1,"schema1.xsd"] → validate-with-xml-schema() ≫ @1
      else if ($version < 3)
      then [$1,"schema2.xsd"] → validate-with-xml-schema() ≫ @1
      else fail("No schema available")
   }
} ≫ $out

These variables share the same scope as port variable references, but they cannot be used on the right hand side of an output binding (append) operator.

For example, this is not legal:

$in → {
   let $dates := xs:dateTime($1/*/updated) {
     [$1,"schema1.xsd"] → validate-with-xml-schema() ≫ [@1,$dates]
   }
}

but you can do this:

$in → {
   let $dates := xs:dateTime($1/*/updated) {
     $dates >> @2
     [$1,"schema1.xsd"] → validate-with-xml-schema() ≫ @1
   }
}

11 Projections

A source can be turned into a sequence by an expression. The result is a port that contains a sequence of items.

For example:

$in//section → count() ≫ $out

binds $out to the number of section elements in the documents on $in.
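For the simple case of a descendant name test, the projection can be sketched with Python's `xml.etree.ElementTree`. The sketch assumes that each document on the port is represented by its document element. It handles only `$in//name`, not general path expressions. `project_descendants` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def project_descendants(docs: list[ET.Element], name: str) -> list[ET.Element]:
    """Sketch of the projection `$in//name`.

    Returns every element called `name` in every document on the port, in
    document order, as one flat sequence. Element.iter() also visits the
    document element itself, which matches `//name` evaluated from the
    document node.
    """
    return [element for doc in docs for element in doc.iter(name)]
```

In this model, `$in//section → count()` corresponds to `len(project_descendants(docs, "section"))`.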

Note

ndw: I still think it would be better to have a step that does this; then there can be an xpath() step, a jsonpath() step, a csv() step, etc. rather than building the semantics of projection into our expression language.

12 Iteration

Iteration is a core operation and can be embedded within a step chain with the ! operator. For example:

("d1.xml","d2.xml","d3.xml") ! { [$1,"schema.xsd"] → validate-with-xml-schema() ≫ @1}

validates the three documents contained in the sequence.

The result of an iteration operation is a set of output bindings where the first binding contains all of the documents written to @1, the second all of the documents written to @2, etc.
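The following Python sketch shows how the outputs of an iteration could be gathered. It assumes that the block's outputs are returned as a list indexed by ordinal. It also assumes that results are gathered in the order of the input sequence, which the text above does not fix. `iterate` is a hypothetical name.

```python
from __future__ import annotations

from typing import Callable, Sequence


def iterate(items: Sequence[str],
            body: Callable[[str], list[list[str]]]) -> list[list[str]]:
    """Sketch of `items ! { body }`.

    `body` runs once per item and returns the documents it wrote to each
    ordinal output: index 0 holds @1, index 1 holds @2, and so on. The first
    result binding collects everything written to @1 across all iterations,
    the second everything written to @2, etc.
    """
    gathered: list[list[str]] = []
    for item in items:
        for k, docs in enumerate(body(item)):
            while len(gathered) <= k:  # iterations may write to different numbers of outputs
                gathered.append([])
            gathered[k].extend(docs)
    return gathered
```

A body that writes to @2 in only one iteration still produces a second result binding, which holds just that iteration's documents.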

13 Replacement

Note

Formerly known as "viewports".

A portion of a document can be iterated over and replaced by an embedded step chain. The replace operator requires a single input, an expression, and a step chain body.

For each subtree matched, the block expression is run with the subtree on the positional input port $1. The item on the positional output port @1 will be its replacement.

It is an error if the block expression does not produce a replacement.

Note

ndw: I think this error is in conflict with our earlier rule that attempting to read a port that wasn’t used returns the empty sequence. I think if the step chain body doesn’t write to @1, the replacement is simply the empty sequence.

For example:

$in → replace (/doc/section) { [$1,"style.xsl"] → xslt() ≫ @1 } ≫ $out

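applies the style.xsl stylesheet to each section element matched by /doc/section and replaces that section with the transformation’s result.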
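The following is a rough Python model of replacement using `xml.etree.ElementTree`. It assumes the match expression is an ElementTree path relative to the document element, so /doc/section becomes `section` when the document element is `doc`. It follows the current text in treating a missing replacement as an error, rather than the empty-sequence alternative proposed in the note above. `replace` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Optional


def replace(doc: ET.Element, path: str,
            body: Callable[[ET.Element], Optional[ET.Element]]) -> ET.Element:
    """Sketch of `replace (path) { body }` applied to one document.

    Each element matched by `path` is passed to `body` (the $1 input); the
    element it returns (the @1 output) takes the matched element's place.
    """
    # ElementTree elements do not know their parents, so build a map first.
    parents = {child: parent for parent in doc.iter() for child in parent}
    for match in doc.findall(path):
        replacement = body(match)
        if replacement is None:
            raise ValueError(f"no replacement produced for <{match.tag}>")
        parent = parents[match]
        index = list(parent).index(match)
        replacement.tail = match.tail  # keep any text that followed the match
        parent.remove(match)
        parent.insert(index, replacement)
    return doc
```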

14 Tee

A chain can have an alternate flow embedded within it using the tee operator (tee or ⊤). The flow must be a block expression. The outputs following the tee expression are exactly the same as if the tee operator had been omitted.

In the following example, the result of the xinclude step is stored by a tee operator, and that result is also transformed by the xslt step.

$in → xinclude() ⊤ { $1 ≫ "included.xml" }
    → [$1,"stylesheet.xsl"] → xslt()
    ≫ $out
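The claim that a tee leaves the chain's outputs unchanged can be illustrated with a small Python model. In this model, each step maps a list of ordinal outputs to another such list. `chain`, `tee`, and the stand-in steps mentioned below are hypothetical, and only the data flow is modelled.

```python
from __future__ import annotations

from typing import Any, Callable

Outputs = list[list[str]]  # index 0 holds the documents on $1/@1, index 1 on $2/@2, ...


def chain(outputs: Outputs, *steps: Callable[[Outputs], Outputs]) -> Outputs:
    """Model of `inputs → step1 → step2 → ...`: each step consumes the
    outputs of the one before it."""
    for step in steps:
        outputs = step(outputs)
    return outputs


def tee(block: Callable[[Outputs], Any]) -> Callable[[Outputs], Outputs]:
    """Model of `⊤ { block }`: run the side flow on the current outputs, then
    hand exactly those outputs to the rest of the chain."""
    def step(outputs: Outputs) -> Outputs:
        # Give the side flow its own copy so it cannot alter what flows on;
        # whatever it returns is discarded.
        block([list(docs) for docs in outputs])
        return outputs
    return step
```

Suppose `xinclude`, `store`, and `xslt` are stand-in functions. Then `chain(inputs, xinclude, tee(store), xslt)` returns the same value as `chain(inputs, xinclude, xslt)`, while `store` still receives the XInclude result.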

15 Flow Declarations

A flow can be named and reused:

flow my:process
   inputs $source as document-node(),
  outputs $result as document-node() {
    $source → xinclude() → [$1,"stylesheet.xsl"] → xslt() ≫ $result
 };

"doc.xml" → my:process() ≫ "doc.html"

16 XProc Modules

XProc modules are top-level containers for reuse. Every XProc module must start with:

xproc version = "2.0";

A module consists of a version declaration (above), a set of declarations, and a single optional unnamed flow description.

A module may end with a flow description. The inputs and outputs of that flow must be provided by the implementation when the module is invoked.

A module may import other declarations via the import statement:

import "library.xpl";

A module may import declarations in the expression language:

import "functions.xq";

A module may also declare options as parameters to the module.

option $user as xs:string;
option $passwd as xs:string;

A module must provide declarations for any undefined inputs and outputs to the flow:

 inputs $source as document-node();
outputs $result as document-node();

A Grammar

XProc ::= XProcModule EOF

XProcModule ::= XProcVersionDecl? XProcProlog XProcFlow?
XProcVersionDecl ::= 'xproc' 'version' '=' StringLiteral XProcSeparator

XProcProlog   ::= ( ( XProcDefaultNamespaceDecl | XProcNamespaceDecl | XProcImport ) XProcSeparator )*
                  (XProcInputs XProcSeparator)?
                  (XProcOutputs XProcSeparator)?
                  (XProcOptionDecl XProcSeparator)*
                  (XProcStepDecl XProcSeparator)*

XProcImport ::= 'import' XProcURILiteral

XProcInputs ::= 'inputs' XProcParamList

XProcOutputs ::= 'outputs' XProcParamList

XProcOptionDecl ::= 'option' XProcParam

XProcStepDecl ::= 'step' FunctionName '(' XProcParamList? ')' XProcInputs? XProcOutputs?

XProcFlow ::= XProcFlowStatement+

XProcFlowStatement ::= ( XProcStepChain XProcOutputBinding? ) | XProcIfStatement | XProcLetStatement

XProcStepChain ::= (XProcSequenceLiteral XProcStepChainItem+) |
                   ((XProcStepInvocation | XProcBlockStatement) XProcStepChainItem*) 

XProcStepChainItem ::= XProcChainedItem | XProcIteratedItem | XProcTeedItem | XProcReplacedItem

XProcSequenceLiteral ::= XProcSequenceItem | '(' XProcSequenceItem ( ',' XProcSequenceItem )* ')'

XProcSequenceItem ::= XProcURILiteral | XProcPortInput

XProcChainedItem ::= XProcArrow XProcChainItem
XProcIteratedItem ::= XProcIteration XProcBlockStatement
XProcTeedItem ::= XProcTee XProcBlockStatement XProcChainedItem
XProcReplacedItem ::= XProcReplace '(' PathExpr ')' XProcBlockStatement

XProcChainItem ::=  XProcStepInvocation | XProcBlockStatement

XProcArrow ::= '->' | '→'

XProcIteration ::= '!'

XProcTee ::= 'tee' | '⊤'

XProcReplace ::= 'replace'

XProcInputPortList ::= '[' XProcInputPortBinding ( ',' XProcInputPortBinding )* ']'

XProcInputPortBinding ::= (QName '=')? XProcSequenceLiteral

XProcInputOrdinal ::= '$' IntegerLiteral

XProcOutputBinding ::= XProcAppend ( XProcOutputItem | XProcOutputPortList)

XProcAppend ::= '>>' | '≫'

XProcOutputItem ::= XProcURILiteral | XProcPortRef | XProcOutputOrdinal

XProcOutputOrdinal ::= '@' IntegerLiteral

XProcOutputPortList ::= '[' XProcOutputPortBinding ( ',' XProcOutputPortBinding )* ']'

XProcOutputPortBinding ::= (QName '=')? XProcOutputItem

XProcPortInput ::= ( XProcPortRef | XProcInputOrdinal ) XProcProjection?

XProcPortRef ::= '$' QName 

XProcProjection ::= '/' ( RelativePathExpr / ) | '//' RelativePathExpr

XProcStepInvocation ::= (XProcInputPortList XProcArrow)? XProcStepName XProcArgumentList

XProcStepName ::= QName

XProcArgumentList ::= ArgumentList

XProcBlockStatement ::= '{' XProcFlow? '}'

XProcIfStatement ::= 'if' '(' ExprSingle ')' 'then' XProcFlowStatement 'else' XProcFlowStatement

XProcLetStatement ::= 'let' XProcLetBinding ( ',' XProcLetBinding )*  XProcLetBody

XProcLetBody ::= XProcBlockStatement

XProcLetBinding
         ::= '$' VarName XProcTypeDeclaration? ':=' ExprSingle

XProcNamespaceDecl
         ::= 'declare' 'namespace' NCName '=' XProcURILiteral
XProcDefaultNamespaceDecl
         ::= 'declare' 'default' 'namespace' XProcURILiteral

XProcParamList
         ::= XProcParam ( ',' XProcParam )*

XProcParam    ::= '$' QName XProcTypeDeclaration?

XProcTypeDeclaration ::= 'as' SequenceType

XProcSeparator ::= ';'

XProcURILiteral ::= StringLiteral



Expr     ::= ExprSingle ( ',' ExprSingle )*
ExprSingle ::= OrExpr

OrExpr   ::= AndExpr ( 'or' AndExpr )*
AndExpr  ::= ComparisonExpr ( 'and' ComparisonExpr )*
ComparisonExpr
         ::= StringConcatExpr ( ( ValueComp | GeneralComp | NodeComp ) StringConcatExpr )?
StringConcatExpr
         ::= RangeExpr ( '||' RangeExpr )*
RangeExpr
         ::= AdditiveExpr ( 'to' AdditiveExpr )?
AdditiveExpr
         ::= MultiplicativeExpr ( ( '+' | '-' ) MultiplicativeExpr )*
MultiplicativeExpr
         ::= UnionExpr ( ( '*' | 'div' | 'idiv' | 'mod' ) UnionExpr )*
UnionExpr
         ::= IntersectExceptExpr ( ( 'union' | '|' ) IntersectExceptExpr )*
IntersectExceptExpr
         ::= InstanceofExpr ( ( 'intersect' | 'except' ) InstanceofExpr )*
InstanceofExpr
         ::= TreatExpr ( 'instance' 'of' SequenceType )?
TreatExpr
         ::= CastableExpr ( 'treat' 'as' SequenceType )?
CastableExpr
         ::= CastExpr ( 'castable' 'as' SingleType )?
CastExpr ::= UnaryExpr ( 'cast' 'as' SingleType )?
UnaryExpr
         ::= ( '-' | '+' )* ValueExpr
ValueExpr
         ::= SimpleMapExpr
GeneralComp
         ::= '='
           | '!='
           | '<'
           | '<='
           | '>'
           | '>='
ValueComp
         ::= 'eq'
           | 'ne'
           | 'lt'
           | 'le'
           | 'gt'
           | 'ge'
NodeComp ::= 'is'
           | '<<'
           | '>>'
           
SimpleMapExpr
         ::= PathExpr ( '!' PathExpr )*
PathExpr ::= '/' ( RelativePathExpr / )
           | '//' RelativePathExpr
           | RelativePathExpr
RelativePathExpr
         ::= StepExpr ( ( '/' | '//' ) StepExpr )*
StepExpr ::= PostfixExpr
           | AxisStep
AxisStep ::= ( ReverseStep | ForwardStep ) PredicateList
ForwardStep
         ::= ForwardAxis NodeTest
           | AbbrevForwardStep
ForwardAxis
         ::= 'child' '::'
           | 'descendant' '::'
           | 'attribute' '::'
           | 'self' '::'
           | 'descendant-or-self' '::'
           | 'following-sibling' '::'
           | 'following' '::'
AbbrevForwardStep
         ::= '@'? NodeTest
ReverseStep
         ::= ReverseAxis NodeTest
           | AbbrevReverseStep
ReverseAxis
         ::= 'parent' '::'
           | 'ancestor' '::'
           | 'preceding-sibling' '::'
           | 'preceding' '::'
           | 'ancestor-or-self' '::'
AbbrevReverseStep
         ::= '..'
NodeTest ::= KindTest
           | NameTest
NameTest ::= EQName
           | Wildcard
PostfixExpr
         ::= PrimaryExpr ( Predicate | ArgumentList )*
ArgumentList
         ::= '(' ( Argument ( ',' Argument )* )? ')'
PredicateList
         ::= Predicate*
Predicate
         ::= '[' Expr ']'

PrimaryExpr
         ::= Literal
           | VarRef
           | XProcInputOrdinal
           | ParenthesizedExpr
           | ContextItemExpr
           | FunctionCall

ParenthesizedExpr
         ::= '(' Expr? ')'
ContextItemExpr
         ::= '.'

FunctionCall
         ::= FunctionName ArgumentList

Literal  ::= NumericLiteral
           | StringLiteral
NumericLiteral
         ::= IntegerLiteral
           | DecimalLiteral
           | DoubleLiteral
VarRef   ::= '$' VarName
VarName  ::= EQName

Argument ::= ExprSingle
           | ArgumentPlaceholder
ArgumentPlaceholder
         ::= '?'


EQName   ::= QName
           | URIQualifiedName

SingleType
         ::= SimpleTypeName '?'?
SequenceType
         ::= ItemType ( OccurrenceIndicator / )
OccurrenceIndicator
         ::= '?'
           | '*'
           | '+'
ItemType ::= KindTest
           | 'item' '(' ')'
           | MapTest
           | ArrayTest
           | AtomicOrUnionType
           | ParenthesizedItemType
AtomicOrUnionType
         ::= EQName

KindTest ::= DocumentTest
           | ElementTest
           | AttributeTest
           | PITest
           | CommentTest
           | TextTest
           | NamespaceNodeTest
           | AnyKindTest
AnyKindTest
         ::= 'node' '(' ')'
DocumentTest
         ::= 'document-node' '(' ElementTest? ')'
TextTest ::= 'text' '(' ')'
CommentTest
         ::= 'comment' '(' ')'
NamespaceNodeTest
         ::= 'namespace-node' '(' ')'
PITest   ::= 'processing-instruction' '(' ( NCName | StringLiteral )? ')'
AttributeTest
         ::= 'attribute' '(' ( AttribNameOrWildcard ( ',' TypeName )? )? ')'
AttribNameOrWildcard
         ::= AttributeName
           | '*'
ElementTest
         ::= 'element' '(' ( ElementNameOrWildcard ( ',' TypeName '?'? )? )? ')'
ElementNameOrWildcard
         ::= ElementName
           | '*'
AttributeName
         ::= EQName
ElementName
         ::= EQName
SimpleTypeName
         ::= TypeName
TypeName ::= EQName

MapTest  ::= AnyMapTest
           | TypedMapTest
AnyMapTest
         ::= 'map' '(' '*' ')'
TypedMapTest
         ::= 'map' '(' AtomicOrUnionType ',' SequenceType ')'
ArrayTest
         ::= AnyArrayTest
           | TypedArrayTest
AnyArrayTest
         ::= 'array' '(' '*' ')'
TypedArrayTest
         ::= 'array' '(' SequenceType ')'
ParenthesizedItemType
         ::= '(' ItemType ')'

QName    ::= FunctionName
           
FunctionName
         ::= QName^Token

NCName   ::= NCName^Token

Whitespace
         ::= S^WS
           | Comment
          /* ws: definition */

Comment  ::= '(:' ( CommentContents | Comment )* ':)'
          /* ws: explicit */


IntegerLiteral
         ::= Digits
DecimalLiteral
         ::= '.' Digits
           | Digits '.' [0-9]*
          /* ws: explicit */
DoubleLiteral
         ::= ( '.' Digits | Digits ( '.' [0-9]* )? ) [eE] [+#x002D]? Digits
          /* ws: explicit */
StringLiteral
         ::= '"' ( PredefinedEntityRef | CharRef | EscapeQuot | [^"&] )* '"'
           | "'" ( PredefinedEntityRef | CharRef | EscapeApos | [^'&] )* "'"
          /* ws: explicit */
URIQualifiedName
         ::= BracedURILiteral NCName
          /* ws: explicit */
BracedURILiteral
         ::= 'Q' '{' ( PredefinedEntityRef | CharRef | [^&{}] )* '}'
          /* ws: explicit */
PredefinedEntityRef
         ::= '&' ( 'lt' | 'gt' | 'amp' | 'quot' | 'apos' ) ';'
          /* ws: explicit */
EscapeQuot
         ::= '""'
EscapeApos
         ::= "''"
NameStartChar
         ::= ':'
           | [A-Z]
           | '_'
           | [a-z]
           | [#x00C0-#x00D6]
           | [#x00D8-#x00F6]
           | [#x00F8-#x02FF]
           | [#x0370-#x037D]
           | [#x037F-#x1FFF]
           | [#x200C-#x200D]
           | [#x2070-#x218F]
           | [#x2C00-#x2FEF]
           | [#x3001-#xD7FF]
           | [#xF900-#xFDCF]
           | [#xFDF0-#xFFFD]
           | [#x10000-#xEFFFF]
NameChar ::= NameStartChar
           | '-'
           | '.'
           | [0-9]
           | #x00B7
           | [#x0300-#x036F]
           | [#x203F-#x2040]
Name     ::= NameStartChar NameChar*
CharRef  ::= '&#' [0-9]+ ';'
           | '&#x' [0-9a-fA-F]+ ';'
NCName   ::= Name - ( Char* ':' Char* )
QName    ::= PrefixedName
           | UnprefixedName
PrefixedName
         ::= Prefix ':' LocalPart
UnprefixedName
         ::= LocalPart
Prefix   ::= NCName
LocalPart
         ::= NCName
S        ::= ( #x0020 | #x0009 | #x000D | #x000A )+
Char     ::= #x0009
           | #x000A
           | #x000D
           | [#x0020-#xD7FF]
           | [#xE000-#xFFFD]
           | [#x10000-#x10FFFF]
Digits   ::= [0-9]+
CommentContents
         ::= ( ( Char+ - ( Char* ( '(:' | ':)' ) Char* ) ) - ( Char* '(' ) ) &':'
           | ( Char+ - ( Char* ( '(:' | ':)' ) Char* ) ) &'('
EOF      ::= $
Wildcard ::= '*'
           | NCName ':' '*'
           | '*' ':' NCName
           | BracedURILiteral '*'
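The lexical productions for literals (IntegerLiteral, DecimalLiteral, DoubleLiteral, and StringLiteral with its EscapeQuot/EscapeApos, PredefinedEntityRef and CharRef alternatives) are easy to misread in flattened EBNF. The following Python sketch transcribes them into regular expressions and a small decoder. The function names `classify_numeric` and `decode_string_literal` are hypothetical helpers, not part of any XProc processor API. The sketch assumes a tokenizer has already isolated the literal, and it does not check that a character reference denotes a legal XML Char.

```python
import re

# Regexes transcribed from IntegerLiteral, DecimalLiteral and DoubleLiteral.
# A leading sign is not part of any of them (signs belong to UnaryExpr).
NUMERIC_PATTERNS = (
    ("IntegerLiteral", re.compile(r"[0-9]+")),
    ("DecimalLiteral", re.compile(r"\.[0-9]+|[0-9]+\.[0-9]*")),
    ("DoubleLiteral", re.compile(r"(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)[eE][+-]?[0-9]+")),
)

# PredefinedEntityRef plus the decimal and hexadecimal forms of CharRef.
REFERENCE = re.compile(r"&(?:(lt|gt|amp|quot|apos)|#([0-9]+)|#x([0-9a-fA-F]+));")
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def classify_numeric(text: str) -> str:
    """Return the name of the NumericLiteral alternative matching all of text."""
    for name, pattern in NUMERIC_PATTERNS:
        if pattern.fullmatch(text):
            return name
    raise ValueError(f"not a NumericLiteral: {text!r}")


def decode_string_literal(literal: str) -> str:
    """Decode a StringLiteral, delimiters included, to the string it denotes."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError(f"not a StringLiteral: {literal!r}")
    quote, body = literal[0], literal[1:-1]
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == quote:
            # EscapeQuot / EscapeApos: inside the literal the delimiter is doubled.
            if body[i + 1 : i + 2] != quote:
                raise ValueError(f"unescaped delimiter at offset {i + 1}")
            out.append(quote)
            i += 2
        elif ch == "&":
            # [^"&] and [^'&] exclude a bare '&'; it may only start a reference.
            m = REFERENCE.match(body, i)
            if m is None:
                raise ValueError(f"bare '&' at offset {i + 1}")
            name, dec, hexa = m.groups()
            if name:
                out.append(ENTITIES[name])
            else:
                out.append(chr(int(dec) if dec else int(hexa, 16)))
            i = m.end()
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(classify_numeric("1.5e-3"))           # DoubleLiteral
print(decode_string_literal("'it''s'"))     # it's
print(decode_string_literal('"a &lt; b"'))  # a < b
```

A sign is not part of a NumericLiteral. The text `-1` is a UnaryExpr applied to the IntegerLiteral `1`, so `classify_numeric("-1")` raises `ValueError`.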

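Unlike comments in many languages, `(: ... :)` comments nest. Comment is defined recursively in terms of itself, and CommentContents excludes any run containing `(:` or `:)`. A non-greedy regular expression such as `\(:.*?:\)` therefore stops at the first inner `:)`. The following Python sketch skips Whitespace (S or Comment) by tracking the nesting depth. The function names are hypothetical. The sketch assumes that Whitespace may appear between tokens except inside productions marked `/* ws: explicit */`, which is what the REx-style `/* ws: ... */` annotations appear to indicate.

```python
def skip_comment(text: str, pos: int) -> int:
    """Given text[pos:] starting with '(:', return the index just past the
    ':)' that closes this Comment, respecting nested comments."""
    if not text.startswith("(:", pos):
        raise ValueError(f"no comment starts at offset {pos}")
    depth = 0
    while pos < len(text):
        if text.startswith("(:", pos):
            depth += 1  # opening a (possibly nested) Comment
            pos += 2
        elif text.startswith(":)", pos):
            depth -= 1  # closing the innermost open Comment
            pos += 2
            if depth == 0:
                return pos
        else:
            pos += 1  # any other character is CommentContents
    raise ValueError("unterminated comment")


def skip_whitespace(text: str, pos: int = 0) -> int:
    """Skip Whitespace (S or Comment) and return the next token's offset."""
    while pos < len(text):
        if text[pos] in " \t\r\n":  # the four characters allowed by S
            pos += 1
        elif text.startswith("(:", pos):
            pos = skip_comment(text, pos)
        else:
            break
    return pos


src = "(: outer (: inner :) still outer :)  $x"
print(src[skip_whitespace(src):])  # $x
```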