Advanced Automated Authoring with XML (XML Prague 2009)

Abstract

This article proposes a set of powerful XML technologies to automate the authoring of large, detailed and highly visual documentation which would be difficult and error-prone to produce manually. The author further proposes best practices for XML authoring and introduces a simple yet powerful framework which supports tasks typically related to document publishing and to the integration of information from various sources.

Rather than building a complex theoretical background, this article focuses on practice. It demonstrates the use of various technologies on a case study taken from the networking industry.


Introduction

There are many reasons to use semantic automated authoring tools rather than presentational visual word processors. As there are plenty of articles on this topic, there is no need to go into detail here. The following text summarizes only the most important pros and cons.

Why do automated authoring

Content separated from style

this allows pluggable styles; the same document can be produced with completely different styling, or for a different medium, without any changes to it

Auto-generation

many document elements such as the table of contents, numbering and contextual headings are auto-generated for the author

Professional looking outputs

follows typesetting standards and uses hyphenation, advanced kerning, etc.

Highly customizable output

only the medium is the limit

Modularity

documents may be split into smaller chunks and then combined in various ways; this helps to avoid duplication, which is hard to maintain

Why NOT do automated authoring

Visual editing

to the author's knowledge, there are no tools which allow semantic editing of documents with the same level of visual user experience as presentational word processors

      This article demonstrates that publishing automation does not end with auto-generated ToCs or references. There is far greater potential in domain-specific applications. It introduces a specific problem domain and shows how DocBook [DB] and other excellent XML technologies have been utilised and extended to automate the authoring process as much as possible.

      Publishing automation is achieved through a framework which is basically a pure XML (some would say POX) application. Data are described in an XML domain-specific language, transformations are defined in XSLT [XSLT] and the procedural aspect is expressed using ANT [ANT] build files, again in XML format.

      This article is basically a celebration of XML. It shows what immense flexibility is gained when semantics is attached to data.

      The key benefits of what is being proposed in this article

      • no redundancy in data

      • highly specific domain model perfectly suited and intended for instant changes

      • professional looking typesetting

      • 100% control over the output

      • potential applications beyond publishing

      The case study

      The author of this article works for a company which implements large-scale networking projects such as nationwide backbones or inter-bank networks.

      Even though the projects differ a lot from each other, they share a significant portion of the domain model.

      Common elements of the domain model

      • Each project has one or more sites in one or more cities. Each site has a geographical location, contact information and other properties.

      • Sites are usually interconnected using networking protocols with specific configuration.

      • There is a set of hardware devices at each site, some of them installed in racks.

      • The network itself is modelled, including the communication technologies and protocols used at each network layer. For example, the IP address plan is modelled at layer 3.

      The aim of an automated authoring system is to support the project from the very beginning (the proposal stage) until the project is handed over to the customer with detailed documentation. The same domain model is used through the different stages of the project life cycle, and different kinds of documents are generated from it to support the individual stages.

      Documents auto-generated from the same data during the project life-cycle phases

      1. Proposal

        • Commercial Proposal

        • Technical Proposal

      2. Design

        • Network design

        • Per-site documentation

      3. Implementation

        • Time and progress planning

        • Installation guides

        • Compliance testing How-Tos

        • Detailed per-site documentation

      4. Support

        • Inventory registry

        • Network status

      Domain modelling

      DocBook [DB] is a perfectly suitable grammar for describing documents. But even though the desired outputs are in fact various documents, it makes good sense to model the domain first in a domain-specific language and then transform it automatically into DocBook in the next stage.

      Expressing the data in a well-designed and highly specific language will always be beneficial compared with using a general-purpose grammar. Above all, such data are easier to express, understand, change, reuse and validate.

      Modelling domains in XML has several specific characteristics compared with object-oriented programming or UML modelling. In addition, the grammar designer needs to answer the following design questions:

      • Whether to model a certain aspect of the domain with attributes or with elements

      • Whether to use ID references or rather prefer tree structures

      • Which direction ID references between entities should point

      Those decisions significantly influence how easy it is to enter new data, maintain them and understand them. They also determine how difficult it is to work with such data in terms of expressing transformations or validation rules. In many cases those two concerns pull in opposite directions.

      Figure 1. High level overview of the case study domain model


      Of the two concerns, it is clearly far more important to design an elegant DSL in which expressing data is straightforward. Complexity in processing can always be reduced by first applying simplification transformations to the data, before further processing[1].
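      One possible simplification, sketched here in Python with the standard ElementTree module rather than in XSLT: resolving ID references once, so that later stylesheets see a plain tree. The DSL fragment, in which each <site> refers to a shared <site-type> by ID, and all element and attribute names are hypothetical, not the case study's actual grammar.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical DSL fragment: sites point to a shared site type by ID.
DSL = """
<project>
  <site-type id="branch">
    <device model="router-a"/>
    <device model="switch-b"/>
  </site-type>
  <site name="Prague" type="branch"/>
  <site name="Brno" type="branch"/>
</project>
"""

def inline_site_types(project: ET.Element) -> ET.Element:
    """Return a simplified copy in which every site carries its type's children.

    The ID reference (site/@type -> site-type/@id) is resolved once here,
    so downstream processing never needs to follow references.
    """
    result = copy.deepcopy(project)
    types = {t.get("id"): t for t in result.findall("site-type")}
    for site in result.findall("site"):
        site_type = types.get(site.get("type"))
        if site_type is None:
            continue  # loose approach: leave unknown references untouched
        for child in site_type:
            site.append(copy.deepcopy(child))
    # The site-type definitions are no longer needed after inlining.
    for t in list(types.values()):
        result.remove(t)
    return result

simplified = inline_site_types(ET.fromstring(DSL))
for site in simplified.findall("site"):
    print(site.get("name"), [d.get("model") for d in site.findall("device")])
```

      The input stays easy to author (one definition per site type), while the processing stylesheets get the flat structure they find easiest to work with.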

      Schema and validation

      Usually the domain model is the most rapidly changing part of any IT application or system. Therefore it is important to decide how loosely or strictly we want to define the model. If every little change needs to be propagated into the processing XSLT stylesheets and the schema, each change requires a lot of time and resources.

      It took several projects until the domain model for networking projects described in the section called “The case study” settled into a more stable form. The idea that you can design a perfect language from the beginning, create a perfect schema for it and use it unchanged is pure fiction.

      Having an XML Schema defined for your grammar from the beginning may be a maintenance overhead. Every time the model changes, not only does the current data need to be migrated, but the schema and stylesheets also need adjustments.

      On the other hand, validation of data is important. There has to be some way to know whether changing a certain aspect of the structure of the input data will cause some XSLT stylesheets to malfunction. Moreover, even a perfectly designed grammar cannot guarantee data consistency[2]. In the networking domain, there are many potential data inconsistencies. For example, an IP address may be assigned to a network to which, by the very nature of the IP protocol, it cannot belong.

      This article proposes validating data loosely[3] as an alternative to the traditional strict validation approach. A perfect language for defining loose schemas is Schematron. In Schematron everything is allowed by default, unless a rule exists which says otherwise. Moreover, using the full power of XPath, Schematron is able to express even very complicated rules which operate over multiple contexts in the source document. Probably the biggest advantage of Schematron over grammar-based schema languages is its ability to output highly descriptive, domain-specific diagnostics[4]. Find more about Schematron and its unique features in [VAL], [RUL] and [SCH].

      The loose Schematron schema for the networking domain checks only those aspects of the XML structure which are necessary for the underlying stylesheets to work properly. In addition, it contains high-level consistency checks with verbose, domain-specific diagnostic messages.
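      The IP consistency rule mentioned above can be sketched as follows. This is Python with the standard ipaddress module, not Schematron; it only illustrates the kind of check and the kind of domain-specific diagnostic message meant here. The <network> and <interface> elements and their attributes are hypothetical.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical DSL fragment: interfaces reference the network they belong to.
DSL = """
<project>
  <network id="lan-prague" prefix="10.1.0.0/24"/>
  <interface device="r1" name="Gi0/0" network="lan-prague" address="10.1.0.1"/>
  <interface device="r2" name="Gi0/1" network="lan-prague" address="10.2.0.7"/>
</project>
"""

def check_addresses(project: ET.Element) -> list[str]:
    """Return one human-readable diagnostic per inconsistent interface."""
    networks = {
        n.get("id"): ipaddress.ip_network(n.get("prefix"))
        for n in project.iter("network")
    }
    diagnostics = []
    for iface in project.iter("interface"):
        net = networks.get(iface.get("network"))
        if net is None:
            diagnostics.append(
                f"Interface {iface.get('name')} on {iface.get('device')} "
                f"references unknown network '{iface.get('network')}'."
            )
            continue
        addr = ipaddress.ip_address(iface.get("address"))
        if addr not in net:
            diagnostics.append(
                f"Interface {iface.get('name')} on {iface.get('device')}: "
                f"address {addr} does not belong to network {net}."
            )
    return diagnostics

for message in check_addresses(ET.fromstring(DSL)):
    print(message)
```

      In keeping with the loose approach, everything the check does not look at is simply accepted.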

      XSLT stylesheets themselves may also be written to be as loosely coupled with the domain model as possible. The aim is to minimize the impact of refactoring the model on the actual stylesheets.

      The loose approach to the domain model helps to keep it open to extensions and flexible when being changed.

      Namespaces and specific accents

      Having a common domain-specific language shared across multiple projects allows XSLT stylesheets for common tasks to be reused. Duplicating similar stylesheets in various projects would be error-prone and would cause maintenance difficulties.

      On the other hand, projects differ from each other, and we can hardly expect a common language to describe all their specifics in the real world. Specific grammars in a different namespace (assigned to a particular project) are used to describe such project-specific properties.

      Specific stylesheets can then handle the specific constructs. It is easy to recognize stylesheets which are specific to a particular project, as they contain the xmlns declaration for that project's namespace.
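      A minimal Python sketch of how project-specific constructs can be separated from the common DSL by namespace alone. Both namespace URIs and the bank:atm-count element are hypothetical; ElementTree represents namespaced names as {uri}local, which is what the filter relies on.

```python
import xml.etree.ElementTree as ET

COMMON_NS = "urn:example:network-dsl"    # hypothetical common DSL namespace
PROJECT_NS = "urn:example:project-bank"  # hypothetical project-specific namespace

DSL = f"""
<project xmlns="{COMMON_NS}" xmlns:bank="{PROJECT_NS}">
  <site name="HQ">
    <bank:atm-count>12</bank:atm-count>
  </site>
  <site name="Branch"/>
</project>
"""

def elements_in_namespace(root: ET.Element, ns: str) -> list[ET.Element]:
    """Collect all elements in the given namespace (tags look like {uri}local)."""
    return [el for el in root.iter() if el.tag.startswith("{" + ns + "}")]

root = ET.fromstring(DSL)
specific = elements_in_namespace(root, PROJECT_NS)
print([el.tag for el in specific])
# Common processing can ignore these elements; a project-specific step handles them.
```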

      The authoring framework

      This section introduces an authoring framework which helps to auto-generate documents from data described in a domain-specific XML grammar. It shows what the framework is composed of, what tools are involved and the overall architecture of the system.

      The framework in a nutshell

      The framework itself is not tied to the networking domain. It is a general-purpose set of tools which automates the tasks involved in publishing documents. It consists of several components responsible for passing the source domain-specific XML documents through a transformation chain to generate the requested document.

      First the DSL data are simplified and then they are transformed by applying a set of XSLT stylesheets to them (see Figure 2, “Document generation — Phase 1”). Common XSLT templates are shared among all projects which use the same DSL. This allows a stylesheet to be written once and reused in several other projects. Specific functionality may be implemented by importing a common stylesheet and extending it on a per-project basis.

      Figure 2. Document generation — Phase 1


      The aim of the framework is to generate a document describing the source data. The results of the transformations are usually DocBook fragments such as chapters, sections, tables and figures, or diagrams in Scalable Vector Graphics (SVG) format [SVG][5].

      The generated document fragments are combined together with static fragments (usually descriptive texts, diagrams and images created directly by content authors) into chapters using XInclude [XI].

      For each resulting document there is a DocBook article or book skeleton which contains meta-data about the document (the DocBook <info> element with authors, organization, disclaimer, copyright, titles and so on). In addition, it includes a set of static or generated chapters or sections.
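      The composition step can be illustrated with Python's standard xml.etree.ElementInclude module, a much simpler XInclude processor than the Xerces/Saxon setup the framework relies on (it handles whole-document includes only, without XPointer). The custom loader serves hypothetical in-memory documents instead of files, and DocBook namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# In-memory stand-ins for files on disk (hypothetical names and content).
DOCS = {
    "book.xml": f"""<book xmlns:xi="{XI}">
  <info><title>Network Design</title></info>
  <xi:include href="intro.xml"/>
  <xi:include href="sites.xml"/>
</book>""",
    "intro.xml": "<chapter><title>Introduction</title><para>Static text.</para></chapter>",
    "sites.xml": "<chapter><title>Sites</title><para>Generated by XSLT.</para></chapter>",
}

def loader(href, parse, encoding=None):
    """Resolve an href against the in-memory store instead of the file system."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]  # parse="text"

book = ET.fromstring(DOCS["book.xml"])
ElementInclude.include(book, loader=loader)  # replaces xi:include elements in place
print([ch.findtext("title") for ch in book.findall("chapter")])
```

      The skeleton stays a small file with metadata and include statements, while static and generated chapters live in their own files.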

      In the next phase, the DocBook sources are transformed into PDF (or possibly HTML) format (see Figure 3, “Document generation — Phase 2”). This is done through a DocBook stylesheet customization layer [DBC]. The layer is composed of several styles, one for each corporate identity involved.

      Figure 3. Document generation — Phase 2


      There is a three level hierarchy in customization stylesheets. In the first level, there are customizations which are shared among all styles. Those declare common typesetting best practices shared by all produced documents.

      Then there are styles which define the appearance for each corporate identity involved. For example, each subsidiary, product or department may have different requirements on the styling of documents (for example different title pages, logos, colors and fonts). DocBook is very flexible in the way the output may be customized, even in a pixel-precise manner. Even a very creative corporate identity design may be easily implemented in the DocBook customization XSLT if it adheres at least to some extent to general typesetting conventions.

      The last layer consists of specific modifications to the different styles. For example, the header of the document may differ slightly depending on whether the output is an official company letter, a proposal or a detailed design document. In a proposal we want to highlight the company name or maybe even the logo in the header of every page, but for detailed design documents with hundreds of pages this is not desirable. There it makes more sense to use the header to show the current context (chapter / section) of the document.
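      In XSLT this layering is naturally expressed with xsl:import, since declarations in an importing stylesheet take precedence over those it imports. The Python sketch below only models that lookup order, using collections.ChainMap as an analogy; all setting names and values are hypothetical.

```python
from collections import ChainMap

# Level 1: typesetting practices shared by all styles (hypothetical settings).
common = {"body.font": "serif", "hyphenate": "true", "header.content": "section-title"}

# Level 2: one corporate identity.
corporate = {"body.font": "Frutiger", "title.logo": "logo-subsidiary.svg"}

# Level 3: document-type tweak, e.g. proposals show the company name in the header.
proposal = {"header.content": "company-name"}

def style_for(*layers: dict) -> ChainMap:
    """Most specific layer first, like an importing stylesheet overriding its imports."""
    return ChainMap(*layers)

proposal_style = style_for(proposal, corporate, common)
design_style = style_for(corporate, common)  # design documents keep the context header

print(proposal_style["header.content"], design_style["header.content"])
print(proposal_style["body.font"], proposal_style["hyphenate"])
```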

      Finally, if required, the resulting PDF document can be automatically split and merged with static PDF blocks using a PDF manipulation library. In some cases it makes no sense to convert large presentational appendices into DocBook and maintain them, for example the PDF data sheets for the equipment used in a certain project. Merging such blocks into the resulting document needs to be automated, as doing manual merges every time the document changes is cumbersome and error-prone.

      XInclude

      XInclude [XI] is a perfect tool to make XML data modular and to reduce duplication. It helps to keep a single source of information: changing a piece of information in one place automatically changes it wherever it is included.

      XInclude is used in the DSL data to modularize it for easier maintenance, but duplicate data are avoided already in the DSL design [6].

      XInclude plays a more important role when interweaving individual document fragments to compose the resulting document. In this case XInclude allows arbitrary sets of elements to be picked from a set of XML documents. Moreover, includes may be nested inside other includes; for example, an included section may in turn include figures. Correct resolution of relative URIs in the hrefs inside the included fragments is done through xml:base attributes.

      One of the common issues which needs to be dealt with is the XML requirement that each well-formed document has a single root element. Imagine one of the XSLT stylesheets generates a set of sibling sections. Such sections can't be stored in a well-formed XML document without a common root element. A correct DocBook parent for a set of sibling sections is, for example, a chapter. If this set of sections needs to be included into another chapter, a simple XInclude would produce a chapter within a chapter, which is invalid DocBook. In this case the xpointer [XP] construct has to be used to specify the range of elements to be included[7].
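      The effect the xpointer construct is needed for, including the children of the wrapper element rather than the wrapper itself, can be shown with a short Python sketch. It is not an XPointer implementation: it simply splices the wrapper's children in place of a hypothetical <placeholder/> element.

```python
import xml.etree.ElementTree as ET

# Output of a hypothetical stylesheet: sibling sections need a common root,
# so they are wrapped in a <chapter> just to keep the file well-formed.
GENERATED = """<chapter>
  <section><title>Prague</title></section>
  <section><title>Brno</title></section>
</chapter>"""

TARGET = """<chapter>
  <title>Sites</title>
  <placeholder/>
</chapter>"""

def splice_children(parent: ET.Element, placeholder: ET.Element, wrapper: ET.Element) -> None:
    """Replace `placeholder` with the children of `wrapper` (not the wrapper itself).

    Inserting the whole wrapper would yield a chapter inside a chapter,
    which is invalid DocBook.
    """
    index = list(parent).index(placeholder)
    parent.remove(placeholder)
    for offset, child in enumerate(list(wrapper)):
        parent.insert(index + offset, child)

target = ET.fromstring(TARGET)
splice_children(target, target.find("placeholder"), ET.fromstring(GENERATED))
print([el.tag for el in target])  # ['title', 'section', 'section']
```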

      Although it brings flexibility and easier maintenance, the use of XInclude is also problematic. Even though XInclude has been a W3C Recommendation for several years now, tool support in Java (a mainstream programming platform) is quite buggy. Several workarounds are needed to get correct XInclude behaviour with the latest Xerces (2.9)[8] and Saxon (9.1).

      First of all, Xerces has to be patched to produce correct xml:base attributes for nested includes[9]. Even then, Xerces supports only the XPointer element() scheme [XPE]. This means only individual elements may be referenced. Moreover, only DTD-determined shorthand IDs are supported[10], and addressing elements by position is error-prone.

      Unfortunately, the only way to make a reasonable subset of XPointer work with the latest Xerces is to associate a DTD with the DocBook documents using DOCTYPE. To make offline generation of documents possible and to allow DocBook fragments to be placed anywhere in the file system, an XML Catalog needs to be configured in the transformer.

      Even with all the patches and workarounds listed above, only a very basic XInclude/XPointer subset is supported. Either whole XML files or individual elements marked with IDs can be included. Advanced constructs such as ranges or XPath expressions are not functional.

      Even in this crippled form, XInclude is extremely useful and the authoring framework would suffer significant drawbacks without it.

      Automate with Ant

      Ant [ANT] is a multi-platform Java build tool. The build process is described using XML. It consists of a hierarchical set of targets which depend on each other. Each target then consists of a sequence of tasks. Each task is described by an XML element whose attributes and child elements define the task's execution parameters.
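      Ant's documented ordering rule is that the targets in a depends attribute run left to right before the target itself, and each target runs at most once per build. The Python sketch below models only that rule; the target names are hypothetical.

```python
# Hypothetical build: target name -> targets listed in its depends attribute.
TARGETS = {
    "init": [],
    "validate": ["init"],
    "transform": ["init", "validate"],
    "docbook": ["transform"],
    "pdf": ["docbook", "validate"],
}

def execution_order(target: str, targets: dict[str, list[str]]) -> list[str]:
    """Return targets in the order Ant would run them for `ant <target>`.

    Dependencies are visited depth-first, left to right, and every target
    runs at most once; a dependency cycle is reported as an error.
    """
    order: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        visiting.add(name)
        for dep in targets[name]:
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order

print(execution_order("pdf", TARGETS))
```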

      In the authoring framework, where all other components are mostly declarative, Ant plays the procedural role. Its build file specifies the whole document generation process from the DSL data transformations to the resulting document generation.

      There are generic build definitions common to all projects within the framework and specific (per-project) build definitions which may override the generic behaviour. In a simple scenario, the generic definition may be used as it is to generate documents from DSL data: placing stylesheets and input data into certain directories within the project directory structure results in all the stylesheets being applied automatically to the source data.

      For complex projects with several different target documents the default behaviour needs to be overridden. In such cases, the specific build definition is basically a series of transformation tasks defining inputs, outputs and stylesheets.

      Ant's default XSLT transformation task is not sufficient for the needs of the authoring framework, which requires XInclude (as discussed in the section called “XInclude”) as well as XSLT 2.0 and XPath 2.0 support (for more powerful and simpler XSLT templates and for Schematron validation). That's why the authoring framework implements its own Ant macro for XSLT transformations.

      Example 1. The <saxon> task usage

      The simplest usage[11].

      <saxon source="${domain}" output="${out}" stylesheet="${xslt}"/>

      Advanced usage with up-to-date checks and stylesheet parameters. Up-to-date checks allow the target to be regenerated only if the sources have changed. The default source is the file in the source attribute, but when XInclude is used, all the included files need to be specified manually using the <uptodate-source> element.

      <saxon source="${domain}"
             output="${out}"
             stylesheet="${xslt}">
        <uptodate-source>
          <fileset dir="${domain.dir}">
            <include name="**/*"/>
          </fileset>
        </uptodate-source>
        <parameters><arg value="param1=1"/></parameters>
      </saxon>
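      The up-to-date check boils down to comparing modification times: the output is regenerated when it is missing or older than any of its sources, including every XIncluded file. A minimal Python sketch of that rule (not the framework's macro), with a demonstration on throwaway files:

```python
import os
import tempfile
from pathlib import Path

def is_up_to_date(target: Path, sources: list[Path]) -> bool:
    """True if `target` exists and no source file is newer than it."""
    if not target.exists():
        return False
    target_mtime = target.stat().st_mtime
    return all(src.stat().st_mtime <= target_mtime for src in sources)

def sources_under(directory: Path) -> list[Path]:
    """Rough equivalent of <fileset dir="..."><include name="**/*"/></fileset>."""
    return [p for p in directory.rglob("*") if p.is_file()]

# Demonstration; explicit timestamps avoid depending on file system clock resolution.
with tempfile.TemporaryDirectory() as tmp:
    domain_dir = Path(tmp, "domain")
    (domain_dir / "sites").mkdir(parents=True)
    main = domain_dir / "project.xml"
    included = domain_dir / "sites" / "prague.xml"  # pulled in via XInclude
    out = Path(tmp, "out.xml")
    for f in (main, included, out):
        f.write_text("<x/>")
    os.utime(main, (1000, 1000))
    os.utime(included, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh = is_up_to_date(out, sources_under(domain_dir))
    os.utime(included, (3000, 3000))  # an included file changes
    stale = is_up_to_date(out, sources_under(domain_dir))

print(fresh, stale)
```

      Checking only the main source file would have missed the change to the included file, which is why the included files have to be listed explicitly.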


      Example 2. Running Saxon with catalog and XInclude support

      This <java> task definition is the core of the <saxon> macro defined in the generic Ant build files.

      <java classname="net.sf.saxon.Transform" fork="true" failonerror="@{failonerror}">
       <arg value="-xi"/>
       <arg value="-x"/><arg value=" ...ResolvingXMLReader"/>
       <arg value="-y"/><arg value=" ...ResolvingXMLReader"/>
       <arg value="-r"/><arg value=" ...CatalogResolver"/>
       <arg value="@{source}"/><arg value="@{stylesheet}"/>
       <parameters/><classpath>
        <path refid="transform.classpath"/>
        <pathelement location="... resolver${os-env}.jar"/>
       </classpath>
       <jvmarg value="-client"/>
       <jvmarg value="-D... DocumentBuilderFactory= ...DocumentBuilderFactoryImpl"/>
       <jvmarg value="-D... SAXParserFactory= ...SAXParserFactoryImpl"/>
      </java>


      In addition to the <saxon> task, the authoring framework's common build definition contains several other useful XML manipulation tasks. All of them are defined as Ant macros using standard Ant tasks. No Java programming was required to define them; they are pure XML.

      Further custom Ant tasks

      schematron

      Schematron validation task, uses the <saxon> task.

      xmlconcat

      Concatenates several XML files into one big XML file with an artificial root element. This is useful when a stylesheet needs to operate over several XML files at once and XInclude cannot be used because it is not known in advance which files are to be concatenated[12].

      csv2xml

      Converts a comma-separated values file into an XML tabular format which can then be processed by an XSLT stylesheet, turning the CSV file into the DSL format. This is a way to turn, for example, purely presentational data in Excel into semantic XML, provided the input Excel document adheres to some agreed structure.
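      The framework defines xmlconcat and csv2xml as Ant macros; the Python sketch below only illustrates what the two conversions do. The element names (files, file, table, row, cell) and the convention that the first CSV line holds column names are assumptions, not the framework's actual output format.

```python
import csv
import io
import xml.etree.ElementTree as ET

def xml_concat(documents: dict[str, str], root_name: str = "files") -> ET.Element:
    """Wrap the root elements of several XML documents in one artificial root.

    Each document goes into a <file name="..."> wrapper so a stylesheet can
    still tell where it came from.
    """
    root = ET.Element(root_name)
    for name, text in documents.items():
        wrapper = ET.SubElement(root, "file", name=name)
        wrapper.append(ET.fromstring(text))
    return root

def csv_to_xml(text: str) -> ET.Element:
    """Turn CSV into a generic table; the first line supplies column names."""
    table = ET.Element("table")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return table
    header, body = rows[0], rows[1:]
    for values in body:
        row = ET.SubElement(table, "row")
        for column, value in zip(header, values):
            ET.SubElement(row, "cell", column=column).text = value
    return table

merged = xml_concat({"a.xml": "<site name='Prague'/>", "b.xml": "<site name='Brno'/>"})
table = csv_to_xml('site,city\nHQ,Prague\nDC,"Brno, CZ"\n')
print(ET.tostring(table, encoding="unicode"))
```

      The generic table is deliberately meaningless; a second, project-specific XSLT step maps columns to DSL elements.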

      SVG

      Modern documents need to be visual in order to compete. Visual data are significantly more understandable to humans than paragraphs of text. A more understandable proposal may help win a tender. More understandable documentation may help to sell a product or reduce the knowledge required of staff.

      This trend leads to an increasing proportion of visual data in documents, but it may also increase maintenance requirements. Textual data are usually easier to maintain, especially when changes are frequent. This section introduces several techniques for increasing the maintainability of visual data through the use of XML technologies.

      Scalable Vector Graphics [SVG] is the technology which can help to maintain visual data in output documents. The language is XML-based, which allows the very same XML-based techniques described in the previous sections to be used. Today's FO processors as well as browsers handle SVG data well, which means SVG may be used directly, without any conversions, to produce visual PDF or HTML outputs.

      Image callouts

      The DocBook documentation [DB] describes callouts as “a visual device for associating annotations with an image, program listing, or similar figure. Each location is identified with a mark, and the annotation is identified with the same mark.”

      Keeping annotations separate from the actual annotated content makes it much easier to apply changes to that content. This is especially true for image callouts. Visually annotating images (diagrams, photographs, ...) is a frequent use case, but creating and maintaining the annotations manually is a nightmare. Although the DocBook stylesheets do not implement image callouts by default, the common DocBook customization layer of the authoring framework described in this article does.

      The authoring framework uses purely XML technologies (XSLT and SVG) to create annotation regions and marks within annotated images.

      First the DocBook sources are preprocessed by XSLT, and for each annotated <mediaobject> with an image reference an SVG file is created using the XSLT 2.0 <xsl:result-document> instruction. The size of the generated SVG is based on the size of the original image, and the image itself is included in the background using <svg:image>. Then the stylesheet reads the coordinates of the individual annotated regions and generates SVG rectangles, including numbering, for each annotation. The fileref of <imagedata> is altered to point to the newly generated SVG file. Figure 4, “Image callouts result” shows the generated image: the annotated back of a router with its ports and buttons explained. The DocBook example source fragment follows.

      Figure 4. Image callouts result


      Example 3. DocBook source for the annotated image

      <imageobjectco>
      <areaspec>
       <area xml:id="p1"
        coords="45,20 111,36"
        units="px"/> ...
      </areaspec>
      <imageobject>
       <imagedata
        width="383" height="140"
        fileref="media/xmlprague2009/cisco1750.png"/>
      </imageobject>
      <calloutlist>
       <callout arearefs="p1">
        <para>WIC/VIC Slot 1</para>
       </callout>...
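      The core of the callout preprocessing can be sketched in Python: an SVG with the bitmap's size, the bitmap as an <image> background, and one rectangle plus one number per area. Coordinates use the "x1,y1 x2,y2" pixel form from Example 3; the styling and label placement are arbitrary choices, and writing the file and updating fileref are left out.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

def parse_coords(coords: str) -> tuple[float, float, float, float]:
    """'45,20 111,36' -> (x1, y1, x2, y2), two corners of a rectangular area."""
    first, second = coords.split()
    x1, y1 = (float(v) for v in first.split(","))
    x2, y2 = (float(v) for v in second.split(","))
    return x1, y1, x2, y2

def callout_svg(fileref: str, width: int, height: int, areas: list[str]) -> ET.Element:
    """Overlay a numbered rectangle for each area on top of the original image."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(height))
    ET.SubElement(svg, f"{{{SVG_NS}}}image", {
        f"{{{XLINK_NS}}}href": fileref,
        "x": "0", "y": "0", "width": str(width), "height": str(height),
    })
    for number, coords in enumerate(areas, start=1):
        x1, y1, x2, y2 = parse_coords(coords)
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(x1), "y": str(y1),
            "width": str(x2 - x1), "height": str(y2 - y1),
            "fill": "none", "stroke": "red", "stroke-width": "2",
        })
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": str(x1), "y": str(y1 - 3), "fill": "red", "font-size": "12",
        })
        label.text = str(number)  # same number as in the callout list
    return svg

svg = callout_svg("media/xmlprague2009/cisco1750.png", 383, 140, ["45,20 111,36"])
print(ET.tostring(svg, encoding="unicode"))
```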


      Rather than preprocessing the DocBook document, an alternative option for image callouts is to place the SVG generation directly into the DocBook customization stylesheet, where it naturally belongs. But in this case we would need to use an XSLT 2.0 construct (<result-document>) within the XSLT 1.0 DocBook stylesheets[13], or use processor-specific extensions for that.

      Graphs

      Another frequently used visual element in documents is the graph-like diagram. Although it would be possible to enable automatic graph layout within the authoring framework, this is out of the scope of this article, which demonstrates a different approach instead.

      Auto-generated graphs may save a lot of time and resources, especially when a huge number of diagrams has to be maintained. But even when advanced layout algorithms are involved, the result can never compete in terms of beauty or human readability with manually laid-out diagrams.

      A typical example of graphs in the networking domain are network diagrams. Those are usually undirected graphs, where nodes represent some kind of a network device (router, switch...) and edges are labeled with interfaces, IP addresses and protocols.

      Networking experts love to use Microsoft Visio for drawing network diagrams, as it has nice layout features and a huge clipart gallery of networking devices. From the XML tool chain perspective, Visio has quite good SVG export, which makes it possible to post-process the diagrams automatically.

      The networking projects domain model groups individual sites into sets of a certain type. Sites in such a set share the same hardware configuration and the same networking schema. Only the individual IP addresses, interfaces and device names differ, according to the IP plan for each site. This means the diagram authors only draw a schema per site type rather than maintaining schemas for each individual site. Correct per-site IP addresses, interface names and device names are obtained by the authoring framework stylesheets from the DSL data when the detailed per-site diagrams are generated automatically.

      Mapping between Visio entities (usually labels) and the domain-specific data is achieved through Visio custom properties. Visio features special property sets which may be assigned to the diagram as a whole or to individual entities or groups of entities. When exporting to SVG, elements in the SVG namespace are wrapped in Visio-specific elements which preserve much (although not all) of the Visio-specific information[14].

      During generation of the resulting document, each Visio diagram is processed by an XSLT stylesheet. Each label with a certain set of custom properties is mapped to an XPath expression in the stylesheet, which is then evaluated against the DSL data, and the text of the label is replaced with the retrieved text nodes.
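      A simplified Python sketch of the label substitution. It assumes each mapped label carries a path template in a hypothetical data-path attribute (the real Visio export keeps custom properties in Visio-specific wrapper elements, which are not modelled here), and it uses ElementTree's limited XPath subset instead of full XPath. The {site} placeholder shows how one per-site-type diagram yields a diagram per site.

```python
import xml.etree.ElementTree as ET

DSL = """
<project>
  <site name="Prague">
    <device name="pra-r1"><interface name="Gi0/0" address="10.1.0.1"/></device>
  </site>
  <site name="Brno">
    <device name="brn-r1"><interface name="Gi0/0" address="10.2.0.1"/></device>
  </site>
</project>
"""

# One diagram per site type; mapped labels hold hypothetical path templates.
DIAGRAM = """
<svg xmlns="http://www.w3.org/2000/svg">
  <text data-path="site[@name='{site}']/device">router</text>
  <text data-path="site[@name='{site}']/device/interface" data-attr="address">x.x.x.x</text>
  <text>static caption</text>
</svg>
"""

SVG_TEXT = "{http://www.w3.org/2000/svg}text"

def render_for_site(diagram: str, data: ET.Element, site: str) -> ET.Element:
    """Replace the text of every mapped label with the value found in the DSL data."""
    svg = ET.fromstring(diagram)
    for label in svg.iter(SVG_TEXT):
        path = label.get("data-path")
        if path is None:
            continue  # unmapped labels are left as drawn
        node = data.find(path.format(site=site))
        if node is not None:
            label.text = node.get(label.get("data-attr", "name"))
    return svg

data = ET.fromstring(DSL)
brno = render_for_site(DIAGRAM, data, "Brno")
print([t.text for t in brno.iter(SVG_TEXT)])
```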

      Figure 5. Processing Visio diagrams


      When exporting SVG, Visio makes several mistakes which need to be corrected by the processing stylesheet.

      Some auto-corrected Visio SVG mistakes

      • styles need to be adjusted to display arrow endings correctly

      • only left alignment works out of the box for labels; right and center alignment need to be implemented in the stylesheet by adjusting the SVG output according to Visio-specific alignment information
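      One way to implement the missing alignments is through SVG's text-anchor property: the x coordinate is moved to the center or right edge of the label's box and the anchor set to middle or end. A sketch, assuming the box position and width are known from the Visio-specific data; padding is ignored.

```python
def aligned_text_position(box_x: float, box_width: float, align: str) -> tuple[float, str]:
    """Return (x, text-anchor) for an SVG <text> element inside a label box."""
    if align == "left":
        return box_x, "start"
    if align == "center":
        return box_x + box_width / 2, "middle"
    if align == "right":
        return box_x + box_width, "end"
    raise ValueError(f"unknown alignment: {align!r}")

# The stylesheet would then emit e.g. <text x="50.0" text-anchor="middle">...</text>
print(aligned_text_position(10, 80, "center"))
```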

      Fully auto-generated SVG diagrams

      Laying out graphs automatically is a complex task, but some simpler diagrams can easily be fully auto-generated. For example, in the networking documentation it is important to visualize rack layouts with devices on a per-site basis.

      Fully visual rack layouts may help maintenance personnel recognize the individual devices at a particular site, see how many interfaces they have or what their position is within the racks.

      Domain-specific stylesheets in the authoring framework are able to generate rack layout diagrams automatically by composing different SVG fragments into one resulting SVG diagram. Rack configurations for individual site types are described in the DSL data. Each rack has a number of slots defined which may be occupied by devices. Each device may occupy one or more slots. An SVG clipart may be associated with a certain group of devices, thereby defining their appearance. A shared SVG clipart gallery is used to maintain them. If a certain device group does not have a clipart in the gallery, a default device appearance is used.

      XSLT stylesheets are used to generate SVG representation of each rack, gather the different SVG fragments for each device in the rack from the gallery, scale the graphics to occupy the right amount of slots in the rack and filter everything redundant to obtain a single valid SVG output[15].

      Device cliparts are set in place using the SVG matrix() transformation, which translates the embedded SVG to the appropriate position and scales it up or down to fill the required space. This requires some mathematical calculations within the stylesheet, including unit conversions.
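      A hedged Python sketch of the placement arithmetic. It assumes the clipart's width and height in its own user units are known and its origin is at (0, 0), that the drawing uses 96 px per inch, that slot 1 is at the bottom of the rack, and that 1U = 1.75 in = 44.45 mm; function and parameter names are hypothetical. The device is scaled to the rack's inner width and to the height of its slots, then translated to its topmost slot.

```python
RACK_UNIT_MM = 44.45  # 1U = 1.75 inch
MM_PER_INCH = 25.4
PX_PER_INCH = 96      # assumed drawing resolution (SVG/CSS user units)

def mm_to_px(mm: float) -> float:
    return mm / MM_PER_INCH * PX_PER_INCH

def device_matrix(clip_w: float, clip_h: float, first_slot: int, slots: int,
                  rack_slots: int, rack_x: float, rack_y: float,
                  inner_width_mm: float) -> str:
    """Return an SVG matrix() that places a clipart into `slots` rack units.

    matrix(a,b,c,d,e,f) maps (x, y) to (a*x + c*y + e, b*x + d*y + f), so
    matrix(sx,0,0,sy,tx,ty) scales first and then translates.
    rack_x/rack_y is the top-left corner of the mounting area in px.
    """
    unit_px = mm_to_px(RACK_UNIT_MM)
    sx = mm_to_px(inner_width_mm) / clip_w
    sy = slots * unit_px / clip_h
    top_slot = first_slot + slots - 1  # highest slot the device occupies
    tx = rack_x
    ty = rack_y + (rack_slots - top_slot) * unit_px
    return f"matrix({sx:.4f},0,0,{sy:.4f},{tx:.2f},{ty:.2f})"

# A 2U device in slots 1-2 of a 42U rack drawn at (20, 20); clipart is 480 x 88 units.
print(device_matrix(480, 88, first_slot=1, slots=2, rack_slots=42,
                    rack_x=20, rack_y=20, inner_width_mm=450))
```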

      The aim is to simplify creation of device cliparts in the shared clipart library as much as possible. All manipulations to the SVG clipart are done automatically by the stylesheet. Creating a new clipart may be as easy as opening Visio, choosing a certain device from the palette and exporting it into SVG[16].

      Figure 6. Auto-generated rack layout


      Other generated graphics

      Expressing information in a visual form is very powerful. Why stick to figures? DocBook can be enriched with graphics in a more fine-grained manner. SVG may be embedded, for example, directly into DocBook table cells.

      Let's demonstrate this again on the networking domain example used throughout the article. Suppose some sites in the project need to be connected wirelessly with each other. From their geographical locations we can calculate their distance and azimuth[17].
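      For reference, a common way to compute both values on a spherical Earth: the haversine formula for the great-circle distance, and the initial bearing, measured clockwise from north, as the azimuth. The article does not say which formulas the framework uses; this Python sketch assumes a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def distance_and_azimuth(lat1: float, lon1: float,
                         lat2: float, lon2: float) -> tuple[float, float]:
    """Great-circle distance (km) and initial azimuth (degrees, 0 = north, clockwise)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine formula for the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

    # Initial bearing from point 1 towards point 2, normalised to [0, 360).
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    azimuth = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, azimuth

# Hypothetical site coordinates (decimal degrees).
print(distance_and_azimuth(50.08, 14.42, 49.19, 16.61))
```

      Note that the bearing from B back to A is generally not the A-to-B bearing plus 180 degrees on a sphere, so each antenna's azimuth should be computed from its own site.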

      An azimuth is typically expressed in degrees. A nice visual representation of an azimuth is a circle with a pointer. Such visual documentation may help engineers installing the directional wireless antennas to point them in the right direction for maximum signal strength. Figure 7, “SVG inside table cells” shows visual data mixed with traditional textual data in a table. The screenshot is taken from the resulting PDF file. The small SVG graphics are generated directly in the DocBook customization layer and injected into the FO output using <instream-foreign-object>.
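      A minimal sketch of such a dial, written in Python for brevity although the framework generates it in the XSLT customization layer. Because the SVG y axis points down, the pointer for azimuth θ ends at (cx + r sin θ, cy − r cos θ); size and colors are arbitrary. The resulting element is what would be wrapped in an instream-foreign-object.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

def azimuth_dial(azimuth_deg: float, size: float = 24.0) -> ET.Element:
    """Small SVG compass: a circle with a pointer from the center towards the azimuth."""
    c = size / 2
    r = c - 1  # leave room for the stroke
    theta = math.radians(azimuth_deg)
    x2 = c + r * math.sin(theta)
    y2 = c - r * math.cos(theta)  # SVG y grows downwards, so north is -y
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(size), height=str(size))
    ET.SubElement(svg, f"{{{SVG_NS}}}circle", cx=str(c), cy=str(c), r=str(r),
                  fill="none", stroke="black")
    ET.SubElement(svg, f"{{{SVG_NS}}}line", x1=str(c), y1=str(c),
                  x2=f"{x2:.2f}", y2=f"{y2:.2f}", stroke="red")
    return svg

print(ET.tostring(azimuth_dial(135), encoding="unicode"))
```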

      Figure 7. SVG inside table cells


      When semantics is missing

      Sooner or later any XML authoring framework will need to address the issue of how to gather data from different non-semantic sources. The section called “Graphs” describes how to automatically process Visio diagrams. The section called “Automate with Ant” contains a brief note on how to take data from Excel/CSV and convert them first into tabular XML and later into domain-specific XML.

      There are basically two ways to cope with this issue. Either force users (even non-technical ones) to produce semantic XML directly, or let users use the tools they are used to and attach semantics to the data later, automatically or at least semi-automatically.

      Produce semantic XML visually

      The stumbling block for wider adoption of XML for authoring is the lack and limitations of visual editing tools. Non-technical people are afraid of XML; they consider it some sort of black magic.

      Developing a good visual tool for XML editing is like squaring the circle. Tools try to shield the user from all the complexities of the XML tree and thus leave users unaware of what they are actually doing. Such users are unable to solve problems that the editor cannot solve for them as soon as the scenario gets just a little more complicated.

      Experience from evaluating visual XML editors such as Epic, XMetaL, Oxygen and XML Mind shows that, even after years of development, these tools are still very far from perfect[18].

      Of course, the quality of the editors varies considerably, and each editor is better suited to some tasks than to others. So far, the author of this article has had the best experience with the XML Mind editor, thanks to its flexible extensibility, its very good visual user experience and its licensing policy. Even so, it remains very difficult to get non-technical users to use such tools.

      Non-XML tools

      Traditional presentational tools have recently become more and more XML-enabled. Microsoft Visio is not the only one, with its SVG export: MS Word is able to present XML documents visually and even allows very primitive editing of such documents, and MS Excel has a limited capability to bind tabular data to simple XML schemas and thus produce semantic XML from spreadsheets.

      There are also semi-automatic approaches to converting purely presentational MS Word documents into DocBook. In theory, Open XML could be transformed into some form of DocBook, which may later be enhanced manually with additional semantics. Another approach is to open an MS Word document in Open Office and use its DocBook export feature. This produces very ugly DocBook source, which may then be partially enhanced automatically (using XSLT).

      Beyond authoring

      Authoring is a relatively narrow domain which XML tools handle very well, but the potential is far greater. There is no need to limit the output to a traditional document.

      Structuring data correctly and assigning semantics to them brings immense flexibility. With XSLT we are free to transform the data into all sorts of very different formats to make the best use of them in specific applications. To demonstrate the potential, let us show a few interesting use cases specific to the networking project domain.

      Google Earth

      Google Earth uses the Keyhole Markup Language (KML), an XML-based language for expressing geographic annotation. Having the longitude and latitude specified for each site in the domain model allows the sites to be marked and annotated on the Google Earth surface. Lines connecting individual sites may represent wireless or other connections, and different icons may represent different site types. Clicking on a site reveals further information about that particular site.
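The Python sketch below illustrates such a transformation, assuming a simplified in-memory site model (a dictionary mapping site IDs to latitude, longitude and description) instead of the article's DSL. It emits one KML Placemark with a Point per site and one Placemark with a LineString per connection. KML lists coordinates longitude first, then latitude. The data model and function names are hypothetical; in the framework itself this step would be an XSLT stylesheet over the DSL data.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialise KML as the default namespace


def kml_tag(tag):
    """Qualify a tag name with the KML 2.2 namespace."""
    return f"{{{KML_NS}}}{tag}"


def sites_to_kml(sites, links):
    """sites: {id: (lat, lon, description)}; links: [(id_a, id_b), ...]."""
    kml = ET.Element(kml_tag("kml"))
    doc = ET.SubElement(kml, kml_tag("Document"))
    # One point placemark per site
    for site_id, (lat, lon, desc) in sites.items():
        pm = ET.SubElement(doc, kml_tag("Placemark"))
        ET.SubElement(pm, kml_tag("name")).text = site_id
        ET.SubElement(pm, kml_tag("description")).text = desc
        point = ET.SubElement(pm, kml_tag("Point"))
        # KML expects "longitude,latitude"
        ET.SubElement(point, kml_tag("coordinates")).text = f"{lon},{lat}"
    # One line placemark per connection between two sites
    for a, b in links:
        pm = ET.SubElement(doc, kml_tag("Placemark"))
        ET.SubElement(pm, kml_tag("name")).text = f"{a} - {b}"
        line = ET.SubElement(pm, kml_tag("LineString"))
        coords = [f"{sites[s][1]},{sites[s][0]}" for s in (a, b)]
        ET.SubElement(line, kml_tag("coordinates")).text = " ".join(coords)
    return ET.tostring(kml, encoding="unicode")
```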

      Visualizing projects in Google Earth is not only impressive; it also helps to convey certain aspects of the project for effective project planning, resource management, etc.

      Network monitoring

      Network monitoring is very important during the support phases of every networking project.

      The DSL data already contain all the information necessary to set up a monitoring system: all devices, their IP addresses and interfaces, and how they are connected to each other. Such data may be transformed into configuration files for a monitoring system such as HP Open View or Nagios/Nagvis.
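As an illustration, the Python sketch below turns a simplified device list into Nagios host definitions, using the parents directive to record which device each host is reached through. The input structure (name, ip, optional alias and uplink) is hypothetical, and the generic-host template is assumed to be defined in the target Nagios configuration; a real deployment may use a different template.

```python
def nagios_hosts(devices):
    """Render Nagios 'define host' blocks from a list of device dicts.

    Each dict uses the hypothetical keys 'name', 'ip' and optionally
    'alias' and 'uplink' (the name of the device this one hangs off).
    """
    blocks = []
    for dev in devices:
        lines = [
            "define host {",
            "    use        generic-host",  # template assumed to exist
            f"    host_name  {dev['name']}",
            f"    alias      {dev.get('alias', dev['name'])}",
            f"    address    {dev['ip']}",
        ]
        if dev.get("uplink"):
            # Nagios derives network reachability from the parents directive
            lines.append(f"    parents    {dev['uplink']}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```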

      With the help of monitoring tools, the static domain data may be turned into a dynamic view of the network, displaying the current status of all devices and connections in flashy green/red colours.

      Device configuration files

      The DSL data may be used to automatically generate configuration files for different types of devices in the network. Specific XSLT stylesheets can be used to configure routers, switches, firewalls and other devices.
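A hedged sketch of the idea in Python: interface addresses kept in CIDR notation are expanded into the address and netmask form used by Cisco IOS-style configurations, so the netmask is derived rather than typed by hand. The input keys and the IOS-like output are illustrative assumptions; real templates depend on the device vendor and model.

```python
import ipaddress


def router_interfaces(interfaces):
    """Render IOS-like interface stanzas.

    interfaces: list of dicts with the hypothetical keys
    'name' (interface name), 'cidr' (e.g. "10.0.1.1/30") and 'peer'.
    """
    out = []
    for itf in interfaces:
        iface = ipaddress.ip_interface(itf["cidr"])  # validates the address too
        out += [
            f"interface {itf['name']}",
            f" description link to {itf['peer']}",
            f" ip address {iface.ip} {iface.netmask}",
            " no shutdown",
            "!",
        ]
    return "\n".join(out) + "\n"
```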

      Automatic generation guarantees consistency and helps avoid the errors that occur when devices are configured manually.

      Conclusion

      This article has shown how to automate the authoring of detailed and visual documentation supporting all phases of networking projects using purely XML-based tools. However, the networking field served more or less only as an example use case. The principles, approaches and best practices described in this article are widely applicable to many different domains.

      Moreover, the reader has seen how XML can be used to advantage to model a domain-specific language, validate data consistency, describe the modularity of data, define data transformations and transformation chains, describe document structures, style documents and visualize data.

      Bibliography

      [DB] Walsh, N.: DocBook 5.0: The Definitive Guide. 2008. URL: ???

      [DBS] Stayton, B.: DocBook XSL: The Complete Guide. Sagehill Enterprises, 2008. URL: ???

      [NS] Bray, T., Hollander, D., Layman, A., Tobin, R.: Namespaces in XML 1.0 (Second Edition). W3C, 2006. URL: ???

      [SVG] Ferraiolo, J., Fujisawa, S., Jackson, J.: Scalable Vector Graphics (SVG) 1.1 Specification. W3C, 2003. URL: ???

      [FO] Berglund, A.: Extensible Stylesheet Language (XSL) Version 1.1. W3C, 2006. URL: ???

      [XSLT] Kay, M.: XSL Transformations (XSLT) Version 2.0. W3C, 2007. URL: ???

      [XPTH] Berglund, A., Boag, S., Chamberlin, D., Fernández, M., Kay, M., Robie, J., Siméon, J.: XML Path Language (XPath) 2.0. W3C, 2007. URL: ???

      [VAL] Nálevka, P., Kosek, J.: Advanced approaches to XML document validation. University of Economics, Prague, 2007. URL: ???

      [RUL] Nálevka, P.: Grammar vs. rules. University of Economics, Prague, 2007. URL: ???

      [SCH] Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron. ISO/IEC 19757-3, 2006. URL: ???

      [XI] Marsh, J., Orchard, D., Veillard, D.: XML Inclusions (XInclude) Version 1.0 (Second Edition). W3C, 2006. URL: ???

      [XP] Grosso, P., Maler, E., Marsh, J., Walsh, N.: XPointer Framework. W3C, 2003. URL: ???

      [XPE] Grosso, P., Maler, E., Marsh, J., Walsh, N.: XPointer element() Scheme. W3C, 2003. URL: ???

      [ANT] Loughran, S., Hatcher, E.: Ant in Action. Manning, 2007. ISBN: 1-932394-80-X

      [IC] Walsh, N.: Image Callouts. 2006. URL: ???




      [1] This approach is commonly used in XML grammars (e.g. Relax NG or NVDL simplification).

      [2] A perfect example of data inconsistency is the HTML table model, in which it is possible to define overlapping cells.

      [3] The aim of this article is not to discourage people from writing schemas. It merely proposes a pragmatic, lightweight alternative for cases where flexibility and fast change management are more important than precise binding.

      [4] For example a diagnostic message may tell us that according to the target rack configuration there is a missing device in rack X on site Y.

      [5] A typical result of such transformations for the networking project domain is for example: a table listing all sites, a table with wireless connections between sites, per-site networking schemas etc...

      [6] DSL data can be normalized for example using ID references.

      [7] There are many more use cases where XPointer is necessary, for example a main table holding all the source data and several other tables that include only certain rows from it, or a figure from a DocBook chapter that has to be included in another context.

      [8] Note that Xerces is the default XML parser in the latest Java JVMs.

      [9] This has been a known bug since 2005: xml:base values of nested includes are not resolved relative to the base URI of the parent include. A patch (bug 1102) was released only recently (mid-2008) and has not yet made it into the latest release.

      [10] Without a DTD, no attribute is considered to be an ID, not even the xml:id attribute.

      [11] The output and stylesheet attributes are optional. If output is omitted, standard output is used; if stylesheet is missing, the xml-stylesheet processing instruction is used to determine the stylesheet.

      [12] Alternatively collection() may be used.

      [13] XSLT 2.0 stylesheets for DocBook are currently a work in progress.

      [14] This means that the SVG diagram may still be opened and edited in Visio. This lets content authors adjust the resulting appearance of the diagrams directly in Visio: changing the position of a diagram entity in Visio also changes its position in the generated PDF output. Unfortunately, the exported SVG is not equivalent to the original native Visio file and some information may be lost during export, so it is a good idea to keep the original Visio sources alongside the exported SVG.

      [15] Such filtering includes, for example, merging the contents of <style> elements from various SVG fragments into a single <style>, or filtering out any embedded <svg:svg> elements.

      [16] Some FO processors support only a subset of SVG; e.g. XEP does not support gradients. In such cases the stylesheet has to work around the unsupported features automatically. For gradients, for example, the background color may be set to the color halfway between the first and last stop.

      [17] This is not an issue, as XSLT is well equipped for mathematics. For example, the authoring framework for the networking domain contains stylesheets that calculate distances and bearings between individual sites from their geographical coordinates. This involves calculations with trigonometric functions and requires high precision.

      [18] A simple test reveals the immaturity of these tools. Open a DocBook document in Oxygen Author, place the cursor inside a paragraph and insert a new section. The editor happily inserts the section right inside the paragraph, thereby producing an invalid DocBook document. In the next step, the editor underlines its own mistake with a red wavy line.
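The gradient fallback described in footnote [16] comes down to simple arithmetic. Assuming the stop colors are given as six-digit #rrggbb hex values (other SVG color notations are not handled), the following Python sketch averages the first and last stop channel by channel; the function name is hypothetical.

```python
def midpoint_color(first_stop, last_stop):
    """Return the '#rrggbb' color halfway between two '#rrggbb' gradient stops."""
    # Parse the red, green and blue channels (hex digits at offsets 1, 3 and 5)
    a = [int(first_stop[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(last_stop[i:i + 2], 16) for i in (1, 3, 5)]
    # Average each channel, rounding down, and re-encode as two hex digits
    return "#" + "".join(f"{(x + y) // 2:02x}" for x, y in zip(a, b))
```

For example, midpoint_color("#ff0000", "#0000ff") returns "#7f007f".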

      Petr Nalevka 2005