Data extensibility

The <data> element represents properties ranging from simple values to complex structures. Processes can harvest the <data> element for automated manipulation or to format data associated with the body flow. The <data> element is primarily intended for use in creating specializations.

You can nest <data> elements to represent structures. You can use the name attribute to indicate the semantics of each instance of the <data> element, such as an address, a time, or an amount. In many cases, however, you may prefer to specialize the <data> element for more precise semantics and for constraints on structures and values. For instance, a specialization can specify an enumeration for the value attribute.
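
As a minimal sketch of nesting, the following fragment groups an address out of base <data> elements. The property names (address, street, city, postalCode) and their values are hypothetical and chosen only for illustration; only the <data> element and its name and value attributes come from DITA.

```xml
<!-- Hypothetical property names and values, for illustration only. -->
<data name="address">
  <data name="street"     value="123 Example Street"/>
  <data name="city"       value="Springfield"/>
  <data name="postalCode" value="00000"/>
</data>
```

A specialization could replace the name attribute values with dedicated element names and restrict the permitted values, at the cost of maintaining the specialization module.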

In some cases, it isn't possible or convenient to maintain a property as part of the content of its subject. For instance, you might prefer to maintain extensive data in the <prolog> that applies to a note or example within the body. To handle such exceptions, you can use the <data-about> element to identify the subject of the property.
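
The following sketch shows one way this might look, assuming a topic with the id install and a note with the id power-warning. The ids, property names, and values are hypothetical. The href attribute on <data-about> points at the subject of the enclosed properties.

```xml
<topic id="install">
  <title>Installing the product</title>
  <prolog>
    <!-- The properties are maintained in the prolog but describe the note below. -->
    <data-about href="#install/power-warning">
      <data name="reviewedBy" value="safety-team"/>
      <data name="reviewDate" value="2024-01-15"/>
    </data-about>
  </prolog>
  <body>
    <note id="power-warning" type="warning">Disconnect power before opening the case.</note>
  </body>
</topic>
```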

A process can harvest the data values for a machine-processable representation such as RDF. The default formatting ignores the <data> element within the <body> element. Customized or specialized processing that understands whether and how the properties should display can extend formatting to include data values in some formatted outputs.
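
A harvesting process could be sketched as a small XSLT 1.0 stylesheet. This is an illustrative assumption, not a stylesheet defined by the specification. It assumes the class attribute is present on each element, which normally requires parsing the document with its DTD or schema so that the default values are supplied. It writes one tab-separated line (element name, name attribute, value attribute) for each <data> element or specialization of it that carries a value.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Match <data> and any element specialized from it through the class attribute. -->
  <xsl:template match="*[contains(@class, ' topic/data ')][@value]">
    <xsl:value-of select="concat(name(), '&#9;', @name, '&#9;', @value, '&#10;')"/>
    <!-- Continue into nested <data> elements. -->
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress all text so that only harvested properties are written. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Matching on the class attribute rather than on element names lets the same stylesheet pick up specialized elements without listing them. A later step could map the harvested lines to RDF or another target representation.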

It is an abuse of the DITA architecture to specialize the <data> element for text that is part of the body flow, particularly to escape the constraints of the base content models. For example, a special kind of paragraph must specialize the base <p> element rather than the <data> element. When content is exchanged with others or a specialization is retired, a paragraph specialized from the <data> element is generalized to <data> and thus skipped by the base formatting, which mangles the discourse flow and results in invalid content.


Uses of the <data> element include the following:

The following example specifies the delimited source code for a code fragment so an automated process can refresh the code fragment. The <fragmentSource>, <sourceFile>, <startDelimiter>, and <endDelimiter> elements are specialized from <data> but the <codeFragment> is specialized from <codeblock>. These properties wouldn't appear in the formatted output (except perhaps for debugging problems in the refresh):

    <title>An important coding technique</title>
    <codeFragment>
        <fragmentSource>
            <sourceFile     value=""/>
            <startDelimiter value="FRAGMENT_START_1"/>
            <endDelimiter   value="FRAGMENT_END_1"/>
        </fragmentSource>
    </codeFragment>

The following example identifies a real estate property as part of a house description. The <realEstateProperty> element and everything it contains are specialized from <data>. The <houseDescription> element is specialized from <section>. A specialized process can format the values to identify the lot if appropriate for the brochure.

<houseDescription>
  <title>A great home for sale</title>
  <realEstateProperty>
    <realEstateBlock value="B7"/>
    <realEstateLot   value="4003"/>
  </realEstateProperty>
  <p>This elegant....</p>
  <object data="B7_4003_tour360Degrees.swf"/>
</houseDescription>

