OASIS Open Document Format Interoperability and Conformance (OIC) TC
Bart Hanssens, Fedict
Robert Weir, IBM
This specification is related to:
This report discusses interoperability with respect to the OASIS OpenDocument Format (ODF) and notes specific areas where implementors might focus in order to improve interoperability among ODF-supporting applications.
This document was last revised or approved by the ODF Interoperability and Conformance TC on the above date. The level of approval is also listed above. Check the "Latest Version" or "Latest Approved Version" location noted above for possible later revisions of this document.
Technical Committee members should send comments on this specification to the Technical Committee’s email list. Others should send comments to the Technical Committee by using the “Send A Comment” button on the Technical Committee’s web page at http://www.oasis-open.org/committees/oic/.
For information on whether any patents have been disclosed that may be essential to implementing this specification, and any offers of patent licensing terms, please refer to the Intellectual Property Rights section of the Technical Committee web page (http://www.oasis-open.org/committees/oic/ipr.php.
The non-normative errata page for this specification is located at http://www.oasis-open.org/committees/oic/.
Copyright © OASIS® 2009-2010. All Rights Reserved.
All capitalized terms in the following text have the meanings assigned to them in the OASIS Intellectual Property Rights Policy (the "OASIS IPR Policy"). The full Policy may be found at the OASIS website.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this section are included on all such copies and derivative works. However, this document itself may not be modified in any way, including by removing the copyright notice or references to OASIS, except as needed for the purpose of developing any document or deliverable produced by an OASIS Technical Committee (in which case the rules applicable to copyrights, as set forth in the OASIS IPR Policy, must be followed) or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY OWNERSHIP RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
OASIS requests that any OASIS Party or any other party that believes it has patent claims that would necessarily be infringed by implementations of this OASIS Committee Specification or OASIS Standard, to notify OASIS TC Administrator and provide an indication of its willingness to grant patent licenses to such patent claims in a manner consistent with the IPR Mode of the OASIS Technical Committee that produced this specification.
OASIS invites any party to contact the OASIS TC Administrator if it is aware of a claim of ownership of any patent claims that would necessarily be infringed by implementations of this specification by a patent holder that is not willing to provide a license to such patent claims in a manner consistent with the IPR Mode of the OASIS Technical Committee that produced this specification. OASIS may include such claims on its website, but disclaims any obligation to do so.
OASIS takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on OASIS' procedures with respect to rights in any document or deliverable produced by an OASIS Technical Committee can be found on the OASIS website. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this OASIS Committee Specification or OASIS Standard, can be obtained from the OASIS TC Administrator. OASIS makes no representation that any information or list of intellectual property rights will at any time be complete, or that any claims in such list are, in fact, Essential Claims.
The names "OASIS", “ODF”, “Open Document Format”, “OIC” are trademarks of OASIS, the owner and developer of this specification, and should be used only to refer to the organization and its official outputs. OASIS welcomes reference to, and implementation and use of, specifications, while reserving the right to enforce its marks against misleading uses. Please see http://www.oasis-open.org/who/trademark.php for above guidance.
1 Introduction 5
1.1 Normative References 5
1.2 Non-Normative References 5
2 Conformance and Interoperability 6
2.1 ODF Conformance 6
2.2 ODF Interoperability 7
2.3 Conformance and Interoperability 7
3 An Interoperability Model 9
3.1 Sources of Interoperability Problems 9
3.2 Round-trip Interoperability 10
4 Problems and Solutions 11
4.1 Priority Areas for Improvement 11
4.2 Approaches to Improving ODF Interoperability 11
5 Conclusion 14
6 Conformance 15
Appendix A.Acknowledgments 16
Appendix B.Revision History 17
OASIS OpenDocument Format (ODF) is a standard for office documents, including text documents, spreadsheets and presentations. ODF 1.0 was published in 2005, and ODF 1.1 was published in 2007. ODF 1.0 was also approved as ISO/IEC 26300:2006. The OASIS ODF Technical Committee1 is currently working on ODF 1.2.
The OASIS ODF Interoperability and Conformance (OIC) TC was created in October 2008 with the stated purpose “to produce materials and host events that will help implementors create applications which conform the ODF standard and which are able to interoperate.”
The charter of the OIC TC also calls for it to periodically review the state of conformance and interoperability among ODF implementations, to report on “overall trends in conformance and interoperability”, to note “areas of accomplishment as well as areas needing improvement” and to “recommend prioritized activities for advancing the state of conformance and interoperability among ODF implementations.”
This State of ODF Interoperability report is the first of the OIC TC's reports on interoperability, and as such provides an overview of the topic and discusses the baseline level of achievement. Future reports will focus on progress achieved beyond this baseline.
This report contains no normative references.
This report contains no non-normative references. However, footnotes containing hyperlinks to relevant online resources are provided.
Conformance is the relationship between a product1 and a standard. A standard defines provisions that constrain the allowable attributes and behaviors of a conforming product. Some provisions define mandatory requirements, meaning requirements that all conforming products must satisfy, while other provisions define optional requirements, meaning that where applicable they must be satisfied. Conformance exists when the product meets all of the mandatory requirements defined by the standard, as well as those applicable optional requirements. For example, validity with respect to the ODF schema is a mandatory requirement that all conformant ODF documents must meet. However, support of the “fo:text-align” attribute is not required. Nevertheless, there are optional requirements that constrain how this feature must be implemented by products that do support that feature, namely that its value is restricted to “start”, “end”, “left”, “right”, “center” or “justify”.
A standard may define requirements for one or more conformance targets in one or more classes, in which case a product may be said to conform to a particular conformance target and class. For example, ODF 1.1 defines requirements for an ODF Document target as well as an ODF Consumer target. ODF 1.2, in its current draft, defines requirements for an additional conformance class for the ODF Document target, namely the Extended ODF Document class.
There have been several attempts to assess conformance of ODF Documents. The ODF Fellowship hosts an online validator2 that allows the user to upload a document which is then validated according to the ODF schema. Oracle hosts a similar tool at the OpenOffice.org web site3. The ODFToolkit Union has a ODF Conformance project4 to further develop this code. Office-a-tron5 is another validator. The ODF TC has posted instructions for how to validate ODF document instances using a variety of tools6.
These tools primarily look at document validity, an XML concept, which is a necessary but insufficient condition for document conformance. These tools are limited to a static examination of document contents and are not capable of testing conformance of ODF applications. However, we believe that XML validation, combined with additional static analysis, is a promising approach to automate conformance testing of ODF documents.
Based on the results from existing online validators, which only test a subset of conformance requirements, the overall level of document conformance appears to be good. However, there are some recurring issues, including:
Style names beginning with a number, making them an invalid NCName. Implementations with this issue could trivially transfer their style names into valid ones by pre-pending an underscore character ('_') to the style name.
Documents missing a mimetype file
MathML documents with non-standard doctype declarations.
Since automated validation testing of documents is a cheap and effective way to find errors in ODF documents, its widespread use is especially encouraged.
According to ISO/IEC 2382-01, “Information Technology Vocabulary, Fundamental Terms”, interoperability is “The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units”.
From the perspective of ODF, the document is the data which is transferred, and the functional units are the software applications which create, edit, view and manipulate these documents. Where the document can be successfully transferred among such applications, without the user needing to be concerned with the unique characteristics of each application, then interoperability is high. Conversely, where the user needs to be aware of the quirks of each application, there interoperability is poor.
Since the capabilities of ODF applications extend beyond the common desktop editors, and include other product categories such as web-based editors, mobile device editors, document convertors, content repositories, search engines, and other document-aware applications, interoperability will mean different things to users of these different applications. However, to one degree or another, interoperability consists of meeting user expectations regarding one or more of the following qualities when transferring documents:
The visual appearance of the document at various levels, e.g., glyph, run, line, block, page, etc.
The structure of the document as revealed when the user attempts to edit the document, e.g., headers, paragraphs, lists, tables.
The behaviors and capabilities of internal and external links and references.
The behaviors and capabilities of embedded images, media and other objects.
The preservation of document metadata.
The preservation of document extensions.
The integrity of digital signatures and other protection mechanisms.
The runtime behaviors manifest from scripts, macros and other forms of executable logic.
In any given user task, one or more of these qualities may be of overriding concern.
The relationship between conformance and interoperability is subtle and often confused. On one hand, it is possible to have good interoperability, even with non-conformant documents. Take HTML, for example. A study of 2.5 million web pages found that only 0.7% of them conformed to the HTML standard1. The other 99.3% of HTML pages were not conformant. So conformance is clearly not a prerequisite to interoperability. On the other hand, web browsers require significant additional complexity to handle the errors in non-conformant documents. This complexity comes at a tangible cost, in development resources required to write a robust (tolerant of non-conformance) web browser, and well as less tangible liabilities, such as greater code complexity which typically results in slower performance and decreased reliability. So although conformance is not required for interoperability, we observe that interoperability is most efficiently achieved in the presence of conforming applications and documents. However, this is an ideal alignment that is rarely achieved, since standards have defects, applications have bugs and users make mistakes. So, in practice, achieving a satisfactory degree of interoperability almost always requires additional efforts beyond mere conformance.
A model for describing document interoperability is given in Figure 1.
Any document exchange scenario will involve these steps:
Authoring: The human author of the document express his intentions by instructing a software application, typically via a combination of keyboard and GUI commands. Note that computer-generated documents also have an authoring step, in which case the human intentions are expressed by the author of the document generation software, via his code instructions.
Encoding: The software application, when directed to save the document, executes program logic to encode into ODF format a document that corresponds to the instructions given to it by the user.
Storage: The ODF document then represents a transferable data store that can be given to another user.
Decoding: The receiving user then uses their software application to decode the ODF document, and render it in fashion suitable for User B to interact with it.
Interpretation: User B then perceives the document in their software application and interprets the original author's intentions.
Interoperability defects can be introduced at any step in this process. For example:
An inexperienced user might instruct the authoring software application incorrectly, so that his instructions do not match his intentions. For example, a user may try to center text by padding a line with extra spaces rather than using the text align feature of their word processor. Or he might try to express a table header by merely applying the bold attribute to the text rather than defining it as a table header.
The software application that writes the document may have defects that cause it to incorrectly encode the user's instructions.
Due to ambiguities in the ODF specification, the document may be subject to different interpretations by Application A and Application B.
The software application that reads the document may have defects that cause it to incorrectly decode and render the document.
The user reading the document may incorrectly perceives its contents.
Round-trip interoperability refers to scenarios where a document author collaborates with one or more other users, such that the document will be edited, not merely viewed, by those other users, and where the revised document is eventually returned to the original author.
From the perspective of the interoperability model, this introduces nothing new. A round-trip scenario is simply a iteration of the above steps, with the same opportunities for errors being introduced, e.g., A→B→A is the same as A→B, followed by B→A. However, since errors introduced at any step in the process tend to accumulate, a complex round-trip scenario will tend to suffer more in the presence of any interoperability defects.
Also, since the original author is the person who most knows the author's intentions, he will also be the most sensitive to the slightest alterations in content or formatting introduced into his document. So minor differences that might not have been noticed when read by a second user will be more obvious to the original author. This sensitivity to even the smallest differences will tend to cause the perception of round-trip interoperability to be lower.
Based on initial testing in the OASIS ODF Interoperability and Conformance TC, as well as scenario-based testing at the ODF Plugfest, several feature areas have been identifies as needing improvement in one or more ODF editors:
Some ODF features are not widely-implemented, such as document encryption, change tracking and form controls.
There are implementation bugs and inconsistencies that reduce interoperability of embedded charts, specifically:
When writing out embedded charts, some implementations write out a link to the embedded chart asxlink:href=“Object 1/”. Other implementations write it out as: xlink:href=”./Object 1”.
Some optional attributes, such as table:cell-range-address and chart:values-cell-range-address are not written by all implementations, although some implementations will not correctly render a chart without these values being set. More details are on the ODF Plugfest wiki.1
Implementations vary in their ability to to parse spreadsheet formulas written by other implementations. Adoption of OpenFormula (part of ODF 1.2) as the interchange format for spreadsheet formula expressions is encouraged. When an implementation does not support a formula syntax, typically the values are shown when office:value is present, but formulas themselves are removed. A simple example is a SUM function that calculates the sum of 2 cells. When opening a spreadsheet, a user may not notice any difference because the values are still there. But when changing the values of these 2 cells, the value of the cell originally containing the SUM function will not be updated accordingly because the formula has been removed.
Although simple lists appear to work well across the tested implementations, documents using advanced list structures (nested lists and list continuations) fared less well. When a document containing change tracking information is loaded in an implementation that does not support change tracking, the <text:tracked-changes> element and its content can be removed. Also, the content of deleted paragraphs, stored in <text:p> elements inside <text:tracked-changes>, can become visible at the top of the first page of the document (because <text:tracked-changes> is stored before all other <text:p>'s), which can be very confusing for the user
Improving interoperability generally follows three steps:
Define the expected behavior
Identify defects in implementations
Fix the defects
The primary definition of expected behavior for the rendering of ODF documents is the published ODF standard. However, there are other sources of expected behavior, and meeting these expectations, where they do not conflict with the ODF standard, are, from the user's perspective, very important as well. For example:
Approved Errata1 to the ODF standard
Other publications of the OASIS ODF TC, such as the Accessibility Guidelines2
Draft versions of ODF, where they clarify the expected behavior
Interoperability Guidelines, as may be published by the ODF Interoperability and Conformance TC.
Common sense and convention, which often provides a shared set of expectations between the user and the application vendor. For example, the ODF standard does not define the exact colorimetric value of the color “red”, though undoubtedly users would be surprised to see their word processor render it as yellow.
There are several tools, processes and techniques that have been suggested for identifying interoperability defects, including:
Automated static testing of ODF documents, which could range in complexity from simple XML validation to more involved testing of conformance and portability.
“Atomic” conformance tests, which test the interoperability of individual features of the ODF standard at the lowest level of granularity. The ODF Fellowship's OpenDocument Sample Documents3 is an early example of this approach. The approach used by Shah and Kesan4 falls into this category as well.
Scenario-based testing, which combine multiple ODF features into documents which reflect typical real-world uses.
“Acid” tests aim to give the end-user a quick view how well their application supports the standard. This approach was popularized in the Web Standards Project5 to improve browser interoperability. Sam Johnston has created a prototype ACID test for spreadsheet formulas6.
Plugfests, face-to-face and virtual This approach has proven useful in interactive testing among vendors in the ODF Plugfests7.
OfficeShots.org8 allows an end-user to upload a document and compare how it will render in different applications. This can allow the user to see potential interoperability problems before they distribute their document.
User-submitted bug reports, sent to their application vendor, can help the vendor prioritize areas which are in need of improvement.
Public comments, submitted to the OASIS ODF TC9, which report ambiguities in the specification which can impact interoperability. Such comments can be resolved in errata or revisions of the standard.
As enumerated above, there are a variety of approaches to identifying interoperability defects. At this point it is not clear which of these techniques will, over the long term, be the most effective. Some require more substantial up-front investment in automation development, but this investment also permits a degree of test automation. Some approaches require substantial efforts in analysis of the ODF standard and construction of individual test cases. But once these test cases are designed, they can be executed many times at low cost. There are also techniques that require far less advance preparation and are suitable for formal and informal testing at face-to-face plugfests.
These options are familiar in the field of software quality assurance (SQA), and from that field we are taught to consider several factors:
What is the “defect yield” of each testing approach? The defect yield is the number of defects found per unit of testing time, and is a measure of the efficiency of any given approach.
What is the test coverage that can be achieved by any given approach? Test coverage would indicate what fraction of the features of ODF are tested by that approach.
Once an initial investment is made, how expensive will it be to update test materials as new ODF versions and new application versions are released?
How well does the testing approach prioritize the testing effort, so that the defects that most impact interoperability for the most number of users are found quickly?
The goal should be to identify an approach that finds the most number of defects with the least effort, with an emphasis on those defects that have the greatest real-world impact. It is well-known from SQA research that that the optimal approach often involves a blend of different complementary techniques.
Of course, interoperability will not improve merely by testing. The results of testing must feed forward into the vendors' development plans, so these defects are fixed and the fixes make it into the hands of end-users. Nothing improves until vendors change their code. Merely talking about the problem doesn't make it go away.
This State of ODF Interoperability report is a snapshot in time based on current ODF implementations, current interoperability activities and the current thoughts of the OASIS ODF Interoperability and Conformance TC. We intend to periodically reissue this report as conditions evolve, to note progress and remaining challenges.
This report is informative and contains no normative clauses or references.
The following individuals have participated in the creation of this specification and are gratefully acknowledged
Jeremy Allison, Google Inc.
Mathias Bauer, Oracle Corporation
Michael Brauer, Oracle Corporation
Alan Clark, Novell
Xiaohong Dong, Beijing Redflag Chinese 2000 Software Co., Ltd.
Bernd Eilers, Oracle Corporation
Andreas J. Guelzow
Dennis E. Hamilton
Bart Hanssens, Fedict
Donald Harbison, IBM
Shuran Hua, Beijing Redflag Chinese 2000 Software Co., Ltd.
Mingfei Jia, IBM
Peter Junge, Beijing Redflag Chinese 2000 Software Co., Ltd.
Doug Mahugh, Microsoft Corporation
Eric Patterson, Microsoft Corporation
Tom Rabon, Red Hat
Daniel Rentz, Oracle Corporation
Andrew Rist , Oracle Corporation
Svante Schubert, Oracle Corporation
Charles Schulz, Ars Aperta
Jerry Smith, US Department of Defense (DoD)
Patrick Strader, Microsoft Corporation
Louis Suarez-Potts, Oracle Corporation
Alex Wang, Beijing Sursen International Information Technology Co., Ltd.
Robert Weir, IBM
CD 05 Rev 1 (2010-09-08) Added revision history appendix, non-normative references
CD 05 (2010-07-12) Updated based on public review comments
CD 04 (2010-02-10) Corrected error in description of chart URI
CD 03 (2009-11-27) Fixed numbered lists.
CD 02 (2010-11-02) Explain OIC TC in intro, more detail in “Priority Areas for Improvement”, spelling corrections
CD 01 (2009-10-20) Initial version
1'Product' here is used in its neutral sense, as “something produced”. It is not intended to connote a commercial use.