
Search Web Services Version 1.0 Discussion Document 2 November 2007
URIs:
http://docs.oasis-open.org/search-ws/v1.0/DiscussionDocument.doc
http://docs.oasis-open.org/search-ws/v1.0/DiscussionDocument.pdf
http://docs.oasis-open.org/search-ws/v1.0/DiscussionDocument.html
Technical Committee:
OASIS Search Web Services TC
Chair(s):
Ray Denenberg
Matthew Dovey
Related work:
This specification replaces or supersedes:
This specification is related to:
Status:
This document has no official status. It was prepared by the OASIS Search Web Services TC as a strawman proposal, for public review, intended to generate discussion. It is not a Committee Draft.
Purpose of this Document
This specification is based on the SRU (Search Retrieve via URL) specification which can be found at http://www.loc.gov/standards/sru/. It is expected that this standard, when published, will deviate from SRU. How much it will deviate cannot be predicted at this time. The fact that the SRU spec is used as a starting point for development should not be cause for concern that this might be an effort to fast track SRU. The committee hopes to preserve the useful features of SRU, but not to preserve those that are not considered useful.
The OASIS Technical Committee developing this standard has decided to request OASIS to release this as a discussion document. Detailed review of this document is premature at this point, but feedback on the functionality and approach is solicited.
Open Issues
There are several current open issues before the committee not reflected in the body of the document.
There is a wiki for the committee at http://wiki.oasis-open.org/search-ws/FrontPage, and an issues list at http://wiki.oasis-open.org/search-ws/issues
These issues are summarized here:
Notices
Copyright © OASIS® 2007. All Rights Reserved.
All capitalized terms in the following text have the meanings assigned to them in the OASIS Intellectual Property Rights Policy (the "OASIS IPR Policy"). The full Policy may be found at the OASIS website.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this section are included on all such copies and derivative works. However, this document itself may not be modified in any way, including by removing the copyright notice or references to OASIS, except as needed for the purpose of developing any document or deliverable produced by an OASIS Technical Committee (in which case the rules applicable to copyrights, as set forth in the OASIS IPR Policy, must be followed) or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY OWNERSHIP RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
OASIS requests that any OASIS Party or any other party that believes it has patent claims that would necessarily be infringed by implementations of this OASIS Committee Specification or OASIS Standard, to notify OASIS TC Administrator and provide an indication of its willingness to grant patent licenses to such patent claims in a manner consistent with the IPR Mode of the OASIS Technical Committee that produced this specification.
OASIS invites any party to contact the OASIS TC Administrator if it is aware of a claim of ownership of any patent claims that would necessarily be infringed by implementations of this specification by a patent holder that is not willing to provide a license to such patent claims in a manner consistent with the IPR Mode of the OASIS Technical Committee that produced this specification. OASIS may include such claims on its website, but disclaims any obligation to do so.
OASIS takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on OASIS' procedures with respect to rights in any document or deliverable produced by an OASIS Technical Committee can be found on the OASIS website. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this OASIS Committee Specification or OASIS Standard, can be obtained from the OASIS TC Administrator. OASIS makes no representation that any information or list of intellectual property rights will at any time be complete, or that any claims in such list are, in fact, Essential Claims.
The names "OASIS", [insert specific trademarked names, abbreviations, etc. here] are trademarks of OASIS, the owner and developer of this specification, and should be used only to refer to the organization and its official outputs. OASIS welcomes reference to, and implementation and use of, specifications, while reserving the right to enforce its marks against misleading uses. Please see http://www.oasis-open.org/who/trademark.php for above guidance.
Table of Contents
4 The searchRetrieve operation
4.3 Version: the “version” Parameter
4.6.1 Diagnostic Categories: Fatal vs. Non-fatal, and Surrogate Vs. Non-Surrogate
4.7 Extensions: the ‘extraRequestData’, ‘extraResponseData’, and ‘extraRecordData’ Parameters
4.8 Echoing the Request: The “echoedSearchRetrieveRequest” Parameter
4.9 Stylesheets: the ‘stylesheet’ Parameter
8.3.2 SOAP Parameter Differences
8.3.3 Extension Parameters via SOAP
D.1 OpenSearch Description Document
D.3 OpenSearch Response Elements
E. Authentication, Authorization, and Access Control
E.2 Authorization and Access Control
E.7 Web Services Security and Security Assertion Markup Language (SAML) Security Tokens
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, http://www.ietf.org/rfc/rfc2119.txt, IETF RFC 2119, March 1997.
The Search web service is a means of opening a database to external enquiry in a standardized manner that facilitates discovery of query and response possibilities and makes it possible for heterogeneous databases to be queried simultaneously with the same or similar queries. Client software can be easily configured using a standardized XML explain document that is accessible from the base URL or via the explain operation. In contrast with protocols such as SQL and XQuery, detailed knowledge of a database’s structure is not necessary as the explain document contains parsable information on server defaults, searchable indexes and record schemas that are returned in the response.
Context sets can be made for use with the search web service that define standard index names and search attributes thus facilitating multi-database searching via either a single or similar searches. Profiles can be registered combining context sets and record schemas and so ensure inter-operability in a variety of domains.
Two kinds of enquiry access are defined: search, via keywords or phrases, which returns a result set of records; and scan, via terms, which returns a list of terms in an index.
A search or scan can be expressed in a simple URL, enabling a search to be embedded in any web page. The server may send the results with an accompanying XML style sheet, so the service can be widely used in web pages without any underlying programming.
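As a rough illustration of expressing a search as a URL, the Python sketch below percent-encodes a CQL query into a query string. The base URL and the version value are hypothetical, and the parameter names 'operation', 'version' and 'query' are assumptions borrowed from SRU, on which this document is based; the final standard may name them differently.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_search_url(base_url, cql_query, version):
    """Return a search URL carrying the CQL query (assumed SRU-style names)."""
    params = {
        "operation": "searchRetrieve",  # assumed parameter name and value
        "version": version,             # assumed parameter name
        "query": cql_query,             # assumed parameter name
    }
    # urlencode percent-encodes reserved characters such as '=' and spaces.
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and version value, for illustration only.
url = build_search_url("http://example.org/search", "dc.title = fish", "1.0")

# Decoding the query string recovers the original CQL query unchanged.
decoded = parse_qs(urlsplit(url).query)
```

Because the whole request is an ordinary URL, it can be placed in a hyperlink on any web page.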
CQL, the Contextual Query Language, is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs and museum collection information. The design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex languages.
Traditionally, query languages have fallen into two camps: powerful, expressive languages that are not easily readable or writable by non-experts (e.g. SQL, PQF, and XQuery); or simple and intuitive languages not powerful enough to express complex concepts (e.g. CCL and Google). CQL tries to combine simplicity and intuitiveness of expression for simple, everyday queries with the richness of more expressive languages to accommodate complex concepts when necessary.
A CQL query consists of either a single search clause [example a], or multiple search clauses connected by boolean operators [example b]. It may have a sort specification at the end, following the 'sortBy' keyword [example c]. In addition it may include prefix assignments which assign short names to context set identifiers [example d].
Examples:
a. dc.title = fish
b. dc.title = fish or dc.creator = sanderson
c. dc.title = fish sortBy dc.date/sort.ascending
d. > dc = "info:srw/context-sets/1/dc-v1.1" dc.title any fish
A search clause consists of either an index, relation and a search term [example a], or a search term by itself [example b]. If the clause consists of just a term, then the index is treated as 'cql.serverChoice', and the relation is treated as '=' [example c]. (Therefore examples b and c are semantically equivalent.)
Examples:
Search terms MAY be enclosed in double quotes [example a], though need not be [example b]. Search terms MUST be enclosed in double quotes if they contain any of the following characters: < > = / ( ) and whitespace [example c]. The search term may be an empty string [example d], but must be present in a search clause. The empty search term has no defined semantics.
Examples:
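The quoting rule for search terms can be sketched in Python as follows. This is an illustration only, not text from the specification: the function name is hypothetical, and treating an embedded double quote as a character that forces quoting (and needs a backslash) is an assumption taken from the charString1 and charString2 definitions in the BNF.

```python
import re

# Characters that force quoting: < > = / ( ) and whitespace, plus the double
# quote itself (an assumption based on the charString1 definition).
_NEEDS_QUOTES = re.compile(r'[<>=/()\s"]')

def quote_term(term):
    """Return the term in a form that is safe to place in a CQL search clause."""
    if term == "" or _NEEDS_QUOTES.search(term):
        # Only double quotes are escaped; other backslashes are left untouched.
        return '"' + term.replace('"', '\\"') + '"'
    return term

plain = quote_term("fish")            # no quoting needed
phrase = quote_term("monkey house")   # whitespace forces quoting
empty = quote_term("")                # an empty term must still be present
```

The sketch does not handle a term that ends in a backslash, which would escape the closing quote.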
An index name always includes a base name [example a] and may also include a prefix [example b], which determines the context set of which the index is a part. The base name and the prefix are separated by a dot character ('.'). If multiple '.' characters are present, then the first should be treated as the prefix/base name delimiter. If the prefix is not supplied, it is determined by the server.
Examples:
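The first-dot rule for index names can be sketched in Python as below. The function name is hypothetical, and returning None for a missing prefix is simply one way of signalling that the choice is left to the server.

```python
def split_index_name(name):
    """Split an index name into (prefix, base name) at the first '.' only."""
    prefix, dot, base = name.partition(".")
    if not dot:
        # No prefix supplied: the context set is determined by the server.
        return None, name
    return prefix, base

with_prefix = split_index_name("dc.title")
without_prefix = split_index_name("title")
# Hypothetical index name with two dots: only the first one is the delimiter.
several_dots = split_index_name("rec.id.local")
```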
The relation in a search clause specifies the relationship between the index and search term. It also always includes a base name [example a] and may also include a prefix providing a context for the relation [example b]. If a relation does not have a prefix, the context set is 'cql'. If no relation is supplied in a search clause, then = is assumed, which means that the relation is determined by the server. (As is noted above, if the relation is omitted then the index MUST also be omitted; the relation is assumed to be '=' and the index is assumed to be cql.serverChoice; that is, the server chooses both the index and the relation.)
Examples:
Relations may be modified by one or more relation modifiers. Relation modifiers always include a base name, and may include a prefix for a context set [example a] as above. If a prefix is not supplied, the context set is 'cql'. Relation modifiers are separated from each other and from the relation by forward slash characters ('/'). Whitespace may be present on either side of a '/' character, but the relation plus modifiers group may not end in a '/' [example b]. Relation modifiers may also have a comparison symbol and a value. The comparison symbol is any of = < <= > >= <>. The value must obey the same rules for quoting as search terms, above [example c].
Examples:
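As an illustration only, the Python sketch below splits a relation and its modifiers on '/' and applies the default 'cql' context set. The function names and the sample relation strings are hypothetical. It is a deliberate simplification: it assumes that no modifier value is a quoted string containing '/', which a full parser would have to handle.

```python
import re

# A modifier is a name, optionally followed by a comparison symbol and a value.
_MODIFIER = re.compile(r'([^\s=<>]+)\s*(?:(<=|>=|<>|=|<|>)\s*(\S.*))?')

def _qualify(name):
    """Return (context set, base name); the context set defaults to 'cql'."""
    prefix, dot, base = name.partition(".")
    return (prefix, base) if dot else ("cql", name)

def parse_relation(text):
    """Parse 'relation/modifier/...' into a relation and a list of modifiers."""
    parts = [part.strip() for part in text.split("/")]
    if any(part == "" for part in parts):
        raise ValueError("empty relation or modifier, for example a trailing '/'")
    modifiers = []
    for part in parts[1:]:
        match = _MODIFIER.fullmatch(part)
        if match is None:
            raise ValueError("malformed modifier: " + part)
        name, comparison, value = match.groups()
        modifiers.append((_qualify(name), comparison, value))
    return _qualify(parts[0]), modifiers

simple = parse_relation("any /relevant /cql.string")
valued = parse_relation("=/xyz.unit=word")
```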
Search clauses may be linked by boolean operators. These are: and, or, not and prox [example in 3.1.8]. Note that not is 'and-not' and must not be used as a unary operator. Boolean operators all have the same precedence; they are evaluated left-to-right. Parentheses may be used to override left-to-right evaluation [example d].
Examples:
a. dc.title = "monkey house" and dc.creator = vonnegut
b. dc.title = "monkey house" not dc.creator = vonnegut
c. dc.title = fish or dc.creator = sanderson
d. dc.title = fish or (dc.creator = sanderson and dc.identifier = "id:1234567")
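The effect of equal precedence and left-to-right evaluation can be sketched in Python by modelling each search clause as the set of record identifiers it matches. The record sets below are invented for illustration, and prox is omitted because it needs term positions rather than plain sets.

```python
def evaluate(sequence):
    """Fold [set, op, set, op, set, ...] strictly left to right."""
    operators = {
        "and": lambda left, right: left & right,
        "or": lambda left, right: left | right,
        "not": lambda left, right: left - right,  # 'not' means 'and-not'
    }
    result = sequence[0]
    for i in range(1, len(sequence), 2):
        operator = sequence[i].lower()  # boolean operators are case insensitive
        result = operators[operator](result, sequence[i + 1])
    return result

# Invented result sets for three search clauses.
title_fish = {1, 2, 3}
creator_sanderson = {3, 4}
identifier_match = {4, 5}

# Without parentheses: (title_fish or creator_sanderson) and identifier_match
flat = evaluate([title_fish, "or", creator_sanderson, "and", identifier_match])

# With parentheses, as in example d: the inner group is evaluated first.
inner = evaluate([creator_sanderson, "and", identifier_match])
grouped = evaluate([title_fish, "or", inner])
```

With these sets the two forms give different results, which is why parentheses matter when operators are mixed.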
Booleans may be modified by one or more boolean modifiers, separated as per relation modifiers with '/' characters. Again, boolean modifiers consist of a base name and may include a prefix determining the modifier's context set [example a]. If not supplied, then the context set is 'cql'. As per relation modifiers, they may also have a comparison symbol and a value [example b].
Examples:
Basic proximity modifiers are defined in the CQL context set [reference]. Proximity units 'word', 'sentence', 'paragraph', and 'element' are defined there and may also be defined in other context sets. Within the CQL set they are explicitly undefined. When defined in another context set they may be assigned specific meaning.
Thus compare "prox/unit=word" with "prox/xyz.unit=word". In the first, 'unit' is a prox modifier from the CQL set, and as such its values are undefined, so 'word' is subject to interpretation by the server. In the second, 'unit' is a prox modifier defined by the xyz context set, which may assign the unit 'word' a specific meaning.
The context set xyz may define additional units, for example, 'street':
prox/xyz.unit="street"
This approach, 'prox/xyz.unit="street"', is chosen rather than 'prox/unit=xyz.street' for the following reason. In the first case, 'unit' is a modifier defined in the xyz context set, and 'street' is a value defined for that modifier. In the second, 'unit' is a modifier from the cql context set with a value defined in a different set, although its value would have to be one that is defined in the cql context set. The first form avoids pairing a modifier from one set with a value from another, which can lead to unpredictable results.
Queries may include explicit information on how to sort the result set generated by the search. (See result set model [reference].)
The sort specification is included at the end, and is separated by a 'sortBy' keyword. The specification consists of an ordered list of indexes, potentially with modifiers, to use as keys on which to sort the result set. If multiple keys are given, then the second and subsequent keys should be used to determine the order of items that would otherwise sort together. Each index used as a sort key has the same semantics as when it is used to search.
Modifiers may be attached to the index in the same way as to booleans and relations in the main part of the query. These modifiers may be part of any context set, including the CQL context set and the Sort context set [reference]. This is the only time when a modifier may be attached to an index. If a modifier may be used in this way it should be stated in the description of its semantics. As many types of search also require specification of term order (for example the <, > and within relations), these modifiers are often specified as relation modifiers.
Examples:
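The role of second and subsequent sort keys can be sketched in Python as below. This is an illustration only: the records, field names and function name are invented, and a real server would sort on its own indexes rather than on dictionary fields.

```python
def sort_records(records, keys):
    """Sort by an ordered list of (field, ascending) keys.

    Later keys only decide the order of records that tie on earlier keys.
    """
    ordered = list(records)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for field, ascending in reversed(keys):
        ordered.sort(key=lambda record: record[field], reverse=not ascending)
    return ordered

# Invented records for illustration.
records = [
    {"title": "b", "date": 2001},
    {"title": "a", "date": 2001},
    {"title": "c", "date": 1999},
]

by_date_then_title = sort_records(records, [("date", True), ("title", True)])
titles = [record["title"] for record in by_date_then_title]
```

Here the two records dated 2001 sort together on the first key, so the second key (title) decides their relative order.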
Note: The use of Prefix Maps is expected to be uncommon.
A Prefix Map may be used to assign context set names to specific identifiers in order to be sure that the server maps them in a desired fashion. It may occur at any place in the query and applies to anything below the map in the query tree. A prefix assignment is specified by: '>' shortname '=' identifier [example a]. The shortname and '=' sign may be omitted, in which case it sets a default context set for indexes [example b].
Examples:
a. > dc = "info:units/direct-current" dc.voltage > 12
This example illustrates that while "dc" is almost always used as the prefix for the Dublin Core context set, this is not always so, as in this case it is used for the "deepCustard" context set.
b. > "info:units/direct-current" voltage > 12
This query has the same meaning as example a.
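The equivalence of examples a and b can be sketched in Python as follows. The function name and the dictionary representation of a prefix map are hypothetical; returning None for an unmapped prefix is one way of leaving the mapping to the server.

```python
def resolve_index(index_name, prefix_map, default_set=None):
    """Return (context set identifier, base name) for an index name."""
    prefix, dot, base = index_name.partition(".")
    if not dot:
        # No prefix: use the default context set set by '>' identifier, if any.
        return default_set, index_name
    # None means the prefix is not in the map and is left to the server.
    return prefix_map.get(prefix), base

identifier = "info:units/direct-current"

# Example a: '> dc = identifier' followed by the index dc.voltage
from_short_name = resolve_index("dc.voltage", {"dc": identifier})

# Example b: '> identifier' followed by the unprefixed index voltage
from_default = resolve_index("voltage", {}, default_set=identifier)
```

Both calls resolve to the same identifier and base name, which is the sense in which the two queries have the same meaning.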
All parts of CQL are case insensitive apart from user supplied search terms, values for modifiers and prefix map identifiers, which may or may not be case sensitive. If any case insensitive part of CQL is specified with mixed upper and lower case, it is for aesthetic purposes only.
Following is the Backus Naur Form (BNF) definition for CQL. ("::=" represents "is defined as".)
sortedQuery ::= prefixAssignment sortedQuery | scopedClause ['sortby' sortSpec]
sortSpec ::= sortSpec singleSpec | singleSpec
singleSpec ::= index [modifierList]
cqlQuery ::= prefixAssignment cqlQuery | scopedClause
prefixAssignment ::= '>' prefix '=' uri | '>' uri
scopedClause ::= scopedClause booleanGroup searchClause | searchClause
booleanGroup ::= boolean [modifierList]
boolean ::= 'and' | 'or' | 'not' | 'prox'
searchClause ::= '(' cqlQuery ')' | index relation searchTerm | searchTerm
relation ::= comparitor [modifierList]
comparitor ::= comparitorSymbol | namedComparitor
comparitorSymbol ::= '=' | '>' | '<' | '>=' | '<=' | '<>' | '=='
namedComparitor ::= identifier
modifierList ::= modifierList modifier | modifier
modifier ::= '/' modifierName [comparitorSymbol modifierValue]
prefix, uri, modifierName, modifierValue, searchTerm, index ::= term
term ::= identifier | 'and' | 'or' | 'not' | 'prox' | 'sortby'
identifier ::= charString1 | charString2
charString1 := Any sequence of characters that does not include any of the following: whitespace, ( (open parenthesis), ) (close parenthesis), =, <, >, " (double quote), /. If the final sequence is a reserved word, that token is returned instead. Note that '.' (period) may be included, and a sequence of digits is also permitted. Reserved words are 'and', 'or', 'not', and 'prox' (case insensitive). When a reserved word is used in a search term, case is preserved.
charString2 := Double quotes enclosing a sequence of any characters except double quote (unless preceded by backslash (\)). Backslash escapes the character following it. The resultant value includes all backslash characters except those releasing a double quote (this allows other systems to interpret the backslash character). The surrounding double quotes are not included.
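The charString1 and charString2 definitions can be sketched as a small Python tokenizer. This is an illustration under stated assumptions, not a reference implementation: the function names and token kinds are hypothetical, and 'sortby' is treated as a reserved word because the term production lists it, although the charString1 text names only 'and', 'or', 'not' and 'prox'.

```python
import re

# Reserved words; 'sortby' is an assumption taken from the term production.
RESERVED = {"and", "or", "not", "prox", "sortby"}

_TOKEN = re.compile(r'''
    \s*(?:
        (?P<quoted>"(?:\\.|[^"\\])*")       # charString2: double-quoted string
      | (?P<symbol><=|>=|<>|==|[=<>/()])    # comparison symbols, slash, parentheses
      | (?P<bare>[^\s()=<>"/]+)             # charString1: unquoted run
    )''', re.VERBOSE | re.DOTALL)

def _unquote(token):
    """Strip the surrounding quotes; drop only backslashes that release a quote."""
    def keep_or_release(match):
        return '"' if match.group(1) == '"' else match.group(0)
    return re.sub(r'\\(.)', keep_or_release, token[1:-1], flags=re.DOTALL)

def tokenize(query):
    """Split a CQL query into (kind, text) pairs."""
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"cannot tokenize query at offset {pos}")
        kind, text = match.lastgroup, match.group(match.lastgroup)
        if kind == "quoted":
            tokens.append(("term", _unquote(text)))
        elif kind == "bare" and text.lower() in RESERVED:
            tokens.append(("keyword", text))  # original case is preserved
        elif kind == "bare":
            tokens.append(("term", text))
        else:
            tokens.append(("symbol", text))
        pos = match.end()
    return tokens

tokens = tokenize('dc.title = "monkey house" and dc.creator = vonnegut')
```

A parser for the grammar above would still have to decide from context whether a keyword token is acting as a boolean operator or as a search term, since the term production allows both.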