WSDL Tales From the Trenches, Part 3
This article is the third and final part of the WSDL Tales from the
Trenches series, and in it I concentrate on the data in web services.
More specifically, I examine the type definitions and element declarations
in the types element of a WSDL document. Such types and
elements are for use in the abstract messages, the message
elements in a WSD.
WSDL does not constrain data definitions to W3C XML Schema (WXS). However, alternatives to WXS are not covered in this article: the goal of the series is to provide help and guidance with current real-world problems, and I have not seen any of the alternatives to WXS being used for web services on a significant scale to date. This may change in the future: while only the WXS implementation is discussed in the WSDL 1.1 spec, it was always the intention of the WSDL designers to provide several options. The WSDL 1.2 draft's appendix on Relax NG brings this closer to realization.
Data modeling with WXS is not for the faint-hearted. It presents a lot of pitfalls. This article will point some of these out and help you avoid them. At the very least, it should caution you to tread carefully. I will not attempt to explain WXS. There is a wealth of good texts that do so; this article focuses on how to do basic data modeling for web services. Many of the more advanced topics are avoided.
Data may be defined directly in the types element of a
document containing abstract messages. The recommended practice, however,
is to import a separate document; the previous installment discussed the
increased readability, extensibility and opportunities for reuse this
brings.
This can be done by using WSDL's import element or by
using WXS's. Although they have the same name, they are different
elements as they reside in different namespaces. In order to distinguish
them, I will refer to WSDL's element as wsdl:import and use
xsd:import to denote that of WXS. I will explain the
difference between them with examples of both mechanisms.
stockquote.wsdl
uses wsdl:import to import another
WSDL document that only contains data definitions. In other words,
the only top level element is types.
The
second variant does not import a WSDL document but a
schema. In order to do so, it must use xsd:import as a
child element of schema, which, in turn, is a child of
types.
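The two mechanisms can be sketched as follows. This is a minimal illustration rather than the stockquote documents referred to above: every namespace URI and file name in it is a hypothetical placeholder, and Python's standard library is used only to show that the two import elements live in different namespaces.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Variant 1: wsdl:import pulls in a WSDL document whose only top-level
# element is types. All example.com URIs and file names are hypothetical.
VARIANT_1 = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://example.com/stockquote/service">
  <import namespace="http://example.com/stockquote/definitions"
          location="stockquote-types.wsdl"/>
</definitions>"""

# Variant 2: xsd:import, a child of schema inside types, pulls in a schema.
# The importing schema and the imported schema have different namespaces.
VARIANT_2 = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/stockquote/service">
  <types>
    <xsd:schema targetNamespace="http://example.com/stockquote/wrapper">
      <xsd:import namespace="http://example.com/stockquote/schema"
                  schemaLocation="stockquote.xsd"/>
    </xsd:schema>
  </types>
</definitions>"""


def imports(wsdl_text):
    """Return (kind, namespace, location) for every import in the document."""
    found = []
    for el in ET.fromstring(wsdl_text).iter():
        if el.tag == "{%s}import" % WSDL_NS:
            found.append(("wsdl:import", el.get("namespace"), el.get("location")))
        elif el.tag == "{%s}import" % XSD_NS:
            found.append(("xsd:import", el.get("namespace"), el.get("schemaLocation")))
    return found


print(imports(VARIANT_1))
print(imports(VARIANT_2))
```

Note that wsdl:import carries a location attribute while xsd:import carries schemaLocation, which is one more way to tell the two apart when reading a WSD.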
Note that the WSDL 1.1
specification's example 2, a stockquote service, does not do either of
these: it uses wsdl:import to import a schema at top level.
However, WS-I (draft) basic profile clarifies the import mechanism in rules
2001 to 2004 and castigates the W3C Note for "... incorrectly
show[ing] the WSDL import statement being used to import WXS
definitions".
The above examples are in essence the same as those the WS-I basic
profile offers as a correction to WSDL 1.1's, except that in the basic
profile examples the imported and importing element have the same target
namespace. In the case of xsd:import, this is wrong; the WXS
spec does not allow it. In the case of wsdl:import, it is
unfortunate; as pointed out in the previous installment, this is bad style
and should have been disallowed.
If it takes several documents to define a schema with a single
namespace, xsd:include or xsd:redefine should be
used.
This section is about which data definitions should be exposed and which should be hidden. The trade-off is between the potential for reuse and a narrow interface.
Rule 2203 of the basic profile stipulates that abstract message parts,
bound to a concrete message transporting an RPC invocation, should be
defined using the type attribute.
Rule 2204 states that abstract message parts used in document-style
invocation should have an element attribute. If you are
using SOAP, it is a good idea to try to stick to these rules, even though
it makes a mockery of the "abstract message" doctrine. Therefore, there
must be an exposed type definition for data passed as a parameter to an
RPC invocation and an exposed element declaration for a document-style
invocation. In the latter case, this means that the types
element may well end up with mainly element declarations and little or no
type declarations. It looks confusing but, as Roald Dahl's BFG said,
"what I mean and what I say are entirely different things."
The Russian doll design style defines root elements globally. Elements that cannot be a document's root are defined as the need arises and so are attributes and types; these definitions are nested in the definitions that use them. Such definitions nested inside another definition are said to be local and cannot be reused in other definitions, neither by other components in the same schema nor by external components. Moreover, type definitions are anonymous and cannot be referenced.
The salami slice style, on the other hand, declares all elements globally. A third design style is referred to as Venetian blinds: it defines all types globally but only exposes the elements that can be used as the root element of a document.
Examples 1, 2 and 3 illustrate the respective styles. All three have this instance document among their productions.
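For readers without the example files to hand, here is a rough sketch of the three styles. It is not the article's own example set: the Order and Item names and the urn:example:order namespace are invented for illustration. Each schema is meant to accept the same instance, `<Order xmlns="urn:example:order"><Item>widget</Item></Order>`, and the small Python helper simply lists what each schema exposes globally.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Shared opening tag; the urn:example:order namespace is hypothetical.
HEAD = ('<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
        'xmlns:tns="urn:example:order" targetNamespace="urn:example:order"')

# Russian doll: one global root element, everything else nested and local.
RUSSIAN_DOLL = HEAD + """ elementFormDefault="qualified">
  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Item" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Salami slice: every element is declared globally and referenced with ref.
SALAMI_SLICE = HEAD + """>
  <xsd:element name="Item" type="xsd:string"/>
  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="tns:Item"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Venetian blinds: types are global and named, only the root element is global.
VENETIAN_BLINDS = HEAD + """ elementFormDefault="qualified">
  <xsd:complexType name="OrderType">
    <xsd:sequence>
      <xsd:element name="Item" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="Order" type="tns:OrderType"/>
</xsd:schema>"""


def exposed(schema_text):
    """Return the names of the global elements and global types of a schema."""
    root = ET.fromstring(schema_text)
    elements = sorted(c.get("name") for c in root if c.tag == XSD + "element")
    types = sorted(c.get("name") for c in root
                   if c.tag in (XSD + "complexType", XSD + "simpleType"))
    return elements, types


for name, schema in [("Russian doll", RUSSIAN_DOLL),
                     ("Salami slice", SALAMI_SLICE),
                     ("Venetian blinds", VENETIAN_BLINDS)]:
    print(name, exposed(schema))
```

The helper only inspects the top level of each schema; it does not validate anything, which would need a schema processor outside the standard library.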
Clearly, none of these styles is optimal with respect to the trade-off
presented. However, it is instructive to contrast the three styles with
respect to the set criteria. In a web services context, the equivalent of
appearing as a root element of a document is to occur as the value of the
element attribute on an abstract message
part. Since neither Russian doll nor salami slice exposes
types, they cannot be used if you want to do RPC style invocation.
Venetian blinds, on the other hand, works with both RPC and document style
invocation. Venetian blinds encourages the reuse of types since it defines
them all globally. However, some types may not be intended for reuse,
and defining them globally still makes the interface less narrow.
For a document style web service, Russian doll could not be improved upon if the only objective were a narrow interface. It does not score well on the reuse front though. Salami slice sits at the other end of this spectrum with a high score for reuse and a low one for narrowness.
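To tie the design styles back to rules 2203 and 2204, the following sketch shows the two kinds of abstract message part side by side: one that points at an exposed type (RPC style) and one that points at an exposed element (document style). The message, part and schema names are hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical abstract messages; tns stands for the schema's target namespace.
MESSAGES = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://example.com/stockquote/schema">
  <!-- RPC style: the part refers to an exposed type definition -->
  <message name="GetQuoteRpcRequest">
    <part name="symbol" type="tns:TickerSymbol"/>
  </message>
  <!-- Document style: the part refers to a global element declaration -->
  <message name="GetQuoteDocRequest">
    <part name="body" element="tns:GetQuote"/>
  </message>
</definitions>"""


def part_styles(wsdl_text):
    """Map each message name to ('type' or 'element', referenced QName)."""
    root = ET.fromstring(wsdl_text)
    styles = {}
    for message in root.findall("{%s}message" % WSDL_NS):
        for part in message.findall("{%s}part" % WSDL_NS):
            kind = "type" if part.get("type") is not None else "element"
            styles[message.get("name")] = (kind, part.get(kind))
    return styles


print(part_styles(MESSAGES))
```

A schema in the Russian doll or salami slice style could only satisfy the second message, since it exposes no named types for the first one to reference.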
Namespaces were discussed briefly in the previous installment. There we asked what goes into the WSD's target namespace. Here I address the question of what goes into a W3C XML Schema namespace. The rules were briefly reviewed in the previous article, but here we go into more detail with the aid of some examples.
Elements, types and attributes that belong to a namespace are said to be qualified. The declaration of a target namespace is a necessary, but not sufficient condition for elements, types and attributes to be qualified. So when are they qualified and when unqualified?
Let us deal with types first, as they are easy: globally defined types, both simple and complex, are always qualified. Locally defined types are anonymous, so there is no way of referencing them; the question of which namespace they belong to is purely academic.
Global element declarations are also easy: globally declared elements are qualified.
To illustrate what we know so far, this
instance document is validated by this
schema. We see indeed that the 2 globally defined elements
Element and Response are part of the target
namespace; the locally defined Collection element is not.
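As a rough illustration of what such an instance document looks like, the sketch below assumes that Response contains a Collection, which in turn contains Element items. That nesting and the urn:example:response namespace are assumptions, since only the element names are given above. The helper lists each element with the namespace it belongs to.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: Response and Element are global, hence qualified;
# Collection is local and, with the default settings, unqualified.
INSTANCE = """<r:Response xmlns:r="urn:example:response">
  <Collection>
    <r:Element>first</r:Element>
    <r:Element>second</r:Element>
  </Collection>
</r:Response>"""


def qualified_names(xml_text):
    """Return (local name, namespace) for every element; '' means no namespace."""
    names = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag.startswith("{"):
            namespace, local = el.tag[1:].split("}", 1)
        else:
            namespace, local = "", el.tag
        names.append((local, namespace))
    return names


for local, namespace in qualified_names(INSTANCE):
    print(local, namespace or "(no namespace)")
```

The instance deliberately uses a prefix rather than a default namespace: with `xmlns="urn:example:response"` on the root, the unprefixed Collection element would silently become qualified.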
Whether or not attributes and locally defined elements are qualified is
governed by the form attribute. The attribute can take 2
values: qualified and unqualified. Therefore,
in order to qualify the Collection element in our previous
example, the schema can be reworked by setting form to qualified on that element's declaration.
You will find that it validates this
document.
form is not a required attribute, neither when declaring
attributes nor when declaring local elements. When it is omitted, form takes
its value from the elementFormDefault attribute (for local elements) or the
attributeFormDefault attribute (for attributes) on the schema element, or,
failing that, from the default value of these attributes, which is
unqualified in each case. So here
is another schema that validates the
document.
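The lookup order for local elements (explicit form, then elementFormDefault, then unqualified) can be sketched as below. The schema is a hypothetical reconstruction built around the Response, Collection and Element names used above, and the helper only mimics the defaulting rule; it is not a schema processor.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema template; the two placeholders let us vary the
# schema-level default and the form attribute on the local declaration.
TEMPLATE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:response"
    targetNamespace="urn:example:response" {schema_attrs}>
  <xsd:element name="Element" type="xsd:string"/>
  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Collection" {local_attrs}>
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="tns:Element" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def local_element_forms(schema_text):
    """Return the effective form of every locally declared element."""
    root = ET.fromstring(schema_text)
    default = root.get("elementFormDefault", "unqualified")
    forms = {}
    for top in root:
        for el in top.iter(XSD + "element"):
            # Skip global declarations and references to global declarations.
            if el is top or el.get("ref") is not None:
                continue
            forms[el.get("name")] = el.get("form", default)
    return forms


implicit = local_element_forms(TEMPLATE.format(schema_attrs="", local_attrs=""))
explicit = local_element_forms(
    TEMPLATE.format(schema_attrs="", local_attrs='form="qualified"'))
by_default = local_element_forms(
    TEMPLATE.format(schema_attrs='elementFormDefault="qualified"', local_attrs=""))
print(implicit, explicit, by_default)
```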
Note that the Russian doll and Venetian blinds example schemas must stipulate that elements are qualified by default in order to validate the same instance document as the salami slice example.
WSDL 1.1 recommends setting the elementFormDefault to
qualified and keeping the default for
attributeFormDefault. This should minimize the use of
explicit namespace qualifiers if you judiciously set the schema's target
namespace as the default namespace in your messages.
We have only skimmed the surface here; W3C XML Schema (see Resources for a full reference) devotes a complete chapter to controlling namespaces. However, the questions that you will most likely encounter are covered.
W3C XML Schema has 3 compositor elements that construct
complex data types from simpler ones: sequence,
choice and all. Particles are nested
inside compositor elements.
A sequence defines a compound structure in which the particles
occur in order. The particles within a choice are mutually
exclusive. However, there may be multiple occurrences of the chosen
particle. all defines an unordered group. For all three
compositors, the number of legal occurrences of the particles within them
is governed by the maxOccurs and minOccurs
attributes on those particles. These attributes are not required and
their default value is 1.
The simplest particle is an element.
sequence and choice can both act as particles
too. all cannot.
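A small sketch may help here. The Trade type below is hypothetical: a sequence containing two elements and a nested choice acting as a particle. The helper reads the occurrence bounds of each particle and applies the default of 1 when minOccurs or maxOccurs is absent.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical type: Symbol once, any number of Notes, then Buy or Sell.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Trade">
    <xsd:sequence>
      <xsd:element name="Symbol" type="xsd:string"/>
      <xsd:element name="Note" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
      <xsd:choice>
        <xsd:element name="Buy" type="xsd:decimal"/>
        <xsd:element name="Sell" type="xsd:decimal"/>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def occurs(particle):
    """Return (minOccurs, maxOccurs); both default to 1, None means unbounded."""
    minimum = int(particle.get("minOccurs", "1"))
    maximum = particle.get("maxOccurs", "1")
    return (minimum, None if maximum == "unbounded" else int(maximum))


def particles(schema_text, type_name):
    """List (kind, name, min, max) for the particles of a type's compositor."""
    root = ET.fromstring(schema_text)
    complex_type = root.find(XSD + "complexType[@name='%s']" % type_name)
    compositor = complex_type[0]  # the sequence, choice or all element
    return [(p.tag[len(XSD):], p.get("name")) + occurs(p) for p in compositor]


for particle in particles(SCHEMA, "Trade"):
    print(particle)
```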
The sequence compositor is the one that is most often
encountered in WSDs. This seems a good choice; even if, conceptually,
particles could occur in any order, nailing down the order will make
parsing of messages that bit easier. However, implementations often do
not observe the order constraints. This can be shown by invoking a web
service with elements in a different order from the one laid down by a
sequence: it often does not seem to matter. That is not such a bad thing.
After all, if the server is more liberal in what it accepts than it
strictly needs to be, this does not harm well-behaved clients and it
offers some margin for error on more sloppily implemented clients. In
other words, a server that does this can hardly be accused of being in
breach of contract. Not so if the server cannot guarantee the order of
the particles that it sends back. I once faced such a server
implementation and spent a good deal of time working through the
ramifications.
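The difference between a strict and a liberal receiver can be sketched in a few lines. The Quote message and its Symbol and Price children are invented for the purpose, and the sketch assumes the simplest case, in which each particle of the sequence occurs exactly once.

```python
import xml.etree.ElementTree as ET

# Order laid down by a hypothetical sequence: Symbol first, then Price.
DECLARED_ORDER = ["Symbol", "Price"]

IN_ORDER = "<Quote><Symbol>ACME</Symbol><Price>12.5</Price></Quote>"
OUT_OF_ORDER = "<Quote><Price>12.5</Price><Symbol>ACME</Symbol></Quote>"


def child_names(message_text):
    """Return the names of the child elements in document order."""
    return [child.tag for child in ET.fromstring(message_text)]


def strict_check(message_text):
    """Accept only messages whose children follow the declared order."""
    return child_names(message_text) == DECLARED_ORDER


def liberal_check(message_text):
    """Accept the declared children in any order."""
    return sorted(child_names(message_text)) == sorted(DECLARED_ORDER)


print(strict_check(OUT_OF_ORDER), liberal_check(OUT_OF_ORDER))
```

A liberal check on the receiving side is harmless; the trouble starts when the sender produces OUT_OF_ORDER and the receiver applies the strict check.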
The first reflex is to replace sequence with
all compositors. However, be aware that the remedy is not
without its problems since the expressiveness of this compositor has been
severely curtailed in the WXS spec. A detailed account of why this is so
and what the precise constraints are is beyond the current scope.
However, the main limitation has already been pointed out:
all cannot be used as a particle. Since derivation by
extension in effect uses the compositor of the base type as a particle in
the subtype, opportunities for reuse of types defined with
all are limited. Derivation is covered in further detail in
a dedicated section.
The current WXS Recommendation is 1.0 and its namespace is
http://www.w3.org/2001/XMLSchema. However, some
implementations still being used today follow the specifications of
previous working drafts, e.g.
http://www.w3.org/1999/XMLSchema. This is unfortunate and
the perpetrators should be encouraged to migrate to the released standard,
but if you should come across such implementations, here are two of the
common pitfalls. Firstly, there is a WXS data type in common use that has
changed from the 1999 to the 2001 version: 1999's timeInstant
became 2001's dateTime. Make sure that the data type you use
fits the version of WXS. Secondly, derivations also changed significantly
between 1999 and 2001. These will be covered in the following
section.
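One way to guard against the first pitfall is to derive the type name from the schema namespace in use. The sketch below covers only the two namespaces named above; the helper name and the xsd prefix are illustrative choices.

```python
XSD_1999 = "http://www.w3.org/1999/XMLSchema"
XSD_2001 = "http://www.w3.org/2001/XMLSchema"

# Name of the date-and-time type in each version of WXS discussed here.
DATE_TIME_TYPE = {XSD_1999: "timeInstant", XSD_2001: "dateTime"}


def date_time_qname(schema_namespace, prefix="xsd"):
    """Return the prefixed type name that fits the given schema namespace."""
    try:
        local_name = DATE_TIME_TYPE[schema_namespace]
    except KeyError:
        raise ValueError("unknown schema namespace: %s" % schema_namespace)
    return "%s:%s" % (prefix, local_name)


print(date_time_qname(XSD_1999))
print(date_time_qname(XSD_2001))
```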
Derivation is a technique to define subtypes of a given base type. There are two kinds of derivation in WXS: extension and restriction. The former adds components at the end of the content model of the base type, the latter constrains the base type. Hence valid instances of a subtype derived by extension are not necessarily valid instances of the base type. Valid instances of a subtype derived by restriction, on the other hand, are always valid instances of the base type.
A subtype may be used anywhere where its base type is used, unless
otherwise specified. This may have the following impact on message
definitions: assume that a message definition declares a part with type
Foo, and Bar is derived by extension from
Foo. A party may send an element of type Bar in
such a message. The recipient may be unable to validate this message.
Fortunately, it is possible to turn off the ability to substitute subtypes
for base types by using the block attribute on the base type
or on an element declared to be of a given base type.
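The Foo and Bar scenario might look as follows in a schema. The content of both types and the urn:example:foo namespace are made up; what matters is the block attribute on Foo. The helper checks whether a given derivation method is blocked for a type, falling back to the schema's blockDefault attribute when the type has no block attribute of its own.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: Bar extends Foo, but Foo blocks substitution
# by types derived by extension.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:foo" targetNamespace="urn:example:foo">
  <xsd:complexType name="Foo" block="extension">
    <xsd:sequence>
      <xsd:element name="id" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Bar">
    <xsd:complexContent>
      <xsd:extension base="tns:Foo">
        <xsd:sequence>
          <xsd:element name="extra" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>"""


def blocks(schema_text, type_name, method):
    """True if subtypes derived by the given method may not replace the type."""
    root = ET.fromstring(schema_text)
    complex_type = root.find(XSD + "complexType[@name='%s']" % type_name)
    blocked = complex_type.get("block", root.get("blockDefault", "")).split()
    return "#all" in blocked or method in blocked


print(blocks(SCHEMA, "Foo", "extension"))
print(blocks(SCHEMA, "Foo", "restriction"))
```

With this schema, a message part of type Foo can no longer be filled with a Bar, while subtypes derived by restriction remain acceptable.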
Beware of derivation by extension: that is the message of this section so far. But what about derivation by restriction? From the discussion so far, it seems reasonable enough. However, it becomes less attractive once you realize that each particle of the subtype's content model must be listed explicitly. This makes for very verbose definitions. It also does not bring the modularity benefits that an inheritance hierarchy in an OO programming language might bring: common features are not factored out, but must be repeated in each subtype. This is a change with respect to W3C XML Schema 1999 that caused a good deal of confusion.
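The verbosity is easy to see in a small example. In the hypothetical schema below, BareQuote restricts Quote merely by dropping the optional Note element, yet it has to repeat Symbol and Price in full. The helper lists the particles each definition spells out.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical base type and a subtype derived by restriction.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:quote" targetNamespace="urn:example:quote">
  <xsd:complexType name="Quote">
    <xsd:sequence>
      <xsd:element name="Symbol" type="xsd:string"/>
      <xsd:element name="Price" type="xsd:decimal"/>
      <xsd:element name="Note" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="BareQuote">
    <xsd:complexContent>
      <xsd:restriction base="tns:Quote">
        <xsd:sequence>
          <xsd:element name="Symbol" type="xsd:string"/>
          <xsd:element name="Price" type="xsd:decimal"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>"""


def listed_particles(schema_text, type_name):
    """Return the element names written out inside a type definition."""
    root = ET.fromstring(schema_text)
    complex_type = root.find(XSD + "complexType[@name='%s']" % type_name)
    return [el.get("name") for el in complex_type.iter(XSD + "element")]


print(listed_particles(SCHEMA, "Quote"))
print(listed_particles(SCHEMA, "BareQuote"))
```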
Defining an array is one of the most confusing issues in WSDL. It has also caused a great many interoperability problems, so proceed with caution. A common approach is to extend the Array type defined in the SOAP encoding schema. In fact, this is mandated by WSDL 1.1 (see section 2.2). I was therefore surprised to see that rules 2110 through 2112 of the WS-I Basic Profile overrule this. On the other hand, I understand the working group's position: WSDL 1.1 makes a pig's ear of array specifications. The basic profile's approach, by contrast, is simple.
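As a sketch of the simpler route, and assuming that the approach meant here is a plain repeating element rather than a type derived from the SOAP encoding Array, an array can be modeled with maxOccurs="unbounded". The PriceList type, the Prices message and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical array type: a sequence holding one repeating element.
ARRAY_TYPE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PriceList">
    <xsd:sequence>
      <xsd:element name="Price" type="xsd:decimal"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

# Hypothetical message carrying two array items.
MESSAGE = "<Prices><Price>12.50</Price><Price>13.25</Price></Prices>"


def repeating_elements(schema_text):
    """Return the names of elements declared with maxOccurs='unbounded'."""
    root = ET.fromstring(schema_text)
    return [el.get("name") for el in root.iter(XSD + "element")
            if el.get("maxOccurs") == "unbounded"]


def read_prices(message_text):
    """Collect the repeated Price elements into a list of decimals."""
    return [Decimal(p.text) for p in ET.fromstring(message_text).findall("Price")]


print(repeating_elements(ARRAY_TYPE))
print(read_prices(MESSAGE))
```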
When I originally planned this article, it was my intention to write a good deal about SOAP arrays, and how to use them in WSDs that are as near correct as is possible given the flaws in WSDL 1.1. However, given the basic profile's recommendation, the sensible thing is to avoid them altogether.
The purpose of this article was to flag some of the issues that require attention when modeling data. You should underestimate neither the importance of defining data nor the complexity of the task. It is important because the data passed across the web service interface largely determine the quality of the interface. It is complex because data modeling is inherently complex. Nonetheless, I cannot help feeling that W3C XML Schema 1.0 does not mitigate this complexity adequately. I look forward to tools better suited to data modeling for web services.
The W3C has published two normative documents on XML Schema: XML Schema Part 1: Structures and XML Schema Part 2: Datatypes. There is also a non-normative primer.
XML Schema by Eric van der Vlist, published by O'Reilly, 2002, proved to be an invaluable companion in my encounters with W3C XML Schema. Warmly recommended to anyone who is serious about data modeling with WXS.
xFront has an item on global versus local element and type declarations in its excellent best practices section. While you are browsing xFront, do have a look at what they have to say about web services as well, which is controversial and thought-provoking.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.