
Google's Gaffe
by Paul Prescod | Pages: 1, 2, 3

Handling the Response

Here is the Google SOAP API's response message (formatted for readability, the original is available).

HTTP/1.1 200 OK
Date: Thu, 18 Apr 2002 01:41:08 GMT
Content-Length: 1325
Content-Type: text/xml; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope 
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearchResponse 
      xmlns:ns1="urn:GoogleSearch" 
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="ns1:GoogleSearchResult">
        <documentFiltering 
          xsi:type="xsd:boolean">false</documentFiltering>
        <estimatedTotalResultsCount
          xsi:type="xsd:int">0</estimatedTotalResultsCount>
        <directoryCategories 
          xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
          xsi:type="ns2:Array"
          ns2:arrayType="ns1:DirectoryCategory[0]">
        </directoryCategories>
        <searchTime xsi:type="xsd:double">0.071573</searchTime>
        <resultElements 
          xmlns:ns3="http://schemas.xmlsoap.org/soap/encoding/"
          xsi:type="ns3:Array"
          ns3:arrayType="ns1:ResultElement[0]">
        </resultElements>
        <endIndex xsi:type="xsd:int">0</endIndex>
        <searchTips xsi:type="xsd:string"></searchTips>
        <searchComments xsi:type="xsd:string"></searchComments>
        <startIndex xsi:type="xsd:int">0</startIndex>
        <estimateIsExact
          xsi:type="xsd:boolean">false</estimateIsExact>
        <searchQuery
          xsi:type="xsd:string">constantrevolution
            rules xml</searchQuery>
      </return>
    </ns1:doGoogleSearchResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Note that the resultElements element is empty because we asked for zero hits. I warned you that we would have quite a bit of XML to work with even without looking at any hits.

HTTP allows any media type in the response to a message, so the service could return any XML vocabulary whatsoever. The WSDL definition of the Google SOAP API already embeds a simple schema, and I will use that as the basis of a schema for responses to search requests. I merely have to remove a few SOAP-isms (the SOAP-Enc:arrayType attribute, the SOAP-ENV:Envelope element, etc.) and choose a new root element type (I chose searchResult).

I've called the new language GoogleML. I've written a tiny XSLT stylesheet, pureGoogle.xsl, which translates GoogleSOAP into GoogleML. The resulting documents are smaller and simpler.

And here's a GoogleML equivalent to the SOAP response:

HTTP/1.1 200 OK
Date: Thu, 18 Apr 2002 02:29:56 GMT
Content-Type: text/xml

<searchResult xmlns="http://www.prescod.net/google_search_result">
    <documentFiltering>false</documentFiltering>
    <estimatedTotalResultsCount>0</estimatedTotalResultsCount>
    <directoryCategories/>
    <searchTime>0.03261</searchTime>
    <resultElements/>
    <endIndex>0</endIndex>
    <searchTips/>
    <searchComments/>
    <startIndex>0</startIndex>
    <estimateIsExact>false</estimateIsExact>
    <searchQuery>constantrevolution rules xml</searchQuery>
</searchResult>

HTTP already provides the envelope, so the SOAP-ENV:Envelope element would be redundant. The types are known in advance so they are stripped (although I could just as easily have left them in). GoogleML documents are not constrained to the subset of XML supported by SOAP. They may use any feature in standard XML, including a DOCTYPE, a DTD and processing instructions.
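
The unwrapping described above can also be sketched in a few lines of Python. This is not the pureGoogle.xsl stylesheet, only a rough illustration of the same idea using the standard library; the function names are hypothetical, and collapsing whitespace inside text values is an assumption made to match the GoogleML sample shown earlier.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_ML = "http://www.prescod.net/google_search_result"


def _local_name(tag: str) -> str:
    # "{namespace}name" -> "name"; unqualified names pass through unchanged.
    return tag.rsplit("}", 1)[-1]


def _copy_without_soap(src: ET.Element, name: str) -> ET.Element:
    # Rebuild the element in the GoogleML namespace. Attributes such as
    # xsi:type and arrayType are deliberately not copied.
    dst = ET.Element(f"{{{GOOGLE_ML}}}{name}")
    # Assumption: whitespace runs in text values are collapsed to one space.
    dst.text = " ".join((src.text or "").split()) or None
    for child in src:
        dst.append(_copy_without_soap(child, _local_name(child.tag)))
    return dst


def soap_to_googleml(soap_xml: str) -> ET.Element:
    """Drop Envelope, Body and the response wrapper; keep only the payload."""
    envelope = ET.fromstring(soap_xml)
    payload = envelope.find(f"{{{SOAP_ENV}}}Body/*/return")
    if payload is None:
        raise ValueError("no <return> element found in the SOAP body")
    return _copy_without_soap(payload, "searchResult")


# A trimmed-down version of the SOAP response shown earlier.
SAMPLE = """<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearchResponse xmlns:ns1="urn:GoogleSearch">
      <return xsi:type="ns1:GoogleSearchResult">
        <estimatedTotalResultsCount xsi:type="xsd:int">0</estimatedTotalResultsCount>
        <resultElements
          xmlns:ns3="http://schemas.xmlsoap.org/soap/encoding/"
          xsi:type="ns3:Array"
          ns3:arrayType="ns1:ResultElement[0]">
        </resultElements>
        <searchQuery xsi:type="xsd:string">constantrevolution
            rules xml</searchQuery>
      </return>
    </ns1:doGoogleSearchResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

if __name__ == "__main__":
    print(ET.tostring(soap_to_googleml(SAMPLE), encoding="unicode"))
```

The sketch only handles the response shape shown here; a real translation would also need to decide how array items inside resultElements should be named.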

Handling the Other Methods

Next let's look at the getCachedPage method. Here is the original SOAP request (again formatted, original here).

POST /search/beta2 HTTP/1.1
Host: api.google.com
Accept-Encoding: identity
Content-Length: 577
SOAPAction: urn:GoogleSearchAction
Content-Type: text/xml; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGetCachedPage xmlns:ns1="urn:GoogleSearch" 
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <key xsi:type="xsd:string">0000</key>
      <url xsi:type="xsd:string">
          http://www.constantrevolution.com</url>
    </ns1:doGetCachedPage>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Now compare that to an HTTP-style cachedPage request:

GET /cgi/cached_page.py?key=0000&url=http%3A%2F%2Fwww.constantrevolution.com HTTP/1.0
Host: mymachine
User-agent: Python-urllib/1.15
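
For illustration, a request like this one can be built with nothing more than Python's standard library. The host name and script path are taken from the example request; the function names are hypothetical, and the fetch function is a sketch that assumes such a service is actually running at that address.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address taken from the example request above.
BASE = "http://mymachine/cgi/cached_page.py"


def cached_page_url(key: str, url: str, base: str = BASE) -> str:
    # urlencode percent-escapes the ":" and "/" characters in the url parameter.
    return base + "?" + urlencode({"key": key, "url": url})


def fetch_cached_page(key: str, url: str) -> bytes:
    # The body comes back as ordinary HTML bytes; nothing needs unwrapping.
    with urlopen(cached_page_url(key, url)) as response:
        return response.read()


if __name__ == "__main__":
    print(cached_page_url("0000", "http://www.constantrevolution.com"))
```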

Believe it or not, there is an even more dramatic improvement in the response. Here is the SOAP response (I have elided the majority of the embedded base64-encoded data and reformatted; unformatted version).

HTTP/1.1 200 OK
Date: Thu, 18 Apr 2002 03:01:40 GMT
Content-Length: 10744
Content-Type: text/xml; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope 
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGetCachedPageResponse
      xmlns:ns1="urn:GoogleSearch"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return
        xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
        xsi:type="ns2:base64">PG1lEFESF132FE...</return>
    </ns1:doGetCachedPageResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The SOAP API must base64-encode the data because the result is HTML (not well-formed XML) and perhaps could eventually include binary formats like Word documents.

Here is the HTTP-style response:

HTTP/1.1 200 OK
Date: Thu, 18 Apr 2002 03:44:34 GMT
Content-Type: text/html

<html><head>...</head></html>

Because HTTP has no problem directly embedding HTML (or any other textual or binary data type), there is no reason to base64-encode the data. Base64 data is always more verbose than unencoded binary data; the HTTP/URI version of the service will always save bandwidth and CPU power. By the way, Google already provides a service to get a cached page using exactly this technique.
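
The size cost of base64 is easy to check for yourself. The short Python sketch below uses a made-up payload rather than a real cached page; it shows only that every 3 bytes of input become 4 characters of output, so the encoded form is roughly a third larger before any SOAP markup is added around it.

```python
import base64


def encoded_size(raw_length: int) -> int:
    # Standard base64 with padding: every 3 input bytes become 4 output
    # characters, and a final partial group is padded to a full 4.
    return 4 * ((raw_length + 2) // 3)


def overhead_ratio(payload: bytes) -> float:
    # Ratio of encoded size to raw size for a non-empty payload.
    return len(base64.b64encode(payload)) / len(payload)


if __name__ == "__main__":
    # Hypothetical stand-in for a cached HTML page.
    page = b"<html><head>...</head></html>" * 100
    print(len(page), len(base64.b64encode(page)), overhead_ratio(page))
```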

For brevity, I'll skip comparing the doSpellingSuggestion requests and go directly to the responses. Consider the old-style SOAP doSpellingSuggestion response (unformatted version):

HTTP/1.1 200 OK
Date: Thu, 18 Apr 2002 03:01:40 GMT
Content-Length: 10744
Content-Type: text/xml; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse
      xmlns:ns1="urn:GoogleSearch"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">britney spears</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Compare that with the HTTP version:

HTTP/1.1 200 OK
Date: Thu, 18 Apr 2002 04:08:27 GMT
Content-Type: text/plain

britney spears

I could have used XML for the response, but it's overkill for this job.
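
The difference shows up in client code as well. The following Python sketch, with hypothetical function names, extracts the suggestion from each of the two responses above: the SOAP body has to be parsed and navigated, while the plain-text body only has to be decoded.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def suggestion_from_soap(body: bytes) -> str:
    # Parse the envelope and walk Envelope -> Body -> response -> return.
    root = ET.fromstring(body)
    ret = root.find(f"{{{SOAP_ENV}}}Body/*/return")
    if ret is None:
        raise ValueError("no <return> element found in the SOAP body")
    return (ret.text or "").strip()


def suggestion_from_plain(body: bytes, charset: str = "utf-8") -> str:
    # The text/plain body is the answer itself.
    return body.decode(charset).strip()


# The SOAP response body shown above, as bytes.
SOAP_BODY = b"""<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse
      xmlns:ns1="urn:GoogleSearch"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">britney spears</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

PLAIN_BODY = b"britney spears\n"

if __name__ == "__main__":
    print(suggestion_from_soap(SOAP_BODY))
    print(suggestion_from_plain(PLAIN_BODY))
```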

Statically Declaring Types

Many people assume that the difference between HTTP-based services and RPC-based services is that the former are loosely or dynamically typed and the latter are strongly or statically typed. That's not especially true. The choice between HTTP and SOAP is a choice between protocols. The decision to statically type-check information passing across the wire has more to do with service description. The two issues are totally separate.

I've already demonstrated how one can strongly type-declare the responses to HTTP-based services using W3C XML Schema. If you want to type-declare the URI query parameters, then you can use a language designed for type-declaring HTTP-based services like Web Resource Description Language (WRDL). WRDL is still under development, but you can already solve this problem today using the more popular WSDL.

Although WSDL is most often used with SOAP, it can in fact type-declare the parameters for simple HTTP services. Here is the relevant bit of a WSDL for my HTTP version of the doGetCachedPage method:

<operation name="doGetCachedPage">
  <http:operation location="/cached_page"/>
  <input>
    <http:urlEncoded/>
  </input>
  <output>
   <mime:content type="text/html"/>
  </output>
</operation>

It turns out that WSDL's handling of <http:urlEncoded/> works almost perfectly: it gets the parameter names from the operation's part names. If you know WSDL, that will probably be clear to you. If not, don't worry; it won't affect your understanding of what follows. The input descriptions for the other two methods are identical, so we will concentrate on the output elements.

The output for the doSpellingSuggestion has a media-type of "text/plain":

<output><mime:content type="text/plain"/></output>

Finally, the output for doGoogleSearch is in XML so we declare that directly:

<output><mime:mimeXml part="searchResult"/></output>

This refers to a part named searchResult which is based upon an element type of the same name.

I do not want to oversell WSDL's HTTP features. You cannot define sophisticated HTTP-based web services with WSDL. It falls down as soon as a web resource generates links to another web resource. WSDL cannot express the data type of the target resource. In other words it can describe only one resource and not the links between resources. SOAP lacks a first-class concept of resource and especially lacks a syntax for linking them. It is thus not surprising that WSDL inherits this flaw. Nevertheless, I hope that this weakness will be corrected in future versions of WSDL. In the meantime, this is the primary reason for the existence of the WRDL language.

But you do not have to wait for WRDL. WSDL is sufficient for Google's current API because the API does not make use of hyperlinks. This is a common failing of SOAP-based APIs which follows from the component-centric thinking that SOAP encourages.

What this means practically is that it is possible to generate statically typed APIs for languages like C# and Java for the Google/HTTP interface. For instance, I can generate a C# interface from an HTTP-based WSDL description, and the generated code can have exactly the same strongly typed interface as the C# generated from the SOAP/WSDL version.

Here is the strongly typed C# code to do a Google search. The only thing that has changed from the version shipped with the Google API is the class name:

PureXMLGoogleHTTPBinding s = new PureXMLGoogleHTTPBinding();
// Invoke the search method
GoogleSearchResult r = s.doGoogleSearch(keyBox.Text, 
    searchBox.Text,
    0, 1, false, "", 
    false, "", "", "");

Strong type checking (in C# and VB.NET) and easy API use (in a language like Python or Perl) are completely unrelated to the choice of protocol. It is .NET's WSDL and XML Schema features which handle the strong type checking, not its SOAP features. According to Don Box (one of the key inventors of SOAP), "At this point, I believe most SOAP plumbers have conceded that XML Schema will be the dominant type system and metadata format for interop."
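
To make the same point from the Python side, here is a minimal sketch of a typed result object filled in from a GoogleML document. The class and function names are hypothetical, the field list covers only the scalar elements of the sample response shown earlier, and the conversions are written by hand rather than generated from the schema. Nothing in it depends on whether the XML arrived in a SOAP envelope or as a plain HTTP response body.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

NS = {"g": "http://www.prescod.net/google_search_result"}


@dataclass
class SearchResult:
    document_filtering: bool
    estimated_total_results_count: int
    search_time: float
    start_index: int
    end_index: int
    estimate_is_exact: bool
    search_query: str


def _text(root: ET.Element, name: str) -> str:
    element = root.find(f"g:{name}", NS)
    if element is None:
        raise ValueError(f"missing <{name}> element")
    return (element.text or "").strip()


def _boolean(value: str) -> bool:
    # The lexical forms XML Schema allows for booleans.
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_search_result(xml_text: str) -> SearchResult:
    root = ET.fromstring(xml_text)
    return SearchResult(
        document_filtering=_boolean(_text(root, "documentFiltering")),
        estimated_total_results_count=int(_text(root, "estimatedTotalResultsCount")),
        search_time=float(_text(root, "searchTime")),
        start_index=int(_text(root, "startIndex")),
        end_index=int(_text(root, "endIndex")),
        estimate_is_exact=_boolean(_text(root, "estimateIsExact")),
        search_query=_text(root, "searchQuery"),
    )


# The GoogleML response shown earlier in the article.
GOOGLEML = """<searchResult xmlns="http://www.prescod.net/google_search_result">
    <documentFiltering>false</documentFiltering>
    <estimatedTotalResultsCount>0</estimatedTotalResultsCount>
    <directoryCategories/>
    <searchTime>0.03261</searchTime>
    <resultElements/>
    <endIndex>0</endIndex>
    <searchTips/>
    <searchComments/>
    <startIndex>0</startIndex>
    <estimateIsExact>false</estimateIsExact>
    <searchQuery>constantrevolution rules xml</searchQuery>
</searchResult>"""

if __name__ == "__main__":
    print(parse_search_result(GOOGLEML))
```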

HTTP works equally well with XML Schema, RELAX NG, Schematron and DT4DTDs because it strongly separates protocol data (which is in MIME format) from payload data (which is in XML).

In the spirit of truth in advertising, I should admit that to get this actually working I had to work around several bugs in Microsoft's WSDL toolkit. I hope that WSDL implementors will take its HTTP binding seriously and implement it properly. Once the bugs are fixed, Microsoft's WSDL/HTTP interface to the service for C# and VB programmers will be the same in every detail as the WSDL/SOAP interface. This is true of any complete implementation of the WSDL specification because it is WSDL and XML Schema that define the service's interface, not SOAP.

In the meantime, the bugs really do not matter because Google ships a .NET interface to the service in its toolkit. Google could easily ship an HTTP-based interface. To client code there would be no difference. The same argument applies to the supplied Java binding. It would be relatively simple to ship an HTTP-based binding instead of the existing SOAP one.
