XML.com: XML From the Inside Out


Building a Worldwide Lexicon

May 10, 2002

This article describes a new system for networking dictionaries and translation services on the Web. Think of this as GNUtella for language services. While the system described in this article may appear to be a huge undertaking, it will be built from many relatively simple components that talk to each other via a common client/server interface based on SOAP (Simple Object Access Protocol).

This article is also a call to action. The ultimate goal of the Worldwide Lexicon (WWL) project is to improve multilingual communication, and to make language services easily accessible to a wide range of Internet applications. The potential uses for the system are extensive.

Why should you care? Simply turn on the television or read the news. These are dangerous times, in part because of poor communication between cultures. The most important barriers today are language barriers. The Worldwide Lexicon project has the potential to reduce these barriers somewhat, by enabling people to communicate more effectively in other languages using a variety of tools.

For WWL to succeed, it will need the talents of many people, both to retrofit existing applications (such as Web dictionary servers or IM/chat clients) and to build new services based on the WWL protocol. This is a challenging project, but also an interesting and potentially valuable one. If, after reading this article, you would like to take time out to help, visit the WWL Web site at http://www.worldwidelexicon.org to learn more.


The Worldwide Lexicon (www.worldwidelexicon.org) is an initiative to create a peer-to-peer system that allows programs and their users to automatically locate and communicate with dictionaries, encyclopedias, translation servers, and semantic networks throughout the Web.

The Worldwide Lexicon is also inspired by distributed computing projects such as SETI@home, and will allow participating dictionaries and encyclopedias to open their systems to user submissions (more on this in a moment), and to poll a large number of Internet users to submit definitions, score translations, and more.

SETI@home taps idle CPUs to crunch numbers; WWL servers will tap idle Internet users to provide information, review work from other users, and so forth. WWL asks computers to do what they excel at (manage large volumes of information), and asks humans to do what they excel at (infer meaning, describe things, etc.).

The foundation of the Worldwide Lexicon is a simple protocol based on SOAP. The Worldwide Lexicon protocol defines a small set of SOAP methods that creates a common interface to build client and server applications. The protocol provides three basic services:

  • Allows client applications to discover WWL servers automatically by invoking a single SOAP method on a supernode (e.g. find WWL servers that host English-Urdu translations). Clients can thus locate WWL servers based on the desired language and the services required.
  • Allows clients to submit queries to WWL-compliant dictionaries, encyclopedias, and semantic network servers (e.g. find synonyms for the English word "orb," find Spanish translations for the word "beach," etc.).
  • Allows clients to poll WWL servers to fetch requests for user contributions, translations, or peer review. This is one of the most interesting facets of the Worldwide Lexicon, and will be discussed later in this article.

GNUtella for Dictionaries

There are hundreds, perhaps thousands, of dictionaries, encyclopedias and translation servers scattered throughout the Web. They all perform the same basic functions. The problem is that these are mostly homegrown systems. Each dictionary has a slightly different front-end or CGI script. Consequently, all this information is fragmented and bottled up behind proprietary front-ends.

The Worldwide Lexicon solves this problem by creating a simple and easily implemented server-discovery mechanism. One of the methods defined in the WWL protocol, WWLFindServers, allows a client application to contact a supernode and request a list of currently active WWL servers for a specific language or language pair. WWL supernodes, like GNUtella directory servers, simply maintain a list of active sites and the services they provide; those sites may in turn keep lists of their own peers.
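To make the directory idea concrete, here is a minimal in-memory sketch in Python of what a supernode does. It is not the WWL implementation: the `ServerEntry` and `Supernode` classes, their field names, and the example URLs are all hypothetical, and a real supernode would expose the lookup as the WWLFindServers SOAP method rather than a local function call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServerEntry:
    """One directory record; every field name here is hypothetical."""
    url: str
    source_lang: str
    target_lang: str
    service: str  # e.g. "dict" or "translate"


@dataclass
class Supernode:
    """Keeps a plain list of active servers and the services they offer."""
    entries: list = field(default_factory=list)

    def register(self, entry: ServerEntry) -> None:
        # Ignore duplicate registrations of the same record.
        if entry not in self.entries:
            self.entries.append(entry)

    def find_servers(self, source_lang, target_lang, service):
        """Return the servers that match the language pair and service type."""
        return [
            e for e in self.entries
            if e.source_lang == source_lang
            and e.target_lang == target_lang
            and e.service == service
        ]


supernode = Supernode()
supernode.register(
    ServerEntry("http://dict.example/wwl", "english.adolescent", "english", "dict"))
supernode.register(
    ServerEntry("http://es.example/wwl", "english", "espanol", "translate"))

matches = supernode.find_servers("english.adolescent", "english", "dict")
print([e.url for e in matches])
```

The point of the sketch is that the supernode holds no dictionary data itself. It only answers the question "who can handle this language and service?", which is why it can stay small.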

Implementing this in client software is easy, and requires only a few lines of code, as illustrated by the following example (written in Visual Basic using the PocketSOAP tool, which I highly recommend for SOAP novices).

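' WWLsupernode is assumed to hold the address of a WWL supernode
set pf = CreateObject("pocketsoap.Factory")
set wwl = pf.CreateProxy(WWLsupernode)
' Ask the supernode for matching servers
serverlist = wwl.WWLFindServers("english.adolescent","english","","dict")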

That's pretty simple. These three generic lines of code allow you to locate WWL servers on the fly (for example, you could use this to build a Web browser plug-in that performs generic dictionary and encyclopedia queries). This example returns a serverlist object that contains a list of servers that match the search criteria, their proxy addresses, etc.
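Behind the proxy object, a call like this travels as an XML message. The Python sketch below shows roughly what a SOAP 1.1 request for WWLFindServers could look like. The envelope namespace is the standard SOAP 1.1 one, but the `urn:example:wwl` namespace and the parameter element names `arg1` to `arg4` are placeholders of mine; the article does not give the real ones.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WWL_NS = "urn:example:wwl"  # hypothetical; the article does not give a namespace


def build_request(method, params):
    """Serialize a method call as a SOAP 1.1 envelope (RPC style)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method name becomes the single child element of the SOAP body.
    call = ET.SubElement(body, f"{{{WWL_NS}}}{method}")
    for name, value in params:
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


# Same four arguments as the Visual Basic call; element names are placeholders.
xml_text = build_request(
    "WWLFindServers",
    [("arg1", "english.adolescent"), ("arg2", "english"),
     ("arg3", ""), ("arg4", "dict")],
)
print(xml_text)
```

A SOAP toolkit generates and parses messages of this kind for you, which is why the client code can stay at three lines.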

Once you've located a WWL server, the next step is to send a query. This also requires just a few lines of code.

' Connect to the first server in the list, then submit the query
set wwl_server = pf.CreateProxy(serverlist(1).wdsl)
results = wwl_server.WWLTranslate("english.adolescent","english","rad")

This example returns a results object which contains an array of possible translations for the search term.

Congratulations, you can now proceed to decode your teenage daughter's utterances. Or you could just as easily say WWLTranslate("english.british","english.us","cheeky"). Or WWLTranslate("english","espanol","beach"). Or, you could submit a sentence or paragraph to a full-text translation server.

Regardless of whether you want to look up a definition within a language (e.g. look up an encyclopedia entry for The Beach Boys), or translate a word or phrase between languages, or submit a full text to a machine translation server, the procedure is exactly the same. You use WWLFindServers() to hunt for a Worldwide Lexicon server that can handle your request. Then you submit a query using one of three simple functions (WWLSearchText, WWLTranslate, or WWLQuery).
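The two-step procedure can be sketched end to end in Python with in-memory stand-ins for the network. Everything below is illustrative: `FakeSupernode` and `FakeServer` are hypothetical classes, the translation data is made up, and WWLFindServers is simplified to two arguments (the Visual Basic call above passes four).

```python
class FakeServer:
    """In-memory stand-in for a WWL server; the data is made up."""

    def __init__(self, translations):
        # Maps (source_lang, target_lang, term) -> list of translations.
        self.translations = translations

    def WWLTranslate(self, source_lang, target_lang, term):
        return self.translations.get((source_lang, target_lang, term), [])


class FakeSupernode:
    """In-memory stand-in for a supernode directory."""

    def __init__(self, servers):
        # Maps (source_lang, target_lang) -> list of server objects.
        self.servers = servers

    def WWLFindServers(self, source_lang, target_lang):
        return self.servers.get((source_lang, target_lang), [])


def translate(supernode, source_lang, target_lang, term):
    """Step 1: discover servers. Step 2: query them until one answers."""
    for server in supernode.WWLFindServers(source_lang, target_lang):
        results = server.WWLTranslate(source_lang, target_lang, term)
        if results:
            return results
    return []  # no server found, or none had an answer


server = FakeServer({("english.adolescent", "english", "rad"): ["excellent"]})
supernode = FakeSupernode({("english.adolescent", "english"): [server]})
print(translate(supernode, "english.adolescent", "english", "rad"))
```

Because the client never hard-codes a server address, the same `translate` function works for any language pair that some registered server happens to support.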

Of course, you'll want to add some extra code to trap errors and to process the returned results. Servers also differ in what they support: more sophisticated WWL servers will recognize WWLQuery, which allows clients to submit SQL-like queries (for example, to search for synonyms of a word that can only be used as a noun), while others will recognize only the simpler WWLSearchText and WWLTranslate methods. But even after adding these few extra features, it is still a very simple interface.
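One way to cope with servers of differing sophistication is to try the richer method first and fall back to a simpler one. The Python sketch below assumes a hypothetical `MethodNotSupported` error and a hypothetical `call` interface; with a real SOAP toolkit the failure would more likely surface as a SOAP fault. The query string is only a placeholder, since the article does not define the WWLQuery syntax.

```python
class MethodNotSupported(Exception):
    """Hypothetical error raised when a server lacks a method."""


class SimpleServer:
    """Stand-in for a basic server that only implements WWLTranslate."""

    def call(self, method, *args):
        if method == "WWLTranslate":
            return ["playa"]  # made-up result for illustration
        raise MethodNotSupported(method)


def query_with_fallback(server, rich_query, simple_args):
    """Try WWLQuery first; fall back to WWLTranslate if it is unsupported."""
    try:
        return "WWLQuery", server.call("WWLQuery", rich_query)
    except MethodNotSupported:
        return "WWLTranslate", server.call("WWLTranslate", *simple_args)


method_used, results = query_with_fallback(
    SimpleServer(),
    "placeholder query text",  # the real WWLQuery syntax is not given here
    ("english", "espanol", "beach"),
)
print(method_used, results)
```

Returning the name of the method that actually answered lets the caller know whether the results reflect the full query or only the simpler lookup.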
