Building a Worldwide Lexicon
by Brian Jepson
Multilingual Chat
Multilingual chat is another interesting (and challenging) application for the Worldwide Lexicon. Imagine being able to use a chat program that assists you in composing messages in another language.
The chat program could do this in several ways: 1) it could use WWL to locate and query full-text translation servers, requesting automatic machine translations of your incoming or outgoing messages; 2) it could use dictionary tools to help you compose messages in another language; and/or 3) it could use WWL-IM gateway servers to ask other live users to translate a word or passage.
Machine Translation
Thanks to WWL, adding machine translation to a chat program is very easy. The chat program simply invokes WWLFindServers() to fetch a list of active MT servers for a language pair, and then invokes WWLTranslate() to ask one of these servers to translate the message text.
The chat program does not need to know anything about how the MT server processes and translates text. (For more info see Using WWL To Talk To Machine Translation Servers.)
The only caveat is that machine translation often misinterprets words, especially slang and metaphors, so this approach should be complemented by dictionary queries and, if needed, by queries to live agents.
NOTE: Automatic translations will occasionally be inaccurate, but chat is an interactive medium. If a message comes out garbled, users can simply rephrase it and send it again.
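The find-then-translate flow described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: `find_servers` and `translate` are plain callables standing in for the WWLFindServers() and WWLTranslate() SOAP calls, whose real parameters are not described here, and the fake servers and language names are made up for illustration. Falling back to the next server when one is unreachable is one possible design choice, not something the WWL protocol is known to require.

```python
from typing import Callable, List, Optional

# HYPOTHETICAL stand-ins for the WWL SOAP calls; the real signatures may differ.
# find_servers(source_lang, target_lang) -> list of MT server addresses
FindServers = Callable[[str, str], List[str]]
# translate(server, text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str, str], str]


def machine_translate(text: str, source_lang: str, target_lang: str,
                      find_servers: FindServers,
                      translate: Translate) -> Optional[str]:
    """Try each active MT server for the language pair; return the first result."""
    for server in find_servers(source_lang, target_lang):
        try:
            return translate(server, text, source_lang, target_lang)
        except ConnectionError:
            # Server unreachable: move on to the next one in the list.
            continue
    return None  # No server could handle the request.


# Usage with made-up servers (one down, one up), for illustration only.
def fake_find(src: str, dst: str) -> List[str]:
    return ["http://down.example/mt", "http://up.example/mt"]


def fake_translate(server: str, text: str, src: str, dst: str) -> str:
    if "down" in server:
        raise ConnectionError("unreachable")
    return {"hello": "hola"}.get(text, text)


print(machine_translate("hello", "english", "spanish", fake_find, fake_translate))  # hola
```

Because the chat client only sees the two callables, swapping one MT server for another requires no change to the client, which is the point the article makes about not needing to know how the server translates.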
Inline Dictionaries and Translation Aids
A user who understands a foreign language, but has a weak vocabulary, may use the chat client in a different mode. Instead of asking the chat client to relay messages to a full-text translation server, the program uses word/phrase dictionaries to guide the user in composing messages in another language.
In this mode, the chat client would monitor you as you type. Each time you typed a word or phrase that it did not know, the chat client would query a WWL server to look up possible translations for the entry. If there was a direct translation, the client would insert it automatically. If the word had many uses or meanings, it would force you to clarify or disambiguate your statement via a dialog box, an extra keystroke, or similar.
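A rough sketch of this typing assistant follows, assuming a hypothetical `lookup` callable that stands in for a WWL dictionary query and a hypothetical `choose` callback that stands in for the dialog box or extra keystroke. For simplicity it works word by word and ignores phrases. It also remembers each resolved word, which is one way to interpret "a word or phrase that it does not know"; that caching is an assumption, not something the article specifies.

```python
from typing import Callable, Dict, List

# HYPOTHETICAL: returns candidate translations for a word (empty list if none).
Lookup = Callable[[str], List[str]]
# HYPOTHETICAL UI hook: given a word and its candidates, returns the user's pick.
Chooser = Callable[[str, List[str]], str]


def compose(words: List[str], known: Dict[str, str],
            lookup: Lookup, choose: Chooser) -> List[str]:
    """Translate a typed message word by word, asking the user only when a word is ambiguous."""
    out: List[str] = []
    for word in words:
        if word in known:                 # Already resolved earlier in this session.
            out.append(known[word])
            continue
        candidates = lookup(word)         # Query a (hypothetical) WWL dictionary server.
        if not candidates:
            out.append(word)              # No translation found: keep the word as typed.
            continue
        if len(candidates) == 1:
            choice = candidates[0]        # Direct translation: insert it automatically.
        else:
            choice = choose(word, candidates)  # Several meanings: make the user disambiguate.
        known[word] = choice              # Remember the choice so we don't ask again.
        out.append(choice)
    return out


# Usage with a tiny made-up dictionary; the chooser always picks the second sense.
dictionary = {"cat": ["gato"], "bank": ["banco", "orilla"]}
picked = compose(["the", "cat", "bank"], {},
                 lambda w: dictionary.get(w, []),
                 lambda w, c: c[1])
print(picked)  # ['the', 'gato', 'orilla']
```

Remembering choices keeps the "extra work" the article mentions to a minimum: the user disambiguates "bank" once per session rather than every time it appears.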
This approach would be very tedious for document translation, but chat is a real-time medium. Users have already adapted to chat systems by creating their own grammar and vocabularies. A cleverly designed WWL chat client would require the user to do some extra work, but not much. Users would learn to use words that have specific meanings, and to use simple word order.
While such a system would not produce perfect translations, it might suffice for informal real-time communication between users who speak different languages. It would certainly be a useful tool for people who know a language but have a poor vocabulary. For example, I studied Latin and Spanish in school, so I have a general understanding of several European languages. My vocabulary, however, is terrible. A tool like this would allow me to communicate more effectively, and also help me to improve my vocabulary.
Other Applications
These are just two examples. What is interesting about WWL is that it will enable developers to embed dictionary and semantic net features in any program that can invoke SOAP methods.
Other ideas include programs that automatically generate glossaries for Web sites (usually a tedious chore for Webmasters), smarter email filtering software, or even a perverted version of Microsoft's infernal paper clip that informs its user of the sexual or scatological connotations of seemingly innocent words.
In short, any application that could benefit by being able to query a dictionary or semantic net could use WWL.
This Is All Bullshit, It Will Never Work
When people first learn of this system, the initial response is usually TMMP (too many moving parts). At first glance, it does seem like a complicated system, one that is perhaps destined to collapse under its own weight.
While the system as a whole can be used to accomplish some nifty tricks, it is composed of a collection of simple elements. These elements each perform a specific task, are easy to code, and do not need to know a great deal about other components of the system. This enables developers or information providers to focus on a specific aspect of the system without worrying about what everybody else is up to. Some examples:
- Supernodes (directory servers): These are the servers your client app talks to when it needs to find an english.adolescent --> english.oldfart dictionary. All they do is match clients up with active WWL servers that can field their query. They don't process dictionary searches. They don't process user contributions. They just say, "You're looking for an English-Urdu server? Here you go. Now move along."
- Read-only dictionary servers: These WWL servers allow clients to perform queries, but do not accept contributions from users. They implement some methods defined in the WWL protocol, but not others. Because they do not accept user contributions, they don't need to know anything about the posting procedures, the Lexicon@Home client, etc. They just accept lookup queries, and reply with WWL-compliant results.
- Gateway servers: These servers simply translate incoming SOAP/WWL requests into other protocols (e.g. DICT, Jabber, proprietary HTTP CGI), and then report the results back via the SOAP/WWL interface.
- Lexicon@Home client: All this application does is sense when the user appears to be available to do a small amount of work, and then invoke two methods on a WWL server that the user has volunteered to contribute to. If the WWL server has some work for the user, it replies with a job ID and a Web URL. The client points the user's Web browser (or its own mini-browser) at this URL. It doesn't know anything about the internal details of how the WWL server assigns jobs to users. It also doesn't know anything about a particular WWL server's data entry procedure. It just points the browser to the URL; the code for the data entry form is served by the WWL server handling the request.
- Read/write WWL servers with editor- or user-controlled submissions: Each Worldwide Lexicon server that allows public submissions will probably approach this slightly differently. Some will ask their users to provide detailed information (for example, to fully conjugate verbs), while others will collect simpler entries. Decisions about what data to request, and about how it is stored internally, are left to each WWL server owner.
- Client applications: Most client applications that use the Worldwide Lexicon system do not need to know how to do anything besides locate WWL servers (using the WWLFindServers method) and submit queries to them. For example, a Web browser or text editor plug-in that allows its user to fetch definitions for words or phrases doesn't need to know anything about how to post new entries to the system. A more sophisticated client program might implement these features as well.
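To illustrate how little each piece needs to know, here is a toy, in-process Python sketch of three of the components above: a supernode, a read-only dictionary server, and a client. The class and method names are hypothetical; in the real system each would be a separate SOAP endpoint, and `find_servers` merely stands in for the WWLFindServers method. Merging distinct results from every matching server is an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LangPair = Tuple[str, str]  # (source language, target language)


@dataclass
class Supernode:
    """HYPOTHETICAL directory server: matches clients with WWL servers, never answers lookups."""
    registry: Dict[LangPair, List["ReadOnlyServer"]] = field(default_factory=dict)

    def register(self, server: "ReadOnlyServer") -> None:
        self.registry.setdefault(server.pair, []).append(server)

    def find_servers(self, source: str, target: str) -> List["ReadOnlyServer"]:
        # "You're looking for an English-Urdu server? Here you go."
        return list(self.registry.get((source, target), []))


@dataclass
class ReadOnlyServer:
    """HYPOTHETICAL dictionary server: answers lookups, accepts no contributions."""
    pair: LangPair
    entries: Dict[str, List[str]]

    def lookup(self, term: str) -> List[str]:
        return self.entries.get(term, [])


def client_lookup(directory: Supernode, term: str, source: str, target: str) -> List[str]:
    """Minimal client: locate servers, query each one, merge distinct results in order."""
    results: List[str] = []
    for server in directory.find_servers(source, target):
        for hit in server.lookup(term):
            if hit not in results:
                results.append(hit)
    return results


# Usage with two made-up English-Spanish servers.
directory = Supernode()
directory.register(ReadOnlyServer(("english", "spanish"), {"dog": ["perro"]}))
directory.register(ReadOnlyServer(("english", "spanish"), {"dog": ["perro", "can"]}))
print(client_lookup(directory, "dog", "english", "spanish"))  # ['perro', 'can']
print(client_lookup(directory, "dog", "english", "urdu"))     # []
```

The supernode never sees a dictionary entry and the read-only server never sees the directory; only the client touches both, and it needs just two calls to do so.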
WWL will work. Whether it will succeed or not is the question. Other open source and peer-to-peer projects that started as grassroots efforts have demonstrated that you don't need the backing of a corporate titan, just a community of users committed to a project.
In order to succeed, the project will require the talent and goodwill of many people. Such a system will offer some compelling benefits (especially the full-text translation applications). If you would like to learn more or contribute to the system, visit our site at www.worldwidelexicon.org.
