XML.com: XML From the Inside Out

Web Services for Bioinformatics

May 14, 2002

At the January 2002 O'Reilly Bioinformatics conference, Lincoln Stein delivered a keynote address on "Building a Bioinformatics Nation." In this talk, Lincoln argued that current biological databases are islands unto themselves, much like the Italian city-states of the Middle Ages. He also proposed that a more formalized Web Service model could link disparate systems and thereby create a more unified set of bioinformatics tools and databases. (For more on Lincoln's talk, see his recent Nature article.)

This article follows up on Lincoln's talk and explores two bioinformatic services you can try out today. By examining these specific services, we get a bird's eye view of the Web Service protocol stack, including WSDL and SOAP. Looking at working services also provides much food for thought. For example, the recently released Google API provides a glimpse of the future of business Web Services. In much the same vein, the two examples discussed here offer a glimpse of the future of bioinformatic services.

This article assumes you are familiar with the basic terminology of Web Services. If you need a quick introduction, check out my Web Services FAQ. For an introduction to Web Services for bioinformatics, take a look at Lincoln's PowerPoint slides from the O'Reilly conference.

XEMBL

Our first example is the XEMBL service from the European Bioinformatics Institute. XEMBL provides complete access to the EMBL Nucleotide Sequence Database. This database is produced in collaboration with GenBank and the DNA Database of Japan, and currently holds over 16.8 million records, consisting of 19.6 billion nucleotides (see EMBL Database Stats). It also includes completed genomes, among them those of human, the fruit fly, and C. elegans.

XEMBL is a recently released interface that provides easy XML access to the complete EMBL database. Access is provided via two main methods. The first is a REST-like interface whereby users specify parameters within a URL, and XEMBL returns a complete XML document. The second is a SOAP interface whereby users specify parameters within SOAP messages and XEMBL returns a complete XML document within a SOAP response.

In the current debate between REST and SOAP, the XEMBL group has not taken sides; it has simply chosen to support both. This is in line with one of Lincoln's main points: databases should provide multiple modes of access to data, from HTML, XML, and SQL, all the way to SOAP.

For both the REST-like and SOAP interfaces, XEMBL expects two main parameters: an ID and a format. The ID specifies a unique international accession code; for example, SC49845 specifies the AXL2 gene in baker's yeast. The format indicates the XML format of the returned document. Two format options are currently supported: BSML (Bioinformatics Sequence Markup Language) and AGAVE (Architecture for Genomic Annotation, Visualization and Exchange). Other formats, including GAME and BIOML, are planned for future releases.

Accessing the XEMBL REST Interface

To access the XEMBL REST interface, you take the XEMBL base URL and supply the ID and format as URL query parameters. For example, this URL: http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=SC49845&format=Bsml retrieves the SC49845 record in BSML format.
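The URL construction just described can be sketched on its own, without any XML parsing. The following is a minimal illustration, not part of XEMBL itself: the class name XemblUrlSketch and the method buildUrl are hypothetical, and the percent-encoding of parameter values is an added precaution that the service may or may not require.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: assembles an XEMBL REST-style URL from an ID and a format.
class XemblUrlSketch {
    // Base URL as given in the article; the service may no longer be reachable.
    static final String BASE_URL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl";

    static String buildUrl(String id, String format) {
        try {
            // Percent-encode each value so characters such as '&' or spaces
            // cannot break the query string (an assumption, not an XEMBL rule).
            return BASE_URL
                + "?id=" + URLEncoder.encode(id, "UTF-8")
                + "&format=" + URLEncoder.encode(format, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the example URL from the text above.
        System.out.println(buildUrl("SC49845", "Bsml"));
    }
}
```

For simple accession codes such as SC49845 the encoding step changes nothing, so the result is identical to the hand-written URL.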

To create a Java client for XEMBL, you can use any of a number of XML parsers. Example 1 below illustrates the use of JDOM. The program expects two command-line arguments: an ID followed by an XML format.

Example 1: XEMBLClient, Version 1: REST Interface


package com.ecerami.bio;

import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

/**
* Sample XEMBL Client Program using JDOM
* For details regarding XEMBL, go to:  http://www.ebi.ac.uk/xembl/
**/
public class XEMBLClient1 {
	private String baseURL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?";

	public XEMBLClient1 (String id, String format) throws Exception {
		System.out.println ("Connecting to XEMBL...");
		System.out.println ("Retrieving ID:  "+id);
		System.out.println ("Format:  "+format);
		connect (id, format);
	}

	private void connect (String id, String format) throws Exception {
		//  Build document;  validation is turned off
		SAXBuilder builder = new SAXBuilder (false);

		//  Do not load External DTDs
		builder.setFeature(
			"http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

		//  Create XEMBL URL;  append id and format
		StringBuffer url = new StringBuffer (baseURL);
		url.append ("id="+id);
		url.append ("&format="+format);

		System.out.println ("Using URL:  "+url.toString());
		Document doc = builder.build (url.toString());
		XMLOutputter outputter = new XMLOutputter();
		outputter.output(doc, System.out);
	}

	public static void main (String[] args) throws Exception {
		if (args.length != 2) {
			System.out.println ("Usage:  XEMBLClient1 [ID] [Format]");
			System.out.println ("Where Format is:  Bsml or sciobj (for AGAVE)");
			return;
		}
		//  The constructor performs the request and prints the result
		new XEMBLClient1(args[0], args[1]);
	}
}

As you can see in Example 1, you access XEMBL by specifying the base URL and appending the id and format parameters. JDOM takes care of the rest by downloading the specified XML file, parsing its contents, and making the contents available to your application. In Example 1, the code simply outputs the contents of the XML file, but you can also use JDOM to extract any specific elements within the returned XML document.
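To make the last point concrete, here is a rough sketch of pulling specific elements out of a returned document. Two caveats apply. First, it uses the JDK's built-in DOM parser rather than JDOM, purely so that it runs without extra libraries; the same idea carries over to JDOM. Second, the SAMPLE string is a hypothetical, heavily simplified stand-in for a BSML response, so the element and attribute names (Sequence, id, title) and the title value are assumptions for illustration and should be checked against a real XEMBL document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: lists the id and title of each Sequence element.
class SequenceTitleSketch {
    // Invented sample data, NOT an actual XEMBL response.
    static final String SAMPLE =
        "<Bsml><Definitions><Sequences>"
      + "<Sequence id=\"SC49845\" title=\"AXL2 gene\" length=\"12\">"
      + "<Seq-data>ACGTACGTACGT</Seq-data>"
      + "</Sequence>"
      + "</Sequences></Definitions></Bsml>";

    static List<String> sequenceTitles(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Collect every Sequence element, wherever it sits in the tree.
            NodeList nodes = doc.getElementsByTagName("Sequence");
            List<String> titles = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element seq = (Element) nodes.item(i);
                titles.add(seq.getAttribute("id") + ": " + seq.getAttribute("title"));
            }
            return titles;
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors end up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String line : sequenceTitles(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

In a real client, the string passed to sequenceTitles would be the document retrieved from the XEMBL URL instead of the inline sample.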
