Scientists have already recorded the genetic codes of about 100 organisms, many of them agents of disease, as well as more than 700 viruses (which are technically not independent living things, because they can't reproduce on their own).
By far the most complex of these disease-causing microbes is Plasmodium falciparum, the deadliest of the four species of malaria parasites, whose genetic code for all practical purposes is now complete. The official paper is expected to be published over the summer.
Plasmodium belongs to the protozoans - single-celled organisms that are far more complex than bacteria or viruses. In fact, the parasite's genetic code more closely resembles the human code than it does a bacterium's.
An estimated two million to three million people die every year from malaria, making it one of the world's top killers, along with tuberculosis and AIDS. Current medications are often ineffective, because plasmodium is a moving target, evolving into new strains resistant to whatever drugs doctors invent to combat it.
"Without the complete genome sequences, it's been like looking for something under a lamppost. With the genome, we shine light on the whole biology of these organisms," said Michael Gottlieb, chief of parasitology at the National Institute of Allergy and Infectious Diseases.
Throughout history, the vast majority of deaths have come from infectious diseases. "If you live in Africa, infectious diseases are your number-one concern," Roos said.
Noninfectious illnesses such as cancer and heart disease now kill the majority of Americans. But AIDS - and the flu, E. coli, salmonella, West Nile virus, and hundreds of others - have shown that people in the United States remain vulnerable to new infectious diseases or strains, or to old pathogens that have evolved resistance to existing drugs.
Meanwhile, using the human genome to combat noninfectious diseases is proving extremely difficult. It's been years, in some cases decades, since scientists found the human genetic flaws responsible for the simplest of these diseases - the ones caused by a single gene, such as cystic fibrosis, Huntington's chorea, and dozens of other often-fatal inherited illnesses. Most of them are still incurable.
Cancer, diabetes and most of the other big noninfectious diseases are thought to be linked to complex combinations of many human genes that will take far longer to figure out.
And yet a growing number of scientists suspect that even some of these more common diseases, including Alzheimer's, heart disease and several mental illnesses, also have an infectious component - a bug that acts as a trigger.
Perhaps, they argue, by studying the comparatively simple genome of a pathogen, these diseases can be stopped before they begin to develop in the human body.
The genome of any living thing refers simply to its entire genetic code, which is stored in the DNA coiled up in nearly every cell. The code is written in a chemical alphabet made up of four characters, denoted as A, T, C and G.
Human DNA is complicated, with a string of three billion characters. Most pathogens carry only a tiny fraction of that number. But the DNA of the malaria-causing plasmodium is complex, made up of 28 million characters, Roos said. A massive effort was required to decipher the whole thing.
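To put those sizes in perspective, here is a minimal Python sketch. The short DNA fragment is invented for illustration and the function name is hypothetical; the two genome lengths are simply the rounded figures quoted above.

```python
from collections import Counter

# Rounded genome lengths as quoted in the article (in code letters).
HUMAN_GENOME_LETTERS = 3_000_000_000
PLASMODIUM_GENOME_LETTERS = 28_000_000


def count_letters(dna: str) -> dict:
    """Count how often each of the four code letters appears in a DNA string."""
    counts = Counter(dna.upper())
    unexpected = set(counts) - set("ATCG")
    if unexpected:
        raise ValueError(f"not DNA letters: {sorted(unexpected)}")
    return {letter: counts.get(letter, 0) for letter in "ATCG"}


# Invented example fragment, not real parasite sequence.
fragment = "ATTATAGCAT"
print(count_letters(fragment))

# How the parasite's genome length compares with the human one.
share = PLASMODIUM_GENOME_LETTERS / HUMAN_GENOME_LETTERS
print(f"Plasmodium genome is about {share:.1%} the length of the human genome")
```

By this arithmetic the parasite's code is a little under 1 percent the length of the human one, which is still large by the standards of most pathogens.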
The actual reading out of those millions of code letters was done at Stanford University, the Institute for Genomic Research in Rockville, Md., and the Sanger Center in Cambridge, England.
Once the work was completed, scientists had an impressive collection of data that was essentially an uncracked code - a huge string of A's and T's and C's and G's.
Roos' job at Penn's Genomics Institute is to take the data emerging from the sequencing centers and make it accessible to the general scientific community in ways that allow individual researchers to ask their own questions.
"People often ask me, What does it mean that we've 'sequenced' the human genome?" Roos said. To the intense, 45-year-old biologist, the meaning lies in what use it has to scientists. "Our goal is to allow people to ask questions using the genomes that they couldn't have asked before."
The sequences of characters that make up the DNA are divided into small segments - the genes - that hold recipes for making proteins. These proteins are the substances that actually run the organism: hormones that affect growth, mood and reproduction; enzymes that digest food; and chemical messengers that relay information to and from the brain.
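How a gene serves as a recipe can be sketched in a few lines of Python. The article does not go into the mechanics, so the details here are background: cells read the code three letters at a time, and each triplet stands for one protein building block. The table below is only a small excerpt of the standard genetic code, the example gene is invented, and the function name is hypothetical.

```python
# Partial codon table: a handful of entries from the standard genetic code.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start signal
    "AAA": "K",  # lysine
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GAA": "E",  # glutamate
    "TAA": "*",  # stop signal
}


def translate(gene: str) -> str:
    """Read a gene three letters at a time and return the protein it spells."""
    protein = []
    for i in range(0, len(gene) - 2, 3):
        codon = gene[i:i + 3].upper()
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # a stop signal ends the recipe
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented toy gene, not taken from any real organism.
print(translate("ATGAAATTTGGCTAA"))
```

Real genes run to hundreds or thousands of letters, and a full table covers all 64 possible triplets, but the reading process follows the same pattern.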
As the plasmodium genome has been sequenced in sections over the last few years, Roos' team and his colleagues in Chris Stoeckert's lab, also at Penn, have compiled and sorted the code letters for its various genes into a gigantic database that scientists can search. Already, it gets more than 10,000 hits a day from researchers in more than 100 countries.
To demonstrate how a scientist might use his database, Roos types into a desktop computer the word protease - a kind of enzyme that many organisms use in digestion. The genetic code for plasmodium presumably would carry the recipes for making a number of different types of protease, just as the human genome does.
The computer interfaces with other databases and, within seconds, Roos gets a list of 503 genes that possibly hold recipes for the various kinds of protease that the malaria parasite is capable of generating. Another click brings up the exact sequences - the combination of letters that make up the recipes.
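A keyword search of this kind might look roughly like the following Python sketch. It is not the Penn database's actual software: the record layout, the gene identifiers, the descriptions and the sequences are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    gene_id: str       # hypothetical identifier
    description: str   # free-text annotation of what the gene may do
    sequence: str      # the code letters of the gene


def search_genes(records, keyword):
    """Return every record whose description mentions the keyword."""
    needle = keyword.lower()
    return [r for r in records if needle in r.description.lower()]


# Invented records standing in for database entries.
records = [
    GeneRecord("demo_gene_1", "putative cysteine protease", "ATGAAATTTGGC"),
    GeneRecord("demo_gene_2", "hypothetical transporter", "ATGGGCGAATTT"),
    GeneRecord("demo_gene_3", "possible aspartic Protease", "ATGTTTGAAAAA"),
]

hits = search_genes(records, "protease")
for record in hits:
    # A second step, like Roos' "another click," shows the sequence itself.
    print(record.gene_id, record.description, record.sequence)
```

The match is deliberately loose, ignoring capitalization, because such a list holds genes that only possibly carry a protease recipe; deciding which ones truly do takes further study.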
With these codes in hand, scientists working to develop a new drug or vaccine can compare the plasmodium's enzyme blueprints with those of other parasites as well as with enzymes made in human cells, recorded in other databases. For example, they can narrow down the possibilities by researching what similar drugs might already exist to fight other parasites.
The comparison with human genes is crucial. In creating new drugs, scientists want to figure out how to destroy the parasite without causing major side effects in the infected human body. Ideally, a drug should target some enzyme or other part of the bug's biology that has no counterpart in people.
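The screening idea can be sketched in Python under heavy simplifications. Real comparisons rely on dedicated sequence-alignment software; this sketch swaps in a much cruder measure, the share of three-letter fragments two sequences have in common. The sequences, names and the 0.5 cutoff are all invented for illustration.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of all overlapping k-letter fragments in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Crude similarity score: shared fragments divided by all fragments."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def without_human_counterpart(parasite, human, threshold=0.5, k=3):
    """List parasite proteins that resemble no human protein above the cutoff."""
    return [
        name
        for name, seq in parasite.items()
        if all(similarity(seq, h, k) < threshold for h in human.values())
    ]


# Invented protein sequences, written as strings of amino-acid letters.
parasite = {"demo_enzyme_a": "MKFGELLKKF", "demo_enzyme_b": "MWWPPYYHHC"}
human = {"demo_human_x": "MKFGELLKKF"}

print(without_human_counterpart(parasite, human))
```

In this toy case the first parasite enzyme matches a human protein exactly and is set aside, while the second has no human look-alike and stays on the list of possible drug targets.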
That can be difficult with plasmodium and other protozoans, whose cells strongly resemble human cells. All living things can be divided into two categories. The simple organisms, called prokaryotes, encompass the world of bacteria. The complex organisms, called eukaryotes, include protozoans, fungi, plants and animals, humans among them.
While many of plasmodium's genes are identical to those in human cells, others are quite different. Among those would be the genes that hold instructions for making protease, which is essential to the microbe's survival. Plasmodium eats the hemoglobin in red blood cells, Roos said, and requires the protease to digest it.
If you disable the protease, the parasite starves.
Many pathogens rely on protease. Even HIV, a simple virus, uses a type of the enzyme - and "protease inhibitors" are now a drug of choice in keeping AIDS at bay.
Efforts are under way elsewhere to use the genetic information gleaned from a number of other human pathogens. Among them is anthrax, which is being studied partly for forensic purposes.
Within the next month, scientists expect to announce a comparison between the organisms found in the Florida office of American Media Inc. - site of the first death in last fall's bioterror attacks - and the standard strains used in U.S. laboratories. That could help determine where the anthrax originated and better target the manhunt.
And at the National Institutes of Health, scientists are using the sequence for the bacterium that causes TB in their search for new drugs. Current medications must be taken for six months to be effective, said Clifton Barry, chief of TB research at the institutes.
In developing countries, people often can't get the full supply, giving Mycobacterium tuberculosis time to evolve into drug-resistant strains, he said. The hope is that by using the genetic code, scientists will be able to invent new drugs that work faster.
"The genome allows you to gain more insight into the biology of the organism," said Barry, "and use that to figure out which steps will effect a cure."
Contact Faye Flam at 215-854-4977 or firstname.lastname@example.org.