Science Selections


 

Supervised Sorting
Training Computers to Classify Toxicants

A new application of an existing computer learning system can improve the use of gene expression profiles to classify toxicants to which an animal has been exposed, according to work published this month by Guido Steiner and colleagues at the pharmaceutical company F. Hoffmann-La Roche [EHP 112:1236-1248]. The authors write that these investigations could help separate transcriptional changes that are of relevance for the mode of toxicity from mere bystander effects--coincidences that have little predictive value and that can be amplified by other computational approaches.

Predicting how any given toxicant will affect an organism is possible if similar compounds produce comparable changes in gene expression. Identifying useful panels of genes whose expression profiles change predictably in response to toxicants is a major goal for predictive toxicology. Finding effective ways to sort, interpret, and anticipate changes in expression is critical for moving these observations into a practical understanding of toxic effects. Background "noise" and the natural variation between experimental animals as well as compound-related characteristics in pharmacologic and toxic action complicate matters by creating variability in the analyzed data.

Old computers learn new tricks. Computational tools known as support vector machines discern relevant gene expression data from samples and apply what they learn in order to classify new compounds.
image credit: Photodisc

Other approaches to sorting gene expression data for toxicogenomics have included several "unsupervised" methods, in which modeling programs search for patterns within data and generate models of toxicity without being given any hint as to what kinds of expression patterns the researchers expect to find. The strength of this approach is that it allows for unbiased data exploration. On the other hand, it offers no guarantee that the patterns it retrieves are the ones relevant to the scientific problem at hand.

The approach reported by Steiner and colleagues is different. This group has applied a "supervised" method to toxicogenomics using what are called support vector machines (SVMs). SVMs--which are computational tools, not physical machines--take advantage of additional data in the form of pathology and serum measurements that are fed into the algorithm. These data are used to assign gene expression profiles to specific modes of toxicity. The SVM identifies the most relevant information for discriminating among the given samples. After learning from a "training set" of biological samples, the model should be able to correctly classify new samples exposed to compounds that the SVM has not encountered before. Therefore, the method has to construct classification rules that still work with data different from the initial training data.
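
To make the idea of a training set concrete, the sketch below trains a tiny linear SVM in plain Python and then classifies two samples it has not seen. It is an illustration only, not the authors' implementation: the two-gene expression values, the class labels (+1 for a hypothetical toxic class, -1 for a nontoxic one), the function names, and the settings `lam`, `epochs`, and `lr` are all assumptions, and the fitting method (subgradient descent on the hinge loss) was chosen for brevity.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss (all settings are illustrative)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only samples inside the margin push on w and b.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical training set: each row is [gene_1, gene_2] expression change.
train_x = [
    [2.0, 0.15], [1.8, 0.10], [2.2, 0.20],        # labeled toxic (+1)
    [-0.2, -0.15], [-0.1, -0.10], [-0.3, -0.20],  # labeled nontoxic (-1)
]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)

# Two samples that were not part of the training set.
new_toxic = predict(w, b, [1.9, 0.10])
new_nontoxic = predict(w, b, [-0.2, -0.10])
print(new_toxic, new_nontoxic)
```

The labels supplied with the training rows play the role that pathology and serum measurements play in the study: they tell the algorithm which class each expression profile belongs to.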

Steiner and colleagues used an SVM to find classification rules connecting patterns of gene expression in response to a series of known or suspected hepatotoxicants. The predictive genes were picked using another computational tool, recursive feature elimination (RFE), which is an integral part of SVM creation. In RFE, the computer produces a ranking of all the features that it uses to define a fingerprint--in this case, the expression profile of each gene on a microarray. Then it calculates how much each feature contributed to that fingerprint. In an iterative process, the features judged least relevant, typically the uninformative or redundant ones, are eliminated, refining the fingerprint's definition to include only its most reliable features. These compact signatures can then be used to identify the class of toxicant to which an animal has been exposed.
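
The elimination loop can be sketched in a few lines of Python. This is a toy under stated assumptions rather than the procedure used in the study: the gene names and expression values are invented, the small `fit_linear_svm` helper is a stand-in classifier included only so the example runs on its own, and dropping the feature with the smallest squared weight is one common ranking criterion for linear SVMs, not necessarily the one the authors used.

```python
def fit_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Stand-in linear SVM: subgradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def rfe_ranking(samples, labels, names):
    """Return feature names ordered from first eliminated to last surviving."""
    active = list(range(len(names)))  # column indices still in play
    eliminated = []
    while len(active) > 1:
        # Refit the classifier using only the surviving features.
        reduced = [[row[i] for i in active] for row in samples]
        w, _ = fit_linear_svm(reduced, labels)
        # The feature with the smallest squared weight contributes least.
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(names[active.pop(worst)])
    eliminated.append(names[active[0]])
    return eliminated


# Hypothetical data: gene_A responds strongly, gene_B weakly, gene_C is noise.
names = ["gene_A", "gene_B", "gene_C"]
samples = [
    [2.0, 0.15, 0.05], [1.8, 0.10, -0.05], [2.2, 0.20, 0.00],        # class +1
    [-0.2, -0.15, 0.05], [-0.1, -0.10, -0.05], [-0.3, -0.20, 0.00],  # class -1
]
labels = [1, 1, 1, -1, -1, -1]

ranking = rfe_ranking(samples, labels, names)
print(ranking)
```

Reading the returned list from the end gives the compact signature: the last few names are the features that survived the most rounds of elimination.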

In testing the system, the authors looked at 28 hepatotoxicants and 3 nonhepatotoxic compounds. Working in rats, they compiled gene expression profiles, clinical chemistry, hematology, and histopathology for the different chemicals at various time points following exposure. In addition to discriminating between compounds that are hepatotoxic and ones that are not, their predictive models were in most cases also able to predict what kind of toxicant the animals had been exposed to--a direct-acting one that causes damage itself, a cholestatic one that interferes with bile, or a steatotic one that drives buildup of fat in the liver. By the same strategy, the SVM was able to recognize animals that had been exposed to the hepatotoxicant galactosamine but failed to respond with the typical necrosis and inflammation of the liver.

Pharmacologic activity can alter gene expression in the liver without necessarily signaling hepatotoxicity. The SVM correctly identified 3 tested pharmacoactive compounds as nonhepatotoxic and also correctly identified 3 hepatotoxic compounds whose mechanisms of toxicity were not included in the data sets used to train the machines. In two out of three cases, the SVM was able to correctly identify the general mode by which these compounds were toxic.

The models were also extended to a rat strain other than the one used for training. After identifying expression profiles in Wistar rats induced by several peroxisome proliferator-activated receptor (PPAR) agonists, the SVM was used to look at data for the livers of Sprague-Dawley rats exposed to another PPAR agonist, WY14643. The SVM was able to correctly recognize both Sprague-Dawley rats exposed to WY14643 and the control animals, and could predict that treatment with WY14643 would stimulate peroxisome proliferation.

So far, the work has dealt predominantly with toxicant concentrations that yield substantial and largely unambiguous effects. To optimize the system to more accurately predict subtle changes, toxicologists and bioinformaticians will need both further improvements in computational methods and a larger database linking compounds and their effects on gene expression.

Victoria McGovern

It's All in the Interaction
Quantitating Gene Networks

Toxicologists who use microarrays hope to uncover relationships that link gene expression data to signal transduction pathways, gene networks that are often used to describe the sequence of biochemical events controlling cellular function. The large quantities of data generated by microarray studies generally are examined qualitatively--for example, by comparing whether one gene is turned on relative to another. These qualitative relationships, however, fail to describe how genes in a network influence each other. Still in their infancy are tools that quantitate the complex relationships within gene networks more comprehensively than simple correlations between pairs of genes. Now, for the first time, researchers describe a new quantitative statistical technique that assesses the interactions of genes in a network [EHP 112:1217-1224].

The team, led by Hiroyoshi Toyoshiba of the NIEHS Laboratory of Computational Biology and Risk Assessment, created a statistical software program that verifies concurrently that the expression of one gene is linked to the expression of several others. The first proof-of-concept demonstration evaluated genes that are directly responsive to tetrachlorodibenzo-p-dioxin (TCDD; a ubiquitous environmental pollutant and known human carcinogen) and their effect on the retinoic acid signal transduction pathway.

Signal transduction pathways respond to different environmental conditions; they are like molecular circuits that detect and integrate diverse external signals to alter gene transcription. This results in changes in enzyme activities as well as the production of abnormal levels of proteins, which further results in changes in biochemical processes. Alterations in signal transduction pathways can lead to cancer and other disorders.

Notating networks. A new statistical package goes beyond qualifying interactions between a single pair of genes to describe how multiple genes within a network influence expression.
image credit: Photodisc, Matt Turner/EHP, Chris Reuther/EHP

Toyoshiba and colleagues had earlier identified genes that are altered in lung airway epithelial cells after exposure to TCDD. Starting with microarrays composed of 2,000 genes that are known to be expressed in response to environmental toxicants, the researchers had identified 11 genes that responded significantly to TCDD in two different lung cell lines. These genes appeared to be involved in the effects of TCDD on the retinoic acid signal transduction pathway.

The researchers constructed a hypothetical model of the retinoic acid signal transduction pathway that describes how the 11 genes interrelate. Based on published reports on retinoic acid metabolism, the model postulated that dietary vitamin A (retinol) is converted first to retinal and then to retinoic acid by alcohol dehydrogenases and, possibly, by cytochrome P450 enzymes. Once synthesized, retinoic acid enters the cell nucleus. There, it binds retinoic acid receptor beta, which, in turn, alters the expression of genes that may play a role in tumor formation. The hypothetical model included genes that produce three alcohol dehydrogenases, a cytochrome P450 enzyme, retinoic acid binding proteins and receptors, and four nuclear proteins.

Following exposure to three concentrations of TCDD, the expression levels of the 11 genes were calculated relative to unexposed controls. Statistical methods were applied to these data to test the hypothetical linkages between TCDD-responsive genes and the retinoic acid signal transduction pathway. These tests confirmed strong linkages between the genes included in the hypothetical model.
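
The first step described here, expressing each gene relative to unexposed controls, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the intensity values are invented, the log2 ratio is one conventional way to express relative expression, and the least-squares slope is only a crude stand-in for putting a number on a link between two genes. It is not the statistical method the authors developed.

```python
import math


def log2_ratios(treated, control):
    """Log2 of each treated value over the mean of the unexposed controls."""
    baseline = sum(control) / len(control)
    return [math.log2(value / baseline) for value in treated]


def slope(xs, ys):
    """Least-squares slope of ys on xs, a simple strength-of-link number."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical array intensities at three increasing concentrations.
control_a, treated_a = [100.0, 100.0], [200.0, 400.0, 800.0]
control_b, treated_b = [50.0, 50.0], [100.0, 400.0, 1600.0]

ratios_a = log2_ratios(treated_a, control_a)  # upstream gene
ratios_b = log2_ratios(treated_b, control_b)  # downstream gene
link = slope(ratios_a, ratios_b)
print(ratios_a, ratios_b, link)
```

A slope well away from zero would be consistent with a hypothesized link between the two genes, although with only three concentrations a number like this is suggestive at best.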

Epidemiological studies show a strong association between TCDD and lung cancer; the model offers a potential explanation for how TCDD damages the lungs. TCDD appears to activate genes associated with the synthesis of retinoic acid, which--through the retinoic acid signal transduction pathway--turns on nuclear genes that promote cell proliferation and carcinogenesis. Scientists can focus future experiments on particular genes directly related to TCDD-induced tumor progression.

The new statistical tool makes it possible to understand biological pathways in cells, tissues, organs, and whole organisms. It can be expanded to include other relevant data, such as protein levels in cells. These data can be combined with pharmacological models to present a true systems biology approach to quantifying risks from exposure to xenobiotics such as TCDD, suggest the authors. Other researchers can obtain the statistical software by contacting laboratory director Christopher Portier at portier@niehs.nih.gov.

Carol Potera