Methods used in Functional Genomics and Systems Biology

Systems biology is a comprehensive quantitative analysis of how all the components of a biological unit interact functionally over time. The joint behavior of a set of genes or phenomena within a system can reveal significant changes even when the individual variations are not significantly different. A pattern becomes evident only when the coherent behavior of lower-level components is associated with a higher-level entity.

In our lab, in collaboration with the Columbia Genome Center (MAGNet - Multiscale Analysis of Genomic and Cellular Networks), we are applying the systems biology approach to understand the functioning of the human body in health and disease.

Genome-wide array technology is being adopted rapidly by the medical community. Once a medical problem has been identified by outcomes research, translational research, including gene expression methods, may help to solve it. Excellent reviews have been published in recent years, and we have listed them below so they can be easily retrieved.

The first step is an appropriate study design with an adequate number of biological replicates (different 'biological' cases) and technical replicates (the same case repeated one or more times). A minimum of 5 biological cases per experiment seems to be adequate for experiments comparing two different groups. It is also very important to minimize the systematic 'methodological' error, which arises largely from differences in how the samples are processed, for example on different days or by different operators.
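
One simple way to limit this kind of systematic error is to spread the cases of each group across the processing days, so that group and processing day are not confounded. The sketch below is only an illustration under assumed inputs: the function name `assign_batches`, the sample labels and the number of processing days are hypothetical and do not come from any of the tools discussed on this page.

```python
import random


def assign_batches(samples_by_group, n_batches, seed=0):
    """Spread the samples of every group across processing batches.

    samples_by_group: dict mapping a group label to a list of sample IDs.
    Returns a list of batches, each a list of (group, sample_id) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    batches = [[] for _ in range(n_batches)]
    slot = 0
    for group in sorted(samples_by_group):
        samples = list(samples_by_group[group])
        rng.shuffle(samples)  # random order within each group
        for sample in samples:
            # Round-robin placement spreads each group over all batches.
            batches[slot % n_batches].append((group, sample))
            slot += 1
    return batches


if __name__ == "__main__":
    # Hypothetical design: 5 cases per group, processed on 2 days.
    design = {
        "control": ["C1", "C2", "C3", "C4", "C5"],
        "disease": ["D1", "D2", "D3", "D4", "D5"],
    }
    for day, batch in enumerate(assign_batches(design, n_batches=2), start=1):
        print("day", day, batch)
```

With this allocation every processing day contains cases from both groups, so a day-to-day shift affects the groups in a similar way instead of mimicking a group difference.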

Evaluation of the quality of the arrays by an experienced researcher is important in the preprocessing phase of the experiment, which includes image analysis, normalization and transformation of the data.
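
To make the normalization and transformation steps more concrete, the following sketch applies a log2 transformation and then quantile normalization to a small matrix (rows are genes, columns are arrays). Quantile normalization is chosen here only as one commonly used option; the page does not state which method should be used, and the function names, the pseudocount and the toy values are assumptions made for illustration.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Log2-transform intensities; the pseudocount avoids log2(0)."""
    return [[math.log2(value + pseudocount) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every array (column) the same distribution of values.

    matrix: list of rows (genes), each a list of values (one per array).
    Ties are broken by row order; no special tie handling is attempted.
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[matrix[g][a] for g in range(n_genes)] for a in range(n_arrays)]
    sorted_columns = [sorted(column) for column in columns]
    # Reference distribution: mean across arrays of the values at each rank.
    reference = [
        sum(column[rank] for column in sorted_columns) / n_arrays
        for rank in range(n_genes)
    ]
    result = [[0.0] * n_arrays for _ in range(n_genes)]
    for a, column in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: column[g])
        for rank, g in enumerate(order):
            # Replace each value by the reference value of its rank.
            result[g][a] = reference[rank]
    return result


if __name__ == "__main__":
    raw = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]  # toy intensities
    for row in quantile_normalize(log2_transform(raw)):
        print([round(value, 3) for value in row])
```

After this step the arrays share the same set of values and differ only in which gene holds which rank, which removes array-wide intensity differences before any comparison between groups.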

Once the data are prepared, various analytical algorithms implemented in "user-friendly" interfaces can help to draw conclusions. One of the challenges is to identify which genes become differentially regulated in relation to a disease or condition and to understand the meaning of the findings, as the researcher will probably be faced with hundreds or thousands of differentially regulated genes. For this purpose, tools are continuously being developed by several expert multidisciplinary groups.

As elegantly described by Eisen et al., hierarchical clustering is a method to represent complex gene expression data by statistical organization and graphical display. By using this approach, relationships among objects (genes) are represented by a tree whose branch lengths reflect the degree of similarity between the objects. The computed trees can be used to order genes so that genes or groups of genes with similar expression patterns are adjacent and can then be displayed graphically. Genes with similar expression patterns are likely to represent similar biological processes.
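
The idea of building a tree and using it to order the genes can be sketched in a few lines. The example below uses average linkage on a distance of 1 minus the Pearson correlation, which is a common choice but is not claimed to reproduce the published software exactly; the gene names and expression values are invented, and the code assumes that no expression profile is constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_linkage(profiles):
    """Agglomerative clustering of genes.

    profiles: dict mapping gene name -> list of expression values.
    Returns (tree, leaf_order); tree is nested (left, right, distance) tuples.
    """
    clusters = [(name, [name]) for name in profiles]  # (subtree, members)

    def distance(members_1, members_2):
        # Average of 1 - correlation over all pairs between the two clusters.
        total = sum(
            1.0 - pearson(profiles[a], profiles[b])
            for a in members_1
            for b in members_2
        )
        return total / (len(members_1) * len(members_2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0], round(d, 3)),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


if __name__ == "__main__":
    toy = {
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8.5],
        "geneC": [4, 3, 2, 1],
        "geneD": [8, 6, 4.5, 2],
    }
    tree, order = average_linkage(toy)
    print(tree)
    print(order)
```

The returned leaf order places genes with similar profiles next to each other, which is the ordering that would then be used for the graphical display.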

A widely accepted tool is "SAM" ('Significance Analysis of Microarrays'), a software package developed at Stanford University. Detailed information can be found on the SAM-Stanford website. Of high interest for the development of gene classifiers is the "PAM" ('Prediction Analysis of Microarrays') algorithm, a powerful tool for identifying a small set of genes that are highly correlated with certain biological or pathological processes; these genes can then be used to develop tools that allow screening for and prevention of a given disease.
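
As a rough illustration of the kind of statistic behind SAM, the sketch below computes a "relative difference" for each gene: the difference in group means divided by a pooled standard error plus a small positive constant `s0` that damps genes with very low variance. This is not the SAM package itself. The permutation-based estimate of the false discovery rate is omitted, `s0` is fixed by hand here rather than chosen from the data, and the function names and expression values are hypothetical.

```python
import math


def relative_difference(group1, group2, s0=0.1):
    """Moderated t-like score for one gene measured in two groups.

    With s0 = 0 this reduces to the ordinary equal-variance t statistic.
    Requires at least three measurements in total.
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    sum_squares = sum((x - mean1) ** 2 for x in group1) + sum(
        (x - mean2) ** 2 for x in group2
    )
    # Pooled standard error of the difference in means.
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * sum_squares / (n1 + n2 - 2))
    return (mean1 - mean2) / (s + s0)


def rank_genes(expression, s0=0.1):
    """Order genes by decreasing absolute relative difference.

    expression: dict mapping gene -> (values_group1, values_group2).
    """
    scores = {
        gene: relative_difference(g1, g2, s0)
        for gene, (g1, g2) in expression.items()
    }
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


if __name__ == "__main__":
    toy = {
        "gene_up": ([5.1, 5.3, 4.9, 5.2, 5.0], [3.0, 3.2, 2.9, 3.1, 3.0]),
        "gene_flat": ([4.0, 4.2, 3.9, 4.1, 4.0], [4.1, 4.0, 4.0, 3.9, 4.2]),
    }
    print(rank_genes(toy))
```

Genes at the top of the ranking are the candidates for differential regulation; in practice their significance would still have to be assessed, for example by permuting the group labels.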

Understanding the biology, the real meaning of the findings, is a major challenge. The Gene Ontology (GO) project addressed this problem by developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. Genes are thereby grouped into well-defined GO terms, but it again becomes difficult to draw conclusions when dealing with thousands of genes at the same time. The Genomics and Bioinformatics Group at the NIH took a leading role in developing tools that facilitate the extraction of accurate information at a batch-processing scale. The High-Throughput GoMiner website can be accessed from our links page and the reference below.
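
A typical question asked of a GO term is whether it contains more differentially regulated genes than expected by chance. One standard way to answer it is a one-sided hypergeometric (Fisher-type) test, sketched below. The function name and the counts are invented for illustration, and this is not presented as the exact computation performed by any particular tool.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_term, changed_genes, changed_in_term):
    """Probability of seeing at least `changed_in_term` changed genes in a term.

    total_genes:     number of genes on the array
    genes_in_term:   number of those genes annotated with the GO term
    changed_genes:   number of differentially regulated genes
    changed_in_term: number of changed genes annotated with the term
    """
    upper = min(genes_in_term, changed_genes)
    all_draws = comb(total_genes, changed_genes)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        comb(genes_in_term, k)
        * comb(total_genes - genes_in_term, changed_genes - k)
        for k in range(changed_in_term, upper + 1)
    )
    return tail / all_draws


if __name__ == "__main__":
    # Hypothetical counts: 10000 genes on the array, 200 changed,
    # 50 genes in the term, 8 of which are among the changed genes.
    print(enrichment_p_value(10000, 50, 200, 8))
```

When many GO terms are tested at once, the resulting p-values need a correction for multiple testing before any term is reported as enriched.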

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether a set of genes shows concordant differences between two biological states or phenotypes. The method focuses on gene sets, that is, groups of genes that share a common biological function, chromosomal location or regulation.
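
The core of the method can be illustrated with a running-sum statistic: walking down a list of genes ranked by their difference between the two states, the sum increases at each gene that belongs to the set and decreases otherwise, and the enrichment score is the largest deviation from zero. The sketch below follows that general description only; the permutation step used to judge significance is left out, and the function name, the `weight` parameter and the toy ranking are assumptions made for illustration.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score of a gene set in a ranked list.

    ranked_genes: gene names ordered from most up- to most down-regulated.
    scores:       dict mapping gene -> ranking metric (e.g. a t-like score).
    gene_set:     set of gene names to test.
    Assumes at least one hit has a non-zero score.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must overlap the list without covering it")
    # Hits are weighted by the magnitude of their ranking metric.
    hit_total = sum(
        abs(scores[gene]) ** weight for gene, hit in zip(ranked_genes, hits) if hit
    )
    running = 0.0
    best = 0.0
    for gene, hit in zip(ranked_genes, hits):
        if hit:
            running += abs(scores[gene]) ** weight / hit_total
        else:
            running -= 1.0 / n_misses
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero, with sign
    return best


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
    metric = {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": -1.0, "g5": -2.0, "g6": -3.0}
    print(enrichment_score(ranked, metric, {"g1", "g2"}))
    print(enrichment_score(ranked, metric, {"g5", "g6"}))
```

A positive score indicates that the set is concentrated at the top of the ranking and a negative score that it is concentrated at the bottom.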

The reconstruction of cellular networks using reverse engineering algorithms is a constantly evolving field. The Columbia Genome Center (MAGNet - Multiscale Analysis of Genomic and Cellular Networks) is taking a leading role in the development of this field under the direction of Dr. Andrea Califano. Dr. Califano and his group published an Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe). We believe that applying this systems biology approach to the understanding of complex biological systems in health and disease is highly relevant and promising for the development of a predictive, preventative and personalized approach in heart transplantation medicine.
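
One step commonly described for this family of reverse engineering algorithms is the removal of indirect interactions: when three genes are all connected to each other, the weakest of the three connections is assumed to be indirect and is dropped (the data processing inequality). The sketch below shows only that pruning step on precomputed mutual information values. The function name, the gene labels, the numbers and the exact form of the `tolerance` rule are assumptions for illustration, and the estimation of mutual information from expression data is not shown.

```python
from itertools import combinations


def prune_indirect_edges(mutual_information, threshold=0.0, tolerance=0.0):
    """Drop the weakest edge of every fully connected gene triplet.

    mutual_information: dict mapping frozenset({gene_a, gene_b}) -> MI value.
    threshold:          edges with MI at or below this value are ignored.
    tolerance:          assumed slack; the weakest edge is removed only if it
                        is below (1 - tolerance) times the next weakest edge.
    """
    edges = {pair: mi for pair, mi in mutual_information.items() if mi > threshold}
    genes = sorted({gene for pair in edges for gene in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(edge in edges for edge in triangle):
            continue
        weakest = min(triangle, key=lambda edge: edges[edge])
        others = [edges[edge] for edge in triangle if edge != weakest]
        # Every triplet is judged on the original edges, not the pruned ones.
        if edges[weakest] < min(others) * (1.0 - tolerance):
            to_remove.add(weakest)
    return {pair: mi for pair, mi in edges.items() if pair not in to_remove}


if __name__ == "__main__":
    mi = {
        frozenset(("TF1", "TF2")): 0.9,
        frozenset(("TF2", "target")): 0.8,
        frozenset(("TF1", "target")): 0.3,  # likely indirect, via TF2
    }
    for pair, value in prune_indirect_edges(mi).items():
        print(sorted(pair), value)
```

In this toy case the weak TF1-target connection is removed, leaving the chain TF1, TF2, target as the inferred network.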

GeneWays is a system developed by Dr. Andrey Rzhetsky at Columbia University for automatically extracting, analyzing, visualizing and integrating molecular pathway data from the research literature. It focuses on interactions between molecular substances and actions, which can be displayed graphically.

 

Links and bibliography of interest

 

Reviews

Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65. Review. Erratum in: Nat Rev Genet. 2006 May;7(5):406.

Segal E, Friedman N, Kaminski N, Regev A, Koller D. From signatures to models: understanding cancer using microarrays. Nat Genet. 2005 Jun;37 Suppl:S38-45.

Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004 Feb;5(2):101-13.

Holloway AJ, van Laar RK, Tothill RW, Bowtell DD. Options available--from start to finish--for obtaining data from DNA microarrays II. Nat Genet. 2002 Dec;32 Suppl:481-9. Review.

Churchill GA. Fundamentals of experimental design for cDNA microarrays. Nat Genet. 2002 Dec;32 Suppl:490-5.

Quackenbush J. Microarray data normalization and transformation. Nat Genet. 2002 Dec;32 Suppl:496-501.

Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nat Genet. 2002 Dec;32 Suppl:502-8.

Yang YH, Speed T. Design issues for cDNA microarray experiments. Nat Rev Genet. 2002 Aug;3(8):579-88.

Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays. Nature. 2000 Jun 15;405(6788):827-36.

 

Methods

Zeeberg BR, et al. High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics. 2005 Jul 5;6:168.

Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005 Apr;37(4):382-90.

Rzhetsky A, et al. GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data. J Biomed Inform. 2004 Feb;37(1):43-53.

Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A. 2002 May 14;99(10):6567-72.

Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001 Apr 24;98(9):5116-21.

Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A. 2000 Aug 29;97(18):10101-6.

Golub TR, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999 Oct 15;286(5439):531-7.

Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14863-8.

Weinstein JN, et al. An information-intensive approach to the molecular pharmacology of cancer. Science. 1997 Jan 17;275(5298):343-9.

 

Open source platforms and software for academic use

geWorkbench (genomics Workbench) is a Java-based open-source platform for integrated genomics. It is the bioinformatics platform of MAGNet, the National Center for the Multi-scale Analysis of Genomic and Cellular Networks.

Genesis (Institute for Genomics and Bioinformatics) was developed by Alexander Sturn at Graz University of Technology and is available free of charge to academic, government, and other nonprofit institutions for noncommercial, nonprofit internal research purposes.

SAM 'Significance Analysis of Microarrays' (Stanford University)

PAM 'Prediction Analysis of Microarrays' (Stanford University)

HTGM 'High-Throughput GoMiner' web interface (Genomics and Bioinformatics Group, National Cancer Institute)

GSEA 'Gene Set Enrichment Analysis' (Broad Institute, Massachusetts Institute of Technology)

 

The information on this page is intended only to provide easy access to information for the non-experienced reader. Several papers and software packages of similar quality or characteristics may have been unintentionally omitted. We welcome your feedback and suggestions.

 


Cardiac Transplantation Research - Division of Cardiology - Department of Medicine - CUMC - NYPH - Columbia University - New York City

This site was last updated 05/12/06