Projects
The Center for Evolutionary Functional Genomics has identified three core research areas in which it strives for excellence: comparative genomics, computational developmental biology, and software and database development.
Comparative Genomics
This research seeks to:
- Understand disease-associated mutations in the context of the long-term evolutionary history of disease genes.
- Investigate the relationship between rates of molecular evolution and patterns of gene expression.
- Infer molecular timescales of vertebrate evolution based on protein clocks.
By coupling software and database technologies with evolutionary analysis techniques, researchers at the Center are seeking to understand human disease mutations through the lens of molecular evolution. Recent findings from this effort include the discovery that disease mutations are statistically overabundant in conserved domains and underrepresented in variable regions. This suggests a non-additive influence of protein site conservation on the intragenic distribution of disease mutations. Such findings underscore the merit of large-scale inquiry into patterns of neutral amino acid substitution over the course of evolution.
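The overabundance finding described above can be made concrete with a simple goodness-of-fit comparison. The sketch below is an illustration only, not the Center's actual analysis: it assumes every protein site has already been classified as conserved or variable, compares the observed disease-mutation counts with those expected if mutations fell uniformly across sites, and reports a chi-square test with one degree of freedom. The function name `conservation_enrichment` and the input counts are hypothetical.

```python
import math


def conservation_enrichment(mut_conserved, mut_variable,
                            sites_conserved, sites_variable):
    """Compare disease-mutation counts in conserved vs. variable sites with
    the counts expected if mutations were spread uniformly across sites.

    Returns (observed/expected ratio for conserved sites, chi-square, p-value).
    """
    total_mut = mut_conserved + mut_variable
    total_sites = sites_conserved + sites_variable
    # Expected counts under the null: mutation density is the same everywhere
    exp_conserved = total_mut * sites_conserved / total_sites
    exp_variable = total_mut * sites_variable / total_sites
    # Pearson chi-square goodness-of-fit statistic (two categories -> 1 df)
    chi2 = ((mut_conserved - exp_conserved) ** 2 / exp_conserved
            + (mut_variable - exp_variable) ** 2 / exp_variable)
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return mut_conserved / exp_conserved, chi2, p_value


# Hypothetical counts: 80 of 100 disease mutations fall in the 500 conserved
# sites of a 1,000-site protein
ratio, chi2, p = conservation_enrichment(80, 20, 500, 500)
print(f"obs/exp = {ratio:.2f}, chi2 = {chi2:.1f}, p = {p:.2e}")
```

With these hypothetical counts, conserved sites carry 1.6 times the expected number of disease mutations. A uniform baseline is the simplest possible null model; on its own it says nothing about how the effect scales with the degree of conservation.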
The ability to construct a reliable timescale of vertebrate evolution has utility beyond the study of vertebrate species divergence, offering insights into the fields of developmental biology, archaeology, palaeontology and physiology. Center researchers have developed robust methods for inferring a timescale of vertebrate evolution; these methods apply evolutionary analysis to genomic sequence data to estimate rates of molecular and morphological change in light of patterns of macroevolution and biogeography. Molecular clock analyses by our researchers show that multigene divergence times for several large orders of mammals coincide closely with fossil-based estimates of species divergence.
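As a rough illustration of how a molecular clock turns sequence divergence into time, the sketch below calibrates a per-gene substitution rate from a pair of taxa whose split is dated by fossils, converts each gene's distance for a query pair into a divergence time, and averages across genes. It assumes a constant rate within each gene and uses plain averaging; the Center's published methods are more elaborate, and all function names and numbers here are hypothetical.

```python
from statistics import mean


def calibrated_rate(distance, calibration_time):
    """Substitutions per site per lineage per unit time, estimated from a
    pair of taxa whose split is dated by fossils. Divergence accumulates
    along both lineages, hence the factor of 2."""
    return distance / (2 * calibration_time)


def divergence_time(distance, rate):
    """Time since two taxa split, assuming a constant (clock-like) rate."""
    return distance / (2 * rate)


def multigene_divergence_time(genes):
    """genes: iterable of (calibration_distance, calibration_time, query_distance).
    Dates the query split with each gene separately, then averages."""
    estimates = [divergence_time(d_query, calibrated_rate(d_cal, t_cal))
                 for d_cal, t_cal, d_query in genes]
    return mean(estimates), estimates


# Hypothetical inputs: two genes, calibration split fixed at 300 Myr
genes = [(0.30, 300.0, 0.09), (0.60, 300.0, 0.20)]
avg, per_gene = multigene_divergence_time(genes)
print(f"per-gene estimates: {[round(t, 1) for t in per_gene]}, mean: {avg:.1f} Myr")
```

Dating each gene separately before combining makes it easy to see when one gene departs from the others, which is one reason multigene estimates are more robust than single-gene ones.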
Computational Developmental Biology
This research seeks to:
- Create a web-based resource enabling image-based query of embryonic in situ gene expression patterns in the fruit fly.
- Build informatics frameworks for the automated analysis of gene expression pattern data for other model organisms.
In situ embryonic image data are a powerful tool for developmental biologists seeking to elucidate gene regulatory networks and developmental pathways. Advances in image acquisition methods and technologies have vastly accelerated the production of such data, far outpacing developmental biologists' ability to analyze the available collection.
Center researchers are addressing this disparity by developing FlyExpress, a web-based resource for the large-scale digital analysis of embryonic image data in Drosophila. This project gives developmental biologists all over the world access to tens of thousands of images of the gene expression patterns of developing fruit fly embryos. It is the only repository of fly image data and is freely available on the web.
By enabling rapid analysis of tens of thousands of image-derived gene expression patterns, FlyExpress helps developmental biologists identify the cells in which a gene or genes are expressed (spatial regulation) and the developmental stage at which they are expressed (temporal regulation). Characterizing these regulatory networks will eventually allow developmental biologists to understand the pathways involved in the development of a fertilized egg into a complex, multicellular adult organism.
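One way to picture an image-based query is to reduce each embryo image to a binary mask of expressing and non-expressing regions, then rank stored patterns by their overlap with a query mask. The sketch below assumes the images have already been registered to a common embryo outline and thresholded; it uses Jaccard similarity as a stand-in score, and the gene labels and masks are hypothetical. It is not FlyExpress's actual algorithm.

```python
def on_pixels(mask):
    """Coordinates of expressing ('on') pixels in a 2-D binary mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def jaccard(mask_a, mask_b):
    """Overlap of two expression patterns: |A & B| / |A | B|."""
    a, b = on_pixels(mask_a), on_pixels(mask_b)
    union = a | b
    if not union:
        return 1.0  # both patterns empty: treat as identical
    return len(a & b) / len(union)


def rank_patterns(query, database, top_n=5):
    """Return up to top_n (label, score) pairs, most similar first."""
    scored = [(label, jaccard(query, mask)) for label, mask in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical 2 x 4 masks standing in for registered, thresholded embryo images
query = [[1, 1, 0, 0],
         [0, 0, 1, 1]]
database = {
    "gene_a": [[1, 1, 0, 0], [0, 0, 1, 1]],
    "gene_b": [[1, 1, 1, 1], [0, 0, 0, 0]],
    "gene_c": [[0, 0, 1, 1], [1, 1, 0, 0]],
}
for label, score in rank_patterns(query, database):
    print(f"{label}: {score:.2f}")
```

Because only "on" pixels enter the score, two embryos that share no expression domain score zero even if most of their area is blank in both, which keeps large empty regions from inflating similarity.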
Our researchers hope to use the algorithms, statistical methods, and bioinformatics tools developed in the FlyExpress effort to build a general informatics framework for developing similar analytical resources for other model and non-model organisms.
Software and Database Development
This research seeks to:
- Develop MEGA, a sophisticated, user-friendly tool for comparative analysis of DNA and protein sequences.
- Create a database of functionally important differences in paralogous genes in vertebrates.
- Develop a web resource containing molecular and fossil timescales of vertebrate evolution.
Bioinformatics software tools have traditionally been difficult to use, often taking the form of command-line utilities with cryptic file formats. To counter this trend and to provide biologists with an easy-to-use tool for comparative sequence analysis, Dr. Sudhir Kumar developed Molecular Evolutionary Genetics Analysis (MEGA) in 1993. Now in its third major release, MEGA has become one of the most popular and most highly cited bioinformatics tools available. Designed for exploring and analyzing aligned DNA or protein sequences from an evolutionary perspective, the software package has had more than 25,000 unique downloads to date.
MEGA 3 is integrated software that supports multiple aspects of large-scale comparative sequence analysis through an intuitive graphical user interface (GUI). It makes useful methods of comparative sequence analysis easily accessible to the scientific community for research and education. It provides advanced visualization modules for managing sequence data, web-based data mining, phylogenetic tree construction and analysis, and distance matrix visualization, as well as a rich integrated help viewer.
In addition to advanced visualization, MEGA 3 provides a comprehensive repertoire of computational and statistical modules for analyzing nucleotide and amino acid sequence data. This repertoire includes methods for evolutionary distance estimation that relax the homogeneity assumption; methods for phylogenetic tree construction such as Unweighted Pair Group Method with Arithmetic Mean (UPGMA), Neighbour-Joining, Minimum Evolution, and Maximum Parsimony; methods for testing the molecular clock hypothesis; methods for studying positive selection and conducting tests of neutrality; and a robust sequence alignment method based on the ClustalW algorithm.
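To illustrate two of the methods named above, the sketch below computes Jukes-Cantor (JC69) distances between aligned DNA sequences and builds a UPGMA tree from the resulting distance matrix. It is a minimal teaching version, not MEGA's implementation: JC69 is the simplest substitution model and makes exactly the homogeneity assumptions that more advanced distance methods relax, gaps are skipped pairwise, and the sequences are hypothetical.

```python
import math
from itertools import combinations


def jukes_cantor_distance(seq1, seq2):
    """JC69 distance d = -(3/4) ln(1 - (4/3) p) between two aligned DNA
    sequences, where p is the proportion of differing sites. Sites with a
    gap ('-') in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist):
    """UPGMA clustering. dist maps frozenset({a, b}) -> distance.
    Returns a Newick string with branch lengths."""
    size = {n: 1 for n in names}        # cluster label -> number of leaves
    height = {n: 0.0 for n in names}    # cluster label -> node height
    d = dict(dist)
    while len(size) > 1:
        pair = min(d, key=d.get)        # closest pair of clusters
        a, b = sorted(pair)
        h = d[pair] / 2                 # ultrametric: new node sits halfway
        new = f"({a}:{h - height[a]:.4f},{b}:{h - height[b]:.4f})"
        for c in size:
            if c not in (a, b):
                # Leaf-count-weighted mean = average over all leaf pairs
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        # Drop every entry that still refers to a merged cluster
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[new] = size.pop(a) + size.pop(b)
        height[new] = h
    return next(iter(size)) + ";"


# Hypothetical aligned sequences
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA",
        "s3": "ACGAACGTTA", "s4": "TCGAACCTTA"}
names = sorted(seqs)
dist = {frozenset((x, y)): jukes_cantor_distance(seqs[x], seqs[y])
        for x, y in combinations(names, 2)}
print(upgma(names, dist))
```

UPGMA assumes a constant rate of evolution, so it always produces an ultrametric tree in which every leaf is equidistant from the root; Neighbour-Joining drops that assumption.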
MEGA 3 is written by researchers for researchers and is designed to reduce the time spent on mundane, non-technical data analysis tasks. MEGA 3 comes with online help showing how to use different aspects of its user interface. Extensive details of the statistical and computational methods available in MEGA 3 are presented in the book "Molecular Evolution and Phylogenetics" (Nei and Kumar, Oxford University Press, 2000).
MEGA 3 is provided to the research community free of charge, and the latest version can be downloaded from the MEGA website (http://www.megasoftware.net).
Researchers at the Center have also developed the Timescale Website, which provides an intuitive query interface to the Timescale Database, a comprehensive resource for retrieving estimates of when species diverged.
The query interface allows users to search for molecular divergence time estimates for a specified pair of taxa. The system accepts both common and scientific taxon names and tolerates simple spelling mistakes. Upon receiving a valid query for a pair of taxa, the Timescale system determines the most inclusive taxonomic groups for the supplied taxa and displays all available molecular records that provide time estimates between members of these groups. In addition, the Timescale database may be queried by author name, returning all molecular time estimates associated with that author.
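The query logic described above can be sketched on a toy taxonomy. The sketch assumes a child-to-parent map for the taxonomy, maps common names to scientific names, uses Python's `difflib.get_close_matches` as a stand-in for spelling tolerance, and interprets "most inclusive taxonomic groups" as the two children of the taxa's lowest common ancestor that contain each query taxon. The taxonomy excerpt, records, times, and author names are hypothetical placeholders, not Timescale Database content.

```python
import difflib

# Hypothetical taxonomy excerpt: child -> parent
PARENT = {
    "Homo sapiens": "Hominidae", "Pan troglodytes": "Hominidae",
    "Hominidae": "Primates", "Primates": "Mammalia",
    "Mus musculus": "Muridae", "Rattus norvegicus": "Muridae",
    "Muridae": "Rodentia", "Rodentia": "Mammalia",
}
COMMON = {"human": "Homo sapiens", "chimpanzee": "Pan troglodytes",
          "mouse": "Mus musculus", "rat": "Rattus norvegicus"}
# Placeholder records: times and authors are NOT real published estimates
RECORDS = [
    {"taxa": ("Pan troglodytes", "Rattus norvegicus"), "time_mya": 100.0,
     "authors": ["Author A"]},
    {"taxa": ("Homo sapiens", "Pan troglodytes"), "time_mya": 10.0,
     "authors": ["Author B"]},
]


def resolve(name):
    """Map a common or scientific name, allowing small typos, to a taxon."""
    vocab = {t.lower(): t for t in set(PARENT) | set(PARENT.values())}
    vocab.update(COMMON)
    key = name.strip().lower()
    if key not in vocab:
        close = difflib.get_close_matches(key, list(vocab), n=1, cutoff=0.8)
        if not close:
            return None
        key = close[0]
    return vocab[key]


def lineage(taxon):
    """The taxon followed by all of its ancestors up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def inclusive_groups(a, b):
    """Largest group containing a but not b, and vice versa: the children of
    the lowest common ancestor that lie on the paths to a and to b."""
    la, lb = lineage(a), lineage(b)
    lca = next(t for t in la if t in lb)
    if lca in (a, b):
        raise ValueError("one taxon contains the other")
    return la[la.index(lca) - 1], lb[lb.index(lca) - 1]


def query_pair(name_a, name_b, records=RECORDS):
    """All records dating a split between members of the two groups."""
    a, b = resolve(name_a), resolve(name_b)
    if a is None or b is None:
        raise ValueError("unrecognized taxon name")
    ga, gb = inclusive_groups(a, b)
    hits = []
    for rec in records:
        x, y = (lineage(t) for t in rec["taxa"])
        if (ga in x and gb in y) or (gb in x and ga in y):
            hits.append(rec)
    return hits


def query_author(author, records=RECORDS):
    """All records with a (case-insensitive) matching author name."""
    return [r for r in records
            if author.lower() in (name.lower() for name in r["authors"])]


print(query_pair("human", "mose"))  # "mose" is a deliberate typo for "mouse"
```

Under this reading, a query for human and mouse also returns the record comparing chimpanzee and rat, because both pairs span the same split between primates and rodents.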

