  • Poster presentation

A complete phylogenetic analysis coupling expression data from EST databases. An example with a family of genes: the peptidyl arginine deiminase genes


For functional annotation, similarity-based approaches [1] do not take into account all the information available from comparative and evolutionary biology. They do not distinguish orthologs from paralogs among homologs, and, furthermore, the closest BLAST hit is often not the nearest neighbour [2]. Phylogenetic approaches that take duplication and speciation events into account are needed to solve these problems. However, such approaches do not integrate any data on transcriptional behaviour. Yet orthologs can share a very similar 'molecular function' while acquiring a different 'macroscopic function' because of a transcriptional shift.
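The distinction between duplication and speciation events can be illustrated with the species-overlap heuristic. An internal node of a gene tree whose two subtrees share at least one species is inferred to be a duplication; otherwise it is inferred to be a speciation. Two genes are then called orthologs if their lowest common ancestor is a speciation node, and paralogs if it is a duplication node. The Python sketch below makes simplifying assumptions: a rooted, fully binary tree with species labels on the leaves. The `Node` class and the gene names are hypothetical. This is not necessarily the procedure implemented in FIGENIX; full reconciliation of the gene tree with a species tree is the more thorough alternative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal node of a rooted, fully binary gene tree."""
    name: str
    species: str = ""  # only meaningful on leaves
    children: list = field(default_factory=list)


def leaves(node):
    """Return all leaf nodes below (or equal to) `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def event(node):
    """Species-overlap rule: if the two subtrees share a species the split is
    inferred to be a duplication, otherwise a speciation."""
    left, right = node.children  # assumes exactly two children
    left_species = {leaf.species for leaf in leaves(left)}
    right_species = {leaf.species for leaf in leaves(right)}
    return "duplication" if left_species & right_species else "speciation"


def lowest_common_ancestor(node, a, b):
    """Return the deepest node whose leaves include both genes `a` and `b`."""
    for child in node.children:
        names = {leaf.name for leaf in leaves(child)}
        if a in names and b in names:
            return lowest_common_ancestor(child, a, b)
    return node


def relationship(root, a, b):
    """Orthologs split at a speciation node, paralogs at a duplication node."""
    split = lowest_common_ancestor(root, a, b)
    return "orthologs" if event(split) == "speciation" else "paralogs"
```

Consider the tree ((geneA_human, geneA_mouse), (geneB_human, geneB_mouse)). Its root is labelled a duplication, so geneA_human and geneB_mouse are classified as paralogs even though they come from different species.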

A growing amount of gene expression profiling data is available in various databases covering normal and pathological tissues (Expressed Sequence Tags [ESTs] from NR, TIGR, GeneNote, Gepis, etc.). Recent studies have examined the correlation between gene evolution (duplication and speciation) and expression divergence within and between species [3, 4], and others have compared the expression profiles of orthologous genes in sequenced species [5].


We performed a phylogenetic analysis of a protein family using EST databases. This allowed us to enlarge the set of species containing homologs and, consequently, to improve the reconstruction of the genes' evolutionary history. We then extracted all the transcriptional data contained in the EST databases to decipher the genes' expression patterns. Because gene annotation is currently labour-intensive, we used a locally developed platform dedicated to phylogenetic annotation, named FIGENIX [6]. We validated this approach on a family of genes possibly involved in rheumatoid arthritis: the peptidyl arginine deiminase (PADI) genes.
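One simple way to turn EST data into an expression pattern is to count the ESTs derived from each tissue library for a given gene, then convert those counts to fractions. The sketch below assumes a hypothetical, simplified input with one `(gene_id, tissue)` pair per EST. Real EST records would first need their library annotations mapped to tissue names. Note that raw counts are not corrected for differences in library size.

```python
from collections import Counter


def est_tissue_profile(est_records, gene):
    """Fraction of a gene's ESTs coming from each tissue.

    est_records: iterable of (gene_id, tissue) pairs, one per EST
    (hypothetical simplified format). Raw counts are not normalized
    for differences in library size between tissues.
    """
    counts = Counter(tissue for gene_id, tissue in est_records if gene_id == gene)
    total = sum(counts.values())
    if total == 0:
        return {}  # no ESTs observed for this gene
    return {tissue: n / total for tissue, n in counts.items()}
```

For example, with made-up records where gene `g1` has two skin ESTs, one brain EST and one blood EST, the profile is `{"skin": 0.5, "brain": 0.25, "blood": 0.25}`.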


We present here a phylogenetic annotation based on an enlarged dataset that includes EST contigs and expression data. This allowed us to integrate more functional data into the analysis of a set of genes and to derive a transcriptional footprint for each gene. Our analysis showed that the PADI-2 paralog group has retained the ancestral molecular function, coupled with a probable ancestral expression profile. These classified data allowed us to produce an updated transcriptional footprint for each paralog group of this protein family.
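A per-group footprint can be sketched by pooling EST tissue counts over every gene assigned to the same paralog group. This is a minimal illustration, not the authors' procedure. The `gene_to_group` mapping is a hypothetical input that would come from the phylogenetic analysis, and the gene and group names used in the test are made up.

```python
from collections import Counter, defaultdict


def group_footprints(est_records, gene_to_group):
    """Pool EST tissue counts over all genes of each paralog group.

    est_records: iterable of (gene_id, tissue) pairs, one per EST.
    gene_to_group: hypothetical mapping gene_id -> paralog group name.
    Genes without a group assignment are skipped.
    Returns: group name -> Counter mapping tissue -> EST count.
    """
    footprints = defaultdict(Counter)
    for gene_id, tissue in est_records:
        group = gene_to_group.get(gene_id)
        if group is not None:
            footprints[group][tissue] += 1
    return dict(footprints)
```

Pooling orthologs from several species into one group is what lets EST-rich species compensate for poorly sampled ones, at the cost of mixing species-specific expression differences.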


We believe this method offers a new way to annotate uncharacterized ESTs. Beyond classical phylogeny, it highlights transcriptional shifts between paralogs and is thus a good tool for improving annotation. It showed that a functional shift can occur through differential tissue expression rather than through a change in the protein's biochemical function.
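A transcriptional shift between two paralogs can be screened for by comparing the tissues in which each is detected. The sketch below uses the Jaccard index of the two tissue sets. This is one simple choice of similarity measure, assumed here for illustration; the abstract does not say how the shift was quantified. The 0.5 cut-off is likewise arbitrary.

```python
def tissue_overlap(profile_a, profile_b):
    """Jaccard index between the sets of tissues where each gene is detected.

    Profiles are dicts keyed by tissue (e.g. EST counts or fractions);
    only tissues with a value above zero count as "detected".
    """
    a = {t for t, v in profile_a.items() if v > 0}
    b = {t for t, v in profile_b.items() if v > 0}
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return len(a & b) / len(a | b)


def possible_transcriptional_shift(profile_a, profile_b, threshold=0.5):
    """Flag paralog pairs whose tissue overlap falls below `threshold`.

    The 0.5 default is an arbitrary illustrative cut-off, not a value
    from the study.
    """
    return tissue_overlap(profile_a, profile_b) < threshold
```

Because ESTs sample transcripts sparsely, the absence of a tissue from a profile is weak evidence of non-expression. A flagged pair is therefore a candidate for closer inspection rather than a confirmed shift.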

This method of analysis is still in its early stages and needs to be extended to all kinds of expression databases, including those in which expression data are normalized, such as UniGene. In the future, it should not be overlooked when annotating new, unknown ESTs, for example those highlighted by DNA microarray assays.
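A common way to normalize EST counts is to divide a gene's count in each tissue by the total number of ESTs sequenced from that tissue and scale to transcripts per million. This is similar in spirit to normalized EST profiles offered by resources such as UniGene. The function below is only an illustrative sketch under that assumption, with made-up pool sizes in the test.

```python
def transcripts_per_million(gene_counts, pool_sizes):
    """Normalize a gene's raw EST counts by the size of each tissue pool.

    gene_counts: tissue -> number of ESTs assigned to the gene.
    pool_sizes: tissue -> total number of ESTs sequenced from that tissue.
    Tissues with a missing or zero pool size are skipped.
    """
    return {
        tissue: 1e6 * n / pool_sizes[tissue]
        for tissue, n in gene_counts.items()
        if pool_sizes.get(tissue)
    }
```

Take a gene with 5 ESTs in a brain pool of 100,000 ESTs and 5 ESTs in a skin pool of 500,000. Its raw counts are equal, but after normalization it appears five times more abundant in brain (50 per million) than in skin (10 per million).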


  1. Altschul SF, et al: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.

  2. Koski LB, Golding GB: The closest BLAST hit is often not the nearest neighbor. J Mol Evol. 2001, 52: 540-542.

  3. Gu Z, et al: Duplicate genes increase gene expression diversity within and between species. Nat Genet. 2004, 36: 577-579. 10.1038/ng1355.

  4. Huminiecki L, Wolfe KH: Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res. 2004, 14: 1870-1879. 10.1101/gr.2705204.

  5. Yanai I, et al: Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2004,

  6. Gouret P, et al: 'Intelligent' automation of genomics annotation: expertise integration in a new software platform, Figenix. Genome Res. 2004,



This work is supported by the French Society of Rheumatology (SFR).


About this article

Cite this article

Balandraud, N., Gouret, P., Danchin, E. et al. A complete phylogenetic analysis coupling expression data from EST databases. An example with a family of genes: the peptidyl arginine deiminase genes. Arthritis Res Ther 7 (Suppl 1), P34 (2005).
