MINING OF MICROARRAY DATA

Advances in computer science and molecular biology have led to an exponential growth of genomic information. Microarrays are becoming an important tool for the analysis of gene expression and interaction in different signal transduction pathways. The rapid progress in standardizing large-scale gene expression analysis, improvements in microarray technologies (cDNA, oligonucleotide, and protein microarrays), and the increasing accessibility of microarray chips and hardware have motivated a growing consensus on the need for public repositories analogous to GenBank/EMBL/DDBJ. While repositories of microarray images and data can be easily implemented for a single species or genus, implementing gene expression classification tools that use all the information in the repository to achieve a holistic understanding of biological processes presents new challenges.

Current limitations in database formats raise numerous statistical and computational challenges. In addition, the lack of integration of data generated by different research disciplines gives an incomplete view of biological processes. While individual databases are relatively easy to implement, it remains unclear how to develop interoperable databases that facilitate a holistic approach to the biological process, especially when the data grow so rapidly that they become a data stream. Our research focuses on the use of the Peano count tree (P-tree) as a new gene expression classifier. Red/green reflectance values from each spot in a microarray image are converted into an 8-bit bit-sequential (bSQ) file format. Each bit file is then converted into a quadrant-based tree structure called a Peano count tree (P-tree), from which a data cube is constructed. Because P-trees can be generated quickly, they can be considered a "data mining ready" structure.
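
The conversion from intensity values to bit files and then to a P-tree can be illustrated with a minimal sketch. It assumes a square 2^n x 2^n grid of 8-bit red intensities, NumPy as the array library, and a nested-tuple tree layout. The function names `to_bsq` and `build_ptree`, the sample values, and the quadrant order (upper-left, upper-right, lower-left, lower-right) are illustrative assumptions, not the project's actual implementation. Each node stores the count of 1-bits in its quadrant, and a quadrant that is entirely 0s or entirely 1s is not subdivided further.

```python
import numpy as np

def to_bsq(band):
    """Split an 8-bit 2-D array into eight bit planes, most significant bit first."""
    band = np.asarray(band, dtype=np.uint8)
    return [(band >> bit) & 1 for bit in range(7, -1, -1)]

def build_ptree(plane):
    """Build a count tree for a square 2^n x 2^n bit plane.

    Returns (count, children). children is None for a pure quadrant
    (all 0s or all 1s) or a single cell; otherwise it is a list of four
    subtrees in the assumed order UL, UR, LL, LR.
    """
    size = plane.shape[0]
    count = int(plane.sum())
    if size == 1 or count == 0 or count == size * size:
        return (count, None)
    half = size // 2
    quadrants = [
        plane[:half, :half], plane[:half, half:],
        plane[half:, :half], plane[half:, half:],
    ]
    return (count, [build_ptree(q) for q in quadrants])

# Hypothetical 4x4 grid of red-channel intensities (made-up values).
red = np.array([
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [128,   0, 0, 0],
    [  0,   0, 0, 0],
], dtype=np.uint8)

planes = to_bsq(red)            # eight bit planes, planes[0] is the MSB
tree = build_ptree(planes[0])   # count tree for the most significant bit
print(tree)
```

In this sketch the root count is the number of spots whose most significant bit is set, so simple counts can be read from the top of the tree without scanning every spot. A full pipeline would build one such tree per bit plane and per channel.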

We propose a multi-institutional and multidisciplinary collaboration whose objective is to identify the groups of genes that may play an important role in the response of all organisms to hypoxic and anoxic stress, as well as the genes that make the response of a given species unique. By performing microarray experiments and using the results to validate a virtual chip, we will be able not only to evaluate its accuracy but also to optimize the training and improve the prediction efficiency. As this process develops, we will establish which important genes are shared and which differ among species and/or genotypes in dealing with hypoxic, anoxic, and re-oxygenation stress. The data will be accessible and exchanged in the XML-based Microarray Markup Language (MAML) and will provide the Minimum Information About a Microarray Experiment (MIAME) proposed by the Microarray Gene Expression Database working group.