BACKGROUND: Environmental quality assessment traditionally relies on the reproduction and survival responses of indicator organisms. For soil assessment, the springtail Folsomia candida (Collembola) is an accepted standard test organism. We argue that environmental quality assessment using gene expression profiles of indicator organisms exposed to test substrates is more sensitive, more toxicant-specific and significantly faster than current risk assessment methods. To apply this species as a genomic model for soil quality testing, we conducted an EST sequencing project and developed an online database.
DESCRIPTION: Collembase is a web-accessible database of genomic data for the springtail F. candida. It currently contains information on 8686 ESTs (expressed sequence tags) assembled into 5952 unique gene objects. Approximately 40% of these gene objects showed homology to other protein sequences available in GenBank (blastx analysis against the non-redundant (nr) database; expect-value < 10⁻⁵). Putative protein sequences were then inferred with prediction software. The putative peptides, which had an average length of 115 amino acids (range 23 to 440), were annotated with Gene Ontology (GO) terms; in total, 1025 peptides (approximately 17% of the gene objects) were assigned at least one GO term (expect-value < 10⁻²⁵). Collembase can be searched by BLAST annotation, GO annotation or cluster name, or queried directly through a BLAST server. The system also enables easy sequence retrieval for functional genomic and quantitative PCR experiments. The sequences have been submitted to GenBank (accession numbers EV473060–EV481745).
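The abstract does not describe the annotation pipeline itself, so the following is only a minimal sketch of how a "fraction of gene objects with a blastx hit at expect-value < 10⁻⁵" figure could be tallied. It assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns, query ID in column 1 and e-value in column 11); the function names and the toy input rows are hypothetical. It also checks two numbers stated above: 1025 of 5952 gene objects is about 17%, and the accession range EV473060–EV481745 spans exactly 8686 records, one per EST.

```python
"""Hypothetical sketch: summarise blastx hits per gene object at an e-value cut-off.

Assumes BLAST+ tabular output (-outfmt 6, 12 default columns) where column 1
is the query (gene object) ID and column 11 is the e-value.
"""
import csv
import io
from typing import Dict, Iterable


def best_evalues(lines: Iterable[str]) -> Dict[str, float]:
    """Return the lowest e-value observed for each query ID."""
    best: Dict[str, float] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue = row[0], float(row[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def fraction_with_hit(best: Dict[str, float], total_objects: int,
                      cutoff: float = 1e-5) -> float:
    """Fraction of ALL gene objects whose best hit has e-value < cutoff.

    Gene objects with no hit never appear in BLAST output, so the
    denominator must be the total number of gene objects, not len(best).
    """
    hits = sum(1 for e in best.values() if e < cutoff)
    return hits / total_objects


def accession_count(first: str, last: str) -> int:
    """Number of records in an inclusive accession range, e.g. EV473060-EV481745."""
    prefix = first.rstrip("0123456789")
    return int(last[len(prefix):]) - int(first[len(prefix):]) + 1


if __name__ == "__main__":
    # Toy blastx rows for hypothetical gene objects (not real Collembase data).
    demo = io.StringIO(
        "contig1\tsp|P1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120\n"
        "contig1\tsp|P2\t60.0\t90\t36\t0\t1\t270\t1\t90\t1e-8\t60\n"
        "contig2\tsp|P3\t35.0\t50\t32\t1\t1\t150\t1\t50\t0.01\t30\n"
    )
    best = best_evalues(demo)
    print(fraction_with_hit(best, total_objects=4))   # 0.25
    print(round(1025 / 5952 * 100, 1))                # 17.2
    print(accession_count("EV473060", "EV481745"))    # 8686
```

Keeping only the best hit per query avoids counting a gene object twice when it matches several GenBank proteins. The same tally could be rerun with a stricter `cutoff`, but how the 10⁻²⁵ threshold was actually applied during GO assignment is not stated in the abstract.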
CONCLUSION: Collembase (http://www.collembase.org) is a resource of sequence data for the springtail F. candida. The information in the database will be linked to a custom-made microarray, based on the Agilent platform, that can be applied to soil quality testing. In addition, Collembase provides information valuable to related scientific disciplines such as molecular ecology, ecogenomics, molecular evolution and phylogenetics.
- Computational Biology
- Databases as Topic
- Databases, Nucleic Acid
- Environmental Monitoring
- Expressed Sequence Tags
- Gene Expression Profiling
- Information Storage and Retrieval
- Molecular Sequence Data
- Sequence Analysis, DNA
- Soil Pollutants
- Journal Article
- Research Support, Non-U.S. Gov't