Codon Usage and Splice-Site Tables Available From AAtDB

Published in Probe Volume 2(3): Fall 1992


Dr. J. Michael Cherry and Dr. Sam Cartinhour
Department of Molecular Biology
Massachusetts General Hospital
and Department of Genetics
Harvard Medical School
Boston, MA

The ACeDB software used by AAtDB, the Arabidopsis thaliana database, includes several DNA and protein analysis utilities. Two examples of the output produced by these utilities are shown here.

The utilities use the GenBank/EMBL feature annotations to identify coding regions and splice-site junctions; the software does not itself search the sequences for these features. Both the splice-site table and the codon usage table can be calculated either for all Arabidopsis sequences in the database or for any user-defined subset. As new sequences are added, the tables can easily be recalculated.

The RNA splice-site consensus utility uses the GenBank/EMBL feature table entries to identify exon-intron junctions and tabulates the results. The example provided in Table 1 was produced using all the Arabidopsis sequences currently found in AAtDB. The table is a frequency distribution of nucleotides at each position around the junction. To find the most probable 5' exon-intron sequence, for example, simply take the highest-frequency nucleotide at each position. In this example, the consensus sequence is: AAAG | GTAAGTT.
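The step of reading a consensus off a frequency distribution can be sketched in a few lines of Python. This is only an illustration, not the ACeDB implementation: the function names are hypothetical, and the five junction sequences are made up for the example (they are not AAtDB data and do not reproduce Table 1). Each sequence is assumed to hold 4 exon bases followed by 7 intron bases, already aligned at the junction.

```python
from collections import Counter

def position_frequencies(sites):
    """Count the nucleotides seen at each aligned position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all junction sequences must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]

def consensus(freqs):
    """Pick the most frequent nucleotide per position.

    Ties are broken in the order A, C, G, T (an arbitrary choice).
    """
    return "".join(max("ACGT", key=lambda n: column[n]) for column in freqs)

# Hypothetical 5' junction windows: 4 exon bases + 7 intron bases each.
# These are invented for illustration only.
sites = [
    "AAAGGTAAGTT",
    "CAAGGTAAGTA",
    "AAGGGTATGTT",
    "ATAGGTAAGCT",
    "AAAGGTGAGTT",
]

freqs = position_frequencies(sites)
cons = consensus(freqs)
print(f"{cons[:4]} | {cons[4:]}")  # prints "AAAG | GTAAGTT"
```

The toy sequences were chosen so that the result has the same form as the consensus quoted above; with real data the per-position counts would come from every annotated junction in the selected set of sequences.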

The codon usage table shown in Table 2 was also produced using all Arabidopsis sequences currently in AAtDB. In this case, the ACeDB utility relies on the GenBank/EMBL feature tables to identify the protein-coding regions and then tabulates the results. (See Table 2.)
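As a rough sketch of how a codon usage table can be derived from annotated coding regions, the Python below joins exon intervals and counts codons. It is an illustration under stated assumptions, not the ACeDB code: the function names are hypothetical, the exon coordinates are assumed to be 1-based and inclusive (as in a GenBank/EMBL CDS location), only the forward strand is handled, and the short sequence is invented for the example.

```python
from collections import Counter

def spliced_cds(sequence, exons):
    """Join exon intervals given as (start, end), 1-based and inclusive."""
    return "".join(sequence[start - 1:end] for start, end in exons)

def codon_usage(cds):
    """Count codons in reading frame 0, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

# Invented example: two exons separated by a 12-base intron.
sequence = "ATGGCTGTAAGTTTTCAGGCTAAATAA"
exons = [(1, 6), (19, 27)]  # hypothetical annotated coding intervals

cds = spliced_cds(sequence, exons)
usage = codon_usage(cds)
for codon, count in sorted(usage.items()):
    print(codon, count)
```

Summing the counters from many sequences (for example with `Counter.update`) would give a table for the whole database or for any chosen subset, which is why recalculating after new entries are added is straightforward.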