Parser Available for GenBank Flat File

Published in Probe Volume 1(1-2): Spring-Summer 1991


Robert Read and Matthew Witten
GenTools Project
University of Texas Center for High Performance Computing

A software system available from the GenTools project at the University of Texas Center for High Performance Computing may be of interest to those who need to extract information from the GenBank flat-file format.

The GenTools gbParse program parses GenBank flat-file entries and translates them into a Prolog-like language. The software should be useful to those who cannot gain access to the (undoubtedly superior) relational version of GenBank implemented in the Sybase RDBMS, and to those who wish to write their own programs to extract information from the feature tables.
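To give a feel for what a translation into a Prolog-like language might look like, here is a minimal sketch in Python. It is not the gbParse implementation, and it does not reproduce gbParse's actual output format, which this article does not describe. The predicate names locus and feature, the sample entry HYPO001, and the assumption that feature keys begin in column 6 are all hypothetical choices made for illustration. Qualifier lines and multi-line locations are simply skipped.

```python
import re

def quote(text):
    """Quote a string as a Prolog-style atom, escaping backslashes and quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def entry_to_facts(entry_text):
    """Translate one flat-file entry into a list of Prolog-like fact strings.

    Hypothetical output format: locus(Name). and feature(Name, Key, Location).
    """
    facts = []
    locus = "unknown"
    in_features = False
    for line in entry_text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else "unknown"
            facts.append(f"locus({quote(locus)}).")
            in_features = False
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
        elif in_features:
            # Assumed layout: exactly five spaces, then the feature key,
            # then whitespace, then the location. Qualifier lines are
            # indented further and therefore do not match.
            match = re.match(r" {5}(\S+)\s+(\S+)", line)
            if match:
                key, location = match.groups()
                facts.append(
                    f"feature({quote(locus)}, {quote(key)}, {quote(location)})."
                )
    return facts

# A made-up entry, used only to exercise the sketch.
SAMPLE = "\n".join([
    "LOCUS       HYPO001      120 bp    DNA",
    "DEFINITION  Hypothetical example entry.",
    "FEATURES             Location/Qualifiers",
    "     source          1..120",
    "     CDS             10..99",
    '                     /note="example"',
    "ORIGIN",
    "//",
])

if __name__ == "__main__":
    for fact in entry_to_facts(SAMPLE):
        print(fact)
```

Run on the made-up sample, the sketch prints one locus fact followed by one feature fact for each of the source and CDS features. A real translator driven by a full grammar would also have to handle qualifiers, continuation lines, and compound locations.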

The parser software has been written using the UNIX and Free Software Foundation tools Flex and Yacc (or Bison). A C programmer can easily adapt the source code to produce output in any other required format.

The software is now in -release. It has been tested on a Sun SPARC Station and on a VAX/VMS system. Programmers might like to see the code (grammar) that has been written even if they do not intend to use it, as it represents the most concrete description of the GenBank format, including the feature table.

Although the program translates 99 percent of GenBank entries, the code is not trouble free, in part because it must deal with actual syntax errors in the distributed flat files. Indeed, GenTools gbParse has already been used to find numerous such errors. The program is robust in reporting entry errors.
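One way a tool can keep going past a malformed entry and still report it is to check each entry separately and collect the problems found. The Python sketch below illustrates that idea only. It is not how gbParse is implemented, and the two checks it performs (an entry begins with a LOCUS line and ends with a "//" line) are assumptions chosen for brevity, as are the function names.

```python
def split_entries(text):
    """Yield (start_line_number, lines) for each entry in the text.

    An entry is assumed to end at a line containing only '//'.
    """
    current, start = [], 1
    for number, line in enumerate(text.splitlines(), start=1):
        if not current:
            if not line.strip():
                continue  # skip blank lines between entries
            start = number
        current.append(line)
        if line.strip() == "//":
            yield start, current
            current = []
    if current:
        yield start, current  # trailing entry with no terminator

def check_entry(lines):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    if not lines[0].startswith("LOCUS"):
        problems.append("entry does not begin with a LOCUS line")
    if lines[-1].strip() != "//":
        problems.append("entry is not terminated by '//'")
    return problems

def report_errors(text):
    """Check every entry and return one message per problem found."""
    report = []
    for start, lines in split_entries(text):
        for problem in check_entry(lines):
            report.append(f"line {start}: {problem}")
    return report
```

Because each entry is checked on its own, one bad entry does not prevent the remaining entries from being examined.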

To obtain the GenTools gbParse software, which includes documentation, send an E-mail request to "GENTOOLS@CHPC.UTEXAS.EDU" (Internet) or contact Robert Read, GenTools Project, UT-Center for High Performance Computing, Balcones Research Center, 1.154 CMS, 10100 North Burnet Road, Austin, Texas 78712. Further information about the GenTools project may be obtained from Dr. Sarah Barron at the same address.