Plant Genome Database-Update

Published in Probe Volume 1(1-2): Spring-Summer 1991


Douglas Bigwood, Database Manager
Plant Genome Data and Information Center
National Agricultural Library, USDA

Providing users with fast, easy access to plant genome mapping and related information is a primary goal of USDA's Plant Genome Research Program. Plans are underway to develop a plant genome database system at NAL's Plant Genome Data and Information Center (PGDIC). The Center's database manager will direct the implementation of the plant genome database, which will contain public plant genome information for four agricultural species: maize, soybean, wheat, and loblolly pine. In addition, procedures will be implemented to ensure that the information provided is up to date.

Project Activities

Initial activities of the database project include site visits by PGDIC staff to several institutions also involved in developing genome information systems. Institutions visited include the National Center for Biotechnology Information at the National Library of Medicine (GenInfo Backbone), the Los Alamos National Laboratory (GenBank), the Lawrence Berkeley Laboratory (Chromosome Information System), the Welch Library at Johns Hopkins University (Genome Data Base), the Massachusetts General Hospital (Arabidopsis mapping project), and Agrigenetics (commercial breeding projects). Center staff have benefitted from the wealth of knowledge and experience provided by these groups. Center staff hope that, as a result of this information sharing, the USDA project can avoid some of the pitfalls and problems faced by the other institutions.

The Center has also been active in two CODATA projects: Biological Macromolecules (seeking to improve coordination among institutions that compile protein and DNA sequence data) and Standardized Terminology for Access to Biological Data Banks (headed by Lois Blaine, whose article appears elsewhere in the newsletter). CODATA is an interdisciplinary Scientific Committee of the International Council of Scientific Unions that seeks to improve the quality, reliability, management, and accessibility of data important to all fields of science and technology.

Species Groups

The task of collecting and evaluating the data that will make up the plant genome database system is the responsibility of the principal investigators for the four plant species and their advisory committees. The principal investigators are Frank Greene and Olin Anderson (wheat), David Neal (pine), Ed Coe (maize), and Randy Shoemaker (soybean). Each of the four groups will have its own database requirements. Cooperators in the project have made a concerted effort to ensure that all database-related activities are performed in a coordinated manner. The ultimate goal is to provide a master database design that is as generic as possible. If this goal is achieved (and efforts so far are encouraging), data from a number of additional species could be incorporated easily into the database in the future. Furthermore, plans are to develop an open system so that USDA's database can forge data links with related data sources such as GenBank, AGRICOLA, and the Germplasm Resources Information Network (GRIN).
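
To make the idea of a generic master design concrete, the following is a minimal sketch, not the PGDIC design. It assumes that species are stored as rows rather than as separate per-species tables, so that adding a species needs no schema change, and that links to outside sources are kept in a single cross-reference table. All table names, column names, function names, and sample values are hypothetical, and Python's built-in sqlite3 module stands in for the relational database system so that the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative
# assumption, not taken from the PGDIC project. sqlite3 is only a stand-in
# for a full relational database management system.
SCHEMA = """
CREATE TABLE species (
    species_id  INTEGER PRIMARY KEY,
    common_name TEXT NOT NULL UNIQUE
);
CREATE TABLE locus (
    locus_id      INTEGER PRIMARY KEY,
    species_id    INTEGER NOT NULL REFERENCES species(species_id),
    name          TEXT NOT NULL,
    linkage_group TEXT,
    UNIQUE (species_id, name)
);
CREATE TABLE external_link (
    locus_id  INTEGER NOT NULL REFERENCES locus(locus_id),
    source    TEXT NOT NULL,  -- e.g. 'GenBank', 'AGRICOLA', 'GRIN'
    accession TEXT NOT NULL
);
"""


def build_database():
    """Create an in-memory database seeded with the four initial species."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO species (common_name) VALUES (?)",
        [("maize",), ("soybean",), ("wheat",), ("loblolly pine",)],
    )
    return conn


def add_species(conn, common_name):
    """Adding a species is one row insert; the schema itself is unchanged."""
    cur = conn.execute(
        "INSERT INTO species (common_name) VALUES (?)", (common_name,)
    )
    return cur.lastrowid


def add_locus(conn, species_name, locus_name, linkage_group=None):
    """Register a mapped locus under an existing species."""
    row = conn.execute(
        "SELECT species_id FROM species WHERE common_name = ?", (species_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown species: {species_name}")
    cur = conn.execute(
        "INSERT INTO locus (species_id, name, linkage_group) VALUES (?, ?, ?)",
        (row[0], locus_name, linkage_group),
    )
    return cur.lastrowid


def link_locus(conn, locus_id, source, accession):
    """Record a cross-reference from a locus to an outside data source."""
    conn.execute(
        "INSERT INTO external_link (locus_id, source, accession) VALUES (?, ?, ?)",
        (locus_id, source, accession),
    )


def links_for_species(conn, species_name):
    """Return (locus name, source, accession) tuples for one species."""
    return conn.execute(
        """
        SELECT l.name, x.source, x.accession
        FROM locus AS l
        JOIN species AS s ON s.species_id = l.species_id
        JOIN external_link AS x ON x.locus_id = l.locus_id
        WHERE s.common_name = ?
        ORDER BY l.name, x.source
        """,
        (species_name,),
    ).fetchall()
```

In this sketch, a fifth species would be added with a call such as add_species(conn, "barley"), after which its loci and cross-references use the same tables as the original four. The species name and any accession values used with it are placeholders, not real records.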

Future Plans

By the time this newsletter is printed, the first meeting of the PGDIC advisory committee will have been held. Composed of genetics and information experts, the committee is expected to be valuable in ensuring that the plant genome database is the best possible resource for users.

PGDIC staff are also establishing computer and communication systems. Initial development will be performed on Unix workstations using the Sybase relational database management system. Both have essentially become de facto standards in the genome community. The database system's major network access will be through the Internet via a T1 line.

Database analysis and design are proceeding as planned. Implementation will begin in the near future.