Guidelines for Data Management Planning

Introduction

Data are valuable and often unique assets that should be properly managed in order to be accessible, understandable, and re-usable into the future. These guidelines follow US Federal public access and open data directives and also comply with a broad range of current funding agency requirements for Data Management Plans (DMPs).

Guidelines covering typical DMP components are given below, with examples for many agricultural research domains, followed by information on where to find sample DMPs.

The National Agricultural Library (NAL) offers consultations on Data Management Planning. You may prepare a draft before contacting NAL for assistance at NAL-ADC-Curator@ars.usda.gov. You are responsible for ensuring that your plan meets requirements of your funding agency before submitting it to them. Researchers should determine if a DMP specific to their organization or previous projects already exists before creating a new one. While not all funding programs require DMPs, they are usually a good idea as part of project planning.

Also note that most research data generated with USDA funding will be required to be catalogued in the Ag Data Commons (https://data.nal.usda.gov) under the USDA Public Access Implementation Plan.

Core DMP Components

  1. Expected Data Types

Describe the type of data (e.g. digital, non-digital) and how they will be generated (lab work, field work, surveys, etc.). What kind of metadata will be generated? Recording metadata is encouraged because it facilitates wider understanding and re-use of the data.

For example, you may be collecting environmental data from real-time sensors, or images from phenocams. You may be conducting interviews with digital video and audio recordings and subsequent digital transcriptions. You may have field notebooks from crop management experiments or field trials that are not "born digital." You may be generating sequence data for whole genomes or metagenomics. In the course of analysis or modeling, you may be creating customized computer code or scripts for transformation or data cleaning. Metadata describing the data you have collected should be recorded for each experiment or physical sample, or be embedded in the files produced by the sensors or sequencing machines.

Describe whether raw or processed data will be re-used from other studies and name the anticipated sources.

  2. Data Formats and Standards

Describe the data formats (e.g. csv, pdf, doc) for both raw and processed data. If they are in a non-digital format, are there plans to digitize the data? Use of machine-readable formats is strongly encouraged and will soon be required by most US federal funding agencies.
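
As a minimal sketch of what "machine-readable" can mean in practice, the following Python snippet writes a few observations to CSV with a single header row, one record per line, ISO 8601 dates, and units in the column names. The column names and values are hypothetical and only illustrate the layout; they are not drawn from any particular standard.

```python
import csv
import io

# Hypothetical field observations; names and values are illustrative only.
records = [
    {"plot_id": "A1", "date": "2020-06-01", "soil_temp_c": 18.4},
    {"plot_id": "A2", "date": "2020-06-01", "soil_temp_c": 19.1},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

Keeping one observation per row and one variable per column makes the file easier to describe in a data dictionary and to load into other tools.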

What standards/schemas will be used to structure or store or share the data and metadata? Community-recognized and non-proprietary standards are strongly encouraged. Name and link to any published data dictionaries, data standards, or ontologies that you are using (e.g., the ICASA Master Variable list, Gene Ontology, or Integrated Taxonomic Information System). If data will be deposited in professional databases or repositories, refer to their data and metadata standards.

For example:

  • Written material formats such as Microsoft Word doc and LaTeX files, with TXT being most machine readable
  • Spreadsheets such as Microsoft Excel, with CSV files being most machine readable
  • Curriculum or instructional material such as Microsoft PowerPoint
  • Digital image formats such as TIFF, JPEG
  • Digital video formats such as MPEG, MOV
  • Databases such as MySQL, Microsoft Access
  • Code/Software such as MATLAB .m files, R scripts
  • Other/Specialized data formats such as fasta, shp, kmz
  • Specimen or observation data in Darwin Core
  • GIS map layers in a variety of formats
  • Results of Application Programming Interface (API) or web service calls in JSON or XML
  • All data registered in Ag Data Commons will be described with Project Open Data metadata, though there may also be additional metadata for your research.
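
To give a rough idea of what a Project Open Data style description might look like, the sketch below builds a dataset record as a Python dictionary and checks it for missing fields before serializing it to JSON. The field names follow the Project Open Data metadata schema (v1.1) as understood here, which is an assumption; confirm them against the current schema and the Ag Data Commons submission requirements. All values, including the identifier and email address, are hypothetical.

```python
import json

# Assumed core fields from the Project Open Data schema v1.1;
# verify against the current schema before relying on this list.
REQUIRED_FIELDS = [
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
]

# Hypothetical dataset record; every value is illustrative only.
record = {
    "title": "Soil temperature in example field plots, 2020",
    "description": "Hourly soil temperature from sensors in two plots.",
    "keyword": ["soil temperature", "field trial"],
    "modified": "2020-12-01",
    "publisher": {"@type": "org:Organization", "name": "Example University"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Example Researcher",
        "hasEmail": "mailto:researcher@example.org",
    },
    "identifier": "example-soil-temp-2020",
    "accessLevel": "public",
}


def missing_fields(metadata):
    """Return the assumed-required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


print(missing_fields(record))
print(json.dumps(record, indent=2))
```

A check like this can be run before deposit to catch incomplete records early; domain-specific metadata for your research would be recorded in addition to these fields.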
  3. Data Storage and Preservation of Access

Describe provisions for depositing data in a trusted/certified long-term preservation and archiving environment, including backups, cloud storage, access protocols, obsolescence avoidance, and data migration strategy.

  • Where will the data be stored during and after the life of the project? Name specific workspaces and repositories as appropriate. For example, you may initially manage data on local or network hard drives and then transfer it to a repository such as Ag Data Commons for long-term access and preservation. You may maintain it on a high-speed computing platform such as SCINet or CyVerse or on a shared workspace like Open Science Framework during analysis. You may deposit it in Dryad or an institutional repository (e.g. Purdue University Research Repository) or you may maintain it using your own infrastructure beyond the life of the project.
  • What is the technical infrastructure and staff expertise?
  • What are the plans for long-term preservation?
    • Approximately how much data are expected to be archived? Ideally this includes raw data and/or minimally-processed data (e.g., with quality control).
    • What is the planned retention period for the data?
    • Outline strategies, tools, and contingency plans that will be used to avoid data loss, degradation, or damage.
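
One common tool for detecting data loss or degradation is a checksum manifest: record a hash of every file when it is archived, then recompute the hashes later and compare. The sketch below is one possible approach using only the Python standard library, not a prescribed procedure; the function names and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files whose current checksum differs from the manifest,
    including files that have gone missing."""
    current = build_manifest(root)
    return sorted(n for n in manifest if current.get(n) != manifest[n])


# Demonstration on a temporary directory with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "plot_a1.csv"
    data_file.write_text("plot_id,yield\nA1,4.2\n")
    manifest = build_manifest(tmp)
    intact = changed_files(tmp, manifest)    # nothing has changed yet
    data_file.write_text("plot_id,yield\nA1,9.9\n")
    damaged = changed_files(tmp, manifest)   # the edit is detected

print(intact, damaged)
```

Storing the manifest alongside each backup copy lets you verify that copies still match the original after transfers or migrations.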
  4. Data Sharing and Public Access

Explain any restrictions, embargo periods, licenses, or public access levels (see Project Open Data for more information). Data generated by federal employees have either US Public Domain or Creative Commons Zero status, while the status of federally-funded data and non-federal data may vary depending on funder requirements. License definitions and additional information can be found at opendefinition.org.

  • Describe your data access and sharing procedures during and after the data collection process (e.g., publication or public release). As above, name specific repositories, databases, and catalogs as appropriate. For example, data may be shared by publishing on Dryad or in a genomic database, but should also be catalogued in the Ag Data Commons if funded by USDA.
  • Outline restrictions such as copyright, proprietary and company secrets, confidentiality, patent, appropriate credit, disclaimers, or conditions for use of the data.
  • Limiting distribution of data to project or personal websites only is strongly discouraged. Similarly, in most cases it is insufficient to make data available only on request. Depositing data in a trusted/certified long-term preservation and archiving environment is preferred.
  • Indicate how you will ensure that appropriate funding project numbers (e.g. CRIS numbers such as NIFA award numbers or ARS project numbers) will be cited with the data.
  5. Roles and Responsibilities

Provide information about project team members and the tasks associated with data management activities over the course of the project.

  • Who will primarily ensure DMP implementation? This is particularly important for multi-investigator and multi-institutional projects. Is there a named data manager?
  • Define key roles of the DMP team. For larger-scale projects, identify who will do which tasks.
  • Provide a contingency plan in case key personnel leave the project. For example, if data are managed individually or collaboratively on a platform such as ARS SCINet, and an investigator leaves, who becomes responsible for the data?
  • What resources are needed to carry out the DMP? If funds are needed, have they been added to the budget request and budget narrative? Projects must budget for sufficient resources to implement the proposed DMP. For example, there may be data publication charges, data storage charges, or salary for data managers.
  6. Monitoring and Reporting

Describe how you plan to monitor and report on implementation of the DMP during and after the project, as required by the funder; this may include progress in data sharing (publications, databases, software, etc.).

Where to Find Sample DMPs

DMPTool (https://dmptool.org/) is an online resource maintained by the University of California Curation Center of the California Digital Library with example DMPs from across a range of scientific disciplines and funding agencies. The site hosts USDA-NIFA templates as well as sample DMPs from NIFA applications.

If you have a DMP you would like to share as a sample for others, please contact us at NAL-ADC-Curator@ars.usda.gov.