DMPs created within ARS program areas with USDA funding must follow the structure outlined in P&P 630. These DMPs contain six distinct sections. The guidelines below cover typical DMP components, with examples from many agricultural research domains. You can also view a recorded Creating Data Management Plan webinar outlining these guidelines.
You can download this example of a well-formed DMP (.docx) and modify as needed:
Download and view the same DMP with annotations to explain each section:
Core DMP components
1. Expected Data Types
Describe the types of data you will produce (e.g. digital, non-digital) and how they will be generated (lab work, field work, surveys, etc.).
- You may collect environmental data from real-time sensors, or images from phenocams.
- You may conduct interviews with digital video and audio recordings and subsequent digital transcriptions.
- You may have field notebooks from crop management experiments or field trials that are not "born digital."
- You may generate sequence data for whole genomes or metagenomics.
- During analysis or modeling, you may create customized computer code or scripts for transformation or data cleaning.
Describe the metadata you will generate. Best practice is to record metadata that helps others understand and re-use the data. You should record metadata describing the data you have collected for each experiment and/or each physical sample. Sometimes this metadata is embedded in the files produced by the sensors or sequencing machines.
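As a rough illustration of per-file metadata, the sketch below builds a small "sidecar" record that could be saved next to a data file. The field names, file name, and values are all hypothetical and do not represent a required schema; adapt them to the standard your repository uses.

```python
import json
from datetime import date


def build_sidecar(data_file, title, creator, variables, collected_on):
    """Return a metadata record describing one data file.

    All field names here are illustrative, not a required schema.
    """
    return {
        "data_file": data_file,
        "title": title,
        "creator": creator,
        "date_collected": collected_on.isoformat(),
        # One entry per column: name, unit, and a plain-language description.
        "variables": [
            {"name": name, "unit": unit, "description": desc}
            for name, unit, desc in variables
        ],
    }


record = build_sidecar(
    data_file="soil_moisture_plot3.csv",  # hypothetical file name
    title="Soil moisture, plot 3",
    creator="Example Researcher",
    variables=[
        ("timestamp", "ISO 8601", "Time of sensor reading"),
        ("vwc", "m3/m3", "Volumetric water content"),
    ],
    collected_on=date(2024, 5, 1),
)

# Serialize so the metadata can be saved alongside the data file.
sidecar_text = json.dumps(record, indent=2)
print(sidecar_text)
```

Keeping one such record per experiment or sample makes it easier to fill in repository metadata forms later.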
If you plan to re-use raw or processed data from other studies, name the anticipated sources.
2. Data Formats and Standards
Describe the data formats (e.g. csv, pdf, doc) for both raw and processed data you will produce. Note any plans to digitize data created in a non-digital format. Most U.S. funding agencies require machine-readable formats where possible.
- Written material formats such as Microsoft Word doc and LaTeX files, with TXT being most machine-readable
- Spreadsheets such as Microsoft Excel, with CSV files being most machine-readable
- Curriculum or instructional material such as Microsoft PowerPoint
- Digital image formats such as TIFF, JPEG
- Digital video formats such as MPEG, MOV
- Databases such as MySQL, Microsoft Access
- Code/Software such as Matlab m files, R scripts
- Other/Specialized data formats such as fasta, shp, kmz
- Specimen or observation data in Darwin Core
- GIS map layers in a variety of formats
- Results of Application Programming Interface (API) or web service calls in JSON or XML
- Find a comprehensive list of file formats at https://www.fileformat.info/format/all.htm
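When drafting this section, it can help to take stock of the formats a project already holds. The sketch below counts files by extension and lists those that may need a plain-text or CSV counterpart. The set of extensions treated as open text formats is an assumption for illustration, as are the file names; choose the set that suits your discipline.

```python
from collections import Counter
from pathlib import PurePath

# Assumption: a project-specific choice of open, plain-text formats.
OPEN_TEXT_EXTENSIONS = {".csv", ".txt", ".json", ".xml"}


def summarize_formats(file_names):
    """Count files per extension and list those outside the open-text set."""
    counts = Counter(PurePath(name).suffix.lower() for name in file_names)
    to_review = sorted(
        name
        for name in file_names
        if PurePath(name).suffix.lower() not in OPEN_TEXT_EXTENSIONS
    )
    return counts, to_review


# Hypothetical project files.
files = ["yield_2023.xlsx", "yield_2023.csv", "protocol.docx", "readme.txt"]
counts, to_review = summarize_formats(files)

print(dict(counts))
print(to_review)
```

Files on the review list are not necessarily a problem; some formats, such as images or GIS layers, have no plain-text equivalent.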
Describe the standards or schemas that will be used to structure, store, and share the data and metadata. We strongly encourage community-recognized and non-proprietary standards for maximum interoperability and reusability. Name and link to any published data dictionaries, data standards, or ontologies that you are using. For example:
- The ICASA Master Variable list, a naming convention for agricultural model variables
- Gene Ontology, the world's largest source of information on the functions of genes
- Integrated Taxonomic Information System, authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world
- ISO 19115, the required metadata standard for all USDA geospatial data
If you plan to deposit data in an existing database or repository, refer to their data and metadata standards. For instance, data deposited in Ag Data Commons is described with metadata that conforms to the DataCite metadata standard, as well as Project Open Data metadata for records forwarded to data.gov. You may require additional metadata to describe your research. We strongly encourage depositing data in subject-specific databases that follow community-recognized metadata standards.
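Before deposit, a draft metadata record can be checked against the properties a standard requires. The sketch below assumes the six properties listed as mandatory in DataCite Metadata Schema 4.x; the key names and the draft record are illustrative, so confirm both against the current schema documentation and your repository's submission form.

```python
# Assumption: the six mandatory properties of DataCite Metadata Schema 4.x.
# Key names are illustrative; check the current schema before relying on them.
REQUIRED = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_properties(record):
    """Return the required properties that are absent or empty in a record."""
    return [key for key in REQUIRED if not record.get(key)]


# Hypothetical draft record, prepared before a repository assigns an identifier.
draft = {
    "creators": ["Example Researcher"],
    "titles": ["Soil moisture, plot 3"],
    "publicationYear": 2024,
    "resourceType": "Dataset",
}

gaps = missing_properties(draft)
print(gaps)
```

A repository that assigns DOIs on deposit will typically supply the identifier itself, so a gap there may be expected at the drafting stage.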
3. Data Storage and Preservation of Access
Describe provisions for depositing data in a trusted/certified long-term preservation and archiving environment. This includes your plan for backups, cloud storage, access protocols, obsolescence avoidance, data migration strategy, persistent identifiers, etc.
Describe where data will be stored during and after the life of the project. Name specific workspaces and repositories as appropriate. For example:
- You may initially manage data on local or network hard drives and then transfer it to a repository for long-term access and preservation.
- You may maintain data on a high-speed computing platform such as SCINet or CyVerse or on a shared workspace like Open Science Framework during analysis.
- You may deposit data in a subject-specific repository (e.g. NCBI for genomics data, AgCROS for geospatial data) or an institutional repository (e.g. Purdue University Research Repository).
- You may maintain data using your own infrastructure beyond the life of the project.
We strongly encourage depositing data in discipline-specific databases or repositories that follow community-recognized metadata standards. For USDA-funded data without a discipline-specific repository, the NAL maintains the Ag Data Commons as a generalist ag repository. This enables the USDA's compliance with both public access and open data requirements to make federally funded research data open, accessible, and machine-readable. Data stored in the Ag Data Commons will receive a DOI (digital object identifier) for persistent access.
Describe the technical infrastructure and staff expertise.
Describe the plans for long-term preservation. Items to cover include:
- Amount and size of data expected to be archived for both the short and long term. Ideally this includes raw data and/or minimally processed data (e.g., with quality control).
- The planned retention period for the data
- Strategies, tools, and contingency plans to avoid data loss, degradation, or damage
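One common way to estimate archive size and detect data loss or damage is a checksum manifest. The sketch below is a minimal illustration using SHA-256 from the Python standard library: it records the size and checksum of each file, then compares two manifests to find files that changed or disappeared. The function names and the sample file are hypothetical, and a repository may provide its own fixity checks.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root to its size in bytes and its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): {
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old, new):
    """List files from the old manifest that are missing or altered in the new one."""
    return sorted(
        name
        for name, info in old.items()
        if new.get(name, {}).get("sha256") != info["sha256"]
    )


# Demonstration on a temporary directory with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "readings.csv"
    sample.write_text("plot,vwc\n3,0.21\n")
    before = build_manifest(tmp)

    sample.write_text("plot,vwc\n3,0.99\n")  # simulate an unintended change
    after = build_manifest(tmp)

total_bytes = sum(info["bytes"] for info in before.values())
damaged = changed_files(before, after)
print(total_bytes, damaged)
```

Saving a manifest when data are archived, and rebuilding it on a schedule, gives a simple record to compare against backups.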
4. Data Sharing and Public Access
Describe your data access and sharing procedures both during the project and after the data collection process is complete, as well as plans for publication or public release. Name specific repositories, databases, and catalogs as appropriate. Many repositories for storage and preservation also offer public access functionality (e.g. the Ag Data Commons).
Explain any restrictions, embargo periods, license, or public access level that apply to the data (see Project Open Data for more information). Data generated by federal employees should carry either US Public Domain or Creative Commons CCZero status, while federally funded data and non-federal data may vary depending on funder requirements. Find license definitions and additional information at opendefinition.org.
The USDA strongly discourages limiting distribution of data to project or personal websites only. Similarly, in most cases it is insufficient to make data available only on request. The USDA prefers researchers deposit data in a trusted/certified long-term preservation and archiving environment.
Other items to cover include:
- Specify plans to create a catalog record for publicly available datasets in the Ag Data Commons if funded by USDA, regardless of where you plan to publish datasets.
- Outline restrictions such as copyright, proprietary and company secrets, confidentiality, patent, appropriate credit, disclaimers, or conditions for use of the data.
- Indicate how you will ensure that appropriate funding project numbers (e.g. CRIS numbers such as NIFA award numbers or ARS project numbers) are cited with the data.
5. Roles and Responsibilities
Describe the project team members and the data management tasks they are responsible for over the course of the project.
- Note who will primarily ensure DMP implementation.
  - This is particularly important for multi-investigator and multi-institutional projects. This may be a named data manager, or responsibility may be assigned by role.
- Define key roles of the DMP team.
  - For larger-scale projects, identify who will do which tasks.
- Provide a contingency plan in case key personnel leave the project.
  - For example, if data are managed individually or collaboratively on a platform such as ARS SCINet, and an investigator leaves, note who becomes responsible for the data.
- Describe what resources are needed to carry out the DMP.
  - If the DMP execution requires funds, add them to the budget request and budget narrative. Projects must budget for sufficient resources to implement the proposed DMP. For example, there may be data publication charges, data storage charges, or salary for data managers.
6. Monitoring and Reporting
Describe how you plan to monitor and report on implementation of the DMP during and after the project, as required by the funder. This may include progress in data sharing (publications, databases, software, etc.).