Data Management Glossary

This glossary provides definitions of key terms related to scientific/technical data curation and management as broadly adopted in the research data and repository community. The authoritative sources cited are gratefully acknowledged.

A

access
The ability for a user to view and interact with data stored on a computer or computer system. (abc-clio/ODLIS)
access level
see: public access level
administrative metadata
(= access and use metadata)

Provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative metadata; two that are sometimes listed as separate metadata types are rights management metadata, which deals with intellectual property rights, and preservation metadata, which contains information needed to archive and preserve a resource. (LTER)

API
(= application programming interface)

A set of software instructions and standards that allows machine-to-machine communication, such as when a website uses a widget to share a link on Twitter or Facebook. (NALT)

author
(= creator)

The main researchers involved in producing the data, or the authors of the publication, in priority order. May include those responsible for software creation. (DataCite Metadata Schema)

B

big data
An accumulation of data that is too large and complex for processing by traditional database management tools. (Definition of BIG DATA, 2020)

C

catalog
see: data catalog
catalog record
see: metadata record
citation
Support for the ability to establish provenance and attribute credit to research data sources, which allows for easier access to research data within journals and on the Internet. (CODATA-ICSTI; NNLM Data Thesaurus)
collection
A grouping of science data that all come from the same source, such as a modeling group or institution. Series/collections have information that is common across all the datasets/granules they contain. (EOSDIS)
contact name
(= ContactPerson; Responsible Party)

Person with knowledge of how to access, troubleshoot, or otherwise field issues related to the resource. (DataCite Metadata Schema)

controlled vocabulary
see: vocabulary
CSV
A standard format for spreadsheet data. Data is represented in a plain text file, with each data row on a new line and commas separating the values on each row. As a very simple open format it is easy to consume and is widely used for publishing open data. (Open Data Handbook)
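
As a minimal sketch of the format described above, the following Python snippet parses a small CSV string with the standard-library `csv` module. The column names and values are made up for illustration and do not come from any cited source.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two data rows.
SAMPLE = "site,date,temp_c\nA1,2020-01-01,3.5\nB2,2020-01-02,4.1\n"

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE)
print(rows[0]["site"], rows[1]["temp_c"])
```

Note that every value is read as text; converting `temp_c` to a number is left to the consumer, which is one reason a data dictionary is usually published alongside a CSV file.
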
curator
(= data curator)

Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data centre or repository. (DataCite Metadata Schema, DataCurator)

D

data catalog
(= catalog)

A searchable and browsable online collection of data sets. A data catalog informs users about the data sets and metadata available on a topic and helps them locate those data sets quickly. (Dataversity; NYU)

data dictionary
A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description. (DataONE)
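
One possible way to make a data dictionary machine-usable is sketched below: each variable gets a descriptive label, a data type, allowed values, and units, and a small function checks a record against those entries. The variable names, allowed values, and the `validate` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical data dictionary: one entry per variable in the dataset.
DATA_DICTIONARY = {
    "site": {
        "label": "Sampling site code",
        "type": str,
        "allowed": {"A1", "B2"},
        "units": None,
    },
    "temp_c": {
        "label": "Air temperature",
        "type": float,
        "allowed": None,
        "units": "degrees Celsius",
    },
}

def validate(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"missing variable: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif spec["allowed"] is not None and value not in spec["allowed"]:
            problems.append(f"{name}: value {value!r} not allowed")
    return problems

print(validate({"site": "A1", "temp_c": 3.5}))
```
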
data integrity
Assuring information will not be accidentally or maliciously altered or destroyed. (NSA)
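
A common way to detect alteration in practice is to store a checksum with each file and recompute it later. The sketch below uses SHA-256 from Python's standard `hashlib` module; the function names are illustrative. It detects changes only if the stored checksum itself is kept somewhere trustworthy, so it is one ingredient of data integrity rather than a complete guarantee.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(data: bytes, expected: str) -> bool:
    """Compare the current digest of the data with a previously recorded one."""
    return checksum(data) == expected

original = b"site,temp_c\nA1,3.5\n"
recorded = checksum(original)          # stored when the file is deposited

print(is_unaltered(original, recorded))
print(is_unaltered(original + b"B2,4.1\n", recorded))
```
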
data life cycle
The data life cycle represents all the stages of data throughout its life, from its creation for a study to its distribution, preservation, and reuse. (DataONE; NNLM Data Thesaurus)
data management plan
(= DMP)

A data management plan describes the data that will be authored and how the data will be managed and made accessible throughout its lifetime. The contents of the data management plan should include: the types of data to be authored; the standards that would be applied, for example format and metadata content; provisions for archiving and preservation; access policies and provisions; and plans for eventual transition or termination of the data collection in the long-term future. (DataONE)

data paper
A factual and objective publication with a focused intent to identify and describe specific data, sets of data, or data collections to facilitate discoverability. (DataCite)
data publishing
Data publishing (also data publication) is the act of releasing research data in published form for use by others. It is the practice of preparing data or data sets for public use so that they are available to everyone to use as they wish. This practice is an integral part of the open science movement, and there is broad, multidisciplinary consensus on its benefits. (Wikipedia)
data repository
see: repository
data resource
(= resource)

Resources are the actual files, APIs or links that are being shared. (DKAN)

dataset
A dataset is a collection of research data files produced in the course of research for a paper or project, plus accompanying metadata that describe the data and indicate who produced the data and who may access it (i.e., title, description, categories, contributors, license, and so forth). Usage of the term dataset varies considerably across disciplinary communities. (Mendeley; Renear et al., 2011)
dataset doi
see: digital object identifier
description
(= summary, abstract)

A rich summary of the dataset: how and why it was generated and how it should (or should not) be used. This can be modified from article text, but should focus on characterizing the data, not the research project. Analogous to an abstract for a paper. (ESIP/NetCDF; Ag Data Commons)

digital object identifier
(= dataset doi, DOI)

Globally unique character strings that reference physical, digital, or abstract objects. They provide actionable, interoperable, persistent links to information about the objects they reference. (USGS)

E

embargo
(= scheduling option)

A specified period of time during which the dataset is inaccessible. At the end of the embargo period, the dataset will be made available. Metadata describing the dataset is publicly available during this period. (Subject guides: Data Publication: Home, 2020)

endpoint
An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An endpoint indicates a specific location for accessing a service using a specific protocol and data format. (W3C)

F

FAIR data
A set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. (FORCE11)
format
(= file type, file format, resource format)

A digital resource encoded for storage in a computer file in a standard way. File formats may be either proprietary or free and may be either unpublished or open. (Wikipedia; DCMI)

G

geographic extent
The spatial (horizontal and/or vertical) delineation of the resource. (NOAA)
geospatial data
Data about objects, events, or phenomena that have a location on the surface of the earth. (Stock & Guesgen)
GIS
(= Geographic Information System)

A framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. It analyzes spatial location and organizes layers of information into visualizations using maps and 3D scenes. (Esri)

H

harvest
To use the public feed or API of another data portal to import items from that portal's catalog into your own. For example, Data.gov harvests all of its datasets from the data.json files of hundreds of U.S. federal, state and local data portals. (DKAN)
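
The harvesting step can be sketched as reading a portal's feed and copying any items not already held into a local catalog. The example below works on an inline JSON string rather than a live portal; it assumes a feed with a top-level `dataset` list whose items carry `identifier` and `title` fields, and the sample records and the `harvest` function are hypothetical.

```python
import json

# Hypothetical feed content standing in for a remote portal's data.json file.
SAMPLE_FEED = """
{"dataset": [
  {"identifier": "ex-001", "title": "Example soil survey"},
  {"identifier": "ex-002", "title": "Example stream gauges"}
]}
"""

def harvest(feed_text, local_catalog):
    """Import items from a feed into a local catalog keyed by identifier.

    Returns the number of items that were newly added.
    """
    feed = json.loads(feed_text)
    added = 0
    for item in feed.get("dataset", []):
        if item["identifier"] not in local_catalog:
            local_catalog[item["identifier"]] = item
            added += 1
    return added

catalog = {}
print(harvest(SAMPLE_FEED, catalog))   # first run imports both items
print(harvest(SAMPLE_FEED, catalog))   # second run finds nothing new
```

Keying the local catalog by identifier keeps repeated harvests from creating duplicate records.
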

L

license
A legal document under which the resource is made available, typically indicated by a URL. (DCAT, schema.org)
local resource
Data files stored and served from an internally managed repository. (DKAN, USGS)

M

metadata
Documentation of important aspects of data that describe where, when, and why the data were collected; who collected the data; what types of data were collected; what processes were used to create the data; what quality assurance controls were used; and where the collected data are located. Metadata are provided in a human-readable form as well as in a format that is machine readable (for example, XML) for automated use. (USGS)
metadata record
(= catalog record)

An item-level metadata record details the characteristics of a digital object for the purposes of description, resource discovery, and preservation. It typically includes: Descriptive information; Access points; Contextual information; Reference to the original item and collection; Administrative and preservation information. (Xie & Matusiak, 2015)

metadata schema
A unified and structured set of rules developed for object documentation and functional activities. (Drake, 2003)

O

open data
In general, consistent with the following principles: Public; Accessible; Described; Reusable; Complete; Timely; Managed Post-Release. (Project Open Data)

P

peer review
The process in which a new book, article, software program, etc., is submitted by the prospective publisher to experts in the field for critical evaluation prior to publication, a standard procedure in scholarly publishing. (abc-clio/ODLIS)
processed data
Data that has been edited, cleaned or modified from the raw data. (MGDS)
product type
(= resource type)

A high-level categorization of the most important part of the dataset's actual content – for example, Audiovisual; Collection; Dataset; Image; Model; Software. (Ag Data Commons)

public access level
(= access level)

The degree to which this dataset could be made available to the public, regardless of whether it is currently available to the public [e.g. under embargo]. (Project Open Data)

published (moderation state)
see: data publishing

R

raw data
Refers to data that have not been changed since acquisition. (MGDS)
registry
Authoritative, centrally controlled store of information. (W3C)
remote resource
(= external resource)

Associated data stored in external data repositories, or code stored in external software repositories. (Dryad)

repository
(= data repository, metadata repository)

A place that holds data, makes data available to use, and organizes data in a logical manner. A data repository may also be defined as an appropriate, subject-specific location where researchers can submit their data. (NLM)

resource format
see: format

S

self-citation
Reference made in a written work to one or more of the author's previous publications (book, periodical article, conference paper, etc.), an accepted practice in scholarly communication, provided important works written on the subject by other authors are not neglected or ignored. (abc-clio/ODLIS)

T

taxonomy
Typically a controlled vocabulary with a hierarchical structure, with the understanding that there are different definitions of a hierarchy. Terms within a taxonomy have relations to other terms within the taxonomy. These are typically: parent/broader term, child/narrower term, or often both if the term is at mid-level within a hierarchy. (American Society for Indexing)
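
The parent/child relations described above can be sketched with a simple mapping from each term to its broader term. The terms and helper functions below are hypothetical examples, not part of any cited vocabulary.

```python
# Hypothetical taxonomy: each term maps to its parent (broader) term,
# or to None when it sits at the top of the hierarchy.
PARENT = {
    "data": None,
    "geospatial data": "data",
    "raster data": "geospatial data",
    "vector data": "geospatial data",
}

def broader(term):
    """Return the parent/broader term, or None for a top-level term."""
    return PARENT[term]

def narrower(term):
    """Return the child/narrower terms of a term, sorted alphabetically."""
    return sorted(child for child, parent in PARENT.items() if parent == term)

def ancestors(term):
    """Walk up the hierarchy and return all broader terms, nearest first."""
    chain = []
    while PARENT[term] is not None:
        term = PARENT[term]
        chain.append(term)
    return chain

print(broader("geospatial data"), narrower("geospatial data"))
```

Here "geospatial data" is a mid-level term: it has both a broader term and narrower terms.
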
temporal coverage
(= temporal extent)

The time period that the dataset covers. An interval of time that is named or defined by its start and end. (DCAT)
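
As an illustration of an interval defined by its start and end, the sketch below parses a coverage string and tests whether a given day falls inside it. The `start/end` string layout and the function names are assumptions made for this example; actual metadata schemas may record the two dates differently.

```python
from datetime import date

def parse_coverage(interval):
    """Split a 'YYYY-MM-DD/YYYY-MM-DD' string into start and end dates."""
    start_text, end_text = interval.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("end date precedes start date")
    return start, end

def covers(interval, day):
    """Return True if the day lies within the interval, endpoints included."""
    start, end = parse_coverage(interval)
    return start <= day <= end

coverage = "2019-01-01/2019-12-31"   # hypothetical coverage value
print(covers(coverage, date(2019, 6, 15)))
print(covers(coverage, date(2020, 1, 1)))
```
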

U

use limitations
Limitations regarding the dataset's usability. Example statements include "estimates biased over water," "equipment malfunctioned during a specified time," or "granularity makes data unsuitable for certain kinds of analysis". (Ag Data Commons)

V

version
A new version of a dataset is created when there is a change in the structure, contents, or condition of the resource. In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. (ANDS)
vocabulary
(= controlled vocabulary; see also: taxonomy)

A controlled vocabulary, also called an authority file, is an authoritative list of terms to be used in indexing (human or automated). Controlled vocabularies do not necessarily have any structure or relationships between terms within the list and are often used for name authorities (proper nouns), such as persons, organization names, company names, etc. Controlled vocabularies are the broadest category, which includes thesauri and taxonomies. (American Society for Indexing)

References