6 Data Management
Learning Outcomes
By the end of this chapter, learners will:
- Understand the terms research data, research data management, and data management plan
- List some common elements of a data management plan and describe how they could be used to structure an open education data collection project
Important Terminology
- Research data management: A general term that describes what researchers do to structure, organize, and maintain data before, during, and after doing research. Anyone who collects or uses data for the purpose of doing research is doing research data management. Creating a data file and deciding where to save it, renaming a data file, or moving it are all research data management activities (Thompson, 2023).
- Research data lifecycle: Typically depicted as a cyclical process, the research data lifecycle highlights the main stages of planning, collecting, processing, analyzing, preserving, and sharing research data. Researchers start by planning their research. They then collect, process, and clean data to get them into shape for analysis and analyze them to form conclusions about their research. Finally, they take steps to preserve the data for the long term and make them available for others to use and study (Thompson, 2023).
Figure 1. Research data lifecycle. Credit: Kristi Thompson’s (2023) The Basics: An Introduction To Research Data Management, which is available under a Creative Commons Attribution-NonCommercial 4.0 International licence.
- Data management plans (DMPs): A formal description of what a researcher plans to do with their data from collection through eventual preservation or deletion. DMPs are intended to help researchers manage data across all phases of the research data lifecycle, from collection to sharing. They are often described as “living documents” that should be updated as needed while researchers work with their data (Thompson, 2023).
The Data Management Plan
Why use a DMP?
A data management plan (DMP) can save you and your research team time. It helps the research team think through and work with their data more effectively, and shows how data will be collected, stored, and preserved (Thompson, 2023).
A DMP is helpful whenever data is being collected or stored. It helps researchers plan how data will be processed and analyzed, reduces redundancy and lost effort, and helps ensure that best practices are followed.
Components of a DMP
A DMP can include a variety of elements. The components below are based on the sections of the DMP Assistant tool, which is available to researchers at post-secondary institutions in Canada. This web-based tool asks users a series of questions about their data and research plans, with contextual help and guidance on how to answer each question.
Responsibilities and Resourcing
This section of the DMP describes the following components of the project:
- Who is the research lead or principal investigator?
- Who is responsible for collecting and managing the data?
- Who are the producers and owners of the data?
- How will data be collected, created, processed, and analyzed?
- Is any transcribing or translation required?
Data Collection
This section of the DMP discusses the following components of the project:
- What types of data will be collected?
- What file formats will be used?
- How will files be named?
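The file-naming question above can be made concrete with a small helper. The following is a minimal sketch in Python, assuming a hypothetical convention of project, description, collection date, and version number joined by underscores. The function name, the field order, and the example project are illustrative assumptions, not a convention prescribed by the DMP Assistant or by Thompson (2023).

```python
import re
from datetime import date


def build_filename(project: str, description: str, collected: date,
                   version: int, ext: str) -> str:
    """Build a file name from a hypothetical project_description_date_version pattern."""

    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted as text.
    # Zero-padded versions (v01, v02, ...) sort correctly up to v99.
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected.isoformat()}_v{version:02d}.{ext.lower().lstrip('.')}"
    )


if __name__ == "__main__":
    # Hypothetical example values.
    print(build_filename("OER Survey", "Student Responses", date(2024, 3, 15), 2, "csv"))
    # prints: oer-survey_student-responses_2024-03-15_v02.csv
```

Whatever pattern a team chooses, recording it in the DMP and applying it consistently matters more than the specific fields used here.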
Documentation and Metadata
This section of the DMP describes the documentation that will accompany the data: a master study document that describes where the data came from; README files that list and describe the documents present within a project and how they are related; and codebooks that outline the schema of data files. The goal is to allow anyone who later looks at the project to understand how the data was collected and processed.
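As an illustration of what a codebook records, the sketch below drafts a skeleton codebook from a small CSV file using only the Python standard library. The function name, the output fields, and the sample survey columns are hypothetical and chosen for illustration. A real codebook would also need a human-written description, units, and allowed values for each variable.

```python
import csv
import io


def draft_codebook(csv_text: str) -> list[dict]:
    """Return one skeleton codebook entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        # Treat empty strings (and cells missing from short rows) as missing.
        present = [v for v in values if v not in ("", None)]
        codebook.append({
            "variable": column,
            "n_missing": len(values) - len(present),
            "example": present[0] if present else "",
            "description": "TODO: describe this variable",
        })
    return codebook


if __name__ == "__main__":
    # Hypothetical survey data.
    sample = "respondent_id,course,uses_oer\n1,BIOL101,yes\n2,,no\n"
    for entry in draft_codebook(sample):
        print(entry)
```

The generated entries are only a starting point: the description field is deliberately left as a placeholder for the researcher to complete.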
Storage and Backup
This section of the DMP discusses the following components of the project:
- Where will the data be stored?
- How will the data be backed up?
- How will the data and its backups be secured?
- How will data be preserved for the long term?
The 3-2-1 backup rule is a widely used standard: keep three copies of each file, store the copies on at least two different types of storage media, and keep at least one copy off-site.
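The 3-2-1 rule can be expressed as a simple check. The following minimal Python sketch assumes each copy of a file is described by a location, a storage medium, and an off-site flag. The class name, the field names, and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str   # where the copy lives, e.g. "office laptop"
    medium: str     # type of storage, e.g. "internal disk", "external drive", "cloud"
    offsite: bool   # True if the copy is kept away from the primary site


def satisfies_3_2_1(copies: list[DataCopy]) -> bool:
    """Check for at least three copies, two media types, and one off-site copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    # Hypothetical backup plan.
    plan = [
        DataCopy("office laptop", "internal disk", offsite=False),
        DataCopy("office desk drawer", "external drive", offsite=False),
        DataCopy("institutional cloud storage", "cloud", offsite=True),
    ]
    print(satisfies_3_2_1(plan))      # prints: True
    print(satisfies_3_2_1(plan[:2]))  # prints: False
```

A check like this only verifies the plan on paper. It does not confirm that the backups exist or can be restored.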
Access, Sharing and Reuse
This section of the DMP discusses the following components of the project:
- Will data be shared and, if so, where?
- Under what license will data be available? Are there restrictions on how the data can be shared or reused?
Ethics and Legal Compliance
This section of the DMP discusses issues of confidentiality, data sovereignty (if relevant), data ownership, and intellectual property rights.
DMP Tools
In Canada, researchers at post-secondary institutions have access to the DMP Assistant, a web-based tool that guides users through a series of questions about their data and research plans. The DMP Tool is an alternative that is targeted to the requirements of US funding bodies, such as the National Institutes of Health.
Conclusion
A data management plan helps to organize and maintain data during and after a data collection initiative. It provides a structured approach to understanding how data will be collected, processed, and analyzed. Ultimately, a strong data management plan saves the research team time and leaves the resulting data easier to preserve and share.
Resources
© 2024. This chapter by Stephanie Quail is available under a Creative Commons Attribution-NonCommercial 4.0 International licence. The content in the sections Learning Outcomes, Important Terminology, and The Data Management Plan is adapted from Kristi Thompson’s (2023) The Basics: An Introduction To Research Data Management, which is available under a Creative Commons Attribution-NonCommercial 4.0 International licence.