National Institutes of Health Data Management
The following will guide you through the completion of your Data Management and Sharing Plan for your National Institutes of Health grant proposal. For each question below, we provide either reference resources to assist you in writing the response to the question or sample text referencing technology resources available at Baylor University.
Note: Please do not copy the content in the questions below into your data management plan. Use the references or sample text to assist you in developing the response.
Blank Template (.docx)
Anonymized Example (.docx)
Element 1: Data Type
Types and amount of scientific data expected to be generated in the project
Summarize the types and estimated amount of scientific data expected to be generated in the project
The type of scientific data will vary from research project to project. Baylor University has provided references to the resources below to help you in writing the narrative for Question 1A.
Common Data Types in Public Health Research
Types of Data Resource Guide – University of Nebraska Medical Center
Estimating the amount of scientific data expected to be generated requires consideration of how data transactions throughout the research will occur.
For example, suppose you intend to collect data from laboratory mice using an instrument called the Mouser and a software application called MouseX. With the Mouser instrument, you plan to take 20 measurements of mice each day during the data collection phase, which lasts 60 work days. Each measurement produces an output file averaging 5GB. In addition, the MouseX application itself requires 200GB of storage space. Data analysis and visualization will be conducted by a co-PI at a different institution.
When calculating data storage, always increase the estimated data volume by at least 15% to account for unexpected volume in your research. A simple method is below.
5GB (output size) X 20 (measurements) = 100GB / Day
60 work days X 100GB = 6,000GB
6,000GB X 15% unexpected volume = 900GB
6,000GB + 900GB + 200GB (Software Storage) = 7,100GB
7,100GB = 7.1TB
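The same arithmetic can be sketched in Python so it is easy to rerun with different inputs. This is a minimal illustration, not a Baylor tool: the function name and parameter names are hypothetical, the 15% buffer is applied to the collected data only (as in the steps above), and 1TB is treated as 1,000GB.

```python
def estimate_storage_gb(file_size_gb, measurements_per_day, work_days,
                        buffer_percent=15, software_gb=0):
    """Estimate storage in GB: raw data + buffer on raw data + software."""
    # Total data collected over the whole collection phase
    raw_gb = file_size_gb * measurements_per_day * work_days
    # Extra allowance for unexpected volume, applied to the raw data only
    buffer_gb = raw_gb * buffer_percent / 100
    return raw_gb + buffer_gb + software_gb


# Values from the Mouser / MouseX example
total_gb = estimate_storage_gb(file_size_gb=5, measurements_per_day=20,
                               work_days=60, software_gb=200)
print(f"{total_gb:,.0f}GB = {total_gb / 1000:.1f}TB")
```

Changing any single input (for example, a longer collection phase or a larger buffer) recomputes the whole estimate without redoing the steps by hand.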
The resulting 7.1TB is the estimated amount of data storage required. Data storage options vary in type and performance, and there are various methods for estimating storage needs. Contact research_technology@baylor.edu for assistance in developing estimated data storage values for Data Management Plans.
Scientific data that will be preserved and shared, and the rationale for doing so
Describe which scientific data from the project will be preserved and shared and provide the rationale for this decision.
The data you preserve and make accessible to others are part of the legacy of the research, and in many cases will be necessary to validate the findings you place on the public record. It is important, therefore, that data generated at Baylor University are of good quality, are preserved according to applicable standards, and are made accessible and reusable.
The basis of effective open data sharing is described by the FAIR Data Principles, according to which data should be Findable, Accessible, Interoperable, and Reusable. In most cases, archiving data in a data repository is enough to comply with these principles.
Baylor University Libraries – Preserving, Allowing Access, and Reusing Data
Metadata, other relevant data, and associated documentation
Briefly list the metadata, other relevant data, and any associated documentation (e.g., study protocols and data collection instruments) that will be made accessible to facilitate interpretation of the scientific data
Baylor University Libraries – Data and Metadata Overview
File Formats
Folder and File Naming Conventions
BEARdocs
Element 2: Related Tools, Software and/or Code
State whether specialized tools, software, and/or code are needed to access or manipulate shared scientific data, and if so, provide the name(s) of the needed tool(s) and software and specify how they can be accessed.
BaylorITS offers numerous software options for researchers. A software search tool is available to the research community. Visit https://helpdeskplus.web.baylor.edu/software and in the Audience option list of the search interface, select Researcher to see the list of software options filtered by availability to researchers.
In the event the software is not listed in the software catalog but has been approved for purchase, please describe the software application in this section. Ensure that you provide the version information, where the application stores data, and any related data analysis or visualization tools or packages that will be used in conjunction with the software application.
If you do not find software suitable for your research in the catalog and are unsure whether you will be allowed to procure it, you may submit a request for approval to purchase the software by using the BaylorITS IDEA Form.
Element 3: Standards
State what common data standards will be applied to the scientific data and associated metadata to enable interoperability of datasets and resources, and provide the name(s) of the data standards that will be applied and describe how these data standards will be applied to the scientific data generated by the research proposed in this project. If applicable, indicate that no consensus standards exist.
The United States Federal Enterprise Data Resources service, also known as resources.data.gov, provides standards, schemas, and related resources to guide you in determining common data standards. Use the reference below to assist you in documenting the data standards that will be applied to this research.
Data Standards – US Federal Enterprise Data Resources
Element 4: Data Preservation, Access, and Associated Timelines
Repository where scientific data and metadata will be archived
Provide the name of the repository(ies) where scientific data and metadata arising from the project will be archived (see Selecting a Data Repository).
BEARdocs
The Texas Digital Library hosts DSpace, a digital service that collects, preserves, and distributes digital material for a consortium of institutions of higher education in Texas. BEARdocs is a repository of digital materials from Baylor that is hosted in the DSpace environment. Baylor faculty and staff can use BEARdocs to archive theses, dissertations, faculty scholarship, digital projects, and other collections of materials. Anyone with an Internet connection can view and search BEARdocs.
Texas Data Repository
The Texas Data Repository is a platform for publishing and sharing datasets (and other data products) created by faculty, staff, and students at higher education institutions. The repository (https://dataverse.tdl.org/) is built on the open-source Dataverse software, developed and used by Harvard University. The repository is hosted by the Texas Digital Library, a consortium of academic libraries in Texas with a proven history of providing shared technology services that support secure, reliable access to digital collections of research and scholarship.
How scientific data will be findable and identifiable
Describe how the scientific data will be findable and identifiable, i.e., via a persistent unique identifier or other standard indexing tools
Your NIH grant will likely explain how to properly set identifiers for your datasets. It is important to utilize unique identifiers and standard indexing tools for data integrity.
A Guide to Choosing a Data Repository for NIH-Funded Research
Finding Datasets, Data Repositories, and Data Standards
When and how long the scientific data will be made available
Describe when the scientific data will be made available to other users (i.e., no later than time of an associated publication or end of the performance period, whichever comes first) and for how long data will be available.
Your NIH grant will specify when scientific data should be made available to other users and for how long. State both in your plan. These are usually determined by factors such as the terms of the grant, the end of the award, or the end of an NDA.
NIH Record Retention Policy
Element 5: Access, Distribution, or Reuse Considerations
Factors affecting subsequent access, distribution, or reuse of scientific data
NIH expects that in drafting Plans, researchers maximize the appropriate sharing of scientific data. Describe and justify any applicable factors or data use limitations affecting subsequent access, distribution, or reuse of scientific data related to informed consent, privacy and confidentiality protections, and any other considerations that may limit the extent of data sharing. See Frequently Asked Questions for examples of justifiable reasons for limiting sharing of data.
Specify any additional considerations that apply to the scientific data. It is possible that none apply. This is also the place to state whether any consent will be required from research participants (e.g., consent for broad data sharing).
Whether access to scientific data will be controlled
State whether access to the scientific data will be controlled (i.e., made available by a data repository only after approval)
Lay out the process by which scientific data will be accessed by researchers/others. This may vary by data classification. The data classifications to consider are as follows:
Public
Non-Public
Protected
Restricted
Government Classified
https://its.web.baylor.edu/security/data-classification-standards
NIH Access, Distribution, and Reuse Considerations
Protections for privacy, rights, and confidentiality of human research participants
If generating scientific data derived from humans, describe how the privacy, rights, and confidentiality of human research participants will be protected (e.g., through de-identification, Certificates of Confidentiality, and other protective measures).
It is important to ensure that identifiers tied to human research participants are protected. Protected Health Information (PHI) must be protected in accordance with relevant data protection laws. Under U.S. law, PHI is any information about health status, provision of health care, or payment for health care that is created or collected by a Covered Entity (or a Business Associate of a Covered Entity) and can be linked to a specific individual.
The Health Insurance Portability and Accountability Act (HIPAA) of 1996 specifies a number of elements in health data that are considered identifiers. If any are present, the health information cannot be released without patient authorization. Such data can be released for research purposes with approval of a waiver of patient authorization from an Institutional Review Board (IRB). For DHCS identifiable data, the IRB is the Committee for the Protection of Human Subjects (https://www.chhs.ca.gov/cphs/).
The following are considered identifiers under the HIPAA safe harbor rule:
Names;
All geographic subdivisions smaller than a state, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census, the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000;
All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
Telephone numbers;
Fax numbers;
Electronic mail addresses;
Social security numbers;
Medical record numbers;
Health plan beneficiary numbers;
Account numbers;
Certificate/license numbers;
Vehicle identifiers and serial numbers, including license plate numbers;
Device identifiers and serial numbers;
Web Universal Resource Locators (URLs);
Internet Protocol (IP) address numbers;
Biometric identifiers, including finger and voice prints;
Full face photographic images and any comparable images; and
Any other characteristic that could uniquely identify the individual.
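Three of the rules in the list above (zip codes, ages over 89, and dates) involve transforming a value rather than simply removing it. The following Python sketch illustrates those three transformations only. It is a simplified illustration under stated assumptions, not a compliance tool: all function names are hypothetical, the set of low-population zip prefixes holds placeholder values that must be replaced with the list derived from current Census Bureau data, and the remaining identifiers (names, telephone numbers, record numbers, and so on) would still need to be removed entirely.

```python
from datetime import date

# Placeholder values for illustration only. The real set of three-digit
# zip prefixes covering 20,000 or fewer people must be taken from
# current Census Bureau data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def safe_harbor_zip(zip_code, low_population_zip3=LOW_POPULATION_ZIP3):
    """Keep the first three zip digits, or '000' for low-population areas."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in low_population_zip3 else zip3


def safe_harbor_age(age):
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


def safe_harbor_birth_year(birth_date, age):
    """Keep only the year of birth; drop it too when it reveals an age over 89."""
    return None if age > 89 else birth_date.year


# Hypothetical participant record
record = {"zip": "76798", "age": 93, "birth_date": date(1931, 4, 12)}
deidentified = {
    "zip3": safe_harbor_zip(record["zip"]),
    "age": safe_harbor_age(record["age"]),
    "birth_year": safe_harbor_birth_year(record["birth_date"], record["age"]),
}
print(deidentified)
```

Any real de-identification procedure should be reviewed with your IRB before data are shared.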
Element 6: Oversight of Data Management and Sharing
Describe how compliance with this Plan will be monitored and managed, frequency of oversight, and by whom at your institution (e.g., titles, roles).
Specify which individual will monitor and manage the data for this project. This will likely be the project's Principal Investigator and/or Data Manager. Also specify any research staff members working for the PI who will implement or manage the data management and sharing activities.