Data Sharing Information


What to Include in an NIH Application

Investigators seeking $500,000 or more in direct costs in any year should include a description of how final research data will be shared, or explain why data sharing is not possible. The data sharing discussion is expected to take the form of a brief paragraph immediately following the Research Plan Section of the PHS 398 application form (i.e., immediately after I. Letters of Support), and it will not count towards the application page limit.

Data Sharing Plan (to follow immediately after the Research Plan Section)

The precise content of the data-sharing plan will vary, depending on the data being collected and how the investigator is planning to share the data. Applicants who are planning to share data may wish to describe briefly the expected schedule for data sharing, the format of the final dataset, the documentation to be provided, and whether or not any analytic tools will also be provided. They may also wish to state whether or not a data-sharing agreement will be required and, if so, to give a brief description of such an agreement (including the criteria for deciding who can receive the data and whether or not any conditions will be placed on their use). Finally, they may wish to describe the mode of data sharing (e.g., under their own auspices by mailing a disk or posting data on their institutional or personal website, or through a data archive or enclave). Investigators choosing to share under their own auspices may wish to enter into a data-sharing agreement.

References to data sharing may also be appropriate in other sections of the application, as discussed below.

Budget and Budget Justification Sections

Applicants may request funds in their application for data sharing. If funds are being sought, the applicant should address the financial issues in the budget and budget justification sections. Some investigators have more experience than others in estimating costs associated with preparing the dataset and associated documentation, and providing support to data users. As investigators gain experience with the process, their ability to estimate costs will improve. Investigators working with archives can get help with data preparation and cost estimation. Investigators who are concerned about paying for data-sharing costs at the end of their grant can make prior arrangements with archives. Investigators facing considerable delays in the preparation of the final dataset for sharing should consult with the NIH program about how to manage this situation, such as requesting a no-cost extension.

Background and Significance Section (PHS 398 Research Plan Section B)

If support is being sought to develop a large database that will serve as an important resource for the scientific community, the applicant may wish to make a statement about this in the significance section of the application.

Human Subjects Section (PHS 398 Research Plan Section E)

If the research involves human subjects and the data are intended to be shared, the application should discuss how the rights and confidentiality of participants would be protected. In the Human Subjects section of the application, the applicant should discuss the potential risks to research participants posed by data sharing and steps taken to address those risks.

Examples of Data-Sharing Plans

The precise content and level of detail to be included in a data-sharing plan depend on several factors, such as whether or not the investigator is planning to share data and the size and complexity of the dataset. Below are several examples of data-sharing plans.

Example 1

The proposed research will involve a small sample (fewer than 20 subjects) with Williams syndrome, recruited from clinical facilities in the New York City area. This rare craniofacial disorder is associated with distinguishing facial features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.

Example 2

The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner-city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.
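
For investigators preparing a dataset along the lines of Example 2, the two steps mentioned there (stripping identifiers, then checking for possible deductive disclosure) might be sketched as follows. This is only an illustration under stated assumptions, not an NIH-endorsed procedure: the field names, the choice of quasi-identifiers, and the minimum count of 5 are all hypothetical, and an actual disclosure review would need to be designed for the specific study.

```python
from collections import Counter

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}
QUASI_IDENTIFIERS = ("age_group", "sex", "zip3")


def strip_identifiers(records):
    """Return copies of the records without the direct identifier fields."""
    return [
        {key: value for key, value in rec.items() if key not in DIRECT_IDENTIFIERS}
        for rec in records
    ]


def rare_combinations(records, min_count=5):
    """Return quasi-identifier combinations shared by fewer than min_count records.

    Subjects in such small groups have unusual characteristics and may be
    identifiable even after direct identifiers are removed.
    """
    counts = Counter(tuple(rec[q] for q in QUASI_IDENTIFIERS) for rec in records)
    return {combo: n for combo, n in counts.items() if n < min_count}


# Made-up records: five subjects share one profile, one subject is unusual.
common = {"name": "n/a", "address": "n/a", "phone": "n/a",
          "age_group": "20-29", "sex": "F", "zip3": "100", "result": "negative"}
unusual = {"name": "n/a", "address": "n/a", "phone": "n/a",
           "age_group": "70-79", "sex": "M", "zip3": "104", "result": "positive"}
records = [dict(common) for _ in range(5)] + [unusual]

released = strip_identifiers(records)
flagged = rare_combinations(released)
print(flagged)
```

Any combination that is flagged would then need a decision by the investigator, for example coarsening the categories, suppressing the record, or releasing the data only under a data-sharing agreement as in Example 2.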

Example 3

This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts.

User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction of the data after analyses are completed, reporting responsibilities, restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource. Registered users will receive user support, as well as information related to errors in the data, future releases, workshops, and publication lists. The information provided to users will not be used for commercial purposes, and will not be redistributed to third parties.