IGVF Internal Data Sharing, Resource Sharing, and Publication Policy

Last updated 25_03_28

Overview

The goals of the Impact of Genomic Variation on Function (IGVF) Consortium are to transform our understanding of how genomic variation impacts genomic function and leads to phenotypes in health and disease, and to share products from the Consortium. This policy explains to IGVF Consortium members how they are to use and share Consortium resources (such as data, analyses, software, and protocols) generated by themselves and other Consortium members. Publications are one important use case. The goals of this policy include encouraging trust, transparency, and collaboration within the Consortium. This policy is consistent with the Resource Sharing Plan guidelines stated in the original IGVF funding announcements (https://www.genome.gov/Funded-Programs-Projects/Impact-of-Genomic-Variation-on-Function-Consortium#funding-opportunities). This IGVF Internal Data Sharing, Resource Sharing, and Publication Policy does not replace general NIH policies that concern data, technology, and resource access and sharing. This policy also augments but does not supersede sharing requirements described in the Terms and Conditions of IGVF awards. The corresponding External Resource Sharing Policy for researchers who are not part of the IGVF Consortium is available on the IGVF website.

Respect of Ideas and Privileged Communications

Consortium members are expected to work collaboratively rather than competitively and to treat each other with respect and courtesy. To meet this goal, all IGVF members should respect and work towards balancing the needs of consortium members from all awards.
Consortium members are expected to mentor and model good scientific practices. To meet this goal, all IGVF members should respect and work towards balancing the needs of consortium members across career stages (e.g., students, postdocs, staff scientists, and faculty members) from all awards.
Presentations, ideas, and results presented at IGVF internal meetings are privileged communications and should not be shared beyond the consortium without explicit written consent. Recordings, slides, and minutes of internal meetings are for internal use only.
Consortium members should operate under the assumption that the consortium is actively conducting integrative analyses across data types, as well as directed analyses of specific data types. When a Consortium member or group realizes they are conducting analyses similar to other Consortium members, it is critical that they immediately communicate this to the other Consortium member(s) and NHGRI, which may lead to joint or coordinated publications. Moreover, it is inappropriate to initiate new work that is substantially similar and could be perceived as competing, unless the other member(s) is aware and supportive of these efforts.

Policy for Consortium Member Submission, Release, and Use of IGVF Data and Resources

To maximize the scientific contributions of the Consortium, all IGVF Consortium members are permitted to use resources developed by the Consortium. NHGRI also expects that the major resources generated as part of IGVF, including data, outputs from predictive modeling, analyses, software, computational tools, models, and protocols, will be made freely available to the research community. To facilitate sharing with the external community, IGVF investigators will submit data, predictions, metadata, and other information to the DACC, which will establish and maintain an IGVF data portal. Consistent with this sharing, IGVF Resource Sharing Plans, which are a term and condition of award, include these terms that are also a part of the policy:

Data release through DACC upon QC (hyperlink to QC documents when available).
Preprint shared (e.g., IGVF bioRxiv channel) no later than manuscript submission.
Data released through DACC no later than manuscript submission.
Software release in version-controlled repository no later than manuscript submission.

To comply with the above terms, below is more detailed information about submission to the IGVF data portal (https://data.igvf.org/), submission to other repositories, release, and how this corresponds to publications. Additional information about expectations for data submission can be found in the DACC’s IGVF Data Submission and Release Policy and the DACC’s Release SOP (standard operating procedure) on the consortium’s Wiki.

Data and Computational Tool/Output Requirements at Time of Manuscript/Preprint Submission

This policy applies to both preprints and manuscripts submitted to journals. At the time of manuscript/preprint submission:

Initial versions of raw data and metadata are required to be released through the IGVF data portal. It is preferred that initial processed data are also released at this time.
Computational outputs from predictive modeling, along with metadata, should be released through the IGVF data portal.
Software, computational tools, and models should be released through a version-controlled public repository (e.g., GitHub, Kipoi) and should be linked to via the IGVF data portal. This can occur earlier if the software, computational tools, or models are sufficiently stable.

Data submissions include raw data, processed data, metadata, and computational outputs from predictive modeling. NHGRI and the DACC are aware these data may change during manuscript review and prior to final manuscript acceptance. Final updates should be made at the time of acceptance. All submissions are version-controlled and can be updated. Because many labs are using their own pipelines for preprints and manuscript submissions while final IGVF processing pipelines are still under development, lab-processed data submissions are allowable and should be submitted to the IGVF data portal in the interim. Processed data, like raw data and metadata, are version-controlled and must be updated when final pipeline-processed data become available.

Computational outputs from predictive modeling should have unique IGVF data portal accession numbers as further described in the IGVF additional guidance on sharing software and intermediate analyses (updated hyperlink). All software, computational tools, and models should be well-documented.

Data and Computational Tool/Output Requirements at Time of Manuscript Acceptance

At the time of manuscript acceptance, final versions of raw data, processed data, metadata, and computational outputs from predictive modeling are required to be submitted to the IGVF data portal if anything has changed from the previously submitted versions. Final versions of software, computational tools, and models should be released through a version-controlled public repository and archived at a permanent repository with a DOI (e.g., Zenodo), and IGVF portal links should be updated.

Data and Computational Tool/Output Requirements at Time of Manuscript Publication

At the time of manuscript publication, final versions of raw data, processed data, metadata, and computational outputs from predictive modeling are required to be released via the IGVF data portal. “Publication” refers to the earliest publishing and release to the public of a manuscript following peer review, including online publication ahead of print.
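
As an informal aid for tracking the three milestones above (submission, acceptance, publication), the requirements can be sketched as a simple checklist. This is only an illustrative sketch and not an IGVF or DACC tool: the item labels, the Stage names, and the missing_items function are hypothetical, and the sketch simplifies the policy by treating every listed item as a yes/no flag.

```python
from enum import Enum


class Stage(Enum):
    SUBMISSION = 1   # manuscript or preprint submission
    ACCEPTANCE = 2   # manuscript acceptance
    PUBLICATION = 3  # earliest public release after peer review


# Hypothetical labels for the items named in this policy.
# "released" = public via the IGVF data portal or a public repository;
# "submitted" = deposited with the portal, not necessarily public yet.
REQUIRED = {
    Stage.SUBMISSION: {
        "raw_data_released",
        "metadata_released",
        "predictions_released",
        "software_in_public_repo",
    },
    Stage.ACCEPTANCE: {
        "final_data_submitted",
        "software_archived_with_doi",
        "portal_links_updated",
    },
    Stage.PUBLICATION: {
        "final_data_released",
    },
}


def missing_items(stage, completed):
    """Return items still outstanding at `stage`, including earlier stages."""
    needed = set()
    for s in Stage:
        if s.value <= stage.value:
            needed |= REQUIRED[s]
    return sorted(needed - set(completed))


# Example: a lab about to post a preprint that has not yet released predictions.
done = {"raw_data_released", "metadata_released", "software_in_public_repo"}
print(missing_items(Stage.SUBMISSION, done))
```

Requirements accumulate across stages in this sketch, so checking a later stage also reports anything still outstanding from an earlier one.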

It is anticipated that IGVF will also submit resources including software, computational tools, analyses, models, predictions, data, and related information into public databases and repositories agreed upon by the IGVF Consortium. Deposition of IGVF data into public databases and repositories will be facilitated by the DACC and not individual awards. Exceptions, such as holding back a limited amount of data for a limited time for a community challenge, will be considered by NIH as individual cases. For journals that require deposition and release of sequencing data into SRA or dbGaP at the time of manuscript review, awards may directly submit these data but must contact the DACC prior to doing so.

Failure to release the above approved products (including raw and final processed data, or outputs of predictive models) by manuscript publication may affect receipt of future years of IGVF funding or receipt of new NHGRI funding.

Manuscript Tracking and Preprints

All manuscripts using consortium data or analyses that are, at least in part, funded by IGVF awards must be reported on the wiki (IGVF wiki > Publications > IGVF Manuscript Tracking). The IGVF Manuscript Tracking list is used to track manuscripts at the time a manuscript is under preparation through publication. As noted above, it is a term of IGVF awards that manuscripts must be shared with the public via bioRxiv no later than the time of submission to a journal. Authors must update the IGVF Manuscript Tracking list with the preprint URL to reflect preprint submission and notify the DACC of these postings so that preprints can be included in the IGVF bioRxiv channel.

Award Number Citation

All IGVF-funded manuscripts must acknowledge IGVF funding by including the appropriate award number(s). Publications that do not, at least in part, use IGVF data, IGVF analyses, or other IGVF efforts funded through an IGVF award should not cite IGVF award numbers. Please note that this applies to IGVF-funded work; projects that are not, at least in part, funded through an IGVF award should not cite IGVF award numbers. To learn more about acknowledging federal funding, see “Tips for When to Acknowledge NIH Funding” here: https://grants.nih.gov/policy/federal-funding.htm .

Abstracts and Presentations Shared with External Audiences

Consortium members may share slides using the wiki (IGVF Wiki > Outreach Activities > Slide Sharing Drive). Authors have the option to share information for presentations, posters, and webinars on the wiki (IGVF Wiki > Outreach Activities > Consortium Member Engagement Presentations); this is recommended for talks about IGVF or that are mainly focused on IGVF work. Presentations on IGVF work should mention IGVF support.

Use of Data and Resources you Generated for the Consortium in Publications

If a Consortium member or group wishes to publish on one or a small number of datasets or analyses generated by their group, it is their obligation to inform the other Consortium members no later than the time a manuscript is being submitted. The authors are required to add the manuscript to the wiki (IGVF wiki > Publications > IGVF Manuscript Tracking), and sharing via working group calls or other IGVF meetings is strongly encouraged.

If a Consortium member or group wishes to publish a paper using a large number of datasets or analyses generated by their group, it is their obligation to inform the rest of the Consortium at the time a manuscript is beginning to be prepared. The authors are required to add the manuscript to the wiki (IGVF wiki > Publications > IGVF Manuscript Tracking), and sharing via working group calls or other IGVF meetings is strongly encouraged.

The goal is to balance individual publications and consortium publications. The rationale is that the larger the number of datasets or analyses, the more likely the publication could interfere with others working on a consortium-wide paper; however, there is no clear numerical threshold.

The corresponding author is obliged to report the DACC accession numbers for all IGVF data used in each publication reporting IGVF award support.

Use of Data and Analyses from Other IGVF Groups in Publications

Consortium members may use any IGVF resources from any group (including data generated from human samples covered by various consent groups designated by the Institutional Certificates). However, if a group wants to publish results based in whole or in part on unpublished resources generated by one or more other IGVF groups, the authors are required to consult with the PI(s) from the group(s) that generated the resources to establish a mutually agreeable path towards timely publication. The authors must email NHGRI and the PI(s) of the group(s) that generated the data or analyses and describe their plans to publish at the time a manuscript is beginning to be prepared. The proposed manuscript should be shared on the wiki (IGVF wiki > Publications > IGVF Manuscript Tracking).

The publication process should respect both analysts and data producers, as well as enable junior group members to receive appropriate credit. It is expected that authors will have contributed to one or more of the following, among others: conception, design, acquisition and analysis of data, drafting of the manuscript, and editing and revision of the manuscript.

Three data/analysis use cases are provided here as examples:

(Case 1) No strings attached: If there is no substantial overlap with ongoing work and no interest in collaboration, consortium members may publish a paper on their analyses (as is automatically the case for analysts outside of IGVF).

(Case 2) Single collaborative paper: If the analyses by the two or more groups complement and strengthen each other, the groups may choose to join forces and collaborate towards a single paper, with authorship and other aspects of the publication determined by mutual agreement. In general, authors (First, Middle and Senior) and order will be determined by PI/MPIs of the involved groups. Generally, the first author will take primary responsibility for the manuscript. Given the nature of collaborative work, shared first or last authors should be considered as an option.

(Case 3) Separate papers: The groups can choose to pursue their independent analyses. The groups may choose to coordinate publications to maximize overall combined benefit to the two groups (usually within 1 year from initiation of discussions and ideally sooner). Even if the groups choose not to formally coordinate publication, collegial communication about submission plans is encouraged.

In case of disagreement: If none of these three cases applies, or the groups cannot agree on their details, they can reach out to the Data Release and Publication Adjudication Committee (see last section; “Conflict Resolution”), which will make a recommendation to maximize benefit to the Consortium and the external community, with a general preference towards shorter publication timelines. Note that a group is not permitted to block the ability of the other group(s) to publish their analyses.

Conflict Resolution

If issues or conflicts pertaining to data submission, release, or other requirements described in this policy arise, NHGRI should be contacted immediately to ascertain steps towards resolution. To resolve conflicts that may arise within the Consortium, a six-member Data Release and Publication Adjudication Committee will be formed if needed. It will be composed of one member of the NHGRI senior staff and one PI or co-PI (and an alternate) from each of the following: one mapping group, one characterization group, one modeling group, one networks group, and one DACC group. The membership of this Committee will be selected by NHGRI staff.

The process for conflict resolution by Committee is as follows:
Both parties in conflict will propose their own ideal solution and a compromise solution and submit these privately to the Committee.
The Committee will weigh these four suggestions, considering the concerns of both groups, and then propose a compromise solution.
The Committee will then hold a conference call with both parties to discuss this compromise solution and gather input.
After the conference call, the Committee will make a final decision and provide this to both parties.
This final decision cannot be appealed.