Mark Blaxter

Open Data Release

The Aquatic Symbiosis Genomics Project, funded by the Moore Foundation, is part of the Tree of Life programme at the Wellcome Sanger Institute. We are working with a wide range of collaborators to generate reference genome data for many symbiotic organisms from marine and freshwater ecosystems.

All sequence data generated by the project will be openly available for reuse. All raw and assembled data will be deposited in the European Nucleotide Archive (ENA) public database and will pass from there to the other International Nucleotide Sequence Database Collaboration (INSDC) nodes, GenBank and the DNA Data Bank of Japan. In the spirit of collaboration and community-building, we strongly encourage research hubs to make biological materials available to others for post-genomic work. We expect collaborators to deposit samples relevant to the sequenced species and individuals in national and local collections (including cryorepositories). Where samples derive from cultured organisms, collaborators should, where feasible, make cultures/organisms available on request to other research labs.

The Sanger Institute project team encourages community reuse: project data will be released freely, for any purpose, upon deposition in ENA. We intend to publish all submitted assemblies rapidly as citable Wellcome Open Research notes (see, for example, Daniel Mead, Kathryn Fingland, Rachel Cripps et al. [2020]. The genome sequence of the Eurasian red squirrel, Sciurus vulgaris Linnaeus 1758. Wellcome Open Research. DOI: 10.12688/wellcomeopenres.15679.1). We ask that scientists who use the genome sequence data give appropriate acknowledgement and citation in their own publications.

The Sanger Institute team will also make intermediate data and assemblies available for download via a project website. These data and assemblies are provided “as is” as a service to the community, and we make no assurances as to their completeness or quality. Please note that these assemblies will be improved before final submission to ENA, and we cannot guarantee the long-term persistence or availability of intermediate files. We strongly recommend that published analyses are based on data and assemblies submitted to ENA/INSDC. The genome sequences submitted to ENA by the Sanger Institute will be presented through the EBI Ensembl database, and the annotations presented through Ensembl should be regarded as the official versions.