The Sequence Read Archive (SRA) Overview
Introduction
The SRA is NIH's primary archive of high-throughput sequencing data and is part of the International Nucleotide Sequence Database Collaboration (INSDC), which includes the NCBI Sequence Read Archive (SRA), the European Bioinformatics Institute (EBI), and the DNA Database of Japan (DDBJ). Data submitted to any of the three organizations are shared among them.
SRA mission
- Archives raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD System®, Helicos Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
- Makes sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets.
SRA receives, processes, and archives thousands of sequencing files daily, amounting to terabytes of sequence information from all over the world.
Infographics: SRA database growth
Typical next-generation sequencing workflow
![A typical next-generation sequencing workflow](https://www.ncbi.nlm.nih.gov/core/assets/sra/images/next-generation-sequencing-workflow.png)
Alnasir J, Shanahan HP. Investigation into the annotation of protocol sequencing steps in the sequence read archive. Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015. PMID: 25960871 (Open Access)
SRA and other NCBI databases
The NCBI develops and maintains over 35 databases for a number of broad biological data categories that include scientific literature, health, genomes, genes, proteins, and chemicals.
Each database has its own minimum publishable unit. For example, in PubMed the minimum publishable unit is an article, whereas in SRA it is an experiment (with an accession of the form SRX#). An SRA experiment includes sequence data and metadata describing how a biological sample was sequenced.
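As a rough illustration of the SRX# accession form, the sketch below checks whether a string looks like an experiment accession. It assumes the accession is a three-letter prefix followed only by digits; the ERX and DRX prefixes (the EBI and DDBJ counterparts of SRX) and the function name `parse_experiment_accession` are additions for this example and are not taken from the text above.

```python
import re

# Assumption: an experiment accession is a three-letter prefix plus digits.
# SRX comes from the text; ERX and DRX are included as the EBI and DDBJ
# counterparts and are an assumption of this sketch.
EXPERIMENT_RE = re.compile(r"^(SRX|ERX|DRX)(\d+)$")


def parse_experiment_accession(accession):
    """Return (prefix, number) if the string looks like an experiment
    accession, otherwise None. Hypothetical helper for illustration."""
    match = EXPERIMENT_RE.match(accession.strip().upper())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    for candidate in ["SRX123456", "srx42", "PMID25960871"]:
        print(candidate, "->", parse_experiment_accession(candidate))
```

A check like this only validates the shape of the string; it does not confirm that the accession exists in SRA.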
Search all NCBI databases: http://www.ncbi.nlm.nih.gov/gquery/?term=homo+sapiens
![The NCBI Global Query page](https://www.ncbi.nlm.nih.gov/core/assets/sra/images/gquery.png)
Connections of the SRA with other databases
All NCBI databases are interconnected. This interconnectedness allows for powerful search capabilities. For example:
- Find articles in PubMed that reference SRA studies: "pubmed sra"[Filter]
- Find SRA experiments published in PubMed: "sra pubmed"[Filter]
Similarly, you can find SRA connections with other NCBI databases and vice versa.
See Search in SRA for more examples.
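The two filter searches above can also be expressed as URLs for programmatic use. The following is a minimal sketch that builds a query URL for NCBI's E-utilities `esearch` endpoint; it assumes that the same `[Filter]` strings are accepted there as in the web interface, and the helper name `build_filter_search_url` and its parameters are hypothetical. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_filter_search_url(db, link_filter, extra_term=None, retmax=20):
    """Build an esearch URL that applies a link filter such as "pubmed sra".

    Hypothetical helper: `db` is the database to search, `link_filter` is the
    quoted filter name from the examples above, and `extra_term` optionally
    narrows the search further.
    """
    term = f'"{link_filter}"[Filter]'
    if extra_term:
        term = f"({extra_term}) AND {term}"
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # PubMed articles that reference SRA studies.
    print(build_filter_search_url("pubmed", "pubmed sra"))
    # SRA experiments published in PubMed, limited to human samples.
    print(build_filter_search_url("sra", "sra pubmed", extra_term="homo sapiens"))
```

Keeping URL construction separate from the network call makes the query easy to inspect before it is sent.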
SRA data
SRA accepts data from all kinds of sequencing projects, including clinically important studies that involve human subjects or their metagenomes, which may contain human sequences. These data often have controlled access via dbGaP (the database of Genotypes and Phenotypes).
![SRA access types](https://www.ncbi.nlm.nih.gov/core/assets/sra/images/sra_access_types.png)
An SRA study is said to have public access if its sequence data are available for download without restrictions.
Contact SRA
Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov