The Sequence Read Archive (SRA) Overview

Introduction

The SRA is NIH's primary archive of high-throughput sequencing data and is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the NCBI Sequence Read Archive (SRA), the European Bioinformatics Institute (EBI), and the DNA Database of Japan (DDBJ). Data submitted to any of the three organizations are shared among them.

SRA mission

  • Archives raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD System®, Helicos Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
  • Makes sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets.

SRA receives, processes, and archives thousands of sequencing files daily. These files amount to terabytes of sequence information from all over the world.

Infographics: SRA database growth

Typical next-generation sequencing workflow

A typical next-generation sequencing workflow

Alnasir J, Shanahan HP. Investigation into the annotation of protocol sequencing steps in the sequence read archive. Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015. PMID: 25960871 (Open Access)

SRA and other NCBI databases

The NCBI develops and maintains over 35 databases for a number of broad biological data categories that include scientific literature, health, genomes, genes, proteins, and chemicals.

Each database has its own minimum publishable unit. For example, in PubMed the minimum publishable unit is an article, whereas in SRA it is an experiment (with an accession of the form SRX#). An SRA experiment includes sequence data and metadata describing how a biological sample was sequenced.
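
As a rough illustration of the accession form mentioned above, the following Python sketch checks whether a string looks like an experiment accession. It assumes the pattern is the prefix SRX followed by digits. The ERX and DRX prefixes (used for experiments submitted through EBI and DDBJ) are not mentioned on this page and are included as an assumption. The function name is hypothetical.

```python
import re

# Assumption: an experiment accession is a three-letter prefix plus digits.
# SRX comes from the text above; ERX and DRX (EBI and DDBJ submissions)
# are added here as an assumption, not taken from this page.
EXPERIMENT_ACCESSION = re.compile(r"[SED]RX[0-9]+")


def is_experiment_accession(text: str) -> bool:
    """Return True if the whole string looks like an experiment accession."""
    return EXPERIMENT_ACCESSION.fullmatch(text.strip()) is not None
```

A check like this only validates the shape of the string. It does not confirm that the accession exists in SRA.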

Search all NCBI databases: http://www.ncbi.nlm.nih.gov/gquery/?term=homo+sapiens
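
The search link above can be built for any term by URL-encoding the query string. The sketch below reuses the base address and the `term` parameter from that link. The helper name is hypothetical, and whether the page still resolves at this address is not guaranteed.

```python
from urllib.parse import urlencode

# Base address copied from the link above.
GQUERY_BASE = "http://www.ncbi.nlm.nih.gov/gquery/"


def global_query_url(term: str) -> str:
    """Build a Global Query URL; spaces in the term are encoded as '+'."""
    return f"{GQUERY_BASE}?{urlencode({'term': term})}"


print(global_query_url("homo sapiens"))
# prints http://www.ncbi.nlm.nih.gov/gquery/?term=homo+sapiens
```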

The NCBI Global Query page

Connections of the SRA with other databases

All NCBI databases are interconnected. This interconnectedness allows for powerful search capabilities. For example:

Similarly, you can find SRA connections with other NCBI databases and vice versa.

See Search in SRA for more examples.
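
One programmatic way to follow such connections is the ELink utility of the NCBI E-utilities, which this page does not itself describe. The sketch below only builds a request URL and sends nothing. The `dbfrom`, `db`, and `id` parameters are ELink parameters. The UID value and the helper name are hypothetical placeholders.

```python
from urllib.parse import urlencode

# ELink endpoint of the NCBI E-utilities (not described on this page).
ELINK_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def sra_link_url(sra_uid: str, target_db: str) -> str:
    """Build an ELink URL asking for records in target_db linked to an SRA UID."""
    params = {"dbfrom": "sra", "db": target_db, "id": sra_uid}
    return f"{ELINK_BASE}?{urlencode(params)}"


# "12345" is a made-up UID used only to show the shape of the URL.
print(sra_link_url("12345", "biosample"))
```

Note that ELink expects a numeric database UID rather than an accession such as SRX#, so an accession would first have to be resolved to a UID by a search.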

SRA data

SRA accepts data from all kinds of sequencing projects, including clinically important studies that involve human subjects or their metagenomes, which may contain human sequences. Access to these data is often controlled through dbGaP (the database of Genotypes and Phenotypes).

SRA access types
Submission of controlled-access data starts with registering the study's metadata in dbGaP. Refer to our dbGaP Submission Guide.

An SRA study is said to have public access if its sequence data are available for download without restrictions.


Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov

Support Center

Last updated: 2018-01-18T13:28:14Z