What is tbl2asn? |
Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin,
but is driven mainly by data files. Tbl2asn generates .sqn files for submission to GenBank; no additional manual editing is required before submission.
Tbl2asn is packaged with the Sequin archive. Please use tbl2asn version 2.5 or later, which is packaged with Sequin version 5.00 or later.
5 types of input data files |
- Template file containing a text ASN.1 Submit-block object (suffix .sbt).
- Nucleotide sequence data in FASTA format (suffix .fsa).
- Feature Table (suffix .tbl).
- Protein sequence (suffix .pep). These sequences replace the tbl2asn-generated conceptual translations and serve to confirm that the CDS intervals are correct.
- Quality Scores (suffix .qvl).
Creating the template file (.sbt) |
- In Sequin, choose to start a new submission.
- Enter manuscript title if desired.
- Enter contact, authors and affiliation information.
- Return to the submission tab and use File->Export Submitter Info.
- Save as template.sbt.
Generating the .sqn file for submission |
The minimum requirements to generate a Sequin file using tbl2asn are one .sbt and one or more .fsa files.
The files are placed in a source directory and a series of command-line arguments are used to generate the .sqn files.
Tbl2asn will generate a .sqn file for every .fsa file in the directory, incorporating any of the corresponding optional files that are present. The
optional files must have the same file name prefix as their corresponding .fsa file (for example, helicase.fsa and helicase.tbl).
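The prefix-matching rule can be checked before tbl2asn is run. The following is a minimal Python sketch and not part of tbl2asn; the function name `group_inputs` is hypothetical. It lists each .fsa file in a source directory together with whichever optional files (.tbl, .pep, .qvl) share its prefix.

```python
from pathlib import Path

# Optional input suffixes named in this document.
OPTIONAL_SUFFIXES = (".tbl", ".pep", ".qvl")


def group_inputs(source_dir):
    """Map each .fsa file name prefix to the optional suffixes found beside it.

    Hypothetical helper for illustration only; tbl2asn does its own matching.
    """
    groups = {}
    for fsa in sorted(Path(source_dir).glob("*.fsa")):
        # helicase.fsa pairs with helicase.tbl, helicase.pep, helicase.qvl
        present = [s for s in OPTIONAL_SUFFIXES if fsa.with_suffix(s).exists()]
        groups[fsa.stem] = present
    return groups
```

A .tbl file whose prefix matches no .fsa file does not appear in the result, which makes misnamed files easy to spot by comparing against a directory listing.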
Command Line Arguments
-p | Path to the source directory. If the files are in the current directory, -p . should be used. |
-r | Path for the resulting .sqn submission file (if the -r argument is not given the .sqn files will be saved in the source directory). |
-t | Specifies the template file (.sbt). If the .sbt file is in a different directory the full path must be specified. |
-s | Instructs tbl2asn to read multiple FASTA components in one file as a set of unrelated sequences. This creates a single file of multiple submissions. (The maximum is 1000 sequences per file.) |
-c | Instructs tbl2asn to annotate the longest open reading frame (ORF) if a .tbl file is not provided. |
-m | Allows alternative start codons to be used in ORF searches. |
-v | Validates the data records. The output is saved to files with a .val suffix. |
-b | Generates GenBank flatfiles with a .gbf suffix. |
-i | Creates a single submission from the indicated .fsa file in a directory of multiple .fsa files. |
-j | Allows the addition of source qualifiers that will be the same for each submission. Example: -j "[organism=Saccharomyces cerevisiae] [strain=S288C]". |
-y | Adds a COMMENT to each submission. Example: -y "Contigs larger than 2kb have been annotated, representing approx. 87% of the total genome". |
-o | Creates a single submission from multiple FASTA files. |
-l | Reads one or more FASTA+GAP alignments to create one or more phylogenetic sets. |
Note: When -c is used to annotate the longest ORF, the product name is entered as unknown unless one is provided in the .fsa file definition line.
Note: Please review the .val files and correct any problems reported at the error level.
Examples:
Single submission: one sequence per .fsa file
tbl2asn -t template.sbt -p path_to_files -v
Batch submission: multiple sequences per .fsa file
tbl2asn -t template.sbt -p path_to_files -s -v
Single submission: one .fsa file in directory of multiple .fsa files
tbl2asn -t template.sbt -i x.fsa -v
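When tbl2asn is driven from a script, the command lines above can be assembled programmatically. The sketch below is a hypothetical Python wrapper (the function name and its parameters are assumptions, not part of tbl2asn). It uses only flags described in the table above and only builds the argument list; it does not run the program.

```python
def build_tbl2asn_args(template, source_dir, batch=False, validate=True,
                       results_dir=None, source_qualifiers=None):
    """Assemble a tbl2asn argument list from flags documented above.

    Hypothetical wrapper for illustration; it never invokes tbl2asn itself.
    """
    args = ["tbl2asn", "-t", template, "-p", source_dir]
    if results_dir is not None:
        # -r: directory for the resulting .sqn files
        args += ["-r", results_dir]
    if batch:
        # -s: treat multiple FASTA records in one file as unrelated sequences
        args.append("-s")
    if source_qualifiers:
        # -j: source qualifiers shared by every submission,
        # e.g. {"organism": "Saccharomyces cerevisiae", "strain": "S288C"}
        quals = " ".join(f"[{k}={v}]" for k, v in source_qualifiers.items())
        args += ["-j", quals]
    if validate:
        # -v: write validation output to .val files
        args.append("-v")
    return args
```

Assuming tbl2asn is installed and on the PATH, the returned list could be passed to `subprocess.run(args, check=True)`. Passing a list rather than a single string avoids shell quoting problems with the bracketed -j value.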
Nucleotide sequence and FASTA defline formats (.fsa) |
There is no size limit on the nucleotide sequence.
Each sequence in the FASTA file should have a single definition line (defline) beginning with a '>', followed by the sequence itself.
Minimum requirements for the FASTA defline are:
- SeqID (sequence identifier) which is the text between the '>'
and the first space. The SeqID cannot begin with 'contig', as this identifier is reserved for another use and will generate errors.
- Organism and related information
Optional defline information that may be included is:
Biological
- strain [strain=S288C]
- isolate [isolate=CWS1]
- chromosome [chromosome=XVI]
Other elements
- topology [topology=circular]
- location [location=mitochondrion]
- molecule [moltype=RNA] (DNA is the default)
- technique [tech=wgs]
- protein name [product=helicase] (if using -c)
See the Source and Organism modifier lists for the complete set of modifiers.
Example FASTA:
>Sc_16 [organism=Saccharomyces cerevisiae]
tataggcgaatcgagtatattattttttctcaacatatgtat
atgaacatgagaatatatttataggaatgtataaaattgtga
cctctcctgctattttagttactgattttatgtatgtagggg
gaataggggctgcctttcttaatgcagttttaattttttctt
ttaattttttcttagtaaaattatttaaagtaaagattaatg
gaataaccattgcgcttttttttacagtttttggtttttcat
tttttggaaaaaatattttaaatattttacctttttatttag
ggggtattttatatagtatctatacttcaacagatttttctg
aacatatagttcctattgctttttcaagtgcattagcccctt
ttgtaagcagtgttgcttttatggagaaatatcctatgaaac
atcatatataaattttaattggtattttaattggttttatag
tggttcctttgtctaaaagtctttatgactttcatgagggat
atgattttatataatttaggttttacagcaggtttag
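The defline rules above can be checked with a few lines of code. The following Python sketch is an illustration under stated assumptions, not tbl2asn's own parser: the function name `parse_defline` is hypothetical, the 'contig' check is treated as case-sensitive, and modifiers are assumed to always take the bracketed [name=value] form shown above.

```python
import re

# Matches bracketed modifiers such as [organism=Saccharomyces cerevisiae].
MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_defline(line):
    """Split a FASTA definition line into (seq_id, modifiers).

    Hypothetical helper; raises ValueError when a rule above is broken.
    """
    if not line.startswith(">"):
        raise ValueError("definition line must begin with '>'")
    # The SeqID is the text between '>' and the first space.
    seq_id = line[1:].split(" ", 1)[0].strip()
    if not seq_id:
        raise ValueError("missing SeqID")
    if seq_id.startswith("contig"):
        raise ValueError("SeqID cannot begin with 'contig'")
    modifiers = {k.strip(): v.strip() for k, v in MODIFIER.findall(line)}
    return seq_id, modifiers
```

For the example defline above, this returns the SeqID `Sc_16` and a dictionary with the single key `organism`. A caller could additionally require that `organism` is present, since organism information is part of the minimum defline requirements.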
Feature table format (.tbl) |
Tbl2asn reads features from a five-column, tab-delimited table called a feature table.
The feature table specifies the location and type of each feature. Tbl2asn will process the feature intervals and translate any CDSs into proteins.
The first line of the table should contain the following information:
>Feature SeqID table_name
The SeqID must match the nucleotide sequence SeqID in the corresponding .fsa file.
Example Feature Table:
>Feature Sc_16 Table1
69	543	gene
			gene	sde3p
69	543	CDS
			product	SDE3P
			protein_id	WS1030
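Because the table is tab-delimited, generating it from a script is less error-prone than typing it by hand. The Python sketch below assumes the usual five-column layout (start, stop, feature key, qualifier name, qualifier value), with qualifier lines leaving the first three columns empty. The function name `format_feature_table` and its input structure are hypothetical.

```python
def format_feature_table(seq_id, table_name, features):
    """Render features as tab-delimited feature-table text.

    `features` is a list of (start, stop, key, qualifiers) tuples, where
    `qualifiers` is a list of (name, value) pairs. Hypothetical helper.
    """
    lines = [f">Feature {seq_id} {table_name}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value; columns 1-3 stay empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"
```

Calling it with the gene and CDS from the example (both spanning 69 to 543) reproduces the example table, with real tab characters between columns.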
Protein sequence format (.pep) |
- Set up as a FASTA file using the protein sequence.
- This file will substitute the automatically translated products of the CDS features with the provided protein sequences.
- Serves as a check that the conceptual translation of the nucleotide sequence is as predicted.
- SeqID must match protein_id in the .tbl file
Example FASTA:
>WS1030 [gene=sde3p] [protein=SDE3P]
MYKIVTSPAILVTDFMYVGGIGAAFLNAVLIFSFNFFL
VKLFKVKINGITIAAFFTVFGFSFFGKNILNILPFYLG
GILYSIYTSTDFSEHIVPIAFSSALAPFVSSVAFYGEI
SYETSYINAILIGILIGFIVVPLSKSLYDFHEGYDLYN
LGFTAG
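The requirement that each .pep SeqID match a protein_id in the .tbl file can be verified before submission. The following Python sketch is a hypothetical pre-check, not something tbl2asn provides. It assumes that protein_id appears as a qualifier name followed by its value on one line, as in the example feature table, and all function names are illustrative.

```python
def protein_ids_in_tbl(tbl_text):
    """Collect protein_id qualifier values from feature-table text."""
    ids = set()
    for line in tbl_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "protein_id":
            ids.add(fields[1])
    return ids


def seq_ids_in_fasta(fasta_text):
    """Collect the SeqID (first token after '>') of every FASTA record."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def unmatched_proteins(tbl_text, pep_text):
    """Return .pep SeqIDs that have no matching protein_id in the .tbl text."""
    return seq_ids_in_fasta(pep_text) - protein_ids_in_tbl(tbl_text)
```

An empty result means that every protein in the .pep file has a matching CDS entry. For the examples in this document, WS1030 appears in both files, so nothing is reported.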
Quality scores table format (.qvl) |
Provides Phrap/Consed quality scores.
Generates Seq-graph data that will be included with the nucleotide sequence of the .fsa file in the final .sqn file.
Sequin will display these scores in its graphical view.
Example:
>Sc_16
51 63 70 82 82 82 90 90 90 90 86 86
86 86 86 86 90 90 90 90 90 86 86 78...
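A quality score file can be read back for a quick sanity check. The Python sketch below assumes only the layout visible in the example: a '>' line carrying the SeqID, followed by lines of whitespace-separated integer scores. The function name `parse_quality_scores` is hypothetical.

```python
def parse_quality_scores(qvl_text):
    """Parse '>SeqID' headers followed by whitespace-separated integer scores.

    Returns a dict mapping each SeqID to its list of scores.
    Hypothetical helper based on the example layout only.
    """
    scores = {}
    current = None
    for line in qvl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("quality header is missing a SeqID")
            current = parts[0]
            scores[current] = []
        elif current is not None:
            # int() raises ValueError on any non-numeric token.
            scores[current].extend(int(tok) for tok in line.split())
    return scores
```

If the scores are expected to cover every base, which is an assumption not stated in this document, the length of each list can be compared with the length of the matching sequence in the .fsa file.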
Revised: March 12, 2004.