TBL2ASN



What is tbl2asn?

Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files for submission to GenBank. Additional manual editing is not required before submission.

Tbl2asn is packaged with the Sequin archive. Please use tbl2asn version 2.5 or later, which is packaged with Sequin version 5.00 or later.

5 types of input data files

  1. Template file containing a text ASN.1 Submit-block object (suffix .sbt).
  2. Nucleotide sequence data in FASTA format (suffix .fsa).
  3. Feature Table (suffix .tbl).
  4. Protein sequence (suffix .pep). These sequences replace the tbl2asn-generated conceptual translations and serve to confirm that the CDS intervals are correct.
  5. Quality Scores (suffix .qvl).
Creating the template file (.sbt)
  • Start a new submission in Sequin.
  • Enter the manuscript title, if desired.
  • Enter the contact, author and affiliation information.
  • Return to the submission tab and choose File->Export Submitter Info.
  • Save the file as template.sbt.
Generating the .sqn file for submission

  • The minimum requirements to generate a Sequin file using tbl2asn are one .sbt and one or more .fsa files.
  • The files are placed in a source directory and a series of command-line arguments are used to generate the .sqn files.
  • Tbl2asn will generate a .sqn file for every .fsa file in the directory, incorporating any of the corresponding optional files that are present. The optional files must have the same file name prefix as their corresponding .fsa file (for example, helicase.fsa and helicase.tbl).
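
    The prefix-matching rule above can be checked before running tbl2asn. The following is a minimal Python sketch, not part of tbl2asn itself; the function names are hypothetical, and it assumes that "prefix" means the file name without its final suffix.

```python
from pathlib import Path

# Optional suffixes named in this document; .fsa is the required sequence file.
OPTIONAL_SUFFIXES = (".tbl", ".pep", ".qvl")


def group_input_files(filenames):
    """Map each .fsa prefix to the optional files that share that prefix."""
    names = [Path(n) for n in filenames]
    groups = {p.stem: [] for p in names if p.suffix == ".fsa"}
    for p in names:
        if p.suffix in OPTIONAL_SUFFIXES and p.stem in groups:
            groups[p.stem].append(p.name)
    return {stem: sorted(files) for stem, files in groups.items()}


def orphaned_files(filenames):
    """Return optional files whose prefix matches no .fsa file."""
    names = [Path(n) for n in filenames]
    prefixes = {p.stem for p in names if p.suffix == ".fsa"}
    return sorted(
        p.name
        for p in names
        if p.suffix in OPTIONAL_SUFFIXES and p.stem not in prefixes
    )
```

    An optional file reported by orphaned_files has no .fsa partner, which usually points to a misspelled prefix.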

  • Command Line Arguments

    -p  Path to the source directory. If the files are in the current directory, use -p . (a period).
    -r  Path for the resulting .sqn submission files. If -r is not given, the .sqn files are saved in the source directory.
    -t  Specifies the template file (.sbt). If the .sbt file is in a different directory, the full path must be specified.
    -s  Instructs tbl2asn to read multiple FASTA components in one file as a set of unrelated sequences. This creates a single file of multiple submissions. (The maximum is 1000 sequences per file.)
    -c  Instructs tbl2asn to annotate the longest open reading frame (ORF) if a .tbl file is not provided.
    -m  Allows alternative start codons to be used in ORF searches.
    -v  Validates the data records. The output is saved to files with a .val suffix.
    -b  Generates GenBank flatfiles with a .gbf suffix.
    -i  Creates a single submission from the indicated .fsa file in a directory of multiple .fsa files.
    -j  Allows the addition of source qualifiers that will be the same for each submission. Example: -j "[organism=Saccharomyces cerevisiae] [strain=S288C]".
    -y  Adds a COMMENT to each submission. Example: -y "Contigs larger than 2kb have been annotated, representing approx. 87% of the total genome".
    -o  Creates a single submission from multiple FASTA files.
    -l  Reads one or more FASTA+GAP alignments to create one or more phylogenetic sets.

    Note: When using -c to annotate the longest ORF, the product name is entered as unknown unless one is provided in the .fsa file definition line.

    Note: Please review the .val files and correct any error-level problems they report.

    Examples:

  • Single submission: one sequence per .fsa file
      tbl2asn -t template.sbt -p path_to_files -v
  • Batch submission: multiple sequences per .fsa file
      tbl2asn -t template.sbt -p path_to_files -s -v
  • Single submission: one .fsa file in directory of multiple .fsa files
      tbl2asn -t template.sbt -i x.fsa -v
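
    When the same command lines are generated from a script, the argument list can be assembled programmatically. The following Python sketch uses only flags described in this document; the function name and its parameters are hypothetical, and it assumes tbl2asn accepts each flag and its value as separate arguments, as in the examples above.

```python
def build_tbl2asn_args(template, source_dir=None, single_fsa=None,
                       batch=False, validate=True, source_qualifiers=None):
    """Assemble a tbl2asn argument list from flags described in this document."""
    args = ["tbl2asn", "-t", template]
    if source_dir is not None:
        args += ["-p", source_dir]          # directory holding the .fsa files
    if single_fsa is not None:
        args += ["-i", single_fsa]          # one .fsa file out of many
    if batch:
        args.append("-s")                   # multiple sequences per .fsa file
    if source_qualifiers:
        # Bracketed modifier syntax as shown for -j, e.g. "[organism=...] [strain=...]"
        mods = " ".join(f"[{key}={value}]" for key, value in source_qualifiers.items())
        args += ["-j", mods]
    if validate:
        args.append("-v")                   # write .val validation reports
    return args
```

    The returned list can be passed to subprocess.run(args, check=True), provided tbl2asn is installed and on the PATH.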
Nucleotide sequence and FASTA defline formats (.fsa)

  • No size limit on nucleotide sequence.
  • Each sequence in the FASTA file should have a single definition line beginning with a '>', followed by the sequence data.
  • Minimum requirements for the FASTA defline are:
    • SeqID (sequence identifier) which is the text between the '>' and the first space. The SeqID cannot begin with 'contig', as this identifier is reserved for another use and will generate errors.
    • Organism and related information
  • Optional defline information that may be included is:
      Biological
      • strain [strain=S288C]
      • isolate [isolate=CWS1]
      • chromosome [chromosome=XVI]
      Other elements
      • topology [topology=circular]
      • location [location=mitochondrion]
      • molecule [moltype=RNA] (DNA is the default)
      • technique [tech=wgs]
      • protein name [product=helicase] (if using -c)

    See the Source and Organism modifier lists for the complete set of modifiers.

    Example FASTA:

    >Sc_16 [organism=Saccharomyces cerevisiae]
    tataggcgaatcgagtatattattttttctcaacatatgtat
    atgaacatgagaatatatttataggaatgtataaaattgtga
    cctctcctgctattttagttactgattttatgtatgtagggg
    gaataggggctgcctttcttaatgcagttttaattttttctt
    ttaattttttcttagtaaaattatttaaagtaaagattaatg
    gaataaccattgcgcttttttttacagtttttggtttttcat
    tttttggaaaaaatattttaaatattttacctttttatttag
    ggggtattttatatagtatctatacttcaacagatttttctg
    aacatatagttcctattgctttttcaagtgcattagcccctt
    ttgtaagcagtgttgcttttatggagaaatatcctatgaaac
    atcatatataaattttaattggtattttaattggttttatag
    tggttcctttgtctaaaagtctttatgactttcatgagggat
    atgattttatataatttaggttttacagcaggtttag
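
    The defline requirements above can be checked with a short script. This Python sketch is illustrative only: the function names are hypothetical, the 'contig' test is assumed to be case-sensitive, and the organism requirement is assumed to be met by an [organism=...] modifier.

```python
import re

# Matches bracketed modifiers such as [organism=Saccharomyces cerevisiae].
MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_defline(defline):
    """Split a FASTA defline into its SeqID and a dict of [key=value] modifiers."""
    if not defline.startswith(">"):
        raise ValueError("defline must begin with '>'")
    body = defline[1:].strip()
    # The SeqID is the text between '>' and the first whitespace.
    seq_id = body.split(None, 1)[0] if body else ""
    modifiers = {k.strip(): v.strip() for k, v in MODIFIER.findall(body)}
    return seq_id, modifiers


def check_defline(defline):
    """Return a list of problems found against the minimum requirements."""
    seq_id, modifiers = parse_defline(defline)
    problems = []
    if not seq_id or seq_id.startswith("["):
        problems.append("missing SeqID")
    elif seq_id.startswith("contig"):  # assumption: case-sensitive comparison
        problems.append("SeqID begins with reserved word 'contig'")
    if "organism" not in modifiers:
        problems.append("missing [organism=...] modifier")
    return problems
```

    An empty list from check_defline means the defline meets the minimum requirements listed above; it does not validate the modifier names themselves.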
    

Feature table format (.tbl)

    Tbl2asn reads features from a five-column, tab-delimited table called a feature table. The feature table specifies the location and type of each feature. Tbl2asn processes the feature intervals and translates any CDSs into proteins. The first line of the table should contain the following information:

    >Feature SeqID table_name
    

    The SeqID must match the nucleotide sequence SeqID in the corresponding .fsa file.

    Example Feature Table:

    >Feature Sc_16 Table1
    69      543    gene
                            gene       sde3p
    69      543    CDS
                            product SDE3P
                            protein_id     WS1030
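
    The five-column layout can be read with a few lines of code. This Python sketch is not tbl2asn's own parser. It assumes real tab characters between columns, plain integer coordinates, that columns 1-3 hold start, stop and feature key, and that columns 4-5 hold a qualifier name and value. The function name is hypothetical.

```python
def parse_feature_table(text):
    """Parse a tab-delimited feature table into (seq_id, table_name, features)."""
    seq_id = table_name = None
    features = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw.startswith(">Feature"):
            header = raw.split()
            seq_id = header[1]
            table_name = header[2] if len(header) > 2 else None
            continue
        cols = raw.split("\t")
        cols += [""] * (5 - len(cols))  # pad short lines to five columns
        start, stop, key, qual, value = (c.strip() for c in cols[:5])
        if start and stop:
            if key:
                # A line with a feature key starts a new feature.
                features.append({"key": key, "intervals": [], "qualifiers": {}})
            # Assumption: a line without a key adds an interval to the
            # previous feature.
            features[-1]["intervals"].append((int(start), int(stop)))
        elif qual:
            # Qualifier lines leave columns 1-3 empty.
            features[-1]["qualifiers"][qual] = value
    return seq_id, table_name, features
```

    The returned seq_id can then be compared with the SeqID in the corresponding .fsa file.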
                            

Protein sequence format (.pep)

    • Set up as a FASTA file using the protein sequence.
    • This file will substitute the automatically translated products of the CDS features with the provided protein sequences.
    • Serves as a check that the conceptual translation of the nucleotide sequence is as predicted.
    • The SeqID must match the protein_id in the .tbl file.

    Example FASTA:

    >WS1030 [gene=sde3p] [protein=SDE3P]
    MYKIVTSPAILVTDFMYVGGIGAAFLNAVLIFSFNFFL
    VKLFKVKINGITIAAFFTVFGFSFFGKNILNILPFYLG
    GILYSIYTSTDFSEHIVPIAFSSALAPFVSSVAFYGEI
    SYETSYINAILIGILIGFIVVPLSKSLYDFHEGYDLYN
    LGFTAG
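
    The SeqID-to-protein_id requirement can be checked before running tbl2asn. In this Python sketch the function names are hypothetical, and the list of protein_id values is assumed to have been collected from the .tbl file beforehand.

```python
def fasta_ids(fasta_text):
    """Return the SeqID (text between '>' and the first whitespace) of each record."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            ids.append(parts[0] if parts else "")
    return ids


def compare_protein_ids(pep_text, protein_ids_in_tbl):
    """Report .pep SeqIDs with no matching protein_id, and the reverse."""
    pep = set(fasta_ids(pep_text))
    tbl = set(protein_ids_in_tbl)
    return {"only_in_pep": sorted(pep - tbl), "only_in_tbl": sorted(tbl - pep)}
```

    Two empty lists mean that every protein sequence has a CDS to attach to and every protein_id has a supplied sequence.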
    

Quality scores table format (.qvl)

  • Provides Phrap/Consed quality scores.
  • Generates Seq-graph data that is included with the nucleotide sequence from the .fsa file in the final .sqn file.
  • Sequin displays these scores in its graphical view.

    >Sc_16
    51 63 70 82 82 82 90 90 90 90 86 86 86 86 86 86 90 90 90 90 90 86 86 78...
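
    A quality file in this layout can be read and sanity-checked with a short script. In this Python sketch the function names are hypothetical. The length check assumes one score per base, which is how Phrap reports quality; this document does not state that rule.

```python
def parse_quality_scores(text):
    """Parse FASTA-like quality data into {SeqID: [score, ...]}."""
    scores = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if not parts:
                raise ValueError("quality header has no SeqID")
            current = parts[0]
            scores[current] = []
        elif current is not None:
            # Scores are whitespace-separated integers and may span several lines.
            scores[current].extend(int(token) for token in line.split())
    return scores


def length_mismatches(scores, sequence_lengths):
    """Return SeqIDs whose score count differs from the sequence length.

    Assumption: one quality score per base.
    """
    return sorted(
        seq_id
        for seq_id, values in scores.items()
        if len(values) != sequence_lengths.get(seq_id)
    )
```

    The sequence_lengths argument is a mapping from SeqID to sequence length, taken from the corresponding .fsa file.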


    Revised: March 12, 2004.