NCBI Sequin FAQs

 

  1. Network-aware Sequin
  2. Indicating organism names for phylogenetic studies
  3. Adding titles to sets of sequences
  4. Annotating selenocysteines
  5. Library problems under Solaris
  6. Adding modifiers (e.g., strain, chromosome, cell-type) to biological source information
  7. Appending a reference to a feature
  8. Propagating features
  9. Export versus Save
  10. Problems importing sequences
  11. Automatic Definition Line Generation
  12. Submitting HTG sequences
  13. Submitting complete genomes

  1. How do I change between the stand-alone and network-aware modes of Sequin?
    There are two ways to change between the stand-alone and network-aware modes of Sequin. (1) When you launch Sequin, select Net Configure from the Misc menu on the Welcome to Sequin form. (2) If Sequin is already running, select Net Configure from the Sequin Misc menu. In either case, Sequin will prompt you to set certain preferences and will then run a network configuration program. In most cases, the default preferences are sufficient. To switch Sequin back to stand-alone mode, select Net Configure again. You must restart Sequin before any change to the network mode takes effect. For additional information, see the Sequin help documentation under Net Configure.



  2. I am submitting a set of sequences as part of a phylogenetic study. Each sequence comes from a different organism. How do I indicate which sequence comes from which organism?
    There are two ways to indicate the organism. First, you can encode the organism name directly into the file that contains the nucleotide sequence. Second, you can indicate the organism name on the Source Modifiers form, which appears after the Organism and Sequences form.

    For either method, your sequences must be in FASTA, FASTA+GAP, PHYLIP, NEXUS Contiguous, or NEXUS Interleaved format. FASTA format is used for single unaligned sequences. It consists of a "definition line" followed by lines of sequence. FASTA+GAP, PHYLIP, and NEXUS formats are used for sets of aligned sequences. FASTA+GAP format is similar to FASTA format, except that gaps, indicated by a "-", are allowed. PHYLIP and NEXUS formats are generated by certain sequence analysis packages.

    To encode the organism name into the file that contains the nucleotide sequence:

    If your sequences are in FASTA or FASTA+GAP format, insert the phrase [org=organism scientific name], such as [org=Mus musculus] or [org=Drosophila melanogaster], in the definition line of each sequence. The definition line is the line starting with a ">" character, which immediately precedes your sequence. Use the scientific name of the organism, but don't use abbreviations such as D. melanogaster. The first word immediately following the ">" character is the SeqId, a unique identifier that you provide for your sequence.

    If your sequences are in PHYLIP or NEXUS format, you must create a FASTA-style definition line for each sequence. Place this definition line containing the [org=organism scientific name] phrase at the bottom of the PHYLIP or NEXUS file.
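    If you prepare many definition lines, a short script can write them for you. The Python sketch below is only an illustration: the helper names format_defline and append_trailer are hypothetical and are not part of Sequin. It assumes the modifiers are given in the order you want them to appear and, for PHYLIP or NEXUS files, that the trailing definition lines are listed in the same order as the sequences in the alignment.

```python
# Hypothetical helpers (not part of Sequin) for writing source modifiers
# into FASTA-style definition lines.


def format_defline(seq_id, modifiers, title=""):
    """Build a definition line such as '>dna1 [org=Mus musculus] [strain=A]'.

    An empty seq_id gives the '>[org=...]' form placed at the end of PHYLIP
    and NEXUS files. Dicts keep insertion order, so modifiers appear in the
    order given.
    """
    mods = " ".join(f"[{key}={value}]" for key, value in modifiers.items())
    body = " ".join(part for part in (mods, title) if part)
    return f">{seq_id} {body}".rstrip() if seq_id else f">{body}"


def append_trailer(alignment_text, modifier_rows):
    """Append one '>[...]' line per sequence to the text of a PHYLIP/NEXUS file.

    Assumption: modifier_rows is in the same order as the sequences in the file.
    """
    trailer = "\n".join(format_defline("", mods) for mods in modifier_rows)
    return alignment_text.rstrip("\n") + "\n" + trailer + "\n"


print(format_defline("dna1", {"org": "Mus musculus", "strain": "A"}))
# >dna1 [org=Mus musculus] [strain=A]
```

    To update an existing PHYLIP file, read it as text, pass it to append_trailer with one dictionary per sequence, and write the result to a new file so the original stays unchanged.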

    You can also encode other modifiers in the definition line. Another FAQ discusses how to add modifiers (e.g., strain, chromosome, cell-type) to biological source information. Here are some examples:

    Sample file containing sequences in FASTA format:
    >dna1 [org=Mus musculus] [strain=A]
    GGGGGGGGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCCCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTTTTTTCCCCCCCCCCCCCC
    >dna2 [org=Drosophila melanogaster] [strain=B]
    GGGGGTGGGGAAAAAAAAAAAAAAATTTTTTTATTTTTTTTCCCCGGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTATTTTCCCCACCCCCCCCC
    >dna3 [org=Saccharomyces cerevisiae] [strain=C]
    GGGGGGCGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTCTTTTCCCCCCCCCCCCCC

    Sample file containing sequences in PHYLIP format:
    3 100
    ABC-1 GGGGGGGGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCCCCC
    ABC-2 GGGGGTGGGG AAAAAAAAAA AAAAATTTTT TTATTTTTTT TCCCCGGCCC
    ABC-3 GGGGGGCGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCGCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT TTTTTCCCCC CCCCCCCCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT ATTTTCCCCA CCCCCCCCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT CTTTTCCCCC CCCCCCCCCC
    >[org=Mus musculus] [strain=A]
    >[org=Drosophila melanogaster] [strain=B]
    >[org=Saccharomyces cerevisiae] [strain=C]

    Sample file containing sequences in NEXUS Interleaved format:
    #NEXUS
    [!This data assembled using Sequencher*, from Gene Codes Corporation.]
    begin data;
    dimensions ntax=3 nchar=100;
    format datatype=dna gap=: interleave;
    matrix
    3 100
    ABC-1 GGGGGGGGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCCCCC
    ABC-2 GGGGGTGGGG AAAAAAAAAA AAAAATTTTT TTATTTTTTT TCCCCGGCCC
    ABC-3 GGGGGGCGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCGCCC
    ABC-1 CCCCGGGGGG GGGGGAAAAA AATTTTTTTT TTTTTCCCCC CCCCCCCCCC
    ABC-2 CCCCGGGGGG GGGGGAAAAA AATTTTTTTT ATTTTCCCCA CCCCCCCCCC
    ABC-3 CCCCGGGGGG GGGGGAAAAA AATTTTTTTT CTTTTCCCCC CCCCCCCCCC
    >[org=Mus musculus] [strain=A]
    >[org=Drosophila melanogaster] [strain=B]
    >[org=Saccharomyces cerevisiae] [strain=C]

    Sample file containing sequences in NEXUS Contiguous format:
    #NEXUS
    BEGIN DATA;
    DIMENSIONS NTAX=3 NCHAR=100;
    FORMAT MISSING=? GAP=- DATATYPE=DNA ;
    MATRIX
    ABC-1 GGGGGGGGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCCCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTTTTTTCCCCCCCCCCCCCC
    ABC-2 GGGGGTGGGGAAAAAAAAAAAAAAATTTTTTTATTTTTTTTCCCCGGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTATTTTCCCCACCCCCCCCC
    ABC-3 GGGGGGCGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTCTTTTCCCCCCCCCCCCCC
    >[org=Mus musculus] [strain=A]
    >[org=Drosophila melanogaster] [strain=B]
    >[org=Saccharomyces cerevisiae] [strain=C]

    To enter the organism name on the Source Modifiers form:

    Each sequence still needs a SeqID, a unique identifier, but you do not need to add any other information to the definition line. After you fill out the Organism and Sequences form, you will see the Source Modifiers form. From the top pop-up menu, choose the modifier you want to annotate, in this case, Organism. The left column lists the sequences by their SeqID. Type the scientific organism name for each sequence in the corresponding box labeled Value. Do not use abbreviations such as D. melanogaster.

    You can also add other optional modifiers on this page, such as strain or chromosome. Another FAQ discusses how to add modifiers (e.g., strain, chromosome, cell-type) to biological source information.

    Sample file containing sequences in FASTA format:
    >dna1
    GGGGGGGGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCCCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTTTTTTCCCCCCCCCCCCCC
    >dna2
    GGGGGTGGGGAAAAAAAAAAAAAAATTTTTTTATTTTTTTTCCCCGGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTATTTTCCCCACCCCCCCCC
    >dna3
    GGGGGGCGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTCTTTTCCCCCCCCCCCCCC

    Sample file containing sequences in PHYLIP format:
    3 100
    ABC-1 GGGGGGGGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCCCCC
    ABC-2 GGGGGTGGGG AAAAAAAAAA AAAAATTTTT TTATTTTTTT TCCCCGGCCC
    ABC-3 GGGGGGCGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCGCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT TTTTTCCCCC CCCCCCCCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT ATTTTCCCCA CCCCCCCCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT CTTTTCCCCC CCCCCCCCCC



  3. I am submitting a large set of sequences. I want each sequence to have the same title, but I don't want to add all the titles to all the definition lines by hand. Can Sequin add titles to sequences automatically?
    There is no need to add a title to the definition line for each sequence. On the Annotation Page, which is part of the Organism and Sequences form, you can add the title that you would like to apply to all the sequences. The title should start with the name of the organism. If your sequences all come from different organisms, you can instruct Sequin to prefix the title with the organism name.

    Examples of sequence titles are:

    Arabidopsis thaliana pyruvate dehydrogenase E1 alpha subunit mRNA, complete cds; nuclear gene for mitochondrial protein.

    Bos taurus retinal pigment (RPE1) mRNA, 3' UTR.

    Ophraella conferta 16S ribosomal RNA gene, partial sequence; mitochondrial.

    Sequin can also create titles automatically from information you provide in the record. See the Automatic Definition Line Generation FAQ below.

    For additional information on formatting or adding titles, see the help documentation for the Nucleotide Definition Line or the Annotation Page, respectively.
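    Sequin builds the prefixed titles itself, but you may want to preview what they will look like. The Python sketch below joins each sequence's organism name to a shared title. The function name prefixed_titles is hypothetical, and its rule of leaving a title alone when it already starts with the organism name is an assumption made for this illustration, not documented Sequin behavior.

```python
# Illustrative preview of "organism name + shared title" (hypothetical helper,
# not Sequin code).


def prefixed_titles(organism_by_seq_id, shared_title):
    """Return {SeqId: title}, prefixing the shared title with each organism name."""
    titles = {}
    for seq_id, organism in organism_by_seq_id.items():
        if shared_title.startswith(organism + " "):
            # Assumed rule for this sketch: do not add the name twice.
            titles[seq_id] = shared_title
        else:
            titles[seq_id] = f"{organism} {shared_title}"
    return titles


shared = "16S ribosomal RNA gene, partial sequence; mitochondrial."
for seq_id, title in prefixed_titles({"seq1": "Ophraella conferta"}, shared).items():
    print(seq_id, title)
```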



  4. How do I indicate the position of a selenocysteine residue?
    Open the Coding Region feature form by double-clicking on the CDS in the record viewer. Click the folder tabs to open the Exceptions subpage of the Coding Region page. In the box labeled Position, enter the amino acid position of the selenocysteine as a single number. Select Selenocysteine from the Amino Acid pop-up menu, and click Accept.
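    The Position box takes an amino acid position, but you may know only where the TGA codon lies in the nucleotide sequence. A minimal Python sketch of the conversion follows. It assumes a single-exon coding region on the plus strand whose translation starts at the first base of the CDS; the function name aa_position is hypothetical, and CDSs with introns, a minus-strand location, or a different codon start need a different calculation.

```python
def aa_position(cds_start, codon_first_base):
    """Convert 1-based nucleotide coordinates to a 1-based amino acid position.

    cds_start: first base of the CDS; codon_first_base: first base of the
    selenocysteine (TGA) codon. Assumes a plus-strand, single-exon CDS
    translated from its first base.
    """
    offset = codon_first_base - cds_start
    if offset < 0:
        raise ValueError("codon lies upstream of the CDS")
    if offset % 3 != 0:
        raise ValueError("position is not the first base of an in-frame codon")
    return offset // 3 + 1


# A TGA codon starting at base 157 of a CDS that begins at base 100:
print(aa_position(100, 157))  # 20
```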



  5. I am running Sequin under Solaris, and it won't start without a library named libresolv.so.2, which I do not have on my machine.
    The library libresolv.so.2 is distributed by Sun as part of a security patch, which your system administrator should be able to install. Alternatively, libresolv.so.1 can substitute for libresolv.so.2. Copying libresolv.so.1 to libresolv.so.2 in the directory that contains libresolv.so.1, by typing

    cp libresolv.so.1 libresolv.so.2

    will also solve the problem.



  6. I would like to add some additional information about the biological source from which the sequence is derived. What modifiers can I include, and how should I annotate them on the sequence?
    You can add biological source information that describes the organism or source from which the sequence was derived. If you are submitting a single sequence, you must encode the information directly in the definition line. If you are submitting multiple sequences as part of a phylogenetic, population, or mutation study, you can either encode the information directly in the definition line or enter it on the Source Modifiers form, which follows the Organism and Sequences form.

    The file containing your sequences may be in FASTA, FASTA+GAP, PHYLIP, NEXUS Interleaved, or NEXUS Contiguous format. FASTA format is used for single, unaligned sequences. It consists of a "definition line" followed by lines of sequence. FASTA+GAP, PHYLIP, and NEXUS formats are used for sets of aligned sequences. FASTA+GAP format is similar to FASTA format, except that gaps, indicated by a "-", are allowed. PHYLIP and NEXUS formats are generated by certain sequence analysis packages.

    To encode biological source information into the definition line:

    The definition line is the line starting with a ">" character. Enclose each modifier, such as [strain=BALB/c] or [chromosome=2], in square brackets, and place the modifiers ahead of the sequence title. Two other FAQs, Adding titles to sets of sequences and Automatic Definition Line Generation, describe how to add titles to your sequences.

    For example, if your sequences are in FASTA or FASTA+GAP format, you could indicate the name of the organism, as well as the strain, chromosome, and cell type:

    Sample file containing sequences in FASTA format:
    >dna1 [org=Mus musculus] [strain=BALB/c] [chromosome=2] [cell-type=leukocyte] Mus musculus example (XMP) mRNA, partial cds
    GGGGGGGGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCCCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTTTTTTCCCCCCCCCCCCCC
    >dna2 [org=Rattus norvegicus] [strain=Sprague-Dawley] [chromosome=5] [cell-type=macrophage] Rattus norvegicus example (XMP) mRNA, partial cds
    GGGGGTGGGGAAAAAAAAAAAAAAATTTTTTTATTTTTTTTCCCCGGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTATTTTCCCCACCCCCCCCC

    You can include the same modifiers at the very end of your PHYLIP or NEXUS file:

    Sample file containing sequences in PHYLIP format:
    2 100
    ABC-1 GGGGGGGGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCCCCC
    ABC-2 GGGGGTGGGG AAAAAAAAAA AAAAATTTTT TTATTTTTTT TCCCCGGCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT TTTTTCCCCC CCCCCCCCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT ATTTTCCCCA CCCCCCCCCC
    >[org=Mus musculus] [strain=BALB/c] [chromosome=2] [cell-type=leukocyte] Mus musculus example (XMP) mRNA, partial cds
    >[org=Rattus norvegicus] [strain=Sprague-Dawley] [chromosome=5] [cell-type=macrophage] Rattus norvegicus example (XMP) mRNA, partial cds
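    To check how a definition line will be divided into SeqId, modifiers, and title, you can split it yourself. The Python sketch below is an approximation written for this FAQ, not Sequin's own parser. It assumes the SeqId is the first word after ">" (absent when the line begins with "[", as in PHYLIP and NEXUS trailers), that each modifier has the form [key=value] with no brackets inside, and that whatever text remains is the title. The function name parse_defline is hypothetical.

```python
import re

# One "[key=value]" modifier; neither part may contain square brackets.
MODIFIER = re.compile(r"\[\s*([^=\[\]]+?)\s*=\s*([^\[\]]*?)\s*\]")


def parse_defline(line):
    """Split a FASTA definition line into (seq_id, modifiers, title).

    Assumed rules: the SeqId is the first word after '>' unless the line
    starts with '['; bracketed [key=value] modifiers are removed; the
    remaining text, with spaces collapsed, is the title.
    """
    if not line.startswith(">"):
        raise ValueError("a definition line must start with '>'")
    text = line[1:].strip()
    seq_id = ""
    if text and not text.startswith("["):
        seq_id, _, text = text.partition(" ")
    modifiers = dict(MODIFIER.findall(text))
    title = " ".join(MODIFIER.sub(" ", text).split())
    return seq_id, modifiers, title


seq_id, modifiers, title = parse_defline(
    ">dna1 [org=Mus musculus] [strain=BALB/c] [chromosome=2] "
    "[cell-type=leukocyte] Mus musculus example (XMP) mRNA, partial cds"
)
print(seq_id)     # dna1
print(modifiers)  # {'org': 'Mus musculus', 'strain': 'BALB/c', 'chromosome': '2', 'cell-type': 'leukocyte'}
print(title)      # Mus musculus example (XMP) mRNA, partial cds
```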

    To enter biological source information on the Source Modifiers form:

    This form is available only if you are submitting a set of sequences as part of a phylogenetic, population, or mutation study. You do not need to include any information about the biological source on the definition line. Instead, choose the modifier you want to add from the top pop-up menu on the Source Modifiers form. The left column lists the sequences by their SeqIDs, the unique identifiers you provided. Type the modifier for each sequence in the corresponding box labeled Value. For example, if you select the Strain modifier, you might type BALB/c in the first Value box, Sprague-Dawley in the second, and so on.

    A FASTA file might look like this:

    >dna1
    GGGGGGGGGGAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTCCCCCCCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTTTTTTCCCCCCCCCCCCCC
    >dna2
    GGGGGTGGGGAAAAAAAAAAAAAAATTTTTTTATTTTTTTTCCCCGGCCCCCCCGGGGGGGGGGG
    AAAAAAATTTTTTTTATTTTCCCCACCCCCCCCC

    A PHYLIP file might look like this:

    2 100
    ABC-1 GGGGGGGGGG AAAAAAAAAA AAAAATTTTT TTTTTTTTTT TCCCCCCCCC
    ABC-2 GGGGGTGGGG AAAAAAAAAA AAAAATTTTT TTATTTTTTT TCCCCGGCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT TTTTTCCCCC CCCCCCCCCC
    CCCCGGGGGG GGGGGAAAAA AATTTTTTTT ATTTTCCCCA CCCCCCCCCC

    A complete list of modifiers is available in the Source and Organism subpage sections of the Sequin help documentation. Sequences located in the mitochondrion or chloroplast need the modifier [location=mitochondrion] or [location=chloroplast], respectively.

    If you are including an amino acid translation of a nucleotide sequence, we suggest that you include the gene and protein names in the FASTA definition line. No other modifiers may be included in protein definition lines. For example,

    >aa1 [gene=XMP] [prot=example] Mus musculus example (XMP) protein, partial sequence
    GGGKKKKKFFFFFSPPPPGGGEKNFFFFPPPPP
    >aa2 [gene=XMP] [prot=example] Rattus norvegicus example (XMP) protein, partial sequence
    GVGKKKKKFFYFFSPAPPGGGEKNFFYFPHPPP
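    Because only gene and protein names are allowed in protein definition lines, it can help to scan a protein FASTA file for other bracketed modifiers before importing it. The Python sketch below does that; the function name disallowed_modifiers is hypothetical, and treating modifier names as case-insensitive is an assumption of the sketch.

```python
import re

# Modifiers the FAQ allows in protein definition lines.
ALLOWED = {"gene", "prot"}
# Captures the key of each "[key=value]" modifier.
MODIFIER_KEY = re.compile(r"\[\s*([^=\[\]]+?)\s*=")


def disallowed_modifiers(fasta_text):
    """Return {line number: sorted disallowed keys} for protein definition lines."""
    problems = {}
    for number, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue
        keys = {key.lower() for key in MODIFIER_KEY.findall(line)}
        bad = sorted(keys - ALLOWED)
        if bad:
            problems[number] = bad
    return problems


sample = (
    ">aa1 [gene=XMP] [prot=example] Mus musculus example (XMP) protein, partial sequence\n"
    "GGGKKKKKFFFFFSPPPPGGGEKNFFFFPPPPP\n"
    ">aa2 [gene=XMP] [strain=BALB/c] Rattus norvegicus example (XMP) protein\n"
    "GVGKKKKKFFYFFSPAPPGGGEKNFFYFPHPPP\n"
)
print(disallowed_modifiers(sample))  # {3: ['strain']}
```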



  7. How do I append a publication reference to a single feature, e.g., a CDS?
    There are two types of publications in a database record, publication descriptors and publication features. Publication descriptors refer to the entire sequence; publication features refer to a part of a sequence. Most publications should be descriptors since they refer to the characterization of the sequence in the record.

    To add a publication feature, select Create New Publication->Publication Feature from the Sequin Misc menu. Then fill in the information for the citation. If you are referencing a published citation, this process is much easier if you have made Sequin network aware. In network-aware mode, go to the Journal page and enter the PMID (PubMed identifier), then click Lookup By PMID, and the pages will be filled out automatically. You can find a PMID by looking up the citation in Entrez. Alternatively, enter the Journal, Volume, Pages, and Year, then select Lookup Article; Sequin will retrieve the missing Title and Authors information. If the citation does not yet have a PMID, or if you are not running Sequin in network-aware mode, fill out the Title, Authors, and Journal pages by hand.



  8. I have an alignment of multiple sequences. I would like to annotate the same feature (such as rRNA) on all the sequences. Is there an easy way to do this with Sequin?
    You can use Sequin to propagate any kind of feature from one sequence to a complete set.

    1. Import the set of sequences in a pre-aligned format, for example, in PHYLIP, NEXUS, or gapped FASTA format.
    2. After you import the sequences, Sequin will open a window (the record viewer) showing the GenBank flatfile format of the first sequence.
    3. Annotate the desired feature(s) on the first sequence by choosing the appropriate item from the Annotate menu and entering the base span in the Location page of the dialog box that appears.
    4. Under the Edit menu in the record viewer, select Feature Propagate.
    5. A pop-up box will appear in which you can select the features to be propagated. You can also specify whether the features are extended across gaps in the alignment or split at them; splitting produces two features, one on either side of the gap. If you are propagating a CDS feature, you can specify whether the translation stops at or extends through internal stop codons. You can also extend the translation past the stop codon on the source entry by choosing the option to translate the CDS after the partial 3' boundary. If the CDS that you are propagating is partial on either end, select the 'Cleanup CDS partials after propagation' check box, which retains the partial nature of the CDS features on all records.
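    Propagation works by carrying a feature's coordinates through the alignment columns. The Python sketch below shows that idea for a single interval. It is a simplified illustration, not Sequin's implementation, and it ignores strand, reading frame, and partial ends. The function name propagate and its split_at_gaps flag are hypothetical; coordinates are 1-based and inclusive, and "-" marks a gap.

```python
def propagate(source_row, target_row, start, end, split_at_gaps=False):
    """Map the feature start..end on the source row onto the target row.

    Both rows come from the same alignment ('-' = gap). Returns a list of
    (start, end) intervals in ungapped target coordinates: one interval that
    extends over gaps, or one interval per ungapped stretch if split_at_gaps.
    """
    if len(source_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")
    # Alignment column of every residue in the source sequence.
    source_columns = [col for col, base in enumerate(source_row) if base != "-"]
    if not 1 <= start <= end <= len(source_columns):
        raise ValueError("feature lies outside the source sequence")
    first_col, last_col = source_columns[start - 1], source_columns[end - 1]

    # Count target residues that precede the feature's first column.
    position = sum(1 for base in target_row[:first_col] if base != "-")
    intervals, current = [], None
    for col in range(first_col, last_col + 1):
        if target_row[col] == "-":
            if split_at_gaps and current is not None:
                intervals.append(tuple(current))  # close the stretch before the gap
                current = None
            continue
        position += 1
        if current is None:
            current = [position, position]
        else:
            current[1] = position
    if current is not None:
        intervals.append(tuple(current))
    return intervals


source = "ACGTACGTAC"
target = "ACG--CGTAC"
print(propagate(source, target, 2, 8))                      # [(2, 6)]
print(propagate(source, target, 2, 8, split_at_gaps=True))  # [(2, 3), (4, 6)]
```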



  9. What is the difference between the Export and Save functions under the Sequin File menu?
    The Sequin Save and Save As functions save the record so that it can be re-opened by Sequin. The record is saved in ASN.1, a data description language used by NCBI. Be sure to save the record before you exit Sequin, or you will lose all of your work. ASN.1 is also the format in which the record is saved when you prepare it for submission (by clicking the Done button in the record viewer or selecting Prepare Submission under the File menu). The database staff use this ASN.1 file to build your sequence submission.

    The Sequin Export function exports the current view of the record. The information that is exported depends on the option that is selected in the Display Format pop-up menu. In Sequence display format, Sequin will export a text file that shows the sequence and any annotated features, such as a CDS or mRNA. In GenBank or EMBL display format, Sequin will export a copy of the record as it would appear in GenBank or EMBL format, respectively. In FASTA display format, Sequin will export the DNA sequence in FASTA format. In ASN.1 display format, Sequin will export a copy of the record in ASN.1.

    Note that Exporting the record is not the same as Saving. A file that has been created by Exporting cannot be re-opened by Sequin and should not be submitted to the database.



  10. I've formatted my sequence as you suggest, but I can't import it into Sequin. What could be wrong?
    If there is no line break (carriage return) between the definition line and the first line of sequence, you will not be able to import the sequence. Some word processors will break a single line into two lines without actually adding a carriage return. In this case, although the definition line and sequence will appear to be on two different lines, they really are on a single line. If you are unsure whether there is a carriage return, you can either set up your word processor so that it shows invisible characters such as carriage returns, or view the file in a text editor that does not create artificial line breaks.
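    You can also check a file for this problem with a short script. The Python sketch below assumes a FASTA or FASTA+GAP file (not PHYLIP or NEXUS, whose trailing definition lines are meant to have no sequence after them) and flags every definition line that is followed directly by another definition line or by the end of the file. The function name is hypothetical, and the heuristic is not Sequin's own import check. Because it reads the characters actually stored in the file, a line that only appears wrapped on screen is still reported as one line.

```python
def definition_lines_without_sequence(fasta_text):
    """Return definition lines that are not followed by any sequence line."""
    # splitlines() recognizes \n, \r\n and bare \r line endings.
    lines = [line for line in fasta_text.splitlines() if line.strip()]
    flagged = []
    for i, line in enumerate(lines):
        if not line.startswith(">"):
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        if next_line is None or next_line.startswith(">"):
            flagged.append(line)
    return flagged


# The first record is missing the line break after its definition line.
text = ">dna1 [org=Mus musculus] GGGGGGGGGGAAAAAAAAAA\n>dna2 [org=Mus musculus]\nGGGGGTGGGG\n"
print(definition_lines_without_sequence(text))
# ['>dna1 [org=Mus musculus] GGGGGGGGGGAAAAAAAAAA']
```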



  11. How can I create a sequence title automatically from within Sequin?
    The sequence title is displayed in the DEFINITION field of the GenBank flatfile format. See the example below:

    
    LOCUS       AMU12345      426 bp    DNA             MAM       07-MAY-1999
    DEFINITION  Aepyceros melampus isolate am5 D-loop, partial
                sequence; mitochondrial  <--------- sequence title
    ACCESSION   U12345
    

    With Sequin, titles can be added to sequences in several ways:

    1. Titles can be encoded in the sequence data file. Sequin extracts the source modifiers (e.g., [org=Mus musculus] [strain=BALB/c]) from the definition line, and all text that follows them becomes the sequence title. See the "Before You Begin: Preparing Nucleotide and Amino Acid Data" section of the "Sequin Quick Guide" for more details.
    2. When entering a population/phylogenetic/mutation study (a set of related sequences), the Annotation page allows you to assign a title to each sequence and gives you the option of automatically prefixing each title with the appropriate organism name. See Adding titles to sets of sequences above for more details.
    3. The simplest method is to use the Annotate->Generate Definition Line menu item. This creates titles based on the source and feature annotations you have made on the record, and conforms to GenBank style guidelines for definition lines.

    Submitters should choose only one way to include title information in their sequence records.

    If you choose to generate an automatic title, you must add certain data to the sequence record (i.e., biological source and features). Sequin will use that data to generate the title (DEFINITION field).

    Follow these steps to "Generate Definition Line" using the Sequin Annotate Menu:

    1. Create your Sequin record and import your nucleotide data.
    2. Annotate the biological features.
    3. Select Generate Definition Line from the Annotate menu.

    The automatic title generated by Sequin follows GenBank conventions but may be modified by the database staff if it is not appropriate. The generated title replaces any title you entered elsewhere in the submission, for example, any title that was attached to the nucleotide sequence in the original data file.

    An existing sequence title (i.e., DEFINITION field) can be edited by double clicking on it and making changes in the window that pops up.



  12. How can I use Sequin to prepare and submit HTG sequences?
    Sequin also allows genome centers to prepare HTG submissions. Sequin reads in a FASTA sequence file (or an Ace Contig file with Phrap sequence quality values) and a Sequin submission template file. To learn how to use Sequin for your HTG submissions, see the instructions on the HTG Web pages.



  13. How do I submit complete genomes or other large sequence records?
    The tbl2asn program is packaged with Sequin and can be used for single or multiple submissions. Detailed instructions for using tbl2asn are available separately. Annotation with tbl2asn requires a five-column, tab-delimited table of feature locations and qualifiers. When submitting a complete bacterial genome, please review the genome guidelines.
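    The feature table itself is plain text. The Python sketch below writes one in the five-column layout (start, stop, feature key, qualifier key, qualifier value), with qualifier rows beginning with three empty columns. The ">Feature SeqId" header line is assumed to follow NCBI's feature table format; confirm it, and the qualifiers you need, against the tbl2asn instructions. The function name feature_table and the coordinates in the example are hypothetical.

```python
def feature_table(seq_id, features):
    """Render a five-column, tab-delimited feature table for one sequence.

    features: list of (start, stop, feature_key, [(qualifier, value), ...]).
    Feature rows fill columns 1-3; qualifier rows leave them empty and fill 4-5.
    """
    rows = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        rows.append(f"{start}\t{stop}\t{key}")
        for qualifier, value in qualifiers:
            rows.append(f"\t\t\t{qualifier}\t{value}")
    return "\n".join(rows) + "\n"


# Hypothetical single-gene record using the XMP example names from this FAQ.
print(feature_table("dna1", [
    (1, 99, "gene", [("gene", "XMP")]),
    (1, 99, "CDS", [("product", "example")]),
]), end="")
```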



 

Questions or Comments?
Write to the NCBI Service Desk

Revised June 10, 2004.