Sequin FAQs
There are two ways to change between the stand-alone and network-aware modes of Sequin. (1) When you launch the Sequin program, you will see a menu called Misc on the Welcome to Sequin form. Select Net Configure under this menu. (2) If you are already running Sequin, select the option under the Sequin Misc menu called Net Configure. In either case, Sequin will prompt you to set certain preferences and will then run a network configuration program. In most cases, the default preferences are sufficient. To switch Sequin back into its stand-alone mode, select the Net Configure option again. You must restart Sequin before any changes to the network mode take effect. For additional information, see the Sequin help documentation under Net Configure.
There are two ways to indicate the organism. First, you can encode the organism name directly into the file that contains the nucleotide sequence. Second, you can indicate the organism name on the Source Modifiers form which appears after the Organism and Sequences form.
For either method, your sequences must be in FASTA, FASTA+GAP, PHYLIP, NEXUS Contiguous, or NEXUS Interleaved format. FASTA format is used for single unaligned sequences. It consists of a "definition line" followed by lines of sequence. FASTA+GAP, PHYLIP, and NEXUS formats are used for sets of aligned sequences. FASTA+GAP format is similar to FASTA format, except that gaps, indicated by a "-", are allowed. PHYLIP and NEXUS formats are generated by certain sequence analysis packages.
To encode the organism name into the file that contains the nucleotide sequence:
If your sequences are in FASTA or FASTA+GAP format, insert the phrase [org=organism scientific name], such as [org=Mus musculus] or [org=Drosophila melanogaster], in the definition line of each sequence. The definition line is the line starting with a ">" character, which immediately precedes your sequence. Use the scientific name of the organism, but don't use abbreviations such as D. melanogaster. The first word immediately following the ">" character is the SeqId, a unique identifier that you provide for your sequence.
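The rules above (a SeqId as the first word after ">", a bracketed [org=...] phrase, no abbreviated genus) can be checked before importing a file. The following is a minimal Python sketch, not part of Sequin; the function names and the SeqId Seq1 are hypothetical, and the abbreviation test is only a heuristic for names such as D. melanogaster.

```python
import re

# Matches one bracketed modifier such as [org=Mus musculus].
MODIFIER = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")
# Heuristic: a one-letter genus followed by a period, e.g. "D. melanogaster".
ABBREVIATED_GENUS = re.compile(r"^[A-Za-z]\.\s*\S")


def parse_defline(defline):
    """Split a FASTA definition line into (SeqId, {modifier: value})."""
    # The SeqId is the first word immediately after the '>' character.
    match = re.match(r">([^\s\[]+)", defline)
    if match is None:
        raise ValueError("definition line must start with '>' followed by a SeqId")
    modifiers = {key.strip(): value.strip() for key, value in MODIFIER.findall(defline)}
    return match.group(1), modifiers


def organism_problem(modifiers):
    """Return a short description of a problem with the organism name, or None."""
    organism = modifiers.get("org")
    if not organism:
        return "missing [org=...] modifier"
    if ABBREVIATED_GENUS.match(organism):
        return "abbreviated genus name"
    return None


seq_id, mods = parse_defline(">Seq1 [org=Mus musculus]")
print(seq_id, mods, organism_problem(mods))
```

Running a check like this over every line that starts with ">" flags definition lines that lack an organism or use an abbreviated name.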
If your sequences are in PHYLIP or NEXUS format, you must create a FASTA-style definition line for each sequence. Place this definition line containing the [org=organism scientific name] phrase at the bottom of the PHYLIP or NEXUS file.
You can also encode other modifiers in the definition line. Another FAQ discusses how to add modifiers (e.g., strain, chromosome, cell-type) to biological source information. Here are some examples:
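As a minimal sketch, a definition line carrying several modifiers can be assembled in Python as shown here. The helper names, the SeqId Seq1, and the short sequence are hypothetical; only the [org=...], [strain=...], and [chromosome=...] phrases come from the text of this FAQ.

```python
def make_defline(seq_id, modifiers, title=""):
    """Build a FASTA definition line: >SeqId [key=value] ... optional title."""
    if not seq_id or any(ch.isspace() for ch in seq_id):
        raise ValueError("SeqId must be a single word with no whitespace")
    parts = [">" + seq_id]
    # Each modifier goes in its own bracketed key=value phrase.
    parts.extend(f"[{key}={value}]" for key, value in modifiers.items())
    if title:
        parts.append(title)
    return " ".join(parts)


def write_fasta(records, width=60):
    """Turn (defline, sequence) pairs into FASTA text, one record after another."""
    lines = []
    for defline, sequence in records:
        lines.append(defline)  # the definition line sits on its own line
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


defline = make_defline(
    "Seq1",  # hypothetical SeqId
    {"org": "Mus musculus", "strain": "BALB/c", "chromosome": "2"},
)
print(write_fasta([(defline, "ACGTACGTACGT")]), end="")  # made-up sequence
```

The modifiers appear in the order given and ahead of any title, and the sequence always starts on the line after the definition line.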
To enter the organism name on the Source Modifiers form:
Although each sequence should have a SeqID, a unique identifier, you do not need to add any additional information. After you fill out the Organism and Sequences form, you will see the Source Modifiers form. From the top pop-up menu, choose the modifier you want to annotate, in this case, Organism. The left column lists the sequences by their SeqID. Type the scientific organism name for each sequence in the corresponding box labelled Value. Do not use abbreviations such as D. melanogaster.
You can also add optional modifiers on this page, such as strain or chromosome. Another FAQ discusses how to add modifiers (e.g., strain, chromosome, cell-type) to biological source information.
There is no need to add a title to the definition line for each sequence. On the Annotation Page, which is part of the Organism and Sequences form, you can add the title that you would like to apply to all the sequences. The title should start with the name of the organism. If your sequences all come from different organisms, you can instruct Sequin to prefix the title with the organism name.
Sequin can also create titles automatically from information you provide in the record. See the FAQ on Automatic Definition Line Generation, below.
For additional information on formatting or adding titles, see the help documentation for the Nucleotide Definition Line or the Annotation Page, respectively.
Open the Coding Region feature form by double clicking on the CDS in the record viewer. Select the Exceptions subpage of the Coding Region page by clicking on the appropriate folder tabs. In the box labeled Position, indicate, with a single number, the amino acid location of the selenocysteine. Select Selenocysteine from the Amino Acid pop-up menu, and click on Accept.
The library libresolv.so.2 is a security patch issued by Sun. Your system administrator should be able to install it. Alternatively, the library called libresolv.so.1 can substitute for libresolv.so.2. Copying libresolv.so.1 to libresolv.so.2 by typing
cp libresolv.so.1 libresolv.so.2
will also solve the problem.
You can add biological source information that describes the organism or source from which the sequence was derived. If you are submitting a single sequence, you must encode the information directly in the definition line. If you are submitting multiple sequences as part of a phylogenetic, population, or mutation study, you can either encode the information directly in the definition line or enter it on the Source Modifiers form, which follows the Organism and Sequences form.
The file containing your sequences may be in FASTA, FASTA+GAP, PHYLIP, NEXUS Interleaved, or NEXUS Contiguous format. FASTA format is used for single, unaligned sequences. It consists of a "definition line" followed by lines of sequence. FASTA+GAP, PHYLIP, and NEXUS formats are used for sets of aligned sequences. FASTA+GAP format is similar to FASTA format, except that gaps, indicated by a "-", are allowed. PHYLIP and NEXUS formats are generated by certain sequence analysis packages.
To encode biological source information into the definition line:
The definition line is the line starting with a ">" character. The modifiers, such as [strain=BALB/c] or [chromosome=2], should be enclosed in brackets and placed ahead of the sequence title. Two other FAQs, Adding titles to sets of sequences and Automatic Definition Line Generation, describe how to add titles to your sequences.
For example, if your sequences are in FASTA or FASTA+GAP format, you could indicate the name of the organism, as well as the strain, chromosome, and cell type:
You can include the same modifiers at the very end of your PHYLIP or NEXUS file:
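One way to add those lines without retyping the alignment is sketched below in Python. The function name, the toy two-sequence PHYLIP alignment, and the exact layout of the appended lines are assumptions. In particular, this FAQ does not say whether each appended line should repeat the SeqId, so the example omits it and relies on the lines being in the same order as the sequences.

```python
def append_deflines(alignment_text, deflines):
    """Return alignment_text with FASTA-style definition lines added at the bottom."""
    for line in deflines:
        if not line.startswith(">"):
            raise ValueError(f"definition line must start with '>': {line!r}")
        if "\n" in line or "\r" in line:
            raise ValueError("each definition line must be a single line")
    # Drop trailing line breaks so the appended lines start on a fresh line.
    body = alignment_text.rstrip("\r\n")
    return body + "\n" + "\n".join(deflines) + "\n"


# Toy PHYLIP alignment: header with sequence count and length, then the rows.
phylip = " 2 10\nSeq1      ACGTACGTAC\nSeq2      ACGT-CGTAC\n"
result = append_deflines(
    phylip,
    [
        ">[org=Mus musculus] [strain=BALB/c]",  # for the first sequence
        ">[org=Mus musculus] [chromosome=2]",   # for the second sequence
    ],
)
print(result, end="")
```

Because the alignment body is passed through unchanged, output from a sequence analysis package can be annotated this way without touching the aligned rows.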
To enter biological source information on the Source Modifiers form:
You will only have access to this form if you are submitting a set of sequences as part of a phylogenetic, population, or mutation study. You do not need to include any information about the biological source on the definition line. Rather, from the top pop-up menu on the Source Modifiers form, choose the modifier you want to add. The left column lists the sequences by their SeqID, or the unique identifier which you provided for your sequence. Type the modifier for each sequence in the corresponding box labeled Value. For example, if you select the Strain modifier, you might type BALB/c in the first Value box, Sprague-Dawley in the second, etc.
A FASTA file might look like this:
A PHYLIP file might look like this:
A complete list of modifiers is available from the Sequin help document Source and Organism subpage sections. Sequences located in the mitochondrion or chloroplast need a modifier of [location=mitochondrion] or [location=chloroplast].
If you are including an amino acid translation of a nucleotide sequence, we suggest that you include the gene and protein names in the FASTA definition line. No other modifiers may be included in protein definition lines. For example,
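A protein definition line of this kind can be built and checked with a short Python sketch. It assumes the modifier keys are spelled gene and protein, which this FAQ does not state explicitly, so confirm the spelling in the Sequin help. The function names, SeqId, gene name, and protein name are all hypothetical.

```python
import re

# Matches one bracketed modifier such as [gene=abc1].
MODIFIER = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")
# Assumed key spellings; only gene and protein names belong on a protein line.
ALLOWED_PROTEIN_MODIFIERS = {"gene", "protein"}


def make_protein_defline(seq_id, gene, protein):
    """Build a protein definition line carrying only gene and protein names."""
    return f">{seq_id} [gene={gene}] [protein={protein}]"


def disallowed_modifiers(defline):
    """List modifier keys that should not appear on a protein definition line."""
    keys = [key.strip() for key, _ in MODIFIER.findall(defline)]
    return [key for key in keys if key not in ALLOWED_PROTEIN_MODIFIERS]


line = make_protein_defline("Seq1", "abc1", "example protein")
print(line)
print(disallowed_modifiers(">Seq1 [gene=abc1] [strain=BALB/c]"))
```

The second call reports strain, which belongs on the nucleotide definition line rather than the protein one.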
There are two types of publications in a database record, publication descriptors and publication features. Publication descriptors refer to the entire sequence; publication features refer to a part of a sequence. Most publications should be descriptors since they refer to the characterization of the sequence in the record.
To add a publication feature, under the Sequin Misc menu, click on Create New Publication --> Publication Feature. Then fill in the information for the citation. If you are referencing a published citation, this process is much easier if you have made Sequin network aware. If Sequin is network aware, go to the Journal page and fill in the PMID (PubMed identifier). Click on Lookup By PMID, and the pages will be filled out automatically. PMIDs can be found by looking up the citation in Entrez. Alternatively, enter the Journal, Volume, Pages, and Year, then select "Lookup Article". Sequin will retrieve the missing Title and Authors information. If the citation does not yet have a PMID, or if you are not running Sequin in its network-aware mode, fill out the Title, Authors, and Journal pages by hand.
You can use Sequin to propagate any kind of feature from one sequence to a complete set.
The Sequin Save and Save As functions save the record so that it can be re-opened by Sequin. The record is saved in a format called ASN.1, a data description language used by the NCBI. Be sure to save the record before you exit Sequin or you will lose all of your work. ASN.1 is also the format in which the file is saved when you prepare your record for submission (by clicking the Done button on the record viewer or selecting Prepare Submission under the File menu). The database staff use the ASN.1 file to build your sequence submission.
The Sequin Export function exports the current view of the record. The information that is exported depends on the option that is selected in the Display Format pop-up menu. In Sequence display format, Sequin will export a text file that shows the sequence and any annotated features, such as a CDS or mRNA. In GenBank or EMBL display format, Sequin will export a copy of the record as it would appear in GenBank or EMBL format, respectively. In FASTA display format, Sequin will export the DNA sequence in FASTA format. In ASN.1 display format, Sequin will export a copy of the record in ASN.1.
Note that Exporting the record is not the same as Saving. A file that has been created by Exporting cannot be re-opened by Sequin and should not be submitted to the database.
If there is no line break (carriage return) between the definition line and the first line of sequence, you will not be able to import the sequence. Some word processors will break a single line into two lines without actually adding a carriage return. In this case, although the definition line and sequence will appear to be on two different lines, they really are on a single line. If you are unsure whether there is a carriage return, you can either set up your word processor so that it shows invisible characters such as carriage returns, or view the file in a text editor that does not create artificial line breaks.
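If changing word-processor settings is inconvenient, the same check can be scripted. The Python sketch below is an illustration rather than a Sequin feature: it reports every definition line that is not followed by a separate line of sequence. The function name and the sample strings are hypothetical.

```python
def deflines_missing_break(fasta_text):
    """Return definition lines that are not followed by a separate sequence line."""
    # splitlines() recognises \n, \r\n and a bare \r as real line breaks, so
    # text that is only soft-wrapped on screen stays on one line here.
    lines = fasta_text.splitlines()
    problems = []
    for index, line in enumerate(lines):
        if not line.startswith(">"):
            continue
        following = lines[index + 1].strip() if index + 1 < len(lines) else ""
        # The next line must exist, be non-empty, and not be another definition line.
        if not following or following.startswith(">"):
            problems.append(line)
    return problems


good = ">Seq1 [org=Mus musculus]\nACGTACGT\n"
bad = ">Seq1 [org=Mus musculus] ACGTACGT"  # sequence fused onto the definition line
print(deflines_missing_break(good))
print(deflines_missing_break(bad))
```

When checking a real file, opening it with `open(path, newline="")` keeps the original line endings so that they are examined exactly as saved.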
The sequence title is displayed in the DEFINITION field of the GenBank flatfile format. See the example below:
LOCUS       AMU12345     426 bp    DNA             MAM       07-MAY-1999
DEFINITION  Aepyceros melampus isolate am5 D-loop, partial sequence; mitochondrial   <--------- sequence title
ACCESSION   U12345
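To confirm which title ended up in an exported GenBank-format file, the DEFINITION field can be read back with a few lines of Python. This is a sketch: the function name is hypothetical, and it assumes that continuation lines of a long title begin with whitespace, which is how the sample title has been wrapped here for illustration.

```python
def definition_from_flatfile(text):
    """Return the sequence title stored in the DEFINITION field of a flatfile."""
    parts = []
    in_definition = False
    for line in text.splitlines():
        if line.startswith("DEFINITION"):
            in_definition = True
            parts.append(line[len("DEFINITION"):].strip())
        elif in_definition and line.startswith(" "):
            # Indented lines continue the title.
            parts.append(line.strip())
        elif in_definition:
            # The next field keyword (for example ACCESSION) ends the title.
            break
    return " ".join(parts)


record = (
    "LOCUS       AMU12345     426 bp    DNA             MAM       07-MAY-1999\n"
    "DEFINITION  Aepyceros melampus isolate am5 D-loop, partial sequence;\n"
    "            mitochondrial\n"
    "ACCESSION   U12345\n"
)
print(definition_from_flatfile(record))
```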
Submitters should choose only one way to include title information in their sequence records.
If you choose to generate an automatic title, you must add certain data to the sequence record (i.e., biological source and features). Sequin will use that data to generate the title (DEFINITION field).
Follow these steps to "Generate Definition Line" using the Sequin Annotate Menu:
The automatic title generated by Sequin will follow GenBank conventions but may be modified by the database staff if it is not appropriate. The title you enter here will replace any title you entered elsewhere in the submission, for example, any title that was attached to the nucleotide sequence in the original data file.
An existing sequence title (i.e., DEFINITION field) can be edited by double clicking on it and making changes in the window that pops up.
Sequin also allows genome centers to prepare HTG submissions. Sequin reads in a FASTA sequence file (or an Ace Contig file with Phrap sequence quality values) and a Sequin submission template file. To learn how to use Sequin for your HTG submissions, see the instructions on the HTG Web pages.
The tbl2asn program is packaged with Sequin and can be used for single or multiple submissions. More detailed instructions about using it are available. Annotation with tbl2asn requires a five-column, tab-delimited table of feature locations and qualifiers. When submitting a complete bacterial genome, please review the genome guidelines.
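As a rough illustration of the five-column table, the Python sketch below writes one from a list of features. It assumes the commonly documented layout: a ">Feature SeqId" header line, then start, stop, and feature key in columns one to three, with each qualifier key and value in columns four and five on the following lines. Check the tbl2asn instructions before relying on it. The function name, SeqId, coordinates, and qualifier values are hypothetical.

```python
def feature_table(seq_id, features):
    """Render features as tab-delimited, five-column feature table text."""
    lines = [f">Feature {seq_id}"]
    for feature in features:
        intervals = feature["intervals"]
        first_start, first_stop = intervals[0]
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{first_start}\t{first_stop}\t{feature['key']}")
        # Additional intervals of the same feature carry only start and stop.
        for start, stop in intervals[1:]:
            lines.append(f"{start}\t{stop}")
        # Columns 4-5: qualifier key and value, after three empty columns.
        for key, value in feature.get("qualifiers", []):
            lines.append(f"\t\t\t{key}\t{value}")
    return "\n".join(lines) + "\n"


table = feature_table(
    "Seq1",
    [
        {"key": "gene", "intervals": [(1, 300)], "qualifiers": [("gene", "abc1")]},
        {"key": "CDS", "intervals": [(1, 300)],
         "qualifiers": [("product", "example protein")]},
    ],
)
print(table, end="")
```

Writing the table from structured data keeps the tab positions consistent, which is easy to get wrong when the file is typed by hand.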
Questions or Comments?
Write to the NCBI Service Desk
Revised June 10, 2004.