The DDBJ/EMBL/GenBank Feature Table: Definition

Version 6.2 Oct 15 2004

DNA Data Bank of Japan, Mishima, Japan.
EMBL Nucleotide Sequence Database, Cambridge, UK.
GenBank, NCBI, Bethesda, MD, USA.

1 Introduction
2 Overview of the Feature Table format
   2.1 Format Design
   2.2 Key aspects of this Feature Table design
   2.3 Feature Table Terminology
3 Feature Table components and format
   3.1 Naming conventions
   3.2 Feature keys
      3.2.1 Purpose
      3.2.2 Format and conventions
      3.2.3 Key groups and hierarchy
      3.2.4 Feature key examples
   3.3 Qualifiers
      3.3.1 Purpose
      3.3.2 Format and conventions
      3.3.4 Qualifier examples
   3.4 Feature labels
      3.4.1 Purpose
      3.4.2 Format and conventions
      3.4.3 Examples of feature labels
   3.5 Location
      3.5.1 Purpose
      3.5.2 Format and conventions
4 Feature table Format
   4.1 Format examples
   4.2 Definition of line types
   4.3 Data item positions
   4.4 Use of blanks
5 Examples of sequence annotation
   5.1 Eukaryotic gene
   5.2 Bacterial operon
   5.3 Artificial cloning vector (circular)
   5.4 Plasmid
   5.5 Repeat element
   5.6 Immunoglobulin heavy chain
   5.7 T-cell receptor
   5.8 transfer RNA
6 Limitations of this feature table design
7 Appendices
   7.1 Appendix I EMBL, GenBank and DDBJ entries
      7.1.1 EMBL Format
      7.1.2 GenBank Format
      7.1.3 DDBJ Format
   7.2 Appendix II Feature table: Backus-Naur form
   7.3 Appendix III: Feature keys reference
      7.3.1 Feature key relationship tree
      7.3.2 Feature key reference manual
   7.4 Appendix IV: Summary of qualifiers for feature keys
      7.4.1 Qualifier List
      7.4.2 Feature qualifiers - mapped to Feature keys
   7.5 Appendix V: Controlled vocabularies
      7.5.1 Nucleotide base codes (IUPAC)
      7.5.2 Modified base abbreviations
      7.5.3 Amino acid abbreviations
      7.5.4 Modified and unusual Amino Acids
      7.5.5 Genetic Code Tables
      7.5.6 Country Names


1 Introduction

Nucleic acid sequences provide the fundamental starting point for describing
and understanding the structure, function, and development of genetically
diverse organisms.
The GenBank, EMBL, and DDBJ nucleic acid sequence data banks have from their
inception used tables of sites and features to describe the roles and
locations of higher order sequence domains and elements within the genome of
an organism. In February, 1986, GenBank and EMBL began a collaborative effort
(joined by DDBJ in 1987) to devise a common feature table format and common
standards for annotation practice.


2 Overview of the Feature Table format

The overall goal of the feature table design is to provide an extensive
vocabulary for describing features in a flexible framework for manipulating
them. The Feature Table documentation represents the shared rules that allow
the three databases to exchange data on a daily basis.

The range of features to be represented is diverse, including regions which:

 * perform a biological function,
 * affect or are the result of the expression of a biological function,
 * interact with other molecules,
 * affect replication of a sequence,
 * affect or are the result of recombination of different sequences,
 * are a recognizable repeated unit,
 * have secondary or tertiary structure,
 * exhibit variation, or
 * have been revised or corrected.

2.1 Format Design

The format design is based on a tabular approach and consists of the
following items:

   Feature key   a single word or abbreviation indicating functional group
   Location      instructions for finding the feature
   Qualifiers    auxiliary information about a feature

2.2 Key aspects of this Feature Table design

 * Feature keys allow specific annotation of important sequence features.

 * Related features can be easily specified and retrieved.
   Feature keys are arranged hierarchically, allowing complex and compound
   features to be expressed. Both location operators and the feature keys
   show feature relationships even when the features are not contiguous. The
   hierarchy of feature keys allows broad categories of biological
   functionality, such as rRNAs, to be easily retrieved.

 * Generic feature keys provide a means for entering new or undefined
   features.
   A number of "generic" or miscellaneous feature keys have been added to
   permit annotation of features that cannot be adequately described by
   existing feature keys. These generic feature keys will serve as an
   intermediate step in the identification and addition of new feature keys.
   The syntax has been designed to allow the addition of new feature keys as
   they are required.

 * More complex locations (fuzzy and alternate ends, for example) can be
   specified.
   Each end point of a feature may be specified as a single point, an
   alternate set of possible end points, a base number beyond which the end
   point lies, or a region which contains the end point.

 * Features can be combined and manipulated in many different ways.
   The location field can contain operators or functional descriptors
   specifying what must be done to the sequence to reproduce the feature.
   For example, a series of exons may be "join"ed into a full coding
   sequence.

 * Standardized qualifiers provide precision and parsability of descriptive
   details.
   A combination of standardized qualifiers and their controlled-vocabulary
   values enables free-text descriptions to be avoided.

 * The nature of supporting evidence for a feature can be explicitly
   indicated.
   Features for which there is no direct experimental evidence, such as open
   reading frames or sequences showing similarity to consensus sequences,
   can be annotated. Therefore, the feature table can incorporate
   contributions from researchers doing computational analysis of the
   sequence databases. However, all features that are supported by
   experimental data will be clearly marked as such.

 * The table syntax has been designed to be machine parsable.
   A consistent syntax allows machine extraction and manipulation of
   sequences coding for all features in the table.
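
As an informal illustration of the "join" operation mentioned in the list
above, the following Python sketch concatenates spans of a presented
sequence. It is not part of the format definition: the function name, the
toy sequence, and the exon coordinates are all hypothetical, and spans are
assumed to be simple pairs of 1-based, inclusive base numbers.

```python
def join(sequence, spans):
    """Concatenate 1-based, inclusive (start, end) spans in the order given.

    Hypothetical helper for illustration only; it handles plain spans and
    none of the other location syntax (operators, '<', '>', and so on).
    """
    return "".join(sequence[start - 1:end] for start, end in spans)


# Toy presented sequence: two "exons" in upper case around a lower-case
# "intron". The coordinates are invented for this example.
presented = "aaATGCCgtaagTTTAAcc"
exons = [(3, 7), (13, 17)]

coding = join(presented, exons)
print(coding)
```

The slice `sequence[start - 1:end]` converts the 1-based, inclusive base
numbers used in feature locations to Python's 0-based, end-exclusive slices.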

2.3 Feature Table Terminology

The format and wording in the feature table use common biological research
terminology whenever possible. For example, an item in the feature table such
as:

Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"

might be read as:

The feature CDS is a coding sequence beginning at base 23 and ending at base
400, has a product called 'alcohol dehydrogenase' and is coded for by a gene
called 'adhI'.

A more complex description:

Key             Location/Qualifiers
CDS             join(544..589,688..>1032)
                /product="T-cell receptor beta-chain"

might be read as:

This feature, which is a partial coding sequence, is formed by joining the
indicated elements into one contiguous sequence encoding a product called
T-cell receptor beta-chain.

The following sections contain detailed explanations of the feature table
design showing conventions for each component of the feature table, examples
of how the format might be implemented, a description of the exact column
placement of all the data items and examples of complete sequence entries
that have been annotated using the new format. The last section of this
document describes known limitations of the current feature table design.

Appendix I gives an example database entry for the DDBJ, GenBank and EMBL
formats.
Appendix II describes the format in Backus-Naur-Form (BNF).
Appendices III and IV provide reference manuals for the feature table keys
and qualifiers, respectively.
Appendix V includes controlled vocabularies such as nucleotide base codes,
modified base abbreviations, genetic code tables etc.

This document defines the syntax and vocabulary of the feature table. The
syntax is sufficiently flexible to allow expression of a single biological
entity in numerous ways. In such cases, the annotation staffs at the
databases will propose conventions for standard means of denoting the
entities. This feature table format is shared by GenBank, EMBL and DDBJ.
Comments, corrections, and suggestions may be submitted to any of the
database staffs. New format specifications will be added as needed.


3 Feature Table components and format

3.1 Naming conventions

Feature table components, including feature keys, qualifiers, accession
numbers, database name abbreviations, feature labels, and location operators,
are all named following the same conventions. Component names may be no more
than 20 characters long (Feature keys 15, Feature qualifiers 20) and must
contain at least one letter. Case should not be regarded as significant in
comparing feature labels ('Prot1' and 'pROT1' are the same).

The following characters are permitted to occur in feature table component
names:

 * Upper-case letters (A-Z)
 * Lower-case letters (a-z)
 * Numbers (0-9)
 * Underscore (_)
 * Hyphen (-)
 * Single quotation mark or apostrophe (')
 * Asterisk (*)

3.2 Feature keys

3.2.1 Purpose

Feature keys indicate (1) the biological nature of the annotated feature or
(2) information about changes to or other versions of the sequence. The
feature key permits a user to quickly find or retrieve similar features or
features with related functions.

3.2.2 Format and conventions

There is a defined list of allowable feature keys which is shown in
Appendix III. Each feature must contain a feature key. Features created
solely as location references should use a single hyphen "-" as their feature
key.

3.2.3 Key groups and hierarchy

The feature keys fall into families which are in some sense similar in
function and which are annotated in a similar manner. A functional family may
have a "generic" or miscellaneous key, which can be recognized by the 'misc_'
prefix, that can be used for instances not covered by the other defined keys
of that group.

The feature key groups are listed below with a short definition and an
annotation example:

1. Difference and change features
   Indicate ways in which a sequence should be changed to produce a
   different "version":
      misc_difference   location
                        /replace="change_location"

2. Expression signal features
   Indicate regions containing a signal that alters a biological function:
      misc_signal       location

3. Transcript features
   Indicate products made by a region:
      misc_RNA          location

4. Binding features
   Indicate that a sequence or nucleotide is covalently, non-covalently or
   otherwise bound to something else:
      misc_binding      location
                        /bound_moiety="bound molecule"

5. Repeat features
   Indicate repetitive sequence elements:
      repeat_region     location

6. Recombination features
   Indicate regions that have been either inserted or deleted by
   recombination:
      misc_recomb       location

7. Structure features
   Indicate sequence for which there is secondary or tertiary structural
   information:
      misc_structure    location

In addition to the functional groupings shown above, the feature keys can
also be arranged in a hierarchical tree based on the degree of specificity or
level of detail known about a feature. This hierarchy is shown in outline
form in Appendix III where the most general level is the 'misc_feature' key
and other keys are arranged in increasing level of detail. By using more
general keys, features can be annotated even if their biological functions
are insufficiently well characterized to assign them more specific keys.

3.2.4 Feature key examples

Key             Description
CDS             Protein-coding sequence
RBS             ribosome binding site
rep_origin      Origin of replication
protein_bind    Protein binding site on DNA
tRNA            mature transfer RNA

See Appendix III for descriptions of all feature keys.

3.3 Qualifiers

3.3.1 Purpose

Qualifiers provide a general mechanism for supplying information about
features in addition to that conveyed by the key and location.

3.3.2 Format and conventions

Qualifiers take the form of a slash (/) followed by the qualifier name and,
if applicable, an equal sign (=) and a value.
Each qualifier should have a single value; if multiple values are necessary,
these should be represented by iterating the same qualifier, e.g.:

Key             Location/Qualifiers
CDS             1..1000
                /codon=(seq:"cug",aa:Ser)
                /codon=(seq:"tga",aa:Trp)

If the location descriptor does not need a continuation line, the first
qualifier begins a new line in the feature location column. If the location
descriptor requires a continuation line, the first qualifier may follow
immediately after the location. Any necessary continuation lines begin in the
same column. See Section 4 for a complete description of data item positions.

3.3.3 Qualifier values

Since qualifiers convey many different types of information, there are
several value formats:

1. Free text
2. Controlled vocabulary or enumerated values
3. Citation or reference numbers
4. Sequences
5. Feature labels

3.3.3.1 Free text

Most qualifier values will be a descriptive text phrase which must be
enclosed in double quotation marks. When the text occupies more than one
line, a single set of quotation marks is required: one mark at the beginning
and one at the end of the text. The text itself may be composed of any
printable characters (ASCII values 32-126 decimal). If double quotation marks
are used within a free text string, each one (") must be 'escaped' by placing
a second double quotation mark immediately before it (""). For example:

                /note="This is an example of ""escaped"" quotation marks"

3.3.3.2 Controlled vocabulary or enumerated values

Some qualifiers require values from a controlled vocabulary and are entered
without quotation marks. For example, the '/direction' qualifier has only
three values: 'left', 'right' or 'both'. Qualifier value controlled
vocabularies, like feature table component names, must be treated as
completely case insensitive: they may be entered and displayed in any
combination of upper and lower case ('/direction=Left', '/direction=left' and
'/direction=LEFT' are all legal and all convey the same meaning).
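
The two value conventions just described (doubled quotation marks in free
text, and case-insensitive controlled vocabularies) can be sketched in Python
as follows. This is an illustration under stated assumptions, not part of the
definition: all function names are hypothetical, and the '/direction' value
set is the only vocabulary taken from the text above.

```python
# Values of the '/direction' qualifier as listed in Section 3.3.3.2.
DIRECTION_VALUES = {"left", "right", "both"}


def escape_free_text(text):
    """Quote a free-text value, doubling any embedded double quotation marks."""
    return '"' + text.replace('"', '""') + '"'


def unescape_free_text(value):
    """Reverse escape_free_text; expects the surrounding quotation marks."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("free text must be enclosed in double quotation marks")
    return value[1:-1].replace('""', '"')


def normalize_direction(value):
    """Compare a '/direction' value case-insensitively; return it in lower case."""
    lowered = value.lower()
    if lowered not in DIRECTION_VALUES:
        raise ValueError("not a legal /direction value: " + value)
    return lowered


note = 'This is an example of "escaped" quotation marks'
print("/note=" + escape_free_text(note))
print("/direction=" + normalize_direction("LEFT"))
```

Lower case is chosen here only as a convenient canonical form for comparison;
the text does not prescribe which case a database should display.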
The database staffs reserve the right to regularize the case of qualifier
values in the interest of readability, unlike the case of feature labels
where the databases will maintain the case as originally entered (see
Section 3.4.2).

Qualifier value controlled vocabularies will be maintained by the cooperating
database staffs. Examples of controlled vocabularies can be found in
Appendices IV and V. The database staff should be contacted for the current
lists.

3.3.3.3 Citation or reference numbers

The citation or published reference number (as enumerated in the entry
'REFERENCE' or 'RN' data item) should be enclosed in square brackets (e.g.,
[3]) to distinguish it from other numbers.

3.3.3.4 Sequences

Literal sequences of nucleotide bases in location descriptors, e.g.,
join(12..45,"atgcatt",988..1050), have been illegal since the implementation
of version 2.1 of the Feature Table Definition Document (December 15, 1998).

3.3.4 Qualifier examples

Key             Location/Qualifiers
source          1..1509
                /organism="Mus musculus"
                /strain="CD1"
                /mol_type="genomic DNA"
promoter        <1..9
                /gene="ubc42"
mRNA            join(10..567,789..1320)
                /gene="ubc42"
CDS             join(54..567,789..1254)
                /gene="ubc42"
                /product="ubiquitin conjugating enzyme"
                /function="cell division control"
CDS             109..564
                /usedin=X10009:catalase

3.4 Feature labels

The /label= qualifier takes as its value a feature label. Feature labels
follow the same naming conventions as other feature table components (e.g.,
keys and qualifiers). While feature labels are optional, attaching a label to
a feature allows it to be referred to unambiguously. For example, the feature
label can be used to refer unambiguously to a coding region that exists in a
different entry from the exons of which it is composed.

3.4.1 Purpose

The feature label identifies a feature item within an entry and, when
combined with the entry's primary accession number and the name of the
database from which it came, is a permanent internationally unique tag for
that feature.
There are, however, certain situations in which a "permanent" feature may
"disappear" from the distributed version of the database and others in which
it may be desirable to change a feature's label.

3.4.2 Format and conventions

Each feature in a feature table may have a label which must be unique within
that entry, but which may be the same as feature labels used in other
entries. A feature can be given any label. However, labels containing
meaningful abbreviations will be much more easily remembered than
non-descriptive labels.

Because letter case is not significant, two features within one entry cannot
have labels that differ only in case: '16S_rRNA' and '16s_rRNA' could not
both be used in the same entry.

The full feature name syntax is as follows:

   Database name::primary accession number:feature label

References to a feature should use as much of the full feature name as
required to unambiguously identify the feature.

3.4.3 Examples of feature labels

Feature label         Description
adhI                  adhI gene coding for alcohol dehydrogenase
tfp35                 tail fiber protein 35
3'-ltr                long terminal repeat
a1col_x51             prepro-alpha-1-collagen, exon 51
X10045:diff1          first conflict for the sequence of entry X10045
GB::K10675:catexA     feature with label catexA in entry K10675 of the
                      GenBank databank

3.5 Location

3.5.1 Purpose

The location indicates the region of the presented sequence which corresponds
to a feature.

3.5.2 Format and conventions

The location contains at least one sequence location descriptor and may
contain one or more operators with one or more sequence location descriptors.
Base numbers refer to the numbering in the entry. This numbering, which is
not necessarily the same as the numbering scheme used in the published report
cited, designates the first base (5' end) of the presented sequence as
base 1. Base locations beyond the range of the presented sequence may not be
used in location descriptors.

Location operators and descriptors are discussed in more detail below.
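
The numbering rules above (the first base is base 1, and no base number may
fall outside the presented sequence) can be expressed as a small check. The
following Python sketch is illustrative only; the function name is
hypothetical and it considers only a plain pair of start and end base
numbers, not the full location syntax.

```python
def is_valid_span(start, end, sequence_length):
    """Return True if start..end lies within bases 1..sequence_length.

    Base numbers are 1-based and inclusive; a span must not run backwards
    and must not extend beyond the presented sequence.
    """
    return 1 <= start <= end <= sequence_length


# A presented sequence of 8 bases, numbered 1 to 8.
length = len("acgtacgt")
print(is_valid_span(1, 8, length))   # whole sequence
print(is_valid_span(0, 3, length))   # there is no base 0
print(is_valid_span(5, 9, length))   # base 9 is beyond the sequence
```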

3.5.2.1 Location descriptors

The location descriptor can be one of the following:

(a) a single base number
(b) a site between two indicated base numbers
(c) a single base chosen from within a specified range of bases
(d) the base numbers delimiting a sequence span
(e) a remote entry identifier followed by a local location descriptor
    (i.e., a-d)

A site between two points (nucleotides), such as an endonucleolytic cleavage
site, is indicated by listing the two points separated by a caret (^).

A single base chosen from a range or span of bases is indicated by the first
base number and the last base number of the range separated by a single
period (e.g., '12.21' indicates a single base taken from between the
indicated points).

Sequence spans are indicated by the starting base number and the ending base
number separated by two periods (e.g., '34..456'). The '<' and '>' symbols
may be used with the starting and ending base numbers to indicate that an end
point is beyond (and does not include) the specified base number. The
starting and ending base positions can be represented as distinct base
numbers ('34..456') or as alternatives specified by an operator. A single
point chosen from a range of points uses the 'x.y' format described above.

A location in a remote entry (not the entry to which the feature table
belongs) can be specified by giving the remote entry (accession-number)
followed by a location descriptor which applies to that entry's sequence.

3.5.2.2 Operators

The location operator is a prefix that specifies what must be done to the
indicated sequence to find or construct the location corresponding to the
feature. A list of allowable operators is given below with their definitions
and most common format.

complement(location)
    Find the complement of the presented sequence in the span specified by
    "location" (i.e., read the complement of the presented strand in its
    5'-to-3' direction)

join(location,location, ... location)
    The indicated elements should be joined (placed end-to-end) to form one
    contiguous sequence

order(location,location, ... location)
    The elements can be found in the specified order (5' to 3' direction),
    but nothing is implied about the reasonableness of joining them

3.5.3 Location examples

The following is a list of common location descriptors with their meanings:

Location
    Description

467
    Points to a single base in the presented sequence

340..565
    Points to a continuous range of bases bounded by and including the
    starting and ending bases

<345..500
    Indicates that the exact lower boundary point of a feature is unknown.
    The location begins at some base previous to the first base specified
    (which need not be contained in the presented sequence) and continues to
    and includes the ending base

<1..888
    The feature starts before the first sequenced base and continues to and
    includes base 888

(102.110)
    Indicates that the exact location is unknown but that it is one of the
    bases between bases 102 and 110, inclusive

(23.45)..600
    Specifies that the starting point is one of the bases between bases 23
    and 45, inclusive, and the end point is base 600

(122.133)..(204.221)
    The feature starts at a base between 122 and 133, inclusive, and ends at
    a base between 204 and 221, inclusive

123^124
    Points to a site between bases 123 and 124

145^177
    Points to a site between two adjacent bases anywhere between bases 145
    and 177

join(12..78,134..202)
    Regions 12 to 78 and 134 to 202 should be joined to form one contiguous
    sequence

complement(join(2691..4571,4918..5163))
    Joins regions 2691 to 4571 and 4918 to 5163, then complements the joined
    segments (the feature is on the strand complementary to the presented
    strand)

join(complement(4918..5163),complement(2691..4571))
    Complements regions 4918 to 5163 and 2691 to 4571, then joins the
    complemented segments (the feature is on the strand complementary to the
    presented strand)

complement(34..(122.126))
    Start at one of the bases complementary to those between 122 and 126 on
    the presented strand and finish at the base complementary to base 34
    (the feature is on the strand complementary to the presented strand)

J00194:100..202
    Points to bases 100 to 202, inclusive, in the entry (in this database)
    with primary accession number 'J00194'


4 Feature table Format

The examples below show the preferred sequence annotations for a number of
commonly occurring sequence types. These examples may not be appropriate in
all cases but should be used as a guide whenever possible. This section
describes the columnar format used to write this feature table in "flat-file"
form for distributions of the database.

4.1 Format examples

Feature table format example (EMBL):

     source          1..1859
                     /db_xref="taxon:3899"
                     /organism="Trifolium repens"
                     /tissue_type="leaves"
                     /clone_lib="lambda gt10"
                     /clone="TRE361"
                     /mol_type="mRNA"
     CDS             14..1495
                     /db_xref="MENDEL:11000"
                     /db_xref="SWISS-PROT:P26204"
                     /note="non-cyanogenic"
                     /EC_number="3.2.1.21"
                     /product="beta-glucosidase"
                     /protein_id="CAA40058.1"
                     /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSR.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Feature table format example (GenBank):

     source          1..8959
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
     gene            212..8668
                     /gene="NF1"
     CDS             212..8668
                     /gene="NF1"
                     /note="putative"
                     /codon_start=1
                     /product="GAP-related protein"
                     /protein_id="AAA59924.1"
                     /translation="MAAHRPVEWVQAVVSRFDEQLPIKTGQQNTHTKVSTE.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Feature table format example (DDBJ):

     source          1..2136
                     /clone="pK28"
                     /organism="Rattus norvegicus"
                     /strain="Sprague-Dawley"
                     /tissue_type="kidney"
                     /mol_type="genomic DNA"
     mRNA            19..2128
     CDS             31..1212
                     /codon_start=1
                     /evidence=not_experimental
                     /function="Dual specificity protein tyrosine/threonine
                     kinase"
                     /product="MAP kinase kinase"
                     /protein_id="BAA02603.1"
                     /translation="MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKL.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

4.2 Definition of line types

The feature table consists of a header line, which contains the column titles
for the table, and the individual feature entries. Each feature entry is
composed of a feature descriptor line and qualifier and continuation lines,
if needed. The feature descriptor line contains the feature's name, key, and
location. If the location cannot be contained on the first line of the
feature descriptor, it is continued on a continuation line immediately
following the descriptor line. If the feature requires further attributes,
feature qualifier lines immediately follow the corresponding feature
descriptor line (or its continuation). Qualifier information that cannot be
contained on one line continues on the following continuation lines as
necessary.

Thus, there are 4 types of feature table lines:

Line type            Content                  #/entry      #/feature
---------            -------                  -------      ---------
Header               Column titles            1*           N/A
Feature descriptor   Key and location         1 to many*   1
Feature qualifiers   Qualifiers and values    N/A          0 to many
Continuation lines   Feature descriptor or    0 to many    0 to many
                     qualifier continuation

4.3 Data item positions

The position of the data items within the feature descriptor line is as
follows:

column position   data item
---------------   ---------
1-5               blank
6-20              feature key
21                blank
22-80             location

Data on the qualifier and continuation lines begins in column position 22
(the first 21 columns contain blanks). The EMBL format for all lines differs
from the GenBank / DDBJ formats in that it includes a line type abbreviation
in columns 1 and 2.

4.4 Use of blanks

Blanks (spaces) may, in general, be used within the feature location and
qualifier values to make the construction more readable. The following rules
should be observed:

 * Names of feature table components may not contain blanks (see Section 3.1)
 * Operator names may not be separated from the following open parenthesis
   (the beginning of the operand list) by blanks.
 * Qualifiers may not be separated from the preceding slash or the following
   equals sign (if one) by blanks


5 Examples of sequence annotation

The examples below show the preferred sequence annotations for a number of
commonly occurring sequence types. These examples may not be appropriate in
all cases but should be used as a guide whenever possible.
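
Before turning to the examples, the column positions given in Section 4.3
can be sketched as a fixed-width split. The Python below is an informal
illustration only: the function name is hypothetical, and it assumes that a
line with blanks in columns 6-20 is a qualifier or continuation line. Because
it ignores columns 1-5, the same slicing also accepts an EMBL line carrying
its line type abbreviation in columns 1 and 2.

```python
def split_feature_line(line):
    """Split one feature table line into (key, data) by column position.

    Columns 6-20 hold the feature key and columns 22-80 hold the location
    or qualifier text. The key is '' for qualifier and continuation lines.
    """
    padded = line.rstrip("\n").ljust(21)
    key = padded[5:20].strip()      # columns 6-20
    data = padded[21:80].rstrip()   # columns 22-80
    return key, data


# Lines are built explicitly so that the column arithmetic is visible.
descriptor = " " * 5 + "CDS".ljust(15) + " " + "23..400"
qualifier = " " * 21 + '/gene="adhI"'
embl_descriptor = "FT" + " " * 3 + "CDS".ljust(15) + " " + "23..400"

for example in (descriptor, qualifier, embl_descriptor):
    print(split_feature_line(example))
```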

5.1 Eukaryotic gene

     source          1..1509
                     /organism="Mus musculus"
                     /strain="CD1"
                     /mol_type="genomic DNA"
     promoter        <1..9
                     /gene="ubc42"
     mRNA            join(10..567,789..1320)
                     /gene="ubc42"
     CDS             join(54..567,789..1254)
                     /gene="ubc42"
                     /product="ubiquitin conjugating enzyme"
                     /function="cell division control"
                     /translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY
                     QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS
                     AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"
     exon            10..567
                     /gene="ubc42"
                     /number=1
     intron          568..788
                     /gene="ubc42"
                     /number=1
     exon            789..1320
                     /gene="ubc42"
                     /number=2
     polyA_signal    1310..1317
                     /gene="ubc42"

5.2 Bacterial operon

     source          1..9430
                     /organism="Lactococcus sp."
                     /strain="MG1234"
                     /mol_type="genomic DNA"
     operon          160..6865
                     /operon="gal"
     -35_signal      160..165
                     /operon="gal"
                     /evidence=EXPERIMENTAL
     -10_signal      179..184
                     /operon="gal"
                     /evidence=EXPERIMENTAL
     CDS             405..1934
                     /operon="gal"
                     /gene="galA"
                     /product="galactose permease"
                     /function="galactose transporter"
                     /evidence=EXPERIMENTAL
     CDS             2003..3001
                     /operon="gal"
                     /gene="galM"
                     /product="aldose 1-epimerase"
                     /EC_number="5.1.3.3"
                     /function="mutarotase"
     CDS             3235..4537
                     /operon="gal"
                     /gene="galK"
                     /product="galactokinase"
                     /EC_number="2.7.1.6"
                     /evidence=EXPERIMENTAL
     mRNA            189..6865
                     /operon="gal"
                     /evidence=EXPERIMENTAL

5.3 Artificial cloning vector (circular)

     source          1..5300
                     /organism="Cloning vector pABC"
                     /lab_host="Escherichia coli"
                     /mol_type="other DNA"
                     /focus
     source          1..5138
                     /organism="Escherichia coli"
                     /mol_type="other DNA"
                     /strain="K12"
     source          5139..5247
                     /organism="Aequorea victoria"
                     /mol_type="other DNA"
                     /dev_stage="adult"
     source          5248..5300
                     /organism="Escherichia coli"
                     /mol_type="other DNA"
                     /strain="K12"
     CDS             join(complement(<1..799),complement(5080..5120))
                     /gene="mob1"
                     /product="mobilization protein 1"
     CDS             complement(1697..2512)
                     /gene="Km"
                     /product="kanamycin resistance protein"
     CDS             3037..3711
                     /gene="rep1"
                     /product="replication protein 1"
     CDS             complement(4170..4829)
                     /gene="Cm"
                     /product="chloramphenicol resistance protein"
     CDS             5139..5247
                     /gene="GFP"
                     /product="green fluorescent protein"

5.4 Plasmid

     source          1..2245
                     /organism="Escherichia coli"
                     /plasmid="Plasmid XYZ"
                     /strain="K12"
                     /mol_type="genomic DNA"
     rep_origin      6
                     /direction=LEFT
                     /note="ori"
     CDS             join(complement(567..795),complement(21..349))
                     /gene="trbC"
                     /product="transfer protein C"
     CDS             803..1344
                     /gene="traN"
                     /product="transfer protein N"
     CDS             1559..1985
                     /gene="incA"
                     /product="incompatibility protein A"
     CDS             join(2004..2195,3..20)
                     /gene="finP"
                     /product="fertility inhibition protein P"

5.5 Repeat element

     source          1..1011
                     /organism="Homo sapiens"
                     /clone="pha281u/1DO"
                     /mol_type="genomic DNA"
     repeat_region   80..401
                     /rpt_type=DISPERSED
                     /rpt_family="Alu-J"
                     /rpt_unit=80..401

5.6 Immunoglobulin heavy chain

     source          1..321
                     /organism="Mus musculus"
                     /strain="BALB/c"
                     /cell_line="hybridoma 1A4"
                     /rearranged
                     /mol_type="mRNA"
     CDS             <1..>321
                     /codon_start=1
                     /gene="VFM1-DFL16.1-JH4"
                     /product="immunoglobulin heavy chain"
     V_region        1..277
                     /gene="VFM1"
                     /product="immunoglobulin heavy chain variable region"

5.7 T-cell receptor

     source          1..402
                     /organism="Homo sapiens"
                     /sex="male"
                     /cell_type="CD4+ T-lymphocyte"
                     /rearranged
                     /clone="TCR1A.12"
                     /mol_type="mRNA"
     sig_peptide     1..54
                     /gene="TCR1A"
     CDS             1..402
                     /gene="TCR1A"
                     /product="T-cell receptor alpha chain"
     mat_peptide     55..399
                     /gene="TCR1A"
                     /product="T-cell receptor alpha chain"
     V_region        55..327
                     /gene="TCR1A"
     J_segment       328..393
                     /gene="TCR1A"
     C_region        394..399
                     /gene="TCR1A"

5.8 transfer RNA

     source          1..2345
                     /organism="Yersinia sp."
                     /strain="IP134"
                     /mol_type="genomic DNA"
     -35_signal      644..650
                     /gene="tRNA-Leu(UUR)"
     tRNA            655..730
                     /gene="tRNA-Leu(UUR)"
                     /anticodon=(pos:678..680,aa:Leu)
                     /product="transfer RNA-Leu(UUR)"


6 Limitations of this feature table design

During the development of the feature table design numerous choices between
simplicity and representational power had to be made.
In order to create a design capable of representing the most common features
of biological significance, a certain degree of complexity in the syntax was
unavoidable. However, to limit that level of complexity, certain limitations
of the design syntax have been accepted.


7 Appendices

7.1 Appendix I EMBL, GenBank and DDBJ entries

7.1.1 EMBL Format

ID   LISOD      standard; genomic DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
SV   X64011.1
XX
DT   28-APR-1992 (Rel. 31, Created)
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE   Listeria ivanovii sod gene for superoxide dismutase
XX
KW   sod gene; superoxide dismutase.
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
RN   [1]
RX   MEDLINE; 92140371.
RA   Haas A., Goebel W.;
RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT   functional complementation in Escherichia coli and characterization of the
RT   gene product.";
RL   Mol. Gen. Genet. 231:313-322(1992).
XX
RN   [2]
RP   1-756
RA   Kreft J.;
RT   ;
RL   Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL   Hubland, 8700 Wuerzburg, FRG
XX
DR   SWISS-PROT; P28763; SODM_LISIV.
XX FH Key Location/Qualifiers FH FT source 1..756 FT /db_xref="taxon:1638" FT /organism="Listeria ivanovii" FT /strain="ATCC 19119" FT /mol_type="genomic DNA" FT RBS 95..100 FT /gene="sod" FT terminator 723..746 FT /gene="sod" FT CDS 109..717 FT /db_xref="SWISS-PROT:P28763" FT /transl_table=11 FT /gene="sod" FT /EC_number="1.15.1.1" FT /product="superoxide dismutase" FT /protein_id="CAA45406.1" FT /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG FT HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA FT IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL FT DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK" XX SQ Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other; cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat 60 gtaatttctt .......... 120 // 7.1.2 GenBank Format LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993 DEFINITION Listeria ivanovii sod gene for superoxide dismutase. ACCESSION X64011 S78972 VERSION X64011.1 GI:44010 KEYWORDS sod gene; superoxide dismutase. SOURCE Listeria ivanovii ORGANISM Listeria ivanovii Bacteria; Firmicutes; Bacillales; Listeriaceae; Listeria. REFERENCE 1 (bases 1 to 756) AUTHORS Haas,A. and Goebel,W. TITLE Cloning of a superoxide dismutase gene from Listeria ivanovii by functional complementation in Escherichia coli and characterization of the gene product JOURNAL Mol. Gen. Genet. 231 (2), 313-322 (1992) MEDLINE 92140371 REFERENCE 2 (bases 1 to 756) AUTHORS Kreft,J. TITLE Direct Submission JOURNAL Submitted (21-APR-1992) J. Kreft, Institut f. 
            Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am Hubland, 8700
            Wuerzburg, FRG
FEATURES             Location/Qualifiers
     source          1..756
                     /organism="Listeria ivanovii"
                     /strain="ATCC 19119"
                     /db_xref="taxon:1638"
                     /mol_type="genomic DNA"
     RBS             95..100
                     /gene="sod"
     gene            95..746
                     /gene="sod"
     CDS             109..717
                     /gene="sod"
                     /EC_number="1.15.1.1"
                     /codon_start=1
                     /transl_table=11
                     /product="superoxide dismutase"
                     /db_xref="GI:44011"
                     /protein_id="CAA45406.1"
                     /db_xref="SWISS-PROT:P28763"
                     /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVS
                     GHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLK
                     AAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPV
                     LGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
     terminator      723..746
                     /gene="sod"
ORIGIN
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaatttctt ..........
//

7.1.3 DDBJ Format

LOCUS       LISOD                    756 bp    DNA     linear   BCT 30-JUN-1993
DEFINITION  Listeria ivanovii sod gene for superoxide dismutase.
ACCESSION   X64011 S78972
VERSION     X64011.1  GI:44010
KEYWORDS    sod gene; superoxide dismutase.
SOURCE      Listeria ivanovii
  ORGANISM  Listeria ivanovii
            Bacteria; Firmicutes; Bacillales; Listeriaceae; Listeria.
REFERENCE   1  (bases 1 to 756)
  AUTHORS   Haas,A. and Goebel,W.
  TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii by
            functional complementation in Escherichia coli and
            characterization of the gene product
  JOURNAL   Mol. Gen. Genet. 231 (2), 313-322 (1992)
  MEDLINE   92140371
REFERENCE   2  (bases 1 to 756)
  AUTHORS   Kreft,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f.
            Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am Hubland, 8700
            Wuerzburg, FRG
FEATURES             Location/Qualifiers
     source          1..756
                     /organism="Listeria ivanovii"
                     /strain="ATCC 19119"
                     /db_xref="taxon:1638"
                     /mol_type="genomic DNA"
     RBS             95..100
                     /gene="sod"
     gene            95..746
                     /gene="sod"
     CDS             109..717
                     /gene="sod"
                     /EC_number="1.15.1.1"
                     /codon_start=1
                     /transl_table=11
                     /product="superoxide dismutase"
                     /db_xref="GI:44011"
                     /protein_id="CAA45406.1"
                     /db_xref="SWISS-PROT:P28763"
                     /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVS
                     GHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLK
                     AAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPV
                     LGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
     terminator      723..746
                     /gene="sod"
BASE COUNT      247 a    136 c    151 g    222 t
ORIGIN
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaatttctt ..........
//

7.2 Appendix II Feature table: Backus-Naur form

Feature table is a mandatory part of an entry. Full entry syntax is specified elsewhere.

feature_table ::= <feature_table_header><feature_table_body>
feature_table_header ::= FH Key Location/Qualifiers | FEATURES Location/Qualifiers
feature_table_body ::= <feature> | <feature_table_body><feature>

At least one feature is required.

feature ::= <feature_key><feature_details>

Key is required, location required, qualifier list optional

feature_key ::= <symbol> | -
feature_details ::= <location><qualifier_list> | <location>

There exists a table of legal keys. "-" is a placeholder for no key.
location ::= <absolute_location> | <feature_name> | <functional_operator>(<location_list>)
absolute_location ::= <local_location> | <path> : <local_location>
path ::= <database> :: <primary_accession> | <primary_accession>
feature_name ::= <path>:<feature_label> | <feature_label>
feature_label ::= <symbol>
local_location ::= <base_position> | <between_position> | <base_range>
location_list ::= <location> | <location_list>,<location>
functional_operator ::= <symbol>
base_position ::= <integer> | <low_base_bound> | <high_base_bound> | <two_base_bound>
low_base_bound ::= > <integer>
high_base_bound ::= < <integer>
two_base_bound ::= <base_position>.<base_position>
between_position ::= <base_position>^<base_position>
base_range ::= <base_position>..<base_position>
database ::= <symbol>
primary_accession ::= <symbol>
sequence_character ::= a | b | c | d | g | h | k | m | n | r | s | t | u | v | w | y
qualifier_list ::= <qualifier> | <qualifier_list><qualifier>
qualifier ::= /<qualifier_name> | /<qualifier_name>=<value>
qualifier_name ::= <symbol>
value ::= <simple_value> | (<value_list>) | (<tagged_value_list>)
simple_value ::= <integer> | <location> | <reference_number> | "<text_string>" | <symbol>
value_list ::= <value> | <value_list>,<value>
tagged_value_list ::= <tagged_value> | <tagged_value_list>,<tagged_value>
tagged_value ::= <tag>:<value>
tag ::= <symbol>
reference_number ::= [ <unsigned_integer> ]
symbol ::= <letter> | <symbol><symbol_character> | <symbol_character><symbol>
text_string ::= <string_character> | <text_string><string_character>
unsigned_integer ::= <digit> | <unsigned_integer><digit>
integer ::= <unsigned_integer> | - <unsigned_integer>
string_character ::= <letter> | <digit> | <punctuation> | ""
symbol_character ::= <up_case_letter> | <low_case_letter> | <digit> | _ | - | ' | *
letter ::= <up_case_letter> | <low_case_letter>
up_case_letter ::= A | B | ... | Z
low_case_letter ::= a | b | ...
                    | z
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
punctuation ::= <space> | ! | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / | : | ; | < | = | > | ? | @ | [ | \ | ] | ^ | _ | ` | { | <bar> | } | ~
bar ::= |
space ::= ascii 32

7.3 Appendix III: Feature keys reference

7.3.1 Feature key relationship tree

A. misc_feature
   1. misc_difference
      a) conflict
      b) unsure
      c) old_sequence
      d) variation
      e) modified_base
   2. gene
   3. misc_signal
      a) promoter
         1) CAAT_signal
         2) TATA_signal
         3) -35_signal
         4) -10_signal
         5) GC_signal
      b) RBS
      c) polyA_signal
      d) enhancer
      e) attenuator
      f) terminator
      g) rep_origin
      h) oriT
   4. misc_RNA
      a) prim_transcript
         1) precursor_RNA
            a) mRNA
            b) 5'clip
            c) 3'clip
            d) 5'UTR
            e) 3'UTR
            f) exon
            g) CDS
               1) sig_peptide
               2) transit_peptide
               3) mat_peptide
            h) intron
            i) polyA_site
            j) rRNA
            k) tRNA
            l) scRNA
            m) snRNA
            n) snoRNA
   5. Immunoglobulin related
      a) C_region
      b) D_segment
      c) J_segment
      d) N_region
      e) S_region
      f) V_region
      g) V_segment
   6. repeat_region
      a) repeat_unit
      b) LTR
      c) satellite
   7. misc_binding
      a) primer_bind
      b) protein_bind
   8. misc_recomb
      a) iDNA
   9. misc_structure
      a) stem_loop
      b) D-loop
   10. gap
   11. operon

7.3.2 Feature key reference manual

The following manual has been organized according to the following format:

Feature Key           the feature key name
Definition            the definition of the key
Mandatory qualifiers  qualifiers required with the key; if there are no mandatory qualifiers, this field is omitted.
Optional qualifiers   optional qualifiers associated with the key
Organism scope        valid organisms for the key; if the scope is any organism, this field is omitted.
Molecule scope        valid molecule types; if the scope is any molecule type, this field is omitted.
References            citations of published reports, usually supporting the feature consensus sequence
Comment               comments and clarifications

Abbreviations:

accnum                an entry primary accession number
<amino_acid>          abbreviation for amino acid
<base_range>          location descriptor for a simple range of bases
<bool>                Boolean truth value.
                      Valid values are yes and no
<evidence_value>      value indicating the nature of supporting evidence.
feature_label         the feature label (follows naming conventions for all feature table components)
<integer>             unsigned integer value
<location>            general feature location descriptor
<modified_base>       abbreviation for modified nucleoside base
[number]              integer representing number of citation in entry's reference list
<repeat_type>         value indicating the organization of a repeated sequence. Currently valid values are tandem, inverted, flanking, terminal, direct, dispersed, and other
"text"                any text or character string. Since the string is delimited by double quotes, double quotes may only appear as part of the string if they appear in pairs. For example, the sentence:

                         The feature label "ops-tata" is used with the "promoter" feature key

                      would be formatted thus:

                         "The feature label ""ops-tata"" is used with the ""promoter"" feature key"

Feature Key           attenuator
Definition            1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; 2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /operon="text"
                      /phenotype="text"
                      /usedin=accnum:feature_label
Organism scope        prokaryotes
Molecule scope        DNA

Feature Key           C_region
Definition            constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
/old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Parent Key CDS Organism scope eukaryotes Feature Key CAAT_signal Definition CAAT box; part of a conserved sequence located about 75 bp up-stream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT [1,2]. Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Organism scope eukaryotes and eukaryotic viruses Molecule scope DNA References [1] Efstratiadis, A. et al. Cell 21, 653-668 (1980) [2] Nevins, J.R. "The pathway of eukaryotic mRNA formation" Ann Rev Biochem 52, 441-466 (1983) Feature Key CDS Definition coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation. 
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /codon=(seq:"codon-sequence",aa:<amino_acid>)
                      /codon_start=<1 or 2 or 3>
                      /db_xref="<database>:<identifier>"
                      /EC_number="text"
                      /evidence=<evidence_value>
                      /exception="text"
                      /function="text"
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /number=unquoted text (single token)
                      /old_locus_tag="text" (single token)
                      /operon="text"
                      /product="text"
                      /protein_id="<identifier>"
                      /pseudo
                      /standard_name="text"
                      /translation="text"
                      /transl_except=(pos:<base_range>,aa:<amino_acid>)
                      /transl_table=<integer>
                      /usedin=accnum:feature_label
Comment               /codon_start has valid value of 1 or 2 or 3, indicating the offset at which the first complete codon of a coding feature can be found, relative to the first base of that feature;
                      /transl_table defines the genetic code table used if other than the universal genetic code table;
                      genetic code exceptions outside the range of the specified tables are reported in /codon or /transl_except qualifiers;
                      /protein_id consists of a stable ID portion (3+5 format with 3 position letters and 5 numbers) plus a version number after the decimal point; when the protein sequence encoded by the CDS changes, only the version number of the /protein_id value is incremented; the stable part of the /protein_id remains unchanged and as a result will permanently be associated with a given protein;

Feature Key           conflict
Definition            independent determinations of the "same" sequence differ at this site or region;
Mandatory qualifiers  /citation=[number]
                      Or
                      /compare=[accession-number.sequence-version]
Optional qualifiers   /allele="text"
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /replace="text"
                      /usedin=accnum:feature_label
Comment               use /replace="" to annotate deletion, e.g.
                      conflict 4..5
                      /replace=""

Feature Key           D-loop
Definition            displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /usedin=accnum:feature_label
Molecule scope        DNA

Feature Key           D_segment
Definition            Diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain;
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /product="text"
                      /pseudo
                      /standard_name="text"
                      /usedin=accnum:feature_label
Parent Key            CDS
Organism scope        eukaryotes

Feature Key           enhancer
Definition            a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter;
Optional qualifiers   /allele="text"
                      /bound_moiety="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /label=feature_label
                      /gene="text"
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /standard_name="text"
                      /usedin=accnum:feature_label
Organism scope        eukaryotes and eukaryotic viruses

Feature Key           exon
Definition            region of genome that codes for portion of spliced mRNA, rRNA and tRNA; may contain 5'UTR, all CDSs and 3'UTR;
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /EC_number="text"
                      /evidence=<evidence_value>
                      /function="text"
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /number=unquoted text (single token)
                      /old_locus_tag="text" (single token)
                      /product="text"
                      /pseudo
                      /standard_name="text"
                      /usedin=accnum:feature_label

Feature Key           gap
Definition            gap in the sequence
Mandatory qualifiers  /estimated_length=unknown or <integer>
Optional qualifiers   /map="text"
                      /note="text"
Comment               the location span of the gap feature for an unknown gap is 100 bp, with the 100 bp indicated as 100 "n"'s in the sequence. Where estimated length is indicated by an integer, this is indicated by the same number of "n"'s in the sequence. No upper or lower limit is set on the size of the gap.

Feature Key           GC_signal
Definition            GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG;
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /usedin=accnum:feature_label
Organism scope        eukaryotes and eukaryotic viruses

Feature Key           gene
Definition            region of biological interest identified as a gene and for which a name has been assigned;
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /function="text"
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /operon="text"
                      /product="text"
                      /pseudo
                      /phenotype="text"
                      /standard_name="text"
                      /usedin=accnum:feature_label
Comment               the gene feature describes the interval of DNA that corresponds to a genetic trait or phenotype; the feature is, by definition, not strictly bound to its positions at the ends; it is meant to represent a region where the
gene is located. Feature Key iDNA Definition intervening DNA; DNA which is eliminated through any of several kinds of recombination; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /number=unquoted text (single token) /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Molecule scope DNA Comment e.g., in the somatic processing of immunoglobulin genes. Feature Key intron Definition a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it; Optional qualifiers /allele="text" /citation=[number] /cons_splice=(5'site:<bool>,3'site:<bool>) /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /number=unquoted text (single token) /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Comment cons_splice is used only when one of the intron's splice sites does not match the GT...AG consensus. 
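The /cons_splice value described above is a tagged value list in the sense of the Backus-Naur form of Appendix II, with each tag carrying a <bool> value (yes or no). As a minimal sketch, assuming the value is passed in as a bare string and that letter case in the truth value is not significant, such a value could be read like this. The function name parse_cons_splice is hypothetical and is not defined by the feature table.

```python
def parse_cons_splice(value: str) -> dict[str, bool]:
    """Parse a /cons_splice value such as "(5'site:YES,3'site:NO)".

    Returns a mapping from each tag to True (yes) or False (no).
    Case-insensitive handling of yes/no is an assumption of this sketch.
    """
    value = value.strip()
    if not (value.startswith("(") and value.endswith(")")):
        raise ValueError("tagged value list must be enclosed in parentheses")
    result: dict[str, bool] = {}
    for item in value[1:-1].split(","):
        tag, sep, flag = item.partition(":")
        flag = flag.strip().lower()
        if not sep or flag not in ("yes", "no"):
            raise ValueError(f"malformed tagged value: {item!r}")
        result[tag.strip()] = flag == "yes"
    return result


sites = parse_cons_splice("(5'site:YES,3'site:NO)")
print(sites)
```

A False entry marks the splice site that does not match the GT...AG consensus.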
Feature Key J_segment Definition joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Parent Key CDS Organism scope eukaryotes Feature Key LTR Definition long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Feature Key mat_peptide Definition mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS); Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /EC_number="text" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key misc_binding Definition site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other binding key (primer_bind or protein_bind); Mandatory qualifiers /bound_moiety="text" Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" 
(single token) /map="text" /note="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Comment note that the key RBS is used for ribosome binding sites Feature Key misc_difference Definition feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, variation, or modified_base); Optional qualifiers /allele="text" /citation=[number] /clone="text" /compare=[accession-number.sequence-version] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /phenotype="text" /replace="text" /standard_name="text" /usedin=accnum:feature_label Comment the misc_difference feature key should be used to describe variability that arises as a result of genetic manipulation (e.g. site directed mutagenesis); use /replace="" to annotate deletion, e.g. misc_difference 412..433 /replace="" Feature Key misc_feature Definition region of biological interest which cannot be described by any other feature key; a new or rare feature; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /number=unquoted text (single token) /old_locus_tag="text" (single token) /phenotype="text" /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Comment this key should not be used when the need is merely to mark a region in order to comment on it or to use it in another feature's location; use the '-' pseudo-key instead. 
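Several of the difference keys above (conflict, misc_difference, old_sequence) use /replace="text" to give the alternative sequence for a location, with /replace="" denoting a deletion. The following is a minimal sketch of that substitution, assuming a simple base range on the presented strand with 1-based, inclusive positions. The function name apply_replace and the short test sequence are illustrative only.

```python
def apply_replace(sequence: str, start: int, end: int, replacement: str) -> str:
    """Return the sequence with bases start..end (1-based, inclusive)
    replaced by `replacement`; an empty replacement deletes the span."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("location lies outside the sequence")
    # Convert the inclusive 1-based range to Python's 0-based slicing.
    return sequence[:start - 1] + replacement + sequence[end:]


# Hypothetical 8-base sequence; "4..5" with /replace="" removes two bases.
print(apply_replace("acgtacgt", 4, 5, ""))
```

The same call with a non-empty replacement models a substitution rather than a deletion.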
Feature Key misc_recomb Definition site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys or qualifiers of source key (/insertion_seq, /transposon, /proviral); Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /organism="text" /standard_name="text" /usedin=accnum:feature_label Molecule scope DNA Comment if no /organism is provided with misc_recomb, this suggests that only one organism (same as in SOURCE) is involved in the recombination event Feature Key misc_RNA Definition any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA); Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /product="text" /standard_name="text" /usedin=accnum:feature_label Feature Key misc_signal Definition any region containing a signal controlling or altering gene function or expression that cannot be described by other signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin). 
Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /phenotype="text" /standard_name="text" /usedin=accnum:feature_label Feature Key misc_structure Definition any secondary or tertiary nucleotide structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop); Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Feature Key modified_base Definition the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value) Mandatory qualifiers /mod_base=<modified_base> Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /frequency="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Comment value is limited to the restricted vocabulary for modified base abbreviations; Feature Key mRNA Definition messenger RNA; includes 5'untranslated region (5'UTR), coding sequences (CDS, exon) and 3'untranslated region (3'UTR); Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key N_region Definition extra nucleotides inserted 
                      between rearranged immunoglobulin segments.
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /product="text"
                      /pseudo
                      /standard_name="text"
                      /usedin=accnum:feature_label
Parent Key            CDS
Organism scope        eukaryotes

Feature Key           old_sequence
Definition            the presented sequence revises a previous version of the sequence at this location;
Mandatory qualifiers  /citation=[number]
                      Or
                      /compare=[accession-number.sequence-version]
Optional qualifiers   /allele="text"
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /replace="text"
                      /usedin=accnum:feature_label
Comment               use /replace="" to annotate deletion, e.g.
                      old_sequence 12..15
                      /replace=""

Feature Key           operon
Definition            region containing polycistronic transcript containing genes that encode enzymes that are in the same metabolic pathway and regulatory sequences
Mandatory qualifiers  /operon="text"
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /function="text"
                      /label=feature_label
                      /map="text"
                      /note="text"
                      /phenotype="text"
                      /pseudo
                      /standard_name="text"
                      /usedin=accnum:feature_label

Feature Key           oriT
Definition            origin of transfer; region of a DNA molecule where transfer is initiated during the process of conjugation or mobilization
Optional qualifiers   /allele="text"
                      /bound_moiety="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /direction=value
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /rpt_family="text"
                      /rpt_type=<repeat_type>
                      /rpt_unit="text" or <base_range>
                      /standard_name="text"
                      /usedin=accnum:feature_label
Molecule scope        DNA
Comment
rep_origin should be used for origins of replication; /direction has legal values RIGHT, LEFT and BOTH, however only RIGHT and LEFT are valid when used in conjunction with the oriT feature; origins of transfer can be present in the chromosome; plasmids can contain multiple origins of transfer Feature Key polyA_signal Definition recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA [1]; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Organism scope eukaryotes and eukaryotic viruses References [1] Proudfoot, N. and Brownlee, G.G. Nature 263, 211-214 (1976) Feature Key polyA_site Definition site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Organism scope eukaryotes and eukaryotic viruses Feature Key precursor_RNA Definition any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip); Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /operon="text" /product="text" /standard_name="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Comment used for RNA which may be the 
                      result of post-transcriptional processing; if the RNA in question is known not to have been processed, use the prim_transcript key.

Feature Key           prim_transcript
Definition            primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip);
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /function="text"
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /old_locus_tag="text" (single token)
                      /operon="text"
                      /standard_name="text"
                      /usedin=accnum:feature_label

Feature Key           primer_bind
Definition            non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, e.g., PCR primer elements;
Optional qualifiers   /allele="text"
                      /citation=[number]
                      /db_xref="<database>:<identifier>"
                      /evidence=<evidence_value>
                      /gene="text"
                      /label=feature_label
                      /locus_tag="text" (single token)
                      /map="text"
                      /note="text"
                      /standard_name="text"
                      /PCR_conditions="text"
                      /old_locus_tag="text" (single token)
                      /usedin=accnum:feature_label
Comment               used to annotate the site on a given sequence to which a primer molecule binds - not intended to represent the sequence of the primer molecule itself; PCR components and reaction times may be stored under the "/PCR_conditions" qualifier; since PCR reactions most often involve pairs of primers, a single primer_bind key may use the order() operator with two locations, or a pair of primer_bind keys may be used.
Feature Key promoter Definition region on a DNA molecule involved in RNA polymerase binding to initiate transcription; Optional qualifiers /allele="text" /bound_moiety="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /phenotype="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Molecule scope DNA Feature Key protein_bind Definition non-covalent protein binding site on nucleic acid; Mandatory qualifiers /bound_moiety="text" Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Comment note that RBS is used for ribosome binding sites. Feature Key RBS Definition ribosome binding site; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label References [1] Shine, J. and Dalgarno, L. Proc Natl Acad Sci USA 71, 1342-1346 (1974) [2] Gold, L. et al. Ann Rev Microb 35, 365-403 (1981) Comment in prokaryotes, known as the Shine-Dalgarno sequence: is located 5 to 9 bases upstream of the initiation codon; consensus GGAGGT [1,2]. 
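The RBS comment above places the Shine-Dalgarno sequence 5 to 9 bases upstream of the initiation codon. As a minimal sketch, the spacing can be checked for the sample entry in Appendix I, which annotates RBS 95..100 and CDS 109..717. Two assumptions are made here: both features lie on the presented strand, and the distance is counted as the number of bases strictly between the end of the RBS and the first base of the CDS (the text does not say exactly how the distance is measured). The function name spacer_length is hypothetical.

```python
def spacer_length(rbs: tuple[int, int], cds: tuple[int, int]) -> int:
    """Count the bases strictly between the RBS end and the CDS start.

    Both arguments are (start, end) pairs, 1-based and inclusive,
    on the same (presented) strand.
    """
    _rbs_start, rbs_end = rbs
    cds_start, _cds_end = cds
    if rbs_end >= cds_start:
        raise ValueError("RBS must end upstream of the CDS start")
    return cds_start - rbs_end - 1


# Coordinates taken from the LISOD sample entry in Appendix I.
spacer = spacer_length((95, 100), (109, 717))
print(spacer, 5 <= spacer <= 9)
```

Under the other natural reading, where the distance is measured to the initiation codon inclusive of its first base, the figure differs by one, so the check should be treated as approximate.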
Feature Key repeat_region Definition region of genome containing repeating units; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /insertion_seq="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /rpt_family="text" /rpt_type=<repeat_type> /rpt_unit="text" or <base_range> /standard_name="text" /transposon="text" /usedin=accnum:feature_label Feature Key repeat_unit Definition single repeat element; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /rpt_family="text" /rpt_type=<repeat_type> /rpt_unit="text" or <base_range> /usedin=accnum:feature_label Comment preferred usage is to annotate the /rpt_family and rpt_type qualifiers on the repeat_region, not on the repeat_unit(s). Feature Key rep_origin Definition origin of replication; starting site for duplication of nucleic acid to give two identical copies; Optional Qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /direction=value /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Comment /direction has valid values: RIGHT, LEFT, or BOTH. Feature Key rRNA Definition mature ribosomal RNA ; RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins. 
Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Comment rRNA sizes should be annotated with the /product Qualifier. Feature Key S_region Definition switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Parent Key misc_signal Organism scope eukaryotes Feature Key satellite Definition many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /rpt_type=<repeat_type> /rpt_family="text" /rpt_unit="text" or <base_range> /standard_name="text" /usedin=accnum:feature_label Molecule scope DNA Comment use the satellite key to identify the entire region of satellite sequence within an entry; use repeat_unit to identify individual repeated units (one is generally sufficient) of the satellite. 
Feature Key scRNA Definition small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key sig_peptide Definition signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane leader sequence; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key snRNA Definition small nuclear RNA molecules involved in pre-mRNA splicing and processing Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key snoRNA Definition small nucleolar RNA molecules mostly involved in rRNA modification and processing; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key source Definition identifies the 
biological source of the specified span of the sequence; this key is mandatory; more than one source key per sequence is allowed; every entry/record will have, as a minimum, either a single source key spanning the entire sequence or multiple source keys which together span the entire sequence. Mandatory qualifiers /organism="text" /mol_type="genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "snoRNA", "snRNA", "scRNA", "pre-RNA", "other RNA", "other DNA", "unassigned DNA", "unassigned RNA" Optional qualifiers /cell_line="text" /cell_type="text" /chromosome="text" /citation=[number] /clone="text" /clone_lib="text" /country="<country_value>[:<region>][, <locality>]" /cultivar="text" /db_xref="<database>:<identifier>" /dev_stage="text" /ecotype="text" /environmental_sample /focus /frequency="text" /germline /haplotype="text" /lab_host="text" /isolate="text" /isolation_source="text" /label=feature_label /macronuclear /map="text" /note="text" /organelle=<organelle_value> /plasmid="text" /pop_variant="text" /proviral /rearranged /segment="text" /serotype="text" /serovar="text" /sex="text" /specimen_voucher="text" /specific_host="text" /strain="text" /sub_clone="text" /sub_species="text" /sub_strain="text" /tissue_lib="text" /tissue_type="text" /transgenic /usedin=accnum:feature_label /variety="text" /virion Molecule scope any Comment transgenic sequences must have at least two source feature keys; in a transgenic sequence the source feature key describing the organism that is the recipient of the DNA must span the entire sequence; see Appendix IV /organelle for a list of <organelle_value> Feature Key stem_loop Definition hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA. 
Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /standard_name="text" /usedin=accnum:feature_label Feature Key STS Definition sequence tagged site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Molecule scope DNA Parent key misc_binding Comment STS location to include primer(s) in primer_bind key or primers. Feature Key TATA_signal Definition TATA box; Goldberg-Hogness box; a conserved AT-rich heptamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T) [1,2]; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /usedin=accnum:feature_label Organism scope eukaryotes and eukaryotic viruses Molecule scope DNA References [1] Efstratiadis, A. et al. Cell 21, 653-668 (1980) [2] Corden, J., et al.
"Promoter sequences of eukaryotic protein-encoding genes" Science 209, 1406-1414 (1980) Feature Key terminator Definition sequence of DNA located at the end of the transcript that causes RNA polymerase to terminate transcription; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /standard_name="text" /usedin=accnum:feature_label Molecule scope DNA Feature Key transit_peptide Definition transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key tRNA Definition mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence; Optional qualifiers /allele="text" /anticodon=(pos:<base_range>,aa:<amino_acid>) /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Feature Key unsure Definition author is unsure of exact sequence in this region; Optional qualifiers /allele="text" /citation=[number] /compare=[accession-number.sequence-version] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text"
(single token) /map="text" /note="text" /old_locus_tag="text" (single token) /replace="text" /usedin=accnum:feature_label Comment use /replace="" to annotate deletion, e.g. unsure 11..15 /replace="" Feature Key V_region Definition variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be composed of V_segments, D_segments, N_regions, and J_segments; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Parent Key CDS Organism scope eukaryotes Feature Key V_segment Definition variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Parent Key CDS Organism scope eukaryotes Feature Key variation Definition a related strain contains stable mutations from the same gene (e.g., RFLPs, polymorphisms, etc.) 
which differ from the presented sequence at this location (and possibly others); Optional qualifiers /allele="text" /citation=[number] /compare=[accession-number.sequence-version] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /frequency="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /phenotype="text" /product="text" /replace="text" /standard_name="text" /usedin=accnum:feature_label Comment used to describe alleles, RFLPs, and other naturally occurring mutations and polymorphisms; variability arising as a result of genetic manipulation (e.g. site directed mutagenesis) should be described with the misc_difference feature; use /replace="" to annotate deletion, e.g. variation 4..5 /replace="" Feature Key 3'clip Definition 3'-most region of a precursor transcript that is clipped off during processing; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Feature Key 3'UTR Definition region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Feature Key 5'clip Definition 5'-most region of a precursor transcript that is clipped off during processing; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text"
/note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Feature Key 5'UTR Definition region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /standard_name="text" /usedin=accnum:feature_label Feature Key -10_signal Definition Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT [1,2,3,4]; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /standard_name="text" /usedin=accnum:feature_label Organism scope prokaryotes Molecule scope DNA References [1] Schaller, H., Gray, C., and Hermann, K. Proc Natl Acad Sci USA 72, 737-741 (1975) [2] Pribnow, D. Proc Natl Acad Sci USA 72, 784-788 (1975) [3] Hawley, D.K. and McClure, W.R. "Compilation and analysis of Escherichia coli promoter DNA sequences" Nucl Acid Res 11, 2237-2255 (1983) [4] Rosenberg, M. and Court, D.
"Regulatory sequences involved in the promotion and termination of RNA transcription" Ann Rev Genet 13, 319-353 (1979) Feature Key -35_signal Definition a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA; Optional qualifiers /allele="text" /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /old_locus_tag="text" (single token) /operon="text" /standard_name="text" /usedin=accnum:feature_label Organism scope prokaryotes Molecule scope DNA References [1] Takanami, M., et al. Nature 260, 297-302 (1976) [2] Moran, C.P., Jr., et al. Molec Gen Genet 186, 339-346 (1982) [3] Maniatis, T., et al. Cell 5, 109-113 (1975) Feature Key - Definition "-" is a placeholder for no key; should be used when the need is merely to mark region in order to comment on it or to use it in another feature's location; Optional qualifiers /citation=[number] /db_xref="<database>:<identifier>" /evidence=<evidence_value> /function="text" /gene="text" /label=feature_label /locus_tag="text" (single token) /map="text" /note="text" /number=unquoted text (single token) /old_locus_tag="text" (single token) /phenotype="text" /product="text" /pseudo /standard_name="text" /usedin=accnum:feature_label Comment Example: 1..17 /usedin="X55079:GAA_CDS" 7.4 Appendix IV: Summary of qualifiers for feature keys 7.4.1 Qualifier List The following is a list of available qualifiers for feature keys and their usage. 
The information is arranged as follows: Qualifier name of qualifier; qualifier requires a value if followed by an equal sign Definition definition of the qualifier Value format format of value, if required Example example of qualifier with value Comment comments, questions and clarifications Qualifier /allele= Definition name of the allele for the given gene Value format "text" Example /allele="adh1-1" Comment all gene-related features (exon, CDS etc) for a given gene should share the same /allele qualifier value; the /allele qualifier value must, by definition, be different from the /gene qualifier value; when used with the variation feature key, the allele qualifier value should be that of the variant. Qualifier /anticodon=(pos: ,aa: ) Definition location of the anticodon of tRNA and the amino acid for which it codes Value format (pos:<base_range>,aa:<amino_acid>) where base_range is the position of the anticodon and amino_acid is the abbreviation for the amino acid encoded Example /anticodon=(pos:34..36,aa:Phe) Qualifier /bound_moiety= Definition name of the molecule/complex that may bind to the given feature Value format "text" Example /bound_moiety="GAL4" Comment Multiple /bound_moiety qualifiers are legal on "promoter" and "enhancer" features. A single /bound_moiety qualifier is legal on the "misc_binding", "oriT" and "protein_bind" features. Qualifier /cell_line= Definition cell line from which the sequence was obtained Value format "text" Example /cell_line="MCF7" Qualifier /cell_type= Definition cell type from which the sequence was obtained Value format "text" Example /cell_type="leukocyte" Qualifier /chromosome= Definition chromosome (e.g. 
Chromosome number) from which the sequence was obtained Value format "text" Example /chromosome="1" Qualifier /citation= Definition reference to a citation listed in the entry reference field Value format [integer-number] where integer-number is the number of the reference as enumerated in the reference field Example /citation=[3] Comment used to indicate the citation providing the claim of and/or evidence for a feature; brackets are used for conformity. Qualifier /clone= Definition clone from which the sequence was obtained Value format "text" Example /clone="lambda-hIL7.3" Comment not more than one clone should be specified for a given source feature; to indicate that the sequence was obtained from multiple clones, multiple source features should be given. Qualifier /clone_lib= Definition clone library from which the sequence was obtained Value format "text" Example /clone_lib="lambda-hIL7" Qualifier /codon= Definition specifies a codon which is different from any found in the reference genetic code Value format (seq:"codon-sequence",aa:<amino_acid>) where "codon-sequence" contains the bases of the codon and <amino_acid> is the abbreviation for the translated amino acid, the abbreviation for a modified or unusual amino acid from section 7.5, or the word OTHER Example /codon=(seq:"ttt", aa:Leu) Comment used to specify unusual genetic codes, organellar codes, etc, that are different from the "normal" code for the organism; the codon specified by "seq" codes for the amino acid or stop codon specified by "aa"; the codon that is specified is used throughout the CDS; amino acids that are not on the controlled vocabulary list can be annotated by using "aa:OTHER" as the amino acid designation, and by giving the name of the residue in a /note qualifier; only nucleotides a, g, c or t can be used in "codon-sequence"; multiple /codon qualifiers should be used to describe ambiguous nucleotides.
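The /codon value format above lends itself to a small parser. The following Python sketch is one possible reading of that format; the function name and the regular expression are illustrative assumptions, including the choices to accept only lower-case a, g, c, t (as in the example value) and to tolerate optional whitespace after the comma.

```python
import re

# Hypothetical pattern mirroring (seq:"codon-sequence",aa:<amino_acid>).
# Assumptions: exactly three lower-case bases; optional space after the comma.
_CODON_RE = re.compile(r'^\(seq:"([acgt]{3})",\s*aa:([A-Za-z]+)\)$')


def parse_codon_qualifier(value):
    """Split a /codon value into (codon_sequence, amino_acid).

    Raises ValueError if the value does not match the assumed format,
    for example when the codon contains an ambiguous base such as 'n'.
    """
    match = _CODON_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid /codon value: {value!r}")
    return match.group(1), match.group(2)


# Usage with the example value from the qualifier description.
codon, amino_acid = parse_codon_qualifier('(seq:"ttt", aa:Leu)')
```

The sketch does not check the amino acid against the controlled vocabulary in section 7.5; a fuller validator would also accept OTHER only when a /note qualifier names the residue.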
Qualifier /codon_start= Definition indicates the offset at which the first complete codon of a coding feature can be found, relative to the first base of that feature. Value format 1 or 2 or 3 Example /codon_start=2 Qualifier /compare= Definition Reference details of an existing public INSD entry to which a comparison is made Value format [accession-number.sequence-version] Example /compare=AJ634337.1 Comment This qualifier may be used on the following features: misc_difference, conflict, unsure, old_sequence and variation. The features "old_sequence" and "conflict" must have either a /citation or a /compare qualifier. Multiple /compare qualifiers with different contents are allowed within a single feature. This qualifier is not intended for large-scale annotation of variations, such as SNPs. Qualifier /cons_splice= Definition differentiates between intron splice sites that conform to the 5'-GT ... AG-3' splice site consensus and those that do not Value format (5'site:<value>, 3'site:<value>), where <value> can be 'YES', 'NO' or 'ABSENT' Example /cons_splice=(5'site:YES, 3'site:NO) /cons_splice=(5'site:ABSENT, 3'site:NO) Comment since the vast majority of splice sites conform to the consensus, this qualifier should be used only when one does not and the sequence has been checked; 'ABSENT' can be used when one of the termini is not part of the sequence and information on splice site is not available. Qualifier /country= Definition Geographical origin of sequenced sample, intended for epidemiological or population studies. Value format "<country_value>[:<region>][, <locality>]" where country_value is any value from the controlled vocabulary at URL:http://www.ncbi.nlm.nih.gov/projects/collab/country.html Example /country="Canada:Vancouver" /country="France:Cote d'Azur, Antibes" /country="Atlantic Ocean:Charlie Gibbs Fracture Zone" Comment Intended to provide a reference to the site where the source organism was isolated or sampled. Regions and localities should be indicated where possible.
Note that the physical geography of the isolation or sampling site should be represented in /isolation_source. Qualifier /cultivar= Definition cultivar (cultivated variety) of plant from which sequence was obtained. Value format "text" Example /cultivar="Nipponbare" /cultivar="Tenuifolius" /cultivar="Candy Cane" /cultivar="IR36" Comment 'cultivar' is applied solely to products of artificial selection; use the variety qualifier for natural, named plant and fungal varieties; Qualifier /db_xref= Definition database cross-reference: pointer to related information in another database. Value format "<database>:<identifier>" where database is the name of the database containing related information, and identifier is the internal identifier of the related information according to the naming conventions of the cross-referenced database. Example /db_xref="SWISS-PROT:P12345" Comment the complete list of allowed database types is kept on NCBI's public WWW server, at URL: http://www.ncbi.nlm.nih.gov/projects/collab/ Qualifier /dev_stage= Definition if the sequence was obtained from an organism in a specific developmental stage, it is specified with this qualifier Value format "text" Example /dev_stage="fourth instar larva" Qualifier /direction= Definition direction of DNA replication Value format left, right, or both where left indicates toward the 5' end of the entry sequence (as presented) and right indicates toward the 3' end Example /direction=LEFT Qualifier /EC_number= Definition Enzyme Commission number for enzyme product of sequence Value format "text" Example /EC_number="1.1.2.4" Comment valid values for EC numbers are defined in the list prepared by the IUPAC-IUB Commission on Biochemical Enzyme Nomenclature (published in Enzyme Nomenclature 1984 New York: Academic Press (1984) or a more recent revision thereof). Qualifier /ecotype Definition a population within a given species displaying genetically based, phenotypic traits that reflect adaptation to a local habitat. 
Value Format "text" Example /ecotype="Columbia" Comment an example of such a population is one that has adapted hairier than normal leaves as a response to an especially sunny habitat. 'Ecotype' is often applied to standard genetic stocks of Arabidopsis thaliana, but it can be applied to any sessile organism. Qualifier /environmental_sample Definition identifies sequences derived by direct molecular isolation (PCR, DGGE, or other anonymous methods) from an environmental sample with no reliable identification of the source organism Value format none Example /environmental_sample Comment used only with the source feature key; source feature keys containing the /environmental_sample qualifier should also contain the /isolation_source qualifier. Qualifier /estimated_length Definition estimated length of the gap in the sequence Value format unknown or <integer> Example /estimated_length=unknown /estimated_length=342 Qualifier /evidence= Definition value indicating the nature of supporting evidence, distinguishing between experimentally determined and theoretically derived data Value format experimental, not_experimental Example /evidence=experimental Comment experimental indicates that the feature identification or assignment is supported by direct experimental evidence; not_experimental indicates that the data for the feature are derived (e.g. promoter as identified by consensus match). Qualifier /exception= Definition indicates that the amino acid or RNA sequence will not translate or agree with the DNA sequence according to standard biological rules. Value format "text" Example /exception="RNA editing" /exception="reasons given in citation" Comment only to be used to describe biological mechanisms such as RNA editing; where the exception cannot easily be described a published citation must be referred to; protein translation of /exception CDS will be different from the corresponding conceptual translation; - must not be used where transl_except would be adequate, e.g.
in case of stop codon completion use: /transl_except=(pos:6883,aa:TERM) /note="TAA stop codon is completed by addition of 3' A residues to mRNA". - must not be used for ribosomal slippage, instead use join operator, e.g.: CDS join(486..1784,1787..4810) /note="ribosomal slip on tttt sequence at 1784..1787" Qualifier /focus Definition defines the source feature of primary biological interest for records that have multiple source features originating from different organisms Value format none Example /focus Comment the /focus qualifier identifies the organism which is displayed in the organism line and determines the DDBJ/EMBL/GenBank taxonomic division the entry will appear in; if no translation table is specified, the organism with /focus will define the translation table; within an entry with several source features, only one will exist with /focus on it; multi-source entries with a /transgenic source feature do not require a /focus qualifier. Qualifier /frequency= Definition frequency of the occurrence of a feature Value format text representing the fraction of population carrying the variation expressed as a decimal fraction Example /frequency=".85" Qualifier /function= Definition function attributed to a sequence Value format "text" Example /function="essential for recognition of cofactor" Comment /function is used when the gene name and/or product name do not convey the function attributable to a sequence. Qualifier /gene= Definition symbol of the gene corresponding to a sequence region Value format "text" Example /gene="ilvE" Qualifier /germline Definition if the sequence shown is DNA and a member of the immunoglobulin family, this qualifier is used to denote that the sequence is from unrearranged DNA.
Value format none Example /germline Comment /germline cannot be used in the same entry/record as /rearranged Qualifier /haplotype= Definition haplotype of organism from which the sequence was obtained Value format "text" Example /haplotype="Dw3 B5 Cw1 A1" Qualifier /insertion_seq= Definition insertion sequence element from which the sequence was obtained Value format "text" Example /insertion_seq="IS-11" Comment /insertion_seq is legal on repeat_region feature key; Qualifier /isolate= Definition individual isolate from which the sequence was obtained Value format "text" Example /isolate="Patient #152" /isolate="DGGE band PSBAC-13" Qualifier /isolation_source= Definition describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived Value format "text" Examples /isolation_source="rumen isolates from standard Pelleted ration-fed steer #67" /isolation_source="permanent Antarctic sea ice" /isolation_source="denitrifying activated sludge from carbon_limited continuous reactor" Comment used only with the source feature key; source feature keys containing an /environmental_sample qualifier should also contain an /isolation_source qualifier; the /country qualifier should be used to describe the country and major geographical sub-region. 
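Two of the source-feature rules stated above (an /environmental_sample qualifier should be accompanied by /isolation_source, and /germline cannot appear in the same entry as /rearranged) can be checked mechanically. The Python sketch below assumes a hypothetical in-memory representation in which each source feature is a dict mapping qualifier names (without the leading slash) to values; neither the representation nor the function name comes from the feature table definition.

```python
def check_source_features(source_features):
    """Return a list of rule violations for an entry's source features.

    source_features -- list of dicts, one per source feature, mapping a
    qualifier name to its value (valueless qualifiers map to True).
    This data layout is an assumption made for the example.
    """
    problems = []
    used = set()
    for index, qualifiers in enumerate(source_features, start=1):
        used.update(qualifiers)
        # /environmental_sample should come with /isolation_source.
        if ("environmental_sample" in qualifiers
                and "isolation_source" not in qualifiers):
            problems.append(
                f"source {index}: /environmental_sample without /isolation_source")
    # /germline and /rearranged are mutually exclusive across the entry.
    if {"germline", "rearranged"} <= used:
        problems.append("entry: /germline used together with /rearranged")
    return problems


# Usage: the second source feature lacks /isolation_source.
report = check_source_features([
    {"organism": "Homo sapiens", "germline": True},
    {"organism": "uncultured bacterium", "environmental_sample": True},
])
```

Returning messages rather than raising on the first problem lets a caller report every issue in an entry at once.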
Qualifier /label= Definition a label used to permanently tag a feature Value format feature_label Example /label=Alb1_exon1 Comment feature labels follow the naming conventions for all feature table objects (see Sections 3.1 and 3.4) Qualifier /lab_host= Definition laboratory host used to propagate the organism from which the sequence was obtained Value format "text" Example /lab_host="chicken embryos" Qualifier /locus_tag Definition feature tag assigned for tracking purposes Value Format "text"(single token) but not "<1-5 letters><5-9 digit integer>[.<integer>]" Example /locus_tag="RSc0382" /locus_tag="YPO0002" Comment /locus_tag can be used with any feature where /gene is valid; identical /locus_tag values may be used within an entry/record, but only if the identical /locus_tag values are associated with the same gene; in all other circumstances the /locus_tag value must be unique within that entry/record. Multiple /locus_tag values are not allowed within one feature for entries created after 15-OCT-2004. If a /locus_tag needs to be re-assigned the /old_locus_tag qualifier should be used to store the old value. Existing records where multiple /locus_tag qualifiers are present will be retrofitted by January 2005. The /locus_tag value should not be in a format which resembles INSD accession numbers, accession.version, or /protein_id identifiers. Qualifier /map= Definition genomic map position of feature Value format "text" Example /map="8q12-13" Qualifier /macronuclear Definition if the sequence shown is DNA and from an organism which undergoes chromosomal differentiation between macronuclear and micronuclear stages, this qualifier is used to denote that the sequence is from macronuclear DNA.
Value format none Example /macronuclear Qualifier /mod_base= Definition abbreviation for a modified nucleotide base Value format modified_base Example /mod_base=m5c Comment modified nucleotides not found in the restricted vocabulary list can be annotated by entering '/mod_base=OTHER' with '/note="name of modified base"' Qualifier /mol_type= Definition in vivo molecule type of sequence Value format "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "snoRNA", "snRNA", "scRNA", "pre-RNA", "other RNA", "other DNA", "unassigned DNA", "unassigned RNA" Example /mol_type="genomic DNA" Comment these text values describe the in vivo molecule that has been sequenced and not the sequencing technique that has been used (e.g. mRNA is a valid value, cDNA is not); the value "genomic DNA" does not imply that the molecule is nuclear (e.g. organelle and plasmid DNA should be described using "genomic DNA"); ribosomal RNA genes should be described using "genomic DNA"; "rRNA" should only be used if the ribosomal RNA molecule itself has been sequenced; /mol_type is mandatory on every source feature key; all /mol_type values within one entry/record must be the same; values "other RNA" and "other DNA" should be applied to synthetic molecules, values "unassigned DNA", "unassigned RNA" should be applied where the in vivo molecule is unknown; Qualifier /note= Definition any comment or additional information Value format "text" Example /note="This qualifier is equivalent to a comment." Qualifier /number= Definition a number to indicate the order of genetic elements (e.g., exons or introns) in the 5' to 3' direction Value format unquoted text (single token) Example /number=4 /number=6B Comment text limited to integers, letters or combination of integers and/or letters represented as an unquoted single token (e.g. 5a, XIIb); any additional terms should be included in /standard_name.
                Example: /number=2A /standard_name="long"

Qualifier       /old_locus_tag=
Definition      feature tag assigned for tracking purposes
Value format    "text" (single token)
Example         /old_locus_tag="RSc0382"
                /locus_tag="YPO0002"
Comment         /old_locus_tag can be used with any feature where /gene is
                valid and where a /locus_tag qualifier is present. Identical
                /old_locus_tag values may be used within an entry/record, but
                only if the identical /old_locus_tag values are associated
                with the same gene; in all other circumstances the
                /old_locus_tag value must be unique within that entry/record.
                Multiple /old_locus_tag qualifiers with distinct values are
                allowed within a single feature; /old_locus_tag and
                /locus_tag values must not be identical within a single
                feature.

Qualifier       /operon=
Definition      name of the operon the feature belongs to
Value format    "text"
Example         /operon="lac"
Comment         currently valid only on Prokaryota-specific features

Qualifier       /organelle=
Definition      type of membrane-bound intracellular structure from which the
                sequence was obtained
Value format    mitochondrion, nucleomorph, plastid,
                mitochondrion:kinetoplast, plastid:chloroplast,
                plastid:apicoplast, plastid:chromoplast, plastid:cyanelle,
                plastid:leucoplast, plastid:proplastid
Example         /organelle="mitochondrion"
                /organelle="nucleomorph"
                /organelle="plastid"
                /organelle="mitochondrion:kinetoplast"
                /organelle="plastid:chloroplast"
                /organelle="plastid:apicoplast"
                /organelle="plastid:chromoplast"
                /organelle="plastid:cyanelle"
                /organelle="plastid:leucoplast"
                /organelle="plastid:proplastid"
Comment         modifier text limited to values from controlled list

Qualifier       /organism=
Definition      scientific name of the organism that provided the sequenced
                genetic material.
Value format    "text"
Example         /organism="Homo sapiens"
Comment         the organism name which appears on the OS or ORGANISM line
                will match the value of the /organism qualifier of the source
                key in the simplest case of a one-source sequence.
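The /locus_tag and /old_locus_tag constraints stated in the entries above can be checked mechanically. The following is a minimal sketch, not part of the Feature Table definition: the function name locus_tag_problems and the message strings are hypothetical, and the regular expression is one reading of the excluded pattern "<1-5 letters><5-9 digit integer>[.<integer>]". Uniqueness across an entry/record is not checked here.

```python
import re

# Assumed reading of the excluded, accession-like pattern:
# 1-5 letters, then 5-9 digits, then an optional ".<integer>" version.
ACCESSION_LIKE = re.compile(r"[A-Za-z]{1,5}[0-9]{5,9}(\.[0-9]+)?")


def locus_tag_problems(locus_tag, old_locus_tags=()):
    """Return a list of rule violations for one feature (empty if none).

    locus_tag       value of the /locus_tag qualifier on the feature
    old_locus_tags  values of any /old_locus_tag qualifiers on the same feature
    """
    problems = []
    # The value must be a single token: non-empty and free of whitespace.
    if not locus_tag or any(ch.isspace() for ch in locus_tag):
        problems.append("not a single token")
    # The value must not look like an accession or accession.version.
    if ACCESSION_LIKE.fullmatch(locus_tag):
        problems.append("resembles an accession number")
    # /old_locus_tag and /locus_tag must differ within a single feature.
    if locus_tag in old_locus_tags:
        problems.append("identical to an /old_locus_tag value")
    return problems


if __name__ == "__main__":
    for tag in ("RSc0382", "YPO0002", "AB123456"):
        print(tag, locus_tag_problems(tag))
```

Both example values from the /locus_tag entry ("RSc0382" and "YPO0002") pass this check because they carry only four digits, which falls outside the excluded 5-9 digit range.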
Qualifier       /partial
Definition      differentiates between complete regions and partial ones
Value format    none
Example         /partial
Comment         not to be used for new entries from 15-DEC-2001; use '<' and
                '>' signs in the location descriptors to indicate that the
                sequence is partial.

Qualifier       /PCR_conditions=
Definition      description of reaction conditions and components for PCR
Value format    "text"
Example         /PCR_conditions="Initial denaturation:94degC,1.5min"
Comment         used with primer_bind key

Qualifier       /phenotype=
Definition      phenotype conferred by the feature
Value format    "text"
Example         /phenotype="erythromycin resistance"

Qualifier       /pop_variant=
Definition      population variant from which the sequence was obtained
Value format    "text"
Example         /pop_variant="population variant name"

Qualifier       /plasmid=
Definition      name of plasmid from which sequence was obtained
Value format    "text"
Example         /plasmid="C-589"

Qualifier       /product=
Definition      name of a product encoded by a sequence
Value format    "text"
Example         /product="catalase"

Qualifier       /protein_id=
Definition      protein identifier, issued by International collaborators;
                this qualifier consists of a stable ID portion (3+5 format:
                3 letters followed by 5 digits) plus a version number after
                the decimal point.
Value format    <identifier>
Example         /protein_id="AAA12345.1"
Comment         when the protein sequence encoded by the CDS changes, only the
                version number of the /protein_id value is incremented; the
                stable part of the /protein_id remains unchanged and as a
                result will permanently be associated with a given protein;
                this qualifier is valid only on CDS features which translate
                into a valid protein.
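The structure of a /protein_id value (stable 3+5 portion plus version) and the version-increment rule in the comment above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the pattern assumes upper-case letters because the example value "AAA12345.1" uses them.

```python
import re

# Assumption: three upper-case letters, five digits, ".", integer version.
PROTEIN_ID = re.compile(r"([A-Z]{3}[0-9]{5})\.([0-9]+)")


def split_protein_id(value):
    """Split a /protein_id value into (stable_id, version)."""
    match = PROTEIN_ID.fullmatch(value)
    if match is None:
        raise ValueError(f"not in 3+5.version format: {value!r}")
    return match.group(1), int(match.group(2))


def next_protein_id(value):
    """Return the identifier to use after the encoded protein changes.

    Only the version is incremented; the stable part is left untouched.
    """
    stable_id, version = split_protein_id(value)
    return f"{stable_id}.{version + 1}"


if __name__ == "__main__":
    print(split_protein_id("AAA12345.1"))
    print(next_protein_id("AAA12345.1"))
```

Keeping the stable part and the version as separate values makes it straightforward to group all versions of one protein under the same stable identifier.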
Qualifier       /proviral
Definition      denotes that the sequence shown is viral and is integrated
                into another organism's genome
Value format    none
Example         /proviral
Comment         /proviral cannot be used in the same entry/record as /virion

Qualifier       /pseudo
Definition      indicates that this feature is a non-functional version of the
                element named by the feature key
Value format    none
Example         /pseudo

Qualifier       /rearranged
Definition      if the sequence shown is DNA and a member of the
                immunoglobulin family, this qualifier is used to denote that
                the sequence is from rearranged DNA.
Value format    none
Example         /rearranged
Comment         /rearranged cannot be used in the same entry/record as
                /germline

Qualifier       /replace=
Definition      indicates that the sequence identified in a feature's
                intervals is replaced by the sequence shown in "text"; if no
                sequence is contained within the qualifier, this indicates a
                deletion.
Value format    "text"
Example         /replace="a"
                /replace=""

Qualifier       /rpt_family=
Definition      type of repeated sequence; "Alu" or "Kpn", for example
Value format    "text"
Example         /rpt_family="Alu"
Comment         preferred usage is to qualify the repeat_region instead of any
                of the constituent repeat_units

Qualifier       /rpt_type=<repeat_type>
Definition      organization of repeated sequence
Value format    tandem, inverted, flanking, terminal, direct, dispersed, and
                other
Example         /rpt_type=INVERTED
Comment         preferred usage is to qualify the repeat_region instead of any
                of the constituent repeat_units; definitions of these values
                will be added in a future release of this document; see
                Singer, M. Int Rev Cytol 76, 67-112 (1982); Cell 26, 293-95
                (1981); Hardman, N. Biochem J 234, 1-11 (1986).
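The effect of the /replace qualifier described above (substitution of the feature's interval, or deletion when the value is empty) can be illustrated on a plain string. This is a minimal sketch under stated assumptions: apply_replace is a hypothetical helper, and it handles only a single simple interval given as 1-based inclusive start and end positions, not joined or complemented locations.

```python
def apply_replace(sequence, start, end, replacement):
    """Return the sequence with bases start..end replaced.

    start, end    1-based inclusive positions, as in a base_range location
    replacement   the /replace value; an empty string means a deletion
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("interval lies outside the sequence")
    # Convert the 1-based inclusive interval to Python slice bounds.
    return sequence[:start - 1] + replacement + sequence[end:]


if __name__ == "__main__":
    seq = "acgtacgt"
    print(apply_replace(seq, 3, 4, "a"))  # interval 3..4 with /replace="a"
    print(apply_replace(seq, 3, 4, ""))   # interval 3..4 with /replace=""
```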
Qualifier       /rpt_unit=
Definition      identity of repeat unit
Value format    "text" or <base_range>
Example         /rpt_unit="aagggc"
                /rpt_unit=202..245
Comment         used to indicate the literal sequence, or the base range of
                the sequence that constitutes a repeat_region or a single
                repeat_unit; the repeat family name should not be entered in
                /rpt_unit="text"; /rpt_family should be used instead.

Qualifier       /segment=
Definition      name of viral or phage segment sequenced
Value format    "text"
Example         /segment="6"

Qualifier       /serotype=
Definition      serological variety of a species characterized by its
                antigenic properties
Value format    "text"
Example         /serotype="B1"
Comment         used only with the source feature key; the Bacteriological
                Code recommends the use of the term 'serovar' instead of
                'serotype' for the prokaryotes; see the International Code of
                Nomenclature of Bacteria (1990 Revision) Appendix 10.B
                "Infraspecific Terms".

Qualifier       /serovar=
Definition      serological variety of a species (usually a prokaryote)
                characterized by its antigenic properties
Value format    "text"
Example         /serovar="O157:H7"
Comment         used only with the source feature key; the Bacteriological
                Code recommends the use of the term 'serovar' instead of
                'serotype' for prokaryotes; see the International Code of
                Nomenclature of Bacteria (1990 Revision) Appendix 10.B
                "Infraspecific Terms".

Qualifier       /sex=
Definition      sex of the organism from which the sequence was obtained
Value format    "text"
Example         /sex="female"

Qualifier       /specific_host=
Definition      natural host from which the sequence was obtained
Value format    "text"
Example         /specific_host="Rhizobium NGR234"

Qualifier       /specimen_voucher=
Definition      an identifier of the individual or collection of the source
                organism and the place where it is currently stored, usually
                an institution.
Value format    "text"
Example         /specimen_voucher="Smith s. n. 4-IV-1995 (U. S. Natl. Herbarium)"

Qualifier       /standard_name=
Definition      accepted standard name for this feature
Value format    "text"
Example         /standard_name="dotted"
Comment         use /standard_name to give full gene name, but use /gene to
                give gene symbol (in the above example /gene="Dt").

Qualifier       /strain=
Definition      strain from which sequence was obtained
Value format    "text"
Example         /strain="BALB/c"

Qualifier       /sub_clone=
Definition      sub-clone from which sequence was obtained
Value format    "text"
Example         /sub_clone="lambda-hIL7.20g"
Comment         the comments on /clone apply to /sub_clone

Qualifier       /sub_species=
Definition      name of sub-species of organism from which sequence was
                obtained
Value format    "text"
Example         /sub_species="lactis"

Qualifier       /sub_strain=
Definition      sub_strain from which sequence was obtained
Value format    "text"
Example         /sub_strain="abis"

Qualifier       /tissue_lib=
Definition      tissue library from which sequence was obtained
Value format    "text"
Example         /tissue_lib="tissue library 772"

Qualifier       /tissue_type=
Definition      tissue type from which the sequence was obtained
Value format    "text"
Example         /tissue_type="liver"

Qualifier       /transgenic
Definition      identifies the source feature of the organism which was the
                recipient of transgenic DNA
Value format    none
Example         /transgenic
Comment         transgenic sequences must at least have two source feature
                keys; the source feature key describing the organism of the
                recipient DNA must span the whole sequence; the /transgenic
                qualifier identifies the organism which is displayed in the
                organism line and determines that the entry will appear in the
                DDBJ/EMBL/GenBank Synthetic Construct division; multi-source
                entries including a /transgenic source feature should not have
                a /focus qualifier.
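The three structural rules in the /transgenic comment above (at least two source features, the recipient source spanning the whole sequence, and no /focus qualifier alongside /transgenic) lend themselves to a simple consistency check. The sketch below is illustrative only: the SourceFeature class, the function name and the message strings are hypothetical, and each source feature is reduced to a single 1-based inclusive span plus a set of qualifier names.

```python
from dataclasses import dataclass, field


@dataclass
class SourceFeature:
    """Hypothetical, simplified view of one source feature."""
    start: int                                     # 1-based first base
    end: int                                       # 1-based last base
    qualifiers: set = field(default_factory=set)   # e.g. {"transgenic"}


def transgenic_problems(sources, seq_length):
    """Return rule violations for an entry's source features (empty if none)."""
    recipients = [s for s in sources if "transgenic" in s.qualifiers]
    if not recipients:
        return []  # the rules below apply only to transgenic entries
    problems = []
    if len(sources) < 2:
        problems.append("fewer than two source features")
    for recipient in recipients:
        if recipient.start != 1 or recipient.end != seq_length:
            problems.append("recipient source does not span the whole sequence")
    if any("focus" in s.qualifiers for s in sources):
        problems.append("/focus used together with /transgenic")
    return problems


if __name__ == "__main__":
    entry = [
        SourceFeature(1, 1000, {"transgenic"}),  # recipient organism
        SourceFeature(200, 400),                 # inserted DNA
    ]
    print(transgenic_problems(entry, seq_length=1000))
```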
Qualifier       /translation=
Definition      automatically generated one-letter abbreviated amino acid
                sequence derived from either the universal genetic code or the
                table as specified in /transl_table and as determined by
                exceptions in the /transl_except and /codon qualifiers
Value format    IUPAC one-letter amino acid abbreviation; "X" is to be used
                for AA exceptions.
Example         /translation="MASTFPPWYRGCASTPSLKGLIMCTW"
Comment         to be used with CDS feature only; this is a mandatory
                qualifier to the CDS feature key except for /pseudo CDSs; see
                /transl_table for definition and location of genetic code
                tables.

Qualifier       /transl_except=
Definition      translational exception: single codon the translation of which
                does not conform to the genetic code defined by the organism
                and /codon=
Value format    (pos:location,aa:<amino_acid>) where amino_acid is the amino
                acid coded by the codon at the base_range position
Example         /transl_except=(pos:213..215,aa:Trp)
                /transl_except=(pos:1017,aa:TERM)
                /transl_except=(pos:2000..2001,aa:TERM)
                /transl_except=(pos:X22222:15..17,aa:Ala)
Comment         if the amino acid is not on the restricted vocabulary list use
                e.g., '/transl_except=(pos:213..215,aa:OTHER)' with
                '/note="name of unusual amino acid"';
                for modified amino-acid selenocysteine use three letter code
                'Sec' (one letter code 'U' in amino-acid sequence)
                /transl_except=(pos:1002..1004,aa:Sec);
                for partial termination codons where TAA stop codon is
                completed by the addition of 3' A residues to the mRNA, either
                a single base_position or a base_range is used, e.g.
                if partial stop codon is a single base:
                /transl_except=(pos:1017,aa:TERM)
                if partial stop codon consists of two bases:
                /transl_except=(pos:2000..2001,aa:TERM)
                with '/note="stop codon completed by the addition of 3' A
                residues to the mRNA"'.

Qualifier       /transl_table=
Definition      definition of genetic code table used if other than universal
                genetic code table. Tables used are described in appendix V,
                section 7.5.5.
Value format    <integer>; 1=universal table 1; 2=non-universal table 2; ...
Example         /transl_table=4
Comment         genetic code exceptions outside range of specified tables are
                reported in /codon or /transl_except qualifiers.

Qualifier       /transposon=
Definition      transposable element from which the sequence was obtained
Value format    "text"
Example         /transposon="Tn9"
Comment         /transposon is legal on repeat_region feature key

Qualifier       /usedin=
Definition      indicates that the feature is used in a compound feature in
                another entry
Value format    Accession-number:feature-name or
                Database_name::Acc_number:feature_label
Example         /usedin=X10087:proteinx
Comment         database_name is an abbreviation for the name of the database
                in which the entry for the accession number can be found.

Qualifier       /variety=
Definition      variety (= varietas, a formal Linnaean rank) of organism from
                which sequence was derived.
Value format    "text"
Example         /variety="insularis"
Comment         use the cultivar qualifier for cultivated plant varieties,
                i.e., products of artificial selection; varieties other than
                plant and fungal varietas should be annotated via /note, e.g.
                /note="breed:Cukorova"

Qualifier       /virion
Definition      viral genomic sequence as it is encapsidated (distinguished
                from its proviral form integrated in a host cell's chromosome)
Value format    none
Example         /virion
Comment         /virion cannot be used in the same entry/record as /proviral

7.4.2 Feature qualifiers - mapped to Feature keys

The following is a list of available qualifiers mapped to the list of feature
keys on which each qualifier is legal.
QUALIFIER FEATURE KEY /allele -10_signal /allele -35_signal /allele 3'clip /allele 3'UTR /allele 5'clip /allele 5'UTR /allele attenuator /allele C_region /allele CAAT_signal /allele CDS /allele conflict /allele D_segment /allele D-loop /allele enhancer /allele exon /allele GC_signal /allele gene /allele iDNA /allele intron /allele J_segment /allele LTR /allele mat_peptide /allele misc_binding /allele misc_difference /allele misc_feature /allele misc_recomb /allele misc_RNA /allele misc_signal /allele misc_structure /allele modified_base /allele mRNA /allele N_region /allele old_sequence /allele operon /allele oriT /allele polyA_signal /allele polyA_site /allele precursor_RNA /allele prim_transcript /allele primer_bind /allele promoter /allele protein_bind /allele RBS /allele rep_origin /allele repeat_region /allele repeat_unit /allele rRNA /allele S_region /allele satellite /allele scRNA /allele sig_peptide /allele snoRNA /allele snRNA /allele stem_loop /allele STS /allele TATA_signal /allele terminator /allele transit_peptide /allele tRNA /allele unsure /allele V_region /allele V_segment /allele variation /anticodon tRNA /bound_moiety enhancer /bound_moiety misc_binding /bound_moiety oriT /bound_moiety promoter /bound_moiety protein_bind /cell_line source /cell_type source /chromosome source /citation -10_signal /citation -35_signal /citation 3'clip /citation 3'UTR /citation 5'clip /citation 5'UTR /citation attenuator /citation C_region /citation CAAT_signal /citation CDS /citation conflict /citation D_segment /citation D-loop /citation enhancer /citation exon /citation GC_signal /citation gene /citation iDNA /citation intron /citation J_segment /citation LTR /citation mat_peptide /citation misc_binding /citation misc_difference /citation misc_feature /citation misc_recomb /citation misc_RNA /citation misc_signal /citation misc_structure /citation modified_base /citation mRNA /citation N_region /citation old_sequence /citation operon /citation oriT /citation 
polyA_signal /citation polyA_site /citation precursor_RNA /citation prim_transcript /citation primer_bind /citation promoter /citation protein_bind /citation RBS /citation rep_origin /citation repeat_region /citation repeat_unit /citation rRNA /citation S_region /citation satellite /citation scRNA /citation sig_peptide /citation snoRNA /citation snRNA /citation source /citation stem_loop /citation STS /citation TATA_signal /citation terminator /citation transit_peptide /citation tRNA /citation unsure /citation V_region /citation V_segment /citation variation /clone misc_difference /clone source /clone_lib source /codon CDS /codon_start CDS /compare conflict /compare misc_difference /compare old_sequence /compare variation /compare unsure /cons_splice intron /country source /cultivar source /db_xref -10_signal /db_xref -35_signal /db_xref 3'clip /db_xref 3'UTR /db_xref 5'clip /db_xref 5'UTR /db_xref attenuator /db_xref C_region /db_xref CAAT_signal /db_xref CDS /db_xref conflict /db_xref D_segment /db_xref D-loop /db_xref enhancer /db_xref exon /db_xref GC_signal /db_xref gene /db_xref iDNA /db_xref intron /db_xref J_segment /db_xref LTR /db_xref mat_peptide /db_xref misc_binding /db_xref misc_difference /db_xref misc_feature /db_xref misc_recomb /db_xref misc_RNA /db_xref misc_signal /db_xref misc_structure /db_xref modified_base /db_xref mRNA /db_xref N_region /db_xref old_sequence /db_xref operon /db_xref oriT /db_xref polyA_signal /db_xref polyA_site /db_xref precursor_RNA /db_xref prim_transcript /db_xref primer_bind /db_xref promoter /db_xref protein_bind /db_xref RBS /db_xref rep_origin /db_xref repeat_region /db_xref repeat_unit /db_xref rRNA /db_xref S_region /db_xref satellite /db_xref scRNA /db_xref sig_peptide /db_xref snoRNA /db_xref snRNA /db_xref source /db_xref stem_loop /db_xref STS /db_xref TATA_signal /db_xref terminator /db_xref transit_peptide /db_xref tRNA /db_xref unsure /db_xref V_region /db_xref V_segment /db_xref variation /dev_stage source 
/direction oriT /direction rep_origin /EC_number CDS /EC_number exon /EC_number mat_peptide /ecotype source /environmental_sample source /estimated_length gap /evidence -10_signal /evidence -35_signal /evidence 3'clip /evidence 3'UTR /evidence 5'clip /evidence 5'UTR /evidence attenuator /evidence C_region /evidence CAAT_signal /evidence CDS /evidence conflict /evidence D_segment /evidence D-loop /evidence enhancer /evidence exon /evidence GC_signal /evidence gene /evidence iDNA /evidence intron /evidence J_segment /evidence LTR /evidence mat_peptide /evidence misc_binding /evidence misc_difference /evidence misc_feature /evidence misc_recomb /evidence misc_RNA /evidence misc_signal /evidence misc_structure /evidence modified_base /evidence mRNA /evidence N_region /evidence old_sequence /evidence operon /evidence oriT /evidence polyA_signal /evidence polyA_site /evidence precursor_RNA /evidence prim_transcript /evidence primer_bind /evidence promoter /evidence protein_bind /evidence RBS /evidence rep_origin /evidence repeat_region /evidence repeat_unit /evidence rRNA /evidence S_region /evidence satellite /evidence scRNA /evidence sig_peptide /evidence snoRNA /evidence snRNA /evidence stem_loop /evidence STS /evidence TATA_signal /evidence terminator /evidence transit_peptide /evidence tRNA /evidence unsure /evidence V_region /evidence V_segment /evidence variation /exception CDS /exception mRNA /focus source /frequency modified_base /frequency source /frequency variation /function 3'clip /function 3'UTR /function 5'clip /function 5'UTR /function CDS /function exon /function gene /function iDNA /function intron /function LTR /function mat_peptide /function misc_binding /function misc_feature /function misc_RNA /function misc_signal /function misc_structure /function mRNA /function operon /function precursor_RNA /function prim_transcript /function promoter /function protein_bind /function repeat_region /function repeat_unit /function rRNA /function scRNA /function 
sig_peptide /function snoRNA /function snRNA /function stem_loop /function transit_peptide /function tRNA /gene -10_signal /gene -35_signal /gene 3'clip /gene 3'UTR /gene 5'clip /gene 5'UTR /gene attenuator /gene C_region /gene CAAT_signal /gene CDS /gene conflict /gene D_segment /gene D-loop /gene enhancer /gene exon /gene GC_signal /gene gene /gene iDNA /gene intron /gene J_segment /gene LTR /gene mat_peptide /gene misc_binding /gene misc_difference /gene misc_feature /gene misc_recomb /gene misc_RNA /gene misc_signal /gene misc_structure /gene modified_base /gene mRNA /gene N_region /gene old_sequence /gene oriT /gene polyA_signal /gene polyA_site /gene precursor_RNA /gene prim_transcript /gene primer_bind /gene promoter /gene protein_bind /gene RBS /gene rep_origin /gene repeat_region /gene repeat_unit /gene rRNA /gene S_region /gene satellite /gene scRNA /gene sig_peptide /gene snoRNA /gene snRNA /gene stem_loop /gene STS /gene TATA_signal /gene terminator /gene transit_peptide /gene tRNA /gene unsure /gene V_region /gene V_segment /gene variation /germline source /haplotype source /insertion_seq repeat_region /isolate source /isolation_source source /lab_host source /label -10_signal /label -35_signal /label 3'clip /label 3'UTR /label 5'clip /label 5'UTR /label attenuator /label C_region /label CAAT_signal /label CDS /label conflict /label D_segment /label D-loop /label enhancer /label exon /label GC_signal /label gene /label iDNA /label intron /label J_segment /label LTR /label mat_peptide /label misc_binding /label misc_difference /label misc_feature /label misc_recomb /label misc_RNA /label misc_signal /label misc_structure /label modified_base /label mRNA /label N_region /label old_sequence /label operon /label oriT /label polyA_signal /label polyA_site /label precursor_RNA /label prim_transcript /label primer_bind /label promoter /label protein_bind /label RBS /label rep_origin /label repeat_region /label repeat_unit /label rRNA /label S_region /label 
satellite /label scRNA /label sig_peptide /label snoRNA /label snRNA /label source /label stem_loop /label STS /label TATA_signal /label terminator /label transit_peptide /label tRNA /label unsure /label V_region /label V_segment /label variation /locus_tag -10_signal /locus_tag -35_signal /locus_tag 3'clip /locus_tag 3'UTR /locus_tag 5'clip /locus_tag 5'UTR /locus_tag attenuator /locus_tag C_region /locus_tag CAAT_signal /locus_tag CDS /locus_tag conflict /locus_tag D_segment /locus_tag D-loop /locus_tag enhancer /locus_tag exon /locus_tag GC_signal /locus_tag gene /locus_tag iDNA /locus_tag intron /locus_tag J_segment /locus_tag LTR /locus_tag mat_peptide /locus_tag misc_binding /locus_tag misc_difference /locus_tag misc_feature /locus_tag misc_recomb /locus_tag misc_RNA /locus_tag misc_signal /locus_tag misc_structure /locus_tag modified_base /locus_tag mRNA /locus_tag N_region /locus_tag old_sequence /locus_tag oriT /locus_tag polyA_signal /locus_tag polyA_site /locus_tag precursor_RNA /locus_tag prim_transcript /locus_tag primer_bind /locus_tag promoter /locus_tag protein_bind /locus_tag RBS /locus_tag rep_origin /locus_tag repeat_region /locus_tag repeat_unit /locus_tag rRNA /locus_tag S_region /locus_tag satellite /locus_tag scRNA /locus_tag sig_peptide /locus_tag snoRNA /locus_tag snRNA /locus_tag stem_loop /locus_tag STS /locus_tag TATA_signal /locus_tag terminator /locus_tag transit_peptide /locus_tag tRNA /locus_tag unsure /locus_tag V_region /locus_tag V_segment /locus_tag variation /macronuclear source /map -10_signal /map -35_signal /map 3'clip /map 3'UTR /map 5'clip /map 5'UTR /map attenuator /map C_region /map CAAT_signal /map CDS /map conflict /map D_segment /map D-loop /map enhancer /map exon /map GC_signal /map gap /map gene /map iDNA /map intron /map J_segment /map LTR /map mat_peptide /map misc_binding /map misc_difference /map misc_feature /map misc_recomb /map misc_RNA /map misc_signal /map misc_structure /map modified_base /map mRNA /map 
N_region /map old_sequence /map operon /map oriT /map polyA_signal /map polyA_site /map precursor_RNA /map prim_transcript /map primer_bind /map promoter /map protein_bind /map RBS /map rep_origin /map repeat_region /map repeat_unit /map rRNA /map S_region /map satellite /map scRNA /map sig_peptide /map snoRNA /map snRNA /map source /map stem_loop /map STS /map TATA_signal /map terminator /map transit_peptide /map tRNA /map unsure /map V_region /map V_segment /map variation /mod_base modified_base /mol_type source /note -10_signal /note -35_signal /note 3'clip /note 3'UTR /note 5'clip /note 5'UTR /note attenuator /note C_region /note CAAT_signal /note CDS /note conflict /note D_segment /note D-loop /note enhancer /note exon /note GC_signal /note gap /note gene /note iDNA /note intron /note J_segment /note LTR /note mat_peptide /note misc_binding /note misc_difference /note misc_feature /note misc_recomb /note misc_RNA /note misc_signal /note misc_structure /note modified_base /note mRNA /note N_region /note old_sequence /note operon /note oriT /note polyA_signal /note polyA_site /note precursor_RNA /note prim_transcript /note primer_bind /note promoter /note protein_bind /note RBS /note rep_origin /note repeat_region /note repeat_unit /note rRNA /note S_region /note satellite /note scRNA /note sig_peptide /note snoRNA /note snRNA /note source /note stem_loop /note STS /note TATA_signal /note terminator /note transit_peptide /note tRNA /note unsure /note V_region /note V_segment /note variation /number CDS /number exon /number iDNA /number intron /number misc_feature /old_locus_tag -10_signal /old_locus_tag -35_signal /old_locus_tag 3'clip /old_locus_tag 3'UTR /old_locus_tag 5'clip /old_locus_tag 5'UTR /old_locus_tag attenuator /old_locus_tag C_region /old_locus_tag CAAT_signal /old_locus_tag CDS /old_locus_tag conflict /old_locus_tag D_segment /old_locus_tag D-loop /old_locus_tag enhancer /old_locus_tag exon /old_locus_tag GC_signal /old_locus_tag gene 
/old_locus_tag iDNA /old_locus_tag intron /old_locus_tag J_segment /old_locus_tag LTR /old_locus_tag mat_peptide /old_locus_tag misc_binding /old_locus_tag misc_difference /old_locus_tag misc_feature /old_locus_tag misc_recomb /old_locus_tag misc_RNA /old_locus_tag misc_signal /old_locus_tag misc_structure /old_locus_tag modified_base /old_locus_tag mRNA /old_locus_tag N_region /old_locus_tag old_sequence /old_locus_tag oriT /old_locus_tag polyA_signal /old_locus_tag polyA_site /old_locus_tag precursor_RNA /old_locus_tag prim_transcript /old_locus_tag primer_bind /old_locus_tag promoter /old_locus_tag protein_bind /old_locus_tag RBS /old_locus_tag rep_origin /old_locus_tag repeat_region /old_locus_tag repeat_unit /old_locus_tag rRNA /old_locus_tag S_region /old_locus_tag satellite /old_locus_tag scRNA /old_locus_tag sig_peptide /old_locus_tag snoRNA /old_locus_tag snRNA /old_locus_tag stem_loop /old_locus_tag STS /old_locus_tag TATA_signal /old_locus_tag terminator /old_locus_tag transit_peptide /old_locus_tag tRNA /old_locus_tag unsure /old_locus_tag V_region /old_locus_tag V_segment /old_locus_tag variation /operon -10_signal /operon -35_signal /operon attenuator /operon CDS /operon gene /operon misc_RNA /operon misc_signal /operon mRNA /operon operon /operon precursor_RNA /operon prim_transcript /operon promoter /operon stem_loop /operon terminator /organelle source /organism misc_recomb /organism source /PCR_conditions primer_bind /phenotype attenuator /phenotype gene /phenotype misc_difference /phenotype misc_feature /phenotype misc_signal /phenotype operon /phenotype promoter /phenotype variation /plasmid source /pop_variant source /product C_region /product CDS /product D_segment /product exon /product gene /product J_segment /product mat_peptide /product misc_feature /product misc_RNA /product mRNA /product N_region /product precursor_RNA /product rRNA /product S_region /product scRNA /product sig_peptide /product snoRNA /product snRNA /product 
transit_peptide /product tRNA /product V_region /product V_segment /product variation /protein_id CDS /proviral source /pseudo C_region /pseudo CDS /pseudo D_segment /pseudo exon /pseudo gene /pseudo J_segment /pseudo mat_peptide /pseudo misc_feature /pseudo mRNA /pseudo N_region /pseudo operon /pseudo promoter /pseudo rRNA /pseudo S_region /pseudo scRNA /pseudo sig_peptide /pseudo snoRNA /pseudo snRNA /pseudo transit_peptide /pseudo tRNA /pseudo V_region /pseudo V_segment /rearranged source /replace conflict /replace misc_difference /replace old_sequence /replace unsure /replace variation /rpt_family oriT /rpt_family repeat_region /rpt_family repeat_unit /rpt_family satellite /rpt_type oriT /rpt_type repeat_region /rpt_type repeat_unit /rpt_type satellite /rpt_unit oriT /rpt_unit repeat_region /rpt_unit repeat_unit /rpt_unit satellite /segment source /serotype source /serovar source /sex source /specific_host source /specimen_voucher source /standard_name -10_signal /standard_name -35_signal /standard_name 3'clip /standard_name 3'UTR /standard_name 5'clip /standard_name 5'UTR /standard_name C_region /standard_name CDS /standard_name D_segment /standard_name enhancer /standard_name exon /standard_name gene /standard_name iDNA /standard_name intron /standard_name J_segment /standard_name LTR /standard_name mat_peptide /standard_name misc_difference /standard_name misc_feature /standard_name misc_recomb /standard_name misc_RNA /standard_name misc_signal /standard_name misc_structure /standard_name mRNA /standard_name N_region /standard_name operon /standard_name oriT /standard_name precursor_RNA /standard_name prim_transcript /standard_name primer_bind /standard_name promoter /standard_name protein_bind /standard_name RBS /standard_name rep_origin /standard_name repeat_region /standard_name rRNA /standard_name S_region /standard_name satellite /standard_name scRNA /standard_name sig_peptide /standard_name snoRNA /standard_name snRNA /standard_name stem_loop 
/standard_name STS /standard_name terminator /standard_name transit_peptide /standard_name tRNA /standard_name V_region /standard_name V_segment /standard_name variation /strain source /sub_clone source /sub_species source /sub_strain source /tissue_lib source /tissue_type source /transgenic source /transl_except CDS /transl_table CDS /translation CDS /transposon repeat_region /usedin -10_signal /usedin -35_signal /usedin 3'clip /usedin 3'UTR /usedin 5'clip /usedin 5'UTR /usedin attenuator /usedin C_region /usedin CAAT_signal /usedin CDS /usedin conflict /usedin D_segment /usedin D-loop /usedin enhancer /usedin exon /usedin GC_signal /usedin gene /usedin iDNA /usedin intron /usedin J_segment /usedin LTR /usedin mat_peptide /usedin misc_binding /usedin misc_difference /usedin misc_feature /usedin misc_recomb /usedin misc_RNA /usedin misc_signal /usedin misc_structure /usedin modified_base /usedin mRNA /usedin N_region /usedin old_sequence /usedin operon /usedin oriT /usedin polyA_signal /usedin polyA_site /usedin precursor_RNA /usedin prim_transcript /usedin primer_bind /usedin promoter /usedin protein_bind /usedin RBS /usedin rep_origin /usedin repeat_region /usedin repeat_unit /usedin rRNA /usedin S_region /usedin satellite /usedin scRNA /usedin sig_peptide /usedin snoRNA /usedin snRNA /usedin source /usedin stem_loop /usedin STS /usedin TATA_signal /usedin terminator /usedin transit_peptide /usedin tRNA /usedin unsure /usedin V_region /usedin V_segment /usedin variation /variety source /virion source 7.5 Appendix V: Controlled vocabularies This appendix contains information on the restricted vocabulary fields used in the Feature Table. The information contained in this appendix is subject to change, please contact the database staff for the most recent information concerning controlled vocabularies. 
This appendix is organized as follows:

Authority   The organization with authority to define the vocabulary
Reference   Publications of (or about) the vocabulary
Contact     Name of database staff responsible for maintaining the database
            copy of the vocabulary
Scope       Feature Table qualifiers which take members of this vocabulary as
            values
Listing     A listing of the current vocabulary with definitions or
            explanations

This appendix includes reference lists for the following controlled
vocabulary fields:

- Nucleotide base codes (IUPAC)
- Modified base abbreviations
- Amino acid abbreviations
- Modified and unusual Amino Acids
- Genetic Code Tables
- Country Names

7.5.1 Nucleotide base codes (IUPAC)

Authority   Nomenclature Committee of the International Union of Biochemistry
Reference   Cornish-Bowden, A. Nucl Acid Res 13, 3021-3030 (1985)
Contact     EMBL
Scope       Location descriptors
Listing

Symbol   Meaning
------   -------
a        a; adenine
c        c; cytosine
g        g; guanine
t        t; thymine in DNA; uracil in RNA
m        a or c
r        a or g
w        a or t
s        c or g
y        c or t
k        g or t
v        a or c or g; not t
h        a or c or t; not g
d        a or g or t; not c
b        c or g or t; not a
n        a or c or g or t

7.5.2 Modified base abbreviations

Authority   Sprinzl, M. and Gauss, D.H.
Reference   Sprinzl, M. and Gauss, D.H. Nucl Acid Res 10, r1 (1982).
            (note that in Cornish-Bowden, A. Nucl Acid Res 13, 3021-3030
            (1985) the IUPAC-IUB declined to recommend a set of abbreviations
            for modified nucleotides)
Contact     NCBI
Scope       /mod_base

Abbreviation   Modified base description
------------   -------------------------
ac4c           4-acetylcytidine
chm5u          5-(carboxyhydroxylmethyl)uridine
cm             2'-O-methylcytidine
cmnm5s2u       5-carboxymethylaminomethyl-2-thiouridine
cmnm5u         5-carboxymethylaminomethyluridine
d              dihydrouridine
fm             2'-O-methylpseudouridine
gal q          beta,D-galactosylqueosine
gm             2'-O-methylguanosine
i              inosine
i6a            N6-isopentenyladenosine
m1a            1-methyladenosine
m1f            1-methylpseudouridine
m1g            1-methylguanosine
m1i            1-methylinosine
m22g           2,2-dimethylguanosine
m2a            2-methyladenosine
m2g            2-methylguanosine
m3c            3-methylcytidine
m5c            5-methylcytidine
m6a            N6-methyladenosine
m7g            7-methylguanosine
mam5u          5-methylaminomethyluridine
mam5s2u        5-methoxyaminomethyl-2-thiouridine
man q          beta,D-mannosylqueosine
mcm5s2u        5-methoxycarbonylmethyl-2-thiouridine
mcm5u          5-methoxycarbonylmethyluridine
mo5u           5-methoxyuridine
ms2i6a         2-methylthio-N6-isopentenyladenosine
ms2t6a         N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a           N-((9-beta-D-ribofuranosylpurine-6-yl)N-methyl-carbamoyl)threonine
mv             uridine-5-oxyacetic acid-methylester
o5u            uridine-5-oxyacetic acid (v)
osyw           wybutoxosine
p              pseudouridine
q              queosine
s2c            2-thiocytidine
s2t            5-methyl-2-thiouridine
s2u            2-thiouridine
s4u            4-thiouridine
t              5-methyluridine
t6a            N-((9-beta-D-ribofuranosylpurine-6-yl)carbamoyl)threonine
tm             2'-O-methyl-5-methyluridine
um             2'-O-methyluridine
yw             wybutosine
x              3-(3-amino-3-carboxypropyl)uridine, (acp3)u
OTHER          (requires /note= qualifier)

7.5.3 Amino acid abbreviations

Authority   IUPAC-IUB Joint Commission on Biochemical Nomenclature.
Reference   IUPAC-IUB Joint Commission on Biochemical Nomenclature.
            Nomenclature and Symbolism for Amino Acids and Peptides.
            Eur. J. Biochem. 138:9-37(1984).
Scope       /anticodon, /codon, /transl_except
Contact     EMBL
Listing     (note that the abbreviations are legal values for amino acids,
            not the full names)

Abbreviation    Amino acid name
------------    ---------------
Ala     A       Alanine
Arg     R       Arginine
Asn     N       Asparagine
Asp     D       Aspartic acid (Aspartate)
Cys     C       Cysteine
Gln     Q       Glutamine
Glu     E       Glutamic acid (Glutamate)
Gly     G       Glycine
His     H       Histidine
Ile     I       Isoleucine
Leu     L       Leucine
Lys     K       Lysine
Met     M       Methionine
Phe     F       Phenylalanine
Pro     P       Proline
Ser     S       Serine
Sec     U       Selenocysteine
Thr     T       Threonine
Trp     W       Tryptophan
Tyr     Y       Tyrosine
Val     V       Valine
Asx     B       Aspartic acid or Asparagine
Glx     Z       Glutamine or Glutamic acid
Xaa     X       Any amino acid
TERM            termination codon

7.5.4 Modified and unusual Amino Acids

Abbreviation    Amino acid
------------    ----------
Aad             2-Aminoadipic acid
bAad            3-Aminoadipic acid
bAla            beta-Alanine, beta-Aminopropionic acid
Abu             2-Aminobutyric acid
4Abu            4-Aminobutyric acid, piperidinic acid
Acp             6-Aminocaproic acid
Ahe             2-Aminoheptanoic acid
Aib             2-Aminoisobutyric acid
bAib            3-Aminoisobutyric acid
Apm             2-Aminopimelic acid
Dbu             2,4-Diaminobutyric acid
Des             Desmosine
Dpm             2,2'-Diaminopimelic acid
Dpr             2,3-Diaminopropionic acid
EtGly           N-Ethylglycine
EtAsn           N-Ethylasparagine
Hyl             Hydroxylysine
aHyl            allo-Hydroxylysine
3Hyp            3-Hydroxyproline
4Hyp            4-Hydroxyproline
Ide             Isodesmosine
aIle            allo-Isoleucine
MeGly           N-Methylglycine, sarcosine
MeIle           N-Methylisoleucine
MeLys           6-N-Methyllysine
MeVal           N-Methylvaline
Nva             Norvaline
Nle             Norleucine
Orn             Ornithine
OTHER           (requires /note=)

7.5.5 Genetic Code Tables

Authority   International Sequence Databank Collaboration
Contact     NCBI
Scope       /transl_table qualifier
URL         http://www3.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

Genetic Code [1]
Standard Code (transl_table=1)

  AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M---------------M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [2]
Vertebrate Mitochondrial Code (transl_table=2)

  AAs    = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
  Starts = --------------------------------MMMM---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [3]
Yeast Mitochondrial Code (transl_table=3)

  AAs    = FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ----------------------------------MM----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [4]
Mold, Protozoan, Coelenterate Mitochondrial Code & Mycoplasma/Spiroplasma
Code (transl_table=4)

  AAs    = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = --MM---------------M------------MMMM---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [5]
Invertebrate Mitochondrial Code (transl_table=5)

  AAs    = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
  Starts = ---M----------------------------MMMM---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [6]
Ciliate, Dasycladacean and Hexamita Nuclear Code (transl_table=6)

  AAs    = FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [9]
Echinoderm and Flatworm Mitochondrial Code (transl_table=9)

  AAs    = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
  Starts = -----------------------------------M---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [10]
Euplotid Nuclear Code (transl_table=10)

  AAs    = FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [11]
Bacterial and Plant Plastid Code (transl_table=11)

  AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M---------------M------------MMMM---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [12]
Alternative Yeast Nuclear Code (transl_table=12)

  AAs    = FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = -------------------M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [13]
Ascidian Mitochondrial Code (transl_table=13)

  AAs    = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [14]
Alternative Flatworm Mitochondrial Code (transl_table=14)

  AAs    = FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [15]
Blepharisma Nuclear Code (transl_table=15)

  AAs    = FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [16]
Chlorophycean Mitochondrial Code (transl_table=16)

  AAs    = FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [21]
Trematode Mitochondrial Code (transl_table=21)

  AAs    = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG
  Starts = -----------------------------------M---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [22]
Scenedesmus obliquus Mitochondrial Code (transl_table=22)

  AAs    = FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = -----------------------------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Genetic Code [23]
Thraustochytrium Mitochondrial Code (transl_table=23)

  AAs    = FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = --------------------------------M--M---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

7.5.6 Country Names

Authority   International Sequence Databank Collaboration
Contact     NCBI
Scope       /country qualifier
URL         http://www.ncbi.nlm.nih.gov/projects/collab/country.html
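
Note on reading the Genetic Code Tables (7.5.5)

Each table in 7.5.5 is read column by column: the three characters at
the same position in Base1, Base2 and Base3 spell one codon, the AAs
row gives the one-letter amino acid for that codon ("*" marks a
termination codon), and an "M" in the Starts row marks a codon that may
act as an initiator. The following Python sketch illustrates that
reading using the rows of Genetic Code [1]. The names build_code and
translate are hypothetical helpers invented for this illustration and
are not part of any database software; returning "X" for a codon that
is not in the table (for example one containing an ambiguity code) is
likewise an assumption of the sketch.

```python
# Rows copied from Genetic Code [1], Standard Code (transl_table=1).
AAS    = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"
BASE1  = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2  = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3  = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"


def build_code(aas, starts, base1, base2, base3):
    """Return (codon -> amino acid dict, set of start codons).

    Hypothetical helper: reads the five 64-column rows column by column.
    """
    rows = (aas, starts, base1, base2, base3)
    if any(len(row) != 64 for row in rows):
        raise ValueError("each row must have exactly 64 columns")
    codon_to_aa = {}
    start_codons = set()
    for aa, start, b1, b2, b3 in zip(*rows):
        codon = b1 + b2 + b3
        codon_to_aa[codon] = aa
        if start == "M":
            start_codons.add(codon)
    return codon_to_aa, start_codons


def translate(seq, codon_to_aa):
    """Translate seq in frame 1; an incomplete trailing codon is ignored.

    Assumption of this sketch: codons missing from the table map to "X".
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA input
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(codon_to_aa.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    code, start_codons = build_code(AAS, STARTS, BASE1, BASE2, BASE3)
    print(sorted(start_codons))
    print(translate("atggcctaa", code))
```

Any other table in 7.5.5 can be used the same way by substituting its
AAs and Starts rows; the Base1, Base2 and Base3 rows are identical in
every table.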