Schema Change Document Description:

 

The table that lists the schema change has three columns.  The first two columns show the build id. The third column has change description.

Table names are in bold. Only columns are indented and only columns with changes are shown under the table.

Please see data dictionary for full table details.

 ‘ -- -- --‘ means no corresponding objects ( table or column) for the build.

For example, in build 118, a new table “SNPAlleleFreq” is added.

 

Build 122 Schema Change

Date last updated: August 18, 2004

 

Build 121

Build 122

Note

    -- -- --                           

    ChiSqPValueLookUp                  

This table replaces ChiSqPValue. This is used to get the p-value from ChiSq when computing H-W Prob.

 

 

 

    dn_Allele_rev                       

    dn_Allele_rev                      

This table is used internally for processing.

        rev_flag                       

        rev_flag                       

 

          Datatype:bit                 

          Datatype:tinyint             

 

 

 

 

    -- -- --                           

    dn_Gty_rev                         

Used for internal processing.

 

 

 

    dn_snpFxnCnt                       

    dn_snpFxnCnt                       

This is a summary table showing count of snp in each functional class(intron, coding etc).

        -- -- --                       

        tax_id                         

Added tax_id to this table since we start to have genomes for more organisms.

 

 

 

    dn_table_rowcount                  

    dn_table_rowcount                  

This table lists the row count of each table. When users re-create their own dbSNP, they can use this table to check if they data were loaded OK.

        org_prefix                     

        -- -- --                       

org_prefix and tax_id is not used yet. Currently, the table rowcount just the rowcount for a table. In the future when we have organism division databases, we may have tables shared by multiple organism. tax_id field will be used to show row count per tax_id.

        -- -- --                       

        tax_id                         

 

 

 

 

    FreqSummaryBySsPop                 

    FreqSummaryBySsPop                 

Column type changes for this table. Float to real to save storage space.

        source                         

        source                         

 

          Datatype:char(2)             

          Datatype:varchar(1)          

We compute H-W prob from either individual genotype data(I), or just genotype frequency (P). So we only need varcchar(1).

 

 

 

        chisq                          

        chisq                          

 

          Datatype:float               

          Datatype:real                

 

          Null Option:NOT NULL         

          Null Option:NULL             

 

 

 

 

        df                              

        df                             

 

          Null Option:NOT NULL         

          Null Option:NULL             

 

 

 

 

        hwp                            

        hwp                            

 

          Datatype:float               

          Datatype:real                

 

          Null Option:NOT NULL         

          Null Option:NULL             

 

 

 

 

    -- -- --                           

    ggaContigExon                       

The new set of chicken mapping table are added. They all have prefix 'gga'.  Organism prefix is defined in OrganismTax.prefix column.

 

 

 

    -- -- --                           

    ggaContigInfo                      

 

 

 

    -- -- --                           

    ggaMapLink                         

 

 

 

    -- -- --                           

    ggaMapLinkPid                      

 

 

 

    -- -- --                           

    ggaMapList                          

 

 

 

 

    -- -- --                           

    ggaSNPContigLoc                    

 

 

 

    -- -- --                           

    ggaSNPContigLocusId                

 

 

 

    -- -- --                           

    ggaSNPMapInfo                      

 

 

 

    -- -- --                           

    IndGrpCode                         

Added this new table to group individuals. See the data in IndGrpCode for a list of groups. Currently, the grouping only applies to human.

 

 

 

    IndivBySource                      

    IndivBySource                      

        -- -- --                       

        src_ind_grp                    

This column stores grouping supplied by sample source ,such as Coriell.

 

 

 

    Individual                         

    Individual                         

 

        -- -- --                       

        ind_grp                        

This column stores the grouping code assigned by dbSNP.

 

 

 

    IndivSourceCode                    

    IndivSourceCode                    

        -- -- --                       

        src_type                       

Added this column to distinguish various sources. For example, Coriell's src_type is "repository" and CEPH is "curator".

 

 

 

        -- -- --                       

        display_order                  

One individual often has several id from different source. This value will be used to order the display.

 

 

 

    -- -- --                           

    IndMergeArch                       

As we get more individual sample information, we may see that previously two separated individuals are actually the same. This table keeps individual merging history. This is not used yet.

 

 

 

 

 

 

    miscMapLink                        

    miscMapLink                        

        -- -- --                       

        i_tax_id                       

For organisms with no NCBI genome assembly, the hit to GenBank accessions are stored in this table. Since there're multiple organisms, tax_id was added. But since tax_id is in SNP table, we could get tax_id by joining SNP and this table. So this column may be unnecessary and be taken out in future schema modifications.

 

 

 

    Publication                        

    Publication                        

 

        asn                            

        -- -- --                       

This field was not used.

 

 

 

    SNPHWProb                          

    SNPHWProb                          

This table stays basically the same except the column name listed below. The column name is not consistent with table FreqSummaryBySsPop which has H-W prob for each ss over an population.

        degreeFreedom                  

        df                             

 

 

 

 

        chi_sq                         

        chisq                          

 

 

 

 

        hw_prob                        

        hwp                            

 

 

 

 

        -- -- --                       

        ind_cnt                         

Added column ind_cnt for individual count.

 

 

 

        source                         

        -- -- --                       

This field is removed because it is not as meaningful for an refSNP(rs) as the source in FreqSummaryBySsPop: some member ss may have individual genotype while other may only have genotype frequency.

 

 

 

    SubmittedIndividual                

    SubmittedIndividual                

        -- -- --                       

        loc_ind_grp                    

loc_ind_grp will store the individual group name submitted by submitter.

 

 

 

    -- -- --                           

    UniVar_Valid_Allele                

This table is used internally.

 

 

Build 121 Schema Change

Date last updated: June 2, 2004

 

To keep dbSNP in manageable size, we have partitioned four large tables by ss# and moved the data with ss# less than 20 million to a supplemental database called dbSNP_sup.

The four large tables that have been partitioned are: SubSNPSeq3, SubSNPSeq5, SubSNPCommLine and SubSNPAcc.

 

For ss# between 1 and 10,000,000, it is stored in table <tablename>_p1, where <tablename> represents the above four tables.

For ss# between 10,000,001 and 20,000,000, it is stored in table <tablename>_p2

Currently, we do not have ss# over 30,000,000, when we do, we will create <tablename>_p3 for ss# between 20,000,001 and 30,000,000,

 

 For example:

SubSNPSeq3_p1 has 3’ sequence for all ss# between 1 and 10,000,000.

SubSNPSeq3_p2 has 3’ sequence for all ss# between 10,000,001 and 20,000,000.

 

SubSNPSeq3_p1 and SubSNPSeq3_p1 still have the same column definitions as the original SubSNPSeq3.

 

In dbSNP, for ss# over 20,000,000, we still store them in dbSNP, the main database.  We renamed these four tables by adding a suffix “_ins” to note that we will be add new data into these tables as new submissions come in.   The original names for these four tables are now partitioned views across dbSNP and dbSNP_sup having all ss#.  For more details on partitioned view,

 

This change will be reflected in ftp bcp files. Using SubSNPSeq3 as an example, instead of having a huge file for SubSNPSeq3 prior to this buid, we now have bcp file for SubSNPSeq_p1, SubSNPSeq_p2 and SubSNPSeq_ins.  Between builds, sequence for old snp seldom change (they only change when submitter updates them which is rare). We will have separate small update file for SubSNPSeq3_p1 and SubSNPSeq3_p2.  So for each new build, users only need to get the new bcp file from SubSNPSeq3_ins.

 

 

Build 120 Schema Change

Date last updated: April 24, 2004

Note:  ER diagram and Data dictionary for build 120 will be updated on soon.

 

 

Build 119

Build 120

Note

    agContigInfo                       

    agContigInfo                        

Please see change note in "Build 117 Schema Change - ContigInfo" in this document.

        -- -- --                       

        group_term                     

 

        -- -- --                       

        group_label                     

 

        -- -- --                       

        contig_label                   

 

 

 

 

    agSNPMapInfo                       

    agSNPMapInfo               

Please see change note in "Build 117 Schema Change-SNPMapInfo" in this document.

        -- -- --                       

        hap_cnt                        

 

        -- -- --                       

        unplaced_cnt                   

 

        -- -- --                       

        ref_cnt                        

 

        -- -- --                       

        assembly_cnt                   

 

 

 

 

    -- -- --                           

    dn_table_rowcount

Added this table to simply store the row count for each table. So users who create local dbSNP can compare the counts.

 

 

 

    -- -- --                           

    dn_validationSummary

An internal validation table.

 

 

 

    Individual                          

    Individual                  

SubmittedIndividual has tax_id due to annotation process. It is possible several submitted individuals are from the same sample. This curated sample data is in Individual table. So to normalize data,  tax_id should be in Individual table. Currently, we have tax_id in both SubmittedIndividual and Individual tables. Eventually, we will only have tax_id in Individual table.

        -- -- --                       

        tax_id                         

 

 

 

 

SNPHapMapPriority

    SNPHapMapPriority

 

        priority_code                  

        priority_code                  

 

Datatype:tinyint             

          Datatype:smallint            

HapMapClassCode has 9 codes, so priority_code needs to be a smallint.

 

Build 119 Schema Change

Date last updated: Jan. 30, 2004

Note:  ER diagram and Data dictionary for build 119 will be updated on soon.

Please see most recent data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b118.html

 

Build 118

Build 119

Note

 

 

 

    -- -- --                           

    dgContigInfo                        

Added 'dg' prefixed tables for dog snp mapping data.

    -- -- --                           

    dgSNPContigLoc                     

 

    -- -- --                           

    dgSNPContigLocusId                 

 

    -- -- --                           

    dgSNPMapInfo                       

 

 

 

 

    -- -- --                           

    dn_chr_locusList                   

This table is a de-normalized table, storing list of locus_id for each contig chromosomes. This is used in individual genotype summary page.

 

 

 

    -- -- --                           

    dn_gty_rsCnt_byChr                 

This table is a de-normalized table, storing genotyped snp count for each individual on chromosomes.

 

 

 

    -- -- --                           

    dn_gty_rsCnt_byLocus               

This table is a de-normalized table, storing genotyped snp count for each individual on a locus.

 

 

 

    MethodClass                        

    MethodClass                        

 

 

 

 

        -- -- --                       

        validation_status              

Added validation_status. Value 0 means the method is computational. 1 - method can be considered validating when multiple ss  are in the same rs cluster.

 

 

 

 

 

 

    SNPContigLocusId                   

    SNPContigLocusId                    

Added mrna_ver and protein_ver. Contig_acc, mrna_acc and protein_acc fields are changed to varcahr(15) to be consistent with accession fields in other tables such as SNPContigLoc.  The following int field are changed to tinyint: contig_ver, fxn_class, reading_frame to save space.

 

 

 

    SubmittedIndividual                

    SubmittedIndividual                

 

        -- -- --                       

        loc_ind_id                     

Submitter individual id can be case sensitive. For ex. Mouse strain name use case in special ways. This field is needed to preserve case information.

 

Build 118 Schema Change

Date last updated: Nov. 14, 2003

Please see data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b118.html

 

Build 117

Build 118

Note

 

 

 

    -- -- --                           

    dgMapLink                          

Added mapping data for dog.

 

 

 

    -- -- --                           

    dgMapLinkPid                       

Added mapping data for dog.

 

 

 

    dn_batchCount                      

    dn_batchCount                      

 

        build_id                       

        -- -- --                       

We will keep snp counts in each batch for current batch only. So there is no need for this column.

 

 

 

    dn_handleCount                     

    dn_handleCount                     

 

        build_id                       

        -- -- --                       

We will keep snp counts in each batch for current batch only. So there is no need for this column.

 

 

 

    dn_IND_batchCount                  

    dn_IND_batchCount                  

 

        build_id                       

        -- -- --                        

We will keep snp counts in each batch for current batch only. So there is no need for this column.

 

 

 

    SubSNPGenotypeSum                  

    FreqSummaryBySsPop                 

This table is renamed to be consistent with the two tables used to compute this summary: GtyFreqBySsPop and GtyFreqBySsPop. The new name clearly shows that the data ins for each SubSNP within a specific population.

        source                         

        source                          

 

          Datatype:char(1)             

          Datatype:char(2)             

Use char(2): GF - source data is genotype frequency, IG - source data is individual genotype, AF - source data is allele frequency.

        non_founder_cnt                

        non_founder_ind_cnt            

Renamed so it is clear that the number in this column count of individuals, not chromosomes.

        -- -- --                       

        het                             

heterozygosity

        -- -- --                       

        het_se                         

standard error for the above heterozygosity.

 

 

 

    mmContigInfo                       

    mmContigInfo                       

The changes in this table is the same in ContigInfo. Please see change description for ContigInfo in build 117.

        contig_class                   

        -- -- --                       

 

        -- -- --                       

        group_term                     

 

        -- -- --                       

        group_label                    

 

        -- -- --                       

        contig_label                   

 

 

 

 

    mmSNPContigLocusId                 

    mmSNPContigLocusId                 

 

        -- -- --                       

        mrna_ver                       

Added this to keep track the version of the mrna accession, like the contig_ver.

        -- -- --                       

        protein_ver                    

Added this to keep track the version of a protein accession, like the contig_ver.

 

 

 

    mmSNPMapInfo                       

    mmSNPMapInfo                        

The changes in this table are the same in ContigInfo. Please see change description for ContigInfo in build 117.

        -- -- --                       

        ref_cnt                        

 

        -- -- --                        

        assembly_cnt                   

 

 

 

 

    -- -- --                           

    SNPAlleleFreq                      

This new table keeps the average allele frequency for refSNP.

 

 

 

    -- -- --                            

    SNPGtyFreq                         

This new table keeps the average genotype freqeuncy for refSNP.

 

 

 

    SubmittedIndividual                

    SubmittedIndividual                

 

        -- -- --                        

        loc_ind_alias                  

Added this column to track the different id for a submitted individual for genotype data.

 

 

 

    t_vwSNP_avgFreq                    

    -- -- --                           

This is replaced with SNPAlleleFreq.

 

 

 

    t_vwSNP_MinorAlleleCnt2More        

    -- -- --                           

This  table is no longer needed.

 

 

 

    t_vwSubPopGty_alleleFreq           

    -- -- --                            

This table is replaced with GtyFreqBySsPop.

 

 

 

 t_vwSubSNP_validationByMinorAlleleCnt 

    -- -- --                           

This table is no longer needed.

 

 

 

 

Build 117 Schema Change

 

Date last updated: Oct. 16, 2003

Note:  Data dictionary and ER diagram for build 117 will be available on Oct.17, 2003.

 

Build 116

Build 117

Note

 

 

 

    Allele                             

    Allele                              

Removed 3 columns in Allele table because an allele may have many motifs (ex. (GAT)10AAT(GAT)5). So keep allele motif relationship in enw table AlleleMotif.

        motif_id                       

        -- -- --                        

 

        repeat_cnt                     

        -- -- --                       

 

        rev_allele                     

        -- -- --                       

 

        -- -- --                       

        last_updated_time              

 

 

 

 

    -- -- --                           

 

Added this table to save computed allele frequency per subsnp and population. The frequency computed from individual genotype takes the highest precedence which then takes precedence over the allele frequency computed from genotype frequency. When none of the above data is available, we use the submitted allele frequency.

 

 

 

    -- -- --                            

    AlleleMotif                        

Added this table to keep the relationship between an allele and all its motif.

 

 

 

    ContigInfo           

ContigInfo

contig_class was used to track the haplotype data of humans and the strain data of mouse for each of the various contigs. But it became clear that different ncbi assembly tasks required different ways of grouping the alternatives so the single category contig_class was broken out into three fields. For dbSNP, the group_term gives the best breakdown of the various alternative forms, but contig_label and/or group_label give a better idea of the common name associated with haplotype, strain, alternate_assembly, etc.

        contig_class                   

        -- -- --                       

 

        -- -- --                       

        group_term                     

id used to group contigs together for annotation purposes. This allows
                annotaters to group contigs by assembly (see note below for PAR
                special case).
                Possible values:
                        ref_[strain/haplotype] (haplotype human, strain others)
                        alt_[strain/haplotype]
                        ref_PAR
                        alt_assembly
                the ref_ and alt_ prefixes assume that these are contigs
                of type NT/NW and the NCBI-assembly team will be loading these to ID.
                ref_PAR: The pseudoautosomal region requires special processing.
                The Y contigs of the PAR region will have this group_term to
                make sure the NCBI-assembly team don't penalize features in this
  region for going to two chromosomes. These contigs will be joined back into the
                reference assembly upon loading to MapViewer
                alt_assembly: this designation is used when the NCBI-assembly
  team have an extra assembly
                for a chromosome and the NCBI-assembly team don't want to load
  the annotation/contigs from this chromosome to ID.

        -- -- --                       

        group_label                    

Label used to label an assembly in the MapViewer.
                On a non-sequence based map, this term can serve to group
                objects into a common coordinate system.
                Example values:
                        reference
                        C57BL/6J
                        129_substrain

        -- -- --                       

        contig_label                   

Label used to label a particular contig in an assembly. For
                example, in mouse, the NCBI-assembly team have grouped all contigs that are of
                type 129/??? for annotation purposes. When the NCBI-assembly team display a
                particular contig, it would be useful to display the
                actual strain name (129/Sv, 129/Ola, etc).
                Example values:
                        C57BL/6J
                        129/SvJ
                        129/SvEvTac

 

 

 

    -- -- --                           

    GtyFreqBySsPop                  

Added this table to save computed genotype frequency per subsnp and population. The frequency computed from individual genotype takes the precedence over submitted genotype frequency.

 

 

 

    ObsVariation                       

    ObsVariation                        

 

        -- -- --                       

        var_flag                       

refers to VarFlagCode.code

 

 

 

    PopClassUniqCount                  

    -- -- --                           

 Not used.

 

 

 

    -- -- --                           

    rnContigExon                       

Added mapping information for rat, with prefix 'rn' added to mapping tables. This note also applies to the following 7 tables.

 

 

 

    -- -- --                            

    rnContigInfo                       

 

 

 

 

    -- -- --                           

    rnMapLink                          

 

 

 

 

    -- -- --                           

    rnMapLinkPid                        

 

 

 

 

    -- -- --                           

    rnMapList                          

 

 

 

 

    -- -- --                           

    rnSNPContigLoc                     

 

 

 

 

    -- -- --                            

    rnSNPContigLocusId                 

 

 

 

 

    -- -- --                           

    rnSNPMapInfo                       

 

 

 

 

    SnpFunctionCode                    

    SnpFunctionCode                     

 

        -- -- --                       

        is_coding                      

Added this column for internal query.

 

 

 

    SNPMapInfo                         

    SNPMapInfo                         

 

        -- -- --                       

        ref_cnt                        

set greater than zero if the snp maps onto the reference strand.

        -- -- --                       

        assembly_cnt                   

Counts the number of alternate assemblies (excluding reference) which the snp hits.

 

 

 

    sts_markers                        

    -- -- --                           

 

 

 

 

    SubPop                             

    SubPop                              

Change some float column to real to save storage space.

 

 

 

    -- -- --                           

    ValidGtyByVariation                

Internal use currently.

 

 

 

    -- -- --                           

    VarFlagCode                        

Some submitted variation pattern is problematic. Define codes to track problems.

 

Build 116 Schema Change

 

Date: Aug 7, 2003

 

The following details all schema changes between build 116 and 115, sorted by table names.

Table names are in bold. Only columns are indented and only columns with changes are shown under the table. See data dictionary for full table details.

  -- -- --‘ means no corresponding objects ( table or column) for the build.

For example, in build 116, pop_cnt is added to table dn_batchCount.

 

Please see ER diagram for more details: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/erd_dbSNP.pdf

Please see data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b116.html

 

 

 

Build 116

Build 115

Note

    dn_batchCount                      

    dn_batchCount                      

Count to speed up query.

        pop_cnt                        

        -- -- --                       

 For frequency and genotype batch, added count of population and count of individual.

        ind_cnt                        

        -- -- --                       

 

 

 

 

    dn_handleCount                     

    dn_handleCount                     

        rs_cnt                          

        rs_cnt                         

Make the column nullable, so 'NULL' means not computed yet, while '0' means the result.

          Null Option:NULL             

          Null Option:NOT NULL         

        rs_validated_cnt               

        rs_validated_cnt               

          Null Option:NULL             

          Null Option:NOT NULL         

 

 

 

   dn_IND_batchCount

   -- -- --

This count table helps speed up "New Batches" web query.

 

 

 

    -- -- --                           

    dn_new_batch                   

dn_new_xxx_batch tables are no longer needed. It was used internally to support query most recent batches. Now web page uses Batch.submit_time to decide submission order.

    -- -- --                           

    dn_new_ind_batch                   

 

 

 

 

    -- -- --                           

    dn_new_nov_batch                    

 

 

 

    -- -- --                           

    dn_new_pop_batch                   

 

 

 

    -- -- --                           

    dn_new_snp_batch                   

 

 

 

    GtyAllele                           

    GtyAllele                          

The new table makes it easy to compute allele frequency from genotype submission on forward or reverse strand.

        -- -- --                       

        allele_id                      

 

        rev_flag                       

        -- -- --                       

 

        fwd_allele_id                  

        -- -- --                       

 

        unigty_id                      

        -- -- --                        

 

        create_time                    

        create_time                    

 

          Null Option:NOT NULL         

          Null Option:                 

 

          Default:-- -- --             

          Default:DefCurTime           

        last_updated_time              

        -- -- --                       

 

 

 

 

    IndivAlias                         

    -- -- --                           

Genotype submission includes individual id. Different submitters sometimes use different individual id for the same DNA sample.  Ex. Mouse strain 'AKR' or 'AKR/J' are the same.

 

 

 

    IndivBySource                      

    IndivBySource                      

Minor database change

        src_id                         

        src_id                         

 

          Datatype:int                 

          Datatype:numeric(3)          

 

 

 

    IndivSourceCode                    

    IndivSourceCode                    

Minor database change

        code                           

        code                           

 

          Datatype:int                 

          Datatype:numeric(3)          

 

 

 

    LoadHistory                        

    LoadHistory                        

        -- -- --                       

        create_time                    

Each build releases different component in slight different time.  Now each individual component release time is stored here.

        ftp_done_date                  

        -- -- --                       

 

        entrez_done_date               

        -- -- --                       

 

        blastdb_done_date              

        -- -- --                       

 

        web_date                       

        -- -- --                       

 

 

 

 

    mmSNPMapInfo                       

    mmSNPMapInfo                        

See data dictionary.

        hap_cnt                        

        -- -- --                       

 

        unplaced_cnt                   

        -- -- --                       

 

 

 

 

    ObsGenotype                         

    ObsGenotype                        

See data dictionary.

        -- -- --                       

        submitted_strand_code          

        -- -- --                       

        obs_ss_fwd                     

        -- -- --                       

        create_time                    

 

        -- -- --                       

        unigty_id                      

 

        -- -- --                       

        allele_id_1                     

 

        -- -- --                       

        allele_id_2                    

 

 

 

 

    ObsVariation                       

    ObsVariation                       

subsnp_clas and iupack_code are moved to table UniVariation.

        -- -- --                       

        subsnp_class                   

        univar_id                      

        univar_id                      

 

          Null Option:NULL              

          Null Option:NOT NULL         

        -- -- --                       

        iupack_code                    

 

 

 

    OrganismTax                        

    OrganismTax                        

Added division_cd in preparation to split dbSNP database into several divisions. Divisions are loosely based on GenBank divisions.

        division_cd                    

        -- -- --                       

 

    Population                          

    Population                         

When curating submitted individuals, it helps to know the source of the population samples.  So src_id is added. For ex. If we know a population's source is NIHPDR, then we know '1' or 'P001' or 'PD001' is the same sample in NIHPDR panel. We currently manually set this field based on population description.

        src_id                         

        -- -- --                       

 

 

 

 

   SNPContigLocusId

   SNPContigLocusId

 

      fxn_class

      fxn_class

data type changed from int to tinyint to save storage.

      reading_frame

      reading_frame

same as above.

      contig_ver

      contig_ver

same as above.

 

 

 

    SNP                                 

    SNP                                

 

        -- -- --                       

        snp_type                       

snp_type was used to record snp withdraw reason code defined in SnpTypes. We currently do not keep withdrawn snp in SNP table. Withdrawn snp is in SNPHistory.

        tax_id                         

        -- -- --                       

tax_id, snp_class and validation_status was orignianlly in SNP table. From build 111 to build 115, they were moved to t_vwSNP to faciliate database processing. But during database size increase in build 116, views proved too slow to merit the separating these three fields. So starting build 116, these three fields are back in SNP table.

        snp_class                      

        -- -- --                       

 

        validation_status              

        -- -- --                       

 

 

 

 

 

 

 

    SnpFunctionCode                    

    SnpFunctionCode                     

        top_level_class                

        -- -- --                       

Added this field to group the functional class code into "exon" and "other".

 

 

 

    SNPHWProb                          

    SNPHWProb                           

        chi_sq                         

        chi_sq                         

float' takes 8 byte storage. 'Real' takes 4 byte. In dbSNP, 'real' offers enough precision.

          Datatype:real                

          Datatype:float               

 

        hw_prob                        

        hw_prob                        

 

          Datatype:real                

          Datatype:float               

 

 

 

 

    SNPMapInfo                          

    SNPMapInfo                         

        hap_cnt                        

        -- -- --                       

 

        unplaced_cnt                   

        -- -- --                       

 

 

 

 

 

 

 

    SnpValidationCode                  

    SnpValidationCode                  

        abbrev                         

        abbrev                         

 

          Datatype:varchar(40)         

          Datatype:varchar(20)         

With the addition of double hit validation type, the abbreviation string gets longer.

        create_time                    

        -- -- --                       

 

        last_updated_time              

        -- -- --                       

 

 

 

 

        -- -- --                       

   SNPValBy_Illumina_Apr1003                                                                                   

The snp list is this table is now in SNPVal and used to set the validation_status for the snp.

 

 

 

        -- -- --                       

   SNPValBy_PERLEGEN_OCT0802

The snp list is this table is now in SNPVal and used to set the validation_status for the snp.

 

 

 

    StrandCode                         

    StrandCode                         

        rs_to_ss_orien                 

        -- -- --                       

See data in this table to see why this new column is needed.

 

 

 

    -- -- --                           

    sts_markers                        

 

 

 

    SubInd                             

    SubInd                             

Streamlined SubInd table by normalizing observed genotype into ObsGenotype table.

        -- -- --                       

        allele_cnt                     

 

        -- -- --                       

        a_bit                          

 

        -- -- --                       

        c_bit                           

 

        -- -- --                       

        g_bit                          

 

        -- -- --                       

        t_bit                          

 

        -- -- --                       

        del_bit                         

 

        -- -- --                       

        oth_bit                        

 

        -- -- --                       

        observed                       

 

        submitted_rs                   

        -- -- --                       

 

 

 

 

    SubIndDeletedBySubmitter           

    -- -- --                           

Added this table for internal tracking.

 

 

 

    SubPop                             

    SubPop                              

        subpop_id                      

        -- -- --                       

Added this column to shorten primary key, especially since we have added more tables referencing SubPop table, such as SubPopGty.

 

 

 

    SubPopAllele                       

    SubPopAllele                       

        subpop_id                      

        -- -- --                       

See above. In the future SubPopAllele will use this column to reference SubPop so batch_id, pop_id,subsnp_id,type will no be needed. These four columns are planed to be dropped in build 117.

 

 

 

    SubPopDeletedBySubmitter           

    SubPopDeletedBySubmitter           

Added this table for internal tracking.

        -- -- --                       

        build_id                       

 

        delete_build_id                

        -- -- --                       

 

 

 

 

    SubPopGty                          

    -- -- --                            

In the past, genotype frequency were stored in SubPopAllele. But submitted genotype frequency has different property from submitted allele frequency. So this table is created to keep just genotype frequency to faciliate computing, for ex, to derive allele frequency based on genotype frequency.

 

 

 

    SubPopHet                          

    -- -- --                           

This table is created, but not populated yet. We plan to move the heterozygosity frequency from SubPopAllele into this table in build 117.

 

 

 

    SubPopRsMerge                      

    -- -- --                           

Submitters may submit frequency on an rs that was already merged to another rs. This table keeps a record of this fact. Since we always stored frequency on the rs exemplar ss, the effect of this merging is that the frequency may be stored under a different ss#.

 

 

 

    SubSNP                             

    SubSNP                             

        -- -- --                       

        hw_prob                        

SubSNPHWProb is used for this values. Less than 5% subsnp has data that make the computation of Hardy-Weinberg probability possible. Storing this value in SubSNP table wastes space since SubSNP has over 10 millions rows.

        -- -- --                       

        subsnp_type                    

This was intended to indicate the withdraw detail of a subsnp. Now SubSNP table does not keep deleted snp. Deleted SubSNP is tracked in SubSNPDeletedBySubmitter.

        top_or_bot_strand              

        -- -- --                       

This is an experimental way of identifying strand.

        validation_status              

        -- -- --                       

Since build 111, we started to compute validation status from a view. But with data growth, it is no longer efficient to compute this value in each query.  Maintaining a table copy of view t_vwSubSNP_validation defeats the beauty of views. So we are returning this field back into SubSNP.

 

 

 

    SubSNPSeqPos                       

    -- -- --                           

In build 116, we received submission of snp with flanking sequence over 200Kb for snp's in repeat regions and the submitter already know the genome location of the snp. In this case, storing all flanking sequence does not provide value since the flanks will be repeat masked out anyway. So we save the accession and the snp position of the accession.

 

 

 

    SubSNPSeqTypeCode                  

    -- -- --                           

Explains the meaning of 'type' field in table SubSNPSeq3 and SubSNPSeq5.

 

 

 

    -- -- --                           

    t_vwSNP                            

 

 

 

    t_vwSubPopGty_alleleFreq           

    -- -- --                           

 

 

 

 

    -- -- --                           

    t_vwSubSNP_validation              

See note for SubSNP.validation_status.

 

 

 

 t_vwSubSNP_validationByMinorAlleleCnt 

                                       

 

 

 

 

    UniGty                             

    -- -- --                            

 

 

 

 

    UniVariation                       

    -- -- --                           

 

 

 

Build 115 Schema Change

 

Date: June 6, 2003

 

-              Dropped tables: dn_new_snp_batch, dn_new_pop_batch, dn_new_ind_batch, dn_new_nov_batch

These are internal table that are no longer needed.

-              Added tax_id to SubmittedIndividual table because some populations group individuals from different organism.

              

 

Build 114 Schema Change

Date:                            April 29, 2003

 

Table

Column

Note

ObsGenotype

 

This table keeps all submitted genotypes and submitted strand code in obs and submitted_strand_code columns.  Obs_upp is the all uppercase version of obs.  Obs_ss_fwd is the genotype on the forward strand of the SubSNP. Ex.  If genotype is submitted as A/C|STRAND_SS_REV, then obs has “A/C”, obs_ss_fwd has “G/T”.

SubSNPGenoSum is renamed to SubSNPGenotypeSum.

  • Freq_obs and genotype_obs are dropped.
  • Hw_prob is renamed to hwp
  • degreeFreedom is renamed to df
  • chi_sq is renamed to chisq

Source has value “I” or “P”:

“I” means the genotype frequency is summarized from individual genotypes.

“P” means the genotype frequency is submitted for a population.

SubSNPHWProb

  • Hw_prob is renamed to hwp
  • degreeFreedom is renamed to df
  • chi_sq is renamed to chisq

Source has same means as in SubSNPGenotypeSum.source

SubPopDeletedBySubmitter

 

Added this table to keep track deleted data, so summary statistics needs to be recomputed for the affected batch and refSNP cluster.

t_vwSNP

-        Dropped rarely used columns.

- Current columns are snp_id, validation_status, snp_class and tax_id, create_time ( is the time this row is inserted ).

 

Dn_new_snp_batch

Dn_new_pop_batch

Dn_new_ind_batch

Dn_new_nov_batch

 

dropped

Dn_new_batch

 

Added to replace dn_new_<batch_type>_batch, where <batch_type> has value of snp,pop,ind or nov.

 

 

Build 112 Schema Change

Date:                            April 2, 2003

 

Table

Column

Note

SubSNP

  • observed is replaced with ObsVariation.pattern. 
  • subsnp_class is replaced with ObsVariation.subsnp_class.
  • Dropped columns allele_cnt,a_bit,c_bit,g_bit,t_bit,del_bit,oth_bit,allele_bits

To get the observed variation and subsnp_class for a give subsnp, use ObsVariation.pattern, join SubSNP and ObsVariation on SubSNP.variation_id=ObsVariation.var_id

Batch

·        Organism is replaced with OrganismTax.organism.

Join Batch and OrganismTax on tax_id to get the organism of a batch.

 

Build 111 Schema Change

 

Date:                            March 23, 2003

 

There are a few major schema changes between build 110 and build 111. 

 

Tables/Columns in Build 111 that are replacing old tables/columns in build 110:

 

Table

Column

Note

t_vwSubSNP_validation

 

- replace SubSNP.validaton_status

 

 

validation_status

- t_vwSubSNP_validation is a direct copy from view vwSubSNP_validation.  Please see vwSubSNP_validation.sql for details.

SNPSeqPick

 

- This is a view. It replaces the old table with the same name and columns: SNPSeqPick.

snp_id,

subsnp_id

- see SNPSeqPick.sql for view details.

t_vwSNP

 

- replace the same columns in SNP.

 

PCR_confirmed_ind

assay_sample_size

cnt_subsnp

indv_homozygosity_obs_ind

num_segregate

popdata_sample_size

snp_class

tax_id

total_sample_size

validation_status

t_vwSNP is direct copy from view vwSNP. Please see vwSNP.sql for details on how each field is computed. 

SubPopAllele

cnt

cnt_int

- cnt is changed to type float. This will help eliminate rounding error in computing average allele frequency etc.

- cnt_int will be deleted after cnt is set to be number of chromosomes for allele frequency and number of individuals for genotype and heterozygosity data.

 

New table and columns:

 

Table

Column

Note

Further change in Build 112

SubSNP

variation_id

variation_id is foreign key to ObsVariation (var_id).

- SubSNP.observed will be replaced with ObsVariation.pattern.

- SubSNP.subsnp_class will replaced with ObsVariation.subsnp_class.

Motif

motif_id

motif,

rev_motif_id

This table keeps all unique motifs in an allele.

For ex. An observed variation is (CA)2/3/4/6.

Allele table will have

four rows: (CA)2, (CA)3, (CA)4, (CA)6. 

Each row

with the same motif_id, referring to one motif

record of  motif (CA).

-  rev_motif_id is id of the reverse motif. For ex. (CA)’s reverse motif is (TG). rev_motif_id is the motif_id for motif (TG).

 

Allele

allele_id,

allele

motif_id

repeat_cnt

- This table keeps all unique occurrence of an observed allele in a variation submission.

 

VariAllele

var_id,

allele_id

- This table associates a variation with 2 or more alleles.

 

ObsVariation

var_id,

pattern,

subsnp_class

- This table keeps all unique submitted observed field from the old SubSNP.observed field.  

 

ObsGenotype

gty_id,

var_id,

allele_id1,

allele_id2

For an observed variation, this table lists all possible genotypes.  For ex. If an observed variation is ‘A/C’, then there could possibly be three genotypes: A/A, A/C, C/C.

 

SubInd

gty_id

foreign key to ObsGenotype(gty_id)

- The following columns in SubInd will be replace with gty_id:

observed,

allele_cnt

a_bit,

c_bit

g_bit

t_bit

del_bit

oth_bit

 SubPopAllele

allele_id

foreign key to Allele(allele_id).

-The following columns in SubPopAllele will be replaced with allele_id:

allele, 

other

t_vwSNP_avgFreq

 

- This is a table directly copied from view vwSNP_avgFreq.

snp_id,

allele,

freq

 

- This information is previously only available on refSNP page.

ChiSqPValue

chi_sq

df

pvalue

This is a lookup table with chi-square probability distribution use by Hardy-weinberg probability computation.

 

 

 

 

Schema Change from builds before 111: