The table
that lists the schema change has three columns.
The first two columns show the build id. The third column has change
description.
Table names
are in bold. Only columns are indented and only columns with changes are shown
under the table.
Please see
data dictionary for full table details.
‘ -- -- --‘ means no
corresponding objects ( table or column) for the build.
For example,
in build 118, a new table “SNPAlleleFreq” is added.
Date last
updated: August 18, 2004
Build 121 |
Build 122 |
Note |
-- -- -- |
ChiSqPValueLookUp |
This table replaces ChiSqPValue. This is used to get the p-value from ChiSq when computing H-W Prob. |
|
|
|
dn_Allele_rev |
dn_Allele_rev |
This table is used
internally for processing. |
rev_flag |
rev_flag |
|
Datatype:bit |
Datatype:tinyint |
|
|
|
|
-- -- -- |
dn_Gty_rev |
Used for internal
processing. |
|
|
|
dn_snpFxnCnt |
dn_snpFxnCnt |
This is a summary table
showing count of snp in each functional class(intron, coding etc). |
-- -- -- |
tax_id |
Added tax_id
to this table since we start to have genomes for more organisms. |
|
|
|
dn_table_rowcount |
dn_table_rowcount |
This table lists the row
count of each table. When users re-create their own dbSNP,
they can use this table to check if they data were loaded OK. |
org_prefix |
-- -- -- |
org_prefix and tax_id is not used
yet. Currently, the table rowcount just the rowcount for a table. In the future when we have organism
division databases, we may have tables shared by multiple organism.
tax_id field will be used
to show row count per tax_id. |
-- -- -- |
tax_id |
|
|
|
|
FreqSummaryBySsPop |
FreqSummaryBySsPop |
Column type changes for
this table. Float to real to save storage space. |
source |
source |
|
Datatype:char(2) |
Datatype:varchar(1) |
We compute H-W prob from either individual genotype data(I),
or just genotype frequency (P). So we only need varcchar(1). |
|
|
|
chisq |
chisq |
|
Datatype:float |
Datatype:real |
|
Null Option:NOT
NULL |
Null Option:NULL |
|
|
|
|
df |
df |
|
Null Option:NOT
NULL |
Null Option:NULL |
|
|
|
|
hwp |
hwp |
|
Datatype:float |
Datatype:real |
|
Null Option:NOT
NULL |
Null Option:NULL |
|
|
|
|
-- -- -- |
ggaContigExon |
The new set of chicken
mapping table are added. They all have prefix 'gga'. Organism prefix is defined in OrganismTax.prefix column. |
|
|
|
-- -- -- |
ggaContigInfo |
|
|
|
|
-- -- -- |
ggaMapLink |
|
|
|
|
-- -- -- |
ggaMapLinkPid |
|
|
|
|
-- -- -- |
ggaMapList |
|
|
|
|
-- -- -- |
ggaSNPContigLoc |
|
|
|
|
-- -- -- |
ggaSNPContigLocusId |
|
|
|
|
-- -- -- |
ggaSNPMapInfo |
|
|
|
|
-- -- -- |
IndGrpCode |
Added this new table to
group individuals. See the data in IndGrpCode for a
list of groups. Currently, the grouping only applies to human. |
|
|
|
IndivBySource |
IndivBySource |
|
-- -- -- |
src_ind_grp |
This column stores
grouping supplied by sample source ,such as Coriell. |
|
|
|
Individual |
Individual |
|
-- -- -- |
ind_grp |
This column stores the
grouping code assigned by dbSNP. |
|
|
|
IndivSourceCode |
IndivSourceCode |
|
-- -- -- |
src_type |
Added this column to
distinguish various sources. For example, Coriell's
src_type is "repository" and CEPH is
"curator". |
|
|
|
-- -- -- |
display_order |
One individual often has
several id from different source. This value will be
used to order the display. |
|
|
|
-- -- -- |
IndMergeArch |
As we get more individual
sample information, we may see that previously two separated individuals are
actually the same. This table keeps individual merging history. This is not
used yet. |
|
|
|
|
|
|
miscMapLink |
miscMapLink |
|
-- -- -- |
i_tax_id |
For organisms with no NCBI
genome assembly, the hit to GenBank accessions are
stored in this table. Since there're multiple organisms, tax_id
was added. But since tax_id is in SNP table, we
could get tax_id by joining SNP and this table. So
this column may be unnecessary and be taken out in future schema
modifications. |
|
|
|
Publication |
Publication |
|
asn |
-- -- -- |
This field was not used. |
|
|
|
SNPHWProb |
SNPHWProb |
This table stays basically
the same except the column name listed below. The column name is not
consistent with table FreqSummaryBySsPop which has
H-W prob for each ss over an population. |
degreeFreedom |
df |
|
|
|
|
chi_sq |
chisq |
|
|
|
|
hw_prob |
hwp |
|
|
|
|
-- -- -- |
ind_cnt |
Added column ind_cnt for individual count. |
|
|
|
source |
-- -- -- |
This field is removed
because it is not as meaningful for an refSNP(rs) as the source in FreqSummaryBySsPop: some member ss
may have individual genotype while other may only have genotype frequency. |
|
|
|
SubmittedIndividual |
SubmittedIndividual |
|
-- -- -- |
loc_ind_grp |
loc_ind_grp will store the individual group name submitted by
submitter. |
|
|
|
-- -- -- |
UniVar_Valid_Allele |
This table is used
internally. |
Date last
updated: June 2, 2004
To keep dbSNP in manageable size,
we have partitioned four large tables by ss# and
moved the data with ss# less than 20 million to a
supplemental database called dbSNP_sup.
The four large tables that have been partitioned are:
SubSNPSeq3, SubSNPSeq5, SubSNPCommLine and SubSNPAcc.
For ss# between 1 and 10,000,000,
it is stored in table <tablename>_p1, where
<tablename> represents the above four tables.
For ss# between 10,000,001 and
20,000,000, it is stored in table <tablename>_p2
Currently, we do not have ss# over
30,000,000, when we do, we will create <tablename>_p3 for ss# between
20,000,001 and 30,000,000,
For example:
SubSNPSeq3_p1 has 3’ sequence for all ss#
between 1 and 10,000,000.
SubSNPSeq3_p2 has 3’ sequence for all ss#
between 10,000,001 and 20,000,000.
SubSNPSeq3_p1 and SubSNPSeq3_p1 still have the same column
definitions as the original SubSNPSeq3.
In dbSNP, for ss# over 20,000,000, we still store them in dbSNP, the main database. We renamed these four tables by adding a suffix “_ins” to note that we will be add new data into these tables as new submissions come in. The original names for these four tables are now partitioned views across dbSNP and dbSNP_sup having all ss#. For more details on partitioned view,
This change will be reflected in ftp bcp files. Using SubSNPSeq3 as an example, instead of having a huge file for SubSNPSeq3 prior to this buid, we now have bcp file for SubSNPSeq_p1, SubSNPSeq_p2 and SubSNPSeq_ins. Between builds, sequence for old snp seldom change (they only change when submitter updates them which is rare). We will have separate small update file for SubSNPSeq3_p1 and SubSNPSeq3_p2. So for each new build, users only need to get the new bcp file from SubSNPSeq3_ins.
Date last
updated: April 24, 2004
Note: ER diagram and Data dictionary for build 120
will be updated on soon.
Build 119 |
Build 120 |
Note |
agContigInfo |
agContigInfo |
Please see change note in
"Build 117 Schema Change - ContigInfo" in
this document. |
-- -- -- |
group_term |
|
-- -- -- |
group_label |
|
-- -- -- |
contig_label |
|
|
|
|
agSNPMapInfo |
agSNPMapInfo |
Please see change note in
"Build 117 Schema Change-SNPMapInfo" in
this document. |
-- -- -- |
hap_cnt |
|
-- -- -- |
unplaced_cnt |
|
-- -- -- |
ref_cnt |
|
-- -- -- |
assembly_cnt |
|
|
|
|
-- -- -- |
dn_table_rowcount |
Added this table to simply
store the row count for each table. So users who create local dbSNP can compare the counts. |
|
|
|
-- -- -- |
dn_validationSummary |
An internal validation
table. |
|
|
|
Individual |
Individual |
SubmittedIndividual has tax_id due to
annotation process. It is possible several submitted individuals are from the
same sample. This curated sample data is in
Individual table. So to normalize data, tax_id should be in Individual table. Currently, we have tax_id in both SubmittedIndividual
and Individual tables. Eventually, we will only have tax_id
in Individual table. |
-- -- -- |
tax_id |
|
|
|
|
SNPHapMapPriority |
SNPHapMapPriority |
|
priority_code |
priority_code |
|
Datatype:tinyint
|
Datatype:smallint |
HapMapClassCode has 9 codes, so priority_code
needs to be a smallint. |
Date last
updated: Jan. 30, 2004
Note: ER diagram and Data dictionary for build 119
will be updated on soon.
Please see
most recent data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b118.html
Build 118 |
Build 119 |
Note |
|
|
|
-- -- -- |
dgContigInfo |
Added 'dg' prefixed tables
for dog snp mapping data. |
-- -- -- |
dgSNPContigLoc |
|
-- -- -- |
dgSNPContigLocusId |
|
-- -- -- |
dgSNPMapInfo |
|
|
|
|
-- -- -- |
dn_chr_locusList |
This table is a de-normalized
table, storing list of locus_id for each contig chromosomes. This is used in individual genotype
summary page. |
|
|
|
-- -- -- |
dn_gty_rsCnt_byChr |
This table is a de-normalized
table, storing genotyped snp count for each
individual on chromosomes. |
|
|
|
-- -- -- |
dn_gty_rsCnt_byLocus |
This table is a
de-normalized table, storing genotyped snp count
for each individual on a locus. |
|
|
|
MethodClass |
MethodClass |
|
|
|
|
-- -- -- |
validation_status |
Added validation_status.
Value 0 means the method is computational. 1 - method
can be considered validating when multiple ss are in the same rs
cluster. |
|
|
|
|
|
|
SNPContigLocusId |
SNPContigLocusId |
Added mrna_ver
and protein_ver. Contig_acc,
mrna_acc and protein_acc
fields are changed to varcahr(15) to be consistent with accession fields in other
tables such as SNPContigLoc. The following int
field are changed to tinyint:
contig_ver, fxn_class, reading_frame to save space. |
|
|
|
SubmittedIndividual |
SubmittedIndividual |
|
-- -- -- |
loc_ind_id |
Submitter individual id
can be case sensitive. For ex. Mouse strain name use case in special ways.
This field is needed to preserve case information. |
Date last updated: Nov. 14, 2003
Please see data dictionary
at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b118.html
Build 117 |
Build 118 |
Note |
|
|
|
-- -- -- |
dgMapLink |
Added mapping data for
dog. |
|
|
|
-- -- -- |
dgMapLinkPid |
Added mapping data for
dog. |
|
|
|
dn_batchCount |
dn_batchCount |
|
build_id |
-- -- -- |
We will keep snp counts in each batch for current batch only. So there
is no need for this column. |
|
|
|
dn_handleCount |
dn_handleCount |
|
build_id |
-- -- -- |
We will keep snp counts in each batch for current batch only. So there
is no need for this column. |
|
|
|
dn_IND_batchCount |
dn_IND_batchCount |
|
build_id |
-- -- -- |
We will keep snp counts in each batch for current batch only. So there
is no need for this column. |
|
|
|
SubSNPGenotypeSum |
FreqSummaryBySsPop |
This table is renamed to be
consistent with the two tables used to compute this summary: GtyFreqBySsPop and GtyFreqBySsPop.
The new name clearly shows that the data ins for each SubSNP
within a specific population. |
source |
source |
|
Datatype:char(1) |
Datatype:char(2) |
Use char(2): GF - source
data is genotype frequency, IG - source data is individual genotype, AF -
source data is allele frequency. |
non_founder_cnt |
non_founder_ind_cnt |
Renamed so it is clear
that the number in this column count of individuals, not chromosomes. |
-- -- -- |
het |
heterozygosity |
-- -- -- |
het_se |
standard error for the
above heterozygosity. |
|
|
|
mmContigInfo |
mmContigInfo |
The changes in this table
is the same in ContigInfo. Please see change
description for ContigInfo in build 117. |
contig_class |
-- -- -- |
|
-- -- -- |
group_term |
|
-- -- -- |
group_label |
|
-- -- -- |
contig_label |
|
|
|
|
mmSNPContigLocusId |
mmSNPContigLocusId |
|
-- -- -- |
mrna_ver |
Added this to keep track
the version of the mrna accession, like the contig_ver. |
-- -- -- |
protein_ver |
Added this to keep track
the version of a protein accession, like the contig_ver. |
|
|
|
mmSNPMapInfo |
mmSNPMapInfo |
The changes in this table
are the same in ContigInfo. Please see change
description for ContigInfo in build 117. |
-- -- -- |
ref_cnt |
|
-- -- -- |
assembly_cnt |
|
|
|
|
-- -- -- |
SNPAlleleFreq |
This new table keeps the
average allele frequency for refSNP. |
|
|
|
-- -- -- |
SNPGtyFreq |
This new table keeps the
average genotype freqeuncy for refSNP. |
|
|
|
SubmittedIndividual |
SubmittedIndividual |
|
-- -- -- |
loc_ind_alias |
Added this column to track
the different id for a submitted individual for genotype data. |
|
|
|
t_vwSNP_avgFreq |
-- -- -- |
This is replaced with SNPAlleleFreq. |
|
|
|
t_vwSNP_MinorAlleleCnt2More |
-- -- -- |
This table is no longer needed. |
|
|
|
t_vwSubPopGty_alleleFreq |
-- -- -- |
This table is replaced
with GtyFreqBySsPop. |
|
|
|
t_vwSubSNP_validationByMinorAlleleCnt |
-- -- -- |
This table is no longer
needed. |
|
|
|
Date last updated:
Note: Data dictionary and ER diagram for build 117 will be available on Oct.17, 2003.
Build 116 |
Build 117 |
Note |
|
|
|
Allele |
Allele
|
Removed 3 columns in
Allele table because an allele may have many motifs (ex. (GAT)10AAT(GAT)5).
So keep allele motif relationship in enw table AlleleMotif. |
motif_id |
-- -- -- |
|
repeat_cnt |
-- -- -- |
|
rev_allele |
-- -- -- |
|
-- -- -- |
last_updated_time |
|
|
|
|
-- -- -- |
|
Added this table to save
computed allele frequency per subsnp and
population. The frequency computed from individual genotype takes the highest
precedence which then takes precedence over the allele frequency computed
from genotype frequency. When none of the above data is available, we use the
submitted allele frequency. |
|
|
|
-- -- --
|
AlleleMotif |
Added this table to keep
the relationship between an allele and all its motif. |
|
|
|
ContigInfo |
contig_class was used to track the haplotype
data of humans and the strain data of mouse for each of the various contigs. But it became clear that different ncbi assembly tasks required different ways of grouping
the alternatives so the single category contig_class
was broken out into three fields. For dbSNP, the group_term gives the best breakdown of the various
alternative forms, but contig_label and/or group_label give a better idea of the common name
associated with haplotype, strain, alternate_assembly, etc. |
|
contig_class |
-- -- -- |
|
-- -- -- |
group_term |
id used to group contigs together for annotation purposes. This allows |
-- -- -- |
group_label |
Label used to label an
assembly in the MapViewer. |
-- -- -- |
contig_label |
Label used to label a
particular contig in an assembly. For |
|
|
|
-- -- -- |
GtyFreqBySsPop |
Added this table to save computed
genotype frequency per subsnp and population. The
frequency computed from individual genotype takes the precedence over
submitted genotype frequency. |
|
|
|
ObsVariation |
ObsVariation |
|
-- -- -- |
var_flag |
refers to VarFlagCode.code |
|
|
|
PopClassUniqCount |
-- -- -- |
Not used. |
|
|
|
--
-- -- |
rnContigExon |
Added mapping information
for rat, with prefix 'rn' added to mapping tables.
This note also applies to the following 7 tables. |
|
|
|
-- -- -- |
rnContigInfo |
|
|
|
|
-- -- -- |
rnMapLink |
|
|
|
|
-- -- -- |
rnMapLinkPid |
|
|
|
|
-- -- -- |
rnMapList |
|
|
|
|
-- -- -- |
rnSNPContigLoc |
|
|
|
|
-- -- -- |
rnSNPContigLocusId |
|
|
|
|
-- -- -- |
rnSNPMapInfo |
|
|
|
|
SnpFunctionCode |
SnpFunctionCode |
|
-- -- -- |
is_coding |
Added this column for
internal query. |
|
|
|
SNPMapInfo |
SNPMapInfo |
|
-- -- -- |
ref_cnt |
set greater than zero if
the snp maps onto the reference strand. |
-- -- -- |
assembly_cnt |
Counts the number of alternate
assemblies (excluding reference) which the snp
hits. |
|
|
|
sts_markers |
-- -- -- |
|
|
|
|
SubPop |
SubPop |
Change some float column
to real to save storage space. |
|
|
|
-- -- -- |
ValidGtyByVariation |
Internal use currently. |
|
|
|
-- -- -- |
VarFlagCode |
Some submitted variation
pattern is problematic. Define codes to track problems. |
Date:
The following details all schema changes between build 116 and 115, sorted by table names.
Table names are in bold. Only columns are indented and only columns with changes are shown under the table. See data dictionary for full table details.
‘ -- -- --‘ means no corresponding objects ( table or
column) for the build.
For example, in build 116, pop_cnt is added to table dn_batchCount.
Please see ER diagram for
more details: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/erd_dbSNP.pdf
Please see data dictionary
at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b116.html
Build 116 |
Build 115 |
Note |
dn_batchCount |
dn_batchCount |
Count to speed up query. |
pop_cnt |
-- -- -- |
For frequency and genotype batch, added count
of population and count of individual. |
ind_cnt |
-- -- -- |
|
|
|
|
dn_handleCount |
dn_handleCount |
|
rs_cnt |
rs_cnt |
Make the column nullable, so 'NULL' means not computed yet, while '0'
means the result. |
Null Option:NULL |
Null Option:NOT
NULL |
|
rs_validated_cnt |
rs_validated_cnt |
|
Null Option:NULL |
Null Option:NOT
NULL |
|
|
|
|
dn_IND_batchCount |
-- -- -- |
This count table helps
speed up "New Batches" web query. |
|
|
|
-- -- -- |
dn_new_batch |
dn_new_xxx_batch tables are no longer needed. It was used
internally to support query most recent batches. Now web page uses Batch.submit_time to decide submission order. |
-- -- -- |
dn_new_ind_batch |
|
|
|
|
-- -- -- |
dn_new_nov_batch |
|
|
|
|
-- -- -- |
dn_new_pop_batch |
|
|
|
|
-- -- -- |
dn_new_snp_batch |
|
|
|
|
GtyAllele |
GtyAllele |
The new table makes it
easy to compute allele frequency from genotype submission on forward or
reverse strand. |
-- -- -- |
allele_id |
|
rev_flag |
-- -- -- |
|
fwd_allele_id |
-- -- -- |
|
unigty_id |
-- -- -- |
|
create_time |
create_time |
|
Null Option:NOT
NULL |
Null Option: |
|
Default:-- -- -- |
Default:DefCurTime |
|
last_updated_time |
-- -- -- |
|
|
|
|
IndivAlias |
-- -- -- |
Genotype submission includes
individual id. Different submitters sometimes use different individual id for
the same DNA sample. Ex. Mouse strain
'AKR' or 'AKR/J' are the same. |
|
|
|
IndivBySource |
IndivBySource |
Minor database change |
src_id |
src_id |
|
Datatype:int |
Datatype:numeric(3) |
|
|
|
|
IndivSourceCode |
IndivSourceCode |
Minor database change |
code |
code |
|
Datatype:int |
Datatype:numeric(3) |
|
|
|
|
LoadHistory |
LoadHistory |
|
-- -- -- |
create_time |
Each build releases
different component in slight different time.
Now each individual component release time is stored here. |
ftp_done_date |
-- -- -- |
|
entrez_done_date |
-- -- -- |
|
blastdb_done_date |
-- -- -- |
|
web_date |
-- -- -- |
|
|
|
|
mmSNPMapInfo |
mmSNPMapInfo |
See data dictionary. |
hap_cnt |
-- -- -- |
|
unplaced_cnt |
-- -- -- |
|
|
|
|
ObsGenotype |
ObsGenotype |
See data dictionary. |
-- -- -- |
submitted_strand_code |
|
-- -- -- |
obs_ss_fwd |
|
-- -- -- |
create_time |
|
-- -- -- |
unigty_id |
|
-- -- -- |
allele_id_1 |
|
-- -- -- |
allele_id_2 |
|
|
|
|
ObsVariation |
ObsVariation |
subsnp_clas and iupack_code are
moved to table UniVariation. |
-- -- -- |
subsnp_class |
|
univar_id |
univar_id |
|
Null Option:NULL |
Null Option:NOT
NULL |
|
-- -- -- |
iupack_code |
|
|
|
|
OrganismTax |
OrganismTax |
Added division_cd
in preparation to split dbSNP database into several
divisions. Divisions are loosely based on GenBank
divisions. |
division_cd |
-- -- -- |
|
Population |
Population |
When curating
submitted individuals, it helps to know the source of the population
samples. So src_id
is added. For ex. If we know a population's source is NIHPDR, then we know
'1' or 'P001' or 'PD001' is the same sample in NIHPDR panel. We currently
manually set this field based on population description. |
src_id |
-- -- -- |
|
|
|
|
SNPContigLocusId |
SNPContigLocusId |
|
fxn_class |
fxn_class |
data type changed from int to tinyint to save storage. |
reading_frame |
reading_frame |
same as above. |
contig_ver |
contig_ver |
same as above. |
|
|
|
SNP |
SNP |
|
-- -- -- |
snp_type |
snp_type was used to record snp
withdraw reason code defined in SnpTypes. We
currently do not keep withdrawn snp in SNP table.
Withdrawn snp is in SNPHistory.
|
tax_id |
-- -- -- |
tax_id, snp_class and validation_status was orignianlly
in SNP table. From build 111 to build 115, they were moved to t_vwSNP to faciliate database
processing. But during database size increase in build 116, views proved too
slow to merit the separating these three fields. So starting build 116, these
three fields are back in SNP table. |
snp_class |
-- -- -- |
|
validation_status |
-- -- -- |
|
|
|
|
|
|
|
SnpFunctionCode |
SnpFunctionCode |
|
top_level_class |
-- -- -- |
Added this field to group
the functional class code into "exon" and
"other". |
|
|
|
SNPHWProb |
SNPHWProb |
|
chi_sq |
chi_sq |
float' takes 8 byte
storage. 'Real' takes 4 byte. In dbSNP, 'real'
offers enough precision. |
Datatype:real |
Datatype:float |
|
hw_prob |
hw_prob |
|
Datatype:real |
Datatype:float |
|
|
|
|
SNPMapInfo |
SNPMapInfo |
|
hap_cnt |
-- -- -- |
|
unplaced_cnt |
-- -- -- |
|
|
|
|
|
|
|
SnpValidationCode |
SnpValidationCode |
|
abbrev |
abbrev |
|
Datatype:varchar(40) |
Datatype:varchar(20) |
With the addition of
double hit validation type, the abbreviation string gets longer. |
create_time |
-- -- -- |
|
last_updated_time |
-- -- -- |
|
|
|
|
-- -- -- |
SNPValBy_Illumina_Apr1003
|
The snp
list is this table is now in SNPVal and used to set
the validation_status for the snp. |
|
|
|
-- -- -- |
SNPValBy_PERLEGEN_OCT0802 |
The snp
list is this table is now in SNPVal and used to set
the validation_status for the snp. |
|
|
|
StrandCode |
StrandCode |
|
rs_to_ss_orien |
-- -- -- |
See data in this table to
see why this new column is needed. |
|
|
|
-- -- -- |
sts_markers |
|
|
|
|
SubInd |
SubInd |
Streamlined SubInd table by normalizing observed genotype into ObsGenotype table. |
-- -- -- |
allele_cnt |
|
-- -- -- |
a_bit |
|
-- -- -- |
c_bit |
|
-- -- -- |
g_bit |
|
-- -- -- |
t_bit |
|
-- -- -- |
del_bit |
|
-- -- -- |
oth_bit |
|
-- -- -- |
observed |
|
submitted_rs |
-- -- -- |
|
|
|
|
SubIndDeletedBySubmitter |
-- -- -- |
Added this table for
internal tracking. |
|
|
|
SubPop |
SubPop |
|
subpop_id |
-- -- -- |
Added this column to
shorten primary key, especially since we have added more tables referencing SubPop table, such as SubPopGty. |
|
|
|
SubPopAllele |
SubPopAllele |
|
subpop_id |
-- -- -- |
See above. In the future SubPopAllele will use this column to reference SubPop so batch_id, pop_id,subsnp_id,type will no be needed. These four
columns are planed to be dropped in build 117. |
|
|
|
SubPopDeletedBySubmitter |
SubPopDeletedBySubmitter |
Added this table for
internal tracking. |
-- -- -- |
build_id |
|
delete_build_id |
-- -- -- |
|
|
|
|
SubPopGty |
-- -- -- |
In the past, genotype
frequency were stored in SubPopAllele. But
submitted genotype frequency has different property from submitted allele
frequency. So this table is created to keep just genotype frequency to faciliate computing, for ex, to derive allele frequency
based on genotype frequency. |
|
|
|
SubPopHet |
-- -- -- |
This table is created, but
not populated yet. We plan to move the heterozygosity
frequency from SubPopAllele into this table in
build 117. |
|
|
|
SubPopRsMerge |
-- -- -- |
Submitters may submit
frequency on an rs that was already merged to
another rs. This table keeps a record of this fact.
Since we always stored frequency on the rs exemplar
ss, the effect of this merging is that the
frequency may be stored under a different ss#. |
|
|
|
SubSNP |
SubSNP |
|
-- -- -- |
hw_prob |
SubSNPHWProb is used for this values. Less than 5% subsnp has data that make the computation of
Hardy-Weinberg probability possible. Storing this value in SubSNP table wastes space since SubSNP
has over 10 millions rows. |
-- -- -- |
subsnp_type |
This was intended to
indicate the withdraw detail of a subsnp. Now SubSNP table does not keep deleted snp.
Deleted SubSNP is tracked in SubSNPDeletedBySubmitter. |
top_or_bot_strand |
-- -- -- |
This is an experimental
way of identifying strand. |
validation_status |
-- -- -- |
Since build 111, we
started to compute validation status from a view. But with data growth, it is
no longer efficient to compute this value in each query. Maintaining a table copy of view t_vwSubSNP_validation defeats the beauty of views. So we
are returning this field back into SubSNP. |
|
|
|
SubSNPSeqPos |
-- -- -- |
In build 116, we received submission
of snp with flanking sequence over 200Kb for snp's in repeat regions and the submitter already know
the genome location of the snp. In this case,
storing all flanking sequence does not provide value since the flanks will be
repeat masked out anyway. So we save the accession and the snp position of the accession. |
|
|
|
SubSNPSeqTypeCode |
-- -- -- |
Explains the meaning of
'type' field in table SubSNPSeq3 and SubSNPSeq5. |
|
|
|
-- -- -- |
t_vwSNP |
|
|
|
|
t_vwSubPopGty_alleleFreq |
-- -- -- |
|
|
|
|
-- -- -- |
t_vwSubSNP_validation |
See note for SubSNP.validation_status. |
|
|
|
t_vwSubSNP_validationByMinorAlleleCnt |
|
|
|
|
|
UniGty |
-- -- -- |
|
|
|
|
UniVariation |
-- -- -- |
|
Date:
- Dropped tables: dn_new_snp_batch, dn_new_pop_batch, dn_new_ind_batch, dn_new_nov_batch
These are internal table that are no longer needed.
- Added tax_id to SubmittedIndividual table because some populations group individuals from different organism.
Date:
Table
|
Column
|
Note
|
ObsGenotype
|
|
This table keeps all submitted genotypes and submitted strand code in obs and submitted_strand_code
columns. Obs_upp
is the all uppercase version of obs. Obs_ss_fwd is the
genotype on the forward strand of the SubSNP.
Ex. If genotype is submitted as A/C|STRAND_SS_REV,
then obs has “A/C”, obs_ss_fwd
has “G/T”.
|
SubSNPGenoSum is renamed to SubSNPGenotypeSum.
|
|
Source has value “I” or “P”:
“I” means the genotype frequency is summarized from individual genotypes. “P” means the genotype frequency is submitted for a population. |
SubSNPHWProb
|
|
Source has same means as in SubSNPGenotypeSum.source
|
SubPopDeletedBySubmitter
|
|
Added this table to keep track deleted data, so summary statistics
needs to be recomputed for the affected batch and refSNP
cluster.
|
t_vwSNP
|
-
Dropped rarely used columns.
- Current columns are snp_id, validation_status, snp_class and tax_id, create_time ( is the time this row is inserted ). |
|
Dn_new_snp_batch
Dn_new_pop_batch Dn_new_ind_batch Dn_new_nov_batch |
|
dropped
|
Dn_new_batch
|
|
Added to replace dn_new_<batch_type>_batch, where <batch_type>
has value of snp,pop,ind or nov.
|
Date:
Table
|
Column
|
Note
|
SubSNP
|
|
To get the observed variation and subsnp_class
for a give subsnp, use ObsVariation.pattern,
join SubSNP and ObsVariation
on SubSNP.variation_id=ObsVariation.var_id
|
Batch
|
·
Organism is replaced with OrganismTax.organism.
|
Join Batch and OrganismTax on tax_id to get the organism of a batch.
|
Date:
There are a few major schema changes between build 110 and build 111.
Table |
Column |
Note |
t_vwSubSNP_validation - replace SubSNP.validaton_status |
validation_status |
- t_vwSubSNP_validation is a direct copy from view vwSubSNP_validation. Please see vwSubSNP_validation.sql for details. |
SNPSeqPick - This is a view. It replaces
the old table with the same name and columns: SNPSeqPick. |
snp_id, subsnp_id |
- see SNPSeqPick.sql for view details. |
t_vwSNP - replace the same columns in
SNP. |
PCR_confirmed_ind assay_sample_size cnt_subsnp indv_homozygosity_obs_ind num_segregate popdata_sample_size snp_class tax_id total_sample_size validation_status |
t_vwSNP is direct copy from view vwSNP. Please see vwSNP.sql for details on how each field is computed. |
SubPopAllele |
cnt cnt_int |
- cnt is changed to type float. This will help eliminate rounding error in computing average allele frequency etc. - cnt_int will be deleted after cnt is set to be number of chromosomes for allele frequency and number of individuals for genotype and heterozygosity data. |
Table |
Column |
Note |
Further change in Build 112 |
SubSNP |
variation_id |
variation_id is foreign key to ObsVariation (var_id). |
- SubSNP.observed will be replaced with ObsVariation.pattern. - SubSNP.subsnp_class will replaced with ObsVariation.subsnp_class. |
Motif |
motif_id motif, rev_motif_id |
This table keeps all unique motifs in an allele. For ex. An observed variation is (CA)2/3/4/6. Allele table will have four rows: (CA)2, (CA)3, (CA)4, (CA)6. Each row with the same motif_id, referring to one motif record of motif (CA). - rev_motif_id is id of the reverse motif. For ex. (CA)’s reverse motif is (TG). rev_motif_id is the motif_id for motif (TG). |
|
Allele |
allele_id, allele motif_id repeat_cnt |
- This table keeps all unique occurrence of an observed allele in a variation submission. |
|
VariAllele |
var_id, allele_id |
- This table associates a variation with 2 or more alleles. |
|
ObsVariation |
var_id, pattern, subsnp_class |
- This table keeps all unique submitted observed field from the old SubSNP.observed field. |
|
ObsGenotype |
gty_id, var_id, allele_id1, allele_id2 |
For an observed variation, this table lists all possible genotypes. For ex. If an observed variation is ‘A/C’, then there could possibly be three genotypes: A/A, A/C, C/C. |
|
SubInd |
gty_id |
foreign key to ObsGenotype(gty_id) |
- The following columns in SubInd will be replace with gty_id: observed, allele_cnt a_bit, c_bit g_bit t_bit del_bit oth_bit |
SubPopAllele |
allele_id |
foreign key to Allele(allele_id). |
-The following columns in SubPopAllele will be replaced with allele_id: allele, other |
t_vwSNP_avgFreq - This is a table directly copied from view vwSNP_avgFreq. |
snp_id, allele, freq |
|
- This information is previously only available on refSNP page. |
ChiSqPValue |
chi_sq df pvalue |
This is a lookup table with chi-square probability distribution use by Hardy-weinberg probability computation. |
|