Derived Data - or
More Investments in Science Lead to More Science
John Rumble, Jr.
National Institute of Standards and Technology, USA
President, CODATA
Derived Data
Derived data - data collections that arise from analysis of more primitive data collections
Derived Data
Today’s Talk
Examples
General Features
Looking into the future
Derived Data - Examples
Protein structure
Proteins - sequences of amino acids
Structure and function determined by sequences
Can compare protein sequences to
other proteins
DNA
individual genes
interacting biomolecules
Derived Data - Examples
Protein structure
Protein Data Bank Macromolecular Structure Database (Rutgers, NIST, UC San Diego)
Important to include linkages to exact or almost exact sequences in other
molecules
Can be very computer intensive
Easiest to maintain separate linked databases with sequences and neighbors
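A minimal sketch of the "separate linked databases" idea, assuming an in-memory SQLite store. The table names (sequences, neighbors), the helper names and the position-by-position identity score are all hypothetical choices for illustration; real sequence comparison uses alignment methods and is far more computer intensive.

```python
import sqlite3


def percent_identity(a, b):
    """Toy similarity: percentage of matching positions in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy metric needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def build_linked_db(sequences, threshold):
    """Store sequences in one table and precomputed neighbor links in another.

    `sequences` maps a (hypothetical) identifier to its residue string.
    Pairs scoring at least `threshold` percent identity are recorded as neighbors.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, residues TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE neighbors ("
        " id_a TEXT REFERENCES sequences(id),"
        " id_b TEXT REFERENCES sequences(id),"
        " identity REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", list(sequences.items()))
    ids = sorted(sequences)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = percent_identity(sequences[a], sequences[b])
            if score >= threshold:
                conn.execute("INSERT INTO neighbors VALUES (?, ?, ?)", (a, b, score))
    conn.commit()
    return conn


def neighbors_of(conn, seq_id):
    """Return (other_id, identity) pairs linked to seq_id, best match first."""
    rows = conn.execute(
        "SELECT id_a, id_b, identity FROM neighbors"
        " WHERE id_a = ? OR id_b = ? ORDER BY identity DESC",
        (seq_id, seq_id),
    ).fetchall()
    return [(b if a == seq_id else a, identity) for a, b, identity in rows]
```

Keeping the neighbor links in their own table means the expensive pairwise comparison runs once, and later queries only follow the stored links.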
Derived Data - Examples
Electron transport in gases
Derive microscopic electron-atom and electron-molecule interactions from macroscopic measurements
Use kinetic theory to link the two
Need comprehensive data collections for both microscopic and macroscopic
measurements
Derived Data - Examples
Electron transport in gases
Important for electrical industry, lighting, lasers
Can also calculate microscopic data via quantum mechanics
Self-consistent data very difficult to achieve
Especially difficult for gas mixtures
Derived Data - Examples
Properties of engineering materials
- Design values based on statistical analysis
Mandatory values used to design critical components
- Design limiting properties
Indicative value used for preliminary materials selection
- Both based on many individual measurements that give range and dispersion of possible measurements
- Derived totally differently
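A rough sketch of how two different derived values can come from the same set of individual measurements. The function name, the choice of the median as the indicative value, and the rule "mean minus k standard deviations" with k = 3 are assumptions made only for illustration; actual design values follow the statistical procedures laid down by the relevant design code.

```python
import statistics


def summarize_measurements(values, k=3.0):
    """Derive an indicative value and a conservative design value from raw measurements.

    k is a hypothetical safety multiplier, not a value taken from any design code.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements to estimate dispersion")
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {
        "n": len(values),
        "min": min(values),             # range of the measurements
        "max": max(values),
        "mean": mean,
        "stdev": stdev,                 # dispersion of the measurements
        "indicative": statistics.median(values),  # typical value for preliminary selection
        "design": mean - k * stdev,     # conservative lower bound for critical components
    }
```

The point of the sketch is only that the two numbers answer different questions: one describes what is typical, the other what can be relied on.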
Derived Data
Today’s Talk
Examples
General Features
Looking into the future
Derived Data - General Features
Derived data
Based on data collections that are
comprehensive
high quality
often built for other uses
have complete, computerized metadata
Derived Data - General Features
Derived data often produced separately by another group
Often based on physical theory unimportant or unrelated to initial data collection
Often disseminated totally separately from initial data collection
Can be intimately linked to specific application
Derived Data - General Features
- Modern database management facilitates generating derived data
- Imposes data uniformity, makes metadata linkage easier, separates out independent
variables
Derived Data
Today’s Talk
Examples
General Features
Looking into the future
Derived Data - The Future
- Larger and more comprehensive databases becoming available
- More measurements made more easily - crystal structure, microscopic measurements, advanced instruments, earth observations
- Data recording and exchange standards slowly progressing
- Data quality improving - evaluation progressing and being computerized
Derived Data - The Future
New data derivation techniques emerging
Knowledge discovery, neural networks, data mining, property object models,
expert systems, others
New statistical and mathematical approaches
Linking data collections from different disciplines
solar exposure and materials degradation
health records with toxic substance disposal
climate records with evolution records
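Linking collections from different disciplines usually comes down to finding shared keys, such as place and time. The sketch below assumes each collection is a list of records and that both carry the same key fields; the function name, the field names and the idea of joining on site and year are hypothetical.

```python
def link_collections(left, right, keys):
    """Join two lists of records on shared key fields (a simple inner join).

    Each record is a dict; `keys` names the fields both collections share.
    Fields from `left` win if the same non-key field appears in both.
    """
    index = {}
    for record in right:
        index.setdefault(tuple(record[k] for k in keys), []).append(record)

    linked = []
    for record in left:
        for match in index.get(tuple(record[k] for k in keys), []):
            merged = dict(match)
            merged.update(record)
            linked.append(merged)
    return linked
```

For example, solar exposure records and materials degradation records that both carry "site" and "year" fields could be linked with link_collections(exposure, degradation, ["site", "year"]); records without a partner in the other collection are dropped.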
Derived Data - The Future
What we have evolving is a new source of scientific discovery
- large scale databases as the focal point of research
Derived Data - The Future
- Obviously data collections have supported research since its beginning
- New research paradigms are emerging
- Information- and data-based research yielding new insights, perhaps ones that could be discovered only poorly without modern computers
- Perhaps can compare to discovery of Greek, Latin and Arabic texts at start of the Renaissance
Derived Data - The Future
- The Internet adds a new dimension
- Access without travel
- Not only collaboration across distance, but discovery of unknown resources
- Remember: only 50 years into the computer revolution and less than ten into the Internet revolution
Derived Data - The Future
- Another revolution - full text online
- Impossible to gauge impact so close to the start
- Access to thought as well as information and data
- If we could only capture all the existing record
400 years of scientific publications, 2000 years of observations
- New monitors, electronic books, automated scanning
Derived Data - The Future
- Scientific progress is based on observation, and computerized databases
expand our field of observation
- It won’t go away - It will just get better
- A challenge to CODATA to be a leader