Derived Data - or
More Investment in Science Leads to More Science

John Rumble, Jr.

National Institute of Standards and Technology, USA

President, CODATA

Derived Data


Derived data - data collections that arise from analysis of more primitive data collections

Derived Data

Today’s Talk
  • Examples
  • General Features
  • Looking into the future

    Derived Data - Examples

    Protein structure

  • Proteins - sequences of amino acids
  • Structure and function determined by sequences
  • Can compare protein sequences to
      • other proteins
      • individual genes
      • interacting biomolecules

    Derived Data - Examples

    Protein structure

  • Protein Data Bank Macromolecular Structure Database (Rutgers, NIST, UC San Diego)
  • Important to include linkages to exact or almost exact sequences in other molecules
  • Can be very computer intensive
  • Easiest to maintain separate linked databases with sequences and neighbors
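
The idea of a separate, linked neighbor database can be sketched as follows. This is a minimal illustration only: the identifiers and sequences are made up, the 0.8 threshold is an arbitrary assumption, and Python's generic `difflib` string matcher stands in for a real sequence-alignment method, which the slides do not name.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical primary collection: identifier -> amino-acid sequence.
sequences = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # exact duplicate of P1
    "P3": "MKTAYLAKQR",  # one substitution relative to P1
    "P4": "GSHMLEDPVV",  # unrelated sequence
}

def similarity(a, b):
    """Generic string similarity in [0, 1]; a stand-in for real alignment."""
    return SequenceMatcher(None, a, b).ratio()

def derive_neighbors(seqs, threshold=0.8):
    """Build the derived table: pairs of identifiers with near-identical sequences."""
    neighbors = []
    # All pairs are compared, which is why this step can be computer intensive.
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(seqs.items()), 2):
        score = similarity(seq_a, seq_b)
        if score >= threshold:
            neighbors.append((id_a, id_b, score))
    return neighbors

# The derived table refers to the primary one only through identifiers,
# so the two collections can be maintained separately.
neighbors = derive_neighbors(sequences)
for id_a, id_b, score in neighbors:
    print(id_a, id_b, round(score, 2))
```

Keeping only identifiers in the neighbor table means it can be recomputed whenever the sequence collection grows, without touching the primary records.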

    Derived Data - Examples

    Electron transport in gases

  • Derive microscopic electron-atom and electron-molecule interactions from macroscopic measurements
  • Use kinetic theory to link the two
  • Need comprehensive data collections for both microscopic and macroscopic measurements
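
As one illustration of how kinetic theory links the two levels, the relation below gives the electron drift velocity in the commonly used two-term approximation to the Boltzmann equation. The slides do not state which formulation is meant, so the approximation and the symbols are assumptions: w is the drift velocity (macroscopic), E/N the reduced electric field, σ_m the momentum-transfer cross section (microscopic), F_0 the isotropic part of the electron energy distribution, ε the electron energy in eV, and e and m the electron charge and mass.

```latex
\[
w = -\frac{1}{3}\sqrt{\frac{2e}{m}}\;\frac{E}{N}
    \int_0^\infty \frac{\varepsilon}{\sigma_m(\varepsilon)}\,
    \frac{\partial F_0}{\partial \varepsilon}\,d\varepsilon ,
\qquad
\int_0^\infty \varepsilon^{1/2}\,F_0(\varepsilon)\,d\varepsilon = 1 .
\]
```

Because F_0 itself depends on the cross sections, deriving σ_m from measured w is typically iterative: a trial cross-section set is adjusted until the computed transport values agree with the measured ones.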


    Derived Data - Examples

    Electron transport in gases

  • Important for electrical industry, lighting, lasers
  • Can also calculate microscopic data via quantum mechanics
  • Self-consistent data very difficult to achieve
  • Especially difficult for gas mixtures

    Derived Data - Examples

    Properties of engineering materials

    Derived Data

    Today’s Talk

  • Examples
  • General Features
  • Looking into the future

    Derived Data - General Features

    Derived data are based on data collections that

  • are of high quality
  • are often built for other uses
  • have complete, computerized metadata


    Derived Data - General Features

  • Derivation is often done separately, by another group
  • Often based on physical theory that was unimportant or unrelated to the initial data collection
  • Often disseminated entirely separately from the initial data collection
  • Can be intimately linked to specific application


    Derived Data

    Today’s Talk

  • Examples
  • General Features
  • Looking into the future

    Derived Data - The Future

  • New data derivation techniques emerging
  • Knowledge discovery, neural networks, data mining, property object models, expert systems, others
  • New statistical and mathematical approaches
  • Linking data collections from different disciplines
      • solar exposure and materials degradation
      • health records with toxic substance disposal
      • climate records with evolution records
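
Linking collections from different disciplines usually comes down to joining them on a shared key. The sketch below uses the solar exposure and materials degradation pairing; all site names, values, units, and field names are hypothetical, chosen only to show the mechanics.

```python
# Hypothetical collection 1: annual solar exposure per test site (arbitrary units).
solar_exposure = {"site_a": 5.0, "site_b": 7.5, "site_c": 6.0}

# Hypothetical collection 2: materials degradation measurements, built separately.
degradation = [
    {"site": "site_a", "material": "coating_x", "gloss_loss_pct": 10.0},
    {"site": "site_b", "material": "coating_x", "gloss_loss_pct": 18.0},
    {"site": "site_d", "material": "coating_x", "gloss_loss_pct": 12.0},
]

def link(exposure, records):
    """Join the two collections on site and derive a new quantity per record."""
    linked = []
    for rec in records:
        dose = exposure.get(rec["site"])
        if dose is None:
            continue  # no exposure data for this site, so nothing can be derived
        linked.append({
            **rec,
            "exposure": dose,
            # Derived value that exists in neither source collection.
            "loss_per_unit_exposure": rec["gloss_loss_pct"] / dose,
        })
    return linked

linked = link(solar_exposure, degradation)
for row in linked:
    print(row["site"], row["material"], round(row["loss_per_unit_exposure"], 2))
```

Records without a counterpart in the other collection are dropped here, which shows why complete metadata in both sources matters.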


    Derived Data - The Future


    What is evolving is a new source of scientific discovery: large-scale databases as the focal point of research
