In keeping with the taxonomy database and the associated Toolkit developed and maintained by NCBI, the Structure Group and NCBI taxonomists have undertaken taxonomy annotation of the structure data in Entrez. The approach is semi-automated: a human expert checks, corrects, and validates automatic assignments. The PDBeast software tool was developed for this purpose. It parses text descriptions of "Source Organisms" from either the original PDB entries or user-specified strings, looks for matches in the NCBI Taxonomy database, and records the resulting taxonomy assignments.
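
The parse-and-match step with manual fallback can be illustrated by a minimal sketch. It is not PDBeast's actual algorithm: the small name table and the `normalize` and `assign_taxid` helpers are hypothetical, and a real run would consult the full NCBI Taxonomy database. Strings without an exact match are returned as unresolved so that a human expert can review them.

```python
import re
from typing import Optional

# Tiny illustrative name table (assumption, not the real database):
# lower-cased names and synonyms mapped to NCBI taxonomy IDs.
NAME_TO_TAXID = {
    "homo sapiens": 9606,
    "human": 9606,
    "escherichia coli": 562,
    "saccharomyces cerevisiae": 4932,
    "baker's yeast": 4932,
}


def normalize(source: str) -> str:
    """Lower-case, drop parenthesised remarks, and collapse whitespace."""
    text = re.sub(r"\([^)]*\)", " ", source.lower())
    return re.sub(r"\s+", " ", text).strip()


def assign_taxid(source: str) -> Optional[int]:
    """Return a taxonomy ID for an exact name match, or None for manual review."""
    return NAME_TO_TAXID.get(normalize(source))


for raw in ["HOMO SAPIENS (HUMAN)", "Escherichia  coli", "unknown phage"]:
    taxid = assign_taxid(raw)
    print(raw, "->", taxid if taxid is not None else "needs manual review")
```

Returning `None` instead of guessing keeps the automatic step conservative, which matches the role of the human validator described above.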

A set of rules has been established by and for those involved in the annotation process. Each chain in an MMDB entry must be annotated separately, because components of larger complexes may have diverse origins. So far, more than 10,000 entries, comprising over 18,000 chains, have been linked to the appropriate node in the taxonomy tree. Following the MMDB update cycle, taxonomy assignments are made monthly, when new structure data are imported from PDB.
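
The per-chain rule can be pictured as one record per chain rather than one per entry. The sketch below is an assumption about how such records might be modeled; the `ChainAnnotation` class, the `taxids_per_entry` helper, and the entry identifiers are hypothetical and do not reflect MMDB's internal schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainAnnotation:
    mmdb_id: int  # hypothetical MMDB entry identifier
    chain: str    # chain label within the entry
    taxid: int    # NCBI taxonomy ID assigned to this chain


def taxids_per_entry(annotations):
    """Collect the set of taxonomy IDs assigned to the chains of each entry."""
    by_entry = defaultdict(set)
    for ann in annotations:
        by_entry[ann.mmdb_id].add(ann.taxid)
    return dict(by_entry)


annotations = [
    ChainAnnotation(1001, "A", 9606),  # human chain
    ChainAnnotation(1001, "B", 562),   # E. coli chain in the same complex
    ChainAnnotation(1002, "A", 4932),  # yeast chain
]

# Entries whose chains come from more than one organism.
mixed = [entry for entry, taxids in taxids_per_entry(annotations).items()
         if len(taxids) > 1]
print(mixed)
```

Keeping the taxonomy ID on the chain record, not on the entry, is what allows a complex with components of different origin to be represented without loss.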

Structure taxonomy annotation can be used within the NCBI Entrez query system: queries for structures from a particular species, or from any organism under a particular taxonomy node, are now possible. The source organisms of all protein, DNA, and RNA chains that have been assigned are summarized in a table, which can be downloaded. Despite efforts to maintain a high level of quality in the annotation, MMDB's taxonomy assignments may contain errors, and we encourage users who spot misassignments to report them to the NCBI Help Desk.
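
As an illustration of an organism-restricted structure query, the sketch below builds a search URL for the Entrez E-utilities `esearch` endpoint. It assumes that the structure database is addressed as `db=structure` and that it accepts the `[Organism]` field tag; the `structure_query_url` helper is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Entrez E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def structure_query_url(organism: str, retmax: int = 20) -> str:
    """Build an esearch URL for structures annotated with the given organism."""
    params = {
        "db": "structure",                 # assumed database name
        "term": f"{organism}[Organism]",   # assumed field tag
        "retmax": retmax,                  # maximum number of IDs to return
    }
    return f"{ESEARCH}?{urlencode(params)}"


print(structure_query_url("Homo sapiens"))
```

Passing a higher-level taxon name in place of a species name would be the analogous way to query under a taxonomy node, again assuming the field tag behaves this way for the structure database.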

