
Books




   Background

The first book to be made available at NCBI was Molecular Biology of the Cell, 3rd edition, by Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson, published by Garland Publishing, Inc. Molecular Biology of the Cell is one of the most widely used undergraduate textbooks in molecular and cell biology.

Wide use of the book at NCBI and positive feedback from its readers, together with significant interest from other authors, editors, and publishers, have led us to expand the project to include more books.

Linked to PubMed abstracts and now available for direct searching, the expanding book list provides further information on a broad range of topics central to molecular and cell biology. It is envisioned that the selection of books available for consultation will grow to include new topics and different approaches to biology and medicine.

Rather than mirroring the textbook in print (i.e., treating the book as a whole, to be read sequentially from the first page to the last), we have divided each book into units of content based on the organization of chapters, sections, subsections, etc., within the book. The entry point for a user is a page within the book, found either by a search or through a link from a relevant PubMed abstract.

From this page, the reader can navigate around the whole unit of content to which it belongs. The size of the unit of content and its interconnection with other parts of the book depend on both the organization of the book and the wishes of the publisher.

   Linking the books with PubMed

Linking to PubMed. References cited in the books are linked to their PubMed abstracts by using the Citation Matcher. This gives the reader a starting point from which to explore the literature further using PubMed's "Related Articles" function. Because books usually carry established knowledge, the references cited within them are often several years old; by using "Related Articles", a reader can move forward in time to find more recent articles similar to those cited in the book.

Linking from PubMed. The book sections are accessed from PubMed by an indexing system based on natural language expressions, developed by W.J. Wilbur and colleagues at NCBI. First, the text from the book is carved up into all the possible one- or multi-word phrases that do not include stop words (such as "of", "but", "the", etc.). These phrases are then used to search PubMed abstracts and their titles to generate a list of phrases that appear both in PubMed and the book. The list is edited to remove those that do not convey useful meaning; those that remain constitute the index. Each section of the book is listed alongside the phrases it contains.
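The phrase-extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NCBI implementation: the function names, the short stop-word list, the three-word limit on phrase length, and the choice to break phrases at punctuation are all hypothetical, and the manual editing of the phrase list is not modeled.

```python
import re

# Illustrative stop-word list (an assumption; the real list is not given here).
STOP_WORDS = {"of", "but", "the", "a", "an", "and", "in", "to", "is"}


def candidate_phrases(text, stop_words=STOP_WORDS, max_words=3):
    """Return every one- to max_words-word phrase that contains no stop word.

    Stop words and punctuation both act as phrase boundaries (an assumption).
    """
    # Words (optionally hyphenated) or single punctuation characters.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*|\S", text.lower())
    phrases = set()
    run = []  # current run of consecutive non-stop words
    for token in tokens + [None]:  # None flushes the final run
        is_break = token is None or token in stop_words or not token[0].isalnum()
        if is_break:
            # Emit every contiguous sub-phrase of the run, up to max_words long.
            for start in range(len(run)):
                for end in range(start + 1, min(start + max_words, len(run)) + 1):
                    phrases.add(" ".join(run[start:end]))
            run = []
        else:
            run.append(token)
    return phrases


def shared_phrases(book_text, pubmed_texts):
    """Phrases that occur both in the book text and in any PubMed title or abstract."""
    pubmed = set()
    for text in pubmed_texts:
        pubmed |= candidate_phrases(text)
    return candidate_phrases(book_text) & pubmed
```

For example, `shared_phrases("The structure of the cell membrane", ["Cell membrane dynamics in yeast"])` keeps "cell", "membrane", and "cell membrane", and drops "structure" because it does not occur in the PubMed text.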

A statistical weighting system based on the frequency of occurrence of each phrase in a book section, relative to the rest of the book, is used to identify "good" phrases. A phrase that appears repeatedly in only a few sections, and rarely in other parts of the book, indicates a definitive phrase for those few sections and therefore ranks highly. Furthermore, the appearance of a phrase in the title, for example, has greater value than one appearing in the text alone.
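The exact weighting formula is not given here, so the following is only a rough sketch of the idea, using a TF-IDF-style score as a stand-in. The function name `phrase_scores`, the input layout, the logarithmic rarity factor, and the `title_boost` value are all assumptions made for illustration.

```python
import math


def phrase_scores(sections, title_boost=2.0):
    """Score each phrase within each section.

    sections maps a section id to {"title_phrases": [...], "text_phrases": [...]}.
    A phrase scores highly when it is frequent in one section and present in
    few other sections; an occurrence in the title counts title_boost times.
    """
    n_sections = len(sections)

    # In how many sections does each phrase occur at all?
    sections_with = {}
    for sec in sections.values():
        for phrase in set(sec["title_phrases"]) | set(sec["text_phrases"]):
            sections_with[phrase] = sections_with.get(phrase, 0) + 1

    scores = {}
    for sec_id, sec in sections.items():
        weighted = {}
        for phrase in sec["text_phrases"]:
            weighted[phrase] = weighted.get(phrase, 0.0) + 1.0
        for phrase in sec["title_phrases"]:
            weighted[phrase] = weighted.get(phrase, 0.0) + title_boost
        scores[sec_id] = {
            # Rarity factor (assumed form): larger when fewer sections contain the phrase.
            phrase: count * math.log((1 + n_sections) / sections_with[phrase])
            for phrase, count in weighted.items()
        }
    return scores
```

With this stand-in, a phrase such as "cell membrane" that appears in the title and twice in the text of one section, and nowhere else, outranks a phrase such as "protein" that appears once in every section.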

Each PubMed abstract can thus be linked to the appropriate textbook pages. This method allows two very dissimilar types of text - the dense, focused PubMed abstracts and the more descriptive book text - to find common ground.