Charles Petzold



Google Books Needs to be More Like JSTOR

September 8, 2007
New York, N.Y.

When I first heard about Google's plan to scan the contents of the major libraries of the world, it seemed like a great idea, but it wasn't books I was considering. Books are certainly important, and it would be great to perform searches across collections of books. But books are already well catalogued by libraries, and it's fairly straightforward to hunt down books that touch on particular topics.

Instead, I was anticipating a revolution in researching periodicals. It is in centuries' worth of periodicals (everything from scholarly journals with circulations in the hundreds to popular magazines and newspapers) that I think most of the world's collective knowledge is encapsulated. This is where digitization would compensate for the historic deficiencies in periodical cataloguing.

Cataloguing these periodicals would mean treating each issue as an entity in itself, but also treating it as a collection of articles with distinct titles and authors. For example, if two articles overlap on the same page, a search hit would have to indicate not only the page of the magazine, but also the title and author of the particular article. This is a tough cataloguing job, for sure, but I was confident that Google had the ability to do it because a successful model already existed: JSTOR.
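To make that cataloguing requirement concrete, here is a minimal sketch in Python of the kind of record structure I have in mind. It is not how JSTOR or Google actually stores anything; the class names, field names, and sample data are all hypothetical placeholders invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Article:
    """One article within an issue (all fields are illustrative)."""
    title: str
    author: str
    first_page: int
    last_page: int
    text: str = ""


@dataclass
class Issue:
    """One issue of a periodical, treated as a collection of articles."""
    periodical: str
    date: str  # issue date as a string, e.g. "1950-10"
    articles: List[Article] = field(default_factory=list)

    def articles_on_page(self, page: int) -> List[Article]:
        # Two articles can share a page, so this may return more than one.
        return [a for a in self.articles
                if a.first_page <= page <= a.last_page]

    def search(self, phrase: str) -> List[Dict[str, object]]:
        # Case-insensitive phrase match; each hit carries the article-level
        # bibliographic details, not just the periodical name.
        needle = phrase.lower()
        return [
            {
                "periodical": self.periodical,
                "date": self.date,
                "title": a.title,
                "author": a.author,
                "pages": (a.first_page, a.last_page),
            }
            for a in self.articles
            if needle in a.text.lower()
        ]


# Placeholder data: two made-up articles that overlap on page 12.
issue = Issue(
    periodical="Placeholder Journal",
    date="1950-10",
    articles=[
        Article("First Placeholder Article", "A. Placeholder", 10, 12,
                "Nothing relevant here."),
        Article("Second Placeholder Article", "B. Placeholder", 12, 15,
                "A discussion of the Turing Machine."),
    ],
)

for hit in issue.search("Turing Machine"):
    print(hit["title"], "by", hit["author"], "in", hit["periodical"], hit["date"])
```

The point of the sketch is only that a hit resolves to an article, with its own title and author, rather than to a bare page within a magazine.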

JSTOR ("journal store") was founded in 1995 to provide online access to scholarly journals, including at least one journal with issues going back to the late 17th century. To get at this collection you need to access www.jstor.org from an IP address belonging to a JSTOR subscriber, and subscribers are pretty much limited to libraries and universities.

(Tip to New York City residents: I've been unsuccessful accessing JSTOR from my own notebook plugged into one of the Ethernet ports in Room 315 at the Humanities & Social Sciences Library; I've had much better luck downstairs at SIBL along the wall in the back reading room.)

Yesterday I described my travails searching for the phrase "Turing Machine" in Google Books. Here's a 150K 1400×1017 screen shot of a JSTOR search I performed earlier today at SIBL for the phrase "Turing Machine" occurring in publications from 1940 to 1950. I got five hits. Notice that I was able to sort them by date. Three of these hits are "reviews," that is, discussions of articles in other journals. But notice the important information included with each item:

From this list of search results, JSTOR lets you view a scan of the article, print it, or download it as a TIFF or PDF. The high-resolution PDF files are, I believe, 600 DPI scans with a cover page that includes the precise bibliographic information to cite this article in something you're writing.

Alternatively, you can use JSTOR's browse facility to locate particular issues of a periodical and then page through the entire issue.

The big problem with JSTOR is that while it includes a whole bunch of journals, it's missing many, many more. It doesn't get anywhere close to popular magazines. This is where I hoped Google Books would fill the gap. And it didn't really bother me that Google Books might not actually show you the article on screen. As long as I had a title and author, I might know if this article was really something I needed to see. And as long as I had the name of the magazine and the issue date, I could go to the actual library to look it up.

But Google Books doesn't even give you enough information to do that. With Google Books, you get the name of the magazine and possibly the issue date. Most often you get the date that the magazine began publication. You might possibly figure out the title of the article from an abbreviated table of contents, but pretty much forget about knowing who the author is.

I am aware that Google Books is in beta and hence can't be expected to be perfect. But a search engine for periodicals that fails to include basic bibliographic information is like a calculator that doesn't yet display digits and decimal points. This is not beta software. It's not even alpha software. It's software that needs a lot more work before it's ready for the public.

Somewhat better in the bibliographic department is Google Scholar, which gives you links to a bunch of other places on the web where periodicals are stored. Sometimes a hit on Google Scholar gives you very precise bibliographic information. And you know where those links go? That's right — JSTOR.