Gutenberg Distributed Proofreading

By Deane Barker on November 9, 2002

I’ve written before about Project Gutenberg. It’s a massive archive of free books that they simply give away online. They have several thousand public-domain books, and the number is increasing steadily.

It’s probably going to increase a little more rapidly due to the new Distributed Proofreaders project. They’ve been building the system for two years and have just released it to the public. After signing up for an account, you can select a book that’s in the process of being turned into an e-text and be assigned a single page from it.

The working area is a two-pane frameset, with an image of the page on the left and a textbox on the right containing the text from that page as recognized by OCR software. Your job is to proofread that text. Since no OCR is perfect, there are usually a half-dozen little mistakes to correct. Once you’re done, you can “Save and Do Another” or “Save and Quit.” You can keep doing pages as long as your attention span allows.

Each page is looked at twice. After you save your page, it goes through a second round of proofreading. When all the pages for that book are done, they’re glued together and published as a Gutenberg e-text.
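To make the two-round flow a little more concrete, here is a minimal Python sketch of how a page queue like this might work. It is not the Distributed Proofreaders code; the `Page` and `Book` classes, the `next_page`/`save`/`assemble` methods, and the rule of handing out the least-proofread page first are all hypothetical, chosen only to illustrate the process described in the post.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ROUNDS_PER_PAGE = 2  # each page is proofread twice, as the post describes


@dataclass
class Page:
    number: int
    text: str  # starts as raw OCR output; replaced by each proofreading round
    rounds_done: int = 0


@dataclass
class Book:
    title: str
    pages: list[Page] = field(default_factory=list)

    def next_page(self) -> Page | None:
        """Hypothetical scheduling rule: hand out the page with the fewest
        completed rounds (lowest page number breaks ties), or None if done."""
        pending = [p for p in self.pages if p.rounds_done < ROUNDS_PER_PAGE]
        return min(pending, key=lambda p: (p.rounds_done, p.number), default=None)

    def save(self, page: Page, corrected_text: str) -> None:
        """Record one volunteer's corrections ("Save and Do Another")."""
        page.text = corrected_text
        page.rounds_done += 1

    def is_finished(self) -> bool:
        return all(p.rounds_done >= ROUNDS_PER_PAGE for p in self.pages)

    def assemble(self) -> str:
        """Glue the pages together in order once every page has had both rounds."""
        if not self.is_finished():
            raise RuntimeError("some pages have not finished both rounds")
        ordered = sorted(self.pages, key=lambda p: p.number)
        return "\n".join(p.text for p in ordered)


if __name__ == "__main__":
    book = Book("Example", [Page(1, "Tbe cat"), Page(2, "sat on tlie mat")])
    fixes = {1: "The cat", 2: "sat on the mat"}
    while (page := book.next_page()) is not None:
        book.save(page, fixes[page.number])
    print(book.assemble())
```

Handing out the least-proofread page first means every page finishes round one before any page starts round two, which roughly matches “it goes through a second round.” A real system would also need to keep the same volunteer from checking a page twice, which this sketch ignores.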

I’ve used the system, and it’s extremely simple. Before I knew it, I had proofread three pages. The book (I don’t even remember the title, but it seemed to be fiction) had 300 pages, so I contributed 1% of the labor for that book. Very cool. Gutenberg’s goal is 34,000 pages per month, and they’re asking Gutenberg volunteers to do one page per day in support of that goal.

