Project Gutenberg

Client

Project Gutenberg is a not-for-profit organization of volunteers from around the world, connected via the Internet and dedicated to producing electronic editions of copyright-free books for the use and enjoyment of anyone with Internet access, free to download.

We produce about two million dollars' worth of value for each hour we work. Getting any etext selected, entered, proofread, edited, copyright-searched and analyzed, with its copyright letters written and so on, takes us fifty hours, and that is a rather conservative estimate.

Our projected audience is one hundred million readers. If each text is nominally valued at one dollar per reader, a single etext is worth $100 million, and at fifty hours per etext we produce $2 million per hour. This year we are releasing fifty new etext files per month, or 500 more etexts in 2000, for a total of 3000+. If they reach just 1-2% of the world's population, the total should reach over 300 billion etexts given away by year's end.
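The figures above follow from a short chain of multiplication and division. The sketch below reproduces them in Python. Every input comes from the text except `WORLD_POPULATION`, which is an assumption (roughly 6 billion people around the year 2000); the collection size uses 3000 as the lower bound of "3000+", and the function names are illustrative only.

```python
# Back-of-the-envelope check of the value and distribution figures.

AUDIENCE = 100_000_000            # projected readers per etext (from the text)
VALUE_PER_READER = 1.0            # nominal dollars per text per reader (from the text)
HOURS_PER_ETEXT = 50              # conservative production estimate (from the text)
TOTAL_ETEXTS = 3000               # lower bound of "3000+" (from the text)
WORLD_POPULATION = 6_000_000_000  # ASSUMPTION: approximate world population in 2000


def value_per_hour() -> float:
    """Dollar value produced per hour of volunteer work."""
    value_per_etext = AUDIENCE * VALUE_PER_READER  # $100 million per etext
    return value_per_etext / HOURS_PER_ETEXT


def etexts_given_away(reach_fraction: float) -> float:
    """Copies distributed if every etext reaches this share of the world."""
    return TOTAL_ETEXTS * WORLD_POPULATION * reach_fraction


if __name__ == "__main__":
    print(f"${value_per_hour():,.0f} per hour")
    for reach in (0.01, 0.02):
        print(f"{reach:.0%} reach: {etexts_given_away(reach):,.0f} etexts")
```

Under the population assumption, 2% reach gives about 360 billion copies and 1% gives about 180 billion, so the "over 300 billion" figure corresponds to the upper end of the quoted range.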

Challenge

Processing and converting dozens of etext books each month, submitted by many different contributors, while maintaining our quality standards and a consistent formatting "look and feel", meant we needed a tool that was powerful and flexible, yet small, fast and easy to use.

Solution

TextPipe quickly revealed its virtues and paid for itself in the first month of use. It provides the framework for mastering the etext beast, with all of the features needed to publish electronic texts. Its wide range of rule-based filter elements can be easily and flexibly combined to create standard and custom filters and character conversion maps.
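In TextPipe, character conversion maps like those mentioned above are assembled by point-and-click, and its internal format is not described here. As a rough illustration of what such a map does, the Python sketch below folds accented letters into plain 7-bit ASCII. It assumes the input has already been decoded from its 8-bit "extended ASCII" code page (for example ISO-8859-1 or Windows-1252) into a Python string. The function `to_seven_bit` and the `SPECIAL_CASES` table are hypothetical, and the table is deliberately incomplete.

```python
import unicodedata

# HYPOTHETICAL map for characters that NFKD does not split into
# "base letter + accent"; extend it as new cases turn up.
SPECIAL_CASES = {
    "ß": "ss", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O", "þ": "th", "Þ": "Th",
    "½": "1/2", "¼": "1/4", "¾": "3/4",
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2014": "--",                  # long dash
}


def to_seven_bit(text: str, unknown: str = "?") -> str:
    """Fold accented and other non-ASCII characters into 7-bit ASCII."""
    out = []
    # NFC first, so "e" + COMBINING ACUTE ACCENT becomes the single letter "é".
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) < 128:
            out.append(ch)
        elif ch in SPECIAL_CASES:
            out.append(SPECIAL_CASES[ch])
        elif unicodedata.combining(ch):
            pass  # stray accent with no base letter: drop it
        else:
            # NFKD splits e.g. "é" into "e" + U+0301; keep only the ASCII part.
            ascii_part = "".join(
                c for c in unicodedata.normalize("NFKD", ch) if ord(c) < 128
            )
            out.append(ascii_part or unknown)
    return "".join(out)


if __name__ == "__main__":
    print(to_seven_bit("Café Müller, naïve \u201cStraße\u201d \u2014 ½"))
```

Typical use would be `to_seven_bit(raw_bytes.decode("latin-1"))`, with the codec matching the contributor's source file. Characters the map cannot handle come out as `?`, which makes them easy to search for during review.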

Process

Publishing a literary ebook, once copyright clearance research is done, requires transcribing the original printed page into etext. This usually involves typing the pages by hand or scanning them and using optical character recognition tools. A final production phase follows, in which TextPipe reformats the material into a high-quality, consistent result.

We have built three core filters for different situations: one to remove certain raw markup syntax, one to convert certain extended ASCII accented characters into 7-bit printable, e-mail-safe characters, and the main workhorse filter, which applies our formatting standards to the material (for example, removing excess white space, enforcing consistent margins and inter-sentence spacing, inserting standard header and footer material, and flagging certain common typographical errors for review, to mention only a few). Additional specialized filters are easily constructed to handle unusual situations as the need arises. No programming knowledge is needed to build a filter: just click, drag and drop.

After this point the book is submitted to the central repository for storage and replicated to access sites around the world, where Internet users may freely download it for simple reading enjoyment or for more rigorous lexical research by students and professionals.
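To make the filter steps more concrete, here is a minimal Python sketch of the markup-removal filter and the main formatting filter. It is not TextPipe's implementation (TextPipe filters are assembled graphically, without code), and the house-style values are hypothetical: a 70-column line width, two spaces between sentences, placeholder header and footer lines, three sample typo patterns, and markup assumed to be simple `<tag>` syntax. The accent-conversion filter is left out to keep the sketch short. Sentence spacing relies on the `fix_sentence_endings` option of Python's `textwrap`, which is a heuristic: it treats a lowercase letter followed by `.`, `!` or `?` as a sentence end, so an abbreviation such as "Mr." is also given two spaces.

```python
import re
import textwrap

# HYPOTHETICAL house-style settings; the real standards are not given in the text.
LINE_WIDTH = 70
HEADER = "*** START OF ETEXT ***"
FOOTER = "*** END OF ETEXT ***"
# Illustrative patterns worth a human look; they are reported, never auto-fixed.
SUSPECT_PATTERNS = {
    r"\bteh\b": "teh -> the?",
    r"\b(\w+) \1\b": "doubled word",
    r"\s[,.;:]": "space before punctuation",
}


def strip_markup(text: str) -> str:
    """Markup filter (sketch): drop simple <tag>-style markup."""
    return re.sub(r"<[^>]+>", "", text)


def apply_house_style(text: str) -> str:
    """Workhorse filter (sketch): white space, sentence spacing, margins, header/footer."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for para in paragraphs:
        collapsed = " ".join(para.split())  # remove excess white space
        if collapsed:
            wrapped.append(textwrap.fill(
                collapsed,
                width=LINE_WIDTH,            # consistent right margin
                fix_sentence_endings=True,   # two spaces between sentences
            ))
    return "\n\n".join([HEADER, *wrapped, FOOTER]) + "\n"


def flag_suspects(text: str) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for a reviewer; the text is unchanged."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, reason in SUSPECT_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, reason))
    return findings


if __name__ == "__main__":
    raw = ("<p>It was  the the best of times.   It was the worst of times.</p>"
           "\n\n<p>Chapter   one ends here .</p>")
    book = apply_house_style(strip_markup(raw))
    print(book)
    for lineno, reason in flag_suspects(book):
        print(f"line {lineno}: {reason}")
```

Keeping each filter as a separate function mirrors the idea of combining small rule-based steps, so a specialized filter for an unusual book can be slotted into the chain without touching the others.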