Digitizing Books with CAPTCHA

Did you know that every time you fill out a two-word CAPTCHA, you’re helping to digitize books?

A CAPTCHA is a program that can tell whether its user is a human or a computer. We all run into them on the web and spend about 10 seconds solving each one. Worldwide, that adds up to more than 500,000 hours every day! A few years ago, Carnegie Mellon University professor Luis von Ahn figured out a way to put those hours toward a larger effort: digitizing books.

“reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher.”

Many projects for digitizing physical books currently exist. The process involves scanning the book pages and converting them into text using OCR (Optical Character Recognition). However, OCR isn’t perfect, and hard-to-read words need extra attention before they can be deciphered. reCAPTCHA tackles this problem by sending images of words that OCR can’t read to people, who decipher them as they solve CAPTCHAs.

reCAPTCHA began as a project of the School of Computer Science at Carnegie Mellon University and was acquired by Google in 2009. The project is currently digitizing old editions of The New York Times as well as books from Google Books. Check out this podcast for an additional overview of the reCAPTCHA process, as well as this 2006 Google TechTalk with Luis von Ahn, one of reCAPTCHA’s creators.

The next time you fill out a CAPTCHA, think about the book you just helped digitize!
