Google and OCR

By Deane Barker on February 18, 2005

Do you think Google will ever start scanning all of the images in its database for OCR-able text and then index that text? This way, you could search for some text that appears in a thought balloon in the latest Dilbert comic and get a GIF of that comic returned.

The other day I searched for the text the character in this cartoon had written on the blackboard, and I got hundreds of links back. How, exactly? I’m not quite sure how Google Images works, but I’m guessing the text appeared somewhere on the pages surrounding the images?
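To make that guess concrete, here is a minimal sketch of how an image search could work without any OCR: index each image under the words found on the page that embeds it, plus its alt text. This is only an illustration of the idea, not a description of how Google Images actually works; the names `ImageContextParser`, `index_page`, and `search` are made up for this example, and the sample page and file name are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class ImageContextParser(HTMLParser):
    """Collects every <img> on a page along with the page's text."""

    def __init__(self):
        super().__init__()
        self.images = []      # (src, alt) pairs, in document order
        self.text_parts = []  # raw text chunks found between tags

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # Attribute values can be None (e.g. a bare "alt"), so fall back to "".
            self.images.append((attrs.get("src") or "", attrs.get("alt") or ""))

    def handle_data(self, data):
        # Simplification: this also picks up script and style contents.
        self.text_parts.append(data)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_page(html, index):
    """Assumption: every word on the page counts as context for every image on it."""
    parser = ImageContextParser()
    parser.feed(html)
    parser.close()
    page_words = tokenize(" ".join(parser.text_parts))
    for src, alt in parser.images:
        for word in set(page_words + tokenize(alt)):
            index[word].add(src)


def search(index, query):
    """Return the images whose context contains every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page: the blackboard text is repeated in the surrounding prose.
index = defaultdict(set)
index_page(
    '<p>In this strip the boss writes "synergy" on the blackboard.</p>'
    '<img src="dilbert.gif" alt="Dilbert comic">',
    index,
)
print(search(index, "synergy blackboard"))  # {'dilbert.gif'}
```

With something like this, a search for text drawn inside a cartoon only succeeds if someone also typed that text on the page, which would explain getting links back without any OCR being involved.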

Just wondering. If you have any insight into how Google Images currently works, please share.



  1. Quote: “©2005 Google – Searching 1,187,630,000 images.”

    It would be a great thing, of course, but I couldn’t even begin to imagine the kind of processing power needed for that task.

  2. Imagine how much spammers would love that technology. Many people try to thwart spammers by putting their email address in a small GIF. If Google OCR’d those GIFs, that technique would suddenly go right out the window. Interesting idea though. I have been searching for a Dilbert comic I read a long time ago …

  3. Even if spammers can do OCR, I think the resolution of the GIF might be too low to yield an accurate result.

  4. You could probably degrade the email address so that the OCR wouldn’t work. It really doesn’t take much. For instance, if the letters touch one another, that can screw up OCR, but people can still read it just fine. There is actually a lot of work being done on ways of creating text that people can read and computers can’t. It is used in situations where a site wants to be sure that a person is interacting with it and not some computer program.

  5. If a human can read it, I’m convinced it is possible to devise an algorithm to allow a computer to read it. It’s a pattern recognition problem, which is something the human brain handles as a matter of course, and not something at all easily translated into an algorithm. But that doesn’t mean it’s impossible.
