Use Google Docs to Perform OCR on PDFs and Images

If you’ve been following the progress Google has been making with Google Docs, you know that they have been moving at a fast clip. The latest feature to come out of Mountain View is the ability to perform text recognition on PDF and image files that you upload to Google Docs. OCR, or Optical Character Recognition, is the machine conversion of scanned images containing printed or handwritten text into text that can be edited. To become familiar with this really cool feature, check out the screenshot tour below.

Google Docs OCR Feature

upload-pdf-convert-ocr

A. The next time you upload a PDF file or an image containing text, Google Docs will offer you the option “Convert text from PDF or image files to Google Docs documents”.

upload-pdf-convert-ocr-a

Files as they are being uploaded…

upload-pdf-convert-ocr-d

The file you uploaded should now be listed as a Google Doc in your document list.

upload-pdf-convert-ocr-b

When you open the converted file, you will notice a yellow text box above the document (highlighted above) containing the following note:

This document contains text automatically extracted from a PDF or image file. Formatting may have been lost and not all text may have been recognized.

To remove this note, right-click and select “Delete table”.

upload-pdf-convert-ocr-c

The original PDF is embedded in the new Google Doc. Scroll to the bottom of the page to find the text extracted from the PDF or image file.