Python Automation: How to Read Text from Images (OCR)

3D visualization of a laser scanner extracting glowing text from a physical photo, representing Python OCR.

OCR (Optical Character Recognition) is the process of “reading” the text out of an image file. Python OCR is perfect solution for automating data entry from scanned invoices, receipts, or old documents.

Python’s most popular tool for this is pytesseract.

โš ๏ธ Step 1: Installation (Crucial!)

This is a 2-part process. 1. Install the Python library:

pip install pytesseract pillow

2. Install Google’s Tesseract Engine: pytesseract is just a “wrapper.” It needs the actual Tesseract program to be installed on your computer.

Step 2: The Python Code

Let’s assume you have an image file named receipt.png that contains text.

from PIL import Image
import pytesseract

# On Windows, you might need to point to the .exe file
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

try:
    # 1. Open the image using Pillow
    img = Image.open('receipt.png')
    
    # 2. Use tesseract to convert the image to text
    text = pytesseract.image_to_string(img)
    
    # 3. Print the result!
    print("--- Text found in image: ---")
    print(text)
    
except FileNotFoundError:
    print("Error: Could not find 'receipt.png'.")
except Exception as e:
    print(f"An error occurred. Did you install Tesseract? Error: {e}")

You can now write scripts that scan a folder of images, extract the text, and save it to a .txt file or an Excel sheet.


Key Takeaways

  • OCR (Optical Character Recognition) extracts text from images, making Python OCR ideal for automating data entry tasks.
  • The main tool for Python OCR is pytesseract, which acts as a wrapper for the Tesseract engine.
  • Installation requires two steps: install the Python library and the Tesseract program on your system.
  • On Mac, use ‘brew install tesseract’; on Windows, download the installer and add it to your system PATH.
  • You can write scripts to scan images, extract text, and save the results to a .txt file or Excel sheet.

Similar Posts

Leave a Reply