
Document AI is one of the most commercially valuable AI tasks. With Hugging Face Transformers, we can move beyond simple OCR (which just dumps text) to a model that understands the layout of a document.
A Document Question Answering model can look at an image of an invoice and answer questions like, “What is the total amount?” or “What is the invoice number?”
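To make "understands the layout" a bit more concrete: as far as we know, LayoutLM-style models receive each OCR'd word together with its bounding box, scaled to a 0-1000 grid so that the page size does not matter. The sketch below shows only that scaling step. The function name normalize_box and the sample words and coordinates are made up for illustration; this is not the pipeline's internal code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (left, top, right, bottom) to a 0-1000 grid."""
    left, top, right, bottom = box
    return [
        int(1000 * left / page_width),
        int(1000 * top / page_height),
        int(1000 * right / page_width),
        int(1000 * bottom / page_height),
    ]


# Hypothetical OCR output: (word, pixel box) pairs on an 800x1000 page.
page_width, page_height = 800, 1000
ocr_words = [
    ("TOTAL", (400, 900, 480, 920)),
    ("$12.00", (500, 900, 580, 920)),
]

# Pair every word with its normalized position on the page.
layout_tokens = [
    (word, normalize_box(box, page_width, page_height))
    for word, box in ocr_words
]
print(layout_tokens)
# [('TOTAL', [500, 900, 600, 920]), ('$12.00', [625, 900, 725, 920])]
```

Plain OCR would keep only the words. Keeping the boxes lets a model notice that "$12.00" sits right next to "TOTAL" on the same line.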
Step 1: Installation
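You will need pytesseract, a Python wrapper around the Tesseract OCR engine. The wrapper does not include the engine, so Tesseract itself must be installed separately on your system.
pip install transformers torch pillow pytesseract # Don't forget to install the Tesseract engine itself!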
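Because a missing Tesseract binary is an easy thing to overlook, here is a small check you can run before loading any model. It uses only the standard library and assumes the executable is named tesseract and lives on your PATH, which may not hold for every install (on Windows in particular). The helper name find_tesseract is our own.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if not on PATH."""
    return shutil.which("tesseract")


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract was not found on PATH. Install it with your system package manager.")
else:
    print(f"Tesseract found at: {tesseract_path}")
```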
Step 2: The Code
We will use the document-question-answering pipeline. It requires both an image and a question.
from transformers import pipeline
from PIL import Image
import requests
# 1. Load the pipeline
# This downloads 'impira/layoutlm-document-qa', a LayoutLM model fine-tuned for document QA
# The download is large, so the first run may take some time
dqa_pipeline = pipeline(
"document-question-answering",
model="impira/layoutlm-document-qa"
)
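# 2. Get an image of a document
# We'll use a sample receipt image
url = "https://huggingface.co/spaces/impira/docquery/resolve/main/receipt.png"
response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()  # fail early if the download did not succeed
image = Image.open(response.raw).convert("RGB")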
# 3. Ask questions about the document!
question1 = "What is the total amount?"
question2 = "What is the name of the merchant?"
# 4. Run the pipeline
result1 = dqa_pipeline(image, question=question1)
result2 = dqa_pipeline(image, question=question2)
# 5. Print the results
print(f"Question: {question1}")
print(f"Answer: {result1[0]['answer']} (Score: {result1[0]['score']:.4f})")
print("\n")
print(f"Question: {question2}")
print(f"Answer: {result2[0]['answer']} (Score: {result2[0]['score']:.4f})")
Step 3: The Result
The AI will read the image and find the answers:
Question: What is the total amount?
Answer: $12.00 (Score: 0.9995)

Question: What is the name of the merchant?
Answer: T-A-B-L-E (Score: 0.9812)
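The score is worth checking before you trust an answer. The sketch below assumes the pipeline result is a list of dicts with "answer" and "score" keys, as in the code above, and keeps the top answer only if it clears a threshold. The helper best_answer, the min_score default, and the second sample entry are our own illustrative choices, not part of the library.

```python
def best_answer(results, min_score=0.5):
    """Return the highest-scoring answer dict, or None if it is below min_score."""
    # Defensive: accept a single dict as well as a list of dicts.
    if isinstance(results, dict):
        results = [results]
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top if top["score"] >= min_score else None


# Hand-written sample shaped like the pipeline output (the second entry is made up).
sample = [
    {"answer": "$12.00", "score": 0.9995},
    {"answer": "$1.20", "score": 0.0003},
]

confident = best_answer(sample, min_score=0.9)
print(confident["answer"] if confident else "No confident answer")
# $12.00
```

A sensible threshold depends on your documents, so treat 0.9 as a starting point to tune rather than a recommendation.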
This is the core technology behind automated invoice processing and data entry.
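For data entry, you usually want several fields from the same document. One way to sketch that is a loop that asks one question per field and collects the answers into a dict. The function extract_fields and the field names are hypothetical, and fake_qa is a stand-in that returns canned answers so the example runs without downloading a model; in practice you would pass the real pipeline object instead.

```python
def extract_fields(qa, image, questions):
    """Ask one question per field and collect the top answers in a dict."""
    fields = {}
    for name, question in questions.items():
        results = qa(image, question=question)
        # Keep the first (top) answer, or None if nothing came back.
        fields[name] = results[0]["answer"] if results else None
    return fields


def fake_qa(image, question):
    """Stand-in for the real pipeline: returns canned answers for the demo."""
    canned = {
        "What is the total amount?": "$12.00",
        "What is the name of the merchant?": "T-A-B-L-E",
    }
    if question not in canned:
        return []
    return [{"answer": canned[question], "score": 1.0}]


questions = {
    "total": "What is the total amount?",
    "merchant": "What is the name of the merchant?",
    "invoice_number": "What is the invoice number?",
}

record = extract_fields(fake_qa, image=None, questions=questions)
print(record)
# {'total': '$12.00', 'merchant': 'T-A-B-L-E', 'invoice_number': None}
```

Passing the question-answering callable in as an argument keeps the loop easy to test and lets you swap models later without touching the extraction logic.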
Key Takeaways
- Document AI with Hugging Face goes beyond basic OCR by understanding document layout.
- Document Question Answering lets a model answer questions about a document image, such as an invoice.
- The first step is installing the Python packages, including pytesseract, along with the Tesseract OCR engine that pytesseract wraps.
- The document-question-answering pipeline requires an image and a relevant question for processing.
- This technology streamlines automated invoice handling and data entry tasks.





