
You’ve built amazing AI models with Hugging Face, but they’re stuck in your script. Want other applications (like a website or a mobile app) to be able to use them? The answer is to wrap them in an API. We’ll use Flask to create a simple web server that runs your AI model.
Step 1: Install Libraries

```bash
pip install flask transformers torch
```
Step 2: The Flask Server (app.py)
This script will:
- Load the AI model (only once, when the server starts).
- Create a Flask “route” (a URL) that accepts POST requests.
- Run the model on the data sent to it and return the result as JSON.
```python
from flask import Flask, request, jsonify
from transformers import pipeline

# 1. Initialize the Flask app
app = Flask(__name__)

# 2. Load the AI model ONCE at startup
# We'll use the sentiment analyzer
print("Loading AI model...")
classifier = pipeline("sentiment-analysis")
print("Model loaded!")

# 3. Define the API endpoint
@app.route("/analyze", methods=['POST'])
def analyze_text():
    # 4. Get the JSON data from the request
    data = request.json
    if not data or 'text' not in data:
        return jsonify({"error": "Missing 'text' key"}), 400
    text_to_analyze = data['text']

    # 5. Run the model and return the result
    result = classifier(text_to_analyze)
    return jsonify(result)

# 6. Run the app
if __name__ == "__main__":
    app.run(debug=True, port=5000)
```

Step 3: Run It and Test It
- Run your script: python app.py
- Your server is now running at http://127.0.0.1:5000.
- You can’t test this in a browser (it’s a POST request). Use a tool like Insomnia/Postman or another Python script to send it data!
You now have a real, working AI microservice.
Key Takeaways
- To deploy your Hugging Face AI models, wrap them in an API using Flask.
- First, install the necessary libraries for your project.
- Create a Flask server with a route that accepts POST requests and runs the AI model.
- After running the server with python app.py, test the API using tools like Insomnia or Postman.
- The result is a real, working AI microservice.
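You can also exercise the endpoint without a live server at all, using Flask’s built-in test client. The sketch below mirrors the tutorial’s route but swaps the real model for a stub (`fake_classifier` and its return values are illustrative stand-ins), so it runs instantly with no model download:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def fake_classifier(text):
    # Stand-in for pipeline("sentiment-analysis"); same output shape.
    return [{"label": "POSITIVE", "score": 0.99}]

@app.route("/analyze", methods=["POST"])
def analyze_text():
    data = request.json
    if not data or "text" not in data:
        return jsonify({"error": "Missing 'text' key"}), 400
    return jsonify(fake_classifier(data["text"]))

# Flask's test client sends requests straight to the app in-process
client = app.test_client()
ok = client.post("/analyze", json={"text": "I love this!"})
bad = client.post("/analyze", json={})  # missing 'text' -> 400
```

This pattern is also handy for unit tests of the real app: import it, replace `classifier` with a stub, and assert on status codes and JSON bodies.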
