
How to Fix: RuntimeError: CUDA out of memory (PyTorch & Hugging Face)


This is the most common—and most frustrating—error when you start working with real AI models.

It means: “This AI model is too big to fit in your graphics card’s dedicated memory (VRAM).”

VRAM vs. RAM (The Key Difference)

  • RAM (System Memory): You have lots (e.g., 16GB, 32GB). It’s used by your CPU.
  • VRAM (Video Memory): You have a little (e.g., 4GB, 8GB). It’s super-fast memory on your GPU (NVIDIA card) where all the AI math happens.

When you load a big model, its weights (plus activations and, during training, gradients and optimizer state) all have to fit inside that small pool of VRAM.
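You can roughly estimate the footprint of the weights alone: parameter count times bytes per parameter. A minimal sketch (the parameter counts in the comment come from the Hugging Face model cards; treat the numbers as approximate):

```python
def weight_memory_gb(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate VRAM needed just to hold the weights (fp32 = 4 bytes each)."""
    return n_params * bytes_per_param / 1024**3

# gpt2 has ~124M parameters; gpt2-large has ~774M
print(round(weight_memory_gb(124_000_000), 2))   # ≈ 0.46 GB
print(round(weight_memory_gb(774_000_000), 2))   # ≈ 2.88 GB
```

Keep in mind this is only the weights: training typically needs several times more VRAM for gradients and optimizer state, which is why a 3 GB model can still blow past an 8 GB card.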

How to Fix It

Fix 1: Reduce Your Batch Size

Are you training a model? You may be processing too many samples at once. Reduce the "batch size" in your training code: try batch_size=16 or batch_size=8. Fewer samples per step means fewer activations held in VRAM at once.
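In practice, people often just halve the batch size until a step succeeds. Here is a hedged sketch of that retry loop; `try_step` and its memory numbers are made up for illustration, standing in for one real training step on the GPU:

```python
def try_step(batch_size: int, vram_budget_mb: int = 4000, mb_per_sample: int = 200) -> None:
    """Stand-in for one training step: raises the same way PyTorch does when the batch is too big."""
    if batch_size * mb_per_sample > vram_budget_mb:
        raise RuntimeError("CUDA out of memory")

batch_size = 32
while batch_size >= 1:
    try:
        try_step(batch_size)
        print(f"batch_size={batch_size} fits")   # prints "batch_size=16 fits" here
        break
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise   # some other error: don't swallow it
        batch_size //= 2   # halve and retry, exactly as the fix suggests
```

With a real model, you would wrap your actual forward/backward pass in the `try` block instead of `try_step`.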

Fix 2: Use a Smaller Model

You can’t run GPT-4 on a laptop. If you’re using a Hugging Face model, try a smaller version.

  • Instead of: model="gpt2-large"
  • Try: model="gpt2" or model="distilgpt2" (a “distilled,” smaller version)
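One way to choose is to compare rough parameter counts against your VRAM budget. A minimal sketch, assuming fp32 weights and (arbitrarily) reserving half of VRAM as headroom for activations; the parameter counts are approximate figures from the Hugging Face model cards:

```python
# Approximate parameter counts (from the Hugging Face model cards)
PARAM_COUNTS = {
    "distilgpt2": 82_000_000,
    "gpt2": 124_000_000,
    "gpt2-medium": 355_000_000,
    "gpt2-large": 774_000_000,
}

def largest_model_that_fits(vram_gb: float, headroom: float = 0.5) -> str:
    """Pick the biggest model whose fp32 weights use at most `headroom` of VRAM."""
    budget_bytes = vram_gb * headroom * 1024**3
    fitting = {name: n for name, n in PARAM_COUNTS.items() if n * 4 <= budget_bytes}
    return max(fitting, key=fitting.get)

print(largest_model_that_fits(4.0))   # → gpt2-medium on a 4 GB card
```

The 50% headroom figure is a guess, not a rule; inference with short inputs needs less, training needs far more.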

Fix 3: Clear the Cache (PyTorch)

If you’re in a Jupyter Notebook, Python may still be “holding on” to old models in memory. Note that torch.cuda.empty_cache() only releases memory PyTorch has cached but is no longer using, so delete your references first:

import torch, gc

del model                 # drop the reference first (use your own variable name)
gc.collect()
torch.cuda.empty_cache()

Often, the only reliable fix is to restart your kernel (Kernel → Restart in Jupyter) to get a clean slate.
