Reading Huge Files in Python: Using Generators to Avoid Memory Crashes

[Image: 3D visualization comparing a crashing crane lifting a huge file versus a smooth machine processing it line-by-line using Python generators.]

We learned about generators earlier. Now let’s use them for a real-world problem: Big Data. One common challenge is reading huge files efficiently in Python.

Imagine you have a 50GB log file. If you try the standard beginner approach below, your program will exhaust RAM and either freeze the machine or be killed by the operating system:

# DON'T DO THIS with big files!
with open("massive_log.txt", "r") as f:
    # This tries to load ALL 50GB into RAM at once
    lines = f.readlines()
    for line in lines:
        process(line)

The Generator Solution

Python’s file object is already a lazy iterator! You don’t even need to write a special function; you just loop over the file object directly.

# DO THIS instead
with open("massive_log.txt", "r") as f:
    # The file object 'f' yields one line at a time efficiently
    for line in f:
        process(line)

Memory use stays roughly constant, since only one line (plus a small internal buffer) is held at a time, whether the file is 5MB or 5TB.
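Because the file iterator is lazy, you can chain generator expressions on top of it and the whole pipeline still processes one line at a time. A minimal sketch (the `app.log` filename and the "ERROR" marker are hypothetical examples):

```python
# Create a small sample log for the demo; it stands in for a huge file.
with open("app.log", "w") as f:
    f.write("INFO start\nERROR disk full\nINFO ok\nERROR timeout\n")

# Count ERROR lines lazily: the generator expression pulls one line
# at a time from the file iterator, so memory use stays constant
# no matter how large the log grows.
with open("app.log", "r") as f:
    error_count = sum(1 for line in f if "ERROR" in line)

print(error_count)  # 2
```

The same pattern scales to filtering, parsing, and aggregating: as long as every stage is a generator, nothing forces the file into memory.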

Writing a Custom Chunk Reader

Sometimes a “line” isn’t the right unit. Binary files like video have no lines at all, so you may want to read in fixed-size chunks, say 1MB at a time.

def read_in_chunks(file_object, chunk_size=1024*1024):
    """Lazy function (generator) to read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open("massive_video.mp4", "rb") as f:
    for chunk in read_in_chunks(f):
        process(chunk)
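As a usage sketch, the same chunk reader works for any binary stream. Here it computes a SHA-256 digest piece by piece; the small `demo.bin` file is a stand-in for a real multi-gigabyte one:

```python
import hashlib

def read_in_chunks(file_object, chunk_size=1024 * 1024):
    """Lazy generator: read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

# Demo file standing in for a huge binary (about 3MB of data).
with open("demo.bin", "wb") as f:
    f.write(b"x" * 3_000_000)

sha = hashlib.sha256()
with open("demo.bin", "rb") as f:
    for chunk in read_in_chunks(f):  # a few 1MB reads, never the whole file
        sha.update(chunk)            # hashlib accepts incremental updates

print(sha.hexdigest())
```

This works because `hashlib` objects accept incremental updates, so peak memory stays at one chunk regardless of file size.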
