Advanced File Renaming: Using Regex to Organize Messy Files in Python

3D isometric illustration of a Regex lens scanning messy file names and converting them into organized formats.

Sometimes simple string replacements aren’t enough, especially when dealing with complex file names. That’s where Python Regex Renaming comes in handy. Imagine you have files like:

  • invoice_2023_Jan_customer1.pdf
  • INV-feb-2023-client99.pdf

They are messy and inconsistent. You want them ALL to look like: 2023-01_Invoice.pdf. You need Regular Expressions (Regex).

What is Regex?

Regex is a special language for defining search patterns.

  • \d means “any digit” (0-9).
  • [a-z]+ means “one or more lowercase letters”.
  • (.*) captures a group of characters to use later.

The Script

Let’s write a script that finds dates in filenames and standardizes them.

import os
import re

folder = "./messy_invoices/"

# Pattern to find: "invoice" followed by a 4-digit year
# (ignore case with re.IGNORECASE)
pattern = re.compile(r"invoice.*?(\d{4})", re.IGNORECASE)

for filename in os.listdir(folder):
    match = pattern.search(filename)
    if match:
        # We found a year! match.group(1) is the (\d{4}) part.
        year = match.group(1)
        new_name = f"{year}_Invoice.pdf"
        
        # Rename it
        old_path = os.path.join(folder, filename)
        new_path = os.path.join(folder, new_name)
        os.rename(old_path, new_path)
        print(f"Renamed: {filename} -> {new_name}")

Regex is powerful but confusing. Use tools like regex101.com to test your patterns before running them on real files!

Similar Posts

Leave a Reply