
Data Science Project: Visualize IMDb Movie Ratings with Pandas

Let’s tackle an age-old question: are movies getting worse? With Python, pandas, and seaborn, we can plot IMDb ratings against release year and see which way the trend points.

Note: You can download free datasets from sites like Kaggle. For this tutorial, we’ll simulate a small dataset.

Step 1: The Data

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

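# Simulated sample: six made-up movies with a release year and a rating
data = {
    'Title': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E', 'Movie F'],
    'Year': [1980, 1985, 1995, 2005, 2015, 2023],
    'Rating': [8.5, 8.2, 7.9, 6.5, 7.2, 5.8]
}
# One row per movie, one column per field
df = pd.DataFrame(data)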

Step 2: The Visualization

We want to see the relationship between Year and Rating. A scatter plot with a trend line suits this well: each point is one movie, and the line summarizes the overall direction.

# Set a nice theme
sns.set_theme(style="darkgrid")

# Create the plot with a regression line (regplot)
sns.regplot(data=df, x="Year", y="Rating", color="b")

plt.title("Movie Ratings Over Time")
plt.show()

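With only six made-up points, the line here just demonstrates the technique. If you run this on a real, large dataset, the slope of the trend line gives you a first answer: a downward slope means newer movies are rated lower on average, and an upward slope means the opposite.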
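To put a number on the trend instead of reading it off the chart, one option is to fit the same kind of straight line with NumPy. The sketch below reuses the simulated Year and Rating values from Step 1 and assumes a plain least-squares fit of degree 1, which is what regplot draws by default. The variable names slope and intercept are just illustrative.

```python
import numpy as np
import pandas as pd

# Same simulated values as in Step 1
df = pd.DataFrame({
    "Year": [1980, 1985, 1995, 2005, 2015, 2023],
    "Rating": [8.5, 8.2, 7.9, 6.5, 7.2, 5.8],
})

# Fit Rating = slope * Year + intercept by ordinary least squares
slope, intercept = np.polyfit(df["Year"], df["Rating"], deg=1)

# The slope is measured in rating points per year
print(f"Slope: {slope:.4f} rating points per year")
print(f"Change per decade: {slope * 10:.2f} rating points")
```

A negative slope means ratings drop as the year increases. On this toy sample the number says nothing about real movies, but the same two lines work unchanged on a larger DataFrame.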

Challenge

Try finding a real IMDb CSV file on Kaggle, load it with pd.read_csv(), and run this same code on 10,000 movies instead of 6.
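As a starting point for the challenge, here is a rough sketch of loading and cleaning a CSV before plotting. Real files rarely use exactly the column names from this tutorial, so the helper load_ratings and its year_col and rating_col parameters are hypothetical: check the header of your file and pass the names it actually uses. An in-memory string stands in for the downloaded file so the example runs on its own.

```python
import io
import pandas as pd

def load_ratings(source, year_col="Year", rating_col="Rating"):
    """Read a movie CSV and return numeric Year and Rating columns.

    year_col and rating_col are assumptions; adjust them to your file.
    """
    raw = pd.read_csv(source)
    clean = pd.DataFrame({
        # Non-numeric entries become NaN so they can be dropped below
        "Year": pd.to_numeric(raw[year_col], errors="coerce"),
        "Rating": pd.to_numeric(raw[rating_col], errors="coerce"),
    })
    return clean.dropna().reset_index(drop=True)

# Tiny stand-in for a downloaded file; replace with a path to your CSV
sample_csv = io.StringIO(
    "Title,Year,Rating\n"
    "Movie A,1980,8.5\n"
    "Movie B,unknown,7.0\n"
    "Movie C,2023,5.8\n"
)

df = load_ratings(sample_csv)
print(df)
```

The returned DataFrame has the same Year and Rating columns as the simulated one, so the plotting code from Step 2 should work on it without changes.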
