
Intermediate Python Project: Analyzing Spotify Data with Pandas


Learning Pandas syntax is one thing; using it to answer real questions is another. In this project, we’ll analyze a small, simulated dataset of top songs and practice three core skills: filtering, sorting, and aggregating.

Note: For this tutorial, we’ll create a small sample dataset directly in the code so you don’t need to download a file. In a real project, you would load this from a CSV.
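For reference, here is a minimal sketch of what loading from a CSV might look like. To keep it runnable without a file, it reads CSV text from an in-memory string; the two rows shown and the file name mentioned in the comment are illustrative assumptions, not part of a real download.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (two sample rows only).
csv_text = """Track,Artist,Streams_Millions,Danceability
Blinding Lights,The Weeknd,3000,0.514
Watermelon Sugar,Harry Styles,1800,0.548
"""

# pd.read_csv accepts a file path or any file-like object.
# With a real file you would pass a path instead, for example
# pd.read_csv("top_songs.csv"), where "top_songs.csv" is a hypothetical name.
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df)
```

Pandas infers the column types here, so Streams_Millions comes back as integers and Danceability as floats.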

Step 1: Create the Dataset

First, let’s import Pandas and build our simulated Spotify data as a DataFrame. Every later step works on this table.

import pandas as pd

# Create a dictionary of data
data = {
    'Track': ['Blinding Lights', 'Watermelon Sugar', 'Levitating', 'Save Your Tears', 'Peaches'],
    'Artist': ['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'],
    'Streams_Millions': [3000, 1800, 1600, 1200, 900],
    'Danceability': [0.514, 0.548, 0.702, 0.680, 0.677]
}

# Convert it to a DataFrame
df = pd.DataFrame(data)

print("--- Our Spotify Dataset ---")
print(df)

Step 2: Artist Analysis

Question 1: How many songs does ‘The Weeknd’ have in this top list?

We can filter the DataFrame with a boolean condition and count the rows that match.

# Filter for The Weeknd
weeknd_songs = df[df['Artist'] == 'The Weeknd']

print("\n--- The Weeknd's Songs ---")
print(weeknd_songs)

# Count them using len()
count = len(weeknd_songs)
print(f"\nThe Weeknd has {count} songs in the top list.")
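As an alternative sketch, the same count can be obtained without building the filtered DataFrame first. The example below rebuilds only the two columns it needs so that it runs on its own; the variable names weeknd_count and songs_per_artist are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Track': ['Blinding Lights', 'Watermelon Sugar', 'Levitating', 'Save Your Tears', 'Peaches'],
    'Artist': ['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'],
})

# The comparison yields a boolean Series; True counts as 1 when summed.
weeknd_count = (df['Artist'] == 'The Weeknd').sum()

# value_counts() tallies every artist at once, most frequent first.
songs_per_artist = df['Artist'].value_counts()

print(f"The Weeknd has {weeknd_count} songs in the top list.")
print(songs_per_artist)
```

value_counts() is handy when you want the answer for every artist rather than one chosen name.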

Step 3: Finding the Most “Danceable” Song

Question 2: Which song is the best to dance to?

To find out, we sort the data by the ‘Danceability’ column in descending order (highest first) and take the first row.

# Sort by Danceability, highest first
most_danceable = df.sort_values(by='Danceability', ascending=False)

# Get the top result
top_dance_song = most_danceable.iloc[0] # .iloc[0] gets the first row

print("\n--- Most Danceable Song ---")
print(f"Track: {top_dance_song['Track']}")
print(f"Artist: {top_dance_song['Artist']}")
print(f"Score: {top_dance_song['Danceability']}")
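Sorting the whole table works well for five rows. As a hedged alternative, the sketch below uses two other Pandas methods that answer the same question: idxmax() to locate the single highest score, and nlargest() to pull the top few rows. The data is repeated so the example is self-contained, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Track': ['Blinding Lights', 'Watermelon Sugar', 'Levitating', 'Save Your Tears', 'Peaches'],
    'Artist': ['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'],
    'Danceability': [0.514, 0.548, 0.702, 0.680, 0.677],
})

# idxmax() returns the index label of the highest value in the column.
top_idx = df['Danceability'].idxmax()
top_row = df.loc[top_idx]  # .loc looks the row up by that label

# nlargest() returns the n rows with the highest values, already ordered.
top_two = df.nlargest(2, 'Danceability')

print(f"Most danceable: {top_row['Track']} by {top_row['Artist']}")
print(top_two[['Track', 'Danceability']])
```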

Step 4: Total Streams

Question 3: What is the total number of streams for all these top 5 songs combined?

Pandas columns (Series objects) have built-in aggregation methods such as .sum(), .mean(), .min(), and .max(). Here, .sum() adds up the stream counts.

total_streams = df['Streams_Millions'].sum()

print(f"\nTotal streams for top 5 songs: {total_streams} Million")
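The other aggregation methods work the same way. The sketch below applies them to the same column and, as an extension not covered in the steps above, uses groupby() to total streams per artist. The data is repeated so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Artist': ['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'],
    'Streams_Millions': [3000, 1800, 1600, 1200, 900],
})

streams = df['Streams_Millions']

# Each method reduces the whole column to a single number.
print(f"Average streams: {streams.mean()} Million")
print(f"Fewest streams: {streams.min()} Million")
print(f"Most streams: {streams.max()} Million")

# groupby() splits the rows by artist, then sum() totals each group.
per_artist = (
    df.groupby('Artist')['Streams_Millions']
    .sum()
    .sort_values(ascending=False)
)
print(per_artist)
```

Grouping shows that The Weeknd's two songs together account for more streams than any other artist in the list.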

Challenge for You

Can you add more songs to the data dictionary at the start? Try adding your favourite songs and their estimated streams (keep all four lists the same length, or the DataFrame will not build), then re-run the analysis to see how they rank!
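If you would rather not edit the original dictionary, one possible approach is to append rows with pd.concat() and then rank by streams. In the sketch below, 'Example Track', 'Example Artist', and their numbers are placeholders invented for illustration; swap in your own values.

```python
import pandas as pd

df = pd.DataFrame({
    'Track': ['Blinding Lights', 'Watermelon Sugar', 'Levitating', 'Save Your Tears', 'Peaches'],
    'Artist': ['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'],
    'Streams_Millions': [3000, 1800, 1600, 1200, 900],
    'Danceability': [0.514, 0.548, 0.702, 0.680, 0.677],
})

# Placeholder row: replace with your own song and estimates.
new_songs = pd.DataFrame({
    'Track': ['Example Track'],
    'Artist': ['Example Artist'],
    'Streams_Millions': [1500],
    'Danceability': [0.6],
})

# Stack the two tables; ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat([df, new_songs], ignore_index=True)

# Sort by streams (highest first) and add a 1-based rank column.
ranked = combined.sort_values(by='Streams_Millions', ascending=False).reset_index(drop=True)
ranked['Rank'] = ranked.index + 1

print(ranked[['Rank', 'Track', 'Streams_Millions']])
```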
