
Learning Pandas syntax is one thing, but using it to answer real questions is another. In this project, we’ll analyze a small, simulated dataset of top Spotify songs and answer three concrete questions about it using Pandas.
Note: For this tutorial, we’ll create a small sample dataset directly in the code so you don’t need to download a file. In a real project, you would load this from a CSV.
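For reference, here is a minimal sketch of what loading from a CSV could look like. The CSV text below is a two-row stand-in, and io.StringIO is used only so the example runs without a file on disk. With a real file you would pass its path to pd.read_csv instead (the filename in the comment is an assumption).

```python
import io
import pandas as pd

# Stand-in for the contents of a real file; with a file on disk you would
# call pd.read_csv("top_songs.csv") instead (hypothetical filename).
csv_text = """Track,Artist,Streams_Millions,Danceability
Blinding Lights,The Weeknd,3000,0.514
Watermelon Sugar,Harry Styles,1800,0.548
"""

# io.StringIO lets read_csv treat the string like an open file
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv.head())
```

The first line of the CSV becomes the column names, so the resulting DataFrame has the same columns as the one we build by hand in this tutorial.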
Step 1: Create the Dataset
First, let’s import Pandas and create our simulated Spotify data. Every question in this tutorial is answered from this one DataFrame.
import pandas as pd
# Create a dictionary of data
data = {
    'Track': ['Blinding Lights', 'Watermelon Sugar', 'Levitating', 'Save Your Tears', 'Peaches'],
    'Artist': ['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'],
    'Streams_Millions': [3000, 1800, 1600, 1200, 900],
    'Danceability': [0.514, 0.548, 0.702, 0.680, 0.677]
}
# Convert it to a DataFrame
df = pd.DataFrame(data)
print("--- Our Spotify Dataset ---")
print(df)
Step 2: Artist Analysis
Question 1: How many songs does ‘The Weeknd’ have in this top list?
We can find out by filtering: we keep only the rows where the ‘Artist’ column equals ‘The Weeknd’, then count them.
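It helps to see what the comparison produces on its own. The sketch below uses a standalone Series with the same artist names as our dataset (the variable names artists and mask are ours, for illustration). Comparing a Series to a value gives one True/False per row, and that boolean mask is what selects the rows.

```python
import pandas as pd

artists = pd.Series(['The Weeknd', 'Harry Styles', 'Dua Lipa', 'The Weeknd', 'Justin Bieber'])

# Comparing a Series to a value yields one True/False per row
mask = artists == 'The Weeknd'
print(mask.tolist())  # [True, False, False, True, False]

# True counts as 1, so summing the mask counts the matching rows
print(mask.sum())  # 2
```

Passing a mask like this inside the square brackets of a DataFrame keeps only the rows marked True, which is what the filtering code does.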
# Filter for The Weeknd
weeknd_songs = df[df['Artist'] == 'The Weeknd']
print("\n--- The Weeknd's Songs ---")
print(weeknd_songs)
# Count them using len()
count = len(weeknd_songs)
print(f"\nThe Weeknd has {count} songs in the top list.")
Step 3: Finding the Most “Danceable” Song
Question 2: Which song is the best to dance to?
We need to sort our data by the ‘Danceability’ column in descending order (highest first), then take the first row of the sorted result.
# Sort by Danceability, highest first
most_danceable = df.sort_values(by='Danceability', ascending=False)
# Get the top result
top_dance_song = most_danceable.iloc[0] # .iloc[0] gets the first row
print("\n--- Most Danceable Song ---")
print(f"Track: {top_dance_song['Track']}")
print(f"Artist: {top_dance_song['Artist']}")
print(f"Score: {top_dance_song['Danceability']}")
Step 4: Total Streams
Question 3: What is the total number of streams for all these top 5 songs combined?
Pandas columns have built-in math methods like .sum(), .mean(), .min(), and .max().
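As a quick sketch of the other methods, here they are applied to the same stream counts as our dataset, held in a standalone Series (the name streams is ours, for illustration).

```python
import pandas as pd

# Same stream counts (in millions) as the dataset from Step 1
streams = pd.Series([3000, 1800, 1600, 1200, 900])

print(streams.mean())  # average streams per song: 1700.0
print(streams.min())   # lowest value: 900
print(streams.max())   # highest value: 3000
```

The total is computed the same way, by calling .sum() on the column.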
total_streams = df['Streams_Millions'].sum()
print(f"\nTotal streams for top 5 songs: {total_streams} Million")
Challenge for You
Can you add more songs to the data dictionary at the start? Try adding your favourite songs and their estimated streams, then re-run the analysis to see how they rank!
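If you would rather not edit the dictionary, one alternative is to append rows to an existing DataFrame with pd.concat. This is a minimal sketch, not the only way to do it. It uses a shortened two-song version of the dataset, and the new song’s name and numbers are invented placeholders.

```python
import pandas as pd

# Shortened version of the tutorial dataset
df = pd.DataFrame({
    'Track': ['Blinding Lights', 'Watermelon Sugar'],
    'Artist': ['The Weeknd', 'Harry Styles'],
    'Streams_Millions': [3000, 1800],
    'Danceability': [0.514, 0.548],
})

# Placeholder row: the name and numbers are made up for illustration
new_songs = pd.DataFrame({
    'Track': ['My Favourite Song'],
    'Artist': ['Some Artist'],
    'Streams_Millions': [2500],
    'Danceability': [0.800],
})

# ignore_index=True renumbers the rows 0..n-1 after stacking
combined = pd.concat([df, new_songs], ignore_index=True)

# Rank by streams, highest first
ranked = combined.sort_values(by='Streams_Millions', ascending=False).reset_index(drop=True)
ranked['Rank'] = ranked.index + 1
print(ranked[['Rank', 'Track', 'Streams_Millions']])
```

Replace the placeholder row with your own songs and the ranking updates on the next run.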





