Popular Music Across Different Streaming Platforms: YouTube vs Spotify¶

 
 

Objective of the project: The goal of this project is to test the hypothesis that the same songs are popular across different streaming platforms, and to explore the characteristics (genre/album/tempo/duration) of the music that is more popular on each platform.
Source of the data: The dataset comes from Kaggle and is available at https://www.kaggle.com/datasets/salvatorerastelli/spotify-and-youtube

 
 

Setting up the work environment¶

Loading the libraries:¶

 
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np 
import seaborn as sns
import plotly.express as px
import matplotlib.cm as cm
 
 

Importing the data¶

 
In [ ]:
datam= pd.read_csv("Spotify_Youtube.csv")
 
 

Assessing and Preparing our data¶

 
In [ ]:
datam.info()
 
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20718 entries, 0 to 20717
Data columns (total 28 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        20718 non-null  int64  
 1   Artist            20718 non-null  object 
 2   Url_spotify       20718 non-null  object 
 3   Track             20718 non-null  object 
 4   Album             20718 non-null  object 
 5   Album_type        20718 non-null  object 
 6   Uri               20718 non-null  object 
 7   Danceability      20716 non-null  float64
 8   Energy            20716 non-null  float64
 9   Key               20716 non-null  float64
 10  Loudness          20716 non-null  float64
 11  Speechiness       20716 non-null  float64
 12  Acousticness      20716 non-null  float64
 13  Instrumentalness  20716 non-null  float64
 14  Liveness          20716 non-null  float64
 15  Valence           20716 non-null  float64
 16  Tempo             20716 non-null  float64
 17  Duration_ms       20716 non-null  float64
 18  Url_youtube       20248 non-null  object 
 19  Title             20248 non-null  object 
 20  Channel           20248 non-null  object 
 21  Views             20248 non-null  float64
 22  Likes             20177 non-null  float64
 23  Comments          20149 non-null  float64
 24  Description       19842 non-null  object 
 25  Licensed          20248 non-null  object 
 26  official_video    20248 non-null  object 
 27  Stream            20142 non-null  float64
dtypes: float64(15), int64(1), object(12)
memory usage: 4.4+ MB
 
In [ ]:
datam.head()
 
Out[ ]:
  Unnamed: 0 Artist Url_spotify Track Album Album_type Uri Danceability Energy Key ... Url_youtube Title Channel Views Likes Comments Description Licensed official_video Stream
0 0 Gorillaz https://open.spotify.com/artist/3AA28KZvwAUcZu... Feel Good Inc. Demon Days album spotify:track:0d28khcov6AiegSCpG5TuT 0.818 0.705 6.0 ... https://www.youtube.com/watch?v=HyHNuVaZJ-k Gorillaz - Feel Good Inc. (Official Video) Gorillaz 693555221.0 6220896.0 169907.0 Official HD Video for Gorillaz' fantastic trac... True True 1.040235e+09
1 1 Gorillaz https://open.spotify.com/artist/3AA28KZvwAUcZu... Rhinestone Eyes Plastic Beach album spotify:track:1foMv2HQwfQ2vntFf9HFeG 0.676 0.703 8.0 ... https://www.youtube.com/watch?v=yYDmaexVHic Gorillaz - Rhinestone Eyes [Storyboard Film] (... Gorillaz 72011645.0 1079128.0 31003.0 The official video for Gorillaz - Rhinestone E... True True 3.100837e+08
2 2 Gorillaz https://open.spotify.com/artist/3AA28KZvwAUcZu... New Gold (feat. Tame Impala and Bootie Brown) New Gold (feat. Tame Impala and Bootie Brown) single spotify:track:64dLd6rVqDLtkXFYrEUHIU 0.695 0.923 1.0 ... https://www.youtube.com/watch?v=qJa-VFwPpYA Gorillaz - New Gold ft. Tame Impala & Bootie B... Gorillaz 8435055.0 282142.0 7399.0 Gorillaz - New Gold ft. Tame Impala & Bootie B... True True 6.306347e+07
3 3 Gorillaz https://open.spotify.com/artist/3AA28KZvwAUcZu... On Melancholy Hill Plastic Beach album spotify:track:0q6LuUqGLUiCPP1cbdwFs3 0.689 0.739 2.0 ... https://www.youtube.com/watch?v=04mfKJWDSzI Gorillaz - On Melancholy Hill (Official Video) Gorillaz 211754952.0 1788577.0 55229.0 Follow Gorillaz online:\nhttp://gorillaz.com \... True True 4.346636e+08
4 4 Gorillaz https://open.spotify.com/artist/3AA28KZvwAUcZu... Clint Eastwood Gorillaz album spotify:track:7yMiX7n9SBvadzox8T5jzT 0.663 0.694 10.0 ... https://www.youtube.com/watch?v=1V_xRb0x9aw Gorillaz - Clint Eastwood (Official Video) Gorillaz 618480958.0 6197318.0 155930.0 The official music video for Gorillaz - Clint ... True True 6.172597e+08

5 rows × 28 columns

 
 

In this dataset we have 28 variables and 20718 entries.

Our next step is to clean the data by deleting the columns that we won't need and checking for missing data.

 
In [ ]:
# Delete the variables we won't need
datam.drop(['Unnamed: 0', 'Url_spotify', 'Uri', 'Url_youtube', 'Title', 'Description'], axis=1, inplace=True)
 
In [ ]:
#checking for missing data
datam.isnull().values.any()
datam.isnull().sum()
 
Out[ ]:
Artist                0
Track                 0
Album                 0
Album_type            0
Danceability          2
Energy                2
Key                   2
Loudness              2
Speechiness           2
Acousticness          2
Instrumentalness      2
Liveness              2
Valence               2
Tempo                 2
Duration_ms           2
Channel             470
Views               470
Likes               541
Comments            569
Licensed            470
official_video      470
Stream              576
dtype: int64
 
In [ ]:
#Checking for duplicates: 
datam.duplicated().values.any()
 
Out[ ]:
False
 
 

As we can see, there are no duplicate rows, but some values are missing, so we will drop every row that contains a missing entry.

 
In [ ]:
#Deleting missing data:
datam.dropna(inplace=True)
 
In [ ]:
datam.info()
 
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20718 entries, 0 to 20717
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Artist            20718 non-null  object 
 1   Track             20718 non-null  object 
 2   Album             20718 non-null  object 
 3   Album_type        20718 non-null  object 
 4   Danceability      20716 non-null  float64
 5   Energy            20716 non-null  float64
 6   Key               20716 non-null  float64
 7   Loudness          20716 non-null  float64
 8   Speechiness       20716 non-null  float64
 9   Acousticness      20716 non-null  float64
 10  Instrumentalness  20716 non-null  float64
 11  Liveness          20716 non-null  float64
 12  Valence           20716 non-null  float64
 13  Tempo             20716 non-null  float64
 14  Duration_ms       20716 non-null  float64
 15  Channel           20248 non-null  object 
 16  Views             20248 non-null  float64
 17  Likes             20177 non-null  float64
 18  Comments          20149 non-null  float64
 19  Licensed          20248 non-null  object 
 20  official_video    20248 non-null  object 
 21  Stream            20142 non-null  float64
dtypes: float64(15), object(7)
memory usage: 3.5+ MB
 
In [ ]:
datam.head()
 
Out[ ]:
  Artist Track Album Album_type Danceability Energy Key Loudness Speechiness Acousticness ... Valence Tempo Duration_ms Channel Views Likes Comments Licensed official_video Stream
0 Gorillaz Feel Good Inc. Demon Days album 0.818 0.705 6.0 -6.679 0.1770 0.008360 ... 0.772 138.559 222640.0 Gorillaz 693555221.0 6220896.0 169907.0 True True 1.040235e+09
1 Gorillaz Rhinestone Eyes Plastic Beach album 0.676 0.703 8.0 -5.815 0.0302 0.086900 ... 0.852 92.761 200173.0 Gorillaz 72011645.0 1079128.0 31003.0 True True 3.100837e+08
2 Gorillaz New Gold (feat. Tame Impala and Bootie Brown) New Gold (feat. Tame Impala and Bootie Brown) single 0.695 0.923 1.0 -3.930 0.0522 0.042500 ... 0.551 108.014 215150.0 Gorillaz 8435055.0 282142.0 7399.0 True True 6.306347e+07
3 Gorillaz On Melancholy Hill Plastic Beach album 0.689 0.739 2.0 -5.810 0.0260 0.000015 ... 0.578 120.423 233867.0 Gorillaz 211754952.0 1788577.0 55229.0 True True 4.346636e+08
4 Gorillaz Clint Eastwood Gorillaz album 0.663 0.694 10.0 -8.627 0.1710 0.025300 ... 0.525 167.953 340920.0 Gorillaz 618480958.0 6197318.0 155930.0 True True 6.172597e+08

5 rows × 22 columns

 
 

After cleaning our data we are left with 22 variables and 19549 entries.
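As a quick sanity check on this step, the row count can be compared before and after `dropna`. The sketch below uses a small made-up frame; the name `toy` and its values are illustrative assumptions, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration)
toy = pd.DataFrame({
    "Track": ["a", "b", "c", "d"],
    "Views": [10.0, np.nan, 30.0, 40.0],
    "Stream": [1.0, 2.0, np.nan, 4.0],
})

rows_before = len(toy)
# dropna() removes every row that has at least one missing value;
# reset_index(drop=True) makes the index contiguous again afterwards
cleaned = toy.dropna().reset_index(drop=True)
rows_after = len(cleaned)

print(f"Dropped {rows_before - rows_after} of {rows_before} rows")
```

Printing the two counts right after the cleaning cell makes it easy to confirm that the frame used in later cells is the cleaned one.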

 
 

Exploring our Data¶

The percentage of tracks with a license¶

 
In [ ]:
License_count = datam['Licensed'].value_counts()
print(License_count)
labels = License_count.index.tolist()
sizes = License_count.values.tolist()

colors = plt.cm.inferno(np.linspace(0.9, 0.8, len(labels)))
plt.pie(sizes, labels=labels, colors= colors, autopct='%1.1f%%', startangle=90)
plt.title('Number of tracks that have a license')
plt.legend(labels, loc='best')
plt.show()
 
 
True     14140
False     6108
Name: Licensed, dtype: int64
 
 
 

As the pie chart shows, about 70.2% of the songs featured in this dataset are licensed.
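The share can also be computed directly, without reading it off the chart. A minimal sketch, assuming a boolean-like column such as `Licensed`; the `licensed` series below is made up for illustration.

```python
import pandas as pd

# Made-up flags standing in for the 'Licensed' column
licensed = pd.Series([True, True, True, False], name="Licensed")

# normalize=True returns proportions instead of raw counts;
# multiplying by 100 turns them into percentages
share = licensed.value_counts(normalize=True).mul(100).round(1)
print(share)
```

Applied to the notebook's frame, the same call would be `datam['Licensed'].value_counts(normalize=True)`.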

Types of Albums¶

 
In [ ]:
Album_count = datam['Album_type'].value_counts()
print(Album_count)

labels = Album_count.index.tolist()
sizes = Album_count.values.tolist()

colors = plt.cm.inferno(np.linspace(0.9, 0.7, len(labels)))
plt.pie(sizes, labels=labels, colors= colors, autopct='%1.1f%%', startangle=90)
plt.title('Distribution of the Types of Albums')
plt.legend(labels, loc='best')
plt.show()
 
 
album          14148
single          4689
compilation      712
Name: Album_type, dtype: int64
 
 
 

Most of the tracks in this dataset come from albums (72.4%), followed by singles (24.0%). Compilations represent only 3.6% of all tracks.

Average Duration of the tracks¶

 
In [ ]:
#Changing the measure of duration from milliseconds to minutes
datam['Duration_ms'] = (round(datam['Duration_ms']/(1000*60),2))
datam.rename(columns={'Duration_ms': 'Duration'}, inplace=True)
datam.head()
 
Out[ ]:
  Artist Track Album Album_type Danceability Energy Key Loudness Speechiness Acousticness ... Valence Tempo Duration Channel Views Likes Comments Licensed official_video Stream
0 Gorillaz Feel Good Inc. Demon Days album 0.818 0.705 6.0 -6.679 0.1770 0.008360 ... 0.772 138.559 3.71 Gorillaz 693555221.0 6220896.0 169907.0 True True 1.040235e+09
1 Gorillaz Rhinestone Eyes Plastic Beach album 0.676 0.703 8.0 -5.815 0.0302 0.086900 ... 0.852 92.761 3.34 Gorillaz 72011645.0 1079128.0 31003.0 True True 3.100837e+08
2 Gorillaz New Gold (feat. Tame Impala and Bootie Brown) New Gold (feat. Tame Impala and Bootie Brown) single 0.695 0.923 1.0 -3.930 0.0522 0.042500 ... 0.551 108.014 3.59 Gorillaz 8435055.0 282142.0 7399.0 True True 6.306347e+07
3 Gorillaz On Melancholy Hill Plastic Beach album 0.689 0.739 2.0 -5.810 0.0260 0.000015 ... 0.578 120.423 3.90 Gorillaz 211754952.0 1788577.0 55229.0 True True 4.346636e+08
4 Gorillaz Clint Eastwood Gorillaz album 0.663 0.694 10.0 -8.627 0.1710 0.025300 ... 0.525 167.953 5.68 Gorillaz 618480958.0 6197318.0 155930.0 True True 6.172597e+08

5 rows × 22 columns

 
In [ ]:
datam['Duration'].describe()
 
Out[ ]:
count    20716.000000
mean         3.745316
std          2.079864
min          0.520000
25%          3.000000
50%          3.550000
75%          4.210000
max         77.930000
Name: Duration, dtype: float64
 
 

The average duration of the songs is about 3.75 minutes.

The shortest track lasts about 31 seconds (0.52 minutes) and the longest about 1 hour and 18 minutes (77.93 minutes).
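Because `Duration` is now stored in decimal minutes, values such as 0.52 have to be converted before they can be read as minutes and seconds. The helper below is a small sketch of that conversion; `format_minutes` is a hypothetical name introduced here, not part of the notebook.

```python
def format_minutes(minutes: float) -> str:
    """Format a duration given in decimal minutes as h:mm:ss."""
    total_seconds = round(minutes * 60)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}:{mins:02d}:{secs:02d}"

# Minimum and maximum taken from the describe() output above
print(format_minutes(0.52))   # shortest track
print(format_minutes(77.93))  # longest track
```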

 
 

Comparing the TOP 10 Popular Songs Across the Two Streaming Platforms¶

 
In [ ]:
# select the top 10 songs based on number of views on Youtube

YouTubeTOP10ALL =datam.sort_values(by='Views', ascending=False)[:10]
YouTubeTOP10=YouTubeTOP10ALL[['Views','Track']]

# create a list of colors from the inferno colormap
colors = plt.cm.inferno(np.linspace(0.9, 0.5, len(YouTubeTOP10)))

# Show the plots next to each other 

frame,(H_Plot1, H_Plot2) = plt.subplots(1, 2, figsize=(16,8))


# create the horizontal bar plot
H_Plot1 = YouTubeTOP10.plot(kind='barh', x='Track', y='Views', color=colors,ax=H_Plot1)

# set the title and axis labels
H_Plot1.set_title('Top 10 Songs by Number of Views on Youtube')
H_Plot1.set_xlabel('Views')
H_Plot1.set_ylabel('Song Title')

# select the top 10 songs based on number of streams on Spotify

SpotifyTOP10ALL =datam.sort_values(by='Stream', ascending=False)[:10]
SpotifyTOP10=SpotifyTOP10ALL[['Stream','Track']]

# create a list of colors from the inferno colormap
colors = plt.cm.inferno(np.linspace(0.9, 0.5, len(SpotifyTOP10)))

# create the horizontal bar plot
H_Plot2 = SpotifyTOP10.plot(kind='barh', x='Track', y='Stream', color=colors,ax= H_Plot2)

# set the title and axis labels
H_Plot2.set_title('Top 10 Songs by Number of Streams on Spotify')
H_Plot2.set_xlabel('Streams')
H_Plot2.set(ylabel=None)
# show the plots
frame.tight_layout()
plt.show()
 
 
 
 
 

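Shape of You is the only track that appears in both top 10 lists; every other entry differs between Spotify and YouTube. Some tracks appear more than once in a list because the dataset contains more than one row for them.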
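Since the same track can occupy several rows, a top-N taken straight from the sorted rows can contain repeats. The sketch below shows one possible way to rank unique tracks and measure the overlap between platforms. The `toy` frame and the `top_tracks` helper are assumptions made for illustration, and dropping rows that share both title and metric value is a simplification, not the notebook's method.

```python
import pandas as pd

# Made-up rows; 'Song A' appears twice, as a repeated track would
toy = pd.DataFrame({
    "Artist": ["X", "Y", "Z", "W"],
    "Track": ["Song A", "Song A", "Song B", "Song C"],
    "Views": [900, 900, 500, 100],
    "Stream": [50, 50, 700, 800],
})

def top_tracks(df: pd.DataFrame, metric: str, n: int) -> list:
    """Return the n highest-ranked unique track titles for a metric."""
    # Rows with the same title and the same metric value are treated as one track
    unique = df.drop_duplicates(subset=["Track", metric])
    return unique.nlargest(n, metric)["Track"].tolist()

youtube_top = top_tracks(toy, "Views", 2)
spotify_top = top_tracks(toy, "Stream", 2)
# Tracks present in both rankings
shared = sorted(set(youtube_top) & set(spotify_top))
print(youtube_top, spotify_top, shared)
```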

We would like to explore this discrepancy further and compare the most streamed artists on each platform.

Comparing the Famous Artists Across the Two Platforms¶

 
In [ ]:
# Total YouTube views per artist
Artist_Y = datam.groupby('Artist')[['Views']].sum()
# Keep the 10 artists with the most views
top_10_artist_Y = Artist_Y.sort_values(['Views'], ascending=False)[:10]
# Move 'Artist' from the index back to a column for plotting
top_10_artist_Y.reset_index(inplace=True)
# Total Spotify streams per artist
Artist_S = datam.groupby('Artist')[['Stream']].sum()
# Keep the 10 artists with the most streams
top_10_artist_S = Artist_S.sort_values(['Stream'], ascending=False)[:10]
# Move 'Artist' from the index back to a column for plotting
top_10_artist_S.reset_index(inplace=True)
 
In [ ]:
# create a list of colors from the inferno colormap
colors = plt.cm.inferno(np.linspace(0.9, 0.5, len(top_10_artist_Y)))

# Show the plots next to each other 

frame2,(H_Plot3, H_Plot4) = plt.subplots(1, 2, figsize=(16,8))

# create the horizontal bar plot
H_Plot3 = top_10_artist_Y.plot(kind='barh', x='Artist', y='Views', color=colors,ax= H_Plot3)

# set the title and axis labels
H_Plot3.set_title('Top 10 Streamed Artists on YouTube')
H_Plot3.set_xlabel('Views')
H_Plot3.set_ylabel('Artist')

H_Plot4 = top_10_artist_S.plot(kind='barh', x='Artist', y='Stream', color=colors,ax= H_Plot4)

# set the title and axis labels
H_Plot4.set_title('Top 10 Streamed Artists on Spotify')
H_Plot4.set_xlabel('Streams')
H_Plot4.set(ylabel= None)

frame2.tight_layout()
plt.show()
 
 
 
 

Similar to our previous analysis, we can observe that Ed Sheeran, the singer of "Shape of You", is one of the artists present in the top 10 of both platforms. Bruno Mars is also present in both lists.
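The artists shared by both rankings can also be listed programmatically rather than read off the two charts. A minimal sketch on made-up numbers; the `toy` frame is an assumption for illustration only.

```python
import pandas as pd

# Made-up per-track rows for three artists
toy = pd.DataFrame({
    "Artist": ["A", "A", "B", "C", "C"],
    "Views": [10, 20, 50, 5, 5],
    "Stream": [40, 40, 10, 30, 30],
})

# Sum views and streams per artist in one pass
totals = toy.groupby("Artist")[["Views", "Stream"]].sum()

# Top 2 artists per platform (top 10 in the real analysis)
top_youtube = set(totals["Views"].nlargest(2).index)
top_spotify = set(totals["Stream"].nlargest(2).index)

# Artists that appear in both rankings
in_both = sorted(top_youtube & top_spotify)
print(in_both)
```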

Let's explore further the musical characteristics of the songs popular on each streaming service.


 
In [ ]:
Youtube_Songs_Attributs=YouTubeTOP10ALL[['Track','Danceability','Loudness','Speechiness','Acousticness','Liveness','Valence','Tempo']]
print(Youtube_Songs_Attributs)
 
 
                                    Track  Danceability  Loudness  \
1147                            Despacito         0.655    -4.787   
365                             Despacito         0.655    -4.787   
12452                        Shape of You         0.825    -3.183   
14580  See You Again (feat. Charlie Puth)         0.689    -7.503   
12469  See You Again (feat. Charlie Puth)         0.689    -7.503   
20303                   Wheels on the Bus         0.941   -11.920   
10686      Uptown Funk (feat. Bruno Mars)         0.856    -7.223   
8937                Gangnam Style (강남스타일)         0.727    -2.871   
9569                                Sugar         0.748    -7.055   
13032                                Roar         0.671    -4.821   

       Speechiness  Acousticness  Liveness  Valence    Tempo  
1147        0.1530       0.19800    0.0670    0.839  177.928  
365         0.1530       0.19800    0.0670    0.839  177.928  
12452       0.0802       0.58100    0.0931    0.931   95.977  
14580       0.0815       0.36900    0.0649    0.283   80.025  
12469       0.0815       0.36900    0.0649    0.283   80.025  
20303       0.0427       0.18400    0.1570    0.965  125.021  
10686       0.0824       0.00801    0.0344    0.928  114.988  
8937        0.2860       0.00417    0.0910    0.749  132.067  
9569        0.0334       0.05910    0.0863    0.884  120.076  
13032       0.0316       0.00492    0.3540    0.436   90.003  
 
In [ ]:
Spotify_Songs_Attributs=SpotifyTOP10ALL[['Track','Danceability','Loudness','Speechiness','Acousticness','Liveness','Valence','Tempo']]
print(Spotify_Songs_Attributs)
 
 
                                               Track  Danceability  Loudness  \
15250                                Blinding Lights         0.514    -5.934   
12452                                   Shape of You         0.825    -3.183   
19186                              Someone You Loved         0.501    -5.679   
17937                     rockstar (feat. 21 Savage)         0.585    -6.136   
17445  Sunflower - Spider-Man: Into the Spider-Verse         0.755    -4.368   
17938  Sunflower - Spider-Man: Into the Spider-Verse         0.755    -4.368   
13503                                      One Dance         0.792    -5.609   
16099                                         Closer         0.748    -5.599   
16028                                         Closer         0.748    -5.599   
14030                                       Believer         0.776    -4.374   

       Speechiness  Acousticness  Liveness  Valence    Tempo  
15250       0.0598       0.00146    0.0897    0.334  171.005  
12452       0.0802       0.58100    0.0931    0.931   95.977  
19186       0.0319       0.75100    0.1050    0.446  109.891  
17937       0.0712       0.12400    0.1310    0.129  159.801  
17445       0.0575       0.53300    0.0685    0.925   89.960  
17938       0.0575       0.53300    0.0685    0.925   89.960  
13503       0.0536       0.00776    0.3290    0.370  103.967  
16099       0.0338       0.41400    0.1110    0.661   95.010  
16028       0.0338       0.41400    0.1110    0.661   95.010  
14030       0.1280       0.06220    0.0810    0.666  124.949
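One way to compare the two printed tables is to average each attribute per platform and place the results side by side. This is only a sketch: `compare_means` is a hypothetical helper and the two small frames are made-up values, not the rows shown above.

```python
import pandas as pd

def compare_means(youtube: pd.DataFrame, spotify: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Average the given attribute columns, one result column per platform."""
    return pd.DataFrame({
        "YouTube": youtube[columns].mean(),
        "Spotify": spotify[columns].mean(),
    })

# Made-up attribute values for illustration
youtube_toy = pd.DataFrame({"Valence": [0.75, 0.25], "Tempo": [120.0, 100.0]})
spotify_toy = pd.DataFrame({"Valence": [0.5, 0.0], "Tempo": [90.0, 110.0]})

summary = compare_means(youtube_toy, spotify_toy, ["Valence", "Tempo"])
print(summary)
```

With the notebook's own frames, the inputs would be `Youtube_Songs_Attributs` and `Spotify_Songs_Attributs`, ideally after dropping repeated tracks so that a song listed twice does not count double in the average.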