In recent years, the trend of artists collaborating on songs has become increasingly prevalent. This analysis examines whether this shift is reflected in the data, specifically focusing on the Billboard Hot 100 charts. By parsing and analyzing data from 1958 to 2021, this study investigates the frequency and impact of collaborations versus solo performances among top-charting artists.
The analysis reveals a significant rise in the number of collaborative tracks over time, along with a notable decline in the share of songs by solo artists. The study also explores the relationship between the number of collaborators and a song's success, finding that songs with more collaborators are associated with higher peak rankings and longer runs on the Billboard charts.
This study underscores the growing importance of collaborations in achieving commercial success in the music industry, as evidenced by the trends observed on the Billboard Hot 100 charts.
There has been a noticeable increase in artists releasing songs in collaboration with other artists. This project explores whether this trend is reflected in the data: Are Top 100 artists more likely to perform solo or collaborate?
Data Description:
The Billboard Hot 100 is the United States’ music industry standard record chart, published weekly by Billboard magazine. Chart rankings are determined by a combination of sales, radio play, and online streaming within the United States.
Each week, Billboard publishes “The Hot 100” chart, ranking that week's most popular songs by these measures. This dataset compiles every “The Hot 100” chart released since the chart’s inception in 1958.
The data for this analysis was extracted from here.
Determine how collaborations or featuring artists are indicated in the artist name.
During exploratory analysis, collaborations or featuring artists were indicated in various formats, including:
Maximum Number of Featured Artists: The song with the most featured artists is “Costa Rica” by Dreamville, featuring 9 additional artists (2019).
The dataset covers Billboard Hot 100 charts from 1958-08-04 to 2021-11-06.
Caveats:
Vocal Groups and Bands in the 1950s and 1960s: Many vocal groups and bands were named in a format like “A and B” or “A & B” (e.g., James Brown And The Famous Flames; Wade Flemons and the Newcomers; Robert & Johnny). These should be considered as a single artist rather than a collaboration.
Artists Requiring Special Consideration: Lil Nas X & B; Silk Sonic (Bruno Mars & Anderson .Paak)
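The splitting step, together with the caveats above, can be sketched in R. This is a minimal sketch under stated assumptions: the delimiter pattern `split_pattern` and the helpers `split_artists()` and `count_artists()` are hypothetical, because the original list of credit formats is not reproduced here. Known group names are swapped for placeholder tokens before splitting, so that their internal "And" or "&" is not read as a collaboration, and the count is capped at 6 to mirror the "5 or More" bucket used in the charts.

```r
# Hypothetical delimiter set (case-insensitive); adjust it to the credit
# formats actually observed during exploratory analysis.
split_pattern <- "(?i)\\s+(featuring|feat\\.?|ft\\.?|with|x|and|&)\\s+|\\s*,\\s*"

split_artists <- function(credit, protected = character()) {
  # Replace each protected group name with a placeholder token so the
  # delimiters inside it are not treated as collaboration markers
  tokens <- sprintf("PROTECTED%03d", seq_along(protected))
  for (i in seq_along(protected)) {
    credit <- gsub(protected[i], tokens[i], credit, fixed = TRUE)
  }
  parts <- strsplit(credit, split_pattern, perl = TRUE)[[1]]
  parts <- trimws(parts)
  parts <- parts[nzchar(parts)]
  # Restore the original group names
  for (i in seq_along(protected)) {
    parts <- gsub(tokens[i], protected[i], parts, fixed = TRUE)
  }
  parts
}

# Number of credited artists, capped so that 6 means "five or more featured artists"
count_artists <- function(credit, protected = character(), cap = 6L) {
  min(length(split_artists(credit, protected)), cap)
}

# Group names taken from the caveats above, treated as single artists
groups <- c("James Brown And The Famous Flames", "Robert & Johnny")

split_artists("Artist A Featuring Artist B & Artist C")
# → "Artist A" "Artist B" "Artist C"
split_artists("James Brown And The Famous Flames", protected = groups)
# → "James Brown And The Famous Flames"
count_artists(paste("Artist", LETTERS[1:10], collapse = ", "))
# → 6
```

A protected list like `groups` has to be curated by hand; the same mechanism can hold the other special cases listed under Caveats.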
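# Single artist
library(dplyr)       # filter(), group_by(), count(), mutate(), distinct(), summarise()
library(ggplot2)
library(plotly)      # ggplotly()
library(ggrepel)     # geom_label_repel() in the pie charts
library(formattable) # formattable(), color_bar()
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number==1)))
fig <- ggplot(line_df %>% filter(artist_number==1), aes(x=as.numeric(year),y=percentage))+
geom_point(colour=5,shape=18,size=2)+
geom_smooth(method=lm,se=TRUE,colour=6,alpha=0.2) +theme_minimal() +
labs(x="Year",y="Percentage of Songs (%)", title="Billboard Top100 Hits: Single Artist") +
theme(plot.title = element_text(hjust = 0.5))+scale_x_continuous(breaks=c(1958,2021)) # x is numeric here, so the scale must be continuous
ggplotly(fig)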
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number==1))) # recently, significantly fewer top 100 hits are released by a single artist
ggplot(data=line_df %>% filter(artist_number==1), aes(x=year,y=percentage,group=1))+
geom_line(colour=5,linewidth=2)+
scale_x_discrete(name="Year",breaks=seq(1958,2021,3))+
scale_y_continuous(name="Percentage of Songs (%)",breaks=seq(55,100,5)) +
theme_minimal()
# ggplot(bb_df_cleaned %>% filter(artist_number==1),aes(x=as.numeric(year)))+ geom_histogram(colour="black", fill="white")+
# geom_density(aes(y=..count..), alpha=.2, fill="#FF6666",bw=1)
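The yearly percentages plotted above come from `line_df`, whose construction is not shown in this section. A hypothetical base-R sketch of one way to build such a table, assuming one row per weekly chart entry and percentages computed within each year (the toy data and the name `line_df_sketch` are illustrative only):

```r
# Hypothetical toy stand-in for bb_df_cleaned: one row per weekly chart entry
toy <- data.frame(
  year = c(1960, 1960, 1960, 1960, 2020, 2020, 2020, 2020),
  artist_number = c(1, 1, 1, 2, 1, 2, 2, 3)
)

# Row-wise proportions: within each year, the share of entries by artist count
counts <- table(toy$year, toy$artist_number)
shares <- prop.table(counts, margin = 1) * 100

# Long format with one row per (year, artist_number) pair, like line_df
line_df_sketch <- as.data.frame.table(shares, responseName = "percentage")
names(line_df_sketch)[1:2] <- c("year", "artist_number")
line_df_sketch
```

Each year's percentages sum to 100, and combinations that never occur (here, three artists in 1960) appear as explicit zeros rather than missing rows.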
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number==2))) # recently, significantly more top 100 hits feature 1 additional artist (statistically significant, P < .001)
fig2 <- ggplot(line_df %>% filter(artist_number==2), aes(x=as.numeric(year),y=percentage))+
geom_point(colour="996",shape=18,size=2)+
geom_smooth(method=lm,se=TRUE,colour=6,linewidth=2,alpha=0.2)+
scale_x_continuous(breaks=c(1958,2021))+
theme_minimal() +
labs(x="Year",y="Percentage of Songs (%)", title="Billboard Top100 Hits: Featuring 1 Artist") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(fig2)
ggplot(data=line_df %>% filter(artist_number==2), aes(x=year,y=percentage,group=1))+
geom_line(colour="996",linewidth=2)+
scale_x_discrete(name="Year",breaks=seq(1958,2021,3))+
scale_y_continuous(name="Percentage of Songs (%)",breaks=seq(0,100,5))+
geom_area(fill="996",alpha=0.5)+
theme_minimal()
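Each trend comment above summarizes `summary(lm(percentage ~ as.numeric(year), ...))`. The sketch below fits the same kind of model to simulated, hypothetical yearly percentages and extracts the slope and its p-value. `format.pval()` with `eps = 0.001` reports very small p-values as a bound, which is the convention behind writing "P < .001" (a p-value can never be below zero).

```r
# Hypothetical yearly percentages with a built-in upward trend of 0.4 points per year
set.seed(1)
years <- 1958:2021
pct <- 5 + 0.4 * (years - 1958) + rnorm(length(years), sd = 2)

# Linear trend of percentage on year, as in the commented summary() calls
fit <- lm(pct ~ years)
coefs <- coef(summary(fit))
slope <- coefs["years", "Estimate"]
p_value <- coefs["years", "Pr(>|t|)"]

slope
format.pval(p_value, eps = 0.001)  # reported as a bound, e.g. "< 0.001"
```

The sign of `slope` gives the direction of the trend; the p-value only says whether it is distinguishable from no trend.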
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number==3))) # recently, significantly more top 100 hits feature 2 additional artists (statistically significant, P < .001)
fig3 <- ggplot(line_df %>% filter(artist_number==3), aes(x=as.numeric(year),y=percentage))+
geom_point(colour="#14701D",shape=18,size=2)+
geom_smooth(method=lm,se=TRUE,colour=6,linewidth=2,alpha=0.2) +theme_minimal() +
labs(x="Year",y="Percentage of Songs (%)", title="Billboard Top100 Hits: Featuring 2 Artists") +
theme(plot.title = element_text(hjust = 0.5))+scale_x_continuous(breaks=c(1958,2021))
ggplotly(fig3)
ggplot(data=line_df %>% filter(artist_number==3), aes(x=year,y=percentage,group=1))+
geom_line(colour="#14701D",linewidth=2)+
scale_x_discrete(name="Year",breaks=seq(1958,2021,3))+
scale_y_continuous(name="Percentage of Songs (%)",breaks=seq(0,100,5))+
geom_area(fill="#14701D",alpha=0.5)+
theme_minimal()
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number==4))) # recently, significantly more top 100 hits feature 3 additional artists (statistically significant, P < .001)
fig4 <- ggplot(line_df %>% filter(artist_number==4), aes(x=as.numeric(year),y=percentage))+
geom_point(colour="#8618B5",shape=18,size=2)+
scale_x_continuous(breaks=c(1958,2021))+
geom_smooth(method=lm,se=TRUE,colour=6,linewidth=2,alpha=0.1) +
theme_minimal() +
labs(x="Year",y="Percentage of Songs (%)", title="Billboard Top100 Hits: Featuring 3 Artists") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(fig4)
ggplot(data=line_df %>% filter(artist_number==4), aes(x=year,y=percentage,group=1))+
geom_line(colour="#8618B5",linewidth=2)+
scale_x_discrete(name="Year",breaks=seq(1958,2021,3))+
scale_y_continuous(name="Percentage of Songs (%)",breaks=seq(0,5,1))+
geom_area(fill="#8618B5",alpha=0.7) +
theme_minimal()
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number==5))) # not enough data to assess relationship (not significant)
fig5 <- ggplot(line_df %>% filter(artist_number==5), aes(x=as.numeric(year),y=percentage))+
geom_point(colour="#4C0099",shape=18,size=2)+
scale_x_continuous(breaks=c(1958,2021))+
geom_smooth(method=lm,se=TRUE,colour=6,linewidth=2,alpha=0.1) +
theme_minimal() +
labs(x="Year",y="Percentage of Songs (%)", title="Billboard Top100 Hits: Featuring 4 Artists") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(fig5)
ggplot(data=line_df %>% filter(artist_number==5), aes(x=year,y=percentage,group=1))+
geom_line(colour="#4C0099",linewidth=2)+
scale_x_discrete(name="Year",breaks=seq(1958,2021,3))+
scale_y_continuous(name="Percentage of Songs (%)",breaks=seq(0,5,1))+
geom_area(fill="#4C0099",alpha=0.5)+
theme_minimal()
# summary(lm(percentage~as.numeric(year),line_df %>% filter(artist_number=='6'))) # not enough data to assess relationship (not significant)
fig6 <- ggplot(line_df %>% filter(artist_number==6), aes(x=as.numeric(year),y=percentage))+
geom_point(colour="#0000CC",size=2,shape=18)+
scale_x_continuous(breaks=c(1958,2021))+
geom_smooth(method=lm,se=TRUE,colour=6,linewidth=2,alpha=0.1) + theme_minimal() +
labs(x="Year",y="Percentage of Songs (%)", title="Billboard Top100 Hits: Featuring 5 or More Artists") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(fig6)
ggplot(data=line_df %>% filter(artist_number==6), aes(x=year,y=percentage,group=1))+
geom_line(colour="#0000CC",linewidth=2)+
scale_x_discrete(name="Year",breaks=seq(1958,2021,3))+
scale_y_continuous(name="Percentage of Songs (%)",breaks=seq(0,1,0.1))+
geom_area(fill="#0000CC",alpha=0.5)+
theme_minimal()
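# 2020s: distinct(song) counts each song once here, whereas the decade blocks below count every weekly chart entry
pie_df <- bb_df_cleaned %>% filter(year<='2021' & year > '2019') %>% distinct(song, .keep_all=TRUE) %>%
group_by(artist_number) %>%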
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 2020s")+
theme(plot.title = element_text(hjust = 0.5))
# 2010s
pie_df <- bb_df_cleaned %>% filter(year<='2019' & year > '2009') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 2010s")+
theme(plot.title = element_text(hjust = 0.5))
pie_df
## # A tibble: 6 × 4
## artist_number n perc percentage
## <dbl> <int> <dbl> <chr>
## 1 6 112 0.00215 0.21%
## 2 5 272 0.00521 0.52%
## 3 4 711 0.0136 1.36%
## 4 3 3083 0.0591 5.91%
## 5 2 13866 0.266 26.56%
## 6 1 34156 0.654 65.43%
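As a worked check of the `perc` and `percentage` columns, the counts printed above can be converted back into shares directly. The total of 52,200 rows is consistent with roughly 522 weekly charts of 100 entries each, which suggests these counts are weekly chart entries rather than distinct songs. The last line gives the share of 2010s entries that credit at least one featured artist.

```r
# Counts copied from the 2010s pie_df output above (artist_number = 1, ..., 6)
n <- c(`1` = 34156, `2` = 13866, `3` = 3083, `4` = 711, `5` = 272, `6` = 112)
perc <- n / sum(n)

sum(n)                # 52200 entries in total
round(100 * perc, 2)  # 65.43 26.56 5.91 1.36 0.52 0.21, matching the percentage column

# Share of entries with at least one featured artist (artist_number > 1)
collab_share <- round(100 * sum(perc[names(perc) != "1"]), 2)
collab_share          # 34.57
```

In other words, about one in three Hot 100 entries in the 2010s was a collaboration.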
pie_df <- bb_df_cleaned %>% filter(year<='2009' & year > '1999') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 2000s")+
theme(plot.title = element_text(hjust = 0.5))
pie_df <- bb_df_cleaned %>% filter(year<='1999' & year > '1989') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 1990s")+
theme(plot.title = element_text(hjust = 0.5))
pie_df <- bb_df_cleaned %>% filter(year<='1989' & year > '1979') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 1980s")+
theme(plot.title = element_text(hjust = 0.5))
pie_df <- bb_df_cleaned %>% filter(year<='1979' & year > '1969') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 1970s")+
theme(plot.title = element_text(hjust = 0.5))
pie_df <- bb_df_cleaned %>% filter(year<='1969' & year > '1959') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 1960s")+
theme(plot.title = element_text(hjust = 0.5))
pie_df <- bb_df_cleaned %>% filter(year<='1959' & year > '1949') %>%
group_by(artist_number) %>%
count() %>%
ungroup() %>%
mutate(perc=`n`/sum(`n`)) %>%
arrange(perc) %>% mutate(percentage = scales::percent(perc))
ggplot(pie_df, aes(x = "" , y = perc, fill = factor(artist_number))) +
geom_col(width = 2) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1") +
geom_label_repel(data = pie_df,
aes(label = percentage),
size = 5, nudge_x = 0, show.legend = FALSE) +
guides(fill = guide_legend(title = "Number of Artists")) +theme_void() +
labs(title="Billboard Top100 Hit Collaboration Trend in 1950s")+
theme(plot.title = element_text(hjust = 0.5))
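The eight decade blocks above differ only in their year filter. A hypothetical helper, `decade_shares()`, can compute every decade in one pass by binning years with `floor(year / 10) * 10` and taking row-wise proportions. It is written in base R so that it runs standalone, and the `sprintf()` formatting is an assumed stand-in for `scales::percent()`. Note that it counts every row, so it matches the 1950s through 2010s blocks but not the 2020s block, which deduplicates songs with `distinct()`.

```r
# Hypothetical helper: share of rows by artist_number within every decade
decade_shares <- function(df) {
  # Assumes df$year is a 4-digit year (character or numeric) and df$artist_number is numeric
  decade <- floor(as.numeric(df$year) / 10) * 10
  counts <- table(decade, df$artist_number)
  out <- as.data.frame.table(prop.table(counts, margin = 1), responseName = "perc")
  names(out)[1:2] <- c("decade", "artist_number")
  out$percentage <- sprintf("%.2f%%", 100 * out$perc)
  out
}

# Toy input: two 1960s rows and three 2010s rows
toy <- data.frame(year = c("1965", "1968", "2012", "2015", "2019"),
                  artist_number = c(1, 1, 1, 2, 2))
decade_shares(toy)
```

The result can be passed to a single `ggplot()` call with `facet_wrap(~ decade)` instead of repeating the pie code per decade.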
# summary(lm(peak.rank~artist_number,data=bb_df_cleaned))
ggplot(data=bb_df_cleaned ,aes(x=artist_number,y=peak.rank))+
geom_smooth(method=lm,colour="#66CC00",linewidth=2,se=TRUE) +
scale_x_continuous(breaks=1:6) + # artist_number is numeric (<dbl>), so continuous scales are needed
scale_y_continuous(breaks=seq(0,100,10)) + theme_bw()+
labs(x="Number of Collaborators", y="Peak Rank", title="Increased Collaborations Predict a Higher Peak Rank on the Billboard Charts") +
theme(plot.title = element_text(hjust = 0.5,size=15))
# geom_point(position=position_jitter(seed=1,width=0.4),colour="#006600",alpha=0.1,size=0.5,shape=3)+
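Because rank 1 is the top of the Hot 100, a "higher" peak rank is a numerically smaller `peak.rank`, so an improvement shows up as a negative coefficient on `artist_number`. A toy illustration with made-up numbers:

```r
# Hypothetical peak ranks: 1 is the top of the chart, 100 the bottom
peak <- data.frame(artist_number = c(1, 1, 2, 2, 3, 3),
                   peak.rank     = c(60, 50, 45, 35, 30, 20))

fit <- lm(peak.rank ~ artist_number, data = peak)
slope <- unname(coef(fit)["artist_number"])
slope
# → -15: in this toy data, each extra collaborator goes with a peak
#   position 15 places closer to number 1
```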
# summary(lm(weeks.on.board~artist_number,data=bb_df_cleaned))
ggplot(data=bb_df_cleaned ,aes(x=artist_number,y=weeks.on.board))+
geom_smooth(method=lm,colour="#E224D5",linewidth=2,se=TRUE) +
scale_x_continuous(breaks=1:6) +
scale_y_continuous(breaks=seq(0,20,5)) +
coord_cartesian(ylim=c(0,20)) + theme_bw()+ # zoom only; ylim() would drop rows above 20 weeks before the line is fitted
labs(x="Number of Collaborators", y="Number of Weeks on Billboard Top 100", title="More Collaborations Predict a Longer Duration on the Billboard Charts") +
theme(plot.title = element_text(hjust = 0.5,size=15))
# geom_point(colour="#4C0099",alpha=0.1,size=0.5,shape=3)+
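One detail matters when zooming the weeks-on-chart plot: in ggplot2, setting limits on a scale, as `ylim(c(0, 20))` does, turns out-of-range values into `NA` and removes them before `geom_smooth()` fits its line, whereas `coord_cartesian(ylim = ...)` only zooms the view. The hypothetical numbers below mimic that effect by refitting `lm()` on just the rows a 0 to 20 limit would keep; dropping the two long-running hits flips the sign of the slope.

```r
# Hypothetical data: weeks on chart by number of credited artists,
# including two long-running hits (40 and 30 weeks)
df <- data.frame(
  artist_number  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  weeks.on.board = c(5, 10, 40, 8, 12, 30, 10, 14, 16)
)

# All rows kept, as with coord_cartesian(ylim = c(0, 20))
full_slope <- unname(coef(lm(weeks.on.board ~ artist_number, data = df))["artist_number"])

# Rows outside 0 to 20 dropped, as a ylim(c(0, 20)) scale limit would do
kept <- subset(df, weeks.on.board >= 0 & weeks.on.board <= 20)
clipped_slope <- unname(coef(lm(weeks.on.board ~ artist_number, data = kept))["artist_number"])

c(full = full_slope, clipped = clipped_slope)
# → full = -2.5, clipped ≈ 2.94
```

Zooming with the coordinate system keeps the drawn line consistent with the commented `summary(lm(...))` call, which uses every row.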
# length(unique(bb_df_cleaned$artist1))  # 7507 artists have hit Billboard Top100 as a main artist
# n() counts rows, i.e. weekly chart entries as the main artist, not distinct songs
temp <- bb_df_cleaned %>% group_by(artist1) %>% summarise(Top100Hits=n()) %>%
arrange(desc(Top100Hits)) %>%
rename(Artist=artist1)
formattable(head(temp,30), list(`Top100Hits`=color_bar(color="lightblue")))
Artist | Top100Hits |
---|---|
Drake | 1494 |
Taylor Swift | 1131 |
Elvis Presley | 988 |
Elton John | 941 |
Rihanna | 916 |
Kenny Chesney | 905 |
Madonna | 885 |
Chris Brown | 870 |
Tim McGraw | 828 |
Maroon 5 | 757 |
Mariah Carey | 734 |
Usher | 723 |
Keith Urban | 717 |
Beyonce | 704 |
Rod Stewart | 682 |
Michael Jackson | 665 |
Stevie Wonder | 665 |
Jason Aldean | 656 |
Whitney Houston | 654 |
R. Kelly | 645 |
Kanye West | 634 |
Rascal Flatts | 627 |
P!nk | 620 |
Brad Paisley | 618 |
Blake Shelton | 611 |
The Beatles | 608 |
Chicago | 607 |
Aretha Franklin | 604 |
Diana Ross | 597 |
The Weeknd | 596 |
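Because `summarise(Top100Hits = n())` counts rows, and each row appears to be one weekly chart entry, the figures above measure chart-weeks as a main artist rather than distinct songs. A toy example with hypothetical artists and songs shows how the two measures can rank artists differently:

```r
# Hypothetical rows: one row per artist, song, and chart week
toy <- data.frame(
  artist1 = c("Artist X", "Artist X", "Artist X", "Artist Y", "Artist Y"),
  song    = c("Song A", "Song A", "Song A", "Song B", "Song C"),
  week    = c(1, 2, 3, 1, 1)
)

# Rows per artist: what summarise(Top100Hits = n()) measures
entries <- table(toy$artist1)

# Distinct songs per artist: deduplicate artist/song pairs first
songs <- table(unique(toy[c("artist1", "song")])$artist1)

entries  # Artist X: 3, Artist Y: 2
songs    # Artist X: 1, Artist Y: 2
```

Here Artist X leads on chart-weeks while Artist Y leads on distinct songs; in dplyr the second measure corresponds to `summarise(n_distinct(song))`.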
# Stack every featured-artist column (artist2 to artist6) into one vector, dropping empty slots
featured <- unlist(lapply(bb_df_cleaned[paste0("artist", 2:6)], as.character), use.names = FALSE)
c_df <- data.frame(Featured = featured[!is.na(featured)])
# Count appearances (weekly chart entries) per featured artist
c_df <- c_df %>% count(Featured) %>% arrange(desc(n)) %>% rename(`How many times`=n)
formattable(head(c_df,30), list(`How many times`=color_bar(color="#FA614B66")))
Featured | How many times |
---|---|
Lil Wayne | 1193 |
Drake | 1159 |
Nicki Minaj | 843 |
Chris Brown | 664 |
The Gang | 469 |
T-Pain | 466 |
The Jordanaires | 466 |
Ludacris | 452 |
Kanye West | 413 |
Wind | 395 |
Future | 388 |
Justin Bieber | 386 |
Rihanna | 382 |
Fire | 379 |
Akon | 368 |
Dunn | 361 |
Lil Jon | 337 |
Young Thug | 333 |
The News | 320 |
DaBaby | 293 |
Jay-Z | 276 |
2 Chainz | 275 |
Cardi B | 272 |
T.I. | 272 |
Ne-Yo | 263 |
Rick Ross | 263 |
Usher | 260 |
Ty Dolla $ign | 242 |
Snoop Dogg | 240 |
Bruno Mars | 239 |
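Several names in this table, such as "Wind", "Fire", "The Gang", "Dunn" and "The News", look like fragments of band names that were split on "&", "And" or a comma, which is the situation the caveats warn about. One hypothetical heuristic for finding such fragments is to flag featured names that only ever appear alongside a single main artist. The function `flag_band_fragments()` and its `min_count` threshold are illustrative assumptions, and flagged names still need manual review, since a genuine backing group credited with a single star would be flagged too.

```r
# Hypothetical heuristic: featured names tied to exactly one main artist
flag_band_fragments <- function(main, featured, min_count = 2) {
  keep <- !is.na(featured)
  main <- main[keep]
  featured <- featured[keep]
  pairs <- unique(data.frame(main = main, featured = featured))
  n_mains <- table(pairs$featured)  # distinct main artists per featured name
  n_total <- table(featured)        # total appearances per featured name
  candidates <- names(n_mains)[n_mains == 1]
  candidates[n_total[candidates] >= min_count]
}

main     <- c("Band Q", "Band Q", "Band Q", "Artist A", "Artist B", "Artist C")
featured <- c("Fragment", "Fragment", "Fragment", "Guest G", "Guest G", "Guest H")
flag_band_fragments(main, featured)
# → "Fragment"
```

On the real data, the inputs would be `bb_df_cleaned$artist1` paired with each of `artist2` to `artist6`; any confirmed fragments can then be added to a protected list before re-splitting the credits.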