Shannon Diversity Index in Top Songs by Year


Can ecological measures be used to glean information about music, and perhaps music trends? This post explores the possible use of the Shannon Diversity Index as a means to understand diversity within the "lyrical ecosystem."


The List of Billboard Year-End number-one singles (1) was used for the data collection portion of this project. The 1948 hit "Twelfth Street Rag" by Pee Wee Hunt was excluded from the study because it is an instrumental with no lyrics to study.

Length is the song length in seconds as measured by the first video to show up in a YouTube search, with preference given to anything marked "official."

Unique words and total words are from song lyrics that were pulled from Google's lyric search. Where lyrics weren't available there, another source was used. Grunts, moans, yeahs, and na-na-nas are all counted if they are found within the lyrics as provided by the stated sources.

Words per second is total words divided by song length in seconds. Repetition is total words divided by unique words, where a score of 1 means no repetition and a higher score means more repetition of words.
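
As a rough illustration of those two measures, here is a minimal Python sketch. The tokenizer (lowercasing, and treating runs of letters, digits, and apostrophes as words) is my own assumption, as are the function names and the sample lyric; the counts in this post may have been tallied differently.

```python
import re

def tokenize(lyrics: str) -> list[str]:
    # Assumption: case-insensitive, words are runs of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", lyrics.lower())

def words_per_second(lyrics: str, length_seconds: float) -> float:
    # Total words divided by song length in seconds.
    return len(tokenize(lyrics)) / length_seconds

def repetition(lyrics: str) -> float:
    # Total words divided by unique words; 1.0 means no word is repeated.
    words = tokenize(lyrics)
    return len(words) / len(set(words))

sample = "na na na hey hey goodbye"    # hypothetical lyric fragment, 6 words
print(words_per_second(sample, 4))     # 1.5
print(repetition(sample))              # 2.0
```

Both functions divide by a count, so empty lyrics or a zero length would raise an error; a real pipeline would want to guard against that.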

The Shannon-Wiener diversity index is a measure of diversity used in ecology that combines species richness (the number of species in a given area) and their relative abundances. It tells the level of diversity in that particular area, giving us the ability to say if the diversity is low (a low number) or high (a high number).
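
To make the ecology analogy concrete, the sketch below treats each unique word as a "species" and its share of the total words as its relative abundance, then computes H = -Σ p·ln(p). The natural log and the simple tokenizer are assumptions on my part; the index can be computed with other log bases, which only rescales the values.

```python
import math
import re
from collections import Counter

def shannon_index(lyrics: str) -> float:
    # Assumption: words are lowercase runs of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", lyrics.lower())
    counts = Counter(words)
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        p = n / total            # relative abundance of this word
        h -= p * math.log(p)     # natural log is an assumption
    return h

print(shannon_index("low low low low"))                # 0.0
print(round(shannon_index("apple pink white blossom"), 3))  # 1.386
```

A lyric that repeats a single word scores 0, while a lyric of four distinct words used once each scores ln(4), the maximum for four "species."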



Song Length

Before the release of "Hey Jude" in 1968, average song length was roughly 172 seconds. After its release, average song length went up to 255 seconds, a full 1:23 longer.

The longest song, Elton John's 1997 hit "Candle in the Wind 1997 / Something About the Way You Look Tonight," weighs in at a lengthy 492 seconds, nearly 4 standard deviations above the mean. "All Shook Up" by Elvis Presley (1957) is 119 seconds, 1.66 standard deviations below the mean.
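
For anyone who wants to check "standard deviations from the mean" on their own numbers, here is a small sketch. The song lengths are made up for illustration, and using the population standard deviation (rather than the sample version) is an assumption; the figures above may have been computed either way.

```python
from statistics import mean, pstdev

def z_score(value: float, values: list[float]) -> float:
    # How many standard deviations `value` sits above (+) or below (-) the mean.
    # Assumption: population standard deviation (pstdev), not sample (stdev).
    return (value - mean(values)) / pstdev(values)

# Hypothetical song lengths in seconds, not the real dataset.
lengths = [120, 140, 140, 140, 150, 150, 170, 190]
print(round(z_score(190, lengths), 2))   # 2.0
print(round(z_score(120, lengths), 2))   # -1.5
```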

Words per Second

Prior to "Call Me" in 1980, the average was about 1 word per second of music. After 1980, the average has been 1.68. There is not a strong correlation between year and words per second (R² = 0.255).
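
The R² values quoted in this post can be read as the share of variation in a measure that a straight-line trend over years accounts for. Assuming a simple linear fit (which is my guess at how these were produced), R² equals the squared Pearson correlation, sketched below with hypothetical data.

```python
from statistics import mean

def r_squared(xs: list[float], ys: list[float]) -> float:
    # Squared Pearson correlation, which equals R² for a one-variable linear fit.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical (year, words per second) pairs, not the real dataset.
years = [1950, 1960, 1970, 1980, 1990, 2000]
wps = [0.9, 1.1, 0.8, 1.5, 1.3, 1.9]
print(round(r_squared(years, wps), 3))
```

A value near 1 means the points hug the trend line; a value near 0 means year tells you very little about the measure.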

Unique Words

Prior to Fitty's 2003 hit, "In Da Club," the average number of unique words per song was roughly 83. After that, the average number of unique words per song jumped to around 149. "Old Town Road" in 2019 broke the new trend and dropped the number back down to 73 unique words. It is noteworthy that the five songs with the most unique words are all rap and hip hop.

Can you believe that "Believe" (1999) has only 12 unique words, making it the least diverse song lyrically? "Thrift Shop" (2013) has the most unique words at 276.

Total Words

Total words have generally trended upward (R² = 0.462) throughout the observed years. "Hey Jude" in 1968 may be a distinct break point, but by the 1980 hit "Call Me," the trend was very much underway.

The fewest words in a song belongs to "Song from Moulin Rouge" (1953). This is another outlier within the dataset. The song's lyrics do not begin until the 1:36 mark, which is at least partially why this is the case. The most words in a song goes to "Low" (2008), but 65 of those words are "low".

"Song from Moulin Rouge" (1953) also has the record minimum words per second, while Mariah Carey wastes no time fitting 650 words into a song 203 seconds long, landing her the most words per second within the dataset.


Repetition

The true outlier of this entire dataset is Cher's 1999 hit, "Believe." It skews the entire dataset for so many variables, but especially for repetition: its score of 23 is the average number of times you hear each word of the song. The word "you" is repeated in the song 47 times; "I'm" 38 times; "too" 38 times; "good" 38 times.

The least repetitious song is "Cherry Pink and Apple Blossom White" by Perez Prado (1955). Its lyrics appear closer to our traditional notion of a poem than perhaps any of the other songs on the list.

Shannon Diversity Index

There is a minor correlation between year and Shannon diversity, with songs decreasing in diversity over time (R² = 0.335).

Cher's "Believe" (1999) again makes the outlier list with the lowest Shannon Diversity Index. In terms of an ecosystem comparison, this is on par with a Walmart parking lot. "The Way We Were" by Barbra Streisand (1974) ranks highest in the Shannon Diversity Index.


This is the part of the post where I pick on "Believe" for a while. Look, we all lived through 1999, and despite us not realizing that it would be another 21 years before the world ended, we were all riding on the "Party like it's 1999" high. Maybe that accounts for our lack of interest in a diverse hit song.

Aside from pure statistical significance, there are certainly a number of interesting outlier songs that are worth mentioning. "Hey Jude" broke the trend of short songs. "Call Me" was more about its lyrics than its music, and began a trend opposite that of "Song from Moulin Rouge".





stargene said…
To your knowledge, has there ever been a study of possible
Shannon Index of instrumental music itself, say rock, r&b, music sans words. I suspect that certain
music may have higher Shannon Index values than worded
text or lyrics. In this context, there is a quote from
one of Stravinsky's memoirs, "..It is not that music is
too is that music is too precise for words."
I also suspect that such research using pure instrumental
music may be much more difficult than verbal studies.
