Tennessee Cave Survey - Caves Described over Time

Twilight, John Henry Demps Cave (Sullivan Entrance), White County, Tennessee

I'm often asked, "How many caves are there in Tennessee?" What they are really asking is, "how many caves do we know about in Tennessee?" or "how many caves have been described in Tennessee?" I like to say "described" as opposed to "discovered" since most caves that are newly described were likely known to historic and prehistoric people. Sometimes there is direct evidence to support this.

The simple answer is to state the number of caves described in the most recent Tennessee Cave Survey data release. But a more thorough answer would note that the number changes annually. Let's explore just how much it has changed, and maybe speculate on how much it will change in the future.

History of Cave Documentation in Tennessee

In 1959, Dr. Tom Barr's book, Caves of Tennessee, described approximately 500 caves. This was the first attempt to create a single, central source for cave information in Tennessee. It set the standard for what we would come to call a "narrative" and provided a format to report basic attributes for each cave, like its latitude, longitude, geology, total horizontal length, total vertical depth, number of pits, and depth of the deepest pit.

At some unknown date in 1973, the Tennessee Cave Survey incorporated. Based on scattered notes, I was able to determine that at that time there were exactly 1,163 described caves in its dataset.

By 1987, the TCS was managing records for exactly 4,663 caves. That's exactly 3,500 more than when TCS incorporated 14 years earlier.

I arrived on the scene with the TCS in 2006, when we were still below 10k described caves with 9,107. It would be 2014 before we finally reported more than 10k (10,067 to be precise). In a short period of time this number grew to 11,283 in 2020.

I take you on this brief walk down the history of our organization to make a point. We cannot continue to find new caves forever. At some point, we will run out of caves with natural entrances, and caves with entrances obvious to us as being able to be enlarged to permit entry.

Estimating Cave Totals by Regression, Part 1

Let's start with an observation that is much tossed around in the caving community about caves with no entrances. I am paraphrasing, but it goes something like, "For every cave that has an entrance, there are 10 that do not". How did we arrive at this number? I don't know. And I cannot speak to its authenticity. I think someone was perhaps making a point about there being a lot of things that are unknown.

Could we estimate the number of unknown caves using data on the number of cave entrances that we do know of? For example, if we know how many caves have 1 entrance, how many caves have 2 entrances, and so on, can we estimate the number of caves that have zero entrances?

Depending on which trendline we choose to fit our data, we end up with wildly different numbers, each of which is obviously incorrect. If we use linear regression, we end up with an R² = 0.336. A logarithmic fit looks good and has an R² = 0.616, but it never crosses the Y axis, so it implies an infinite number of caves with no entrances. Sadly, this may be the closest real answer that one can model. There may be millions of caves in Tennessee alone with no entrance, buried in bedrock and collapse, but which otherwise qualify under the TCS definition of a cave.
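To see why the choice of trendline matters so much, here is a minimal Python sketch of the two fits. The entrance counts in it are made-up numbers for illustration only, not TCS data, and least_squares_line is a helper name assumed for this example, so the R² values it prints will not match the ones quoted above.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept, r_squared) for the fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable least-squares line, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Made-up counts for illustration only, NOT Tennessee Cave Survey data.
entrances = [1, 2, 3, 4, 5, 6]            # number of entrances per cave
caves = [9000, 700, 150, 50, 20, 10]      # caves with that many entrances

# Linear trendline: caves = slope * n + intercept
lin_slope, lin_intercept, lin_r2 = least_squares_line(entrances, caves)

# Logarithmic trendline: caves = slope * ln(n) + intercept
log_entrances = [math.log(n) for n in entrances]
log_slope, log_intercept, log_r2 = least_squares_line(log_entrances, caves)

print(f"linear:      R^2 = {lin_r2:.3f}, estimate at zero entrances = {lin_intercept:.0f}")
print(f"logarithmic: R^2 = {log_r2:.3f}, coefficient on ln(n) = {log_slope:.0f}")
# The linear fit gives a finite guess at n = 0 (its intercept).
# The logarithmic fit cannot: ln(0) is undefined, and with a negative
# coefficient the curve climbs without bound as n approaches 0.
```

The linear fit at least produces a number for zero entrances, while the logarithmic fit, which tracks the data better, blows up as the entrance count approaches zero. That is the "infinite caves" problem in code form.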

Figure 1: Cave Entrance Frequency

Estimating Cave Totals by Regression, Part 2

Here's an idea of how one can estimate caves with natural entrances (or caves whose entrances are about to be natural). If we use a scatterplot with dates on the X axis and the number of caves described on the Y axis, we see a steadily rising graph, like the one below.

To this scatterplot I have added a second-order polynomial trendline. This is our estimating line. The point where the polynomial line reaches its peak is our best guess for the peak number of caves.
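For anyone who wants to try this outside of a spreadsheet, here is a rough Python sketch of the same idea: fit y = a*x² + b*x + c by least squares and read the peak off the vertex at x = -b / (2a). The function names fit_quadratic and forecast_peak are my own for this example, and the first series is synthetic (built from a known parabola) so the method can be checked. It is not the TCS table.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c. Returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    rows = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    a, b, c = (rows[i][3] / rows[i][i] for i in range(3))
    return a, b, c

def forecast_peak(years, counts):
    """Return (peak_year, peak_count), or None if the fitted curve has no maximum."""
    center = sum(years) / len(years)      # center the years for numeric stability
    xs = [year - center for year in years]
    a, b, c = fit_quadratic(xs, counts)
    if a >= 0:
        return None                       # parabola opens upward: no peak to report
    x_peak = -b / (2 * a)                 # vertex of the parabola
    return center + x_peak, a * x_peak ** 2 + b * x_peak + c

# Synthetic series built from a known parabola (peak of 12,000 in 2040).
# Made-up numbers for checking the method, NOT TCS data.
synthetic_years = list(range(2006, 2021))
synthetic_counts = [12000 - 2 * (2040 - year) ** 2 for year in synthetic_years]
peak_year, peak_count = forecast_peak(synthetic_years, synthetic_counts)
print(f"synthetic peak: {peak_count:.0f} caves in {peak_year:.0f}")

# The six milestone counts quoted in the history section (1959 is approximate).
milestone_years = [1959, 1973, 1987, 2006, 2014, 2020]
milestone_counts = [500, 1163, 4663, 9107, 10067, 11283]
print("milestones only:", forecast_peak(milestone_years, milestone_counts))
```

With only the six milestone counts from the history section, the fitted parabola opens upward and the function returns None, which suggests any peak forecast depends on the fuller year-by-year table rather than on those six points alone.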

Figure 2: Caves Described over Time [Google Sheets]

The above data is displayed using Google Sheets so that you can enjoy the cool "gee-whiz" interaction qualities (and it will be easier for me to keep up to date). Unfortunately, Google Sheets won't let us use a polynomial forecast, and none of the other line-fitting types accurately forecast forward. Excel does allow us to do this, though. Below is what the forecast looks like.

Tennessee Caves Described Over Time with Forecast
Figure 3: Caves Described over Time [Excel]

This forecast suggests that in the far-off year of 2048, we will have reached peak caves in Tennessee at around 12,200. But does that really jibe with what we know of caves? Let's look closer at the September 2020 update of TCS data, in which one individual turned in more than 500 caves.

Table 1: Caves Described over Time [Data]. Please note that the additions and reductions don't add up. This is both a historical record problem and a nomenclature problem. Going forward, we intend to rectify this.

If someone were to turn in that many caves in a single year again, and the above estimate were correct, they would have turned in 41% of the remaining caves in Tennessee. That probably doesn't make much sense, does it? What other methods are available to us to estimate total caves?

When You've Broken Math, Make Stuff Up

Right now I'm saying that there are at least 14,000 caves with natural, or nearly natural entrances in Tennessee. This estimate is based on my intuition of the data, the geology, and transforming technologies to find and access caves. Sources? Ummmm, the above? It can't be infinity. It can't be less than the observed number. Fourteen thousand seems like a safe guess. :)


Shannon Diversity Index in Top Songs by Year


Can ecological measures be used to glean information about music, and perhaps music trends? This post explores the possible use of the Shannon Diversity Index as a means to understand diversity within the "lyrical ecosystem."


Wikipedia's List of Billboard Year-End number-one singles (1) was used for the data collection portion of this project. Excluded from the study was 1948's hit, "Twelfth Street Rag" by Pee Wee Hunt, an instrumental with no lyrics to study.

Length is the song length in seconds as measured by the first video to show up on a YouTube Search, with preference given to anything shown to be "official."

Unique words and total words are from song lyrics that were pulled from Google's lyric search. Where lyrics weren't available Genius.com was used. Grunts, moans, yeahs, na-na-nas are all counted if they are found within the lyrics as provided by the stated sources.

Words per second is a measure of total words divided by song length. Repetition is a measure of total words divided by unique words where a score of 1 would be no repetition and a higher score shows more repetition of words.
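As a quick sketch of how these two ratios work, here they are in Python. The lyric_counts helper and its tokenizing rule (lowercase, split on whitespace) are assumptions for this example. The counts in this post came from the lyric sources as written, and the sample lyric is invented.

```python
def lyric_counts(lyrics):
    """Return (total_words, unique_words) using a simple whitespace split."""
    words = lyrics.lower().split()
    return len(words), len(set(words))

def words_per_second(total_words, length_seconds):
    """Total words divided by song length in seconds."""
    return total_words / length_seconds

def repetition(total_words, unique_words):
    """Total words divided by unique words; 1.0 means no word is repeated."""
    return total_words / unique_words

# Invented lyric for illustration only (not from any song in the dataset).
sample = "la la la sing it again la la la"
total, unique = lyric_counts(sample)
print(total, unique, repetition(total, unique))

# Figures quoted later in this post: 650 words in a 203-second song.
print(round(words_per_second(650, 203), 2))
```

The invented lyric has 9 words, 4 of them unique, for a repetition score of 2.25. The 650-word, 203-second song works out to a little over 3 words per second.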

The Shannon-Wiener diversity index is a measure of diversity used in ecology that combines species richness (the number of species in a given area) and their relative abundances. It tells the level of diversity in that particular area, giving us the ability to say if the diversity is low (a low number) or high (a high number).
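For the curious, here is a minimal Python sketch of the index applied to lyrics, treating each distinct word as a "species" and its count as the abundance: H = -Σ p·ln(p), where p is each word's share of the total. The natural log and the whitespace tokenizing are assumptions on my part for this example, and the sample lyric is invented.

```python
import math
from collections import Counter

def shannon_index(words):
    """Shannon diversity H = -sum(p * ln(p)) over the word frequencies."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Invented lyric for illustration only (not from any song in the dataset).
sample_words = "la la la sing it again la la la".split()
print(round(shannon_index(sample_words), 3))

# Two extremes: one word repeated gives zero diversity,
# while four equally common words give ln(4).
one_word_score = shannon_index(["low"] * 10)
even_score = shannon_index(["a", "b", "c", "d"])
print(one_word_score == 0, round(even_score, 3))
```

A song that repeats a handful of words over and over scores near zero, while a song that spreads its words evenly across a large vocabulary scores high.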



Before the release of "Hey Jude" in 1968, average song length was roughly 172 seconds. After its release, average song length went up to 255 seconds, a full 1:23 longer.

The longest song, Elton John's 1997 hit, "Candle in the Wind 1997 / Something About the Way You Look Tonight" weighs in at a lengthy 492 seconds, nearly 4 standard deviations above the mean. "All Shook Up" by Elvis Presley (1957) is 119 seconds and 1.66 standard deviations below average.

Words per Second

Prior to "Call Me" in 1980, songs averaged about 1 word per second. After 1980, the average has been 1.68. There is not a strong correlation between year and words per second (R² = 0.255).

Unique Words

Prior to 50 Cent's 2003 hit, "In Da Club", the number of unique words per song was roughly 83. After that, the average number of unique words per song jumped to around 149. "Old Town Road" in 2019 broke the new trend and dropped the number back down to 73 unique words. It is noteworthy that the five songs with the most unique words are all rap and hip hop.

Can you believe that "Believe" (1999) has only 12 unique words, making it the least diverse song lyrically? "Thrift Shop" (2013) has the most unique words at 276.

Total Words

The trend for total words has generally been upward (R² = 0.462) throughout the observed years. "Hey Jude" in 1968 may be a distinct break point, but by 1980's hit, "Call Me", the trend was very much underway.

The fewest words in a song falls to "Song from Moulin Rouge" (1953). This is another outlier within the dataset. The song's lyrics do not begin until the 1:36 mark, which is at least partially why this is the case. The most words in a song is awarded to "Low" (2008), but 65 of those words are "low".

"Song from Moulin Rouge" (1953) also finds itself with the record minimum words per second. Mariah Carey, on the other hand, wastes no time fitting 650 words into a song 203 seconds long, landing her the most words per second within the dataset.


The true outlier of this entire dataset is Cher's 1999 hit, "Believe." It skews the entire dataset for so many variables, but especially for repetition: its score of 23 is the average number of times you hear each word of the song. The word "you" is repeated in the song 47 times; "I'm" 38 times; "too" 38 times; "good" 38 times.

The least repetitious song is "Cherry Pink and Apple Blossom White" by Perez Prado (1955). Its lyrics appear closer to our traditional notion of a poem than perhaps any of the other songs on the list.

Shannon Diversity Index

There is a weak correlation between year and Shannon Diversity, with songs decreasing in diversity over time (R² = 0.335).

Cher's "Believe" (1999) again makes the outlier list, this time with the lowest Shannon Diversity Index. In terms of an ecosystem comparison, this is on par with a Walmart parking lot. "The Way We Were" by Barbra Streisand (1974) ranks highest in the Shannon Diversity Index.


This is the part of the post where I pick on "Believe" for a while. Look, we all lived through 1999, and despite us not realizing that it would be another 21 years before the world ended, we were all riding on the "Party like it's 1999" high. Maybe that accounts for our lack of interest in a diverse hit song.

Aside from pure statistical significance, there are certainly a number of interesting outlier songs that are worth mentioning. "Hey Jude" broke the trend of short songs. "Call Me" was more about its lyrics than its music, and began a trend opposite of "Song from Moulin Rouge".


1) https://en.wikipedia.org/wiki/List_of_Billboard_Year-End_number-one_singles_and_albums



Analysis of American Caving Accidents 2021

Car wreck, Capshaw Cave entrance, Putnam County, Tennessee

I've been looking forward to the release of American Caving Accidents 2021. Many rescues and noteworthy events occurred in the Upper Cumberland of Tennessee (5 out of 17!) and even more involved rescuers that I'm friends with. Looking at the data provided by the ACA editor, I produced a few graphs and did some basic analysis.

Because I know I'll be asked, I'll explain in greater detail what the final charts represent. Basically, one could think of it as a means of outlier detection. It's similar to a chi-squared analysis, but instead of looking for significance, one looks for interesting patterns. For example, in the chart Difference of Observed and Expected Caving Accidents by Type, numerous equipment problems are documented between 1986 and 1994. But around 1994, the problem seems to be resolved. What happened? Did equipment technology change in those years? This analysis shows the "peaks and troughs" compared to the harmonic mean of the data.
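To make the observed-versus-expected idea concrete, here is a minimal Python sketch. It uses the standard chi-squared expected count (row total times column total, divided by the grand total), which is an assumption on my part and may differ from the exact baseline behind the charts. The function name and the tiny table are hypothetical, not data from American Caving Accidents.

```python
def observed_minus_expected(table):
    """table maps year -> {accident type -> observed count}.

    Returns the same shape holding observed minus expected, where
    expected = row total * column total / grand total.
    """
    types = sorted({kind for row in table.values() for kind in row})
    row_totals = {year: sum(row.values()) for year, row in table.items()}
    col_totals = {
        kind: sum(row.get(kind, 0) for row in table.values()) for kind in types
    }
    grand_total = sum(row_totals.values())

    differences = {}
    for year, row in table.items():
        differences[year] = {}
        for kind in types:
            expected = row_totals[year] * col_totals[kind] / grand_total
            differences[year][kind] = row.get(kind, 0) - expected
    return differences

# Hypothetical counts for illustration only, NOT American Caving Accidents data.
example = {
    1990: {"fall": 10, "equipment": 10},
    1991: {"fall": 10, "equipment": 0},
}
diffs = observed_minus_expected(example)
for year, row in diffs.items():
    print(year, {kind: round(value, 2) for kind, value in row.items()})
```

Positive values are the "peaks" (more accidents of that type than the totals would suggest) and negative values are the "troughs". In this toy table, equipment problems sit above expectation in 1990 and below it in 1991.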