An Underappreciated Problem in Data Analytics




PHOTO: Eric Nopanen | Unsplash

Rolling Stone recently updated its list of the 500 Greatest Songs of All Time. To create the new list, it polled more than 250 artists, musicians and producers — from Angelique Kidjo to Zedd, Sam Smith to Megan Thee Stallion, M. Ward to Bill Ward — as well as figures from the music industry and leading critics and journalists. Each person submitted a ranked list of their top 50 songs, and Rolling Stone tabulated the results.

And the top 10 songs of all time are (drumroll):

  1. Aretha Franklin: “Respect”
  2. Public Enemy: “Fight the Power”
  3. Sam Cooke: “A Change Is Gonna Come”
  4. Bob Dylan: “Like a Rolling Stone”
  5. Nirvana: “Smells Like Teen Spirit”
  6. Marvin Gaye: “What’s Going On”
  7. The Beatles: “Strawberry Fields Forever”
  8. Missy Elliott: “Get Ur Freak On”
  9. Fleetwood Mac: “Dreams”
  10. Outkast: “Hey Ya!”

Say what?

The Problem With Recency Bias

“Recency bias” is a cognitive bias that favors recent events over historical ones. Any “Greatest of All Time” list is subject to this kind of bias. A poll of contemporary artists and current music journalists will of course skew toward contemporary tastes at the expense of, let’s say, more classic tastes. Admittedly, these lists can sometimes suffer from the opposite problem, a “nostalgia bias” common to people in my demographic bracket, in which nothing recent gets the time of day. But only two Springsteen songs (“Born to Run” at Number 27 and “Thunder Road” at 111) in the top 150? Come on.

Recency bias is an underappreciated challenge in data analytics, particularly when historical data is used to identify trends and make predictions. The problem arises because rolling data sets often contain far more recent records than older ones. I like the way the BBC illustrated the problem in “The trouble with big data? It’s called the ‘recency bias’”:

“Imagine looking back over a photo album representing the first 18 years of your life, from birth to adulthood. Let’s say that you have two photos for your first two years. Assuming a rate of information increase matching that of the world’s data, you will have an impressive 2,000 photos representing the years six to eight; 200,000 for the years 10 to 12; and a staggering 200,000,000 for the years 16 to 18. That’s more than three photographs for every single second of those final two years.”

(Author’s Note: There is a problem in this otherwise fine illustration of “recency bias” because it does not incorporate “birth order bias.” My sister Jeanne, the youngest of the six of us, had one picture taken when she was born, a second for her First Holy Communion, and another at her prom, and that was about it for her first 18 years.)
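
The BBC’s numbers are easy to check. Here is a minimal sketch, assuming the quoted figures imply a tenfold increase in photos every two years (the constant names are mine, not the BBC’s, and the growth rate is inferred from the four data points in the quote):

```python
# Assumption: the quoted figures (2 photos for ages 0-2, 2,000 for 6-8,
# 200,000 for 10-12, 200,000,000 for 16-18) fit a tenfold increase
# every two-year period. These names are illustrative only.
FIRST_PERIOD_PHOTOS = 2
GROWTH_PER_PERIOD = 10
PERIOD_YEARS = 2
TOTAL_YEARS = 18


def photos_by_period():
    """Return the photo count for each two-year period from birth to 18."""
    periods = TOTAL_YEARS // PERIOD_YEARS
    return [FIRST_PERIOD_PHOTOS * GROWTH_PER_PERIOD ** i for i in range(periods)]


counts = photos_by_period()
total = sum(counts)

# Share of the whole album that comes from the final two years.
last_share = counts[-1] / total

# Photos per second over the final two years (ignoring leap days).
seconds_in_two_years = 2 * 365 * 24 * 60 * 60
photos_per_second = counts[-1] / seconds_in_two_years

for i, n in enumerate(counts):
    start = i * PERIOD_YEARS
    print(f"ages {start:>2}-{start + PERIOD_YEARS:<2}: {n:>11,} photos")
print(f"share of album from ages 16-18: {last_share:.1%}")
print(f"photos per second, ages 16-18: {photos_per_second:.2f}")
```

Under that assumption, the final two years work out to a little over three photos per second, which matches the quote, and they account for about 90 percent of the entire album.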

Again from the BBC:

“Here’s the problem with much of the big data currently being gathered and analyzed. The moment you start looking backwards to seek the longer view, you have far too much of the recent stuff and far too little of the old. Short-sightedness is built into the structure, in the form of an overwhelming tendency to over-estimate short-term trends at the expense of history.”
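
To see how this plays out in a simple calculation, here is a small sketch using made-up numbers. The years, record counts, and values are all hypothetical, and averaging each year first is only one possible way to keep the older data from being drowned out, not a recommended fix for every case:

```python
from statistics import mean

# Hypothetical data: the number of records collected grows tenfold each year,
# and each year has its own typical value for whatever is being measured.
records_per_year = {2019: 2, 2020: 20, 2021: 200, 2022: 2000}
value_by_year = {2019: 10.0, 2020: 12.0, 2021: 14.0, 2022: 30.0}

# Expand into individual (year, value) records, as they would sit in a data set.
records = [
    (year, value_by_year[year])
    for year, count in records_per_year.items()
    for _ in range(count)
]


def pooled_mean(rows):
    """Average every record equally, so the busiest year dominates."""
    return mean(value for _, value in rows)


def period_balanced_mean(rows):
    """Average within each year first, then average the yearly means."""
    by_year = {}
    for year, value in rows:
        by_year.setdefault(year, []).append(value)
    return mean(mean(values) for values in by_year.values())


pooled = pooled_mean(records)
balanced = period_balanced_mean(records)
print(f"pooled mean:          {pooled:.2f}")
print(f"period-balanced mean: {balanced:.2f}")
```

With these invented numbers, the pooled average comes out at about 28.4, almost identical to the most recent year’s value of 30, while giving each year an equal voice yields 16.5. Neither figure is “the right answer.” The point is that the pooled number quietly reflects the last year and little else.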

Data Competencies to Counteract Data Bias

Of course, recency bias isn’t the only challenge we face as we gather larger and larger quantities of data and rely more heavily on analytics to draw conclusions from that data. Michael Singer cites some critical ones in “10 Cognitive Biases in Business Analytics and How to Avoid Them.” Here are three that I particularly like:

  • Clustering Illusion — you overestimate the importance of small runs, streaks, or clusters in large samples of random data (that is, seeing phantom patterns).
  • Confirmation Bias — you search for, interpret, focus on and remember information in a way that confirms your preconceptions.
  • Pro-Innovation Bias — you have an excessive optimism towards an invention or innovation’s usefulness throughout society, while often failing to identify its limitations and weaknesses.
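
The clustering illusion in particular is easy to demonstrate. The sketch below is my own illustration, not something from Singer’s article: it flips a simulated fair coin 1,000 times and reports the longest streak of identical outcomes. The seed and the number of flips are arbitrary choices.

```python
import random


def longest_run(flips):
    """Return the length of the longest run of identical consecutive values."""
    longest = 0
    current = 0
    previous = object()  # sentinel that never equals a real flip
    for flip in flips:
        current = current + 1 if flip == previous else 1
        longest = max(longest, current)
        previous = flip
    return longest


# Arbitrary seed so the run is repeatable; change it to see other sequences.
rng = random.Random(42)
flips = [rng.choice("HT") for _ in range(1000)]

print("longest streak in 1,000 fair flips:", longest_run(flips))
```

Run it with a few different seeds and you will usually find a streak long enough to look like a “trend,” even though every flip is independent and nothing is going on beyond chance.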

An AIIM report from a long time ago (kudos to Doug Miles and Dave Jones) made an observation that I’ve always loved about the two kinds of competencies that organizations need when it comes to truly leveraging analytics capabilities. They noted that we need both data scientists and data entrepreneurs in our organizations.

“The data scientist has his ear to the business, and his eyes full on the data — the data entrepreneur has exactly the opposite focus, eyes full on the business, ear to the data.”

Without these complementary perspectives, we run the risk that our biases will turn our data against us, leading to conclusions in which we have great confidence but that are simply wrong. Mark Twain popularized the saying “Lies, damned lies, and statistics,” and it seems like the saying has never been truer or more relevant than now.

The alternative to using data intelligently, sensitively, and in context is that we become overwhelmed by information and more prone, whether in our organizations or in the public sphere, to rely on “decisions by anecdote.” As in “I heard it from my cousin’s friend in Trinidad ….”

Not that I am exhibiting any recency bias in my illustration.

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation and intelligent automation.


