To Data Or Not To Data

It is modern to be data-driven. Organizations brag: “We make decisions based on data.”

Having spent a decent amount of time building data-crunching systems, I have a lot of sympathy for this attitude. In fact, I have suggested data-driven approaches in previous articles. But I also think that the tendency to measure and quantify everything can backfire. Let us go into the good, the bad, and the ugly of data-driven culture.

The ideal of a data-driven organizational culture comes out of the same spirit as the idea of scientific management. It is not politics that should decide what gets done but the merit of the argument. Peter Drucker is often wrongly credited with the statement “what gets measured gets managed” – an example of the many business superstitions we find ourselves confronted with.

There is something to the idea, of course, that the highest-paid person’s opinion should not be the guiding principle in decision-making. It would be great to have some objective standard when deciding what market to launch in next or which big data analytics framework or cloud vendor to choose. Nevertheless, establishing such an objective standard is harder than many are willing to acknowledge, and it comes with its own problems.

Being Data-Driven Is Hard

Imagine the following scenario: You retrospect on recent failed projects in your organization and you find that a certain framework or technology was much more prevalent among those failed projects than among your current cash cows. Should you conclude that the data indicate that this technology should be avoided?

You might have spotted the problem already: Correlation is not causation! Even if the observed pattern is not just statistical noise, there are many factors that could explain both the technology choice and the failure of the project. Maybe all of these projects were targeting a specific type of customer which made the technology an obvious choice for interoperability reasons and your marketing simply does not vibe with this target audience.
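To make the confounding concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the customer segment, the probabilities, and the function names are assumptions, not data from a real organization. The technology has no effect on failure at all, yet the naive comparison makes it look harmful, because the customer segment drives both the technology choice and the failure rate.

```python
import random


def simulate_projects(n=40_000, seed=0):
    """Generate hypothetical projects as (special_customer, uses_tech, failed)."""
    rng = random.Random(seed)
    projects = []
    for _ in range(n):
        # Hypothetical confounder: half of the projects target a special customer type.
        special_customer = rng.random() < 0.5
        # The customer type drives the technology choice (interoperability)...
        uses_tech = rng.random() < (0.8 if special_customer else 0.2)
        # ...and the customer type alone drives failure. The technology plays no role.
        failed = rng.random() < (0.6 if special_customer else 0.2)
        projects.append((special_customer, uses_tech, failed))
    return projects


def failure_rate(rows):
    rows = list(rows)
    return sum(1 for _, _, failed in rows if failed) / len(rows)


def tech_gap(rows):
    """Failure rate with the technology minus failure rate without it."""
    rows = list(rows)
    with_tech = failure_rate(r for r in rows if r[1])
    without_tech = failure_rate(r for r in rows if not r[1])
    return with_tech - without_tech


projects = simulate_projects()

# Naive comparison across all projects: the technology looks harmful.
naive_gap = tech_gap(projects)

# Comparison within each customer segment: the gap mostly disappears.
stratified_gaps = {
    segment: tech_gap(r for r in projects if r[0] == segment)
    for segment in (True, False)
}

print(f"naive gap: {naive_gap:+.2f}")
for segment, gap in stratified_gaps.items():
    print(f"gap within special_customer={segment}: {gap:+.2f}")
```

With these assumed probabilities, the naive gap is about 0.24 in expectation, while the gap within each segment is zero in expectation. In a real organization you rarely know which variable to stratify by, which is exactly the difficulty.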

This highlights one of the problems of the use of data in complex decision-making problems: Often, there are many, many variables involved and it is not clear if the ones that get measured are the ones that matter.

In fact, it is often hard to obtain relevant data for important variables which then leads to a disproportionate emphasis on factors that can get measured.

An example of this is that, in software projects, we tend to spend a lot of time on cost estimates. While measuring and extrapolating cost is an important ingredient, there is a second factor involved in deciding what to build next: Value. However, estimating value turns out to be harder than estimating cost. Even if your cost estimate is very precise, how does this help in prioritizing work if the estimated added value is no better than an educated guess? In the worst case, value stops being a factor in prioritization decisions and you only look at the thing for which you do receive data.
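A small sketch of this point, using made-up numbers: two hypothetical features whose cost is known to within roughly ten percent but whose value is only an educated guess. The `Estimate` type, the feature figures, and the helper names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    low: float
    high: float


def ratio_range(value: Estimate, cost: Estimate) -> Estimate:
    """Range of value per unit of cost.

    Worst case: lowest value at highest cost. Best case: the reverse.
    """
    return Estimate(value.low / cost.high, value.high / cost.low)


def overlaps(a: Estimate, b: Estimate) -> bool:
    """True if the two ranges intersect, so no ranking can be derived from them."""
    return a.low <= b.high and b.low <= a.high


# Hypothetical features: precise cost (person-days), vague value (arbitrary units).
cost_a, cost_b = Estimate(9, 11), Estimate(18, 22)
vague_a = ratio_range(Estimate(20, 200), cost_a)
vague_b = ratio_range(Estimate(50, 400), cost_b)
print("ranking possible with vague value:", not overlaps(vague_a, vague_b))

# Same costs, but value now known about as precisely as cost.
sharp_a = ratio_range(Estimate(100, 110), cost_a)
sharp_b = ratio_range(Estimate(100, 110), cost_b)
print("ranking possible with sharp value:", not overlaps(sharp_a, sharp_b))
```

In this sketch the cost estimates are identical in both cases. Only when the value estimate becomes comparably precise do the two ranges separate and a ranking emerge.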

Of course, establishing an objective standard is a nice idea in theory. However, we cannot escape the fact that organizations are a collection of individuals with diverging incentives which may or may not be aligned with the organization’s incentives. In case of misalignment, negative political behavior can still occur: It is easy to make the data say what you want it to say, especially if you do not have access to reliable tools like randomized controlled trials to settle the question.

When that happens, the result is arguably worse: Even though a decision is born out of the same non-scientific motivation as a highest-paid person’s opinion, the presence of a seemingly objective standard gives it an authoritative appeal and leaves us less room to challenge it and correct course.

Look at it this way: We think that the scientific method is a good process, not because researchers are infallible experts that can always be trusted. They too face adverse incentives. We think that the scientific method works well overall because results can be challenged and tested over and over based on new data. Spurious conclusions may be accepted in the short term but will be rejected in the long run. This is a process that takes time – time that you probably do not have.

The provocative summary of a data-driven culture gone wrong is the following: Without data, you used to make wrong decisions. With data, you’re still wrong – but now you have numbers to make you feel better.

Why Do We Need Data?

Every high-brow article worth its digital ink needs to include the phrase “let us take a step back”. Therefore I feel compelled to do the same: What is the purpose of the data that you are collecting, really?

I think in part it is about understanding how a system works. What is the mechanism? How does the mass of the earth influence other matter? How does a pull request review process impact delivery speed and quality?

Understanding the mechanism allows us to make predictions. If I drop this glass, what will happen to it? If I limit the size of a change that can be proposed for code review, how will our team’s output change?

Similar to how we want to understand the system of nature and physics to engineer bridges and rockets, we want to understand the economic, technical, and social system we are in when making an important business decision.

The reason data is helpful in many cases is that it is a way of rejecting bad hypotheses. It also helps when systems are too complex for analytical reasoning: with sufficient examples, we can detect the pattern statistically instead.

For decisions in software, we usually don’t have that data. Should your team do code reviews? What if we knew that teams practicing code reviews are 80% more successful? We still don’t understand the mechanism and this number could be a spurious correlation.

Importantly, understanding the mechanism and collecting data to refute a model is a scientific endeavor. In business practice we rarely get time for this degree of scientific rigor.

Low-stakes and High-stakes Decisions

What should an organization do in the face of this dilemma? The bad news is that I do not think there is an easy way out. The good news is that everyone else is in the same bind as you.

Here is my suggestion: Let’s think of decisions as bets on a possible future. That means, first of all, we should determine what is at stake. Some decisions could break your entire enterprise if you find yourself on the wrong side of the bet – others you may be able to recover from quite easily.

A first insight is that the investment into the decision process should be proportional to the importance of the decision.

For low-stakes decisions we can ask ourselves: How will we recognize that we were wrong and revert the decision? This opens the door to a much more trial-and-error based approach that can exist without the scientific pretense.

In fact, I would argue that the most valuable element of data-driven culture is open-mindedness about faulty assumptions. To use data correctly, an individual leader cannot be married to a strategic decision but must remain humble enough to recognize observations that contradict the hypothesis. That same mentality is required when retrospecting on low-stakes decisions, to evaluate with intellectual honesty whether the observations align with the current system model.

For high-stakes decisions the situation is more difficult. First, I would suggest making use of the quantitative data available from both outside and inside your organization, but treating this data as not much more valuable than anecdotes. Second, I would suggest being explicit about the system model on which you base your bet.

Take the following statement for example: “Rewriting component X in Rust will make it 40% more energy efficient based on data from similar projects.” This seems insufficient to me, as it does not describe the mechanism that makes this a reasonable bet, ignores costs and risks, and fails to explain why energy efficiency is more important than alternative uses of development time.

I want to conclude with the following thought: The scientific spirit that allows us to look at the system, formulate a hypothesis, and acknowledge when observations do not align with that hypothesis is definitely valuable. Nevertheless, we rarely get the space and time to do the work that would be necessary to interpret data correctly. Let us use data as best we can given the constraints we have, but let us not obsess over it: the error bars can be significant, and anecdotal evidence will be just as valuable in many decisions.
