How to build the next version of X

Updating based on feedback is trickier than you think

Psychologist Bertram Forer once asked some students to fill out a personality test and then gave them a personalized profile based on the results. The students rated how well the profile captured their individual personality, and at the end, the average rating was found to be 4.2 out of 5!

But actually, Forer had taken vague statements like “You have a great deal of unused capacity which you have not turned to your advantage” from a book on astrology, assembled them into a profile, and given the same profile to everyone [1]. The students had stretched a profile full of elastic language to fit their self-image, even though they probably thought they were judging the profile objectively.

It’s easy to laugh at those students, but they had completed a personality test, hadn’t they? And when they saw their profile later, it didn’t feel vague on a gut level, I’d wager.

Imagine you’re one of those students. You read the profile, you connect the dots to your own life and to some memories of the personality test and think ‘Oh, huh!’. But you probably don’t immediately think: “hmm, if I was randomly selected from a large group of diverse people, this profile might match quite a big share of that group, or it might not; I don’t really know if this identifies my personality uniquely enough.”

That’s the annoying thing about vague feedback. Most of the time, it probably doesn’t feel vague or unhelpful. Instead, our warm rush of correlation overpowers the slower-moving, cautious trains of thought, and we update our perception of our performance based on partially dreamt-up information. It seems quite easy to mistake bad feedback for good.

So, how should we gather feedback and update ourselves? In this post, I’ll focus on some things that might dissolve this question.

I’ve been reading about forecasts, and how feedback can improve them, in the excellent book Superforecasting. Forecasting aside, this book also casually throws out so many gems about various topics that I ended up filling a decent chunk of a notebook. (Did you know that randomized controlled trials were not common in medicine until basically after WWII? Read up on Archie Cochrane!)

A key theme in Superforecasting is how we can become less terrible at forecasting. To do that, we need to adopt an attitude of ‘perpetual beta’ — ‘try, fail, analyze, adjust, try again.’ ~~(hearing echoes of Carol Dweck’s Mindset everywhere…)~~ actually… I want to dig deeper into Dweck’s studies, based on some new complicating things I’m reading, like: ‘if growth mindset was more causative of success in school/life than fixed mindset, the relative proportion of growth/fixed mindset students in places like Stanford would be quite tilted, but it isn’t so strongly tilted’, and so on.

And that cycle of trying and adjusting is not possible without incorporating feedback appropriately. So, Superforecasting is quite on the ball for our question (how should we gather feedback and update ourselves?).

Unpacking our question, there are two main sub-questions I want to cover:

What is good feedback?
How should we use good feedback?

Good Feedback

Regarding Forer’s experiment with personality profiles, clear language would probably have uncovered the inadequacy of the profiles to Forer’s students.

Imagine being one of Forer’s students and reading: A meta-study of US college students’ responses to personality tests and their life outcomes over a period of 10 years found question types A, B, and C to be good predictors of behavioral categories X, Y, and Z, and your test indicates you’re 50–70% correlated with B, D, and E, so our model predicts the probability of X, Y, and Z behaviors in your case is 30–70%.

“Oh, details and numbers… oh, so they use that data and my test to predict things about me… that’s quite the range there, 30–70%? Probably not saying much about my personality. Carry on.”

(this is a dramatization, I’m curious what the actual literature is like on this topic…)

Clear feedback is important, but unambiguous feedback is also crucial. You should know without a doubt when you hit or miss.

If you’re practicing bowling and can’t see the pins fall, you will probably get confident more quickly than you get better. Hearing all sorts of sounds of pins being impacted but not seeing how each roll affects the pins is not useful if you want to knock down more pins per roll.

It’s like re-reading a textbook chapter… it’ll make you think you know the material, but your ability to recall it during a test will not improve nearly as much as it would from taking actual practice quizzes. The book Make It Stick makes special note of the power of practice tests and flash cards versus the inefficacy of simply reading material over again.

Imagine getting unambiguous feedback. It may be temporarily discouraging to know you didn’t do well. It’s probable that most of the feedback you receive in the process of making something will be critical. After all, you only succeed a few times (near the end and somewhere in the middle a few times, maybe) but fail a lot more. But do you want to actually improve, or do you want to simply feel like you’re improving? Unambiguous feedback is probably the most generous thing anyone can give you.

Promptness of feedback is also important. Too great of a time lag between the performance and the feedback can be harmful.

Once we know an outcome, it heavily skews our perception of what we thought before the outcome. So, if we rely on fuzzy memories of past performances, any kind of feedback will be interpreted through a skewed frame of mind.

There’s a really cool experiment about this phenomenon in Superforecasting (I should do a book review…). Baruch Fischhoff had people estimate the likelihood of major world events and then recall their estimate after the event did or did not happen. Knowing the outcome consistently slanted the recalled estimate, even when people tried not to be biased. Another one I remember reading somewhere compares the video testimony right after JFK’s assassination with depositions taken much later; you can very clearly notice people’s avowed memories change drastically.

If you don’t tie yourself down with your performance/guess/draft/pull-request before getting feedback, you run the risk of ruining your ability to best learn from the feedback. Hindsight bias will hit you even if you know it might hit you, even if you try to de-bias yourself!

Imagine receiving prompt feedback. You finish up a draft of an essay and send it to a friend, then go for lunch. Later in the afternoon, you receive a graded response from your friend, even using the kind of rubric your teacher will use. How quickly will you offer a coffee/beer to that friend? Promptness of feedback while making something is utterly energizing.

Using Good Feedback

Before we even think about getting feedback, we need to score our efforts using a suitable metric. This brings me to Douglas Hubbard’s How to Measure Anything:

Anything can be measured. If a thing can be observed in any way at all, it lends itself to some type of measurement method. No matter how “fuzzy” the measurement is, it’s still a measurement if it tells you more than you knew before. And those very things most likely to be seen as immeasurable are, virtually always, solved by relatively simple measurement methods.

If you find yourself saying “oh, I don’t think it’s even possible to measure X,” step back, humor yourself, and try to break down that question and come up with a measure. Perhaps use a technique used by computer scientists: break down the problem into smaller sub-problems until it becomes clearer how to measure the sub components. Then, reduce the sub component measures into one aggregate measure. This is also something Tetlock talks about in Superforecasting, calling it ‘question clustering.’

Here’s another way to think about it. Let’s say you’re trying to figure out how to measure ‘big unmeasurable thing X.’ Now, consider a statistical distribution of possible future worlds (within a rough timebox, like next 2 years) and consider what kinds of observable things you would anticipate happening in the clump of worlds in which ‘X’ happens. What I mean is: consider what other things probably exist or how they behave if ‘X’ happens in that world. Now, you have a starting point to search for a set of things that you can observe, and what you can observe you can virtually always measure somehow. There are probably smaller questions you can define that individually don’t answer the big question but taken together as a cluster can give you a decent probabilistic understanding, or a ‘temperature,’ of ‘X.’

Want something more specific? I’ll quote Hubbard’s “Applied Information Economics” method:

1. Define a decision problem and the relevant variables. (Start with the decision you need to make, then figure out which variables would make your decision easier if you had better estimates of their values.)
2. Determine what you know. (Quantify your uncertainty about those variables in terms of ranges and probabilities.)
3. Pick a variable, and compute the value of additional information for that variable. (Repeat until you find a variable with reasonably high information value. If no remaining variables have enough information value to justify the cost of measuring them, skip to step 5.)
4. Apply the relevant measurement instrument(s) to the high-information-value variable. (Then go back to step 3.)
5. Make a decision and act on it. (When you’ve done as much uncertainty reduction as is economically justified, it’s time to act!)
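To make the “compute the value of additional information” step a bit more concrete, here is a rough sketch of one textbook quantity, the expected value of perfect information (EVPI). This is a simplified stand-in and not necessarily the exact calculation Hubbard uses; the scenario names, probabilities, and payoffs below are made up purely for illustration.

```python
def expected_value_of_perfect_information(scenarios, payoffs):
    """Return EVPI for a decision with discrete scenarios.

    scenarios: dict mapping scenario name -> probability (should sum to 1)
    payoffs:   dict mapping action name -> dict of scenario name -> payoff
    """
    if abs(sum(scenarios.values()) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")

    # Best we can do today: commit to the single action with the
    # highest expected payoff across all scenarios.
    expected_by_action = {
        action: sum(prob * outcome[name] for name, prob in scenarios.items())
        for action, outcome in payoffs.items()
    }
    best_without_info = max(expected_by_action.values())

    # Best we could do if we learned the true scenario first:
    # pick the best action separately in each scenario.
    best_with_info = sum(
        prob * max(outcome[name] for outcome in payoffs.values())
        for name, prob in scenarios.items()
    )
    return best_with_info - best_without_info


# Hypothetical decision: launch a product or skip it.
scenarios = {"low demand": 0.4, "high demand": 0.6}
payoffs = {
    "launch": {"low demand": -100.0, "high demand": 200.0},
    "skip": {"low demand": 0.0, "high demand": 0.0},
}

evpi = expected_value_of_perfect_information(scenarios, payoffs)
print(f"Knowing demand in advance is worth at most about {evpi:.1f}")
```

EVPI is an upper bound: if measuring a variable would cost more than this number, the measurement is not worth it, which matches the “skip to step 5” escape hatch in step 3.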

Okay, so it is possible to measure basically anything. An illustrative example from Superforecasting: Tim Minto, a forecaster in the Good Judgment Project team that competed in a 4-year forecasting tournament and won massively, predicted the Syrian refugee flows in 2014 and got a great Brier score of 0.07. Brier scores are a way to properly score forecasts. They go from 0.0 (best) to 2.0 (worst), where 2.0 means saying something will definitely happen when it definitely doesn’t. So, Tim scored very well on Syrian refugee flows. But Tim also forecast that Shinzo Abe would visit the Yasukuni Shrine, and got a score of 1.46. Tim knew exactly how he did in both cases, thanks to this practice of scoring each effort.
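For the curious, here is a small sketch of how a Brier score on that 0.0 to 2.0 scale can be computed: sum the squared differences between the forecast probabilities and what actually happened, across every possible outcome. The function names are my own, and a real tournament also averages scores over each day a question is open, which this sketch leaves out.

```python
def brier_score(forecast, outcome_index):
    """Brier score for one forecast over mutually exclusive outcomes.

    forecast:      list of probabilities, one per possible outcome (sums to 1)
    outcome_index: index of the outcome that actually happened
    """
    if abs(sum(forecast) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    return sum(
        (prob - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, prob in enumerate(forecast)
    )


def binary_brier(p_yes, happened):
    """Convenience wrapper for yes/no questions."""
    return brier_score([p_yes, 1.0 - p_yes], 0 if happened else 1)


# "It will definitely happen" and then it doesn't: the worst possible score.
worst = binary_brier(1.0, happened=False)   # 2.0
# "It will definitely happen" and it does: a perfect score.
best = binary_brier(1.0, happened=True)     # 0.0
# A 50/50 shrug scores 0.5 no matter what happens.
shrug = binary_brier(0.5, happened=True)    # 0.5
print(worst, best, shrug)
```

The useful property is that the score punishes confident misses much harder than hedged ones, so you can’t look good just by always saying “maybe.”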

Imagine that, dissolving away hindsight bias with a pen and a piece of paper. Accurately knowing how to update. Just like that.

Clear, prompt, and unambiguous feedback is critical for actual improvement. But knowing how to use good feedback is not something we can skip. We need to keep score, search for good feedback, analyze and adjust, and try again.

Try, fail, analyze, adjust, try again!