The Remarkable Ways We Gain Insights - 2022-06-30

I heard a book recommendation on a podcast, The Changelog: Learning From Incidents. One key takeaway from that podcast was to have a separate person investigate an incident and report on it. The people who just survived it are worn out, and running process at that moment is not going to lead to truly learning from the incident and preventing it in the future.

In other industries like aviation, healthcare, and the like which involve the safety of human lives or property, we have significant processes in place. The book talks about this too. Is it on you to assess the competency of the flight pilot, evaluate his flight history and reputation? No, you're just going to be there for the ride. Process is there, full of checklists to get you safely to the other side (or to cancel your flight).

I used to work in healthcare technology; they were very checklist oriented. In healthcare technology we served institutions that hire physicians, nurses, etc. and that brought a checklist-oriented culture. In fact so much Kool-Aid was forced down our throats about what makes healthcare successful (financially and functionally) that we were forced into the same stagnant sluggish spell that you see in government institutions. But hey we charged and made a lot of money. Or at least my employer did. I wasn't compensated differently for the impact of my work.

absolutely-not
You wouldn't want to leave a towel in someone's gut and have them return, only to have a sharp tool left inside the second time for a third return visit. Yes, this is a real story. Checklists do exist for a reason around life.
Back to the health care employer though, process had been so ingrained over the decades (literally) that altering it to include unit tests was barely supported in 2015 and only on pure functions. The employer seemed to want to reduce their QA costs but would not support the development costs to automate testing around their products.
disappointed
This ultimately led to my burnout there. My feature which wrapped around the ingress and egress points of the application could not be automated. The language my feature was written in literally did not have the concept of interfaces for me to mock inputs, outputs, and verify effects. I tried for 4 months to figure out how to replicate the features of an interface so that my feature would not take 40-80 cumulative hours of QA effort for every feature or fix introduced.
uhh
oh-you
Manager-me here in 2022: That was really dumb of me. Not only was that scope of development not planned, I should have communicated better on what I was trying to do and concluded I could not make any progress after two weeks. Secondly, the amount of sunk dev time was completely disproportionate to all the future QA that would happen.
glare
Also Manager-me here in 2022: What the heck boss-man? I may have had prior dev experience but I was fresh out of college. I did not receive any mentoring or coaching when things got hard. "How can I help you?" is ineffective when I do not know how to help myself yet.

But among all the processes introduced in and around our lives, there's a problem, and it comes back to learning from software incidents. Process will obscure the clues that lead to deeper insights into the issues encountered. Processes may lead to surface issues being documented without documenting the underlying cause or suggesting that an underlying cause should be explored.

As both a software engineering manager and someone who has survived, documented, and resolved incidents, I wanted to see what this book had to bring me.

Book cover for Seeing What Others Don't: The Remarkable Ways we Gain Insights

At first I tried highlighting things in this book, but I was just getting distracted finding things that looked like a quip here and there that I could point to. Only the first 50 of 250ish pages got highlighted. In a way I'm glad I stopped the highlighting; the pausing to properly fill in boxes of color would have taken too much time.

This book has a lot of padding. It does not need to be nearly an inch thick.

It starts out with grounding me–the reader–in the same understanding the author–Gary Klein–had before diving into insights. Apparently he collected stories over the course of his life, each featuring someone experiencing an insightful moment. Maybe as a means to inspire himself later on.

Then he examined the existing theories of insights. Like oh you get exposed to a lot of things, talk to a lot of people, and then something incubates in the back of your head and ah-hah! You have the solution!

Well, that's one theory, and it does not account for many insights people have about the systems in our lives.

Later on he talks about companies and how companies want to

  1. Reduce Errors
  2. Increase Insights

No one knows how to increase insights, but we do know how to reduce errors. Introduce Process!

Processes

You've heard of QA right? Quality Assurance? That's a process introduced to keep things within expectations. It reduces errors. Usually QA is performed by another team or individual.

Well let me ask. Have you spent 120 hours documenting all your research in a Word doc on some SharePoint server, with a primary plan, an estimated scope, a few caveats or concerns, and so on? Presented it to an internal committee, and later presented a reduced version to a client who's paying up the McMansions?

I've lived that life: it was called Healthcare technology. That process did not leave me with any transferrable skills. I did not come out of that a better developer.

Although I have a shirt that says "2 weeks of programming can save you 2 hours of planning," I think that there's a balance to be had. One which was definitely not felt in Healthcare.

The book brings up this professional case too. You're dealing with a client. If you deviate from the plan, you not only jeopardize the current timeline, the start of the new timeline will be delayed until it is approved. Large organizational processes stifle sudden change, for good or bad.

And that's what Gary points at over and over.

Insights are sudden, unexpected ideas that change our understanding of something.

By focusing so much on reducing errors, we (not just may) will also smother insights that naturally occur. They are not communicated, or if they are communicated they are filtered out.

Example after example is presented in Seeing What Others Don't.

Process can be good for maintaining lives in a known environment, while adding more process to an unknown environment seems to backfire.

Gary even goes on to say, you know, if Boone had to save his kidnapped girl using a program, he would have missed all the vital signs that led to his success. Boone played it by ear; he connected details and even dropped his plan twice along the way in favor of a riskier but fact-led approach. Processes induce mindlessness. Do step A, then step B, then step C. Processes are just executing a program in meat space.

I do talk about creating process in my Approaching Projects series, one that helps you. I wrote that seven months ago to write down how I personally break things down. I cut steps or add steps or rearrange steps all the time for myself. Personal process should be guidelines. What I do changes day by day because I'm reacting to information I know and learn. I am not a program, I am not a process, and on June 28th: I was praised for how well my team is functioning and their capacity to deliver.

If you need something to happen mindlessly, then use a process. Otherwise use your mind.

heart
A good opportunity for a process is daily self-care items. Shower, vitamins, lift a weight a few times, walk outside, brush teeth. Taking care of your wellbeing physically will really help you emotionally / mentally too. And it fits: processes matter for preserving life.

What Inspires though?

There are three paths Gary talks about to get inspiration.

  1. Identifying Contradictions
  2. Creative Desperation
  3. Connecting, Curiosity, Seeing Coincidences

Every path relies on something called an anchor, which is some observed fact or idea.

The sun comes up each day and goes down during the night: these are anchors. But did the sun go around the earth or did the earth go around the sun? Nicolaus Copernicus was ridiculed for Heliocentrism (which while wrong got us closer to a better understanding of the universe).

Noticing contradictions, little or big things that just don't fit with the established idea, can lead to rewriting the story around a weak anchor.

newspaper-ych
So why were people, companies, and governments using surface-cleaning solutions and UV to sterilize public surfaces multiple times a day for COVID-19? COVID-19 is a known human-to-human vapor transmissible disease. It does not make sense to clean other vectors when the actual transmission vector is not addressed. Introducing uninformed processes to look good is not the solution.
I wish we actually cleaned our desks every week in public school or something. It was a once a quarter activity. Yes, 3 months of sweat and boogers on student desks. But teachers were only given one bottle of cleaning wipes that often.
eww

As for creative desperation, this is puzzle solving. You're stuck. You have a scene before you that requires you to bend your mind to come up with a solution.

I breathe this stuff–this is what fun programming is all about!

Do I have to maintain a unique set in memory as I work across 2 billion records? No, I can use a Bloom filter tuned for a low false-positive probability at 10 billion records and call it a day!
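
For the curious, here is a minimal Python sketch of that Bloom filter idea. It is an illustration under assumptions, not the code from the real data set: the names `bloom_parameters`, `BloomFilter`, and `count_probably_unique` are made up, records are assumed to be strings, and hash positions are derived from one SHA-256 digest with double hashing. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, which comes to roughly 9.6 bits per record for a 1% false-positive rate.

```python
import hashlib
import math


def bloom_parameters(expected_items, false_positive_rate):
    """Return (bit_count, hash_count) from the standard sizing formulas."""
    bit_count = math.ceil(
        -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
    )
    hash_count = max(1, round(bit_count / expected_items * math.log(2)))
    return bit_count, hash_count


class BloomFilter:
    """Set-like structure: no false negatives, tunable false positives."""

    def __init__(self, expected_items, false_positive_rate):
        self.bit_count, self.hash_count = bloom_parameters(
            expected_items, false_positive_rate
        )
        self.bits = bytearray((self.bit_count + 7) // 8)

    def _positions(self, item):
        # Double hashing: two 64-bit values from one digest give k positions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.bit_count for i in range(self.hash_count)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def count_probably_unique(records, expected_items, false_positive_rate=0.01):
    """Count records not seen before, without storing the records themselves."""
    seen = BloomFilter(expected_items, false_positive_rate)
    unique = 0
    for record in records:
        if record not in seen:
            seen.add(record)
            unique += 1
    return unique
```

The trade-off: a false positive makes a new record look like one already seen, so the count can come in slightly low but never high. Sizing the filter for more records than you expect keeps that error small.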

Creative desperation often involves throwing out a weak anchor.

Lastly, that connection one! The whole synergy talky walky stuff where you see things and make connections. This one gets a lot of focus in the professional world.

Apparently Steve Jobs tried to force this on Pixar's campus by having only two bathrooms at the center auditorium. Surely by forcing people to mingle, they'll exchange ideas and have inspiration! But this was changed when pregnant women complained about having to walk for 15 minutes to use the toilet. Eye roll. You'll see the same stuff at Facebook's campus too I bet. But not with bathrooms.

Making connections isn't just about passively or forcefully being subjected to lots of stimuli. You need to be experienced, engaged, active, and interested to make connections.

woosh
You know what happens when I'm not actively engaged? I fill a water cup. Upside down.

We do not make connections to add new anchors to our understanding when we are not engaged.

So the trick here is to be more engaged, focus on the situation and the details, assume less! We filter out our own insights. If we unconsciously block them by holding beliefs like "No way, Russia would never invade Ukraine again," we get blindsided. The signs were there. But we chose not to look at them suspiciously or actively.

Oh, that's another thing Gary talked about. Sometimes we discover insights by being suspicious. That isn't to say be suspicious of everything.

Just consider that if you're trying to chase after something, looking for reasons why that thing is wrong is not a bad thing inherently. Remember that Miami Surfside Condominium Collapse?

MSNBC's report featuring images of cracks at the condo building

Someone was suspicious of those problems; they were reported and documented. But another who had the capacity to act or induce action did not engage, and that led to a preventable disaster.

Balancing with Process

Well, we can't just abandon process, but we can ask for it to be reduced.

Consider. It sucks to be on the receiving end of this sure, but bear with me. What if one Coke can tab was missing in one of 500 cases in a grocery store, and that rate was representative? 1 failure out of 500, right? That sucks! Well, QA may be able to identify this and reduce it to 1 out of 5 billion with visual inspection. So we first pay humans to look at Coke can tabs (getting it down to, say, 1 in 1000; I bet this would be a really dull, fatiguing job) until we can have a robot do it for 1 in 5 billion. Do you have any idea how much that costs in human time and then R&D time? A lot more than just refunding defective 12 packs for a few years.

What's the impact of something slipping up?

Ask that before you throw in human process to patch over the problem.
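
One way to put numbers on that question is a back-of-the-envelope comparison of what slips cost against what the process costs. This is only a sketch: the function name `yearly_net_benefit` and its parameters are made up for illustration, and every figure in the example call is a placeholder, not data from any real incident.

```python
def yearly_net_benefit(incidents_per_year, cost_per_incident,
                       prevented_fraction, process_cost_per_year):
    """Expected yearly savings from a process minus what the process costs.

    A positive result suggests the process pays for itself; a negative one
    suggests absorbing the occasional slip is cheaper.
    """
    expected_loss = incidents_per_year * cost_per_incident
    avoided_loss = expected_loss * prevented_fraction
    return avoided_loss - process_cost_per_year


# Placeholder numbers only: 2 slips a year at 500 each, a process that
# catches 90% of them, and 5000 a year to run that process.
print(yearly_net_benefit(2, 500, 0.9, 5000))
```

If the result is negative, the process costs more than the slips it prevents, at least in money. It says nothing about safety of life, where the math is different.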

An incident that brought a new process
Here's a process that was thrown in at my place. Someone caused unpredictable sporadic downtime by adding multiple logger implementations to our app, by importing a Spring Boot dependency into a non-Spring-Boot project. We kept seeing things go down 6 days later (just before the next weekly deploy).
my-heart
The cause? It was writing 100 megabytes of logs per minute. This also led to a $20k increase in AWS costs because that went to CloudWatch Logs.
Manually bisecting all commits over the last three weeks on a canary instance would take about 40 minutes per deploy, then hours of runtime to see if it had the symptom. This cycle time made bisecting ineffective since it slowly became more problematic.
well-heck
i-am-here
It came down to me to identify the cause and commit. Five other senior engineers tried and failed. Normally as a manager I don't step in: my team figures it out and learns from it. But the problem became so severe (sporadic cluster wide crashes every 3 days) that it could not wait longer.
Afterwards I taught how I figured it out. I used JSPs and reflection to inspect the classes and libraries loaded in production. These classes were not present locally.
teaching
the-more-you-know
The production build process fetched dependencies differently, and the Spring Boot dependency included Spring Boot, which then brought another logger. It just so happened that some of these libraries boot up and try several different loggers, and they prioritized the one Spring Boot brought in. Because that logger was not configured, all trace and debug lines were logged for our Redis and Apache requests libraries.
Removing the Spring Boot dependency and then including the intended dependency (which that Spring Boot dependency included) stopped all that logging immediately. Unfortunately we had to eat the AWS cost.
shut-ych

Now every dependency change on our primary product requires approval from my team. At first, the approval required running some mvn command to dump the dependency tree for the change and for the prior commit, then comparing the two. Now we have a nice GitHub Action thing that appends it to a pull request, along with a CODEOWNERS file on the project file to enforce review.
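
A rough sketch of that comparison step in Python, assuming both dependency trees have already been dumped to text (for example with `mvn dependency:tree`) and that each dependency appears as a `group:artifact:type:version:scope` coordinate on its own line. The function names are hypothetical and this is not the actual GitHub Action, just the core idea of diffing two dumps.

```python
import re

# Matches Maven-style coordinates such as "org.example:lib:jar:1.2.3:compile".
# Assumption: 4 to 6 colon-separated parts, as printed by a dependency dump.
COORDINATE = re.compile(r"[\w.\-]+(?::[\w.\-]+){3,5}")


def coordinates(tree_text):
    """Collect every dependency coordinate found in a dependency tree dump."""
    found = set()
    for line in tree_text.splitlines():
        match = COORDINATE.search(line)
        if match:
            found.add(match.group(0))
    return found


def diff_trees(before_text, after_text):
    """Return (added, removed) coordinates between two dependency dumps."""
    old = coordinates(before_text)
    new = coordinates(after_text)
    return sorted(new - old), sorted(old - new)
```

Feeding it the dump from the prior commit and the dump from the pull request gives a short list of what a change really pulls in, including transitive dependencies nobody asked for, like an extra logger.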

I think adding that process was warranted: that slip cost the business more money (in resources and engineering time, figuring out the problem and reverting the issue) than the process will incur over the next few years–all the while preventing future slips of the same nature!

However, when introducing a process, consider a bypass, an escape hatch. The escape hatch should not be used normally, only when the process gets in the way.

Another example!

Another incident where an escape hatch was used.
Normally when someone needs a granular permission, they must ask their manager to file a help desk ticket to document their authorization. A severity-0 incident started during the weekend and I was on Zoom with the head of IT.
beg
I needed a permission and he tried to grant it to me.
you-tried
Except somehow his permission to grant permissions was revoked.
yeet
The only one with permissions was a part-time person. They're not going to come online during the weekend. Abandon the process!
access-granted
Like the previous incident, I used JSPs and SSH'd into an admin instance. I granted him the permission to grant permissions and then myself the permission I needed. He verbally acknowledged permission to do this in place of the process, and then I went on my merry way to undo the damage someone did.
deus-ex-new
Whether that damage was accidental or malicious hasn't been revealed to me. But I was still bothered that someone effectively shut down all operations for two hours and didn't own up to it.

Communication

We learn from each other all the time, we exchange information and sometimes we might see a contradiction.

We communicate opinions and facts, though what we see as fact or opinion may differ. Belief in either is an "anchor" in Gary's vocabulary.

When opinions disagree, we should tolerate that. When facts disagree, we argue.

But "facts" may not be truthful or accurate in the real world.

Have you ever done a turn-around after one of your views got riddled with holes?

Another personal story

In woodworking you might be tempted to slap on thick layers of paint. It'll take soooo long to do multiple layers. So much effort to do many layers.

My first bookshelf with blobby paint that got sanded off

Except this is what you get. Blobs of primer on my first bookshelf. I painted over that and then tried to sand it flat, not realizing the primer is what made it thick. White primer is hard to see depth on...

And sometimes even the brush fibers get sucked into the paint so you fish your fingers in the stuff to pull it out and then smooth it back over.

Then it takes hours or a full day to dry.

It really is more work to do thick coats and the quality suffers.


So I talked to a friend about this and they're like
surprise
Yeah you want to do thin layers. If you can, try using an air gun.
Last time I used spray paint, it just pooled and spilled.
think
hooray
Take quick passes over instead of spraying head on and only a few sprays at a time. It'll dry faster.

So I switched my workflow, got a respirator, a workshop air filter, and an air gun, and I got to work.

A dirty workshop filter

My filter used to be white, but with how much black paint I used, it became quite grey.

A shiny view of my bookshelf

Following this guidance not only took me less time, the result is far more pleasing to look at and feel.

At the time it was barely above freezing, so I tried using a heat gun to dry the paint. As a happy accident, that created this scaly surface texture you see under the shiny coating.

One lesson I took away from this book: don't argue and contradict. Listen, actively listen and ask to see a demonstration of their view.

We smother each other all the time by asserting our view of reality on others.

XKCD Duty Calls
Someone is wrong on the internet.

Now if we only listened we wouldn't teach each other our views. It is fine to share your view, but it is not fine to smother and reject others all the time.

My Conclusions

Insights come randomly; they are not a formulaic output of some process.

peeking
Are we just processes in the grand scheme of things?
Nope! We are probabilistic individuals!
finger-guns

Individuals and organizations need to balance process with the impact of processes, especially in creative settings. Processes are needed, especially around life and property, and are not bad, but often escape hatches need to be made when novel situations arise.

We cannot create a program to facilitate insights, or force insights through fabricated environments. Programs are just automated processes that handle known inputs. Insights come from unknown or even seemingly irrelevant inputs. What we can do is change our culture to encourage sharing of insights and suggestions, and actively engage with one another.

For a year my team had a daily standup message with a bullet point: "What did you learn yesterday?" But with a re-org, the new standup lacks this prompt. It feels less vibrant when we exchange only facts about progress or blockers.
ceiling
hotdog
Let's do it again!
You bet!
yes

Hey team, let's do it again

Insights are a natural part of how we think and see the world while rigid beliefs make for less insight around those beliefs. Consider being more flexible and even suspicious of your own beliefs at times.

We jeopardize each other's insights by smothering and rejecting them. Instead, have the conflicting insight expounded, elaborated.

We see more when we are curious and engaged. Think about the consequences or causes of an event. How did I learn so much about computers before I was 15? I was always about how it worked and why it worked the way it did, rather than the static what worked. Childlike curiosity is natural; it should continue with us in this ever-changing world.

Also, I think we see more possibilities when we are less fatigued from distractions. Scoping what we're thinking about to what we want insights on may be a better approach than senseless noise and doom scrolling. Consider spending less time consuming in order to get more time creating.

Incident analysis benefits from having another party examine and conclude, having their own insights about events that went wrong. Those involved with the incident can tell the story of what happened, and they may guess as to why it happened. But those participants may also be biased, stuck with whatever theory helped them get over the incident's impasse.

Lastly, if you like good incident analysis then check out the US Chemical Safety and Hazard Investigation Board videos. The CSB has a good pattern of delivering context, the event, what went wrong, why it went wrong, recommendations to avoid or prevent, and a recap of previous recommendations that were not implemented that could have lessened the impact.

Incompatible Chemicals: Explosion at AB Specialty Silicones by USCSB