I have five more spots open for my prioritization workshop on September 17th. I'd love to see you there. A review:
This was mega. Years of thinking and expertise condensed into 3 hours. It will take me the next few months to make sense of it all but some great tactical things to try out in the next few weeks, too. (Fi)
This post is Part 2, following a prior post about mission command, challenging your mental models, and the crisis of legitimacy many companies are experiencing lately. However, you don't need to read that post for this one to make sense. It should stand alone.
In software product development, we have idealized ways of working firmly planted in mission command, independent teams, etc. But we don't have an established vocabulary for when things "go wrong" (unless they go very wrong). This comment from engineering leader Spencer Pittman is the clearest articulation of the issue:
I think the pretext of a lot of software wisdom is that if things are breaking down then a more perfect execution of an ideal system must be the solution. For all our overtures toward blameless, systemic postmortems, I think, in reality, we lag far behind most other technical business domains in terms of a sophisticated conversation about human factors and humanity in the production chain. This is counterintuitive because software does a better job "talking nice" than, say, manufacturing, but I've found it significantly harder to challenge the assumptions of the system itself (vs the presumption that people in the system are "low performing") when things are operating poorly.
This comment echoed a remark from a friend who works at a world-famous manufacturing organization on "the software side of the house." He noted recently:
The manufacturing side shows up every morning in safe, problem-solving mode. It is inspiring. Even with all those constraints, they're making things better every day. They adapt so quickly! On the software side, even with all of our freedom to iterate and adapt, it is an absolute mess. No one trusts each other. Everyone is pointing fingers and gaming metrics to stay under the radar. I understand that manufacturing and software development differ, but this doesn't fully explain the attitude difference. It is night and day.
Why does this difference exist? Why do we have a tougher time talking about human factors? Why is it hard to discuss "the system" and challenge assumptions?
Like the newsletter? Support TBM by upgrading your subscription. Get an invite to a Slack group where I answer questions.
The Work is Highly Contextual
To have a clear sense of what is going on—especially when something is going wrong—you need to be very close to the work: close to the people and the technology (the sociotechnical system). Comparing teams isn't as helpful as we might think. When leadership teams have a "god view"—goal progress across teams, for example—the best they can hope for is a signal to explore the situation further, which will take a lot of time due to the highly contextual nature of the work. Acute issues become chronic issues, which makes it even harder to grasp any semblance of reality.
Visibility
"The shop floor (or gemba) is not visible in the same way it is in some other industries," writes Tiani Jones. "An exaggerated way I think about it is that [software development feels like] a magical world where people meet in teams and then go away and code appears." Even in the case of synchronous, in-person work, all you see is a group of people sitting at computer screens who occasionally get up and have a conversation (or get a snack or use the restroom). The work and "tools" are virtual. The backlog isn't a mountain of parts sitting on the shop floor alongside all the unfinished projects and open threads. To top it off, we do a lot of our best "work" while taking a walk or riding a bike—we're not even in "the office" (physical, virtual, or otherwise).
Cause and Effect
A software company has a multi-day (technology) outage, loses millions of dollars, its stock tanks, and brand loyalty suffers a major hit. What happens? In most companies, you'll see a flurry of activity—retrospectives, mitigation, preventative measures, and a plan to "make sure that never happens again." Ideally, these activities are "blameless" and explore human factors, incentives, and other system-oriented areas.
This is all to say that in software we're capable of more systematic approaches to responding to complex events. However, in most settings, you need something terrible with immediate negative effects to happen for people to pay attention at an org-wide scale. With "normal" operations, cause and effect are difficult, if not impossible, to tease out, and today's success is the output of countless inputs spread across years and even decades.
Consider how most product teams can barely systematize their local continuous improvement loop, let alone figure out how to deal with issues spanning teams and departments. "We stopped doing retrospectives altogether," mentioned a friend. "We've dealt with everything we can do locally; everything else is just a black hole. We've stopped raising any issues."
(Fun tidbit. I met a senior technology leader once who basically admitted to wanting “a not-too-serious outage” so that they could prioritize the work required to avoid “the big one.” They couldn’t imagine another scenario where the work would be prioritized.)
Mythology and Heroics
In software product development, we mythologize independence and small teams. "Give an engineer a problem to solve and get out of the way!" "Autonomous and empowered teams!" "A small product team can move mountains!" "Be the shit umbrella and protect your team so they can get actual work done!"
These ideas permeate many aspects of idealized approaches to work. I mentioned in a prior post how compelling mission command is to the average product maker. Dare I say, the work of Marty Cagan and others is firmly planted in this idealized view, based on the premise that courageous and heroic acts of leadership at all levels can somehow hold it all together. It's "crazy hard," but surmountable if we just lead hard enough and learn from the best.
While simultaneously being "all about people," we are surprisingly ignorant of human factors, ergonomics, the humanities, sociology, and all the other helpful human-centric frames.
Stage of Evolution
Software product-making is a relatively new undertaking in the grand scheme of things, and the landscape is evolving quickly. Modern manufacturing takes cues from things we've learned over the last 100 years. Meanwhile, I'm only now seeing "new" practices that emerged in B2B SaaS in 2010 hitting larger enterprise product companies.
Discussion…
I have a theory. The idealized view is simultaneously:
Very effective when the stars align—a "perfect execution of an ideal system."
Surprisingly fragile when things "go wrong" (you lose legitimacy, accumulate chronic issues, the business landscape changes, etc.)
When #2 happens, our only answer is to reach for the mental models we've evolved for #1—and those models and heuristics are not fit for purpose. Similarly, the mental models we've evolved from manufacturing may carry the right attitude (we want to capture that drive for continuous improvement), but they aren't immediately applicable in the software context.
There are plenty of large enterprises with incredible experience in manufacturing that fail to translate this attitude effectively to their software divisions. There are also plenty of large enterprises that try to solve the software problem with more processes, consistency, and "rigor" and also fail to improve.
What we haven't quite figured out in software development is how to respond when the ideal slips into a period of incoherence and lack of visibility. The best we can muster is "better people."
I hit my timebox for the day, but to recap:
Mission command is an idealized structure.
Mission command works when there is legitimacy and coherence.
Mission command is more fragile than we think. When legitimacy slips and coherence erodes, it breaks down.
We don't have a vocabulary for addressing the incoherence other than "fix the people" or "fix the process."
Boundary of Safely Challenged Assumptions
Which leaves me with a final thought...
What if we designed our software development organizations on the premise that 1) shit happens, change happens, things change, and therefore 2) the maximum size of a team of teams should be the largest group of people that can actually challenge its own assumptions under challenging conditions (not ideal conditions)?
Imagine a group of 150 people with a hierarchy of GM, VP, Director, Manager, and front-line contributor. While that structure might work in perfect times, it will be incredibly fragile if conditions change (especially if it also has dependencies outside the group). There is likely some sweet spot—think max 30 or 50 people, a team of teams—that can remain relatively (but not dogmatically) flat while also having a significant impact. It is possible to know everyone in a group this size, and leaders should be sufficiently aware of the details. You can have a P&L and establish clear interfaces with other groups. Information travels, and it is difficult for bureaucracy and waste to be swept under the carpet.
Think empowered groups.
One of the central problems is that we create idealized structures that only work under ideal conditions (like ZIRP, etc.), and then when things go wrong, we fall back on our go-to culprits.
Perhaps one answer is to plan for less ideal conditions and ensure that our organizational structure boundaries support the systematic introspection and adaptation required to thrive.
I’ll probably write a Part 3, or at least something thematically related, in the near future. Thank you for taking these thought journeys with me.