Reasoning boxes

This story begins with the introduction of metacognition to large language models (LLMs). In the LLM days of yore (like a few months ago), we just saw them as things we could ask questions and get answers back. It was exciting. People wrote think pieces about the future of AI and all that jazz.

But then a few extra-curious folks (this is the paper that opened my eyes) realized that you could do something slightly different: instead of asking for an answer, we could ask for the reasoning that might lead to the answer.

Instead of “where do I buy comfortable shoes my size?”, we could inquire: “hey, I am going to give you a question, but don’t answer it. Instead, tell me how you would reason about arriving at the answer. Oh, and give me the list of steps that would lead to finding the answer. Here’s the question: where do I buy comfortable shoes my size?”

Do you sense the shift? It’s like an instant leveling up, the reshaping of the landscape. Instead of remaining hidden in the nethers of the model, the reasoning about the question is now out in the open. We can look at this reasoning and do what we would do with any reasoning that’s legible to us: examine it for inconsistencies and decide for ourselves if this reasoning and the steps supplied will indeed lead us toward the answer. Such legibility of reasoning is a powerful thing.

With reasoning becoming observable, we iterate to constrain and shape it. We could tell the LLM to only use specific actions of our choice as steps in the reasoning. We could also specify particular means of reasoning to use, like taking multiple perspectives or providing a collection of lenses to rely on.

To kick it up another notch, we could ask an LLM to reason about its own reasoning. We could ask it “Alright, you came up with these steps to answer this question. What do you think? Will these work? What’s missing?” As long as we ask it to provide the reasoning back, we are still in the metacognitive territory.

We could also give it the outcomes of some of the actions it suggested as part of the original reasoning and ask it to reason about these outcomes. We could specify that we tried one of the steps and it didn’t work. Or maybe that it worked, but made it impossible for us to go to the next step – and ask it to reason about that.

From the question-answering box, we’ve upleveled to the reasoning box.

All reasoning boxes I’ve noticed appear to have this common structure. A reasoning box has three inputs: context, problem, and framing. The output is the actual reasoning. 

The context is the important information that we believe the box needs to have to reason. It could be the list of the tools we would like it to use for reasoning, the log of prior attempts at reasoning (aka memory), information produced by these previous attempts at reasoning, or any other significant stuff that helps the reasoning process.

The problem is the actual question or statement that we would like our box to reason about. It could be something like the shoe-shopper question above, or anything else we would want to reason about, from code to philosophical dilemmas.

The final input is the framing. The reasoning box needs rails on which to reason, and the framing provides these rails. This is currently the domain of prompt engineering, where we discern resonant cues in the massive epistemological tangle that is an LLM, cues that give the reasoning box the perspective we’re looking for. It usually goes like “You are a friendly bot that …” or “Your task is to…”. Framing is sort of like a mind-seed for the reasoning box, defining the kind of reasoning output it will provide.

Given that most of the time we would want to examine the reasoning in some organized way, the framing usually also constrains the output to be easily parsed, be it a simple list, CSV, or JSON.
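Here is a rough sketch of that structure in code. Everything in it is made up for illustration: makeReasoningBox, fakeModel, and the prompt wording are hypothetical, and the model is just a function from a prompt string to a text reply, so no real LLM API is assumed.

```javascript
// A minimal sketch of a reasoning box. All names here are hypothetical.
// `model` stands in for any function that takes a prompt string and returns
// the model's text reply; no real LLM API is assumed.
const makeReasoningBox = (model) => ({ context, problem, framing }) => {
  const prompt = [
    framing,
    "Context:",
    ...context.map((item) => `- ${item}`),
    `Problem: ${problem}`,
    "Do not answer the problem. Reply with a JSON array of reasoning steps.",
  ].join("\n");
  // The framing asked for JSON, so the reasoning can be parsed and examined.
  return JSON.parse(model(prompt));
};

// A canned stand-in for the model, so the sketch runs without a network call.
const fakeModel = (prompt) =>
  JSON.stringify([
    "Measure your feet",
    "Search for stores that carry your size",
  ]);

const reasoningBox = makeReasoningBox(fakeModel);
const steps = reasoningBox({
  framing: "You are a careful planner.",
  context: ["Available tools: web search"],
  problem: "Where do I buy comfortable shoes my size?",
});
console.log(steps);
```

The three inputs stay separate until the very last moment, which makes it easy to swap the framing or grow the context without touching the problem.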

A reasoning box is certainly a neat device. But by itself, it’s just a fun little project. What makes reasoning boxes useful is connecting them to ground truth. Once we connect a reasoning box to a ground truth, we get the real sparkles. Ground truth gives us a way to build a feedback loop.

What is this ground truth? Well, it’s anything that can inform the reasoning box about the outcomes of its reasoning. For example, in our shoe example, a ground truth could be us informing the box of the successes or failures of actions the reasoning box supplied as part of its reasoning.

If we look at it as a device, a ground truth takes one input and produces one output. The input is the reasoning and the output is the outcomes of applying this reasoning. I am very careful not to call ground truth “the ground truth”, because what truths are significant may vary depending on the kinds of reasoning we seek.

For example, and as I implied earlier, a reasoning box itself is a perfectly acceptable ground truthing device. In other words, we could connect two reasoning boxes together, feeding one’s output into another’s context – and see what happens. That’s the basics of the structure behind AutoGPT.

Connecting a reasoning box to a real-life ground truth is what most AI Agents are. They are reasoning boxes whose reasoning is used by a ground truthing device to take actions, like searching the web or querying data sources – and then feeding the outcomes of these actions back into the reasoning boxes. The ground truth connection is what gives reasoning boxes agency.
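To make the loop concrete, here is a toy sketch of a reasoning box wired to a ground truthing device. Both are hypothetical stand-ins that I made up for illustration (proposeNextStep picks from a fixed list instead of calling an LLM, and checkAgainstReality pretends that only one step works), so treat it as a picture of the wiring rather than a working agent.

```javascript
// Hypothetical reasoning box: proposes the next step, given the log of
// prior outcomes in its context. A real one would call an LLM.
const proposeNextStep = ({ context, problem }) => {
  const tried = context.map((entry) => entry.step);
  const candidates = ["search the web", "ask a friend", "visit the mall"];
  return candidates.find((step) => !tried.includes(step)) ?? null;
};

// Hypothetical ground truthing device: reasoning in, outcomes out.
// Here we simply pretend that only one of the steps works.
const checkAgainstReality = (step) => ({
  step,
  worked: step === "visit the mall",
});

const runLoop = (problem, maxTurns = 5) => {
  const context = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = proposeNextStep({ context, problem });
    if (step === null) break; // the box ran out of ideas
    const outcome = checkAgainstReality(step);
    context.push(outcome); // feed the outcome back into the context
    if (outcome.worked) break;
  }
  return context;
};

const log = runLoop("Where do I buy comfortable shoes my size?");
console.log(log);
```

The only thing that changes from turn to turn is the context: each outcome lands there, and the next round of reasoning gets to see it.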

And I wonder if there’s more to this story?

My intuition is that the reasoning box and a ground truthing device are the two kinds of blocks we need to build what I call “socratic machines”: networks of reasoning boxes and ground truthing devices that are capable of independently producing self-consistent reasoning. That is, we can now build machines that can observe things around them, hypothesize, and despite all of the hallucinations that they may occasionally incur, arrive at well-reasoned conclusions about them.

The quality of these conclusions will depend very much on the type of ground truthing these machines have and the kind of framing they are equipped with. My guess is that socratic machines might even be able to detect ground truthing inconsistencies by reasoning about them, kind of like how our own minds are able to create the illusion of clear vision despite only receiving a bunch of semi-random blobs that our visual organs supply. And similarly, they might be able to discern, repair and enrich insufficient framings, similar to how our minds undergo vertical development.

This all sounds outlandish even to me, and I can already spot some asymptotes that this whole mess may bump into. However, it is already pretty clear that we are moving past the age of chatbots and into the age of reasoning boxes. Who knows, maybe the age of socratic machines is next to come? 

Porcelains

My friend Dion asked me to write this down. It’s a neat little pattern that I just recently uncovered, and it’s been delighting me for the last couple of days. I named it “porcelains”, partially as an homage to spiritually similar git porcelains, partially because I just love the darned word. Porcelains! ✨ So sparkly.

The pattern goes like this. When we build our own cool thing on top of an existing developer surface, we nearly always do the wrapping thing: we take the layer that we’re building on top and wrap our code around it. In doing so, we immediately create another, higher layer. Now, the consumers of our thing are one layer up from the layer from which we started. This wrapping move is very intuitive and something that I used to do without thinking.

  // my API which wraps over the underlying layer.
  const callMyCoolService = async (payload) => {
    const myCoolServiceUrl = "https://example.com/mycoolservice";
    // the underlying layer that I wrap: `fetch`
    const response = await fetch(myCoolServiceUrl, {
      method: "POST",
      body: JSON.stringify(payload),
    });
    return await response.json();
  };
  // ...
  // at the consuming call site:
  const result = await callMyCoolService({ foo: "bar" });
  console.log(result);

However, as a result of creating this layer, I now become responsible for a bunch of things. First, I need to ensure that the layer doesn’t have too much opinion and doesn’t accrue its cost for developers. Second, I need to ensure that the layer doesn’t have gaps. Third, I need to carefully navigate the cheesecake or baklava tension and be cognizant of the layer thickness. All of a sudden, I am burdened with all of the concerns of the layer maintainer.

It’s alright if that’s what I am setting out to do. But if I just want to add some utility to an existing layer, this feels like way too much. How might we lower this burden?

This is where porcelains come in. The porcelain pattern refers to only adding code to supplement the lower layer functionality, rather than wrapping it in a new layer. It’s kind of like – instead of adding new plumbing, put a purpose-designed porcelain fixture next to it.

Consider the code snippet above. The fetch API is a pretty comprehensive and – let’s admit it – elegantly designed API. It comes with all kinds of bells and whistles, from signaling to streaming support. So why wrap it?

What if instead, we write our code like this:

  // my API which only supplies a well-formatted Request.
  const myCoolServiceRequest = (payload) =>
    new Request("https://example.com/mycoolservice", {
      method: "POST",
      body: JSON.stringify(payload),
    });
  // ...
  // at the consuming call site:
  const result = await (
    await fetch(myCoolServiceRequest({ foo: "bar" }))
  ).json();
  console.log(result);

Sure, the call site is a bit more verbose, but check this out: we are now very clear what underlying API is being used and how. There is no doubt that fetch is being used. And our linter will tell us if we’re using it improperly.

We also have more flexibility in how the results of the API could be consumed. For example, if I don’t actually want to parse the response of the API (like, if I just want to turn around and send it along to another endpoint), I don’t have to parse it and then serialize it again.

Instead of adding a new layer of plumbing, we just installed a porcelain that makes it more shiny for a particular use case.

Because they don’t call into the lower layer, porcelains are a lot more testable. The snippet above is very easy to interrogate for validity, without having to mock/fake the server endpoint. And we know that fetch will do its job well (we’re all in big trouble otherwise).
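To show what I mean, here is a small sketch of such an interrogation. It assumes a runtime where Request is a global (a browser, or Node 18 and later), constructs the Request with new, and uses an absolute URL; the checks themselves are just examples of what one might want to verify.

```javascript
// The porcelain under test: it only builds a Request, it never calls fetch.
const myCoolServiceRequest = (payload) =>
  new Request("https://example.com/mycoolservice", {
    method: "POST",
    body: JSON.stringify(payload),
  });

// Interrogate the result directly. No server, no mocks, no fakes.
const request = myCoolServiceRequest({ foo: "bar" });
console.log(request.method); // "POST"
console.log(request.url); // "https://example.com/mycoolservice"

// The body can be read back from the Request as well.
request.json().then((body) => console.log(body.foo)); // "bar"
```

Nothing here touches the network, so checks like these can run anywhere the Request global exists.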

There’s also a really fun mix-and-match quality to porcelain. For instance, if I want to add support for streaming responses to my service, I don’t need to create a separate endpoint or have tortured optional arguments. I just roll out a different porcelain:

  // Same porcelain as above.
  const myCoolServiceRequest = (payload) =>
    new Request("https://example.com/mycoolservice", {
      method: "POST",
      body: JSON.stringify(payload),
    });
  // New streaming porcelain.
  class MyServiceStreamer {
    writable;
    readable;
    // TODO: Implement this porcelain.
  }
  // ...
  // at the consuming call site:
  const response = await fetch(
    myCoolServiceRequest({ foo: "bar", streaming: true })
  );
  // `body` lives on the Response, so we await `fetch` before piping.
  const result = response.body.pipeThrough(new MyServiceStreamer());

  for await (const chunk of result) {
    process.stdout.write(chunk);
  }
  process.stdout.write("\n");

I am using all of the standard Fetch API plumbing – except with my shiny porcelains, they are now specialized to my needs.
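For the curious, here is one way such a streaming porcelain might be filled in. This is purely a guess at the shape: I am assuming that the service streams UTF-8 text with one JSON object per line and that each object has a token field. Both are invented for this sketch, as are makeSource and collectTokens, which stand in for a real response body and a real consumer. It relies on the standard TransformStream, TextEncoder, and TextDecoder globals (browsers, or Node 18 and later).

```javascript
// Hypothetical streaming porcelain. Assumes (my invention) that the service
// streams UTF-8 text with one JSON object per line.
class MyServiceStreamer {
  writable;
  readable;

  constructor() {
    const decoder = new TextDecoder();
    let buffered = "";
    const transform = new TransformStream({
      transform(chunk, controller) {
        // Bytes may split a line anywhere, so keep the unfinished tail.
        buffered += decoder.decode(chunk, { stream: true });
        const lines = buffered.split("\n");
        buffered = lines.pop();
        for (const line of lines) {
          if (line.trim()) controller.enqueue(JSON.parse(line));
        }
      },
      flush(controller) {
        if (buffered.trim()) controller.enqueue(JSON.parse(buffered));
      },
    });
    // pipeThrough only needs an object with these two properties.
    this.writable = transform.writable;
    this.readable = transform.readable;
  }
}

// A stand-in for `response.body`, so the sketch runs without a server.
const makeSource = () => {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      controller.enqueue(encoder.encode('{"token":"Hel'));
      controller.enqueue(encoder.encode('lo"}\n{"token":"world"}\n'));
      controller.close();
    },
  });
};

// Reads the parsed objects back out, one at a time.
const collectTokens = async (source) => {
  const tokens = [];
  const reader = source.pipeThrough(new MyServiceStreamer()).getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    tokens.push(value.token);
  }
  return tokens;
};

collectTokens(makeSource()).then((tokens) => console.log(tokens));
```

The porcelain knows nothing about fetch. It only knows how to turn a stream of bytes into a stream of parsed objects, which is why it plugs into pipeThrough.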

The biggest con of the porcelain pattern is that the plumbing is now exposed: all the bits that we typically tuck so neatly under succinct and elegant API call signatures are kind of hanging out.

This might put some API designers off. I completely understand. I’ve been of the same persuasion for a while. It’s just that I’ve seen the users of my simple APIs spend a bunch of time prying those beautiful covers and tiles open just to get to do something I didn’t expect them to do. So maybe exposed plumbing is a feature, not a bug?

Innovation frontier

So we decided to innovate. Great! Where do we begin? How do we structure our innovation portfolio? There are so many possibilities! AI is definitely hot right now. But so are advances in green technology – maybe that’s our ticket? I heard there’s stuff happening with biotech, too. And I bet there are some face-melting breakthroughs in metallurgy…

With so much happening everywhere all at once, it could be challenging to orient ourselves and innovate intentionally – or at least with enough intention to convince ourselves that we’re not placing random bets. A better question: what spaces do we not invest in when innovating?

Here’s a super-simple framing that I’ve found useful in choosing the space to innovate. It looks like a three-step process.

First, we need to know what our embodied strategy is. We need to understand what our capabilities are and where they will be taking us by default.

This is important, because some innovation may just happen as a result of us letting our embodied strategy play out. If we are an organization whose embodied strategy is strongly oriented toward writing efficient C++ code, then we are very likely to keep seeing amazing bits of innovation pop out in that particular space. We will likely lead some neat C++ standards initiatives and invent new cool ways to squeeze a few more drops of performance out of the code we write.

As I mentioned before, embodied strategy is usually not the same as stated strategies. I know very few teams who are brutally honest with themselves about what they are about. There’s usually plenty of daylight between what the organization states about where they’re going and where they are actually going. The challenge of step 1 is to pierce the veil of the stated strategy.

As you may remember from my previous essays, this understanding will also include knowing our strategy aperture. How broad is our organization’s cone of embodied strategy?

At the end of the first step, we already have some insight on the question above. Spaces well outside of our cone of embodied strategy are not reachable for us. They are the first to put into the discards pile. If we are an organization whose strengths are firmly in software engineering, attempting to innovate in hardware is mostly like throwing money away – unless of course we first grow our hardware engineering competency.

The second step is to understand our innovation frontier. The innovation frontier is a thin layer around our cone of embodied strategy. Innovation ideas at the outer edge of this frontier are the ones we’ve just discarded as unreachable. Ideas at the inner edge of the frontier are obviously going to happen anyway: they are part of the team’s embodied strategy.

It is the ideas within this frontier that are worth paying closer attention to. They are the “likely-to-miss” opportunities. Because they are still on the fringe of the embodied strategy, the organization is capable of realizing them, but is unlikely to do so – they are on the fringe, after all.

It is these opportunities that are likely going to sting a lot for a team when missed. They are the ones that were clearly within reach, but were ignored because of the pressing fires and general everyday minutiae of running core business. They are the ones that will disrupt the business as usual, because – when they are big enough – they will definitely reshape the future opportunities for the organization.

The innovation frontier is likely razor-thin for well-optimized and specialized organizations. The more narrow our strategy aperture, the less likely we will be to shift a bit to explore curious objects just outside of our main field of view.

In such cases, the best thing the leader of such an organization can do is to invest seriously into expanding their innovation frontier. Intentionally create spaces where thinking can happen at a slower pace, where wilder ideas can be prototyped and shared in a more dandelion environment. Be intentional about keeping the scope roughly within the innovation frontier, but add some fuzziness and slack to where these boundaries are.

The third step is to rearrange the old 70/20/10 formula and balance our innovation portfolio according to what we’ve learned above:

  • Put 70% into the ideas within the innovation frontier and the efforts to expand our innovation frontier.
  • Put 20% into the ideas that are within the strategy aperture.
  • Just in case we’re wrong about our understanding of our embodied strategy, put 10% into the ideas that are at the outer edge of the innovation frontier.
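As a quick bit of arithmetic, here is what the split above looks like for a concrete budget. The helper and its field names are made up for illustration, and the budget of 1000 is an arbitrary number in whatever unit we happen to count our investment.

```javascript
// Hypothetical helper: splits a budget using the rearranged 70/20/10 formula.
const allocateInnovationPortfolio = (budget) => ({
  // Ideas within the innovation frontier, plus efforts to expand it.
  innovationFrontier: (budget * 70) / 100,
  // Ideas that are within the strategy aperture.
  withinStrategyAperture: (budget * 20) / 100,
  // The hedge: ideas at the outer edge of the innovation frontier.
  outerEdgeOfFrontier: (budget * 10) / 100,
});

const portfolio = allocateInnovationPortfolio(1000);
console.log(portfolio);
// { innovationFrontier: 700, withinStrategyAperture: 200, outerEdgeOfFrontier: 100 }
```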

And who knows, my law of tightening strategy aperture could be proven wrong? Perhaps if an organization is intentional enough about expanding its innovation frontier, it could regain its ability to see and realize the opportunities that would have been previously unattainable?

Wait, did we forgo the whole notion of timelines in our innovation portfolio calculations? It’s still there, since the cone of embodied strategy does extend in time. It’s just not as significant as it was in the old formula. Why? That’s a whole different story and luckily, my friend Alex wrote this story down just a few days ago.

Learning from the nadir

I’ve talked before about traps: how to get into them, what they might look like, and even how to get out of them. This little story is about the developmental potential of traps.

To frame the story, I will draw another two-by-two. The axes signify our awareness of our limitations and our capacity to overcome them. To make things a bit more interesting, I will also turn this two-by-two 45 degrees clockwise, because I want to map it to another framing: the hero’s journey.

The axes form four quadrants that loosely correspond to the key segments of the hero’s journey. 

In the top quadrant, we are the happiest and perhaps even bored. We aren’t aware of any of the limitations that hinder us and we feel generally content with our abilities.

It is that boredom that gets us in trouble. At first reluctantly, but eventually with more gusto, we engage with a challenge and traverse into the right quadrant. This quadrant is characterized by weirdness. Campbell points out all kinds of odd stuff happening to us, from being visited by a quirky wizard to being tested in ways that make us unsure of ourselves.

In the right quadrant, we aren’t yet aware that the challenge exceeds our capacity to overcome it, so things feel bizarre, random, and generally not right. We might start a new job with enthusiasm, and after a few meetings, have our heads spinning, encountering unexpected politics and/or gargantuan technical debt: “What did I just get myself into?”

Eventually, as we puzzle things out, we arrive at the nadir of our journey, the bottom quadrant. We become aware of the fact that we’re in way over our heads. We are aware of our limitations and do not yet have the capacity to overcome them.

The bottom quadrant often feels like a trap. My colleagues and I sometimes apply the word “infohazard” to the insightful bits of knowledge that finally clear our vision and thrust us into this quadrant. It almost feels like it might have been better if we didn’t acquire that knowledge. Yeah, the previous quadrant was super-weird, but at least I didn’t feel so deficient in the face of the challenge.

This quadrant is also the most fertile ground for our inner development. When we have the right mindset, the awareness of our limitations creates a useful observation perch. Only when we are able to see our own limitations can we contemplate changing ourselves to overcome them.

This is not a given. Way too commonly and tragically, we never get to occupy this perch. Falling into the vicious cycle of not-learning, we form an inner false loop inside of our hero’s journey, spinning round and round inside of the bottom quadrant, and truly becoming trapped.

Whether we grasp onto the perch or not, one thing is guaranteed. The bottom quadrant is full of suffering. Even when we believe we’ve learned all there is to learn about self-development, and have all kinds of tools and tricks (and perhaps even write about it regularly) – the moment of discordance between what we’re seeing and what we believe will be inevitably painful.

It is on us to recognize that this pain can be processed in two ways: one is through the habitual entrapment of the not-learning cycle and the other one is by choosing to learn like crazy. That means hanging on for dear life to the perch of observation, examining our beliefs, and recognizing flexibility in bits that we previously thought immovable.

Only then can we emerge into the left quadrant, where we are still aware of our limitations, but now have the capacity to overcome the challenge – and bring the boon back to the land of the living, as Campbell would probably say.

How we engage with LLMs

It seems popular to write about generative AI and large language models (aka LLMs) these days. There are a variety of ways in which people make sense out of this space and the whole phenomenon of “artificial intelligence” – I use double-quotes here, because the term has gotten quite blurry semantically.

I’ve been looking for a way to make sense of all of these bubbling insights, and here’s a sketch of a framework that is based on the Adult Development Theory (ADT). The framework presumes that we engage with LLMs from different parts of our whole Selves, with some parts being at earlier stages of development and some parts at the later. I call these parts “Minds”, since to us, they feel like our own minds, each with its own level of complexity and attributes. They change rapidly within us, often without us noticing.

These minds are loosely based on the ADT stages: the earliest and least complex Opportunist Mind, the glue-of-society Socialized Mind, the make-things-work Expert Mind, and the introspective Achiever Mind.

🥇The Opportunist Mind

When we engage with an LLM with an Opportunist Mind, we are mostly interested in poking at it and figuring out where its weaknesses and strengths lie. We are trying to trick it, to reveal its secrets, be that initial prompts or biases. From this stance, we just want to figure out what it’s made of and how we could potentially exploit it. Twitter is abuzz with individuals making LLMs act in ways that are beneficial to illustrating their arguments. All of those are symptoms of the Opportunist Mind approach to this particular technology.

There’s nothing wrong with engaging an LLM in this way. After all, vigorous product testing makes for a better product. Just beware that an Opportunist Mind perch has a very limited view, and the quality of insights gained from it is generally low. I typically steer clear of expert analyses engaging with LLMs from this mind. Those might as well be generated by LLMs themselves.

👥The Socialized Mind

When the LLM becomes our DM buddy or a game playing partner, we are engaging with an LLM with a Socialized Mind. When I do that, there’s often a threshold moment when I start seeing an LLM as another human being, with thoughts and wishes. I find myself falling into habits of human relationship-building, with all of the rules and ceremonies of socializing. If you ever find yourself trying to “be nice” to an LLM chat bot, it’s probably your Socialized Mind talking.

At the core of this stance is — consciously or subconsciously — constructing a mental model of an LLM as that of a person. This kind of mental model is not unique to the Socialized Mind, but when engaging with this mind, we want to relate to this perception of a human, to build a connection with it.

This can be wonderful when held lightly. Pouring our hearts out to a good listener convincingly played by an LLM can be rather satisfying. However, if we forget that our mental model is an illusion, we get into all sorts of trouble. Nowadays, LLMs are pretty good at pretending to be human, and the illusion of a human-like individual behind the words can be hard to shake off. And so we become vulnerable to the traps of “is it conscious/alive or not?” conversations. Any press publication or expert analysis in this vein is only mildly interesting to me, since the perch of the Socialized Mind is not much higher than that of the Opportunist Mind, and precludes seeing the larger picture.

🧰The Expert Mind

Our Expert Mind engages with an LLM at a utilitarian level. What can I get out of this thing? Can I figure out how the gears click on the inside — and then make it do my bidding? A very common signal of us engaging LLMs with our Expert Mind is asking for JSON output. When that’s the case, it is very likely we see the LLM as a cog in some larger machine of making. We spend a lot of time making the cog behave just right – and are upset when it doesn’t. A delightful example that I recently stumbled into is the AI Functions: a way to make an LLM pretend to execute a pretend function (specified only as input/output and a rough description of what it should do) and return its result.

Expert Minds are tinkerers – they produce actual prototypes of things other people can try and get inspired to do more tinkering. For this reason, I see Expert Mind engagements as the fertile ground for dandelion-like exploration of new idea spaces. Because they produce artifacts, I am very interested in observing Expert Mind engagements. These usually come as links to tiny Github repos and tweets of screen captures. They are the probes that map out the yet-unseen and shifting landscape, serving as data for broader insights.

📝The Achiever Mind

I wanted to finish my little story here, but there’s something very interesting in what looks like a potential Achiever Mind engagement. This kind of engagement includes the tinkering spirit of the Expert Mind and enriches it with the mental modeling of the Socialized Mind, transcending both into something more.

When we approach LLMs with the Achiever Mind, we recognize that the nature of this weird epistemological tangle created by an LLM creates opportunities that we can’t even properly frame yet. We can get even more interesting outcomes than the direct instruction-to-JSON approach of our Expert Mind engagement by considering this tangle and poking at it.

The ReAct paper shone a light on this kind of engagement for me. It revealed that, in addition to direct “do this, do that” requests, LLMs are capable of something that looks like metacognition: the ability to analyze the request and come up with a list of steps to satisfy the request. This discovery took someone looking at the same thing that everyone was looking at, and then carefully reframing what they were seeing into something entirely different.

Reframing is the Achiever Mind’s superpower, and it comes in handy in wild new spaces like LLM applications. Metaphorically, if Expert Mind engagements explore the room in the house, Achiever Mind engagements find and unlock doors to new rooms. The room unlocked by the ReAct paper allowed a whole bunch of useful artifacts, from LangChain to Fixie to ChatGPT plugins, to emerge.

This story feels a bit incomplete, but has been useful for me to write down. I needed a way to clarify why I intuitively gravitate toward some bits of insight in the wild more than others. This framework helped me see that. I hope it does the same for you.

Sailors and Pirates

Here’s a fun metaphor for you. I’ve been chatting with colleagues about the behavior patterns and habits of leaders that I’ve been observing, and we recognized that there are two loose groups that we can see: sailors and pirates.

The sailors are part of the crew. They are following orders and making things that have been deemed important happen. Ordinary sailors have little agency: they are part of the larger machine that is intent on moving in a certain direction. Sailors higher in the power structure (and there is usually a power structure when sailors get together) have more agency. They have more freedom in how the things happen, but they are still held responsible for whether they happen or not.

Organization leaders who are sailors are subject to the primary anxiety of things being out of control. Their catastrophic scenario is that all this wonderful energy that they have in the people they lead is not applied effectively to the problem at hand. They wake up in cold sweat after dreaming of being lost or late, of being disoriented and bewildered in some chaotic mess. 

This makes them fairly easy to spot. Listen to how they talk. They will nearly always speak of the need to align, to make better decisions, to be more efficient and better coordinated. Sailor leaders love organizing things. For a sailor leader, neat is good.

Every organization needs sailors. Particularly in scenarios where we know where we are going, sailors are the ones who will get us there. They are the reliable folks who feel pride and honor to drive their particular ship (or part of the ship, no matter how small) toward the destination. Sailor leaders don’t have to be boring, but they prefer it that way. Excitement is best confined to the box where it doesn’t disrupt the forward movement.

Pirates are different. The word “pirate” conjures all kinds of imagery, some vividly negative. For our purposes, let’s take Jack Sparrow as the kind of pirate we’re talking about here.

As I mentioned, pirates are different. They loathe the orderly environment that the sailors thrive in. They yearn for a small ship that can move fast and make unexpected lateral moves.

A pirate’s driving anxiety is that of confinement. Whether consciously or not, their catastrophizing always involves being stuck. Their nightmares are filled with visions of being trapped or restrained, with no possibility of escape, of being pressed down by immovable weight.

Pirates seek options and choose to play in environments where the options are many. This is why we often find them in chaotic environments, though chaos is not something they may seek directly. It’s just that when there’s chaos, many of the variables that were previously thought to be constant become changeable. It’s that space that is opened up by the chaos-induced shifts that the pirates thrive in. And sometimes, often unwittingly, they will keep causing a little chaos – or a lot of it – to create that option space.

Pirate leaders are also not difficult to detect. They are usually the weird ones. They keep resisting the organization’s desire to be organized. They usually shun positions of power and upward movement in the hierarchies. For the saddest pirate is the one who climbed through the ranks to arrive at a highly prestigious, yet extremely sailor position.

Pirate leaders are known to inject chaos. If you’ve ever been to a meticulously planned and organized meeting, where its key participant throws the script away right at the beginning and takes it in a completely different direction – you’ve met a pirate leader.

It’s easy to see how sailors and pirates are oil and water. Sailors despise the pirate’s incessant bucking of the system. Pirates hate the rigid order of the sailors and their desire to reduce the available options. 

Then, why are pirates even found in organizations? Aren’t they better off in their Flying Dutchman somewhere, doing their pirate things?

The thing is, pirates need sailors. A shipful of pirates is not really a ship. With everyone seeking options, the thing ain’t going anywhere. Pirates need sailors who are happy to organize the boring details of the pirate adventure. And the more ambitious the adventure, the more sailors are needed.

Conversely, sailors need pirates. A ship that doesn’t have a single pirate isn’t a ship either – it’s an island. The most organized and neat state of a ship is static equilibrium. When a pirate captain leaves a ship, and no pirate steps up, the ship may look functional for a while, and even look nicer, with all of the cannons shining with bright polish and the sails finally washed and repaired.

But over time, it will become apparent that the reason for all these excellent looks is the lack of actual action. The safest, neatest course of action is to stay in place and preserve the glorious legends of the past.

The mutual disdain, combined with the mutual need, creates a powerful tension. Every team and organization has it. The tension can only be resolved dynamically – what could have been the right proportion of pirates and sailors yesterday might not be the same today. Sometimes we could use fewer pirates, and other times, we need more of them.

To resolve this tension well, organizations need this interesting flexibility, where pirates and sailors aren’t identities, but roles. Especially in leadership, the ability to play both roles well is a valuable skill. Being able to assume the role flexibly depending on the situation gives us the capacity to be both pirates and sailors – and gives the organization a much higher chance of acting in accordance with its intentions.

The most effective pirate is a meta-pirate: someone who can be both a pirate and a sailor in the moment as a way to keep the opportunity space maximally open.

We all have this capacity. The reason I described the nightmare plots for the sailor and the pirate is to help you recognize them in your own dreams. Experienced both kinds? You are likely both a little bit of a pirate and a sailor at heart. If one is more common than the other, that’s probably the indicator of where you are leaning currently. So, if you’re looking to become a meta-pirate, that’s an indicator of where to focus the work of detaching the role from your identity. 

Nudges, boosts, bumpers, and tilts

The quartet is finally complete. I’ve written about nudges separately, and already had bumpers, boosts, and tilts grouped. Now they are all united into one simple framework that helps me better understand the environment in which I am operating and reason about imparting change on this environment.

To better understand an environment, we evaluate two of its properties: degree of inter-part alignment and degree of inter-part friction.

The inter-part alignment tells us how much the parts are aligned with each other. It is not a static property, but rather an evaluation of how much the various moving parts of the environment choose to act in concert. Since I mostly work with teams, these parts are typically people. The inter-part alignment is our guess at how much the people within the team are aligned along similar goals and objectives.

The inter-part friction tells us how much attachment the parts have to each other. If they are strongly attached, there’s a lot of friction between them. If they aren’t, there’s nearly none. Highly interdependent environments are usually the high-friction ones. In organizations, this friction is experienced as the degree to which people feel constrained by other people in their actions.

Using these two properties as axes, we can draw a nice – what else? – two-by-two. For each quadrant in this space, there’s a technique that, when applied, is most likely to result in the desired outcome. Let’s walk around the quadrants, clockwise, starting from top-right.

One thing to remember: teams are almost never easily assessed as exemplars of one particular environment. They flex and shift, changing from one kind to another. Environments are also scenario-specific. The same team might look like one environment for a particular problem, and a completely different environment for another problem.

🚗 High alignment, high friction

In the top-right quadrant of the two-by-two reside the high-alignment, high-friction environments. A good metaphor for this environment is a car that’s stuck in a ditch. It wants to go, but does not have the power to overcome the friction of the ditch.

For an organization, this might be the scenario where everyone agrees on the importance of the problem and there’s agreement on the solution, but the mustering of the actual resources to do the work is an issue.

This is where a boost is the most effective technique. Everyone lends their shoulder and leans in to get the car out of the ditch. Boosts are fairly straightforward, since they are a prioritization exercise. Just decide what is most important and go for it.

🌊 High alignment, low friction

In the bottom-right quadrant, we find environments that have high alignment and low inter-part friction. These will feel like water: parts move freely, but also simply seek static equilibrium. These are the “just tell me what to do” scenarios.

A good example of this scenario is an organization that needs direction on some specific issue that is widely recognized as existentially critical, but does not have strong opinions on the solution.

In my experience, these situations are rare and I usually see them in the realm of complex policy decisions that an organization needs to make, with only a handful of recognized experts who actually know how to make them.

The technique to use in high-alignment, low-friction situations is a bumper. Just draw the line not to be crossed, clearly articulate the consequences of crossing it, and firmly enforce it. Just like water is content to be in a glass, high-alignment, low-friction environments will be happy to abide by our bumpers.

🐈 Low alignment, low friction

Moving on to the bottom-left quadrant, there are environments with low alignment and low friction. These are the “herding cats” environments. There is literally nothing one can do through direct influence – no matter what we try, the energy from our actions seems to be rapidly absorbed into the whole without any discernible change. I wrote a little essay called “The Fractal Dragon” to describe this environment using a more dramatic metaphor.

In organizations, low-alignment, low-friction environments are typically full of smart, yet self-interested people. Strictly speaking, these are not teams, but rather markets. Open source projects and standards bodies can be good examples, though I’ve seen actual engineering teams that have similar structures. A good marker here is a loose relationship between funding and fitness function.

In a low-alignment, low-friction scenario, the go-to technique is tilts. Changes can only be made by carefully crafting incentives to tilt in the direction that we want and being patient with the organization eventually following the slope of the tilt. Despite the temptation to “do something”, it is important to recognize that any attempt at direct action is likely a waste of energy – or worse, an introduction of further chaos into the environment, with all of the unintended consequences that follow.

An effectively executed tilt has a nice positive side effect: it acts as an alignment function. When it’s working, the slope of the tilt also aligns the environment. Everything is roughly moving in the same direction. 

🧱 Low alignment, high friction

In the remaining top-left quadrant, we have environments that have low alignment and high friction. A good example of such an environment is an organization that is stuck in a dynamic equilibrium: parts of it (be that individuals or teams) are deadlocked, none willing to budge. A significant amount of energy is expended on all sides, yet there is no progress. This can happen for multiple reasons. One team might have strong incentives to not let go of a project. Another team might be firefighting, unable to spare any attention to unblock others. Yet another team could have strong, uncompromising opinions about the way things need to be done.

In such a situation, the appropriate technique is nudging: looking for key leverage points that are most susceptible to change. These are typically going to be found in critical juncture points that don’t have a single party to resolve them. I have done a bunch of nudging like this. Sometimes a single meeting is enough to move things along. Other times, as I described in “Nudging Embodied Strategy”, it takes setting up a chat room.

Low-alignment, high-friction environments can be rather frustrating to work with. It may take several nudges – and a lot of trial and error – to make a change that would have been effortless in any other environment. Keen observation of the overall environment and spotting the leverage points is the key here. Come equipped with a lot of patience and time.

🧭 Yet another compass

At least the way I see it, for each of the quadrants, there seems to be a fairly clear match of a technique. Mismatching technique to the quadrant tends to lead to unproductive outcomes. 
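Since I’ve been writing more code lately, here is the two-by-two as a tiny lookup. This is only a sketch: the names (Level, Technique, Environment, chooseTechnique) are made up for illustration, and real teams rarely reduce to a clean “low” or “high” on either axis.

```typescript
// Hypothetical names throughout; this merely restates the four quadrants.
type Level = "low" | "high";
type Technique = "boost" | "bumper" | "tilt" | "nudge";

interface Environment {
  alignment: Level; // how much the parts pull toward similar goals
  friction: Level; // how much the parts constrain each other
}

function chooseTechnique(env: Environment): Technique {
  if (env.alignment === "high") {
    // Car in a ditch (boost) or water in a glass (bumper).
    return env.friction === "high" ? "boost" : "bumper";
  }
  // Deadlock (nudge) or herding cats (tilt).
  return env.friction === "high" ? "nudge" : "tilt";
}

console.log(chooseTechnique({ alignment: "low", friction: "low" })); // prints "tilt"
```

The lookup is the easy part. The hard part is guessing the two inputs, and that guess is scenario-specific.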

For instance, when we find our bumper technique being ineffective, we might have confused our quadrants. In this case, we’re likely in a low-alignment, low-friction scenario where a tilt will be much more effective. This is a common error in designing processes. Folks assume that they are in the bottom-right quadrant and devise a clear policy, only to see the team find ways to dull the policy with workarounds and generally make a mockery of it. Bumpers only work when there’s high alignment.

Similarly, if we find that our boosting and prioritization is ineffective or has at best temporary effects, we might be experiencing another instance of quadrant confusion. We are likely in a low-alignment, high-friction environment. The problem with our car is not that it’s in a ditch. It’s that it isn’t sure it wants to leave it. I wrote about my personal experience with such a confusion in “Behavior over time graphs and ways to influence”.

This matching (and mismatching) works both ways. The effectiveness of techniques tells us about the nature of the environment we’re working with – and can help us better understand it. If we start with a tilt, predicting a low-alignment, low-friction environment, and see little evidence of the tilt working, we might reconsider – perhaps the friction is not as low as we thought it was? Or maybe the alignment is higher than we surmised? By guessing the quadrant and applying the technique, we get more information about the environment we’re operating in, giving us insights for future actions.
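The same reasoning can be written down as a second-guessing table. Again, a rough sketch with invented names (Hint, secondGuess, reconsider). The entries paraphrase the examples above, and the nudge entry leans on the earlier remark that nudging takes several tries.

```typescript
// Hypothetical names; the entries restate the essay's examples, nothing more.
type Technique = "boost" | "bumper" | "tilt" | "nudge";

interface Hint {
  suspect: string; // what the failure suggests about the environment
  tryNext: Technique | null; // null means: re-evaluate before acting
}

const secondGuess: Record<Technique, Hint> = {
  bumper: { suspect: "alignment is lower than assumed", tryNext: "tilt" },
  boost: { suspect: "alignment is lower than assumed", tryNext: "nudge" },
  tilt: { suspect: "friction or alignment is higher than assumed", tryNext: null },
  nudge: { suspect: "wrong leverage point", tryNext: "nudge" },
};

// Given a technique that did not work, return what to reconsider.
function reconsider(failed: Technique): Hint {
  return secondGuess[failed];
}

console.log(reconsider("bumper").tryNext); // prints "tilt"
```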

In this way, this little framework of mine might be used as a navigation tool for us inspired process engineers. Give it a try and tell me how it works for you.

Receiving negative feedback

Negative feedback is rarely fun, but can show up in varying degrees of intensity. Sometimes it might seem like a nice suggestion to improve, and sometimes it might feel like a crushing blow. It is the latter kind that I am focusing on in this essay.

It can be rather discomforting to hear someone talk poorly of us – or things we’ve built. However, not receiving negative feedback leads to self-delusion, so we’re much better off with it than without it.

Our body’s intuition will fight us on this conclusion tooth and nail, constantly trying to convert us to the notion that negative feedback is just a bad thing and something we could live without. But we really can’t – and much of the modern human’s struggle could be framed as learning how to receive negative feedback in a productive way.

Here are some framings that I’ve learned that help me get the most out of this uncomfortable experience. To make them a bit more concrete, I will situate these framings in a hypothetical scenario of a team receiving strong negative feedback on the launch of their product. However, most are applicable to a broad range of scenarios.

🌀 Their feedback, our emotions

So we have some stinging feedback from our users. The first important thing about receiving strongly negative feedback is to recognize that this is a phenomenon that involves our emotional response. This response is a process inside of us. It is highly individual and varies greatly from one person to another.


You may have seen some people being more affected by negative feedback than others. One person may describe a negative tweet as “poisonous vitriol” where the other may chuckle at the pun the tweet makes. The intensity of the negative feedback is measured by our own emotional response to it.

We are not our emotions. We have them. Too often, it seems like they have us. If we want to learn from negative feedback, we need to learn to separate our emotions from ourselves. This is why we must learn to separate our individual response to negative feedback from the feedback itself.

⛓️ Why does this happen?

Why do we have strong emotional responses to negative feedback? I really like the framing of attachment here, specifically the attachment of identity. 

When we make something, we put ourselves into it. This tiny bit of ourselves could be our engineering pride, attached to our identity of “a good software engineer”. Or it could be our sense of fulfillment, attached to our identity of “someone who does good in the world”. When people criticize our product, it will feel like they challenge these bits of identity that we embedded in our products. Behind a strong emotional response, there are nagging questions like: “Perhaps I am not as good of a software engineer as I thought I was? Am I really someone who does good in the world?” Whether we want to admit it or not, these questions were already there in our minds. The additional evidence of our failings often puts these front and center.

Being passionate about things we make is an important part of creating good products, but this passion comes with a shadow of identity attachment: when our users disagree with us about the goodness of what we’ve made, this passion will do a number on our emotions.

When we have a strong emotional response to negative feedback, we can’t process this feedback productively. Instead, we will react to our emotions. And that is often worse than not receiving feedback at all.

🧭 Orienting amidst emotional response

How do we separate ourselves from our emotional response when receiving negative feedback? This can be quite challenging. Especially when our response is strong, we often can’t even see that we’re subject to it. Our emotions become us. What we need is some sort of orienting tool that helps us step back and reflect on the soup of emotions we’re feeling.

Thankfully, Dr. Karen Horney has a great framework for this occasion: moving toward, against, or away. We can use it as a compass to identify where our emotional response is currently taking us – and hopefully, pause and reorient.

Think of it as three paths we instinctively take when reacting to our emotions in response to negative feedback. Each has a certain story that plays unconsciously in our minds as we go down this path. Each story takes us away from processing negative feedback and applying insights from it to our future actions.

Our ability to separate ourselves from our emotions lies in being able to spot the general plot lines of these stories. I will list each path with its typical story and the extremes these stories take us to, if unchecked.

💨 Away

The first path we commonly take is the “away” path. The plot line here is that of avoidance and usually goes like this: “yikes, everybody is upset, maybe I should just go back to doing my thing and never ever listen to people criticize me or my work. This is fine.” The avoidance story is likely the most common of the paths. It can feel like the easiest way to deal with negative feedback – just avoid ever hearing it. While going down that path, individual engineers and entire engineering organizations can go to great lengths to reduce any exposure to actual users. Avoidance increases the disconnect between the reality inside of the team and the reality outside, which tends to end up in a total breakdown, when the difference is too great to ignore.

⚔️ Against

The second path is the “against” path. Along this road, there’s the adversarial story of being attacked – along with the urge to fight back. “These jerks, they don’t get how hard this was to get shipped” or “These people are just trolling us, they are here to cause chaos.” Somewhere, somehow, we cross the line from thinking of our users as those whom we want to delight to seeing them as the mob that’s out to get us. Like the stories of avoidance, antagonizing stories are quite common, and they are usually easy to spot: look for a brawl, be it in meetings or chats or doc comments. Unlike the “away” path, this one feels like action. It gets the adrenaline pumping, and creates the sense of doing something. Unfortunately, it is just as unproductive. Because we are merely reacting to our emotions, we’re not actually listening to the feedback. We’re fighting the vandals who dared to deface our sacred grounds. Since the actual vandals are hard to pinpoint, there is usually a tendency for a team to splinter into fiefdoms, each with their own conception of “the enemy”.

🏳️ Toward

The third and final path is when we move “toward”. When we go down this road, the story that plays in our minds goes something like this: “omg, we’re so screwed, that twitter guy is totally right, we are losers, and everything we do is horrible”. This story sounds most similar to actually receiving feedback, though it’s anything but. This story of accommodation is a pure reaction to the avalanche of self-doubt that overwhelmed all of our senses, forcing us to grasp for any strong opinion, even if it’s that of a random Twitter guy. Many, many bad decisions were made as the result of following this path. They all have a flavor of “we should do this (or not do this) because someone else thinks it’s the right (wrong) thing to do”. In teams, the most visible effect of the story of accommodation is a rising feeling of collective self-loathing. Vibrant and unique team cultures disintegrate in the loss of self-confidence and the disorientation that follows.

🦠 The viral load

Have I mentioned that these stories also happen to be super-contagious? There’s something about our human psychology that makes us particularly susceptible to these three stories. We might prefer one over the other, but there’s definitely one that will catch us. Some of us might be reasonably immune to the “toward” or “away” paths, but oh boy, even a hint of the antagonizing story can cause us to jump, knuckles up. Some of us might be highly avoidant, and have all kinds of tricks to prevent feedback from ever reaching us. Some of us might immediately move to accommodate, revealing a lack of confidence in our own decisions.

Upon receiving negative feedback about our product, our team will likely experience all three of these, simultaneously. Each individual will take their own path, contributing to the boiling soup of emotion-laden stories. If we let these stories run amok, they will wash over the team as multiple waves of viral spread, inhibiting our ability to process this feedback.

The weird part is that these stories are often based on truth, which makes them hard to untangle ourselves from. Sometimes our users really are just trolling us. Sometimes that Twitter guy is actually right. And sometimes all this negativity is just too much and we must take a break to preserve our sanity.

💪 Practice 

We are better off accepting that a mix of the three plot lines will always play out in our minds in response to negative feedback. We can’t wish them away or somehow reach the plateau of perfect rationality where we never have them again.

But if we practice noticing the stories and familiarize ourselves with the paths where they take us, we can start opening up a bit of space between ourselves and our emotions. And as we do, we increase our capacity to receive and process even the most crushing criticism.

We also don’t have to do it alone. A useful tactic that worked for me is asking someone who is less attached to the project to help me interpret the feedback. It can be quite puzzling, yet liberating to see someone else not have the same emotional response. It helps me see the story that I am trapped by – and is usually enough to get unstuck.

Putting all of this together into bullet points:

  • Accept that we will all have our individual emotional response to negative feedback – and that our first instinct will be to react to our emotions, not the feedback itself.
  • Have tools to orient ourselves in the midst of the response, to detach from what we’re feeling and bring our attention to the actual feedback. Dr. Horney’s framework is just one example.
  • Practice using these tools as a team to reduce the spread of emotional contagion. Even the basic awareness of the underlying process could do wonders for how the team reacts to negative feedback.
  • Grow a network of trusted friends and colleagues from diverse backgrounds and environments, with whom we can share the bits of feedback we’re struggling with and count on their support to help process it.

Whew, this was long. But here’s hoping that my ramblings will help you and your colleagues navigate the uncomfortable, yet ultimately essential process of receiving negative feedback. Let me know how it goes.

Deep Stack Engineer

Riffing on the idea of layer gaps, we can surmise that pretty much every layer we ever get to write code for has gaps. If that’s the case, then anticipating layer gaps in our future can lead to different ways to build teams.

A key insight from the previous essay is that when we work with a layer with gaps, we need to understand both this layer and the layer underneath it. For if we ever fall into the gap, we could use that knowledge of the lower layer to orient and continue toward our intended destination.

Which means that when we hire people to work in a certain stack, we are much better off hiring at least one person who has experience with the stack’s lower layer. These are the people who will lead the team out of the layer gaps. To give our full stack engineers the ability to overcome these gaps, we need at least one deep stack engineer.

A simple rule of thumb: for every part of the stack, hire at least one person who has experience working at the layer below. 

For example, if we’re planning to develop our product on top of a Web framework, we must look for someone who deeply understands this framework to join the team. Ideally, this person is a current or former active participant in the framework project.

Approaching this from a slightly different angle and applying the cost of opinion lens, this person will act as the opinion cost estimator for the team. Because they understand the actual intention of the framework, they can help our team minimize the difference of intentions between what we’re engineering in our layer and the intention of the underlying framework. As my good friend Matt wisely said many moons ago, it would help our team “use the platform” rather than waste energy while trying to work around it. Or worse yet, reinvent it.

Note that the experience at the lower layer does not necessarily translate to the experience at the higher layer. I could be a seasoned Web platform engineer, with thousands of lines of rendering engine C++ code under my belt – yet have very little understanding of how Web applications are built.

What we’re looking for in a deep stack engineer is the actual depth: the capacity to span multiple layers, and go up and down these layers with confident ease.

The more layers they span, the rarer these folks are. It takes a lot of curiosity and experience to get to the level of expert comfort across multiple layers of developer surfaces. Usually, folks tend to nest within one layer and build their careers there. So next time we come across a candidate whose experience spans two or more, we would do well to pay attention: this might be someone who significantly improves the odds of success in our engineering adventures.

Layer gaps

I’ve been writing a bit more code lately, and so you’ll notice that some of my stories are gravitating that way. Here’s one about layer gaps.

Layer gaps are when a developer surface layer fails to close fully over the layer below. Another way of saying this is “leaky abstraction”, but I’ll use my own term “layer gaps” to define what it entails in a more nuanced way.

To recap what I’ve written previously about layers, every layer tends to offer a bit of its own opinion about how to best bring value to users. When the layers are able to fully express this opinion, we have no layer gaps in this layer. For example, JavaScript is a gapless layer. It’s a language and a firm opinion about a code execution environment. It might not have features that we would like to have in a language. It might have limits within its execution environment. It might even have frustrating bugs that irritate the dickens out of us.

But at no point while using JavaScript will we suddenly go: “whoa, I am no longer in JavaScript. I fell into some weird gap that I wasn’t previously aware of, and now I am experiencing the lower layer on top of which JavaScript was built”.

To better grasp what a layer gap looks like, we don’t have to go far. Let’s look at TypeScript. TypeScript is a really interesting beast: it’s a layer of a type system that is laid over JavaScript. First off, the type system itself is delicious. It reminds me a bit of the C# type system, and I thoroughly enjoyed learning and using both. However, there’s a gap into which I fell more than once.

Because the type system is compile-time only, the layer disappears at runtime. It simply doesn’t exist anymore once the code is executed by the underlying JavaScript engine. However, based on other type systems that I am familiar with, I expect at least some runtime type support.

At least for me, my mental model of a type system includes at least a little bit of ability to reason about types. Like, at least comparing them at runtime. A bit of type reflection might be nice. But because there’s no such thing as TypeScript when the code actually runs, I experience the layer gap.
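Here is a minimal sketch of what falling into this gap can look like. The User shape and the JSON payload are made up for illustration.

```typescript
// Hypothetical shape, used only for illustration.
interface User {
  name: string;
}

// The cast satisfies the compiler, but nothing verifies it when the code runs.
const user = JSON.parse('{"name": 42}') as User;

// The compiler believes user.name is a string; the JavaScript engine disagrees.
const runtimeType: string = typeof user.name;
console.log(runtimeType); // prints "number"

// There is also no User left to compare against at runtime:
// `user instanceof User` does not compile, because User only exists as a type.
```

Everything type-checks, and yet the value in hand is not what the type says it is. That is the moment the stilts wobble.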

As developers of layers, we need to remember that if our layer has gaps, our users must understand not only how our layer works, but also how the lower layer works, and they need the gaps clearly marked. If we don’t mark them, we’ll hear frequent screams of anguish as our users discover them. A clearly marked gap might look like documentation that helps our developers understand the tradeoffs they are making by using our layer and make the decision to use it on their own terms. It could look like superb tooling that points out the gap as soon as the user gets close to it – and possibly both.

As users of these layers, we need to be ready for every layer to potentially have gaps. We need to invest time upfront to uncover them, and build our practices to fence around the gaps.
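One way such a fence might look in TypeScript is a hand-written type guard at the boundary where untyped data enters. This is just a sketch: User, isUser, and parseUser are hypothetical names, and a real project may prefer a schema validation library instead.

```typescript
// Hypothetical shape, used only for illustration.
interface User {
  name: string;
}

// A user-defined type guard: checks at runtime what the interface
// only promises at compile time.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return typeof candidate["name"] === "string";
}

// The fence: untyped data is checked once, right where it enters.
function parseUser(json: string): User {
  const value: unknown = JSON.parse(json);
  if (!isUser(value)) {
    throw new Error("Payload does not look like a User");
  }
  return value; // narrowed to User by the guard
}

console.log(parseUser('{"name": "Anne"}').name); // prints "Anne"
```

Past the fence, the rest of the code can trust the type. The guard itself is plain JavaScript reasoning, which is the point: fencing a gap takes knowledge of the lower layer.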

I was sharing with my colleagues that using TypeScript is like walking on stilts. I can get really, really good at walking on stilts. I could even learn how to run on stilts and do all kinds of acrobatic tricks while on stilts. But I should never forget that I am wearing them. If I do, I may find myself unpleasantly surprised when the ground suddenly hits my face.

Layer gaps aren’t necessarily a terrible thing. They come with tradeoffs, and sometimes these tradeoffs are worth it. For instance, I embraced TypeScript, because I can delegate some of the mental load of reasoning about the data structures to the TypeScript compiler – and it does a pretty good job of it.

I just need to keep remembering that as I am enjoying the benefits of seeing farther and being taller, I am doing this by wearing the stilts.