AI Patterns and Breadboard

In my last post, I kept talking about AI patterns, but kept it a bit vague. I thought it might be useful to share a couple of examples to describe what I mean by “AI patterns” a bit more clearly. Once again, put your technical hats on.

🗺️ How to read AI pattern diagrams

As part of practicing the “build a thing to build the thing” principle, we implemented quite a few AI patterns in Breadboard. I will use the diagrams we generate from the boards (thank you Mermaid.js!) to illustrate the patterns. Here’s a quick guide on how to read the diagrams – and as a happy coincidence, a brief overview of Breadboard concepts.

🔵 The inquisitively blue parallelogram represents the “input” node. This is where the user’s input is requested by the pattern. Because most patterns ask for input first, it’s a good place to start when tracing the flow of the graph.

🟢 The cheerfully green hexagon is the “output” node, which provides the output to the user of the pattern. For many patterns, that’s the end point, the journey’s destination, while for a few – just a brief stopover.

🟡 The curiously yellow boxes are all nodes that do interesting stuff. For example, “generateText” node invokes the LLM, while “promptTemplate” combines a template and various bits of text into a prompt that’s suitable for the LLM. Most of the time, you can guess what the function does by looking at its name.

🔴 The protectively red box with rounded corners is the “secrets” node, which has access to the user’s sensitive data. For most (all?) patterns, it is used to retrieve and pass the API Key to unlock the ability to invoke the large language model.

🍷 The variously-shaped wine-colored boxes are utility nodes: they are mostly here to serve other nodes by supplying important data and making it possible for graphs to be composable and useful. We’ll be mostly ignoring them here – but I will very likely be back to sing their song in the future.

Most nodes will have a two-line label. The first line is the type of the node and the second is its unique identifier. Because there can be multiple nodes of the same type, we need an id to distinguish between them.

Just like with literal breadboards, nodes are connected with wires. Wires are represented by lines with arrows. The direction of the arrow on each wire represents the flow of information. So, when the graph shows this:

… it means that the information flows from the “promptTemplate” node to the “generateText” node.

Each wire is labeled. All labels have the same consistent “out->in” format. A good way to think of it is that every node may have multiple inputs and outputs. The wires connect these inputs and outputs.

In the example above, the output named “prompt” of the “promptTemplate” node is wired to the input named “text” of the “generateText” node. Most of the time, it’s not difficult to infer the purpose of the wire. For instance, the wire above carries the prompt produced by the “promptTemplate” node to the “generateText” node as its input. If you are curious about all the ins and outs of nodes (pun intended!), check out this guide on Github.
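The “out->in” label format is simple enough to sketch in a couple of lines of JavaScript. This is a toy illustration of the format described above, not Breadboard’s actual parsing code:

```javascript
// Toy parser for the "out->in" wire label format.
// Illustrative only; not Breadboard's actual implementation.
function parseWireLabel(label) {
  const [out, input] = label.split("->");
  return { out, in: input };
}

parseWireLabel("prompt->text"); // → { out: "prompt", in: "text" }
```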

Some wires will have a circle at the end of them, rather than an arrow. These are constant wires. There’s a lot more to them, but for now, a good way to think of them is that they are here to specify constant values. Like in the diagram below, the “template” utility node supplies a constant “template” input to the “promptTemplate” node. 

With this quick Breadboard refresher out of the way, we’re ready to dig into the actual patterns. To keep this post from becoming a book, I’ll give you only three examples.

🧮 The Calculator pattern

Let’s start with the widely used Calculator pattern (you can also see it here on Github, implemented in Breadboard):

The structure of this pattern is very simple: user input goes into the “promptTemplate” node, which produces a prompt that goes into the “generateText” node, the output of which is fed to the “runJavascript” node, and the result is returned as output.

As it often happens with AI patterns, the magic is in the contents of the prompt template. In this pattern, the LLM is used to find solutions to mathematical problems in a very clever way. 

As you may have heard, LLMs aren’t so great at math. So instead of approaching the problem head-on, we lean on the LLM’s strength: we convert a math problem into a language problem.

In the Calculator pattern, we ask the LLM to do what it does best: generate text. We ask it to write code that solves a math problem, rather than try to find the answer to the question. Here’s a prompt to do that:

Translate the math problem below into a JavaScript function named `compute` that can be executed to provide the answer to the problem.
Math Problem: {{question}}
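The substitution that a “promptTemplate”-style node performs can be sketched as follows. The function name and regex here are my own illustration, not Breadboard’s actual API:

```javascript
// Minimal sketch of {{placeholder}} substitution, as a "promptTemplate"-style
// node might perform it. Hypothetical helper, not Breadboard's actual API.
function fillTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => values[key] ?? "");
}

const prompt = fillTemplate(
  "Translate the math problem below into a JavaScript function named `compute` " +
    "that can be executed to provide the answer to the problem.\nMath Problem: {{question}}",
  { question: "What is 2 + 2?" }
);
```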

Because writing code is a language problem, LLMs are pretty good at it. So, with a high degree of consistency, the output of the LLM will be a function that, when run, produces the right answer. Leave computation to the old-style computers. Let LLMs write code that will be computed.

For instance, when we replace the {{question}} placeholder with:

What is the square root of the perimeter of a circle with a diameter of 5?

The LLM will happily produce this function:

function compute() {
  const diameter = 5;
  const radius = diameter / 2;
  const perimeter = 2 * Math.PI * radius;
  return Math.sqrt(perimeter);
}

Which, when executed, will give us the correct answer of `3.963327297606011`. If you ask any conversational agent today a math question and it surprises you with an accurate answer, chances are that some variant of the Calculator pattern is being employed.
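To make the last step concrete, here is a sketch of what a “runJavascript”-style node might do with the generated source. The helper name is made up, and a real system would want to sandbox untrusted generated code rather than evaluate it directly:

```javascript
// Execute LLM-generated source and invoke its `compute` function.
// Illustrative sketch only; real code should run this in a sandbox.
function runGeneratedCode(code, functionName = "compute") {
  const fn = new Function(`${code}; return ${functionName}();`);
  return fn();
}

const generated = `
function compute() {
  const diameter = 5;
  const radius = diameter / 2;
  const perimeter = 2 * Math.PI * radius;
  return Math.sqrt(perimeter);
}`;

runGeneratedCode(generated); // → 3.963327297606011
```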

📰 The Summarizer pattern

Another common pattern builds on the LLM’s strength of narrating information, even when presented with bits of random content. I experimented with this ability early this year, and here’s an implementation of the pattern in Breadboard (also here on Github):

When we look at the structure above, we can see that user input splits into two paths.

The first route is circuitous. It takes us through the “urlTemplate” node that creates a valid URL (it’s a Google News RSS feed with the topic as the query), which is then fed to the “fetch” node. The “fetch” node grabs the contents of this URL, and sends it to the “xmlToJson” and “jsonata” nodes that munge RSS into a list of headlines.
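The final munging step of that route can be sketched in a few lines, assuming the feed has already been converted to a JSON shape like the one below. The shape is an assumption on my part; the actual board expresses this step with “xmlToJson” and a “jsonata” query:

```javascript
// Sketch of the munging step: pull headline titles out of the RSS feed after
// it has been converted to JSON. The feed shape is an assumed simplification.
function extractHeadlines(feed, limit = 20) {
  const items = feed?.rss?.channel?.item ?? [];
  return items.slice(0, limit).map((item) => item.title);
}

const feed = {
  rss: { channel: { item: [{ title: "Tidy Breadboard Uses Banana Bread - Hackaday" }] } },
};
extractHeadlines(feed); // → ["Tidy Breadboard Uses Banana Bread - Hackaday"]
```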

The second route joins the first at the “promptTemplate” node, from which they predictably move to the “generateText” node and, finally, the result is presented to the user.

The concept is fairly straightforward: give the LLM a topic and a few sentences, and request a summary. If – as is the case in the graph above – we are summarizing news headlines, a prompt will look something like this:

Use the news headlines below to write a few sentences to summarize the latest news on this topic: {{topic}}
## Headlines
{{headlines}}
## Summary:

In this prompt, we have two placeholders: the {{topic}}, which is where the subject of summarization will go, and the {{headlines}}, where we will plug in the various headlines from a news source (Google News).

The key distinction between just asking an LLM a question and using this pattern is that we’re not relying on the LLM’s parametric memory to contain the answer. We’re not asking it to find the answer for us. Instead, we are only employing its narrative-making abilities, supplying the raw information in the prompt.

So if, for example, I put “breadboards” into the {{topic}} placeholder, and the following list of headlines from Google News (just the first 20 for this particular board) into the {{headlines}} placeholder:

Thermochromic Treatment Keeps Solderless Breadboards Smokeless - Hackaday
Jumper Wires For Electronic Components - IndiaTimes
10 hostess hacks to make your food look better than it is - Colorado Springs Gazette
Gabriel's Cyberdeck Red V2 Packs in a LattePanda Delta 3, Analog Discovery 2, HackRF One, and More -
How to Measure Voltage on a Breadboard - MUO - MakeUseOf
The Ultimate Breadboard Platform? -
Building Circuits Flexibly - Hackaday
Lewiston Art Festival: A 'dinosaur' of woodwork - Niagara Gazette
Podcast 220: Transparent Ice, Fake Aliens, And Bendy Breadboards ... - Hackaday
Flexboard: a flexible breadboard for flexible and rapid prototyping of ... - Tech Explorist
Derek Fogt | Communities | -
MARNI JAMESON: Compensate for humdrum food with stylish ... - Sarasota Herald-Tribune
Build HMI screens with MicroLayout for your Meadow Apps -
Tidy Breadboard Uses Banana Bread - Hackaday
Old 6809 Computer Lives Again On Breadboards - Hackaday
My Favorite Things: Hardware Hacking and Reverse Engineering - Security Boulevard
Luna in Cocoa Beach offers serves thoughtful modern Italian food - Florida Today
Teaching Method Increases Students' Interest in Programming and ... - University of Arkansas Newswire
From A 6502 Breadboard Computer To Lode Runner And Beyond - Hackaday
How to Breadboard Electronics Projects with Raspberry Pi Pico - Tom's Hardware

… we will get this output from an LLM:

The latest news on breadboards include a new thermochromic treatment
that keeps solderless breadboards smokeless, a flexible breadboard for
flexible and rapid prototyping, and a new method for teaching students
programming and electronics.

For the quality of the junk we fed it, it ain’t half bad!

The Summarizer pattern has a much more popular cousin named Retrieval-augmented Generation (RAG). RAG is all the rage these days, and everyone wants to have one. If we peek under the covers, we’ll recognize the Summarizer pattern combined with another neat LLM capability, semantic embeddings, into the Voltron of patterns.

🔁 The ReAct pattern

I would be remiss not to bring up ReAct when talking about AI patterns. This pattern ushered in a new mini-era of LLM applications, a breakthrough that redefined what LLMs can do.

The ReAct pattern is different from the ones mentioned earlier, because it is cyclical: rather than asking an LLM once, it may do so several times, repeating until the problem is solved.

ReAct introduces this really interesting idea that we can induce chain-of-thought reasoning capabilities in LLMs if we structure our interaction with them in a certain way. In this chain of thought, the LLM interacts with the outside environment, suggesting actions to take and then reasoning about the outcomes of these actions.

I’ve talked about LLM-based reasoning a few times before, so this concept shouldn’t be entirely novel to my readers.

In ReAct, the key trick is in establishing a predictable beat of reasoning within the prompt:

1️⃣ First comes the Question – the question that the user asks
2️⃣ Then, comes the Thought – the opportunity for an LLM to reason about what to do next
3️⃣ After Thought is Action – LLM’s suggested action to take
4️⃣ Finally, the Observation – the outcome of the action, supplied by the tool 

Steps 2️⃣,  3️⃣,  and 4️⃣ keep repeating until the answer is found. 

The LLM is only allowed to pipe in on steps 2️⃣ and 3️⃣: that is, it can only produce the “Thought” and “Action” parts of the overall sequence.

Step 1️⃣ is provided by the user, and the observation in step 4️⃣ is supplied as the outcome of whatever action the LLM suggested to take.

As the steps repeat, all of these steps are being added to the overall prompt, allowing the LLM to see the history of the interaction and reason about it. In this way, and unlike in the Calculator and Summarizer patterns, the ReAct pattern simulates memory: with each invocation, the LLM can see how it acted in the past.
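The beat described above can be sketched as a loop. This is a simplified illustration in which `callModel` and the tool functions are hypothetical stand-ins; the real pattern also needs a prompt preamble, tool descriptions, and stop sequences:

```javascript
// Simplified sketch of the ReAct loop. `callModel` and `tools` are
// hypothetical stand-ins, not Breadboard's actual implementation.
function reactLoop(question, tools, callModel, maxSteps = 10) {
  let transcript = `Question: ${question}\n`;
  for (let step = 0; step < maxSteps; step++) {
    // The LLM only ever produces the Thought and Action parts.
    const completion = callModel(transcript);
    transcript += completion + "\n";
    const final = completion.match(/Final Answer:\s*(.*)/);
    if (final) return final[1].trim();
    const action = completion.match(/Action:\s*(\w+)\s*\n\s*Action Input:\s*(.*)/);
    if (action) {
      // The Observation is supplied by running the suggested tool.
      const observation = tools[action[1]](action[2].trim());
      transcript += `Observation: ${observation}\n`;
    }
  }
  throw new Error("ReAct loop did not converge");
}
```

Because the growing `transcript` is fed back into every `callModel` invocation, the loop is also what simulates the memory described above.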

It’s a pretty convoluted contraption, as you can see from the Breadboard implementation (here on Github):

However, the presence of memory and a repeating cadence of steps guide the LLM, allowing it to answer much more complex and interesting questions. The answers are no longer produced in one go. Instead, the LLM walks toward the answer in multiple iterations.

Let’s see if we can make it a bit more concrete by breaking down an actual transcript of a ReAct pattern solving a problem.

First, we supply the question:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?

In response, the LLM produces the following Thought and Action steps:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon

The LLM is asking to search for the distance between the Earth and the Moon. Notice that it is not trying to make up the answer. It is merely suggesting the next action to take. Taking over control from the LLM, the ReAct pattern invokes search and produces the following Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km

With the observation in hand, the ReAct pattern hands control back to the LLM, and gets this response:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)

Great! Now the LLM wants us to do math. Fine. The ReAct machinery rattles and burrs, and produces the next Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620

The ReAct pattern yields control back to the LLM, and it triumphantly declares victory:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620
2️⃣ Thought: I now know the final answer
   Final Answer: 620

Great job, model. You did it.

The ReAct pattern also introduces, almost as an afterthought, the concept of tools. Since the LLM is asked to suggest an action, it seems useful to specify the kinds of tools the LLM has at its disposal.

In the transcript above, the “search” and “math” tools were used. For other kinds of problems, there might be a need for other kinds of tools. 

This is where the most valuable aspect of the ReAct pattern resides: if we can specify our own tools, we can make LLMs do useful things. For example, I could hand it a “calendar” tool, an “email” tool, and a list of my friends and ask it to schedule a lunch for us. Or I could turn it into a menu-ordering system, where it would rely on menu-understanding tools to take customer orders.

The pattern stays the same, but the tools change. With the ReAct pattern, we can build actual helpful agents. If you’ve been watching the LLM space, you have no doubt noticed a lot of activity around this notion.

🍞 Patterns with Breadboard

These are just a few examples of interesting patterns that emerged in the generative AI space in the last few months. Honestly, the pace of pattern discovery has been nuts. So far, I see no signs of it slowing down. What I’d like to do with Breadboard is to help make these patterns more legible to everyone – so that more people can play with them, create their own, and explore this fascinating space together.

My intuition is that when we lower the barrier to entry to this process of discovery and make it easy to tell whether the new pattern is good or not, we have a better chance of exploring the space more thoroughly and realizing the full potential of large language models.

Composing graphs with Breadboard

This post is more technical and more rambling than my usual repertoire. It’s a deep dive into where my mind is today, and it’s mostly about technical design. Think of it as a sample of interesting problems that I am puzzling over these days.

Let’s start with graphs. In the past, I’ve mentioned this interesting tension between graphs and trees: how our human tendency to organize things into tree-like structures (hierarchies and other container models) is at odds with the fluid, interconnected nature of the world around us. I framed it as: “every graph wants to become a tree, and still secretly wants to remain a graph”, and described it in the context of developer frameworks.

So naturally, when presented with an opportunity to write a developer library, I decided to use the graph structure as its core. The result is Breadboard. It’s still early on. I like to say that we currently have a toddler, and there’s much left to do to get the library to adulthood. However, it seems useful to start sharing what I’ve learned so far and the design decisions I landed on.


If you even occasionally scan these chronicles of my learnings, you will undoubtedly know that I am fascinated by the potential of applying large language models (LLMs) as a technology and the kinds of new interesting spaces they could open.

As a result, I invested quite a bit of time tinkering with the models, trying to make them jump through various hoops and do things – just to get a sense of what it is that they are truly capable of. Judging from the endless updates I hear from my friends and colleagues, so has everyone else.

New interesting patterns of applying LLMs seem to arise almost daily – and that’s pretty exciting. What was less exciting for me was the distinct lack of tools that help us discover these patterns. Most of the frameworks that are rising to relative prominence appear to focus on capturing the newly discovered patterns and making them more accessible. This is great! But what about a framework that facilitates tinkering with LLMs to find new patterns?

I started Breadboard with two objectives in mind:

  1. Make creating new generative AI patterns accessible and fun
  2. Enable easy sharing, remixing, composition, and reuse of these patterns.

My hope is that Breadboard helps accelerate the pace with which new interesting and useful patterns for applying generative AI are found. Because honestly, it feels like we have barely scratched the surface. It would be super-sad if the current local maximum of chatbots turned out to be as far as we get with this cycle of AI innovation.

🍞 The metaphor

The first thing I wanted to get right was the mental model with which a developer might approach the library. 

Graphs are typically hard to describe. I am in awe of whoever came up with the term “Web” to describe the entire tangle of hyperlinked documents. Kudos.

As you may remember, I am also very interested in the role makers play in the generative AI space. That’s how breadboards came to mind: the solderless construction bases for prototyping electronic circuits. Breadboards are a perfect maker’s tool. They are easy to put together and easy to take apart, to change the layout and tinker with various parts.

Lucky for me, breadboards are also graphs: the electronic circuits they carry are directed graphs, where each electronic component is a node in the graph and the jump wires that connect them are edges. By placing different nodes on the board and wiring them in various ways, we get different kinds of prototypes.

This is exactly what I was looking for: a one-word name for the library that comes with the mental model for what it does. As an additional benefit, “breadboard” selects for makers: if you know and love breadboards (or even just the idea of breadboards), you will likely look forward to playing with this library.

🧩 The composition system

Another piece of the puzzle was composition. Over the last decade, I ended up studying composition and designing composable systems quite extensively. In Breadboard, I wanted to lay down a sound foundation for composition.

There are three different ways to compose things in Breadboard: 🧩 nodes, 🍱 kits, and 🎛️ boards.

🧩 Nodes are the most obvious unit of composition: we can place nodes on a breadboard and wire them together. At this layer of composition, makers compose nodes to make their prototypes. Once they have a neat prototype, makers can share the board that contains it. A shared board is something that anyone can pull down from a URL and start playing with. They can clone it, tweak it, and share it again.

To get the node composition right, we need a set of nodes that allow us to build something useful and interesting. While still at an early stage, it is my intent to arrive at a starter kit of sorts: a relatively small set of general-purpose nodes that enable making all kinds of cool things.

🍱 We don’t have to stop with just one kit. Kits are another unit of composition. Makers are able to create and group interesting nodes into kits – and publish them to share with others. For instance, a project or a company might want to wrap their interesting services as nodes and publish them as a kit, allowing any maker to grab those nodes and start using them in their boards.

A maker can also just build a kit for themselves, and use it in their own prototyping only. While kits do not need to be published, boards that use unpublished kits can’t be shared with others – or at least not in any useful way.

🎛️ Boards themselves are also units of composition. Makers can include boards of others into their board, turning an included board into a sort of virtual node. The board inclusion feature is similar to a hyperlink: just like on the Web, including a board simply links from one board to another, rather than subsuming it. Such loose coupling unlocks the full potential for interdependent collaboration, and I fully expect the common dependency management practices to be applicable.

In addition to inclusion, boards can have slots. Slots are another way to add modularity to boards. When I build a board, I can leave it incomplete by specifying one or more places – “slots” – where someone else can include their boards. This is a useful trick that software developers call “dependency injection”. For instance, if I developed a generic pattern to invoke various tools with generative AI,  I can leave a slot for these tools. When other makers reuse my board, they can insert their own sets of tools into this slot without having to modify my board.
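The slot idea can be loosely illustrated as ordinary dependency injection. The function names below are my own illustration, not Breadboard’s actual API:

```javascript
// Toy illustration of a "slot" as dependency injection: the generic board
// declares a tools slot, and whoever reuses the board injects their own
// tools. Hypothetical names, not Breadboard's actual API.
function genericToolBoard({ tools }) {
  // The board's own logic stays fixed; only the injected slot varies.
  return (toolName, input) => {
    if (!(toolName in tools)) throw new Error(`No tool named ${toolName}`);
    return tools[toolName](input);
  };
}

// A maker fills the slot with their own tools without modifying the board:
const run = genericToolBoard({ tools: { shout: (s) => s.toUpperCase() } });
run("shout", "order up"); // → "ORDER UP"
```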

🤖 The traversal machine

It took me a little bit of time to settle on what a “node” is in Breadboard and how these nodes get traversed in the graph. I ended up going with an actor-model-inspired design, leaving lots of room to explore concurrency and distributed processing in the future. For the moment, however, I am primarily guided by the motivation to make Breadboard graphs easy to understand.

One capability I wanted to enable was building graphs that have cycles within them. Pretty much anything interesting contains feedback loops, and so Breadboard supports directed graphs with cycles out of the box. Calculating the topology of such graphs is an NP-complete problem, but lucky for us, traversing them is fairly trivial. After all, most computer programs are directed graphs with cycles.

At the core of the traversal machine is this concept: a well-behaving node is a pure function. More precisely, as close to a pure function as we can get. Since makers can create their own nodes, Breadboard can’t guarantee any of that, but I’d like to encourage it.

Since pure functions don’t contain state, state needs to be managed outside of the function. Breadboard relies on wires as the method to manage state. Wires are the way both data and control flow.

This sets up the basics of the traversal logic:

  • Every node has inputs and outputs. The inputs are the wires running into the node, and outputs are wires running out. A node consumes inputs and provides outputs.
  • The node – or technically the function that the node represents – is only run when all inputs have been provided to it by the nodes that ran before this node. Put differently, a node will not run if some of its inputs weren’t provided.
  • Graph traversal starts with running nodes that don’t have any inputs wired into them.

That’s about it.

To geek out on this a bit more, I went with a mailbox-like setup where wires are effectively one-time variables that store data. The data in this variable is written by a node output and read by a node input. A super-cool effect of such a setup is that the state of the graph is captured entirely in the wires, which means that Breadboard can pause and resume traversal of the graph by saving what’s currently stored in the wires.
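The mailbox setup can be sketched as a toy traversal. This illustration only covers the happy path of the logic described above; the real machinery also handles cycles, constants, and pausing and resuming:

```javascript
// Toy traversal: wires act as one-time mailboxes, and a node runs only after
// all of its inputs have arrived. Simplified illustration, not Breadboard's
// actual implementation.
function traverse(nodes, wires) {
  const mailbox = new Map(); // wire -> pending value
  const results = {};
  // Start with nodes that have no inputs wired into them.
  const queue = nodes.filter((n) => !wires.some((w) => w.to === n.id));
  while (queue.length > 0) {
    const node = queue.shift();
    const inputs = {};
    for (const w of wires.filter((w) => w.to === node.id)) {
      inputs[w.in] = mailbox.get(w);
      mailbox.delete(w); // one-time read
    }
    const outputs = node.run(inputs);
    results[node.id] = outputs;
    for (const w of wires.filter((w) => w.from === node.id)) {
      mailbox.set(w, outputs[w.out]);
      // A downstream node becomes runnable once all its inputs are present.
      const ready = wires.filter((x) => x.to === w.to).every((x) => mailbox.has(x));
      if (ready) queue.push(nodes.find((n) => n.id === w.to));
    }
  }
  return results;
}

const nodes = [
  { id: "greet", run: () => ({ text: "hello" }) },
  { id: "shout", run: ({ text }) => ({ text: text.toUpperCase() }) },
];
const wires = [{ from: "greet", out: "text", to: "shout", in: "text" }];
traverse(nodes, wires); // the "shout" node's output is { text: "HELLO" }
```

Note that all state lives in the `mailbox`, which is what makes the pause-and-resume trick possible: serialize the mailbox, and you have a snapshot of the traversal.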

🚧 What is next

Looking ahead, I am quite optimistic about Breadboard. We already have a small seedling of a team developing. In the next few weeks, I’ll keep making interesting patterns with it to keep informing the development of the library. Build a thing to build the thing, right?

Once the fundamentals settle a bit, we can start thinking about graduating Breadboard into an early adulthood, releasing the v1. Hopefully, at this point, we will have enough onramp for you and other makers to start actively using it in your prototyping adventures.

If you feel excited about this idea and don’t want to wait until then, please check out the list of open issues on Github and join the conversation. Be prepared to eat unbaked cookies and occasionally find bits of construction debris in them – and help make Breadboard better.

The ODA and OODA stances

After writing a bit about principles and tensions, I thought I’d switch tracks and go with something slightly different. Philosophy meets strategy in this story about how we see ourselves in relation to the world around us.

I’d like to start by making a distinction between two different ways in which we place ourselves in our environment. The first one we will call the ODA stance and we’ll name the second the OODA stance. I’ll explain the names shortly.

📈 The ODA stance

The ODA stance is fairly common and broadly applied. In this stance, we see ourselves situated in our environment and able to perceive our environment directly. When we act on our environment, we observe changes, make decisions, and act again. This cycle – let’s call it the ODA loop (Observe, Decide, Act) – is our way to navigate the environment.

In many situations, the ODA loop works surprisingly well. We observe customer behavior, we decide on the change we need to make to our product to adapt to this behavior, and then we make that change. Because we can see the environment around us (the Observe part), we can make plans, create OKRs, make Gantt charts, and proceed forth with confidence.

When in the ODA stance, it is all about making the right decisions. To make sure our actions yield the outcomes that we desire, our decisions have to be correct. If our organization’s narratives rotate around effectiveness of decision-making, it’s very likely that we’re in the ODA stance.

Because of that, organizations in the ODA stance are usually program-heavy. They trend toward rails and cranks. Rails are processes and practices that ensure that only correct decisions are made. Cranks are tools that make sure that following the right processes and practices is as easy as turning a crank. When I am in the ODA stance, I have this firm belief that if I put down enough rails and create ergonomic enough cranks, I can solve any problem at scale.

The ODA stance will start to feel insufficient, like it is missing ingredients, when we get blindsided by an unanticipated change. When our plans and projections get swept away by a disruptive innovation or a newly discovered challenge, we begin to struggle.

Something feels off. It often feels like someone or something is deceiving us. Organizations tend to react to this feeling in various ways. Sometimes, the guilty parties are identified and dismissed. Sometimes, the external entities are deemed antagonistic and troops are rallied to defend the fort. Very often, the disruption is dismissed as a blip, an outlier that’s not worth paying attention to. In most cases, the whole ordeal is forgotten. Until it happens again.

Technological innovation really messes with the ODA stance. Every new change brings chaos, the uncomfortable time when things stop making sense. The best thing we can do, it seems, is to hang on for our dear lives and wait out the turbulence. Unfortunately, the frequency of storms only seems to increase.

Unsurprisingly, the ODA stance organizations experiencing the storms intuitively lean into even more rails and cranks. They strive to create better decision-making processes, document practices more precisely, and enforce more stringently that they are followed.

🌀The OODA stance

After getting bruised by the turbulence a few too many times, a new stance tends to emerge. Speaking from experience, we rarely grasp the idea and adopt this stance immediately. It takes actual experiential scars to arrive at it.

The OODA stance discards the idea of perceiving the environment directly. Instead, we insert this concept of a mental model between our observations and our decisions. This mental model is what, back in the ODA stance, we mistook for the environment.

In this stance, observations stream into our model, and continuously update this model, causing us to orient – to change our understanding of our environment. When we see an observation, we orient, then make decisions, then act. Helpfully, a fellow named John Boyd already coined the term for this cycle: the OODA loop.

The addition of an extra “O” to the ODA loop from the previous stance implies a more humble posture toward the environment. We do not know what the environment looks like. We may never have a full grasp of it. We only have observations as clues. We try to cobble these observations together as best we can into a model of the environment, and then use that model to guess what the environment will do next.

It’s a productive kind of humility. Instead of beginner’s humility, where we are too timid to make predictions because we are alarmed by our own ignorance, with productive humility we admit that the environment is likely too complex for us to grok it fully, yet we have tools to navigate this complexity.

Organizations operating in the OODA stance focus on bettering their mental models. They recognize that the key to making good decisions stems from their ability to maintain mental models that result in good predictions.

This recognition usually leads to three types of investments:

1️⃣ Sensing. The OODA stance craves observation. Productive humility dictates that external sensing is paramount. We put in a lot of effort to ensure that information from outside of the organization (and the inside!) flows as freely and as abundantly as possible. Sensing is not a responsibility limited to a small group of folks with the right job titles, but a daily practice for everyone.

Organizations with the OODA stance encourage the culture of reading and writing, of seeking out novel trends. There are spaces for robust conversations about these trends and tools to plug into the flow easily. 

2️⃣ Flexibility. When we’re in the OODA stance, we want to ensure that we’re able to orient flexibly. No matter how meticulously maintained and useful so far, mental models are only based on past observations. Any new disconfirming evidence must be treated as a signal that the model might need an update. 

Organizations in the OODA stance have a healthy fear of seeing it all wrong. They build tripwires and warning lights into their processes, and treat challenges to established mental models as gifts, rather than annoyances. Coming from this organization’s leader, “Prove me wrong” is not a threat, but a plea.

In contrast with the ODA-stance organizations, it’s not the speed of the decision-making, but rather the agility of adjusting course that is valued and developed.

3️⃣ Shared mental model space. Organizations that favor the OODA stance nurture shared mental models. A funny thing: when we adopt the OODA stance, we recognize that everyone has their own mental model that they use to make predictions. Lived experiences, culture, and upbringing all contribute to a massive kaleidoscope of mental models of individuals across the organization. When looking at the same environment, we are seeing slightly different things, often without knowing it.

Contemplating this notion can feel overwhelming. The whole post-modernist movement might have risen out of that. To get past the “is there really a real reality?” line of inquiry, we lean into productive humility. It is clear that a collection of individuals of greater diversity of perspectives will likely have parts of a larger mental model amongst them. Everyone has a piece of the puzzle.

The trick is to assemble this puzzle. In the OODA stance, we expressly cultivate spaces where people can share their experiences, and have confidence that their perspective will be listened to and incorporated into the bigger picture. Collectively, we learn how to overcome the discomfort of others seeing something entirely different from what is plainly obvious to us.

☯️ Compare and contrast

Both stances have their utility. The OODA stance takes a lot more self-discipline and accumulated experience – the scars! – to acquire and hold. Because of that, the ODA stance is the default choice for most organizations.

Particularly in environments that are very familiar, don’t change very much, or change in predictable ways (like seasons), the ODA stance can work great. A good way to think of it is that when we don’t need to Orient ourselves a lot within an environment, we can simply omit the middle “O” from the OODA loop.

The ODA stance brings industrial scale. I might even argue that one can’t have industrial scale without leaning toward the ODA stance. We can contemplate large-scale endeavors only when our model of the environment is so sound and well-embedded across the organization that we don’t even think about it. To make a well-functioning bureaucracy, one needs a well-established system of governance.

On the other hand, in novel environments and environments that change rapidly, where the existing mental models keep failing to predict useful outcomes, the OODA stance becomes necessary. The ODA stance is simply blind to all the novelty, experiencing it as utter chaos. This is when it becomes necessary to decouple our mental models from the environment – and embrace the weird and wild ride of the OODA loop.

One miracle at a time

Technological innovation can be viewed as a network of miracles: breakthroughs that make possible something that wasn’t possible before. Some miracles are major and others are minor.

It is the job of a technology strategist to identify miracles that will happen or will need to happen to make forward progress on the idea they have in mind. Miracles are fickle. Their appearance is not guaranteed. Something that looks like it’s just within reach can continue to stay that way for decades. Betting on a miracle is never safe – otherwise, it wouldn’t be called a miracle.

It is in this context that the tension between two forces arises.

On one side of this tension, there’s a desire to increase the likelihood of making forward progress. It usually shows up as wanting to pick the safer, more predictable next steps, and reduce the impact of the miracle not happening.

On the other side, there’s the wish to move forward faster. This is where we see the impatience and the urge to dream big, and reach for the stars – to go all in on bringing the miracle forth.

Both extremes lead to unproductive outcomes. Making safe bets means not being different from the rest of the contenders, and being different is the name of the game in technological innovation. At the same time, an unhealthy disregard for the impossible is a great recipe for losing everything and gaining nothing.

To resolve this tension, I’ve learned to apply the “one miracle at a time” principle. The gist of this principle is that we pick one challenging part of the idea we’d like to realize and find ways to de-risk everything else. We minimize the number of miracles necessary for success, narrowing it down to the one we care about.

This doesn’t mean we give up on the larger vision. A good way to think about it is as finding the first stepping stone to jump to. Sometimes it’s helpful to imagine lining up all the miracles into a series of stepping stones. Which one will we start with?

Applying this principle in practice might look like this.

1️⃣ Start with the inventory of miracles. Have a rigorous miracle count conversation. What’s loosely easy and what’s difficult? How hard are the difficult bits? Which ones feel like miracles?

2️⃣ With the miracles rounded up, find the ones that are important. There will always be the miracles that feel more like dependencies compared to those at the core of the vision. Consider taking those miracles off the table. Apply this rule of thumb: the stronger your conviction around the vision, the less you should care about the dependency miracles.

3️⃣ Pick one. Ensure that everyone has the same one miracle in mind.  Very often, especially in teams that are running ahead of the pack, we will find a large diversity of what peeps are passionate about. This diversity is great for ideation, but when we’re choosing the miracle, it can be counter-productive.

A common trend, especially among organizational cultures biased toward being agreeable,  is that there’s a whole portfolio of pet miracles being brought along with the chosen one: everyone says “okay, let’s focus on that one” and thinks “… and mine, too”. 

Choosing the next miracle is a convergent process. It requires difficult and honest conversations. In my experience, this is the hardest step.

Don’t be surprised if the chosen miracle is a minor dependency. This likely means that the vision is vague and needs further refinement – and few ways are better to do so than trying to start at the edges of the problem.

4️⃣ Once the miracle is picked, work to reduce uncertainty everywhere else. Formulate ideas in ways that are complementary to what is currently possible. Replace all those dependency miracles with withered tech – something that’s well-established and known to be reliable. Align with projects and priorities that are already on everyone’s mind.

There will be worries and well-argued points about how choosing today’s technologies to build tomorrow’s is fraught with peril. They will be very tempting. I’ve gotten snared by them many times. I also found that the most productive way is to embrace the paradox. Accept these arguments as true – and yet, recognize that each new miracle cuts down the odds of our work mattering when tomorrow finally arrives.

With miracles, there’s always a coin toss somewhere. Some things we thought were certain and simple prove otherwise. And things that we decided weren’t worth paying attention to boomerang at our backs. By picking one miracle at a time, we can reduce the total surface of uncertainty and be principled about where we focus our attention.

Build a thing to build the thing

When building new products, there’s always a weird tension between making something “real” and production-ready, and spending a bit of time just experimenting with all of the various ideas of what this “real” might look like.

This tension is less noticeable when we actually know what we want to build. If I am building a new Web rendering engine, I literally have all the specs – and a few well-established implementations for reference. However, when the actual product is a bit of an unknown, the tension starts to surge.

There are typically two forces that create this tension. First, there’s the desire to ship expeditiously and engage the customers. This might come from the intention to maximize our chances to get it right, but it could also just be a business necessity.

Then, there’s another force – the desire to deliver something that truly resonates with the customers. It’s bizarre how the underlying animating intention could be the same “let’s get it right”, but the approach is different: instead of jumping in as soon as possible, we try to first figure out what “right” looks like.

My intuition is that the two varying approaches come from different evaluations of the “chances budget”: how many chances does this particular idea have before we blow it? Folks who see a large chances budget will veer toward “let’s just ship something and iterate (or survive/get promoted, etc)”. Folks who see only a handful of chances in the budget will tend toward “let’s first get our ducks in a row”.

Depending on the organization, there will be a pull toward one extreme or another: and sometimes a soup of both at the same time. There might be peeps jumping to ship whatever remotely looks like a feature and spend marketing dollars on boosting its visibility. There might also be people trying to carefully orchestrate large-scale “spherical cow” ecosystems that can only practically exist in a caffeine-induced slide deck. 

📐 The Principle

In my experience, the trick to resolve this tension is the practice I call “build a thing to build the thing”. It’s a pretty simple trick, please don’t get too excited. The hard part is mostly in knowing how to apply it.

When we decide to “build a thing to build the thing”, we agree to focus first on building something that is immediately adjacent to what we actually want to build. In the developer experience field, this adjacency most commonly looks like this: “Let’s build something useful with our product, and see what we learn from it”.

If we’re building a new library or framework, let’s build something that uses it – so that we learn how to improve our library or framework. Build a thing to build the thing.

“Earnest effort” is an important ingredient. If this is just some half-hearted dabbling to check the box, the trick will not work.

Close to a decade ago, when I was working on the Chrome Web Platform team, we wanted to get a better sense of whether or not the APIs and primitives we were shipping were actually helping developers make better mobile Web applications.

So a few of us locked ourselves in a room and spent a few weeks actually trying to build a clone of a well-known mobile app as a Web app, powered by the latest and greatest bits that we were shipping or about to ship. Our hypothesis was a negative proof: if we – the people who actually build the platform bits – can’t do it, then nobody can.

We also adopted the “live as our customer” posture and used only the tools that were available outside of Google.

Every week, we wrote up a report of what we learned. Where the friction was, where the seemingly easy tasks turned into albatrosses. Where primitives and APIs that we thought were useful actually weren’t.

We failed gloriously. I remember showing the Web app to our VP and the first load taking hundreds of seconds on a reasonably crappy phone. We tried hard. We rummaged in all the bags of tricks. We profiled JS. We profiled C++. We messed with V8 and Chromium code trying to make it less slow. In one particularly unwise moment, I wrote code to construct DOM with C++. 
At the end of the adventure, we had an incontrovertible proof: if we wanted mobile Web apps to be on par with their native equivalents, we had to do something different as a team.

This exercise shifted how I and my colleagues thought about what’s important (and not important). It triggered a change in priorities for the larger organization. I’ll spare you the gory details of how it all went down. Suffice it to say, a big chunk of the current narrative about Web performance was shaped by the insights we gained from our adventure.

Sometimes, building a thing to build the thing is clarifying like that. Sometimes, it just creates more questions. Whatever the outcome, there will be precious insights, waiting to be harvested.

💔 The hard part

The greatest challenge of adhering to the “build a thing to build the thing” principle is in our own ability to be honest with ourselves.

Here are some failure cases that I’ve seen. Use them as tripwires if you decide to apply this principle.

1️⃣ The DevRel trap. During the exercise, the people who are building on top of the product are different from those who are building the product.
For instance, a contractor or a DevRel team is building an app with the library that the engineering team built.  This insidious pattern is so widespread that it’s even considered a best practice. In my experience, it is anything but. It feels so obvious: of course DevRel folks are the best people to do this project!

However, most of the valuable insights will fall into the crack between the DevRel and the engineering team. Glorious failures will not trigger reevaluation of priorities, but rather shrugs and side glances at the DevRel team: “Well, maybe they’re not as good as we thought they were”.

2️⃣ Forgetting the thing. We get so engrossed in the exercise that we forget which thing we’re actually building. This most commonly happens when the main product is vague and ill-defined, and the adjacent thing feels a lot more concrete and graspable.

The chain of events that leads to this failure case usually looks like this:

  • Step 1: We want to build <foo>.
  • Step 2: <foo> is entirely ill-defined. We have no clue how to build it.
  • Step 3: We decide to start with <bar>, which is a thing that could be built with <foo>.
  • Step 4: A few weeks/months later… We’re building <bar>. Nobody remembers <foo>.

There is nothing wrong with a product pivot. Just make sure it’s done intentionally.

3️⃣ The confirmation demo. Like confirmation bias, confirmation demos only show the good parts of your product. These “things built to build things” aren’t made to challenge assumptions or draw insights. They are carefully crafted Potemkin villages whose entire purpose is to prevent novel insights from being drawn. These things will change no minds.

Don’t get me wrong. Demos are important. However, they are in an entirely different universe from applying the  “build a thing to build the thing” principle.

Anytime the ambiguity of the path forward is high and it is not quite clear what we’re building, it might be worth having at least one “build a thing to build the thing” exercise in progress, and diligently collecting the insights from it to aid with navigating complexity.

Live as our customer

This principle is something that feels very intuitive at first blush, but in my experience, is rather challenging to adhere to, especially as a team.

I will present this principle as resolving a tension between two forces that are very familiar to me in the realm of developer experience. I am pretty sure that these forces are present in any other kind of product development as well, albeit with different specific traits.

When building developer products for others, we often have a situation where the development environments that our customers have are different from ours.

They may use different stacks, build systems, or apply different processes from ours. Conversely, we may have our special tools that we’re used to and our particular practices that we employ to be more productive.

This difference forms the basis for the tension. Clearly, to ship a product that my customer loves and is willing to adopt means that I need to understand their environment. I need to know deeply the problems that they are facing every day: what is difficult? What is easy? Where can I help?

At the same time, I have my own environment that I am very comfortable in, honed by years of practice and incremental improvements. This is the environment that works best for me. This is the environment that I understand deeply, with all its quirks and idiosyncrasies.

The platonic ideal here is that I have it both ways: I deeply understand both environments, am able to hold both of them in mind, and develop for one while working in another.
If you can do this, kudos to you. But more than likely, there’s a bit of self-delusion going on. Based on my experience, this is simply not possible.

Instead, we subconsciously lean toward problems that we encounter in our environments, and tend to be blind toward the ones that our potential customers have. When we ship a thing, it looks like an alien artifact. It appears to solve problems that our customers don’t have, or try to solve their problems in weird, unworkable ways.

Imagine you’re an alien who was hired to be a chef. You’re asked to cook for humans. You can’t eat human food, and some of it looks revolting, honestly. Fried eggs. Blegh. How likely are you to cook something that humans will like?

This tension grows stronger if the difference between the environments is large. Putting it very bluntly: if our developer experience is so different that it feels like an island, we can’t build developer experience products that others will love — or even understand.

To resolve this tension, we must live as our customers. We must strive to shift to as close to the same environment as they have. If our customers use GitHub as their primary tool, we’d better use GitHub as well. If the customers we target mostly use Fortran (bless them!), well then we must learn and adopt it as well.

Applying this principle is usually super-uncomfortable at first. Nobody wants to abandon their well-worn saddle. The new saddle will cause cramps and sore muscles for a while. Expect a lot of pushback and well-reasoned arguments to return to the trusted old saddle. “This bug tracker sucks! I can’t even use keyboard shortcuts to navigate between issues! Ours is so much better!” “OMG, this build system is arcane! What is all this XML stuff?! I thought we were in the 21st century?!”

There’s a kind of test that is built into this struggle. We signed up to build for these customers. Do we actually want to do that?

If the answer is still “yes”, we will find that we will be better off in the long term. We will have a much deeper sense of what our customers need, and where the pain points are. We will be able to spot them early and build things that they want to use.

The rise of makers

It seems pretty obvious that there is some sort of cyclical nature to technological progress. There seems to be a rhythm to how things happen, and many words were written to discern the structure of the cycle. My introduction to the cycle came by way of Tim Wu’s The Master Switch. In that book, the author proposed a cyclical rhythm of open and closed ecosystems alternating as new innovations emerge. Echoes of similar dynamics are found in Crossing the Chasm, The Innovator’s Dilemma, and other books. It’s like everyone is pointing at the elephant and trying to glimpse the full shape of it.

A tiny part of this elephant is the rise of makers. The rise of makers is the time in this cycle of progress when those who tinker and prototype emerge as the best candidates for finding the Next Big Thing. 

Every story of a tech giant’s humble beginning in a garage is the story of makers. My assertion is that the reason why we hear fewer of these stories in some periods of time than others is not because we run out of bright, entrepreneurial minds and certainly not because we run out of garages. Rather, the prominence of makers is a function of the cycle of technological progress.

Put differently, makers’ contributions result in significant outcomes in one part of the cycle and shift to the margins otherwise.

📻 Who are makers?

Perhaps I could back up a bit and start by describing what I mean by “makers”.

Makers are tinkerers and prototypers who are engaging with a new technology not because they have to as part of their job, but because they find it irresistibly interesting and fun. I like to say that developers are “9-to-5-ers” and makers are “9pm-to-early-morning-ers”. This might not be an entirely accurate description, but it captures the spirit. It also hints at the key property of a maker: it’s not a kind of person. It’s a mindset. The same person who puts in their work hours at their day job becomes a maker extraordinaire in the evenings or on weekends.

Makers make stuff. They aren’t here for entertainment, fame, or some other promise of a future boon. They get their hands dirty. They build stuff. In the language of Crossing the Chasm, they are the kind of early adopter who doesn’t just adopt the tech. They make new things with it.

Makers play with technology, rather than apply it to achieve business goals. They delight in finding weird quirks and twists, the same way gamers love finding glitches that lead to a speedrun. They find all of your design flaws and unintended side effects – and turn them into a feature. The whole process is messy and often in the “life, uh… finds a way” manner – what makers make may look very different from the technologists’ intended range of use.

Makers write crappy code and wire things on breadboards. They rarely care about the future enterprise strength of their project. They explore ideas non-linearly, random-walking. They scrap half-finished projects and repurpose them into new ones. All this contributes to the seeming disarray of the maker scene. A good sign of a maker project is fix-forward collaboration practices.

Makers are here to discover something new, to bravely explore. They crave being first to uncover some way to make technology do a thing that nobody else had seen before. Makers become increasingly uninterested in a particular technology as it matures and becomes polished. Polish and reliability mean that the tech has become mainstream – and thus, less likely to yield a “holy crap!” moment.

Being a maker means being in constant search of that moment. When the thing finally works and goes viral on Twitter, and investors come knocking – it’s a maker’s dream come true. Often, it’s also the end of a maker’s journey. Once the new big thing is found, makers shift to become businesspeople. The fun hobby project turns into a rapidly growing team around the newfound thing. Not all makers choose that path. After all, the thrill of exploration does get replaced by the mundane concerns of running the business. However, those who do wield the power to reshape and create industries.

… At least, when the conditions are right.

🧫 Conditions for makers’ rise

When a novel technological capability moves forth to acquire an interface, and a broader audience begins to interact with it, there’s a question that hangs in the air: “What is this thing actually good for?” 

This is the value question. Whoever answers this question first gains a temporary advantage: until everyone else figures it out, they can seize the opportunity to acquire this value.

Makers arrive at the scene right about then. They start poking at the interface and make things with it. The amount of power makers will have at this point depends on whether or not they can answer the value question sooner than anyone else.

What are the properties of the technological capability (and the environment it is introduced into) that put makers in the driver’s seat?

I’ve been thinking about this a bit, and I keep coming back to these three: 🔒 access to technology, 🏞️ openness of space, and 🚀 iteration velocity.

All these properties interact with each other, so they aren’t exactly orthogonal.

🔓 Access to technology is both the property of new technological capability and its environment. It is probably best measured in the number of makers who could practically start using the capability.

It is the property of the capability because technologists who introduce the interface can choose to make it more or less accessible. It is also the property of the environment, because other, adjacent technological advances may make the capability more accessible than before.

A good example of the capability becoming more accessible due to shifts in environment is the introduction of widely available high-speed Internet. Without any discernible change in how the Web worked (no change in capability itself), the increase in bandwidth created new opportunities for makers to spur what is known as “Web 2.0”.

🏞️ The openness of space is reflected by the number of new opportunities created by the introduction of the technology. A good marker of a wide-open space is that a typical market-sizing exercise keeps collapsing into a fractal mess, creating more questions than answers. It’s not just one thing that suddenly becomes possible, but a whole bunch of things – and there’s this feeling that we have only scratched the surface.

Open spaces favor makers, because they require random walks and large numbers of participants to facilitate the ergodic exploration of the space. Well-established players tend to be subject to their embodied strategies, leaving them unaware of vast portions of the space, and thus highly vulnerable to your usual innovator’s dilemma and counter-positioning.

🚀 Finally, the iteration velocity is what gives makers the edge. Makers rule tight feedback loops. The shorter the lead times, the more likely makers will show up on the leaderboards. If something can be put together quickly, count on the makers to stumble into a thing that actually works. Conversely, if the new technology requires lengthy supply chains and manufacturing processes, makers will play, at best, supporting roles.

Iteration velocity is also influenced by the level of stakes in the game. The higher the stakes, the less velocity we’ll see. For example, we are unlikely to see makers playing with passenger airplanes or power grids. Those are the areas where the lead times are necessarily long, and no matter how exciting the innovation, makers won’t play a pivotal role in those spaces.

📈 Makers rising

There is no doubt in my mind that we’re experiencing another moment of makers rising to prominence. The galaxy-sized spaces opened by generative AI, its accessibility, and the velocity with which one could put together an impressive prototype – all are pointing at the notion that perhaps the next big thing will come from a tinkerer’s garage. My intuition is that we’re in an historic moment that we haven’t seen since the birth of the Internet – or perhaps even larger than that.

We are living in the age of makers rising. I am not sure how long it will last. After all, the prominence of makers is relatively short-lived, just a moment in the large story of the technological advance. But oh boy, is this moment significant.

The gifts of failure

I may have written about this before. It’s a trick that I’ve learned intuitively over the years, and I’ve found it tremendously helpful. It tends to work best in situations where the outcome is uncertain.

When adventuring forth into the unknown, we usually focus on the thing that we want to achieve. Even if the chances of it happening are slim, we set our eyes on the prize and commit to persevere through whatever challenges are thrown at us.

It is just before this point that I usually pause and spend a bit of time contemplating the gifts of failure. The question I ask myself is simple: “What good can come out of failing in this venture?” There are usually plenty of platitudes about learning from failure and all that wonderful stuff that is talked about incessantly in books about innovation. This is not what I am looking for.

When I ask that question, I am looking for tangible benefits that might arise as a result of the failure. Once we’ve reached the threshold where we definitely know we’ve been defeated – by circumstances, a worthy foe, or our own limitations – what are we left with?

Put differently, when all that’s left is to pick up the broken pieces of our failed enterprise, how quickly can we repurpose these pieces to start anew?

If we don’t contemplate this question earnestly, the answer is usually “not much”. In such cases, the collapse is dramatic and total. Very little of the effort put into the project is recoverable. If anything good comes of it, it’s sheer luck.

The stepping stones tactic is one way through which we can maximize this luck. In Why Greatness Cannot Be Planned, the authors argued that small increments just outside of the adjacent possible – the stepping stones! – are what ultimately feeds innovation and big discoveries.

By reflecting on the gifts of failure, we can sketch our path as a series of stepping stones that we create for ourselves. Which stepping stones will we need to produce anyway as we walk toward our vision? Which ones can take on a life of their own, and serve us even after our initial project has failed? Through this lens, the product of our endeavor will no longer look like a monolithic thing. Instead, it will be a modular construct that can flexibly rearrange itself into something completely different, becoming highly resilient.

As another tactic more useful for smaller projects, I sometimes maintain a small portfolio of secondary benefits that could be promoted as primary. These benefits must be somewhat independent of each other, so that if the initial attempts prove fruitless, there’s another path forward that opens up by refocusing on another potentiality.

Both of these tactics arise from reflecting on the original question. When we accept that in highly uncertain, volatile environments, any exploration is much more likely to fail than to succeed, we stop seeing the target of our adventure as the mythical “end all be all”, and focus on accumulating opportunities and potentialities along the way.

Gravity and water

I have been lucky enough to be part of the process of shaping several organizations, yet only uncovered this nugget of insight very recently. Very briefly, it helps to discern the forces of the broader environment that influence us without our awareness.

It is very rare that the leaders of a newly formed organization want it to behave exactly the same as the surrounding environment. If that is the case, we have a distinction without a difference, a label rather than an organization. Teams and organizations are created to do something (even if slightly) different from the rest of the environment. At the same time,  they are still in that environment and are influenced by the forces of this environment.

If I want to create a team that innovates daringly while situated within an organization that is designed for keeping an important chunk of infrastructure running, I can speak all I want about inspiring creativity and fearless questing for ideas to my team – but if I don’t account for the fact that most processes around me are designed and incentives are structured for a “keeping” mission, none of that may matter. Implacably, my attempts to encourage innovation will come out shaped like infrastructure improvement projects. It will feel like an invisible force being exerted on the newly born team, veering it off our intended course.

More often than not – especially in larger organizations – such forces are invisible until we start deviating from their direction. These forces are meant to organize, to bring various people’s intentions into alignment in service of the mission. Only when we try to do something different do we start experiencing friction.

Two questions might help you start making these more visible.

The first question is “what is our gravity?” This will take a little bit of self-reflection. We rarely ask ourselves why our feet stick magically to the ground. Similarly, organizations have gravity-like forces that everyone is used to. They tend to hide in structural and cultural bits of the org. If you’d like, use my fourplex of constraints to investigate. Which ones of these do we need to counteract when deviating from the default organization configuration? OKRs are probably my favorite example here. No matter how much we try to overcome it, the language of OKRs inexorably drives us to short-term thinking. If we try to build an organization that thinks in a longer timeframe than the rest of its OKR-driven environment, we will have a difficult time placing ourselves within it. “Wait, these people don’t have OKRs?  Do they even do anything?”

The second question is “what is our water?”, echoing the parable presented by David Foster Wallace. What are the things that we take for granted to such a degree that we don’t even notice them anymore? How do these things influence how we think and act? Where are they out of alignment with our intentions? If our team has a strong engineering culture, we will naturally have biases that will filter out less “engineer-ey” ideas – and that might be alright, unless these ideas are essential for making progress toward our intended mission.

The answers to these two questions are nearly always tentative and wrong, at least initially. It is important to keep examining them as time goes on.

A while back, one of my colleagues was reflecting on their first year on a team where we both worked at the time: “At first, I thought the hardest part would be to get creativity and inspiration going. Surely, the problem was that we weren’t thinking cleverly enough. Oh boy, was I wrong. It’s exactly the opposite! Everyone here is incredibly smart and bursting with creative energy. Now I believe that where we struggle the most is in applying all that energy in a way that leads to shared learning that persists.” My colleague’s guess was even more refined and nuanced the next year.

Understanding what stops us from getting where we want to go is key to arriving there. If we don’t stop to examine the forces that hold us in place, the best we can hope for is an external shock that jolts us out of our trap.

Models and structured reasoning

One of my colleagues quipped recently that instruction tuning might not be about teaching large language models (LLMs) what to do, but rather about reminding LLMs of the wealth of wisdom they already have at their foundation. Here’s a rather unkempt vignette on what wisdom I would like an LLM to recall.

I would like to help a model remember how to solve problems that are a bit more complex and require some rigor in reasoning. We are well underway in our first generation of LLM reasoning explorations, and I am looking a bit beyond the current horizon.

Here is my hypothesis: an LLM that is capable of reasoning about a broad variety of problems would be biased to structure its reasoning in a certain way. That is, when we ask it to “think step by step”, it will rely on some predefined structure of these steps while performing a text completion.

If this hypothesis is true, reasoning structures will be at the core of dramatically improving the reasoning ability of the models. Put differently, I anticipate that the next cycle of model evolution will be about exploring the space of reasoning structures and finding more and more capable local maxima of such structures.

The future of instruction tuning might not be about the quantity or even quality of the data, but rather about the rigor of the overarching reasoning structure in the data sets. 

To make my “reasoning structure” concept a bit clearer, I tried to conjure one up, relying on the problem understanding framework that I created a while back. This is just one rough sketch, but it should give you some idea of what the heck I am talking about.

🎱 Guessing the problem class

To quickly recap, I posit in my framework that there are three classes of problems: solved, solvable, and unsolvable.

LLMs are already able to address solved problems with moderate success. Implicit code execution shows a straightforward way to approach them: once the problem is deemed to be something that was already solved before, we ask an LLM to write a minimal bit of code to produce the solution. Crunching numbers, reversing words, making charts, and various similar kinds of problems fall into this class.

LLMs are making decent progress on tackling solvable problems. Solvable problems have no well-established solution, but we know what the solution looks like – also known as “the ground truth”. The presence of this ground truth is what enables reasoning boxes, and their iterative approach to finding the solution.

The yet untapped potential of LLMs is in making sense of unsolvable problems. Unsolvable problems are highly adaptive, and often appear to be solving us rather than the other way around. Things like relationships, organizations, politics, and ecosystems all fall into this class of problems – and usually, we don’t end up solving them. Instead, we navigate through them as best we can. A good way to picture an unsolvable problem is to think of the open ocean. We can’t solve the high seas, but we can navigate them.

The key to navigating an unsolvable problem is in investing a bit of time into sensing the various forces inside of the problem that make it impossible to solve once and for all. I call this process “framing the problem”. A successful framing produces one or more solvable problems that, when solved, help us navigate the unsolvable problem.

Let’s use this simple taxonomy as the foundation of a reasoning structure.


The structure will likely begin with making a guess at the class of the problem: are we looking at a solved, solvable, or unsolvable problem? I experimented a bit with asking LLMs about Cynefin, and was pleasantly surprised by the quality of these guesses. Perhaps this is the foundational wisdom of the model that my colleague was talking about? 

Based on the outcome of the initial guess, the structure will guide the reasoning to branch and engage with the problem differently based on its class.
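The classify-then-branch shape of this structure can be sketched in code. This is a toy illustration, not an actual implementation: `guessProblemClass` stands in for an LLM call, and all names and heuristics here are hypothetical.

```typescript
// A minimal sketch of the top of the reasoning structure: guess the
// problem class, then branch on it. The classifier is a stand-in for
// an LLM call; the regexes are placeholder heuristics.
type ProblemClass = "solved" | "solvable" | "unsolvable";

// Stand-in for asking the model to guess the problem class.
function guessProblemClass(question: string): ProblemClass {
  if (/square root|reverse|chart/i.test(question)) return "solved";
  if (/write|build|app/i.test(question)) return "solvable";
  return "unsolvable";
}

// Branch the reasoning based on the guessed class.
function route(question: string): string {
  switch (guessProblemClass(question)) {
    case "solved":
      return "retrieve or compute the known solution";
    case "solvable":
      return "iterate: solve, then verify against the ground truth";
    case "unsolvable":
      return "frame: build a causal model, look for leverage points";
  }
}
```

Each branch then carries its own sub-structure, described in the sections that follow.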

🔨 Handling solved problems

For solved problems, an LLM would attempt to answer directly, or write a bit of code as I mentioned above – just as most question-answering LLM-based applications do today. Like so:

Question: What is the square root of 38495?
Response: The problem of determining a square root of a number is a solved problem. 

Here is a program that returns a square root of 38495:
 The answer is the output of the program.

For this problem class, this is not so much solving the problem as retrieving the solution from the depths of the model’s wisdom.
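The program elided in the transcript above could be as small as the snippet below – a sketch of what a model might emit for this class of problem; the model could of course produce any equivalent code.

```typescript
// The "solved" class often needs only a one-liner: compute the
// square root of 38495 and print it.
const answer = Math.sqrt(38495);
console.log(answer); // ≈ 196.2014
```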

🧩 Actually solving problems

If the problem is solvable, the reasoning structure will assume an iterative stance with two interlinked lines of reasoning.

The first line of reasoning tries to answer the question of “Has the problem been solved?”. This is the actual process of solving the problem. 

The second line of reasoning tries to answer the question “Do I know everything I need to know to verify that the problem has been solved?”. This is the process of establishing the problem context, which determines the constraints that could be used to verify that the solution is correct. This context serves as the ground truth for the problem.

Establishing problem context is a problem in itself, and the reasoning structure will branch further into various techniques that could be effective. For example, the LLM might decide to write a suite of tests that, when run, will prove or disprove that the answer is correct. It could also opt to ask the user to provide feedback on the answer and learn from that feedback. 

It might even choose an “inner monologue” technique, where the ground truth is present only in a Socratic sense, with the LLM constantly questioning its own assumptions and making progress by spotting inconsistencies in them. As long as the assumptions are visible, there is an opportunity to question and correct them.

Whatever the choice, the reasoning structure will need to presume that both solving and context-finding processes are iterative. The LLM will no longer attempt to solve the problem in one completion, but rather focus on making progress toward the solution in multiple iterations and prefer frequent contact with the ground truth. 
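The two interlinked, iterative lines of reasoning might be sketched like this. Everything here is a stand-in: `solveOnce` and `verify` are hypothetical placeholders for LLM completions and for a ground-truth check such as running a test suite.

```typescript
// A sketch of the iterative stance for solvable problems: propose a
// solution, then verify it against the problem context (the ground
// truth), and repeat until it passes or the budget runs out.
interface Context {
  requirements: string[];
}

function solveOnce(problem: string, attempt: number): string {
  // Stand-in for an LLM completion: each iteration refines the draft.
  return `draft ${attempt} of: ${problem}`;
}

function verify(solution: string, ctx: Context): boolean {
  // Stand-in for the ground-truth check (e.g. running tests). Here we
  // pretend the draft passes once it has been refined enough times.
  const attempt = Number(solution.match(/draft (\d+)/)?.[1] ?? 0);
  return attempt >= ctx.requirements.length;
}

function iterate(problem: string, ctx: Context, maxSteps = 5): string | null {
  for (let step = 1; step <= maxSteps; step++) {
    const candidate = solveOnce(problem, step);
    // Frequent contact with the ground truth: verify every iteration.
    if (verify(candidate, ctx)) return candidate;
  }
  return null; // budget exhausted without converging
}
```

The point of the sketch is the loop shape, not the placeholder logic: solving and context-checking alternate, rather than happening in one completion.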

This stance is different from the typical “question-answer” stance of the solved problem class. I’ve been able to get the current generation of models to adopt it with mixed results, and it feels like an important capability to develop, perhaps through instruction tuning.

The model will also need to be capable of keeping track of where in the reasoning structure it is. Given that solving a problem might take multiple completions, there needs to be something that helps the model orient in relation to the overall progress of solving the problem.

Question: Write a to-do list app
Response: Writing a to-do list app is a solvable problem, because it is very clear when the app is built and whether it allows viewing the to-do list, as well as adding and removing items from the list. 

What information am I missing to solve this problem?

I need to ask the following questions: 
- Other than showing a to-do list, adding, and removing items from the to-do list, what other features are important to you? 
- Should this app support multiple lists or is just one list sufficient?
- On what device will you use this app?

Then, a few interactions later:

Question: Write a to-do list app
Context: This is a solvable problem, and we are currently in the process of determining requirements. Previous interactions established the following requirements: 
- app will support only one to-do list
- app will run on an iPhone
- app will allow viewing one to-do list
- it should be possible to add and remove items on the list
- it should be possible to sort items on the list. 
Response: <completion continues here>

Today, this tracking is primarily done by recording the history of the problem-solving interactions, like a chat transcript. Most chat models accept the history of previous messages and are tuned to treat it as guidance for the text completion. I can’t wait for new, more effective methods of retaining problem context to be discovered.

For instance, a more sophisticated reasoning structure might guide a model to detect that the problem could be split into multiple sub-problems that could be solved more easily, in parallel or in sequence. In such cases, the problem context would map into multiple sub-problems and eventually reduce back to one, once the sub-problems are solved. For example:

Question: Write an app that allows me to create a to-do list
Response: Writing a to-do list app is a solvable problem, because we can test whether the app:
- successfully starts
- allows viewing the to-do list
- supports adding and removing items from the list.

To simplify the problem, I will break it up into two sub-problems:
- the backend that provides the to-do list API for storing and retrieving the list
- the frontend that consumes this API.

The context for the backend sub-problem will be:
- write a backend for a to-do list app
<detailed backend spec goes here>
- write an API for the backend that supports three methods: retrieve the list, add a new item, and remove an existing item.
<detailed spec of the API goes here>

The context for the frontend sub-problem will be:
- write a Web app frontend for a to-do list app
- the frontend will support viewing the to-do list on mobile devices, specifically Safari on iPhone
<detailed frontend spec goes here>
- the frontend will integrate with the following API
<detailed spec of the API goes here> 

🌀 Framing unsolvable problems

When the LLM guesses the problem as unsolvable, the reasoning structure guides the process toward discerning the tangle of forces that creates unsolvability. This part of the process is less about finding solutions and more about building a reasonable dynamic model of the problem.

I am not yet sure how such a model could be represented efficiently as text, but it will likely start with a list of statements that form into a causal graph. For example, here’s how a model may respond to a complex question:

Question: Should I stay in college or start my own AI company?
Response: Making a decision on whether to stay in college or start an AI company appears to be an unsolvable problem. There does not seem to be one right answer, and it is hard to check the correctness of any answer.

To better understand the forces at play, I will ask the following clarifying questions:
- What is difficult about making this decision?
- Why does it have to be an either-or decision?
- What holds you back from starting a company?
- What is so attractive about starting a company?
- Why is leaving college necessary?
- Why is staying in college important for you?

After a few rounds of questions and answers, the graph will begin to take shape, and the LLM might identify a few leverage points: solvable problems within the larger unsolvable problem.

Question: Should I stay in college or start my own AI company?
Context: This is a complex problem and we are currently in the process of building a model of the problem. Here are the edges of the model graph:
- AI is very exciting, and there are a lot of cool opportunities 
- If I wait, I will miss these opportunities
- To graduate successfully, I need to wait
- To build a company, I need a lot of time
- To build a successful foundation for my life, I need to excel at school
- To excel at school, I need a lot of time
- Time is an extremely limited resource
<graph edges continue>

Response: Based on the graph I have so far, I believe that time might be a leverage point in this problem. I will recommend framing the original problem as that of optimizing for the most effective use of time and creating space for both college and company in the schedule.

Suggested problem statement: Optimize my schedule to enable both staying in college and starting my own AI company.

Once the framing emerges, the reasoning structure will follow the recipe for the solvable problem that popped out. While the original problem remains unsolved, there are now concrete steps that the user can take to navigate it.

The dynamic model created in the process of navigating an unsolvable problem is a valuable artifact. In fact, it probably needs to be retained. Depending on the success or failure of the framing it produced, it will need to be updated and refined.
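One plausible text-friendly representation of such a dynamic model is a plain edge list, with leverage-point candidates surfaced by looking at how connected each node is. This is a hypothetical heuristic of my own, offered only to make the idea concrete – not a claim about how a model would actually do it.

```typescript
// A causal model represented as a list of directed edges between
// short statements, echoing the college-vs-company example above.
type Edge = { from: string; to: string };

const model: Edge[] = [
  { from: "building a company", to: "time" },
  { from: "excelling at school", to: "time" },
  { from: "graduating successfully", to: "waiting" },
  { from: "waiting", to: "missed opportunities" },
];

// Crude leverage-point heuristic: the most-connected nodes are
// where many forces meet, so they are candidates for reframing.
function leverageCandidates(edges: Edge[]): string[] {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    for (const node of [edge.from, edge.to]) {
      degree.set(node, (degree.get(node) ?? 0) + 1);
    }
  }
  let max = 0;
  degree.forEach((d) => {
    max = Math.max(max, d);
  });
  const candidates: string[] = [];
  degree.forEach((d, node) => {
    if (d === max) candidates.push(node);
  });
  return candidates;
}
```

On this toy graph, “time” shows up as a candidate – matching the framing the hypothetical response above arrived at.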

🩹 Detecting problem classification mistakes

Because the first step of the reasoning structure is just a guess, the structure will need to provide guidance to detect the mistake and shift the problem into a different class.

Some of these mistakes aren’t as bad as others. For example, mistaking a solved problem for a solvable one, or a solvable problem for an unsolvable one, will result in inefficient use of resources, but will still produce a satisfying answer.

There is another category of mistakes where a solution is produced but isn’t valid, like when a solvable problem is diagnosed as solved and the attempt to solve it results in the wrong answer. I wonder if some sort of “inner critic” reasoning might be necessary even for solved problems to mitigate that.

The mistake that is a bit more troublesome is when an unsolvable problem is guessed to be solvable. It will look like an inability to pin down constraints, creating a sense of divergence rather than convergence toward the objective. Detecting this might be challenging, because the reasoning will keep finding new solvable sub-problems and appear to make the infinite-regress sort of progress. Here, the reasoning structure will need some sort of trip wire: a way to ask the model to lift its pencil and say “buddy, you’re spiraling out of control, this might be an unsolvable problem”.
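One very simple trip wire might track how deep the sub-problem decomposition has gone at each step and flag runaway splitting. The depth threshold here is arbitrary, and the whole function is a sketch of the idea rather than a real detector.

```typescript
// A "trip wire" sketch: given the decomposition depth recorded at
// each reasoning step, flag the problem as possibly unsolvable when
// splitting keeps producing deeper sub-problems without converging.
function tripWire(
  depths: number[],
  maxDepth = 3
): "keep going" | "possibly unsolvable" {
  const runaway = depths.some((d) => d > maxDepth);
  return runaway ? "possibly unsolvable" : "keep going";
}
```

A real detector would likely need to be much subtler – measuring whether the constraints are actually converging, not just counting levels.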

😎 Mission impossible

Part of this essay was an exercise in trying to write a sketch of a reasoning structure in its entirety. Now, you and I can look at it and wonder whether something like a reasoning structure might even be feasible to imbue into a model.

It definitely feels a bit daunting. However, if the concept seems intriguing to you, please give it a try. Maybe hack together something very simple to start small, and expand from there. Who knows, maybe through this inquiry, the full depth of wisdom of a large language model will be revealed to us.