The engine and the car

The whole large language model space is brand new, and there are lots of folks trying to make sense of it. If you’re one of those folks, here’s an analogy that might come in handy.

Any gasoline-powered car has an engine. This engine is typically something we refer to as a “V8” or “an inline 4” or sometimes even a “Wankel Rotary Engine”. Engines are super-cool. There are many engine geeks out there – so many that they warrant a video game written for them.

However, engines aren’t cars. Cars are much more than their engines. Though engines are definitely at the heart of every car, cars have many additional systems around them: fuel, electrical, steering, etc. Not to mention safety features to protect the passengers and the driver, and a whole set of comforts that we enjoy in a modern car. Pressing a button to roll down a window is not something that is done by the engine, but it’s definitely part of the whole car experience.

When we talk about this generation of AI systems, we typically talk about large language models (LLMs). In this analogy, LLMs are like engines. They are amazing! They are able to generate text by making inferences from the massive parametric memory accrued through training over a vast corpus of information.

However, they aren’t cars. One of the most common mistakes that I see being made is confusing engines (LLMs) with cars (LLM-based products). This is so common that even people who work on those products sometimes miss the distinction.

When I talk to the users of the PaLM API, I see this confusion show up frequently in this manner: developers want to reproduce results from LLM-based products like Bard or ChatGPT. When they try to get the same results from the API, they are disappointed that they don’t match. Factuality is lacking, the API can’t go to the internet and fetch an article, and so on.

In doing so, they confuse the engine with the car: the API, which offers access to the model, is not the same as the products built with it. With an LLM API, we have a big-block V8. To make it go down the road, we still need to build the car around it.

To build on this analogy, we live in the early age of cars: the engines still figure prominently in the appearance and daily experience of a vehicle. We still have to turn the crank to start the car, oil the engine frequently, and be savvy enough to fix minor problems that will definitely arise.

As our cars become more refined, the engines get relegated to a well-insulated compartment. Users of cars rarely see them or operate on them directly.

This is already happening with LLM-based products. Very few current offerings that you might encounter in public use are LLMs that are directly exposed to the user.

So, when you use a chat-based system, please be aware that this is a car, not the engine. It’s a tangle of various AI patterns that are carefully orchestrated to work as one coherent product. There is likely a reasoning pattern at the front, which relies on an LLM to understand the question and find the right tool to answer it. There is likely a growing collection of such tools – each an AI pattern in itself. There are likely some bits for making sure the results are factual, grounded in sources, and safe.

As the LLM products become more refined, the actual value niches for LLMs become more and more recognizable. Instead of thinking of one large LLM that does everything, we might be seeing specialization: LLMs that are purpose-designed for reasoning, narration, classification, code completion, etc. Each might not be super-interesting in itself, but each makes a lot of sense in the overall car of an LLM-based product.

Perhaps unsurprisingly, the next generation of cars might not even have the same kind of engine. While the window control buttons and the steering systems remain the same, the lofty gasoline engines are being replaced with electric motors that fit into a fraction of the space. The car experience remains more or less the same (aside from the annoying/exhilarating engine noise), but the source of locomotion changes entirely.

It is possible that something like this will happen with LLMs and LLM-based products as well. The new open space that was created by LLMs will be reshaped – perhaps multiple times! – as we discover how the actual products are used. 

Steady winds, doldrums, and hurricanes

It just so happened that this year, many of my friends and colleagues ended up looking for new opportunities, and in our conversations, I ended up shaping this metaphor. Like most metaphors, it’s not perfect, but hopefully it will stir some new insights for you.

We kept trying to describe the energy within organizations and the animating forces that move them. These forces can make our lives inside these organizations a delight – or a complete and utter misery. It seemed like a good idea to understand how these forces might influence us and find ways to sense these forces early. Preferably, even before committing to join a new team.

The idea of presenting these forces as winds seemed rather generative. If we look at the innovation S-curve, we can spot three different kinds of winds: steady winds, doldrums, and hurricanes. They don’t exactly match the stages I outlined back in the original article. Instead, these winds follow the angle of the S-curve slope.

⛵The steady winds

Steady winds are consistent. We can feel them going in one direction and they change infrequently. Apparently sailors love them, because they provide a predictable way to navigate. Even if it’s not a tailwind, a steady wind can be harnessed through tacking.

Similarly, organizations that are in the upslope of their development tend to have a relatively consistent animating force that feels like a steady wind. Usually, there’s some big idea, some intention, and a group of highly-motivated individuals who set the direction of this wind.

We can feel it as soon as we step into an organization. It usually appears as the ambition of the charismatic leader/founder, who knows exactly what they want and is doing everything they can to make it possible. More rarely, it might also appear as a set of ideals that depict some future state of the world – and this team has the fire (and funding) to bring it forth.

Steady winds aren’t always great. Sometimes, a steady wind’s direction is simply incompatible with where we want to go. It might trigger aversion in us, or be in discord with our own principles. The leader might be charismatic, yet have character quirks we deem appalling. The big idea might indeed be big, but no matter how much we try to suspend disbelief, we keep finding it laughable.

At the same time, steady winds bring clarity. They give us a very good idea of what this team is about and where they are going. These folks are going someplace. It’s on us to choose to go there with them.

When considering a new team and sensing a steady wind that moves it, ask yourself: is this wind aligned with what I myself want to do? Does it stir fire in my belly? At the very least, can I tack into this wind in a way that moves me where I want to go? And of course: am I at a place in my life where I want to go on an adventure?

Because joining steady-wind teams definitely brings adventure. It might be glorious and awesome, or it might be like the Donner party, with all the fixin’s of freezing to death, scurvy, and/or dysentery. Only time will tell.

If the wind is favorable and adventure is what you seek, such a team might be a good fit.

⛳ The doldrums

Prior to the invention of motors, doldrums were a terrifying thing for sailors. Doldrums meant that to go anywhere, we had to break out our oars and turn our own sweat into motion. There was no wind to help us go anywhere.

Organizations tend to experience doldrums at the top of the S-curve. Once the niche is fully explored and the product or service is optimized to fit it exactly, it is really not clear where to go next. All successful products end up experiencing this. We can see it in the dwindling of interesting changes and the deluge of incremental improvements that may sound exciting, but don’t actually add up to anything like what these organizations used to produce on the upslope.

Getting anything done in such an organization requires some form of consensus. There are usually processes. Approvals. Reviews. Endless, exhausting discussions. When in doldrums, there’s a prevailing sense of powerlessness, often accompanied by a weird combination of comfort and toil. Everything is hard, but at least it’s exactly the same as yesterday.

Leaders who used to produce the steady wind at the upslope typically leave when they encounter the doldrums. We won/lost. Why stay? Instead, they are replaced by sailors. These leaders concentrate more on preserving what was accumulated so far. Risk is frowned upon. 

It’s not like nothing gets done in organizations stuck in doldrums. There’s always activity, and an appearance of movement. To create this appearance, there’s a syndrome of chronic bigness: every new initiative is bigger than the previous one, ever more bombastically described and painted in brighter colors. Underneath is the same dull surface of still water.

Doldrums aren’t necessarily a red flag for joining. If what you’re looking for is the steady stillness of boring, yet never-ending work, that might just be the place. Large bureaucracies like government agencies and corporate giants have large organizational swaths that live in the doldrums – and necessarily so. Not everything needs to be an adventure. Sometimes, the slow and steady beat of the oars is the only thing that keeps the grand ship inching forward.

However, if you’re seeking something to fill your sails, please keep walking. Committing to a doldrums team will suck the soul out of you and is not worth it.

🌀 The hurricane

The final part of our story is hurricanes. Sailors caught in storms just hang on for dear life, trying to survive and keep the ship afloat.

Similarly, organizations can find themselves in turbulent waters. This typically happens on the downslope of the innovation S-curve, when the quiet ride through the doldrums is eventually replaced by contact with reality.

In the hurricane, there’s lots of wind. It’s blowing in all directions. To continue our metaphor, the wind is the animating force that is usually created by an organization’s leaders and their intentions. In the hurricane, this intention is chaotic and unpredictable. And it’s usually reactive, spurred by some external threat.

The downslope of the S-curve isn’t fun. The collective anxiety of leaders who got used to the doldrums creates a vicious cycle, exacerbating the situation further. The overall direction is unclear, but not for the lack of effort. There’s lots of movement, and lots of force, all going in circles.

On very, very rare occasions, a new leader emerges and manages to set the steady wind, bringing the team out of chaos. I have seen it happen, but haven’t experienced it myself. 

Unless you’re a total glutton for punishment or have a severe savior complex itch, it is difficult to recommend joining an organization in the hurricane. The trouble is, it’s often hard to tell. It is in nobody’s interest to reveal the true state of disorder to the candidates. So the hurricane-embattled team might appear as either doldrums or steady winds, depending on who you ask.

One of my colleagues recommended this approach: find someone on the inside. Someone who might still be there or left recently. Ask them candidly: “is this a sh*t show?” Watch their reaction and prod a bit. Look for stories that sound like aimless grasping for straws and high anxiety among the team’s leaders. Those are the telltale signs of the hurricane.

Diving into unpredictability

My previous essay on the topic of unpredictability generated a few insightful comments from my colleagues and friends. One of them led to this vignette.

It is very tempting to imagine that some people are just generally less susceptible to the discomfort of unpredictability than others. It might even feel like coming up with a way to gauge one’s ability to thrive in unpredictable situations would be a useful tool.

My intuition is that this stance needs a bit more nuance. As humans, we all abhor unpredictability. We rarely actually “thrive” in it, at least over the long run. The metaphor that comes to mind is diving.

Some people are great divers. They can spend a significant amount of time under water. They can go deep and explore the parts of the seabed inaccessible to anyone else. At the same time, nobody would claim that great divers can actually live in the depths of the sea. We all need to come up for air.

In this metaphor, unpredictability is water. If we stay in it for too long, we drown. I see the desire for predictability – or homeostasis – as a gravity-like force that animates all of us. It isn’t something we can completely detach from – though Stoics and Buddhists try. Just like the air that we need to breathe, predictability is essential for nourishing our minds. Our minds are predictive systems. Unpredictability is anti-mind.

Great divers – those who can endure unpredictability better than others – are those who invest generously into techniques and strategies that enable them to stay in the deep longer and even enjoy it. However, prolonged exposure to it will still take its toll, and the need to come up for air will always win over.

Diving into unpredictability is hard work. Just like with any good diver, if they are making it look effortless, we can bet that a lot of effort was put in before. And just like with any good diver, the “true pirates” who appear as thriving in unpredictability are nearly always those with decades of practice, with all the blood, sweat, tears, and scars such a practice entails. One of the foundational elements of this practice is finding a way back to the fresh air of a predictable environment.

Sometimes you gotta hit the wall

I’ve probably written about this a few years back, but I still find this mantra useful and worth repeating. It applies to situations where we’re stuck but don’t know that we’re stuck – not yet.

When we’re in this state, we have a sense that we’re still moving forward, and we’re making all the right moves. We get upset when our friends or colleagues cautiously share with us that we might be spinning our wheels. Yeah, there’s some loss of traction, but if we just keep going, we will figure this thing out. Just one more push.

Particularly for technologists and other admirers of modernist thinking, the likelihood of becoming stuck in this way somewhere along our careers is pretty high. The idea that if we know what we’re doing and we’re doing everything right, then things should work out according to our plans – it’s just so damn seductive.

We can last quite a while in this purgatory of delusion. There are just so many options to choose from. It’s the environment around us that is all wrong. Someone is actively conspiring against us. There are some indicators that show clearly that we’re still moving forward as planned. The more clever and quick-thinking we are, the more likely we are to come up with a story that keeps us stuck.

Inevitably, there’s a moment when it all comes apart. We finally hit the wall. We’re in shock, feeling injured by the cruel reality and betrayed by it. But – it is only when we hit that wall that we get the chance for self-reflection. There’s an opportunity, when the shell of self-delusion is cracked, to actually gain some clarity. We might remember our colleagues’ gentle hints and worried faces, the early signs of stuckness we’ve chosen to ignore, and the now-obviously illusory stories we’ve told ourselves.

Should we experience it, this moment is a significant milestone. It allows us to create a little space between reality and the stories we tell ourselves. It allows us to hold our stories as objects instead of being subject to them. Experienced once, it’s a perspective that can be temporarily lost, but never fully forgotten. Next time the allure of modernism tempts us, we might still feel the pull – but think twice about answering the call. Once we’ve hit that wall, we’ve learned that “knowing what we’re doing” and “doing everything right” are just stories we tell ourselves, and they have little to nothing to do with reality.

The somewhat sad part is that this lesson cannot be taught. No amount of explanation or teaching will bring one closer to the precious insight without the painful experiential part. This particular bit of wisdom can only be gained by face-planting into the unyielding, uncaring reality at full speed. Sometimes you just gotta hit the wall.

Pace layers of predictability

I’ve been talking about pace layering a lot, within the dandelions and elephants story and in other pieces. Here’s yet another way to apply this framing.

One thing that stood out to me was the notion of a common force that animates the formation of these layers. Here’s my guess: this force is the desire for predictability.

For example, in organizations, there is usually a larger group — often a dominant majority — of folks who seek a fairly predictable environment. They usually feel rather uncomfortable about the uncertainty of a problem space, and you can hear this discomfort in phrases like “just tell me what to do” or “we need clear timelines and deliverables”. There is nothing wrong with this stance. I find myself in this group at times, especially when I am in fallback. Another way to think of it is that we find ourselves here when our predictability footprint expands. This group tends to seek “one true shared mental model” and usually reacts with pain to disconfirming evidence, forming the slowest-moving predictability pace layer.

To manage that pain, there’s a much smaller group of those who are able to face unpredictability a bit more. These folks have a slightly higher tolerance for ambiguity and can experience polarities not as a series of harrowing swings back and forth, but rather as one nuanced system. They can play with multiple conflicting mental models and see disconfirming evidence as input, rather than pain inflicted. This ability is not without cost, and usually requires continuous support and scaffolding.

This smaller group forms a much thinner and faster-moving predictability pace layer on top of the slower layer. When an organization employs such folks effectively, they become instrumental in gently evolving the “one true shared mental model” of the larger group in a direction that is long-term productive for this organization. This is the stance that I myself enjoy the most, the one where I feel that resonant sense of aligning with Purpose.

Sometimes a team is lucky enough to have true pirates: a handful of people whose predictability footprint is so small that they are able to go where even the most ambiguity-tolerant people dare not tread. Disconfirming evidence is necessary sustenance to them. These folks can examine the load-bearing beliefs in ways that would terrify or at least profoundly upset most members of the organization. They can do deep ontological dives and come back unfazed, with insights that have the power to shape the future of the organization.

When employed effectively, these are the peeps that establish foundational frameworks within which the gentle evolution of the organization occurs. This is the fastest-moving predictability layer where I aspire to be, even though most of the time, I pop up to that layer only momentarily.

Of course, this is not some sort of normative layout of layers that every organization must possess. My guess is that each organization has its own layer configuration.

What’s important about them is that they are typically separate and insulated from each other – and for good reason. Exposing folks at the lowest layer to the more intense pace of the higher layer will rapidly trigger an allergic response. And giving them a sense of the top layer’s pace will seem like Lovecraftian horror. Boundaries tend to arise to prevent such shenanigans.

What’s most curious is that these pace layers rarely respect hierarchies and organizational structures. There could be leaders of large teams who crave predictability. There could be random junior team members who are surprisingly great at diving into the abyss of uncertainty and bringing back useful insights. The ability to tolerate unpredictability changes with one’s circumstances, while hierarchies tend to strive for permanence.

As a result, the insulation between layers tends to develop in haphazard and unproductive ways. Within predictability-craving organizations, those who are comfortable with uncertainty are deemed “troublemakers” and shunned or ignored. Conversely, folks who desire more predictability are labeled as “unimaginative” in places where experimentation and exploration are valued. Instead of recognizing mutual significance and collaborating, teams at different predictability pace layers resent each other’s differences.

In practice, this means that harnessing the full potential of this pace layering and using it to the advantage of the organization is very uncommon. I keep wondering what a team that is designed around predictability pace layers – rather than in spite of them – would even look like.

Given that overall levels of unpredictability around us seem to be ever-increasing, this might be an important challenge to take on. Perhaps you, my readers, will find an opportunity here.

AI Patterns and Breadboard

In my last post, I kept talking about AI patterns, but kept it a bit vague. I thought it might be useful to share a few examples to describe what I mean by “AI patterns” a bit more clearly. Once again, put your technical hats on.

🗺️ How to read AI pattern diagrams

As part of practicing the “build a thing to build the thing” principle, we implemented quite a few AI patterns in Breadboard. I will use the diagrams we generate from the boards (thank you Mermaid.js!) to illustrate the patterns. Here’s a quick guide on how to read the diagrams – and as a happy coincidence, a brief overview of Breadboard concepts.

🔵 The inquisitively blue parallelogram represents the “input” node. This is where the user’s input is requested by the pattern. Because most patterns ask for input first, it’s a good place to start when tracing the flow of the graph.

🟢 The cheerfully green hexagon is the “output” node, which provides the output to the user of the pattern. For many patterns, that’s the end point, the journey’s destination, while for a few – just a brief stopover.

🟡 The curiously yellow boxes are all nodes that do interesting stuff. For example, the “generateText” node invokes the LLM, while the “promptTemplate” node combines a template and various bits of text into a prompt that’s suitable for the LLM. Most of the time, you can guess what the function does by looking at its name.

🔴 The protectively red box with rounded corners is the “secrets” node, which has access to the user’s sensitive data. For most (all?) patterns, it is used to retrieve and pass the API Key to unlock the ability to invoke the large language model.

🍷 The variously-shaped wine-colored boxes are utility nodes: they are mostly here to serve other nodes by supplying important data and making it possible for graphs to be composable and useful. We’ll be mostly ignoring them here – but I will very likely be back to sing their song in the future.

Most nodes will have a two-line label. The first line is the type of the node, and the second is its unique identifier. Because there can be multiple nodes of the same type, we need an id to distinguish between them.

Just like with literal breadboards, nodes are connected with wires. Wires are represented by lines with arrows. The direction of the arrow on each wire represents the flow of information. So, when the graph shows this:

… it means that the information flows from the “promptTemplate” node to the “generateText” node.

Each wire is labeled. All labels have the same consistent “out->in” format. A good way to think of it is that every node may have multiple inputs and outputs. The wires connect these inputs and outputs.

In the example above, the output named “prompt” of the “promptTemplate” node is wired to the input named “text” of the “generateText” node. Most of the time, it’s not difficult to infer the purpose of the wire. For instance, the wire above feeds the prompt produced by the “promptTemplate” node into the “generateText” node as its “text” input. If you are curious about all the ins and outs of nodes (pun intended!), check out this guide on GitHub.
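
To make the format concrete, here’s a toy snippet – plain JavaScript, not actual Breadboard code – that splits a wire spec into its output and input names:

// Toy illustration of the "out->in" wire label format described above.
// This is not Breadboard's actual implementation, just a sketch.
const parseWire = (spec) => {
  const [out, input] = spec.split("->");
  return { out, in: input };
};

parseWire("prompt->text");
// => { out: "prompt", in: "text" }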

Some wires will have a circle at the end of them, rather than an arrow. These are constant wires. There’s a lot more to them, but for now, a good way to think of them is that they are here to specify constant values. Like in the diagram below, the “template” utility node supplies a constant “template” input to the “promptTemplate” node. 

With this quick Breadboard refresher out of the way, we’re ready to dig into the actual patterns. To keep this post from becoming a book, I’ll give you only three examples.

🧮 The Calculator pattern

Let’s start with the widely used Calculator pattern (you can also see it here on GitHub, implemented in Breadboard):

The structure of this pattern is very simple: user input goes into the “promptTemplate” node, which produces a prompt that goes into the “generateText” node, the output of which is fed to the “runJavascript” node, and the result is returned as output.

As it often happens with AI patterns, the magic is in the contents of the prompt template. In this pattern, the LLM is used to find solutions to mathematical problems in a very clever way. 

As you may have heard, LLMs aren’t so great at math. So instead of approaching the problem head-on, we lean on the LLM’s strength: we convert a math problem into a language problem.

In the Calculator pattern, we ask the LLM to do what it does best: generate text. We ask it to write code that solves a math problem, rather than try to find the answer to the question. Here’s a prompt to do that:

Translate the math problem below into a JavaScript function named `compute` that can be executed to provide the answer to the problem.
Math Problem: {{question}}
Solution:

Because writing code is a language problem, LLMs are pretty good at it. So, with a high degree of consistency, the output of the LLM will be a function that, when run, produces the right answer. Leave computation to the old-style computers. Let LLMs write code that will be computed.

For instance, when we replace the {{question}} placeholder with:

What is the square root of the perimeter of a circle with a diameter of 5?

The LLM will happily produce this function:

function compute() {
  const diameter = 5;
  const radius = diameter / 2;
  const perimeter = 2 * Math.PI * radius;
  return Math.sqrt(perimeter);
}

Which, when executed, will give us the correct answer of `3.963327297606011`. If you ask any conversational agent today a math question and it surprises you with an accurate answer, chances are that some variant of the Calculator pattern is being employed.
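
If it helps to see the shape of the whole pattern in code, here’s a minimal sketch in plain JavaScript. This is not the Breadboard implementation, and `generateText` is a hypothetical stand-in for whatever LLM API is at hand:

// A minimal sketch of the Calculator pattern. `generateText` is a
// hypothetical stand-in for an LLM call.
async function calculate(question, generateText) {
  // Turn the math problem into a language problem.
  const prompt =
    "Translate the math problem below into a JavaScript function named " +
    "`compute` that can be executed to provide the answer to the problem.\n" +
    `Math Problem: ${question}\nSolution:`;
  // Ask the LLM to write code rather than to do arithmetic.
  const code = await generateText(prompt);
  // Leave the computing to the old-style computer. (A real product would
  // sandbox the generated code; `Function` is the quick-and-dirty way.)
  return new Function(`${code}; return compute();`)();
}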

📰 The Summarizer pattern

Another common pattern builds on the LLM’s strength of narrating information, even when presented with bits of random content. I experimented with this ability earlier this year, and here’s an implementation of the pattern in Breadboard (also here on GitHub):

When we look at the structure above, we can see that user input splits into two paths.

The first route is circuitous. It takes us through the “urlTemplate” node that creates a valid URL (it’s a Google News RSS feed with the topic as the query), which is then fed to the “fetch” node. The “fetch” node grabs the contents of this URL, and sends it to the “xmlToJson” and “jsonata” nodes that munge RSS into a list of headlines.

The two routes meet up at the “promptTemplate” node, then predictably move on to the “generateText” node, and, finally, the result is presented to the user.

The concept is fairly straightforward: give the LLM a topic and a few sentences, and request a summary. If – as is the case in the graph above – we are summarizing news headlines, a prompt will look something like this:

Use the news headlines below to write a few sentences to summarize the latest news on this topic:
## Topic: {{topic}}
## Headlines: {{headlines}}
## Summary:

In this prompt, we have two placeholders: the {{topic}}, which is where the subject of summarization will go, and the {{headlines}}, where we will plug in the various headlines from a news source (Google News).

The key distinction between just asking an LLM a question and using this pattern is that we’re not relying on the LLM’s parametric memory to contain the answer. We’re not asking it to find the answer for us. Instead, we are only employing its narrative-making abilities, supplying the raw information in the prompt.

So if, for example, I put “breadboards” into the {{topic}} placeholder, and the following list of headlines from Google News (just the first 20 for this particular board) into the {{headlines}} placeholder:

Thermochromic Treatment Keeps Solderless Breadboards Smokeless - Hackaday
Jumper Wires For Electronic Components - IndiaTimes
10 hostess hacks to make your food look better than it is - Colorado Springs Gazette
Gabriel's Cyberdeck Red V2 Packs in a LattePanda Delta 3, Analog Discovery 2, HackRF One, and More - Hackster.io
How to Measure Voltage on a Breadboard - MUO - MakeUseOf
The Ultimate Breadboard Platform? - Hackster.io
Building Circuits Flexibly - Hackaday
Lewiston Art Festival: A 'dinosaur' of woodwork - Niagara Gazette
Podcast 220: Transparent Ice, Fake Aliens, And Bendy Breadboards ... - Hackaday
Flexboard: a flexible breadboard for flexible and rapid prototyping of ... - Tech Explorist
Derek Fogt | Communities | pinecountynews.com - pinecitymn.com
MARNI JAMESON: Compensate for humdrum food with stylish ... - Sarasota Herald-Tribune
Build HMI screens with MicroLayout for your Meadow Apps - Hackster.io
Tidy Breadboard Uses Banana Bread - Hackaday
Old 6809 Computer Lives Again On Breadboards - Hackaday
My Favorite Things: Hardware Hacking and Reverse Engineering - Security Boulevard
Luna in Cocoa Beach offers serves thoughtful modern Italian food - Florida Today
Teaching Method Increases Students' Interest in Programming and ... - University of Arkansas Newswire
From A 6502 Breadboard Computer To Lode Runner And Beyond - Hackaday
How to Breadboard Electronics Projects with Raspberry Pi Pico - Tom's Hardware

… we will get this output from an LLM:

The latest news on breadboards include a new thermochromic treatment
that keeps solderless breadboards smokeless, a flexible breadboard for
flexible and rapid prototyping, and a new method for teaching students
programming and electronics.

For the quality of the junk we fed it, it ain’t half bad!
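
Here’s the same flow as a rough sketch in plain JavaScript – again with a hypothetical `generateText` standing in for the LLM call, and a crude regex standing in for the “xmlToJson” and “jsonata” munging:

// A rough sketch of the Summarizer pattern. `generateText` is a
// hypothetical LLM call; the regex crudely extracts headlines from RSS.
async function summarizeNews(topic, generateText) {
  const url =
    "https://news.google.com/rss/search?q=" + encodeURIComponent(topic);
  const rss = await fetch(url).then((response) => response.text());
  const headlines = [...rss.matchAll(/<title>(.*?)<\/title>/g)]
    .map((match) => match[1])
    .slice(0, 20)
    .join("\n");
  const prompt =
    "Use the news headlines below to write a few sentences to summarize " +
    "the latest news on this topic:\n" +
    `## Topic: ${topic}\n## Headlines: ${headlines}\n## Summary:`;
  return generateText(prompt);
}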

The Summarizer pattern has a much more popular cousin named Retrieval-augmented Generation (RAG). RAG is all the rage these days, and everyone wants to have one. If we peek under the covers, we’ll recognize the Summarizer pattern, combined with another neat LLM capability – semantic embeddings – into the Voltron of patterns.

🔁 The ReAct pattern

I would be remiss not to bring up ReAct when talking about AI patterns. This pattern ushered in a new mini-era of LLM applications, a breakthrough that redefined what LLMs can do.

The ReAct pattern is different from the ones mentioned earlier, because it is cyclical: rather than asking an LLM once, it may do so several times, repeating until the problem is solved.

ReAct introduces this really interesting idea that we can induce chain-of-thought reasoning capabilities in LLMs if we structure our interaction with them in a certain way. In this chain of thought, the LLM interacts with the outside environment, suggesting actions to take and then reasoning about the outcomes of these actions.

I’ve talked about LLM-based reasoning a few times before, so this concept shouldn’t be entirely novel to my readers.

In ReAct, the key trick is in establishing a predictable beat of reasoning within the prompt:

1️⃣ First comes the Question – the question that the user asks
2️⃣ Then, comes the Thought – the opportunity for an LLM to reason about what to do next
3️⃣ After Thought is Action – LLM’s suggested action to take
4️⃣ Finally, the Observation – the outcome of the action, supplied by the tool 

Steps 2️⃣, 3️⃣, and 4️⃣ keep repeating until the answer is found.

The LLM is only allowed to pipe in on steps 2️⃣ and 3️⃣: that is, it can only produce the “Thought” and “Action” parts of the overall sequence.

Step 1️⃣ is provided by the user, and the observation in step 4️⃣ is supplied as the outcome of whatever action the LLM suggested to take.

As the steps repeat, all of these steps are being added to the overall prompt, allowing the LLM to see the history of the interaction and reason about it. In this way, and unlike in the Calculator and Summarizer patterns, the ReAct pattern simulates memory: with each invocation, the LLM can see how it acted in the past.

It’s a pretty convoluted contraption, as you can see from the Breadboard implementation (here on GitHub):

However, the presence of memory and the repeating cadence of steps guide the LLM, allowing it to answer much more complex and interesting questions. The answers are no longer produced in one go. Instead, the LLM walks toward the answer over multiple iterations.

Let’s see if we can make it a bit more concrete by breaking down an actual transcript of a ReAct pattern solving a problem.

First, we supply the question:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?

In response, the LLM produces the following Thought and Action steps:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon

The LLM is asking to search for the distance between the Earth and the Moon. Notice that it is not trying to make up the answer. It is merely suggesting the next action to take. Taking over control from the LLM, the ReAct pattern invokes search and produces the following Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km

With the observation in hand, the ReAct pattern hands control back to the LLM, and gets this response:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)

Great! Now the LLM wants us to do math. Fine. The ReAct machinery rattles and burrs, and produces the next Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620

The ReAct pattern yields control back to the LLM, and it triumphantly declares victory:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620
2️⃣ Thought: I now know the final answer
   Final Answer: 620

Great job, model. You did it.

The ReAct pattern also introduces, almost as an afterthought, the concept of tools. Since the LLM is asked to suggest an action, it seems useful to specify the kinds of tools the LLM has at its disposal.

In the transcript above, the “search” and “math” tools were used. For other kinds of problems, there might be a need for other kinds of tools. 

This is where the most valuable aspect of the ReAct pattern resides: if we can specify our own tools, we can make LLMs do useful things. For example, I could hand it a “calendar” tool, an “email” tool, and a list of my friends and ask it to schedule a lunch for us. Or I could turn it into a menu-ordering system, where it would rely on menu-understanding tools to take customer orders.

The pattern stays the same, but the tools change. With the ReAct pattern, we can build actual helpful agents. If you’ve been watching the LLM space, you have no doubt noticed a lot of activity around this notion.
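
To make the mechanics a bit more tangible, here’s a toy sketch of the ReAct loop with pluggable tools. As before, `generateText` is a hypothetical LLM call, and the canned demo tools merely mirror the transcript above – a real implementation would call actual search and math services:

// A toy sketch of the ReAct loop. Stopping the LLM at "Observation:"
// ensures it only produces the Thought and Action parts of each beat.
async function react(question, generateText, tools) {
  let prompt = `Question: ${question}\n`;
  for (let turn = 0; turn < 10; turn++) {
    const completion = await generateText(prompt, { stop: ["Observation:"] });
    prompt += completion; // the growing prompt is what simulates memory
    const final = completion.match(/Final Answer: (.*)/);
    if (final) return final[1];
    // The pattern takes over: run the suggested tool, append the Observation.
    const action = completion.match(/Action: (\w+)/)?.[1];
    const input = completion.match(/Action Input: (.*)/)?.[1];
    const observation = await tools[action](input);
    prompt += `Observation: ${observation}\n`;
  }
  throw new Error("No final answer after too many turns.");
}

// Canned demo tools that mirror the transcript above:
const demoTools = {
  search: async () => "384,400 km",
  math: async (expr) =>
    String(new Function(`return ${expr.replace("sqrt", "Math.sqrt")}`)()),
};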

🍞 Patterns with Breadboard

These are just a few examples of interesting patterns that emerged in the generative AI space in the last few months. Honestly, the pace of pattern discovery has been nuts. So far, I see no signs of it slowing down. What I’d like to do with Breadboard is to help make these patterns more legible to everyone – so that more people can play with them, create their own, and explore this fascinating space together.

My intuition is that when we lower the barrier to entry to this process of discovery and make it easy to tell whether the new pattern is good or not, we have a better chance of exploring the space more thoroughly and realizing the full potential of large language models.

Composing graphs with Breadboard

This post is more technical and more rambling than my usual repertoire. It’s a deep dive into where my mind is today, and it’s mostly about technical design. Think of it as a sample of interesting problems that I am puzzling over these days.

Let’s start with graphs. In the past, I’ve mentioned this interesting tension between graphs and trees: how our human tendency to organize things into tree-like structures (hierarchies and other container models) is at odds with the fluid, interconnected nature of the world around us. I framed it as: “every graph wants to become a tree, and still secretly wants to remain a graph”, and described it in the context of developer frameworks.

So naturally, when presented with an opportunity to write a developer library, I decided to use the graph structure as its core. The result is Breadboard. It’s still early on. I like to say that we currently have a toddler, and there’s much left to do to get the library to adulthood. However, it seems useful to start sharing what I’ve learned so far and the design decisions I landed on.

🐱 Whyyyy

If you even occasionally scan these chronicles of my learnings, you will undoubtedly know that I am fascinated by the potential of applying large language models (LLMs) as a technology and the kinds of new interesting spaces they could open.

As a result, I invested quite a bit of time tinkering with the models, trying to make them jump through various hoops and do things – just to get a sense of what it is that they are truly capable of. Judging from the endless updates I hear from my friends and colleagues, so is everyone else.

New interesting patterns of applying LLMs seem to arise almost daily – and that’s pretty exciting. What was less exciting for me was the distinct lack of tools that help us discover these patterns. Most of the frameworks that are rising to relative prominence appear to focus on capturing the newly discovered patterns and making them more accessible. This is great! But what about a framework that facilitates tinkering with LLMs to find new patterns?

I started Breadboard with two objectives in mind:

  1. Make creating new generative AI patterns accessible and fun
  2. Enable easy sharing, remixing, composition, and reuse of these patterns.

My hope is that Breadboard helps accelerate the pace with which new interesting and useful patterns for applying generative AI are found. Because honestly, it feels like we have barely scratched the surface. It would be super-sad if the current local maximum of chatbots were as far as we get with this cycle of AI innovation.

🍞 The metaphor

The first thing I wanted to get right was the mental model with which a developer might approach the library. 

Graphs are typically hard to describe. I am in awe of whoever came up with the term “Web” to describe the entire tangle of hyperlinked documents. Kudos.

As you may remember, I am also very interested in the role makers play in the generative AI space. That’s how breadboards came to mind: the solderless construction bases for prototyping electronic circuits. Breadboards are a perfect maker’s tool. They are easy to put together and easy to take apart, to change the layout and tinker with various parts.

Lucky for me, breadboards are also graphs: the electronic circuits they carry are directed graphs, where each electronic component is a node in the graph and the jump wires that connect them are edges. By placing different nodes on the board and wiring them in various ways, we get different kinds of prototypes.

This is exactly what I was looking for: a one-word name for the library that comes with the mental model for what it does. As an additional benefit, “breadboard” selects for makers: if you know and love breadboards (or even just the idea of breadboards), you will likely look forward to playing with this library.

🧩 The composition system

Another piece of the puzzle was composition. Over the last decade, I ended up studying composition and designing composable systems quite extensively. In Breadboard, I wanted to lay down a sound foundation for composition.

There are three different ways to compose things in Breadboard: 🧩 nodes, 🍱 kits, and 🎛️ boards.

🧩 Nodes are the most obvious unit of composition: we can place nodes on a breadboard and wire them together. At this layer of composition, makers compose nodes to make their prototypes. Once they have a neat prototype, makers can share the board that contains the prototype. A shared board is something that anyone can pull down from a URL and start playing with. They can clone it, tweak it, and share it again.

To get the node composition right, we need a set of nodes that allow us to build something useful and interesting. While still at an early stage, it is my intent to arrive at a starter kit of sorts: a relatively small set of general-purpose nodes that enable making all kinds of cool things.

🍱 We don’t have to stop with just one kit. Kits are another unit of composition. Makers are able to create and group interesting nodes into kits – and publish them to share with others. For instance, a project or a company might want to wrap their interesting services as nodes and publish them as a kit, allowing any maker to grab those nodes and start using them in their boards.

A maker can also just build a kit for themselves, and use it in their own prototyping only. While kits do not need to be published, boards that use unpublished kits can’t be shared with others – or at least not in any useful way.

🎛️ Boards themselves are also units of composition. Makers can include boards of others into their board, turning an included board into a sort of virtual node. The board inclusion feature is similar to a hyperlink: just like on the Web, including a board simply links from one board to another, rather than subsuming it. Such loose coupling unlocks the full potential for interdependent collaboration, and I fully expect the common dependency management practices to be applicable.

In addition to inclusion, boards can have slots. Slots are another way to add modularity to boards. When I build a board, I can leave it incomplete by specifying one or more places – “slots” – where someone else can include their boards. This is a useful trick that software developers call “dependency injection”. For instance, if I developed a generic pattern to invoke various tools with generative AI,  I can leave a slot for these tools. When other makers reuse my board, they can insert their own sets of tools into this slot without having to modify my board.
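
To illustrate, here’s roughly what inclusion and slots might look like in a serialized board. The shapes below are illustrative, not the exact Breadboard format – see the project on GitHub for the real thing:

// Illustrative (not exact) serialized board showing inclusion and slots.
const myBoard = {
  nodes: [
    { id: "input-1", type: "input" },
    // Inclusion links to another maker's board by URL, like a hyperlink...
    { id: "agent", type: "include", $ref: "https://example.com/agent.json" },
    // ...while a slot leaves a labeled hole for someone else's board.
    { id: "tools", type: "slot", slot: "tools" },
    { id: "output-1", type: "output" },
  ],
  edges: [
    { from: "input-1", to: "agent", out: "text", in: "text" },
    { from: "tools", to: "agent", out: "board", in: "tools" },
    { from: "agent", to: "output-1", out: "text", in: "text" },
  ],
};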

🤖 The traversal machine

It took me a little bit of time to settle on what a “node” is in Breadboard and how these nodes get traversed in the graph. I ended up going with an actor-model-inspired design, leaving lots of room to explore concurrency and distributed processing in the future. For the moment, however, I am primarily guided by the motivation to make Breadboard graphs easy to understand.

One capability I wanted to enable was building graphs that have cycles within them. Pretty much anything interesting contains feedback loops, so Breadboard supports directed graphs with cycles out of the box. Calculating the topology of such graphs is an NP-complete problem, but lucky for us, traversing them is fairly trivial. After all, most computer programs are directed graphs with cycles.

At the core of the traversal machine is this concept: a well-behaving node is a pure function. More precisely, as close to a pure function as we can get. Since makers can create their own nodes, Breadboard can’t guarantee any of that, but I’d like to encourage it.

Since pure functions don’t contain state, state needs to be managed outside of the function. Breadboard relies on wires as the method to manage state. Wires are the way both data and control flow.

This sets up the basics of the traversal logic:

  • Every node has inputs and outputs. The inputs are the wires running into the node, and outputs are wires running out. A node consumes inputs and provides outputs.
  • The node – or technically the function that the node represents – is only run when all inputs have been provided to it by the nodes that ran before this node. Put differently, a node will not run if some of its inputs weren’t provided.
  • Graph traversal starts with running nodes that don’t have any inputs wired into them.

That’s about it.

To geek out on this a bit more, I went with a mailbox-like setup where wires are effectively one-time variables that store data. The data in this variable is written by a node output and read by a node input. A super-cool effect of such a setup is that the state of the graph is captured entirely in the wires, which means that Breadboard can pause and resume traversal of the graph by saving what’s currently stored in the wires.
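
Here’s a toy version of that machine, to make the mailbox idea concrete – none of this is the actual Breadboard source, just a sketch of the traversal rules above:

// Toy traversal machine: wires act as one-time mailboxes, and a node runs
// once all of its input mailboxes are full.
function traverse(nodes, wires) {
  const mailboxes = new Map(); // wire -> value waiting to be read
  const inputsOf = (node) => wires.filter((wire) => wire.to === node.id);
  const outputsOf = (node) => wires.filter((wire) => wire.from === node.id);
  const ready = (node) => inputsOf(node).every((wire) => mailboxes.has(wire));
  // Traversal starts with the nodes that have no inputs wired into them.
  const queue = nodes.filter((node) => inputsOf(node).length === 0);
  while (queue.length > 0) {
    const node = queue.shift();
    // Read (and consume) the node's inputs from the mailboxes.
    const inputs = {};
    for (const wire of inputsOf(node)) {
      inputs[wire.in] = mailboxes.get(wire);
      mailboxes.delete(wire);
    }
    const outputs = node.run(inputs); // a well-behaved node is a pure function
    // Deliver outputs downstream and wake any nodes that just became ready.
    for (const wire of outputsOf(node)) {
      mailboxes.set(wire, outputs[wire.out]);
      const next = nodes.find((candidate) => candidate.id === wire.to);
      if (ready(next)) queue.push(next);
    }
  }
  // Pause/resume falls out for free: the entire graph state is `mailboxes`.
}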

🚧 What is next

Looking ahead, I am quite optimistic about Breadboard. We already have a small seedling of a team developing. In the next few weeks, I’ll keep making interesting patterns with it to keep informing the development of the library. Build a thing to build the thing, right?

Once the fundamentals settle a bit, we can start thinking about graduating Breadboard into early adulthood, releasing the v1. Hopefully, at that point, we will have enough of an onramp for you and other makers to start actively using it in your prototyping adventures.

If you feel excited about this idea and don’t want to wait until then, please check out the list of open issues on GitHub and join the conversation. Be prepared to eat unbaked cookies and occasionally find bits of construction debris in them – and help make Breadboard better.

The ODA and OODA stances

After writing a bit about principles and tensions, I thought I’d switch tracks and go with something slightly different. Philosophy meets strategy in this story about how we see ourselves in relation to the world around us.

I’d like to start by making a distinction between two different ways in which we place ourselves in our environment. The first one we will call the ODA stance and we’ll name the second the OODA stance. I’ll explain the names shortly.

📈 The ODA stance

The ODA stance is fairly common and broadly applied. In this stance, we see ourselves situated in our environment and able to perceive our environment directly. When we act on our environment, we observe changes, make decisions, and act again. This cycle – let’s call it the ODA loop (Observe, Decide, Act) – is our way to navigate the environment.

In many situations, the ODA loop works surprisingly well. We observe customer behavior, we decide on the change we need to make to our product to adapt to this behavior, and then we make that change. Because we can see the environment around us (the Observe part), we can make plans, create OKRs, make Gantt charts, and proceed forth with confidence.

When in the ODA stance, it is all about making the right decisions. To make sure our actions yield the outcomes that we desire, our decisions have to be correct. If our organization’s narratives revolve around the effectiveness of decision-making, it’s very likely that we’re in the ODA stance.

Because of that, organizations in the ODA stance are usually program-heavy. They trend toward rails and cranks. Rails are processes and practices that ensure that only correct decisions are made. Cranks are tools that make sure that following the right processes and practices is as easy as turning a crank. When I am in the ODA stance, I have this firm belief that if I put down enough rails and create ergonomic enough cranks, I can solve any problem at scale.

The ODA stance starts to feel insufficient – like it’s missing some ingredient – when we get blindsided by an unanticipated change. When our plans and projections get swept away by a disruptive innovation or a newly discovered challenge, we begin to struggle.

Something feels off. It often feels like someone or something is deceiving us. Organizations tend to react to this feeling in various ways. Sometimes, the guilty parties are identified and dismissed. Sometimes, the external entities are deemed antagonistic and troops are rallied to defend the fort. Very often, the disruption is dismissed as a blip, an outlier that’s not worth paying attention to. In most cases, the whole ordeal is forgotten. Until it happens again.

Technological innovation really messes with the ODA stance. Every new change brings chaos, the uncomfortable time when things stop making sense. The best thing we can do, it seems, is to hang on for our dear lives and wait out the turbulence. Unfortunately, the frequency of storms only seems to increase.

Unsurprisingly, ODA-stance organizations experiencing the storms intuitively lean into even more rails and cranks. They strive to create better decision-making processes, document practices more precisely, and enforce more stringently that they are followed.

🌀 The OODA stance

After getting bruised by the turbulence a few too many times, a new stance tends to emerge. Speaking from experience, we rarely grasp the idea and adopt this stance immediately. It takes actual experiential scars to arrive at it.

The OODA stance discards the idea of perceiving the environment directly. Instead, we insert this concept of a mental model between our observations and our decisions. This mental model is what, back in the ODA stance, we mistook for the environment.

In this stance, observations stream into our model, and continuously update this model, causing us to orient – to change our understanding of our environment. When we see an observation, we orient, then make decisions, then act. Helpfully, a fellow named John Boyd already coined the term for this cycle: the OODA loop.

The addition of an extra “O” to the ODA loop from the previous stance implies a more humble posture toward the environment. We do not know what the environment looks like. We may never have a full grasp of it. We only have observations as clues. We try to cobble these observations as best we can into a model of the environment, and then try to guess what the environment will do next by using our model to make these predictions.

It’s a productive kind of humility. Instead of beginner’s humility, where we are too timid to make predictions because we are alarmed by our own ignorance, with productive humility we admit that the environment is likely too complex for us to grok it fully, yet we have tools to navigate this complexity.

Organizations operating in the OODA stance focus on bettering their mental models. They recognize that the key to making good decisions stems from their ability to maintain mental models that result in good predictions.

This recognition usually leads to three types of investments:

1️⃣ Sensing. The OODA stance craves observation. Productive humility dictates that external sensing is paramount. We put in a lot of effort to ensure that information from outside of the organization (and the inside!) flows as freely and as abundantly as possible. Sensing is not a responsibility limited to a small group of folks with the right job titles, but a daily practice for everyone.

Organizations with the OODA stance encourage the culture of reading and writing, of seeking out novel trends. There are spaces for robust conversations about these trends and tools to plug into the flow easily. 

2️⃣ Flexibility. When we’re in the OODA stance, we want to ensure that we’re able to orient flexibly. No matter how meticulously maintained and useful so far, mental models are only based on past observations. Any new disconfirming evidence must be treated as a signal that the model might need an update. 

Organizations in the OODA stance have a healthy fear of seeing it all wrong. They build tripwires and warning lights into their processes, and treat challenges to established mental models as gifts, rather than annoyances. Coming from this organization’s leader, “Prove me wrong” is not a threat, but a plea.

In contrast with the ODA-stance organizations, it’s not the speed of the decision-making, but rather the agility of adjusting course that is valued and developed.

3️⃣ Shared mental model space. Organizations that favor the OODA stance nurture shared mental models. A funny thing: when we adopt the OODA stance, we recognize that everyone has their own mental model that they use to make predictions. Lived experiences, culture and upbringing, all contribute to a massive kaleidoscope of mental models of individuals across the organization. When looking at the same environment, we are seeing slightly different things, often without knowing it.

Contemplating this notion can feel overwhelming. The whole post-modernist movement might have arisen out of that. To get past the “is there really a real reality?” line of inquiry, we lean into productive humility. It is clear that a collection of individuals with a greater diversity of perspectives will likely hold the parts of a larger mental model amongst them. Everyone has a piece of the puzzle.

The trick is to assemble this puzzle. In the OODA stance, we expressly cultivate spaces where people can share their experiences, and have confidence that their perspective will be listened to and incorporated into the bigger picture. Collectively, we learn how to overcome the discomfort of others seeing something entirely different from what is plainly obvious to us.

☯️ Compare and contrast

Both stances have their utility. The OODA stance takes a lot more self-discipline and accumulated experience – the scars! – to acquire and hold. Because of that, the ODA stance is the default choice for most organizations.

Particularly in environments that are very familiar, don’t change very much, or change in monotone ways (like seasons), the ODA stance can work great. A good way to think of it is that when we don’t need to Orient ourselves a lot within an environment, we can simply omit the middle “O” from the OODA loop.

The ODA stance brings industrial scale. I might even argue that one can’t have industrial scale without leaning toward the ODA stance. We can contemplate large-scale endeavors only when our model of the environment is so sound and well-embedded across the organization that we don’t even think about it. To make a well-functioning bureaucracy, one needs a well-established system of governance.

On the other hand, in novel environments and environments that change rapidly, where the existing mental models keep failing to predict useful outcomes, the OODA stance becomes necessary. The ODA stance is simply blind to all the novelty, experiencing it as utter chaos. This is when it becomes necessary to decouple our mental models from the environment – and embrace the weird and wild ride of the OODA loop.

One miracle at a time

Technological innovation can be viewed as a network of miracles: breakthroughs that make possible something that wasn’t possible before. Some miracles are major and others are minor.

It is the job of a technology strategist to identify the miracles that will happen – or will need to happen – to make forward progress on the idea they have in mind. Miracles are fickle. Their appearance is not guaranteed. Something that looks like it’s just within reach can stay that way for decades. Betting on a miracle is never safe – otherwise, they wouldn’t be called miracles.

It is in this context that the tension between two forces arises.

On one side of this tension, there’s a desire to increase the likelihood of making forward progress. It usually shows up as wanting to pick the safer, more predictable next steps, and reduce the impact of the miracle not happening.

On the other side, there’s the wish to move forward faster. This is where we see the impatience and the urge to dream big, and reach for the stars – to go all in on bringing the miracle forth.

Both extremes lead to unproductive outcomes. Making only safe bets means not being different from the rest of the contenders – and being different is the name of the game in technological innovation. At the same time, an unhealthy disregard for the impossible is a great recipe for losing everything and gaining nothing.

To resolve this tension, I’ve learned to apply the “one miracle at a time” principle. The gist of this principle is that we pick the one challenging part of the idea we’d like to realize and find ways to de-risk everything else. We minimize the number of miracles necessary for success down to the one we care about.

This doesn’t mean we give up on the larger vision. A good way to think about it is as finding the first stepping stone to jump to. Sometimes it’s helpful to imagine lining up all the miracles into a series of stepping stones. Which one will we start with?

Applying this principle in practice might look like this.

1️⃣ Start with an inventory of miracles. Have a rigorous miracle-count conversation. What’s relatively easy and what’s difficult? How hard are the difficult bits? Which ones feel like miracles?

2️⃣ With the miracles rounded up, find the ones that are important. There will always be miracles that feel more like dependencies compared to those at the core of the vision. Consider taking those off the table. Apply this rule of thumb: the stronger your conviction around the vision, the less you should care about the dependency miracles.

3️⃣ Pick one. Ensure that everyone has the same one miracle in mind. Very often, especially in teams that are running ahead of the pack, we will find a large diversity in what peeps are passionate about. This diversity is great for ideation, but when we’re choosing the miracle, it can be counterproductive.

A common trend, especially in organizational cultures biased toward being agreeable, is that a whole portfolio of pet miracles gets brought along with the chosen one: everyone says “okay, let’s focus on that one” and thinks “… and mine, too”.

Choosing the next miracle is a convergent process. It requires difficult and honest conversations. In my experience, this is the hardest step.

Don’t be surprised if the chosen miracle is a minor dependency. This likely means that the vision is vague and needs further refinement – and there are few better ways to refine it than starting at the edges of the problem.

4️⃣ Once the miracle is picked, work to reduce uncertainty everywhere else. Formulate ideas in ways that are complementary to what is currently possible. Replace all those dependency miracles with withered tech – something that’s well-established and known to be reliable. Align with projects and priorities that are already on everyone’s mind.

There will be worries and well-argued points about how choosing today’s technologies to build tomorrow’s is fraught with peril. They will be very tempting. I’ve gotten snared by them many times. I’ve also found that the most productive way is to embrace the paradox: accept these arguments as true – and yet recognize that each new miracle cuts down the odds of our work mattering when tomorrow finally arrives.
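
Here’s a back-of-the-envelope way to see why, with numbers that are entirely made up for illustration: if each miracle is an independent bet, the odds multiply, and every extra miracle compounds the risk.

```python
# Toy miracle-counting arithmetic; all probabilities are invented.

def odds_of_success(miracle_odds):
    """Chance that every independent miracle on the list comes through."""
    result = 1.0
    for p in miracle_odds:
        result *= p
    return result

# Betting on three concurrent miracles, each a coin toss:
print(odds_of_success([0.5, 0.5, 0.5]))    # 0.125 -- one chance in eight

# One miracle at a time: one coin toss, with the dependency miracles
# replaced by withered tech that is near-certain to work:
print(odds_of_success([0.5, 0.95, 0.95]))  # ~0.45 -- almost a fair coin toss
```

The exact numbers don’t matter; the multiplication does.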

With miracles, there’s always a coin toss somewhere. Some things we thought were certain and simple prove otherwise. And things that we decided weren’t worth paying attention to boomerang at our backs. By picking one miracle at a time, we can reduce the total surface of uncertainty and be principled about where we focus our attention.

Build a thing to build the thing

When building new products, there’s always a weird tension between making something “real” and production-ready, and spending a bit of time just experimenting with all of the various ideas of what this “real” might look like.

This tension is less noticeable when we actually know what we want to build. If I am building a new Web rendering engine, I literally have all the specs – and a few well-established implementations for reference. However, when the actual product is a bit of an unknown, the tension starts to surge.

There are typically two forces that create this tension. First, there’s the desire to ship expeditiously and engage the customers. This might come from the intention to maximize our chances of getting it right, but it could also just be a business necessity.

Then, there’s another force – the desire to deliver something that truly resonates with the customers. It’s bizarre how the underlying animating intention could be the same “let’s get it right”, but the approach is different: instead of jumping in as soon as possible, we try to first figure out what “right” looks like.

My intuition is that the two approaches come from different evaluations of the “chances budget”: how many chances does this particular idea have before we blow it? Folks who see a large chances budget will veer toward “let’s just ship something and iterate (or survive/get promoted, etc.)”. Folks who see only a handful of chances in the budget will tend toward “let’s first get our ducks in a row”.

Depending on the organization, there will be a pull toward one extreme or the other – and sometimes a soup of both at the same time. There might be peeps jumping to ship whatever remotely looks like a feature and spending marketing dollars on boosting its visibility. There might also be people trying to carefully orchestrate large-scale “spherical cow” ecosystems that can only practically exist in a caffeine-induced slide deck.

📐 The Principle

In my experience, the trick to resolve this tension is the practice I call “build a thing to build the thing”. It’s a pretty simple trick, please don’t get too excited. The hard part is mostly in knowing how to apply it.

When we decide to “build a thing to build the thing”, we agree to focus first on building something that is immediately adjacent to what we actually want to build. In the developer experience field, this adjacency most commonly looks like this: “Let’s build something useful with our product, and see what we learn from it”.

If we’re building a new library or framework, let’s build something that uses it – so that we learn how to improve our library or framework. Build a thing to build the thing.

“Earnest effort” is an important ingredient. If this is just some half-hearted dabbling to check the box, the trick will not work.

Close to a decade ago, when I was working on the Chrome Web Platform team, we wanted to get a better sense of whether or not the APIs and primitives we were shipping were actually helping developers make better mobile Web applications.

So a few of us locked ourselves in a room and spent a few weeks actually trying to build a clone of a well-known mobile app as a Web app, powered by the latest and greatest bits that we were shipping or about to ship. Our hypothesis was a negative proof: if we – the people who actually built the platform bits – couldn’t do it, then nobody could.

We also adopted the “live as our customer” posture and used only the tools that were available outside of Google.

Every week, we wrote up a report of what we learned. Where the friction was, where the seemingly easy tasks turned into albatrosses. Where primitives and APIs that we thought were useful actually weren’t.

We failed gloriously. I remember showing the Web app to our VP and the first load taking hundreds of seconds on a reasonably crappy phone. We tried hard. We rummaged in all the bags of tricks. We profiled JS. We profiled C++. We messed with V8 and Chromium code trying to make it less slow. In one particularly unwise moment, I wrote code to construct DOM with C++.

At the end of the adventure, we had incontrovertible proof: if we wanted mobile Web apps to be on par with their native equivalents, we had to do something different as a team.

This exercise shifted how my colleagues and I thought about what’s important (and not important). It triggered a change in priorities for the larger organization. I’ll spare you the gory details of how it all went down. Suffice it to say, a big chunk of the current narrative about Web performance was shaped by the insights we gained from our adventure.

Sometimes, building a thing to build the thing is clarifying like that. Sometimes, it just creates more questions. Whatever the outcome, there will be precious insights waiting to be harvested.

💔 The hard part

The greatest challenge of adhering to the “build a thing to build the thing” principle is in our own ability to be honest with ourselves.

Here are some failure cases that I’ve seen. Use them as tripwires if you decide to apply this principle.

1️⃣ The DevRel trap. During the exercise, the people who are building on top of the product are different from those who are building the product.

For instance, a contractor or a DevRel team is building an app with the library that the engineering team built. This insidious pattern is so widespread that it’s even considered a best practice. In my experience, it is anything but. It feels so obvious: of course DevRel folks are the best people to do this project!

However, most of the valuable insights will fall into the crack between the DevRel and engineering teams. Glorious failures will not trigger a reevaluation of priorities, but rather shrugs and side glances at the DevRel team: “Well, maybe they’re not as good as we thought they were”.

2️⃣ Forgetting the thing. We get so engrossed in the exercise that we forget which thing we’re actually building. This most commonly happens when the main product is vague and ill-defined, and the adjacent thing feels a lot more concrete and graspable.

The chain of events that leads to this failure case usually looks like this:

  • Step 1: We want to build <foo>.
  • Step 2: <foo> is entirely ill-defined. We have no clue how to build it.
  • Step 3: We decide to start with <bar>, which is a thing that could be built with <foo>.
  • Step 4: A few weeks/months later… We’re building <bar>. Nobody remembers <foo>.

There is nothing wrong with a product pivot. Just make sure it’s done intentionally.

3️⃣ The confirmation demo. Like confirmation bias, confirmation demos only show the good parts of your product. These “things built to build things” aren’t made to challenge assumptions or draw insights. They are carefully crafted Potemkin villages whose entire purpose is to prevent novel insights from being drawn. These things will change no minds.

Don’t get me wrong. Demos are important. However, they are in an entirely different universe from applying the “build a thing to build the thing” principle.

Anytime the ambiguity of the path forward is high and it is not quite clear what we’re building, it might be worth having at least one “build a thing to build the thing” exercise in progress – and collecting the insights from it diligently to aid in navigating the complexity.