Why AI orchestration

Why do I find the problem of AI patterns and, more generally, AI orchestration so interesting that I literally started building a framework for it? Why do we even need graphs and chains in this whole AI thing? My colleagues with a traditional software engineering background have been asking me this question a lot lately.

Put very briefly, at the height of the current AI spring that we’re experiencing, orchestration is a crucial tool for getting AI applications to the shipping point.

To elaborate, imagine that an idea for a software application takes a journey from inception to full realization through two gates.

First, it needs to pass the “hey… this might just work” gate. Let’s call this gate the “Once” gate, since it’s exactly how many times we need to see our prototype work to get through it.

Then, it needs to pass through the “okay, this works reasonably consistently” gate. We’ll call it the “Mostly” gate to reflect the confidence we have in the prototype’s ability to work. It might be missing some features, lack polish, and underwhelm in performance benchmarks, but it is something we can give to a small group of trusted users to play with and not be completely embarrassed.

Beyond these two gates, there’s some shipping point, where the prototype – now a fully-fledged user experience – passes our bar for shipping quality and we finally release it to our users.

A mistake that many traditional software developers, their managers, and sponsors/investors make is that, when looking at AI-based applications, they presume the typical cadence of passing through these gates.

Let’s first sketch out this traditional software development cadence as a sequence below.

The “Once” gate plays a significant role, since it requires finding and coding up the first realization of the idea. In traditional software development, passing this gate means that there exists a kernel of a shipping product, albeit still in dire need of growing and nurturing.

The trip to the “Mostly” gate represents this process of maturing the prototype. It is typically less about ideation and mostly converging on the robust implementation of the idea. There may be some circuitous detours that await us, but more often than not, it’s about climbing the hill.

In traditional software development, this part of the journey is a matter of technical excellence and resilience. It requires discipline and often requires a certain kind of organizing skill. On more than one occasion, I’ve seen brilliant program managers brought in, who then help the team march toward their target with proper processes, burndown lists, and schedules. We grit our teeth and persevere, and are eventually rewarded with software that passes the shipping bar.

There’s still a lot of work to be done past that gate, like polish and further optimization. This is important work, but I will elide it from this story for brevity.

In AI applications, or at least in my own and my friends’ and colleagues’ experiences with them, this story looks startlingly different, and it definitely doesn’t fit into a neat sequential framing.

Passing the “Once” gate is often a matter of an evening project. Our colleagues wake up to a screencast of a thing that shouldn’t be possible, but somehow is. Everyone is thrilled and excited. Their traditional software developer instincts kick in: a joyful “let’s wrap this up and ship it!” is heard through the halls of the office.

Unfortunately, when we try to deviate even a little from the steps in the original screencast, we get perplexing and unsatisfying results. Uh oh.

We try boxing the squishy, weird nature of large language models into the production software constraints. We spend a lot of time playing with prompts, chaining them, tuning models, quantizing, chunking, augmenting – it all starts to feel like alchemy at some point. Spells, chants, and incantations. Maaaybe – maybe – we get to coax a model to do what we want more frequently. 

One of my colleagues calls it the “70% problem” – no matter how much we try, we can’t seem to get past our application producing consistent results more than 70% of the time. Even by generous software quality standards, that’s not “Mostly”.

Getting to that next gate has little resemblance to the maturation process from traditional software development. Instead, it looks a lot more like looping back to “Once” over and over, where we rework the original idea entirely and change nearly everything.

When working with AI applications, this capacity to rearrange everything and stay loose about the details of the thing we build – this design flexibility – is what dramatically increases our chances of crossing the “Mostly” gate.

Teams that hinge their success on adhering to the demo they sold to pass through the “Once” gate are much more likely to never see the next gate. Teams that decide that they can just lay down some code and improve iteratively – as traditional software engineering practices would suggest – are the ones who will likely work themselves into a gnarly spaghetti corner. At least today, for many cases – no matter how exciting and tantalizing, the “70% problem” remains an impassable barrier. We are much better off relying on an orchestration framework to give us the space to change our approach and keep experimenting.

This is a temporary state and it is not a novel phenomenon in technological innovation. Every new cycle of innovation goes through this. Every hype cycle eventually leads to the plateau of productivity, where traditional software development rules.

However, we are not at that plateau yet. My intuition is that we’re still climbing the slope toward the peak of inflated expectations. In such an environment, most of us will run into the “70% problem” barrier head-first. So, if you’re planning to build with large language models, be prepared to change everything many times over. Choose a robust orchestration framework to make that possible.

Placing and wiring nodes in Breadboard

This one is also a bit on the more technical side. It’s also reflective of where most of my thinking is these days. If you enjoy geeking out on syntaxes and grammars of opinionated JavaScript APIs, this will be a fun adventure – and an invitation.

In this essay, I’ll describe the general approach I took in designing the Breadboard library API and the reasoning behind it. All of this is still in flux, just barely meeting the contact with reality.

One of the key things I wanted to accomplish with this project is the ability to express graphs in code. To make this work, I really wanted the syntax to feel light and easy, and take as few characters as possible, while still being easy to grasp. I also wanted the API to feel playful and not too stuffy.

There are four key beats to the overall story of working with the API:

1️⃣ Creating a board and adding kits to it
2️⃣ Placing nodes on the board
3️⃣ Wiring nodes
4️⃣ Running and debugging the board.

Throughout the development cycle, makers will likely spend most of their time in steps 2️⃣ and 3️⃣, and then lean on step 4️⃣ to make the board act according to their intention. To get there with minimal suffering, it seemed important to ensure that placing nodes and wiring them results in code that is still readable and understandable when running the board and debugging it.

This turned out to be a formidable challenge. Unlike trees, directed graphs – and particularly directed graphs with cycles – aren’t as easy for us humans to comprehend. This appears to be particularly true when graphs are described in the sequential medium of code. 

I myself ended up quickly reaching for a way to visualize the boards I was writing. I suspect that most API consumers will want that, too – at least at the beginning. As I started developing more knack for writing graphs in code, I became less reliant on visualizations.

To represent graphs visually, I chose Mermaid, a diagramming and charting library. The choice was easy, because it’s a library that is built into GitHub Markdown, enabling easy documentation of graphs. I am sure there are better ways to represent graphs visually, but I followed my own “one miracle at a time” principle and went with a tool that’s already widely available.

🎛️ Placing nodes on the board

The syntax for placing nodes on the board is largely inspired by D3: the act of placement is a function call. As an example, every Board instance has a node called `input`. Placing the `input` node on the board is a matter of calling the `input()` function on that instance:

import { Board } from "@google-labs/breadboard";

// create new Board instance
const board = new Board();
// place a node of type `input` on the board.
board.input();

After this call, the board contains an input node.

You can get a reference to it:

const input = board.input();

And then use that reference elsewhere in your code. You can place multiple inputs on the board:

const input1 = board.input();
const input2 = board.input();

Similarly, when adding a new kit to the board, each kit instance has a set of functions that can be called to place nodes of various types on the board to which the kit was added:

import { Starter } from "@google-labs/llm-starter";

// Add new kit to the existing board
const kit = board.addKit(Starter);

// place the `generateText` node on the board.
// for more information about this node type, see:
// https://github.com/google/labs-prototypes/tree/main/seeds/llm-starter#the-generatetext-node
kit.generateText();
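
To picture why "placement is a function call" works, here is a minimal toy model written for illustration. It is not the Breadboard implementation: `ToyBoard`, `ToyNode`, and the id scheme are all hypothetical names. The only idea it borrows from the text is that every call places a new node on the board and hands back a reference to it.

```typescript
// A toy stand-in for a board. This is NOT the Breadboard implementation;
// ToyBoard, ToyNode, and the "type-counter" id scheme are made up for this sketch.
interface ToyNode {
  id: string;
  type: string;
}

class ToyBoard {
  readonly nodes: ToyNode[] = [];
  private counter = 0;

  // Every placement call creates a new node, records it on the board,
  // and returns a reference to it.
  private place(type: string): ToyNode {
    this.counter += 1;
    const node: ToyNode = { id: `${type}-${this.counter}`, type };
    this.nodes.push(node);
    return node;
  }

  input(): ToyNode {
    return this.place("input");
  }

  output(): ToyNode {
    return this.place("output");
  }
}

const toyBoard = new ToyBoard();
const firstInput = toyBoard.input(); // first input node
const secondInput = toyBoard.input(); // a second, distinct input node
toyBoard.output(); // placed without keeping the reference
```

In this toy version, calling `input()` twice leaves two separate input nodes on the board, which mirrors the `input1` and `input2` example above.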

Hopefully, this approach will be fairly familiar and uncontroversial to folks who use JS libraries in their work. Now, onto the more hairy (wire-ey?) bits.

🧵 Wiring nodes

To wire nodes, I went with a somewhat unconventional approach. I struggled with a few ideas here, and ended up with a syntax that definitely looks weird, at least at first.

Here’s a brief outline of the crux of the problem. In Breadboard, a wire connects two nodes. Every node has inputs and outputs. For example, the `generateText` node that calls the PaLM API `generateText` method accepts several input properties, like the API key and the text of the prompt, and produces outputs, like the generated text.

So, to make a connection between two nodes meaningful, we need to somehow capture four parameters:

➡️  The tail, or node from which the wire originates.
⬅️ The head, or the node toward which the wire is directed.
🗣️ The from property, or the output of the tail node from which the wire connects.
👂 The to property, or the input of the head node to which the wire connects.

To make this more concrete, let’s code up a very simple board:

import { Board } from "@google-labs/breadboard";

// create a new board
const board = new Board();
// place input node on the board
const tail = board.input();
// place output node on the board
const head = board.output();

Suppose that next, we would like to connect the property named “say” in `tail` to the property named “hear” in `head`. To do this, I went with the following syntax:

// Wires `tail` node’s output named `say` to `head` node’s input named `hear`.
tail.wire("say->hear", head);

Note that the actual wire is expressed as a string of text. This is a bit unorthodox, but it provides a nice symmetry: the code literally looks like the diagram above. First, there’s the outgoing node, then the wire, and finally the incoming node.
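
To make the four parameters tangible, here is a small sketch of how an "out->in" string could be turned into an edge record. It is an illustration under my own assumptions, not Breadboard's parser: `ToyEdge`, `parseWireSpec`, and `connectNodes` are hypothetical names, and the sketch only handles the plain "out->in" form.

```typescript
// The four parameters of a wire, as listed above.
interface ToyEdge {
  tail: string; // id of the node the wire originates from
  head: string; // id of the node the wire is directed toward
  from: string; // output property of the tail node
  to: string; // input property of the head node
}

// Splits an "out->in" string into its two property names.
// Hypothetical helper: other wire forms are deliberately not handled here.
function parseWireSpec(spec: string): { from: string; to: string } {
  const arrow = spec.indexOf("->");
  const from = arrow < 0 ? "" : spec.slice(0, arrow).trim();
  const to = arrow < 0 ? "" : spec.slice(arrow + 2).trim();
  if (from === "" || to === "") {
    throw new Error(`Cannot parse wire "${spec}"; expected "out->in"`);
  }
  return { from, to };
}

// Combines the two node ids with the parsed wire string into one edge.
function connectNodes(tailId: string, spec: string, headId: string): ToyEdge {
  const { from, to } = parseWireSpec(spec);
  return { tail: tailId, head: headId, from, to };
}

const sampleEdge = connectNodes("input-1", "say->hear", "output-1");
```

Here `sampleEdge` records `say` as the from property and `hear` as the to property, while a string without an arrow is rejected with an error.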

This syntax also easily affords fluent interface programming, where I can keep wiring nodes in the same long statement. For example, here’s what the LLM-powered calculator pattern from the post about AI patterns looks like when written with the Breadboard library:

// Wire names not mentioned elsewhere in this post ("text->question", "completion->code") are assumptions.
math.input({ $id: "math-question" }).wire(
  "text->question",
  kit.promptTemplate(
    "Translate the math problem below into a JavaScript function named " +
    "`compute` that can be executed to provide the answer to the " +
    "problem\nMath Problem: {{question}}\nSolution:",
    { $id: "math-function" }
  ).wire(
    "prompt->text",
    kit.generateText({ $id: "math-function-completion" })
      .wire("completion->code",
        kit.runJavascript("compute", { $id: "compute" })
          .wire("result->text", math.output({ $id: "print" })))
      .wire("<-PALM_KEY", kit.secrets(["PALM_KEY"]))
  )
);

Based on early feedback, there’s barely a middle ground of reactions to this choice of syntax. People either love it and find it super-cute and descriptive (“See?! It literally looks like a graph!”) or they hate it and never want to use it again (“What are all these strings? And why is that arrow pointing backward?!”) Maybe such contrast of opinions is a good thing?

However, aside from differences in taste, the biggest downside of this approach is that the wire is expressed as a string: there are plenty of opportunities to make mistakes between these double-quotes. Especially in the strongly-typed land of TypeScript, this feels like a loss of fidelity – a black hole in the otherwise tight system. I have already found myself frustrated by a simple misspelling in the wire string, and it seems like a real problem.

I played briefly with TypeScript template literal types, and even built a prototype that can show syntax errors when the nodes are miswired. However, I keep wondering – maybe there’s an even better way to do that?
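
For readers who have not used template literal types, here is one way such a check could look. This is a sketch under my own assumptions and not the prototype mentioned above: `TypedNode`, `WireFor`, and `typedWire` are hypothetical, and the sketch assumes each node's property names are listed in its type. It needs TypeScript 4.1 or later.

```typescript
// Hypothetical node shape that carries its property names in the type.
interface TypedNode<Outs extends string, Ins extends string> {
  readonly id: string;
  readonly outs: readonly Outs[];
  readonly ins: readonly Ins[];
}

type AnyNode = TypedNode<string, string>;

// Every legal "out->in" string between a given tail and head.
type WireFor<T extends AnyNode, H extends AnyNode> =
  `${T["outs"][number]}->${H["ins"][number]}`;

// The wire string is checked against the tail's outputs and the head's inputs.
function typedWire<T extends AnyNode, H extends AnyNode>(
  tail: T,
  spec: WireFor<T, H>,
  head: H
): { tail: string; from: string; to: string; head: string } {
  const parts = spec.split("->");
  return {
    tail: tail.id,
    from: parts[0] ?? "",
    to: parts[1] ?? "",
    head: head.id,
  };
}

const typedTail: TypedNode<"say", never> = { id: "input-1", outs: ["say"], ins: [] };
const typedHead: TypedNode<never, "hear"> = { id: "output-1", outs: [], ins: ["hear"] };

const checkedWire = typedWire(typedTail, "say->hear", typedHead);
// typedWire(typedTail, "sya->hear", typedHead); // misspelled output: should be rejected at compile time
```

The trade-off in this sketch is that every node has to declare its properties up front, which is at odds with the light, playful feel of plain strings.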

So here’s an invitation: if coming up with a well-crafted TypeScript/JavaScript API is something that you’re excited about, please come join our little Discord and help us Breadboard folks find an even better way to capture graphs in code. We would love your help and appreciate your wisdom.

The engine and the car

The whole large language model space is brand new, and there are lots of folks trying to make sense of it. If you’re one of those folks, here’s an analogy that might come in handy.

Any gasoline-powered car has an engine. This engine is typically something we refer to as a “V8” or “an inline 4” or sometimes even a “Wankel Rotary Engine”. Engines are super-cool. There are many engine geeks out there – so many that they warrant a video game written for them.

However, engines aren’t cars. Cars are much more than their engines. Though engines are definitely at the heart of every car, cars have many additional systems around them: fuel, electrical, steering, etc. Not to mention safety features to protect the passengers and the driver, and a whole set of comforts that we enjoy in a modern car. Pressing a button to roll down a window is not something that is done by the engine, but it’s definitely part of the whole car experience.

When we talk about this generation of AI systems, we typically talk about large language models (LLMs). In our analogies, LLMs are like engines. They are amazing! They are able to generate text by making inferences from the massive parametric memory accrued through training over a massive corpus of information.

However, they aren’t cars. One of the most common mistakes that I see being made is confusing engines (LLMs) with cars (LLM-based products). This is so common that even people who work on those products sometimes miss the distinction.

When I talk to the users of the PaLM API, I see this confusion show up frequently in this manner: developers want to reproduce results from LLM-based products like Bard or ChatGPT. When they try to get the same results from the API, they are disappointed that they don’t match. Factuality is lacking, the API can’t go to the internet and fetch an article, etc.

In doing so, they confuse the engine with the car: the API, which offers access to the model, is not the same as the products built with it. With an LLM API, we have a big-block V8. To make it go down the road, we still need to build the car around it.

To build on this analogy, we live in the early age of cars: the engines still figure prominently in the appearance and daily experience of a vehicle. We still have to turn the crank to start the car, oil the engine frequently, and be savvy enough to fix minor problems that will definitely arise.

As our cars become more refined, the engines get relegated into a well-insulated compartment. Users of cars rarely see them or operate on them directly.

This is already happening with LLM-based products. Very few current offerings that you might encounter in public use are LLMs that are directly exposed to the user.

So, when you use a chat-based system, please be aware that this is a car, not the engine. It’s a tangle of various AI patterns that are carefully orchestrated to work as one coherent product. There is likely a reasoning pattern at the front, which relies on an LLM to understand the question and find the right tool to answer it. There is likely a growing collection of such tools – each an AI pattern in itself. There are likely some bits for making sure the results are factual, grounded in sources, and safe.

As the LLM products become more refined, the actual value niches for LLMs become more and more recognizable. Instead of thinking of one large LLM that does everything, we might be seeing specialization: LLMs that are purpose-designed for reasoning, narration, classification, code completion, etc. Each might not be super-interesting in itself, but makes a lot of sense in the overall car of an LLM-based product.

Perhaps unsurprisingly, the next generation of cars might not even have the same kind of engine. While the window control buttons and the steering systems remain the same, the lofty gasoline engines are being replaced with electric motors that fit into a fraction of space. The car experience remains more or less the same (aside from the annoying/exhilarating engine noise), but the source of locomotion changes entirely.

It is possible that something like this will happen with LLMs and LLM-based products as well. The new open space that was created by LLMs will be reshaped – perhaps multiple times! – as we discover how the actual products are used. 

AI Patterns and Breadboard

In my last post, I kept talking about AI patterns, but kept it a bit vague. I thought it might be useful to share a couple of examples to describe what I mean by “AI patterns” a bit more clearly. Once again, put your technical hats on.

🗺️ How to read AI pattern diagrams

As part of practicing the “build a thing to build the thing” principle, we implemented quite a few AI patterns in Breadboard. I will use the diagrams we generate from the boards (thank you Mermaid.js!) to illustrate the patterns. Here’s a quick guide on how to read the diagrams – and as a happy coincidence, a brief overview of Breadboard concepts.

🔵 The inquisitively blue parallelogram represents the “input” node. This is where the user’s input is requested by the pattern. Because most patterns ask for input first, it’s a good place to start when tracing the flow of the graph.

🟢 The cheerfully green hexagon is the “output” node, which provides the output to the user of the pattern. For many patterns, that’s the end point, the journey’s destination, while for a few – just a brief stopover.

🟡 The curiously yellow boxes are all nodes that do interesting stuff. For example, “generateText” node invokes the LLM, while “promptTemplate” combines a template and various bits of text into a prompt that’s suitable for the LLM. Most of the time, you can guess what the function does by looking at its name.

🔴 The protectively red box with rounded corners is the “secrets” node, which has access to the user’s sensitive data. For most (all?) patterns, it is used to retrieve and pass the API Key to unlock the ability to invoke the large language model.

🍷 The variously-shaped wine-colored boxes are utility nodes: they are mostly here to serve other nodes by supplying important data and making it possible for graphs to be composable and useful. We’ll be mostly ignoring them here – but I will very likely be back to sing their song in the future.

Most nodes will have a two-line label. The first line is the type of the node and the second is its unique identifier. Because there can be multiple nodes of the same type, we need an id to distinguish between them.

Just like with the literal breadboards, nodes are connected with wires. Wires are represented by lines with arrows. The direction of the arrow on each wire represents the flow of information. So, when the graph shows this:

… it means that the information flows from the “promptTemplate” node to the “generateText” node.

Each wire is labeled. All labels have the same consistent “out->in” format. A good way to think of it is that every node may have multiple inputs and outputs. The wires connect these inputs and outputs.

In the example above, the output named “prompt” of the “promptTemplate” node is wired to the input named “text” of the “generateText” node. Most of the time, it’s not difficult to infer the purpose of the wire. Like, the wire above flows the prompt produced by the “promptTemplate” node as input of the “generateText” node. If you are curious about all the ins and outs of nodes (pun intended!), check out this guide on Github.
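
To connect the node shapes and the “out->in” labels to something runnable, here is a sketch that prints Mermaid flowchart text for a list of nodes and wires. It is not necessarily how Breadboard produces its diagrams: `DiagramNode`, `DiagramWire`, `describeNode`, and `toMermaid` are hypothetical names, and the label layout is my assumption.

```typescript
interface DiagramNode {
  id: string;
  type: string;
}

interface DiagramWire {
  tail: string; // id of the originating node
  head: string; // id of the receiving node
  from: string; // output property on the tail
  to: string; // input property on the head
}

// Keeps Mermaid node ids to letters, digits, and underscores to stay on the safe side.
function mermaidId(id: string): string {
  return id.replace(/[^A-Za-z0-9_]/g, "_");
}

// Two-line label (type, then id) in a shape that depends on the node type:
// parallelogram for input, hexagon for output, plain box for everything else.
function describeNode(node: DiagramNode): string {
  const label = `"${node.type} <br> id='${node.id}'"`;
  const id = mermaidId(node.id);
  if (node.type === "input") return `${id}[/${label}/]`;
  if (node.type === "output") return `${id}{{${label}}}`;
  return `${id}[${label}]`;
}

function toMermaid(nodes: DiagramNode[], wires: DiagramWire[]): string {
  const lines: string[] = ["graph TD;"];
  for (const wire of wires) {
    const tail = nodes.find((node) => node.id === wire.tail);
    const head = nodes.find((node) => node.id === wire.head);
    if (!tail || !head) {
      throw new Error(`Wire refers to an unknown node: ${wire.tail} or ${wire.head}`);
    }
    lines.push(
      `${describeNode(tail)} -- "${wire.from}->${wire.to}" --> ${describeNode(head)}`
    );
  }
  return lines.join("\n");
}

const diagramText = toMermaid(
  [
    { id: "math-function", type: "promptTemplate" },
    { id: "math-function-completion", type: "generateText" },
  ],
  [{ tail: "math-function", head: "math-function-completion", from: "prompt", to: "text" }]
);
```

The resulting text describes a single labeled arrow from the “promptTemplate” node to the “generateText” node, which is the wire discussed above.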

Some wires will have a circle at the end of them, rather than an arrow. These are constant wires. There’s a lot more to them, but for now, a good way to think of them is that they are here to specify constant values. Like in the diagram below, the “template” utility node supplies a constant “template” input to the “promptTemplate” node. 

With this quick Breadboard refresher out of the way, we’re ready to dig into the actual patterns. To keep this post from becoming a book, I’ll give you only three examples.

🧮 The Calculator pattern

Let’s start with the widely used Calculator pattern (you can also see it here on GitHub, implemented in Breadboard):

The structure of this pattern is very simple: user input goes into the “promptTemplate” node, which produces a prompt that goes into the “generateText” node, the output of which is fed to “runJavascript” node, and the result is returned as output.

As it often happens with AI patterns, the magic is in the contents of the prompt template. In this pattern, the LLM is used to find solutions to mathematical problems in a very clever way. 

As you may have heard, LLMs aren’t so great at math. So instead of approaching the problem head-on, we lean on the LLM’s strength: we convert a math problem into a language problem.

In the Calculator pattern, we ask the LLM to do what it does best: generate text. We ask it to write code that solves a math problem, rather than try to find the answer to the question. Here’s a prompt to do that:

Translate the math problem below into a JavaScript function named `compute` that can be executed to provide the answer to the problem.
Math Problem: {{question}}

Because writing code is a language problem, LLMs are pretty good at it. So, with a high degree of consistency, the output of the LLM will be a function that, when run, produces the right answer. Leave computation to the old-style computers. Let LLMs write code that will be computed.

For instance, when we replace the {{question}} placeholder with:

What is the square root of the perimeter of a circle with a diameter of 5?

The LLM will happily produce this function:

function compute() {
  const diameter = 5;
  const radius = diameter / 2;
  const perimeter = 2 * Math.PI * radius;
  return Math.sqrt(perimeter);
}

Which, when executed, will give us the correct answer of `3.963327297606011`. If you ask any conversational agent today a math question and it surprises you with an accurate answer, chances are that some variant of the Calculator pattern is being employed.
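
The whole Calculator flow can be sketched end to end in a few lines. To keep it runnable without an API key, the LLM call is replaced by a stub that returns the function quoted above. All the names here (`calculatorTemplate`, `fakeGenerateText`, `runGenerated`, `calculate`) are hypothetical and not part of Breadboard.

```typescript
const calculatorTemplate =
  "Translate the math problem below into a JavaScript function named " +
  "`compute` that can be executed to provide the answer to the problem.\n" +
  "Math Problem: {{question}}";

// Stand-in for the LLM call: returns the function quoted above
// instead of asking a real model. The prompt is ignored on purpose.
function fakeGenerateText(_prompt: string): string {
  return [
    "function compute() {",
    "  const diameter = 5;",
    "  const radius = diameter / 2;",
    "  const perimeter = 2 * Math.PI * radius;",
    "  return Math.sqrt(perimeter);",
    "}",
  ].join("\n");
}

// Evaluates the generated source and calls the function it defines.
// Running model-written code like this is only reasonable inside a sandbox.
function runGenerated(code: string, functionName: string): unknown {
  const run = new Function(`${code}\nreturn ${functionName}();`);
  return run();
}

// Template -> (stubbed) LLM -> code execution, mirroring the graph structure.
function calculate(question: string): unknown {
  const prompt = calculatorTemplate.replace("{{question}}", question);
  const code = fakeGenerateText(prompt);
  return runGenerated(code, "compute");
}

const calculatorAnswer = calculate(
  "What is the square root of the perimeter of a circle with a diameter of 5?"
);
```

With the stub in place, `calculatorAnswer` comes out at roughly 3.9633, matching the value above. With a real model, the generated code would vary from run to run, so it would need validation before being executed.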

📰 The Summarizer pattern

Another common pattern builds on the LLM’s strength of narrating information, even when presented with bits of random content. I experimented with this ability early this year, and here’s an implementation of the pattern in Breadboard (also here on Github):

When we look at the structure above, we can see that user input splits into two paths.

The first route is circuitous. It takes us through the “urlTemplate” node that creates a valid URL (it’s a Google News RSS feed with the topic as the query), which is then fed to the “fetch” node. The “fetch” node grabs the contents of this URL, and sends it to the “xmlToJson” and “jsonata” nodes that munge RSS into a list of headlines.

The two routes meet up at the “promptTemplate” node, from where they predictably move on to the “generateText” node and, finally, the result is presented to the user.

The concept is fairly straightforward: give the LLM a topic and a few sentences, and request a summary. If – as is the case in the graph above – we are summarizing news headlines, a prompt will look something like this:

Use the news headlines below to write a few sentences to summarize the latest news on this topic:
## Topic:{{topic}}
## Headlines{{headlines}}
## Summary:

In this prompt, we have two placeholders: the {{topic}}, which is where the subject of summarization will go, and the {{headlines}}, where we will plug in the various headlines from a news source (Google News).

The key distinction between just asking an LLM a question and using this pattern is that we’re not relying on LLM’s parametric memory to contain the answer. We’re not asking it to find the answer for us. Instead, we are only employing its narrative-making abilities, supplying the raw information in the prompt.
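
Mechanically, “supplying the raw information in the prompt” is just placeholder substitution. The sketch below shows one way to do it. `fillTemplate` is a hypothetical helper rather than the actual “promptTemplate” node, and the exact layout of the template text is my assumption.

```typescript
// Replaces every {{key}} placeholder with the matching value.
// Placeholders without a matching value are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (whole: string, key: string) => {
    const value = values[key];
    return value === undefined ? whole : value;
  });
}

// Assumed layout: one placeholder for the topic, one for the headlines.
const summarizerTemplate = [
  "Use the news headlines below to write a few sentences to summarize the latest news on this topic:",
  "## Topic: {{topic}}",
  "## Headlines",
  "{{headlines}}",
  "## Summary:",
].join("\n");

const sampleHeadlines = [
  "Thermochromic Treatment Keeps Solderless Breadboards Smokeless - Hackaday",
  "The Ultimate Breadboard Platform? - Hackster.io",
];

const summarizerPrompt = fillTemplate(summarizerTemplate, {
  topic: "breadboards",
  headlines: sampleHeadlines.join("\n"),
});
```

The filled-in `summarizerPrompt` is what would be handed to the LLM: everything the model needs to narrate is already in the text.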

So if, for example, I put “breadboards” into the {{topic}} placeholder, and the following list of headlines from Google News (just the first 20 for this particular board) into the {{headlines}} placeholder:

Thermochromic Treatment Keeps Solderless Breadboards Smokeless - Hackaday
Jumper Wires For Electronic Components - IndiaTimes
10 hostess hacks to make your food look better than it is - Colorado Springs Gazette
Gabriel's Cyberdeck Red V2 Packs in a LattePanda Delta 3, Analog Discovery 2, HackRF One, and More - Hackster.io
How to Measure Voltage on a Breadboard - MUO - MakeUseOf
The Ultimate Breadboard Platform? - Hackster.io
Building Circuits Flexibly - Hackaday
Lewiston Art Festival: A 'dinosaur' of woodwork - Niagara Gazette
Podcast 220: Transparent Ice, Fake Aliens, And Bendy Breadboards ... - Hackaday
Flexboard: a flexible breadboard for flexible and rapid prototyping of ... - Tech Explorist
Derek Fogt | Communities | pinecountynews.com - pinecitymn.com
MARNI JAMESON: Compensate for humdrum food with stylish ... - Sarasota Herald-Tribune
Build HMI screens with MicroLayout for your Meadow Apps - Hackster.io
Tidy Breadboard Uses Banana Bread - Hackaday
Old 6809 Computer Lives Again On Breadboards - Hackaday
My Favorite Things: Hardware Hacking and Reverse Engineering - Security Boulevard
Luna in Cocoa Beach offers serves thoughtful modern Italian food - Florida Today
Teaching Method Increases Students' Interest in Programming and ... - University of Arkansas Newswire
From A 6502 Breadboard Computer To Lode Runner And Beyond - Hackaday
How to Breadboard Electronics Projects with Raspberry Pi Pico - Tom's Hardware

… we will get this output from an LLM:

The latest news on breadboards include a new thermochromic treatment
that keeps solderless breadboards smokeless, a flexible breadboard for
flexible and rapid prototyping, and a new method for teaching students
programming and electronics.

For the quality of the junk we fed it, it ain’t half bad!

The Summarizer pattern has a much more popular cousin named Retrieval-augmented Generation (RAG). RAG is all the rage these days, and everyone wants to have one. If we peek under the covers, we’ll recognize the Summarizer pattern combined with another neat LLM capability of semantic embeddings into the Voltron of patterns.
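
The retrieval half of RAG can be sketched roughly as follows. The three-number vectors are made up by hand purely for illustration, since a real system would get them from an embedding model, and `EmbeddedChunk`, `cosineSimilarity`, and `retrieveChunks` are hypothetical names.

```typescript
interface EmbeddedChunk {
  text: string;
  vector: number[]; // hand-made toy "embedding", not from a real model
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the `count` chunks whose vectors are closest to the query vector.
function retrieveChunks(query: number[], chunks: EmbeddedChunk[], count: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, count)
    .map((scored) => scored.chunk);
}

const knowledge: EmbeddedChunk[] = [
  { text: "Breadboards let you prototype circuits without soldering.", vector: [0.9, 0.1, 0.0] },
  { text: "Banana bread is best served warm.", vector: [0.1, 0.9, 0.1] },
  { text: "Jumper wires connect components on a breadboard.", vector: [0.8, 0.2, 0.1] },
];

// Pretend embedding of a question about electronics breadboards.
const retrievedChunks = retrieveChunks([1, 0, 0], knowledge, 2);

// The retrieved text is then placed into a Summarizer-style prompt.
const ragPrompt =
  "Use the passages below to answer the question.\n## Passages\n" +
  retrievedChunks.map((chunk) => chunk.text).join("\n") +
  "\n## Question: What is a breadboard for?\n## Answer:";
```

In this toy setup, the two electronics passages are retrieved and the banana bread one is left out, so only relevant text ends up in the prompt.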

🔁 The ReAct pattern

I would be remiss not to bring up ReAct when talking about AI patterns. This pattern ushered in a new mini-era of LLM applications, a breakthrough that redefined what LLMs can do.

The ReAct pattern is different from the ones mentioned earlier, because it is cyclical: rather than asking an LLM once, it may do so several times, repeating until the problem is solved.

ReAct introduces this really interesting idea that we can induce chain-of-thought reasoning capabilities in LLMs if we structure our interaction with them in a certain way. In this chain of thought, the LLM interacts with the outside environment, suggesting actions to take and then reasoning about the outcomes of these actions.

I’ve talked about LLM-based reasoning a few times before, so this concept shouldn’t be entirely novel to my readers.

In ReAct, the key trick is in establishing a predictable beat of reasoning within the prompt:

1️⃣ First comes the Question – the question that the user asks
2️⃣ Then, comes the Thought – the opportunity for an LLM to reason about what to do next
3️⃣ After Thought is Action – LLM’s suggested action to take
4️⃣ Finally, the Observation – the outcome of the action, supplied by the tool 

Steps 2️⃣, 3️⃣, and 4️⃣ keep repeating until the answer is found.

The LLM is only allowed to pipe in on steps 2️⃣ and 3️⃣: that is, it can only produce the “Thought” and “Action” parts of the overall sequence.

Step 1️⃣ is provided by the user, and the observation in step 4️⃣ is supplied as the outcome of whatever action the LLM suggested to take.

As the steps repeat, all of these steps are being added to the overall prompt, allowing the LLM to see the history of the interaction and reason about it. In this way, and unlike in the Calculator and Summarizer patterns, the ReAct pattern simulates memory: with each invocation, the LLM can see how it acted in the past.

It’s a pretty convoluted contraption, as you can see from the Breadboard implementation (here on GitHub):

However, the presence of memory and a repeating cadence of steps guide LLMs, allowing them to answer much more complex and interesting questions. The answers are no longer produced in one go. Instead, LLMs walk toward the answer in multiple iterations.

Let’s see if we can make it a bit more concrete by breaking down an actual transcript of a ReAct pattern solving a problem.

First, we supply the question:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?

In response, the LLM produces the following Thought and Action steps:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon

The LLM is asking to search for the distance between the Earth and the Moon. Notice that it is not trying to make up the answer. It is merely suggesting the next action to take. Taking control over from the LLM, the ReAct pattern invokes search and produces the following Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km

With the observation in hand, the ReAct pattern hands control back to the LLM, and gets this response:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)

Great! Now the LLM wants us to do math. Fine. The ReAct machinery rattles and burrs, and produces the next Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620

The ReAct pattern yields control back to the LLM, and it triumphantly declares victory:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620
2️⃣ Thought: I now know the final answer
   Final Answer: 620

Great job, model. You did it.
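
The back-and-forth above can be sketched as a small loop. This is a minimal sketch under loud assumptions, not a real implementation: `scripted_llm` is a hypothetical stand-in that replays the steps of the transcript instead of calling a model, the `search` tool returns a hard-coded answer, and the `math` tool only understands `sqrt(<number>)`.

```python
import math
import re

# Toy tools (assumptions). A real "search" would call a search service.
def search(query: str) -> str:
    return "384,400 km"

def calculator(expression: str) -> str:
    # Only understands sqrt(<number>), which is all the transcript needs.
    match = re.fullmatch(r"sqrt\((\d+(?:\.\d+)?)\)", expression.strip())
    if not match:
        raise ValueError(f"unsupported expression: {expression}")
    return f"{math.sqrt(float(match.group(1))):g}"

TOOLS = {"search": search, "math": calculator}

def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for an LLM call: replays the transcript's steps."""
    observations = transcript.count("Observation:")
    if observations == 0:
        return ("Thought: I need to find the distance between earth and moon\n"
                "Action: search\n"
                "Action Input: distance between earth and moon")
    if observations == 1:
        return ("Thought: I need to square root this distance\n"
                "Action: math\n"
                "Action Input: sqrt(384400)")
    last = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I now know the final answer\nFinal Answer: {last}"

def react(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)  # hand control to the LLM
        transcript += reply + "\n"
        final = re.search(r"Final Answer: (.*)", reply)
        if final:
            return final.group(1).strip(), transcript
        # Take control back: run the tool the LLM asked for.
        action = re.search(r"Action: (.*)", reply).group(1).strip()
        action_input = re.search(r"Action Input: (.*)", reply).group(1).strip()
        transcript += f"Observation: {tools[action](action_input)}\n"
    raise RuntimeError("no final answer within max_steps")

answer, transcript = react(
    "what is the square root of the distance between earth and moon in kilometers?",
    scripted_llm,
    TOOLS,
)
print(transcript)
```

The loop itself knows nothing about search or math. It only looks up whatever tool name the LLM suggested in the `tools` dictionary.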

The ReAct pattern also introduces, almost as an afterthought, the concept of tools. Since the LLM is asked to suggest an action, it seems useful to specify the kinds of tools the LLM has at its disposal.

In the transcript above, the “search” and “math” tools were used. For other kinds of problems, there might be a need for other kinds of tools. 

This is where the most valuable aspect of the ReAct pattern resides: if we can specify our own tools, we can make LLMs do useful things. For example, I could hand it a “calendar” tool, an “email” tool, and a list of my friends and ask it to schedule a lunch for us. Or I could turn it into a menu-ordering system, where it would rely on menu-understanding tools to take customer orders.

The pattern stays the same, but the tools change. With the ReAct pattern, we can build actual helpful agents. If you’ve been watching the LLM space, you have no doubt noticed a lot of activity around this notion.

🍞 Patterns with Breadboard

These are just a few examples of interesting patterns that emerged in the generative AI space in the last few months. Honestly, the pace of pattern discovery has been nuts. So far, I see no signs of it slowing down. What I’d like to do with Breadboard is to help make these patterns more legible to everyone – so that more people can play with them, create their own, and explore this fascinating space together.

My intuition is that when we lower the barrier to entry to this process of discovery and make it easy to tell whether the new pattern is good or not, we have a better chance of exploring the space more thoroughly and realizing the full potential of large language models.

Composing graphs with Breadboard

This post is more technical and more rambling than my usual repertoire. It’s a deep dive into where my mind is today, and it’s mostly about technical design. Think of it as a sample of interesting problems that I am puzzling over these days.

Let’s start with graphs. In the past, I’ve mentioned this interesting tension between graphs and trees: how our human tendency to organize things into tree-like structures (hierarchies and other container models) is at odds with the fluid, interconnected nature of the world around us. I framed it as: “every graph wants to become a tree, and still secretly wants to remain a graph”, and described it in the context of developer frameworks.

So naturally, when presented with an opportunity to write a developer library, I decided to use the graph structure as its core. The result is Breadboard. It’s still early on. I like to say that we currently have a toddler, and there’s much left to do to get the library to adulthood. However, it seems useful to start sharing what I’ve learned so far and the design decisions I landed on.


If you even occasionally scan these chronicles of my learnings, you will undoubtedly know that I am fascinated by the potential of applying large language models (LLMs) as a technology and the kinds of new interesting spaces they could open.

As a result, I invested quite a bit of time tinkering with the models, trying to make them jump through various hoops and do things – just to get a sense of what it is that they are truly capable of. Judging from the endless updates I hear from my friends and colleagues, so has everyone else.

New interesting patterns of applying LLMs seem to arise almost daily – and that’s pretty exciting. What was less exciting for me was the distinct lack of tools that help us discover these patterns. Most of the frameworks that are rising to relative prominence appear to focus on capturing the newly discovered patterns and making them more accessible. This is great! But what about a framework that facilitates tinkering with LLMs to find new patterns?

I started Breadboard with two objectives in mind:

  1. Make creating new generative AI patterns accessible and fun
  2. Enable easy sharing, remixing, composition, and reuse of these patterns.

My hope is that Breadboard helps accelerate the pace with which new interesting and useful patterns for applying generative AI are found. Because honestly, it feels like we have barely scratched the surface. It would be super-sad if the current local maximum of chatbots were as far as we get with this cycle of AI innovation.

🍞 The metaphor

The first thing I wanted to get right was the mental model with which a developer might approach the library. 

Graphs are typically hard to describe. I am in awe of whoever came up with the term “Web” to describe the entire tangle of hyperlinked documents. Kudos.

As you may remember, I am also very interested in the role makers play in the generative AI space. That’s how breadboards came to mind: the solderless construction bases for prototyping electronic circuits. Breadboards are a perfect maker’s tool. They are easy to put together and easy to take apart, to change the layout and tinker with various parts.

Lucky for me, breadboards are also graphs: the electronic circuits they carry are directed graphs, where each electronic component is a node in the graph and the jump wires that connect them are edges. By placing different nodes on the board and wiring them in various ways, we get different kinds of prototypes.

This is exactly what I was looking for: a one-word name for the library that comes with the mental model for what it does. As an additional benefit, “breadboard” selects for makers: if you know and love breadboards (or even just the idea of breadboards), you will likely look forward to playing with this library.

🧩 The composition system

Another piece of the puzzle was composition. Over the last decade, I ended up studying composition and designing composable systems quite extensively. In Breadboard, I wanted to lay down a sound foundation for composition.

There are three different ways to compose things in Breadboard: 🧩 nodes, 🍱 kits, and 🎛️ boards.

🧩 Nodes are the most obvious unit of composition: we can place nodes on a breadboard and wire them together. At this layer of composition, makers compose nodes to make their prototypes. Once they have a neat prototype, makers can share the board that contains the prototype. A shared board is something that anyone can pull down from a URL and start playing with. They can clone it, tweak it, and share it again.

To get the node composition right, we need a set of nodes that allow us to build something useful and interesting. While still at an early stage, it is my intent to arrive at a starter kit of sorts: a relatively small set of general-purpose nodes that enable making all kinds of cool things.

🍱 We don’t have to stop with just one kit. Kits are another unit of composition. Makers are able to create and group interesting nodes into kits – and publish them to share with others. For instance, a project or a company might want to wrap their interesting services as nodes and publish them as a kit, allowing any maker to grab those nodes and start using them in their boards.

A maker can also just build a kit for themselves, and use it in their own prototyping only. While kits do not need to be published, boards that use unpublished kits can’t be shared with others – or at least shared in any useful way.

🎛️ Boards themselves are also units of composition. Makers can include boards of others into their board, turning an included board into a sort of virtual node. The board inclusion feature is similar to a hyperlink: just like on the Web, including a board simply links from one board to another, rather than subsuming it. Such loose coupling unlocks the full potential for interdependent collaboration, and I fully expect the common dependency management practices to be applicable.

In addition to inclusion, boards can have slots. Slots are another way to add modularity to boards. When I build a board, I can leave it incomplete by specifying one or more places – “slots” – where someone else can include their boards. This is a useful trick that software developers call “dependency injection”. For instance, if I developed a generic pattern to invoke various tools with generative AI, I can leave a slot for these tools. When other makers reuse my board, they can insert their own sets of tools into this slot without having to modify my board.
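
To make inclusion and slots a bit more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration and not Breadboard’s actual API: the `Board` class, the `include` helper, the `REGISTRY` dictionary that stands in for fetching a board from its URL, and the example.com URL are all hypothetical. A board is reduced to a single function so that the two composition ideas stay visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical model, not Breadboard's API: a board is one function
# from inputs (plus filled slots) to outputs.
@dataclass
class Board:
    name: str
    run_fn: Callable[[dict, dict], dict]
    slot_names: tuple = ()

    def run(self, inputs: dict, slots: Optional[Dict[str, "Board"]] = None) -> dict:
        slots = slots or {}
        missing = [s for s in self.slot_names if s not in slots]
        if missing:
            raise ValueError(f"unfilled slots: {missing}")
        return self.run_fn(inputs, slots)

# Stands in for pulling a shared board down from a URL (assumption).
REGISTRY: Dict[str, Board] = {}

def include(url: str) -> Board:
    """A virtual node that links to another board. The target is looked up
    each time it runs, rather than copied when the link is made."""
    return Board(f"include:{url}", lambda inputs, slots: REGISTRY[url].run(inputs, slots))

# My board: a generic pattern that leaves a "tools" slot open.
def use_tools(inputs, slots):
    result = slots["tools"].run({"request": inputs["request"]})
    return {"answer": f"tool said: {result['result']}"}

tool_caller = Board("tool-caller", use_tools, slot_names=("tools",))

# Someone else's board, published at a made-up URL.
SHOUT_URL = "https://example.com/boards/shout.json"
REGISTRY[SHOUT_URL] = Board("shout", lambda inputs, slots: {"result": inputs["request"].upper()})

# They reuse my board by filling the slot. My board is not modified.
linked = include(SHOUT_URL)
output = tool_caller.run({"request": "lunch on friday"}, slots={"tools": linked})
print(output)
```

Because `include` resolves its target at run time, republishing the board behind the URL changes what every linking board does, which is the hyperlink-like loose coupling described above.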

🤖 The traversal machine

It took me a little bit of time to settle on what a “node” is in Breadboard and how these nodes get traversed in the graph. I ended up going with the actor model-inspired design, leaving lots of room to explore concurrency and distributed processing in the future. For the moment, however, I am primarily guided by the motivation to make Breadboard graphs easy to understand.

One capability I wanted to enable was building graphs that have cycles within them. Pretty much anything interesting contains feedback loops, and so Breadboard supports directed graphs with cycles out of the box. Such graphs have no topological ordering, and untangling them (say, finding the smallest set of edges whose removal breaks every cycle) is an NP-hard problem, but lucky for us, traversing them is fairly trivial. After all, most computer programs are directed graphs with cycles.

At the core of the traversal machine is this concept: a well-behaving node is a pure function. More precisely, as close to a pure function as we can get. Since makers can create their own nodes, Breadboard can’t guarantee any of that, but I’d like to encourage it.

Since pure functions don’t contain state, state needs to be managed outside of the function. Breadboard relies on wires as the method to manage state. Wires are the way both data and control flow.

This sets up the basics of the traversal logic:

  • Every node has inputs and outputs. The inputs are the wires running into the node, and outputs are wires running out. A node consumes inputs and provides outputs.
  • The node – or technically the function that the node represents – is only run when all inputs have been provided to it by the nodes that ran before this node. Put differently, a node will not run if some of its inputs weren’t provided.
  • Graph traversal starts with running nodes that don’t have any inputs wired into them.

That’s about it.
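
The three rules above fit in a few lines of code. This is a minimal sketch that takes the rules literally and leaves out anything Breadboard does beyond them. The `traverse` function, the way nodes and wires are represented, and the example graph are all assumptions made for illustration, not the library’s API.

```python
import math
from collections import deque

def traverse(nodes, wires):
    """nodes: {name: function(**inputs) -> dict of outputs}
    wires: list of (from_node, output_name, to_node, input_name)"""
    # Rule 1: a node's inputs are the wires running into it.
    needs = {name: {inp for _, _, dst, inp in wires if dst == name} for name in nodes}
    mailbox = {}   # (node, input_name) -> value waiting on a wire
    results = {}   # outputs of every node that ran
    order = []     # the order in which nodes ran
    # Rule 3: start with nodes that have no inputs wired into them.
    ready = deque(name for name in nodes if not needs[name])
    while ready:
        name = ready.popleft()
        # Reading a wire empties it: each value is consumed once.
        inputs = {inp: mailbox.pop((name, inp)) for inp in needs[name]}
        results[name] = nodes[name](**inputs)
        order.append(name)
        touched = []
        for src, out, dst, inp in wires:
            if src == name and out in results[name]:
                mailbox[(dst, inp)] = results[name][out]
                if dst not in touched:
                    touched.append(dst)
        # Rule 2: a node runs only once all of its inputs have been provided.
        for dst in touched:
            if dst not in ready and all((dst, inp) in mailbox for inp in needs[dst]):
                ready.append(dst)
    return order, results

nodes = {
    "distance": lambda: {"km": 384400},
    "label": lambda: {"text": "sqrt of distance"},
    "sqrt": lambda km: {"value": math.sqrt(km)},
    "report": lambda text, value: {"line": f"{text}: {value:g}"},
}
wires = [
    ("distance", "km", "sqrt", "km"),
    ("sqrt", "value", "report", "value"),
    ("label", "text", "report", "text"),
]
order, results = traverse(nodes, wires)
print(order, results["report"]["line"])
```

Note that `report` waits until both `label` and `sqrt` have delivered their outputs, even though `label` finishes early.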

To geek out on this a bit more, I went with a mailbox-like setup where wires are effectively one-time variables that store data. The data in this variable is written by a node output and read by a node input. A super-cool effect of such a setup is that the state of the graph is captured entirely in the wires, which means that Breadboard can pause and resume traversal of the graph by saving what’s currently stored in the wires.
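
Here is a minimal sketch of that pause-and-resume idea. It is an illustration under assumptions, not how Breadboard implements it: the `run` function, the `"node.input"` key format, and the use of JSON as the saved form are all hypothetical, and the sketch assumes that all nodes without inputs run before the first pause point.

```python
import json
import math

def needs(wires, name):
    # Input names wired into a node.
    return {inp for _, _, dst, inp in wires if dst == name}

def deliver(wires, mailbox, name, outputs):
    # Write a node's outputs onto its outgoing wires.
    for src, out, dst, inp in wires:
        if src == name and out in outputs:
            mailbox[f"{dst}.{inp}"] = outputs[out]

def run(nodes, wires, mailbox=None, max_steps=None):
    """Runs until nothing is ready or max_steps nodes have run.
    Returns (mailbox, results). The mailbox alone is enough to resume."""
    results = {}
    if mailbox is None:  # fresh start: run the nodes that have no inputs
        mailbox = {}
        for name in nodes:
            if not needs(wires, name):
                results[name] = nodes[name]()
                deliver(wires, mailbox, name, results[name])
    steps = 0
    while max_steps is None or steps < max_steps:
        ready = [n for n in nodes
                 if needs(wires, n) and all(f"{n}.{i}" in mailbox for i in needs(wires, n))]
        if not ready:
            break
        name = ready[0]
        # Wires are one-time variables: reading a value removes it.
        inputs = {i: mailbox.pop(f"{name}.{i}") for i in needs(wires, name)}
        results[name] = nodes[name](**inputs)
        deliver(wires, mailbox, name, results[name])
        steps += 1
    return mailbox, results

nodes = {
    "distance": lambda: {"km": 384400},
    "sqrt": lambda km: {"value": math.sqrt(km)},
    "report": lambda value: {"line": f"answer: {value:g}"},
}
wires = [("distance", "km", "sqrt", "km"), ("sqrt", "value", "report", "value")]

# Pause after one step and save only what is sitting in the wires.
mailbox, _ = run(nodes, wires, max_steps=1)
saved = json.dumps(mailbox)

# Later, possibly in another process: restore the wires and keep going.
_, results = run(nodes, wires, mailbox=json.loads(saved))
print(saved, results)
```

The second call never re-runs `distance` or `sqrt`. Everything it needs to continue is the single value stored on the wire into `report`.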

🚧 What is next

Looking ahead, I am quite optimistic about Breadboard. We already have a small seedling of a team developing. In the next few weeks, I’ll keep making interesting patterns with it to keep informing the development of the library. Build a thing to build the thing, right?

Once the fundamentals settle a bit, we can start thinking about graduating Breadboard into early adulthood and releasing v1. Hopefully, at this point, we will have enough onramp for you and other makers to start actively using it in your prototyping adventures.

If you feel excited about this idea and don’t want to wait until then, please check out the list of open issues on GitHub and join the conversation. Be prepared to eat unbaked cookies and occasionally find bits of construction debris in them – and help make Breadboard better.