The Breadboard developer cycle

Over the past couple of months, we’ve been working on a bunch of new things in the Breadboard project, mostly situated around developer ergonomics. I’ve mentioned creating AI recipes a bunch in the past, and it might be a good time to talk about the mental model of how such recipes are actually created.

❤️‍🔥 Hot reload 

One of the Breadboard project design choices we’re leaning into is the “hot reload” pattern. “Hot reload” is fairly common in the modern developer UX, and we’d like to bring it into the realm of AI recipes. In this pattern, the developer sees not just the code they write, but also results of this code running, nearly instantaneously. This creates a lightning-fast iteration cycle, enabling the developer to quickly explore multiple choices and see which ones look/work better.

Most “hot reload” implementations call for two surfaces that are typically positioned side by side: one for writing the source code, and another for observing the results. In Breadboard today, the first one is typically an editor (I use VSCode) and the second one is our nascent Breadboard debugger.

As the name “hot reload” signifies, when I save the source code file in my editor, the debugger automatically reloads to reflect the recent changes.

The typical workflow I’ve settled into with Breadboard is that I start writing the board and get it to the point where it has enough to do at least something. Then I save the file and play with the board in the debugger. Playing with it both informs me whether I am on the right path and gives me ideas on what to do next.

I then act on this feedback, either fixing a problem that I am seeing, or progressing forth with one of the ideas that emerged through playing.

For my long-time readers: yes, we’ve baked the OODA loop right into Breadboard.

Overall, it’s a pretty fun experience. I get to see the board rising out of code by iteratively playing with it.

🪲 Debugger

To enable this fun, Paul Lewis has been doing the magician’s work of bringing up the debugger. It’s very much a work in progress, though even what’s there now is already useful for board-making.

The main purpose of the debugger is to provide both the visualization of the AI recipe that is being developed and the process of it running. As I put it in chat, I love “seeing it run”: I get to see and understand what is happening during the run – as it happens! – and dig into every bit of detail.

There’s even a timeline that I can scrub through to better help me understand how events unfolded.

One thing that gives Breadboard its powers is that it’s built around a very flexible composition system. This means that my recipes may reach for other recipes during the run – sometimes a whole bunch of them. Reasoning about that can be rather challenging without help. As we all know, indirection is amazing until we have to spelunk the dependency chains.

To help alleviate this pain, the Breadboard debugger treats recipe invocations as the front-and-center concept in the UI. The timeline presents them as layers, similar to the layers in photo/video/sound editing tools. I can walk up and down the nested recipes or fold them away to reduce cognitive load.

There’s so much more work that we will be doing here in the coming weeks and my current description of the Breadboard debugger is likely to become obsolete very quickly. However, one thing will remain the same: the iteration speed is so fast that working with Breadboard is joyful.

🧸 Exploration through play

This leads me to the key point of what we’re trying to accomplish with the development cycle in Breadboard: exploring is fun and enjoyable when the iterations are fast and easily reversible.

This is something that is near and dear to my heart. 

When the stakes in exploration are high, we tend to adopt a defensive approach to exploring: we prepare, we train, and we brace ourselves when it is time to venture out into the wilderness. Every such exploration is a risky bet that must be carefully weighed, with uncertainty reduced as much as possible.

When the stakes of exploration are low, we have a much more playful attitude. We poke and prod, we try this and that, we goof around. We end up going down the paths that we would have never imagined. We end up discovering things that couldn’t have been found by an anxious explorer.

It is the second kind of exploration that I would love to unlock with Breadboard. Even when learning Breadboard itself, I want our users to have the “just try it” attitude: maybe we don’t know how this or that node or kit works. Maybe we don’t have an answer to what parameters a recipe accepts. So we just try it and see what happens – and get reasonable answers and insights with each try.

If you’re excited about the ideas I wrote about here, please come join us. We’re looking for enthusiastic dance partners who dream of making the exploration of AI frontiers more accessible and enjoyable.

Declarative vs. imperative

I recently had a really fun conversation with a colleague about “declarative vs. imperative” programming paradigms, and here’s a somewhat rambling riff that I captured as a result.

When we make a distinction between “declarative” and “imperative” programming, we usually want to emphasize a certain kind of separation between the “what” and the “how”. The “what” is typically the intention of the developer, and the “how” is the various means of accomplishing the “what”.

For me, this realization came a while back, from a chat with Sebastian Markbåge in the early days of React, where he stated that React is declarative. I was completely disoriented, since to me, it was HTML and CSS that were declarative, and React was firmly in the imperative land.

It took a bit of flustering and a lot of Sebastian’s patience for me to grok that these terms aren’t fixed. Rather, they are a matter of perspective. What might seem like “you’re in control of all the details” imperative from one vantage point will look completely declarative from another.

📏 The line between the “what” and the “how” 

Instead of trying to puzzle out whether a given programming language, framework, or paradigm is declarative or imperative, it might be more helpful to identify the line that separates the “what” and the “how”.

For instance, in CSS, the “what” is the presentation of an HTML document (technically a DOM tree), and the “how” is the method by which this presentation is applied. In CSS, we declare the styling attributes we would like each document element to have, and leave the job of applying these styles to the document in the capable hands of CSS.

Similarly, in React, we declare the structure of our components and their relationship to the state, while leaving the actual rendering of the application up to the framework.

Every abstraction layer brings some “declarativeness” with it, shifting the burden of having to think about some of the implementation details from the shoulders of the developer into the layer.

If we look carefully, we should be able to see the line drawn between the “how” and the “what” in every abstraction layer.

In drawing this line, the creators of an abstraction layer – whether they are intentional about it or not – make a value judgment. They decide what is important and must remain in the developer’s control, and what is not as important and could be abstracted away. I called this value judgment “an opinion” earlier in my writings.

One way to view such a value judgment is as a bet: it is difficult to know ahead of time whether or not the new abstraction layer will find success among developers. The degree of opinion underlines the risk that the bet entails. More opinionated abstraction layers make riskier bets than less opinionated ones.

If we measure reward in adoption and long-term usage, then the higher risk bets also promise higher reward: the degree of difference in opinion can serve as a strong differentiator and could prove vital to the popularity of the layer. In other words, if our layer isn’t that different from the layer below, then its perceived value isn’t that great to a developer.

Therein lies the ancient dynamic that underlies bets (or any value judgments, for that matter): when designing layers of abstraction, we are called to find that balance of being different enough, yet not too different from the layer below.

🪢 The rickety bridge of uncertainty

While looking for that balance, one of the most significant and crucial exercises that any framework, language, or programming paradigm will undertake is the one of value-trading with their potential developers.

This exercise can be described by one question: What is it that a developer needs to give up in order to unlock the full value of our framework?

Very commonly, it’s the degree of control. We ask our potential developers to relinquish access to some lower-layer capabilities, or perhaps some means to influence control flow.

Sometimes (and usually alongside lesser control), it’s the initial investment of learning time to fully understand the layer’s opinion and gain full dexterity in wielding it.

Whatever the case, there is typically a rickety bridge of uncertainty between the developer first hearing of our framework and their full-hearted adoption.

I once had the opportunity to explain CSS to an engineer who spent most of their career drawing pixels in C++, and they were mesmerized by the amount of machinery that styling the Web entails. If all you want to draw is a green box in the middle of the screen, CSS is a massively over-engineered beast. It is a long walk across unevenly placed planks to the point where the full value of this machinery even starts to make sense. Even then, we’re constantly doubting ourselves: is this the right opinion? Could there be better ways to capture the same value?

This bridge of uncertainty is something that every opinionated layer has to cross. Once the network effects take root, the severity of the challenge diminishes significantly. We are a social species, and the more people adopt the opinion that the abstraction layer espouses, the more normal it becomes – perhaps even becoming the default, for better or worse.

🧰 Bridging techniques

If we are to build abstraction layers, we are better off learning various ways to make the bridge more robust and shorter.

One technique that my colleague Paul Lewis shared with me is the classic “a-ha moment”: structure our introductions and materials in such a way that the potential value is clearly visible as early as possible. Trading is easier when we know what we’re gaining.

This may look like a killer demo that shows something that nobody else can do (or do easily). It may look like a tutorial that begins with a final product that just begs to be hacked on and looks fun to play with. It could also be a set of samples that elegantly solve problems that developers have.

Another technique is something that Bernhard Seefeld is actively experimenting with: intentionally designing the layer in such a way that it feels familiar at first glance, but allows cranking up to the next level – incrementally. You can see this work (in progress 🚧) in the new syntax for Breadboard: it looks just like typical JS code at first, rapidly ramping up to graph serialization, composition, and all the goodies that Breadboard has to offer.

I am guessing that upon reading these examples, you immediately thought of a few others. Bridging techniques may vary and the technique playbook will keep growing, but one thing that unites them all is that they aim to help developers justify the trade of their usual conveniences for the brave new world of the layer’s opinion.

Designing new layers is an adventure with an indeterminate outcome. We might be right about our value judgments, or we might be wrong. It could be that no matter how much we believe in our rightness, nobody joins to share our opinion with us. No technique will guarantee the outcome we wish for. And that’s what makes API design and developer experience work in general so fun for me.

Hourglass model and Breadboard

Architecturally, Breadboard aspires to have an hourglass shape to its stack. As Bernhard Seefeld points out, this architecture is fairly common in compilers and other NLP tools, so we borrowed it for Breadboard as well.

In this brief essay, I will use the Breadboard project to expand a bit on the ins and outs of the hourglass-shaped architecture.

At the waist of the hourglass, there’s a single entity, a common format or protocol. In Breadboard, it is a JSON-based format that we use to represent any AI recipe (I called them AI patterns earlier, but I like the word “recipe” better for its liveliness). When you look through the project, you’ll notice that all recipes are JSON files of the same format.
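
To give a rough sense of the idea, here is a purely illustrative sketch of the kind of thing such a recipe file captures – a handful of nodes and the wires between them. The field names here are my shorthand, not necessarily the actual Breadboard schema:

// Illustrative only: a tiny “recipe” with an input wired to an output.
// Field names are assumptions, not the canonical Breadboard format.
const recipe = {
  nodes: [
    { id: "ask", type: "input" },
    { id: "reply", type: "output" },
  ],
  edges: [{ from: "ask", out: "say", to: "reply", in: "hear" }],
};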

The actual syntax of the JSON file may change, but the meaning it captures will stay the same. This is one of the key tenets of designing an hourglass stack: despite any changes around it, the semantics of the waist layer must stay relatively constant.

The top part of the hourglass is occupied by the recipe producers, or “frontends” in the compiler lingo. Because they all output to the same common format, there can be a great variety of these. For example, we currently have two different syntaxes for writing AI recipes in TypeScript and JavaScript. One can imagine a designer tool that allows creating AI recipes visually. 

Nothing stops someone from building a Python or Go or Kotlin or any other kind of frontend. As long as it generates the common format as its output, it’s part of the Breadboard hourglass stack.

The bottom part of the stack is where the recipe consumers live. The consumers, or “backends”, are typically runtimes: they take the recipe, expressed in the common format, and run it. At this moment in Breadboard, there’s only a JavaScript runtime that runs in both Node and Web environments. We hope that the number of runtimes expands. For instance, wouldn’t it be cool to load a Breadboard recipe within a Colab? Or maybe run it in C++? Breadboard strives for all of these options to be feasible.

Runtimes aren’t the only kinds of backends. For instance, there may be an analysis backend, which studies the topology of the recipe and makes some judgments about its integrity or other kinds of properties. What sorts of inputs does this recipe take? What are its outputs? What are the runtime characteristics of this recipe?

Sometimes, it might be challenging to tell a backend apart from a frontend: an IDE may include both frontends (editor, compiler, etc.) and backends (runtime, debugger, etc.). However, it’s worth drawing this distinction at the system design stage.

The main benefit of an hourglass stack design is that it leaves a lot of room for experimentation and innovation at the edges of the stack, while providing enough gravitational pull to help the frontends and backends to stick together. Just like for any common language or protocol, the waist of the hourglass serves as a binding agreement between the otherwise diverse consumer and producer ideas. If this language or protocol is able to convey valuable information (which I believe we do with AI recipes in Breadboard), a network effect can emerge under the right conditions: each new producer or consumer creates combinatorial expansion of possibilities – and thus, opportunities for creating new value – within the stack.

It is far too early to tell whether we’ll be lucky enough to create these conditions for Breadboard, but I am certainly hopeful that the excitement around AI will serve as our tailwind.

The key risk for any hourglass stack is that it relies on interoperability: the idea that all consumers and producers interpret the common format in exactly the same way. Following Hyrum’s Law, any ambiguity in the semantics of the format will eventually result in disagreements across the layers.

These disagreements are typically trailing indicators of the semantic ambiguities. By the time the bugs are identified and issues are filed, it is often too late to fix the underlying problems. Many language committees spend eons fighting the early design mistakes of their predecessors.

As far as I know, the best way to mitigate this risk is two-fold. We must a) have enough experience in system design to notice and address semantic ambiguities early and b) have enough diversity of producers and consumers within the hourglass early in the game.

To be sure, a system design “spidey sense”, no matter how well-developed, and the ability to thoroughly exercise the stack early don’t offer 100% guarantees. Interoperability tends to be a long-term game. We can flatten the long tail of ambiguities, but we can’t entirely cut it off. Any hourglass-shaped architecture must be prepared to invest a bit of continuous effort into gardening its interoperability, slowly but surely shaving down the semantic ambiguities within the common format.

Why AI orchestration

Why do I find the problem of AI patterns and, more generally, AI orchestration so interesting that I literally started building a framework for it? Why do we even need graphs and chains in this whole AI thing? My colleagues with a traditional software engineering background have been asking me this question a lot lately.

Put very briefly, at the height of the current AI spring that we’re experiencing, orchestration is a crucial tool for getting AI applications to the shipping point.

To elaborate, imagine that an idea for a software application takes a journey from inception to full realization through two gates.

First, it needs to pass the “hey… this might just work” gate. Let’s call this gate the “Once” gate, since it’s exactly how many times we need to see our prototype work to get through it.

Then, it needs to pass through the “okay, this works reasonably consistently” gate. We’ll call it the “Mostly” gate to reflect the confidence we have in the prototype’s ability to work. It might be missing some features, lack polish, and underwhelm in performance benchmarks, but it is something we can give to a small group of trusted users to play with and not be completely embarrassed.

Beyond these two gates, there’s some shipping point, where the prototype – now a fully-fledged user experience – passes our bar for shipping quality and we finally release it to our users.

A mistake that many traditional software developers, their managers, and sponsors/investors make is that, when looking at AI-based applications, they presume the typical cadence of passing through these gates.

Let’s first sketch out this traditional software development cadence as a sequence below.

The “Once” gate plays a significant role, since it requires finding and coding up the first realization of the idea. In traditional software development, passing this gate means that there exists a kernel of a shipping product, albeit still in dire need of growing and nurturing.

The trip to the “Mostly” gate represents this process of maturing the prototype. It is typically less about ideation and more about converging on a robust implementation of the idea. There may be some circuitous detours that await us, but more often than not, it’s about climbing the hill.

In traditional software development, this part of the journey is a matter of technical excellence and resilience. It requires discipline and often a certain kind of organizing skill. On more than one occasion, I’ve seen brilliant program managers brought in, who then help the team march toward their target with proper processes, burndown lists, and schedules. We grit our teeth and persevere, and are eventually rewarded with software that passes the shipping bar.

There’s still a lot of work to be done past that gate, like polish and further optimization. This is important work, but I will elide it from this story for brevity.

In AI applications – or at least in my and my friends’ and colleagues’ experience with them – this story looks startlingly different. It definitely doesn’t fit into a neat sequential framing.

Passing the “Once” gate is often a matter of an evening project. Our colleagues wake up to a screencast of a thing that shouldn’t be possible, but somehow is. Everyone is thrilled and excited. Their traditional software developer instincts kick in: a joyful “let’s wrap this up and ship it!” is heard through the halls of the office.

Unfortunately, when we try to deviate even a little from the steps in the original screencast, we get perplexing and unsatisfying results. Uh oh.

We try boxing the squishy, weird nature of large language models into the production software constraints. We spend a lot of time playing with prompts, chaining them, tuning models, quantizing, chunking, augmenting – it all starts to feel like alchemy at some point. Spells, chants, and incantations. Maaaybe – maybe – we get to coax a model to do what we want more frequently. 

One of my colleagues calls it the “70% problem” – no matter how much we try, we can’t seem to get past our application producing consistent results more than 70% of the time. Even by generous software quality standards, that’s not “Mostly”.

Getting to that next gate has little resemblance to the maturation process from traditional software development. Instead, it looks a lot more like looping over and over back to “Once”, where we rework the original idea entirely and change nearly everything.

When working with AI applications, this capacity to rearrange everything and stay loose about the details of the thing we build – this design flexibility – is what dramatically increases our chances of crossing the “Mostly” gate.

Teams that hinge their success on adhering to the demo they sold to pass through the “Once” gate are much more likely to never see the next gate. Teams that decide that they can just lay down some code and improve iteratively – as traditional software engineering practices would suggest – are the ones who will likely work themselves into a gnarly spaghetti corner. At least today, for many cases – no matter how exciting and tantalizing, the “70% problem” remains an impassable barrier. We are much better off relying on an orchestration framework to give us the space to change our approach and keep experimenting.

This is a temporary state and it is not a novel phenomenon in technological innovation. Every new cycle of innovation goes through this. Every hype cycle eventually leads to the plateau of productivity, where traditional software development rules.

However, we are not at that plateau yet. My intuition is that we’re still climbing the slope toward the peak of inflated expectations. In such an environment, most of us will run into the “70% problem” barrier head-first. So, if you’re planning to build with large language models, be prepared to change everything many times over. Choose a robust orchestration framework to make that possible.

Placing and wiring nodes in Breadboard

This one is also a bit on the more technical side. It’s also reflective of where most of my thinking is these days. If you enjoy geeking out on syntaxes and grammars of opinionated Javascript APIs, this will be a fun adventure – and an invitation.

In this essay, I’ll describe the general approach I took in designing the Breadboard library API and the reasoning behind it. All of this is still in flux, barely making contact with reality.

One of the key things I wanted to accomplish with this project is the ability to express graphs in code. To make this work, I really wanted the syntax to feel light and easy, and take as few characters as possible, while still being easy to grasp. I also wanted the API to feel playful and not too stuffy.

There are four key beats to the overall story of working with the API:

1️⃣ Creating a board and adding kits to it
2️⃣ Placing nodes on the board
3️⃣ Wiring nodes
4️⃣ Running and debugging the board.

Throughout the development cycle, makers will likely spend most of their time in steps 2️⃣ and 3️⃣, and then lean on step 4️⃣ to make the board act according to their intention. To get there with minimal suffering, it seemed important to ensure that placing nodes and wiring them results in code that is still readable and understandable when running the board and debugging it.

This turned out to be a formidable challenge. Unlike trees, directed graphs – and particularly directed graphs with cycles – aren’t as easy for us humans to comprehend. This appears to be particularly true when graphs are described in the sequential medium of code. 

I myself ended up quickly reaching for a way to visualize the boards I was writing. I suspect that most API consumers will want that, too – at least at the beginning. As I started developing more of a knack for writing graphs in code, I became less reliant on visualizations.

To represent graphs visually, I chose Mermaid, a diagramming and charting library. The choice was easy, because it’s a library that is built into Github Markdown, enabling easy documentation of graphs. I am sure there are better ways to represent graphs visually, but I followed my own “one miracle at a time” principle and went with a tool that’s already widely available.

🎛️ Placing nodes on the board

The syntax for placing nodes on the board is largely inspired by D3: the act of placement is a function call. As an example, every Board instance has a node called `input`. Placing the `input` node on the board is a matter of calling the `input()` function on that instance:

import { Board } from "@google-labs/breadboard";

// create new Board instance
const board = new Board();
// place a node of type `input` on the board.
board.input();

After this call, the board contains an input node.

You can get a reference to it:

const input = board.input();

And then use that reference elsewhere in your code. You can place multiple inputs on the board:

const input1 = board.input();
const input2 = board.input();

Similarly, when adding a new kit to the board, each kit instance has a set of functions that can be called to place nodes of various types on the board to which the kit was added:

import { Starter } from "@google-labs/llm-starter";

// Add new kit to the existing board
const kit = board.addKit(Starter);

// place the `generateText` node on the board.
// for more information about this node type, see:
// https://github.com/google/labs-prototypes/tree/main/seeds/llm-starter#the-generatetext-node
kit.generateText();

Hopefully, this approach will be fairly familiar and uncontroversial to folks who use JS libraries in their work. Now, onto the more hairy (wire-ey?) bits.

🧵 Wiring nodes

To wire nodes, I went with a somewhat unconventional approach. I struggled with a few ideas here, and ended up with a syntax that definitely looks weird, at least at first.

Here’s a brief outline of the crux of the problem. In Breadboard, a wire connects two nodes. Every node has inputs and outputs. For example, the `generateText` node that calls the PaLM API `generateText` method accepts several input properties, like the API key and the text of the prompt, and produces outputs, like the generated text.

So, to make a connection between two nodes meaningful, we need to somehow capture four parameters:

➡️  The tail, or node from which the wire originates.
⬅️ The head, or the node toward which the wire is directed.
🗣️ The from property, or the output of the tail node from which the wire connects.
👂 The to property, or the input of the head node to which the wire connects.

To make this more concrete, let’s code up a very simple board:

import { Board } from "@google-labs/breadboard";

// create a new board
const board = new Board();
// place input node on the board
const tail = board.input();
// place output node on the board
const head = board.output();

Suppose that next, we would like to connect the property named “say” in `tail` to the property named “hear” in `head`. To do this, I went with the following syntax:

// Wires `tail` node’s output named `say` to `head` node’s input named `hear`.
tail.wire("say->hear", head);

Note that the actual wire is expressed as a string of text. This is a bit unorthodox, but it provides a nice symmetry: the code literally looks like the diagram above. First, there’s the outgoing node, then the wire, and finally the incoming node.
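
Once wired, even this tiny board can be run from code. Here is a hedged sketch of that last step; I am assuming a `runOnce` convenience method that takes the input values and resolves with the outputs – the exact call may look a bit different in the library today:

// Provide the `say` input, read back the `hear` output.
// `runOnce` is assumed here for illustration.
const result = await board.runOnce({ say: "Hello, Breadboard!" });
console.log(result.hear); // "Hello, Breadboard!"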

This syntax also easily affords fluent interface programming, where I can keep wiring nodes in the same long statement. For example, here’s what the LLM-powered calculator pattern from the post about AI patterns looks like when written with the Breadboard library:

math.input({ $id: "math-question" }).wire(
  "text->question",
  kit
    .promptTemplate(
      "Translate the math problem below into a JavaScript function named" +
      "`compute` that can be executed to provide the answer to the" +
      "problem\nMath Problem: {{question}}\nSolution:",
      { $id: "math-function" }
    )
    .wire(
      "prompt->text",
      kit
        .generateText({ $id: "math-function-completion" })
        .wire(
          "completion->code",
          kit
            .runJavascript("compute->", { $id: "compute" })
            .wire("result->text", math.output({ $id: "print" }))
        )
        .wire("<-PALM_KEY", kit.secrets(["PALM_KEY"]))
    )
);

Based on early feedback, there’s barely a middle ground of reactions to this choice of syntax. People either love it and find it super-cute and descriptive (“See?! It literally looks like a graph!”) or they hate it and never want to use it again (“What are all these strings? And why is that arrow pointing backward?!”). Maybe such a contrast of opinions is a good thing?

However, aside from differences in taste, the biggest downside of this approach is that the wire is expressed as a string: there are plenty of opportunities to make mistakes between these double-quotes. Especially in the strongly-typed land of TypeScript, this feels like a loss of fidelity – a black hole in the otherwise tight system. I have already found myself frustrated by a simple misspelling in the wire string, and it seems like a real problem.

I played briefly with TypeScript template literal types, and even built a prototype that can show syntax errors when the nodes are miswired. However, I keep wondering – maybe there’s an even better way to do that?
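
To make the idea a bit more concrete, here is a minimal sketch of the direction such a prototype could take – not the actual Breadboard code, and the node shapes below are hypothetical:

// Template literal types can constrain wire strings at compile time.
type Wire<From extends string, To extends string> = `${From}->${To}`;

type GenerateTextOutputs = "completion";
type OutputInputs = "text";

// Only "completion->text" is assignable here; a typo like
// "completion->txt" becomes a compile-time error.
const ok: Wire<GenerateTextOutputs, OutputInputs> = "completion->text";
// const typo: Wire<GenerateTextOutputs, OutputInputs> = "completion->txt"; // ❌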

So here’s an invitation: if coming up with a well-crafted TypeScript/Javascript API is something that you’re excited about, please come join our little Discord and help us Breadboard folks find an even better way to capture graphs in code. We would love your help and appreciate your wisdom.

The engine and the car

The whole large language model space is brand new, and there are lots of folks trying to make sense of it. If you’re one of those folks, here’s an analogy that might come in handy.

Any gasoline-powered car has an engine. This engine is typically something we refer to as a “V8” or “an inline 4” or sometimes even a “Wankel Rotary Engine”. Engines are super-cool. There are many engine geeks out there – so many that they warrant a video game written for them.

However, engines aren’t cars. Cars are much more than their engines. Though engines are definitely at the heart of every car, cars have many additional systems around them: fuel, electrical, steering, etc. Not to mention safety features to protect the passengers and the driver, and a whole set of comforts that we enjoy in a modern car. Pressing a button to roll down a window is not something that is done by the engine, but it’s definitely part of the whole car experience.

When we talk about this generation of AI systems, we typically talk about large language models (LLMs). In our analogies, LLMs are like engines. They are amazing! They are able to generate text by making inferences from the massive parametric memory accrued through training over a massive corpus of information.

However, they aren’t cars. One of the most common mistakes that I see being made is confusing engines (LLMs) with cars (LLM-based products). This is so common that even people who work on those products sometimes miss the distinction.

When I talk to the users of the PaLM API, I see this confusion show up frequently in this manner: developers want to reproduce results from LLM-based products like Bard or ChatGPT. When they try to get the same results from the API, they are disappointed that they don’t match. Factuality is lacking, the API can’t go to the internet and fetch an article, etc.

In doing so, they confuse the engine with the car: the API, which offers access to the model, is not the same as the products built with it. With an LLM API, we have a big-block V8. To make it go down the road, we still need to build the car around it.

To build on this analogy, we live in the early age of cars: the engines still figure prominently in the appearance and daily experience of a vehicle. We still have to turn the crank to start the car, oil the engine frequently, and be savvy enough to fix minor problems that will definitely arise.

As our cars become more refined, the engines get relegated into a well-insulated compartment. Users of cars rarely see them or operate on them directly.

This is already happening with LLM-based products. Very few of the current offerings you might encounter in public use are LLMs directly exposed to the user.

So, when you use a chat-based system, please be aware that this is a car, not the engine. It’s a tangle of various AI patterns that are carefully orchestrated to work as one coherent product. There is likely a reasoning pattern at the front, which relies on an LLM to understand the question and find the right tool to answer it. There is likely a growing collection of such tools – each an AI pattern in itself. There are likely some bits for making sure the results are factual, grounded in sources, and safe.

As the LLM products become more refined, the actual value niches for LLMs become more and more recognizable. Instead of thinking of one large LLM that does everything, we might be seeing specialization: LLMs that are purpose-designed for reasoning, narration, classification, code completion, etc. Each might not be super-interesting in itself, but makes a lot of sense in the overall car of an LLM-based product.

Perhaps unsurprisingly, the next generation of cars might not even have the same kind of engine. While the window control buttons and the steering systems remain the same, the lofty gasoline engines are being replaced with electric motors that fit into a fraction of the space. The car experience remains more or less the same (aside from the annoying/exhilarating engine noise), but the source of locomotion changes entirely.

It is possible that something like this will happen with LLMs and LLM-based products as well. The new open space that was created by LLMs will be reshaped – perhaps multiple times! – as we discover how the actual products are used. 

AI Patterns and Breadboard

In my last post, I kept talking about AI patterns, but kept it a bit vague. I thought it might be useful to share a couple of examples to describe what I mean by “AI patterns” a bit more clearly. Once again, put your technical hats on.

🗺️ How to read AI pattern diagrams

As part of practicing the “build a thing to build the thing” principle, we implemented quite a few AI patterns in Breadboard. I will use the diagrams we generate from the boards (thank you Mermaid.js!) to illustrate the patterns. Here’s a quick guide on how to read the diagrams – and as a happy coincidence, a brief overview of Breadboard concepts.

🔵 The inquisitively blue parallelogram represents the “input” node. This is where the user’s input is requested by the pattern. Because most patterns ask for input first, it’s a good place to start when tracing the flow of the graph.

🟢 The cheerfully green hexagon is the “output” node, which provides the output to the user of the pattern. For many patterns, that’s the end point, the journey’s destination, while for a few – just a brief stopover.

🟡 The curiously yellow boxes are all nodes that do interesting stuff. For example, “generateText” node invokes the LLM, while “promptTemplate” combines a template and various bits of text into a prompt that’s suitable for the LLM. Most of the time, you can guess what the function does by looking at its name.

🔴 The protectively red box with rounded corners is the “secrets” node, which has access to the user’s sensitive data. For most (all?) patterns, it is used to retrieve and pass the API Key to unlock the ability to invoke the large language model.

🍷 The variously-shaped wine-colored boxes are utility nodes: they are mostly here to serve other nodes by supplying important data and making it possible for graphs to be composable and useful. We’ll be mostly ignoring them here – but I will very likely be back to sing their song in the future.

Most nodes will have a two-line label. The first line is the type of the node and the second is its unique identifier. Because there can be multiple nodes of the same type, we need an id to distinguish between them.

Just like with literal breadboards, nodes are connected with wires. Wires are represented by lines with arrows. The direction of the arrow on each wire represents the flow of information. So, when the graph shows this:

… it means that the information flows from the “promptTemplate” node to the “generateText” node.

Each wire is labeled. All labels have the same consistent “out->in” format. A good way to think of it is that every node may have multiple inputs and outputs. The wires connect these inputs and outputs.

In the example above, the output named “prompt” of the “promptTemplate” node is wired to the input named “text” of the “generateText” node. Most of the time, it’s not difficult to infer the purpose of the wire. Like, the wire above feeds the prompt produced by the “promptTemplate” node into the “generateText” node as input. If you are curious about all the ins and outs of nodes (pun intended!), check out this guide on Github.

Some wires will have a circle at the end of them, rather than an arrow. These are constant wires. There’s a lot more to them, but for now, a good way to think of them is that they are here to specify constant values. Like in the diagram below, the “template” utility node supplies a constant “template” input to the “promptTemplate” node. 

With this quick Breadboard refresher out of the way, we’re ready to dig into the actual patterns. To keep this post from becoming a book, I’ll give you only three examples.

🧮 The Calculator pattern

Let’s start with the widely used Calculator pattern (you can also see it here on Github, implemented in Breadboard):

The structure of this pattern is very simple: user input goes into the “promptTemplate” node, which produces a prompt that goes into the “generateText” node, the output of which is fed to “runJavascript” node, and the result is returned as output.

As often happens with AI patterns, the magic is in the contents of the prompt template. In this pattern, the LLM is used to find solutions to mathematical problems in a very clever way.

As you may have heard, LLMs aren’t so great at math. So instead of approaching the problem head-on, we lean on the LLM’s strength: we convert a math problem into a language problem.

In the Calculator pattern, we ask the LLM to do what it does best: generate text. We ask it to write code that solves a math problem, rather than try to find the answer to the question. Here’s a prompt to do that:

Translate the math problem below into a JavaScript function named `compute` that can be executed to provide the answer to the problem.
Math Problem: {{question}}
Solution:

Because writing code is a language problem, LLMs are pretty good at it. So, with a high degree of consistency, the output of the LLM will be a function that, when run, produces the right answer. Leave computation to the old-style computers. Let LLMs write code that will be computed.

For instance, when we replace the {{question}} placeholder with:

What is the square root of the perimeter of a circle with a diameter of 5?

The LLM will happily produce this function:

function compute() {
  const diameter = 5;
  const radius = diameter / 2;
  const perimeter = 2 * Math.PI * radius;
  return Math.sqrt(perimeter);
}

Which, when executed, will give us the correct answer of `3.963327297606011`. If you ask any conversational agent today a math question and it surprises you with an accurate answer, chances are that some variant of the Calculator pattern is being employed.
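
Under the hood, the “runJavascript” step just needs to evaluate that text and call `compute`. Here is a rough sketch of the idea – not the actual node implementation – assuming the LLM’s completion is available in a `completion` string:

// Evaluate the generated function text and call `compute`.
const compute = new Function(`${completion}; return compute;`)();
console.log(compute()); // 3.963327297606011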

📰 The Summarizer pattern

Another common pattern builds on the LLM’s strength of narrating information, even when presented with bits of random content. I experimented with this ability early this year, and here’s an implementation of the pattern in Breadboard (also here on Github):

When we look at the structure above, we can see that user input splits into two paths.

The first route is circuitous. It takes us through the “urlTemplate” node that creates a valid URL (it’s a Google News RSS feed with the topic as the query), which is then fed to the “fetch” node. The “fetch” node grabs the contents of this URL, and sends it to the “xmlToJson” and “jsonata” nodes that munge RSS into a list of headlines.
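
For the curious, here is roughly what that first route boils down to if written as plain JavaScript outside of Breadboard – assuming the standard Google News RSS search endpoint, and with a quick regex standing in for the “xmlToJson” and “jsonata” munging:

// Fetch the Google News RSS feed for a topic and pull out headline titles.
const topic = "breadboards";
const url = `https://news.google.com/rss/search?q=${encodeURIComponent(topic)}`;
const rss = await (await fetch(url)).text();
const headlines = [...rss.matchAll(/<item>.*?<title>(.*?)<\/title>/gs)]
  .map((match) => match[1])
  .slice(0, 20);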

The second route and the first route meet up at the “promptTemplate” node, from where they predictably move to the “generateText” node and, finally, the result is presented to the user.

The concept is fairly straightforward: give the LLM a topic and a few sentences, and request a summary. If – as is the case in the graph above – we are summarizing news headlines, a prompt will look something like this:

Use the news headlines below to write a few sentences to summarize the latest news on this topic:
## Topic: {{topic}}
## Headlines: {{headlines}}
## Summary:

In this prompt, we have two placeholders: the {{topic}}, which is where the subject of summarization will go, and the {{headlines}}, where we will plug in the various headlines from a news source (Google News).

The key distinction between just asking an LLM a question and using this pattern is that we’re not relying on LLM’s parametric memory to contain the answer. We’re not asking it to find the answer for us. Instead, we are only employing its narrative-making abilities, supplying the raw information in the prompt.

So if, for example, I put “breadboards” into the {{topic}} placeholder, and the following list of headlines from Google News (just the first 20 for this particular board) into the {{headlines}} placeholder:

Thermochromic Treatment Keeps Solderless Breadboards Smokeless - Hackaday
Jumper Wires For Electronic Components - IndiaTimes
10 hostess hacks to make your food look better than it is - Colorado Springs Gazette
Gabriel's Cyberdeck Red V2 Packs in a LattePanda Delta 3, Analog Discovery 2, HackRF One, and More - Hackster.io
How to Measure Voltage on a Breadboard - MUO - MakeUseOf
The Ultimate Breadboard Platform? - Hackster.io
Building Circuits Flexibly - Hackaday
Lewiston Art Festival: A 'dinosaur' of woodwork - Niagara Gazette
Podcast 220: Transparent Ice, Fake Aliens, And Bendy Breadboards ... - Hackaday
Flexboard: a flexible breadboard for flexible and rapid prototyping of ... - Tech Explorist
Derek Fogt | Communities | pinecountynews.com - pinecitymn.com
MARNI JAMESON: Compensate for humdrum food with stylish ... - Sarasota Herald-Tribune
Build HMI screens with MicroLayout for your Meadow Apps - Hackster.io
Tidy Breadboard Uses Banana Bread - Hackaday
Old 6809 Computer Lives Again On Breadboards - Hackaday
My Favorite Things: Hardware Hacking and Reverse Engineering - Security Boulevard
Luna in Cocoa Beach offers serves thoughtful modern Italian food - Florida Today
Teaching Method Increases Students' Interest in Programming and ... - University of Arkansas Newswire
From A 6502 Breadboard Computer To Lode Runner And Beyond - Hackaday
How to Breadboard Electronics Projects with Raspberry Pi Pico - Tom's Hardware

… we will get this output from an LLM:

The latest news on breadboards include a new thermochromic treatment
that keeps solderless breadboards smokeless, a flexible breadboard for
flexible and rapid prototyping, and a new method for teaching students
programming and electronics.

For the quality of the junk we fed it, it ain’t half bad!

The Summarizer pattern has a much more popular cousin named Retrieval-augmented Generation (RAG). RAG is all the rage these days, and everyone wants to have one. If we peek under the covers, we’ll recognize the Summarizer pattern combined with another neat LLM capability – semantic embeddings – into the Voltron of patterns.

🔁 The ReAct pattern

I would be remiss not to bring up ReAct when talking about AI patterns. This pattern ushered in the new mini-era of LLM applications, a breakthrough that redefined what LLMs can do.

The ReAct pattern is different from the ones mentioned earlier, because it is cyclical: rather than asking an LLM once, it may do so several times, repeating until the problem is solved.

ReAct introduces this really interesting idea that we can induce chain-of-thought reasoning capabilities in LLMs if we structure our interaction with them in a certain way. In this chain of thought, the LLM interacts with the outside environment, suggesting actions to take and then reasoning about the outcomes of these actions.

I’ve talked about LLM-based reasoning a few times before, so this concept shouldn’t be entirely novel to my readers.

In ReAct, the key trick is in establishing a predictable beat of reasoning within the prompt:

1️⃣ First comes the Question – the question that the user asks
2️⃣ Then comes the Thought – the opportunity for the LLM to reason about what to do next
3️⃣ After Thought is Action – the LLM’s suggested action to take
4️⃣ Finally, the Observation – the outcome of the action, supplied by the tool

Steps 2️⃣, 3️⃣, and 4️⃣ keep repeating until the answer is found.

The LLM is only allowed to pipe in on steps 2️⃣ and 3️⃣: that is, it can only produce the “Thought” and “Action” parts of the overall sequence.

Step 1️⃣ is provided by the user, and the observation in step 4️⃣ is supplied as the outcome of whatever action the LLM suggested to take.

As the steps repeat, all of these steps are being added to the overall prompt, allowing the LLM to see the history of the interaction and reason about it. In this way, and unlike in the Calculator and Summarizer patterns, the ReAct pattern simulates memory: with each invocation, the LLM can see how it acted in the past.
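
To make the shape of this loop clearer, here is a deliberately simplified sketch of what drives it – not the Breadboard implementation. The `reactPreamble`, `generateText`, and `tools` helpers are hypothetical stand-ins for the prompt preamble, the LLM call, and the available tools:

// Keep asking the model for Thought/Action, run the suggested tool,
// append the Observation, and repeat until a Final Answer shows up.
async function react(question) {
  let prompt = reactPreamble + `Question: ${question}\n`;
  for (let turn = 0; turn < 10; turn++) {
    // Stop the model before it makes up its own Observation.
    const step = await generateText(prompt, { stopSequences: ["Observation:"] });
    prompt += step;
    const final = step.match(/Final Answer:\s*(.*)/);
    if (final) return final[1];
    const action = step.match(/Action:\s*(.*)\n\s*Action Input:\s*(.*)/);
    const observation = await tools[action[1].trim()](action[2].trim());
    prompt += `Observation: ${observation}\n`;
  }
}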

It’s a pretty convoluted contraption, as you can see from the Breadboard implementation (here on Github):

However, the presence of memory and a repeating cadence of steps guide the LLM, allowing it to answer much more complex and interesting questions. The answers are no longer produced in one go. Instead, the LLM walks toward the answer in multiple iterations.

Let’s see if we can make it a bit more concrete by breaking down an actual transcript of a ReAct pattern solving a problem.

First, we supply the question:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?

In response, the LLM then produces the following Thought and Action steps:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon

The LLM is asking to search for the distance between the Earth and the Moon. Notice that it is not trying to make up the answer. It is merely suggesting the next action to take. Taking over control from the LLM, the ReAct pattern invokes search and produces the following Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km

With the observation in hand, the ReAct pattern hands control back to the LLM, and gets this response:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)

Great! Now the LLM wants us to do math. Fine. The ReAct machinery rattles and burrs, and produces the next Observation:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620

The ReAct pattern yields control back to the LLM, and it triumphantly declares victory:

1️⃣ Question: what is the square root of the distance between earth and moon in kilometers?
2️⃣ Thought: I need to find the distance between earth and moon
3️⃣ Action: search
   Action Input: distance between earth and moon
4️⃣ Observation: 384,400 km
2️⃣ Thought: I need to square root this distance
3️⃣ Action: math
   Action Input: sqrt(384400)
4️⃣ Observation: 620
2️⃣ Thought: I now know the final answer
   Final Answer: 620

Great job, model. You did it.

The ReAct pattern also introduces, almost as an afterthought, the concept of tools. Since the LLM is asked to suggest an action, it seems useful to specify the kinds of tools the LLM has at its disposal.

In the transcript above, the “search” and “math” tools were used. For other kinds of problems, there might be a need for other kinds of tools. 

This is where the most valuable aspect of the ReAct pattern resides: if we can specify our own tools, we can make LLMs do useful things. For example, I could hand it a “calendar” tool, an “email” tool, and a list of my friends and ask it to schedule a lunch for us. Or I could turn it into a menu-ordering system, where it would rely on menu-understanding tools to take customer orders.
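
To make this concrete, here is what a hypothetical tools map could look like – the helper functions referenced below are assumptions, just to show the shape of the idea:

// Each tool is simply an async function from a string input to a string observation.
declare function searchTheWeb(query: string): Promise<string>;
declare function runCalculator(expression: string): Promise<string>;
declare function readCalendar(query: string): Promise<string>;

const tools: Record<string, (input: string) => Promise<string>> = {
  search: searchTheWeb,
  math: runCalculator,
  calendar: readCalendar,
};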

The pattern stays the same, but the tools change. With the ReAct pattern, we can build actual helpful agents. If you’ve been watching the LLM space, you have no doubt noticed a lot of activity around this notion.

🍞 Patterns with Breadboard

These are just a few examples of interesting patterns that emerged in the generative AI space in the last few months. Honestly, the pace of pattern discovery has been nuts. So far, I see no signs of it slowing down. What I’d like to do with Breadboard is to help make these patterns more legible to everyone – so that more people can play with them, create their own, and explore this fascinating space together.

My intuition is that when we lower the barrier to entry to this process of discovery and make it easy to tell whether the new pattern is good or not, we have a better chance of exploring the space more thoroughly and realizing the full potential of large language models.

Composing graphs with Breadboard

This post is more technical and more rambling than my usual repertoire. It’s a deep dive into where my mind is today, and it’s mostly about technical design. Think of it as a sample of interesting problems that I am puzzling over these days.

Let’s start with graphs. In the past, I’ve mentioned this interesting tension between graphs and trees: how our human tendency to organize things into tree-like structures (hierarchies and other container models) is at odds with the fluid, interconnected nature of the world around us. I framed it as: “every graph wants to become a tree, and still secretly wants to remain a graph”, and described it in the context of developer frameworks.

So naturally, when presented with an opportunity to write a developer library, I decided to use the graph structure as its core. The result is Breadboard. It’s still early on. I like to say that we currently have a toddler, and there’s much left to do to get the library to adulthood. However, it seems useful to start sharing what I’ve learned so far and the design decisions I landed on.

🐱 Whyyyy

If you even occasionally scan these chronicles of my learnings, you will undoubtedly know that I am fascinated by the potential of applying large language models (LLMs) as a technology and the kinds of new interesting spaces they could open.

As a result, I invested quite a bit of time tinkering with the models, trying to make them jump through various hoops and do things – just to get a sense of what it is that they are truly capable of. Judging from the endless updates I hear from my friends and colleagues, so is everyone else.

New interesting patterns of applying LLMs seem to arise almost daily – and that’s pretty exciting. What was less exciting for me was the distinct lack of tools that help us discover these patterns. Most of the frameworks that are rising to relative prominence appear to focus on capturing the newly discovered patterns and making them more accessible. This is great! But what about a framework that facilitates tinkering with LLMs to find new patterns?

I started Breadboard with two objectives in mind:

  1. Make creating new generative AI patterns accessible and fun
  2. Enable easy sharing, remixing, composition, and reuse of these patterns.

My hope is that Breadboard helps accelerate the pace with which new interesting and useful patterns for applying generative AI are found. Because honestly, it feels like we have barely scratched the surface. It would be super-sad if the current local maximum of chatbots were as far as we get with this cycle of AI innovation.

🍞 The metaphor

The first thing I wanted to get right was the mental model with which a developer might approach the library. 

Graphs are typically hard to describe. I am in awe of whoever came up with the term “Web” to describe the entire tangle of hyperlinked documents. Kudos.

As you may remember, I am also very interested in the role makers play in the generative AI space. That’s how breadboards came to mind: the solderless construction bases for prototyping electronic circuits. Breadboards are a perfect maker’s tool. They are easy to put together and easy to take apart, to change the layout and tinker with various parts.

Lucky for me, breadboards are also graphs: the electronic circuits they carry are directed graphs, where each electronic component is a node in the graph and the jump wires that connect them are edges. By placing different nodes on the board and wiring them in various ways, we get different kinds of prototypes.

This is exactly what I was looking for: a one-word name for the library that comes with the mental model for what it does. As an additional benefit, “breadboard” selects for makers: if you know and love breadboards (or even just the idea of breadboards), you will likely look forward to playing with this library.

🧩 The composition system

Another piece of the puzzle was composition. Over the last decade, I ended up studying composition and designing composable systems quite extensively. In Breadboard, I wanted to lay down a sound foundation for composition.

There are three different ways to compose things in Breadboard: 🧩 nodes, 🍱 kits, and 🎛️ boards.

🧩 Nodes are the most obvious unit of composition: we can place nodes on a breadboard and wire them together. At this layer of composition, makers compose nodes to make their prototypes. Once they have a neat prototype, makers can share the board that contains the prototype. A shared board is something that anyone can pull down from a URL and start playing with. They can clone it, tweak it, and share it again.

To get the node composition right, we need a set of nodes that allow us to build something useful and interesting. While still at an early stage, it is my intent to arrive at a starter kit of sorts: a relatively small set of general-purpose nodes that enable making all kinds of cool things.

🍱 We don’t have to stop with just one kit. Kits are another unit of composition. Makers are able to create and group interesting nodes into kits – and publish them to share with others. For instance, a project or a company might want to wrap their interesting services as nodes and publish them as a kit, allowing any maker to grab those nodes and start using them in their boards.

A maker can also just build a kit for themselves, and use it in their own prototyping only. While kits do not need to be published, boards that use unpublished kits can’t be shared with others – or at least shared in any useful way.

🎛️ Boards themselves are also units of composition. Makers can include boards of others into their board, turning an included board into a sort of virtual node. The board inclusion feature is similar to a hyperlink: just like on the Web, including a board simply links from one board to another, rather than subsuming it. Such loose coupling unlocks the full potential for interdependent collaboration, and I fully expect the common dependency management practices to be applicable.

In addition to inclusion, boards can have slots. Slots are another way to add modularity to boards. When I build a board, I can leave it incomplete by specifying one or more places – “slots” – where someone else can include their boards. This is a useful trick that software developers call “dependency injection”. For instance, if I develop a generic pattern to invoke various tools with generative AI, I can leave a slot for these tools. When other makers reuse my board, they can insert their own sets of tools into this slot without having to modify my board.
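
In code, this kind of composition might look roughly like the sketch below. The node names `include` and `slot` and the board URL are assumptions for illustration rather than the definitive API:

// Link someone else’s published board into mine, like a hyperlink:
const tools = board.include("https://example.com/boards/tools.json");

// Or leave a named gap that whoever reuses my board can fill with their own tools:
const pluggableTools = board.slot("tools");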

🤖 The traversal machine

It took me a little bit of time to settle on what is a “node” in Breadboard and how these nodes get traversed in the graph. I ended up going with the actor model-inspired design, leaving lots of room to explore concurrency and distributed processing in the future. For the moment however, I am primarily guided by the motivation to make Breadboard graphs easy to understand.

One capability I wanted to enable was building graphs that have cycles within them. Pretty much anything interesting contains feedback loops, and so Breadboard supports directed graphs with cycles out of the box. Calculating the topology of such graphs is an NP-complete problem, but lucky for us, traversing them is fairly trivial. After all, most computer programs are directed graphs with cycles.

At the core of the traversal machine is this concept: a well-behaving node is a pure function. More precisely, as close to a pure function as we can get. Since makers can create their own nodes, Breadboard can’t guarantee any of that, but I’d like to encourage it.

Since pure functions don’t contain state, state needs to be managed outside of the function. Breadboard relies on wires as the method to manage state. Wires are the way both data and control flow.

This sets up the basics of the traversal logic:

  • Every node has inputs and outputs. The inputs are the wires running into the node, and outputs are wires running out. A node consumes inputs and provides outputs.
  • The node – or technically the function that the node represents – is only run when all inputs have been provided to it by the nodes that ran before this node. Put differently, a node will not run if some of its inputs weren’t provided.
  • Graph traversal starts with running nodes that don’t have any inputs wired into them.

That’s about it.

To geek out on this a bit more, I went with a mailbox-like setup where wires are effectively one-time variables that store data. The data in this variable is written by a node output and read by a node input. A super-cool effect of such a setup is that the state of the graph is captured entirely in the wires, which means that Breadboard can pause and resume traversal of the graph by saving what’s currently stored in the wires.
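
For the code-inclined, here is a deliberately simplified sketch of that idea – one-shot, ignoring cycles and re-runs, and nothing like the production traversal machine – that shows the basic shape: run any node whose inputs have all arrived, deliver its outputs into the mailboxes of downstream nodes, repeat:

// A toy traversal over nodes (pure functions) and wires (mailboxes).
type Values = Record<string, unknown>;
type NodeFn = (inputs: Values) => Values;
type Wire = { from: string; out: string; to: string; in: string };

function traverse(nodes: Record<string, NodeFn>, wires: Wire[]) {
  const mailboxes = new Map<string, Values>();
  const incoming = (id: string) => wires.filter((w) => w.to === id);
  // Traversal starts with nodes that have no inputs wired into them.
  const queue = Object.keys(nodes).filter((id) => incoming(id).length === 0);
  while (queue.length) {
    const id = queue.shift()!;
    const outputs = nodes[id](mailboxes.get(id) ?? {});
    for (const wire of wires.filter((w) => w.from === id)) {
      const box = mailboxes.get(wire.to) ?? {};
      box[wire.in] = outputs[wire.out];
      mailboxes.set(wire.to, box);
      // A node becomes ready once every incoming wire has delivered a value.
      if (incoming(wire.to).every((w) => w.in in box)) queue.push(wire.to);
    }
  }
}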

🚧 What is next

Looking ahead, I am quite optimistic about Breadboard. We already have a small seedling of a team developing. In the next few weeks, I’ll keep making interesting patterns with it to keep informing the development of the library. Build a thing to build the thing, right?

Once the fundamentals settle a bit, we can start thinking about graduating Breadboard into an early adulthood, releasing the v1. Hopefully, at this point, we will have enough onramp for you and other makers to start actively using it in your prototyping adventures.

If you feel excited about this idea and don’t want to wait until then, please check out the list of open issues on Github and join the conversation. Be prepared to eat unbaked cookies and occasionally find bits of construction debris in them – and help make Breadboard better.