Flexibility of the medium

I did this fun little experiment recently. I took my two last posts (Thinking to Write and The Bootstrapping Phase) and asked an LLM to turn them into lyrics. Then, after massaging the lyrics a bit, to better fit with the message I wanted to come across, I played with Suno for a bit to transform them into short, 2-minute songs – a sort of vignettes for my long-form writing. Here they are:

Unbaked Cookie Testers on Suno

Catchy, content-ful, and in total, maybe 20 minutes to make. And, it was so much fun! I got to think about what’s important, how to express somewhat dry writing as emotionally interesting. I got to consider what music style would resonate with what I am trying to convey in the original content.

This got me thinking. What I was doing in those few minutes was transforming the medium of the message. With generative AI, the cost of medium transformation seems to be going down dramatically. 

I know how to make music. I know how to write lyrics. But it would have taken me hours of uninterrupted time (which would likely translate into months of elapsed time) to actually produce something like this. Such investment makes medium transformation all but prohibitive. It’s just too much effort.

However,  with the help of a couple of LLMs, I was able to walk over this threshold like there’s nothing to it. I had fun, and – most importantly – I had total agency in the course of the transformation. I had the opportunity to tweak the lyrics. I played around with music styles and rejected a bunch of things I didn’t like. It was all happening in one-minute intervals, in rapid iteration.

This rapid iteration was more reminiscent of jamming with a creative partner than working with a machine. Gemini gave me a bunch of alternatives (some better than others), and Suno was eager to mix bluegrass with glitch, no matter how awful the results. At one moment I paused and realized: wow, this feels closer to the ideal creative collaboration than I’ve ever noticed before.

What’s more importantly, the new ease of medium transformation opens up all kinds of new possibilities. If we presume – and that’s a big one – for a moment that the cost of medium transformation will indeed go down for all of us, we now can flexibly adjust the medium according to the circumstances of the audience.

The message does not have to be locked in a long-form post or an academic tome, waiting for someone to summarize it in an easily consumable format. We can turn it into a catchy tune, or a podcast. It could be a video. It could be something we don’t yet have, like a “zoomable” radio station where I listen to a stream of short-form snippets of ideas, and can “zoom in” to the ones I am most interested in, pausing the stream to have a conversation with the avatar of the author of the book, or have an avatar of someone I respect react to it. I could then “zoom out” again and resume the flow of short-form snippets.

Once flexible, the medium of the message can adapt and meet me where I am currently.

The transformation behind this flexibility will often be lossy. Just like the tweets pixelate the nuance of the human soul, turning a book into a two-verse ditty will flatten its depth. My intuition is that this lossiness and the transformation itself will usher in a whole new era of UX explorations, where we struggle to find that new shared way of interacting with the infinitely flexible, malleable canvas of the medium. Yup, this is going to get weird.

The Bootstrapping Phase

I think I have a slightly better way of describing a particular moment in a product’s life that I alluded to in Rock tumbler teams, Chances to get it right,  and later, Build a thing to build the thing. I call this moment the “bootstrapping phase.” It very much applies to consumer-oriented products as well, but is especially pronounced – and viscerally felt – in the developer experience spaces.

I use the term “bootstrapping phase” to point at the period of time when our aspiring developer product is facing a tension of two forces. On one hand, we must start having actual users to provide the essential feedback loop that will guide us. On the other hand, the product itself isn’t yet good enough to actually help users.

The bootstrapping phase is all about navigating this tension in the most effective way. Move a little too much away from having the feedback loop, and we run the danger of building something that nobody wants. Go a little too hard on growing the user base, and we might prematurely conclude the story of the product entirely.

The trick about this phase is that all assumptions we might have made about the final shape of what we’re building are up in the air. They could be entirely wrong, based on our misunderstanding of the problem space, or overfit to our particular way of thinking. These assumptions must face the contact with reality, be tested – and necessarily, change.

The word “bootstrapping” in the name refers to this iterative process of evolving our assumptions in collaboration with a small group of users who are able and eager to engage.

Those of you hanging out in the Breadboard project heard me use the expression “unbaked cookies”: we would like to have you try the stuff we made, and we’re pretty sure it’s not yet cooked. Our cookies might have bits of crushed glass in them, and we don’t yet know if that cool new ingredient we added last night is actually edible. Yum.

At the bootstrapping phase of the project, the eagerness to eat unbaked cookies is a precious gift. I am in awe of the folks I know who have this mindset. For them, it’s a chance to play with something new and influence – often deeply – what the next iteration of the product will look like. On the receiving end, we get a wealth of insights they generate by trying – and gleefully failing – to use the product as intended.

For this process to work, we must show a complementary eagerness to change our assumptions. It is often disheartening to see our cool ideas be dismantled with a single click or a confused stare. Instead of falling prey to the temptation of filtering out these moments, we must use them as guiding signals – these are the bits that take us toward a better product.

The relationship between the bakers of unbaked cookies and cookie testers requires a lot of trust – and this can only be built over time. Both parties need to develop a sense of collaborative relationship that allows them to take risks, challenging each other. As disconcerting it may be, some insights generated might point at fundamental problems with the product – things that aren’t fixable without rethinking everything. While definitely a last resort, such rethinking must always be on the table. Bits of technology can be changed with some work. The mental models behind the product, once it ships to the broader audience are much, much more difficult to change.

Because of that, the typical UX studies aren’t a great fit for the bootstrapping phase of the project. We’re not looking for folks to react to the validity of mental models we imbued the nascent product with. We fully realize that some of them – likely many – are wrong. Instead, we need a collaborative, tight-feedback loop relationship with the potential users, who feel entrusted with steering the product direction through them chewing on not-yet baked cookies. They aren’t just trusted testers of the product. They aren’t just evaluators of it. They are full participants in its development, representing the users.

Thinking to write

I’ve had this realization about myself recently, and it’s been rather useful in gaining a bit more understanding about how my mind works. I am writing it down in hopes it would help you in your own self-reflections.

The well-worn “Writing to think” maxim is something that’s near and dear to my heart: weaving a sequential story of the highly non-linear processes that are happening in my mind is a precious tool. I usually recommend developing the muscle for writing to think as a way to keep our thoughts organized to my colleagues and friends. Often, when I do, I am asked: “What do I write about?”

It’s a good question. At least for me, the ability to write appears to be closely connected to the space in which I am doing the thinking. It seems like the whole notion of “writing to think” might also work in reverse: when I don’t have something to write about, it might be a signal that my thinking space is fairly small or narrow.

There might be fascinating and very challenging problems that I am working on. I could be spending many hours wracking my brain trying to solve them. However, if this thinking doesn’t spur me to write about it, I am probably inhabiting a rather confined problem space.

I find that writing code and software engineering in general tend to collapse this space for me. Don’t get me wrong, I love making software. It’s one of those things that I genuinely enjoy and get a “coder’s high” from.

Yet, when doing so, I find that my thoughts are sharply focused and narrow. They don’t undulate and wander in vast spaces. They don’t get lost just for the sake of getting lost. Writing code is about bringing an idea to life. It’s a very concretizing process. Writing code is most definitely a process of writing to think, but it’s more of “writing it”, rather than “writing about it”.

The outcome is a crisp – albeit often spaghetti-like – set of instructions that are meant to be understood by a machine, which for all its complicatedness is a lot less complex than a human mind.

On the other hand, when I was doing more strategy work a few years back, I found myself brimming with ideas to write down. It was very easy to just knock out a post – nearly every idea I had was begging to be played with and turned into a story to share. I was in the wide-open space of thinking among people, and particularly long-term horizon, broad thinking, and wandering.

Nothing’s wrong with inhabiting smaller problem spaces for a little while. However, it’s probably not something I would pick as the only way of being. “Inhabiting” brings habits, and habits entrench. Becoming entrenched in smaller problem spaces means that the larger spaces become less and less accessible over time, resulting in strategic myopia.

It seems that to avoid such a diagnosis, we’ve gotta keep finding ways to think in spaces big enough to spur us to write. To use an analogy, “writing to think” is like developing a habit of brushing our teeth, and “thinking to write” is a way to check if we indeed follow this habit. If we find ourselves struggling to write, then maybe we need to broaden the problem space we inhabit.

Aircraft Carriers and Zodiac Boats

An insightful conversation with a colleague inspired me to articulate the distinction between velocity and agility – and all that implies. When I talked about velocity in the past, I sort of elided the fact that agility, or the ability to change direction at a given velocity, plays a crucial role in setting organizations for success in certain conditions. I also want to disclaim that the definition of “agility” I use here is only loosely related to the well-known/loved/hated agile methodologies.

The example I’ve used in the past is that of a zodiac boat and an aircraft carrier. Though they are capable of roughly going at the same velocity, their agility is dramatically different. The zodiac boat’s maneuverability is what gives it a decisive advantage in an environment where the situation changes rapidly. On the other hand, an aircraft carrier is able to sustain velocity for much longer periods of time, which enables it to travel around the globe.

In engineering teams, velocity and agility are often used interchangeably, and I am quite guilty of doing this as well. Only in retrospect I am realizing why some teams that I’ve worked both with (and next to) looked and acted so differently. They were valuing, respectively, velocity or agility.

When the team favors velocity, it invests significantly into its capacity to achieve maximum sustained speed for the longest possible time. Decision-making and engineering processes, tools and infrastructure all feel like that of an aircraft carrier, regardless of the team’s actual size. It’s like the question on everyone’s mind is: “Can we cross the Pacific Ocean? How many times over?” The team is designed to go far, even if that means sacrificing some velocity to its robustness.

For instance, the Blink team I led a while back was all about velocity, borrowing most of its ethos from Google engineering culture. We designed our infrastructure to enable us to ship directly from trunk through diligent test coverage and a phenomenal build system (we built our own!), we followed the practice of code reviews and the shipping process. We talked about how this team was built to run for multiple decades.

This was (and a decade later, still is), of course, the right fit for that project. Rendering engines represent highly overconstrained, immensely complex systems of a relatively well-defined shape. The team that ships a rendering engine will not one day decide to do something completely different. The word “velocity” in such a team is tightly coupled with achieving predictable results over a long period of time.

However, when the final shape of the value niche is still unknown, and the product-market fit is something we only wistfully talk about, a different structure is needed. Here, the engineering team needs to lean into agility. When they do so, the project will act very differently. It will be more like a zodiac boat: not built to run forever, but rather to zig and zag quickly.

A project structured like a zodiac boat will have alarmingly few processes and entrenched practices. “What? They don’t do code reviews?” The trunk might be unshippable for periods that would be unacceptable by any standards of an aircraft carrier team. The codebase will have large cheese holes in implementation and test coverage, with many areas only loosely sketched. In a zodiac boat project, everything is temporary, and meant to shift as soon as a new promising approach is discovered.

Such projects are also typically small-sized. Larger teams means more opinions and more coordination headwinds, so zodiac boat projects will favor fewer folks who deeply understand the code base and have no problem diving in and changing everything. They will also attract those who are comfortable with uncertainty. In a highly dynamic situation, the courage to make choices (even if they might not be the right ones), and the skill to keep the OODA loop spinning are paramount. 

A well-organized startup will have to run like a zodiac boat project. Startups rarely form around old ideas or long-discovered value niches. A lot of maneuvering will be necessary to uncover that motherlode. Any attempts to turn this zodiac boat into an aircraft carrier prematurely will dramatically reduce the probability of finding it. This is why ex-Googlers often struggle in startups: their culturally-instilled intuition will direct them to install nuclear reactors and rivet steel panels onto their boats – and in doing so, sink them.   

Which brings me to engineering mastery. In my experience, there are two kinds of successful zodiac boat projects: ones run by people who aren’t that familiar with robust software engineering, and ones who have achieved enough software engineering mastery to know which practices can be broken and disregarded.

The first group of folks succeeded accidentally. The second – intentionally. This second group knows where to leave the right cheese holes in their project, and will do so consistently with magical results.

That’s what’s so tricky about forming zodiac boat projects. One can’t just put together a bunch of engineers into a small boat and let them loose. As with any elite special force, zodiac boat projects require a crew that is tightly knit, intrinsically motivated, and skilled to extreme.

Curiously, aircraft carrier-culture organizations can sometimes produce relatively high agility through what I call ergodic agility. The ergodic agility refers to the phenomenon where a multitude of projects are given room to start and fail, and over time, through the ergodic motion, find a new value niche. Here, the maneuverability is achieved through quantity and diversity of unchanging directions.

Like the infamous quote from Shrek, this process looks and feels like utter failure from inside of most of these teams, with the lucky winner experiencing the purest form of survivorship bias.

I am not sure if ergodic agility is more or less expensive for a large organization compared to cultivating an archipelago of zodiac boat teams and culture. One thing is certain: to thrive in an ever-changing world, any organization will need to find a way to have both aircraft carriers and zodiac boats in its fleet.

AI Baklava

At its core, the process of layering resolves two counterposed forces. Both originate from the need for predictability in behavior. One from below, where the means are concrete, but the aims are abstract – and one from above, where the aims are concrete, but the means are abstract. 

Each layer is a translation of an abstraction: as we go up in the stack of layers, the means become more abstract and the aims more concrete. At each end, we want to have predictability in what happens next, but the object of our concern is different. Lower layers want to ensure that the means are employed correctly. Higher layers want to ensure that the aims are fulfilled.

For example, a user who wants to press a button to place an order in the food-ordering app has a very concrete aim in mind for this button – they want to satisfy their hunger. The means of doing that are completely opaque to the user – how this will happen is entirely hidden by the veil of abstraction.

One layer below, the app has a somewhat more abstract aim (the user pressed a button), but the means are a bit more concrete: it needs to route the tap to the right handler which will then initiate the process of placing the order.

The aims are even less concrete at a lower layer. The button widget receives a tap and invokes an event handler that corresponds to it. Here, we are unaware of the user’s hunger. We don’t know why they want this button tapped, and nor do we care: we just need to ensure that the right sequence of things transpires when a tap event is received. The means are very concrete.

A reasonable question might be: why layer at all? Why not just connect the means and aims directly? This is where another interesting part of the story comes in. It appears that we humans have limits to the level of complexity of mental models we can reasonably contain in our minds. These limits are well-known to software engineers.

For a seasoned engineer, the pull toward layering emerges nearly simultaneously with the first lines of code written. It’s such a habit, they do it almost automatically. The experience that drives this habit is that of painful untangling of spaghetti after the code we wrote begins to resist change. This resistance, this unwillingness of cooperating with its own creator is not the fault of the code. It is the limit of the engineer’s mental capacity to hold the entirety of the software held in their minds.

When I talk about software with non-technical people, they are nearly always surprised about the amount of bugs any mature software contains. It seems paradoxical that older software has more bugs than the newer code. “What exactly are y’all doing there? Writing bugs?!” Yup. The mental model capacity needed to deeply grok a well-used, well-loved piece of software is typically way beyond that of any individual human.

So we come up with ways to break software into smaller chunks to allow us to compartmentalize their mental models, to specialize. And because of the way the forces of the aims and the means are facing each other, this process of chunking results in layering. Whether we want it or not, the layering will emerge in our code.

Putting it another way, layering is the artifact of the limits of human capacity to hold coherent mental models. If we imagine a software engineer with near-infinite mental model capacity, they could likely write well-functioning, relatively bug-free code using few (if any) layers of abstraction.

The converse is also true: lesser capacity to hold a mental model of traversing from the aims to the means will lead to software with more layers of abstraction.

By now, you probably know where I am going with this. Let’s see if we can apply these insights to the realm of large language models. What kind of layering will yield better results when we ask LLMs to write software for us?

It is my guess that the mental model-holding capacity of an LLM is roughly proportional to the size of this model’s context window. It is not the parametric memory that matters here. The parametric memory reflects an LLM’s ability to choose and apply various layers of abstraction. It is the context window that places a hard limit on what a model can and can not coherently perceive holistically.

Models with smaller context windows will have to rely on thinner layers and be more clever with the abstraction layers they choose. They would have to work harder and need more assistance from their human coworkers.  Models with larger context windows would be able to get by with fewer layers.

How will the LLM-based software engineers compare to human counterparts? Here’s my intuition. LLMs will continue to be abysmally bad at understanding large code bases. There’s just way too many assumptions and tacit knowledge that lurks in those lines of code. We will likely see an industry-wide spirited attempt to solve this problem, and the solution will likely see thinning the abstraction layers within the code base to create safe, limited-scope lanes for synthetic software engineers to be effective in their work.

At the same time, LLMs will have a definite advantage over humans in learning the codebases that are well within their limits. Unlike humans, they will not get tired and are easily cloned. If I fine-tune a model on my codebase, all I need is the GPU/TPU capacity to scale it to a multitude of synthetic workers.

Putting these two together, I wonder if we’ll see the emergence of synthetic software engineering as a discipline. This discipline will encompass the best practices for the human software engineer to construct – and maintain – the scaffolding for the baklava of layers of abstraction populated by a hive of their synthetic kin.

Reinventing Organizations Redux

This one is a bit out there, if only to connect some dots and shake loose new insights. Let’s get that distant look in our eyes and contemplate a possibility that may or may not transpire. Let’s all suppose that the upcoming AI winter is mild, and we settle into the next local maxima of technological progress, surrounded by helpful semi-autonomous agents, powered by large language models. What might that look like?

A teal butterfly sitting on white keyboard, a tribute to Laloux's book cover

I am pretty sure I got the firmness of the performance asymptote wrong last May. The superlinear relationship between quality and cost is here to stay, and will shape a step-ladder-like differentiation of models based on their size. There will be larger models that produce high-quality results for a wide diversity of tasks  – and are also expensive to run. There will be smaller models that are much, much cheaper, but also can only excel at a narrow task. We are likely to see attempts to establish a common scale for model complexity, rather than one model to rule them all.

Given that, we are likely to see more emphasis on the scaffolding that connects the models of varying sizes in addition to the models themselves. For instance, many startups and larger companies are already experimenting with the “inverted matryoshka” scaffolding, where a set of models is arranged so that the smaller, cheaper models are used more frequently for the simpler tasks and the largest models are only progressively reached for more complex (and hopefully, more rare) tasks.

Sure, there will be projects that try to hide that scaffolding under a “universal model”, which upon examination, will reveal a trenchcoat filled with the assortment of models, pretending to be one. 

However, driven by the desire for agency, most will choose to rely on their access to this scaffolding to get better results. The scaffolding will be the secret sauce of success. The way we arrange the models – and how we choose and train the models for particular tasks  – will continue to be the subject of intense experimentation and optimization, even when the pace of model innovation slows down.

This last sentence holds a startling realization. If we consider that each model is a “knowledge worker” of sorts, we can view the aforementioned scaffolding as an organization. If that’s the case, we can now imagine the process of creating and managing a collection of models as organization development. Except in this organization, the majority of workers are large language models.

Already, we see academic papers suggesting waterfall-like approaches to tasks, where multiple models (also known as agents) are lined in an assembly line of sorts, passing their output to the next one. I am also seeing experiments with parallel workstreams, converging together to be ranked. Each of the juncture points in these flows is a “virtual knowledge worker”. Perhaps not in the way Frederic Laloux intended, we are reinventing organizations.

It is quite possible (likely?) that organizations we will work in will include both human and non-human workers in them. These organizations will face the same challenges that any organization will face, and likely more new challenges that we haven’t even considered. There will be levels. Simple tasks performed by armies of lower-nomenclature model-powered workers (we’ll probably call them bots). More complex tasks performed by more expensive models. People will likely be supervising, directing, or tuning knowledge work. There might be an entirely new discipline of virtual organization development that emerges as a way of studying and finding more effective ways to conduct organizations that include model-based agents as part of their workforce.

This may not come to pass. However, what feels right in this picture is that humans will still be there. And because we are unpredictable, volatile humans, who come and go, who change our minds – there will always be a need to maintain a semblance of predictability around the business that owes the organization its existence. And because of that, the relatively more predictable and malleable workers might just serve as the organization development putty: keep adjusting the mixture of non-human workers in the organization to retain its strengths as people leave and join the organization – or change within it.

Perhaps in this future, we will ask not the question of whether or not AI will replace humans – but rather the question of how non-human knowledge workers can scaffold around us  in a way that complements our gifts and gives us space to develop and grow.

The Breadboard developer cycle

Over the past couple of months, we’ve been working on a bunch of new things in the Breadboard project, mostly situated around developer ergonomics. I’ve mentioned creating AI recipes a bunch in the past, and it might be a good time to talk about the mental model of how such recipes are actually created.

❤️‍🔥 Hot reload 

One of the Breadboard project design choices we’re leaning into is the “hot reload” pattern. “Hot reload” is fairly common in the modern developer UX, and we’d like to bring it into the realm of AI recipes. In this pattern, the developer sees not just the code they write, but also results of this code running, nearly instantaneously. This creates a lightning-fast iteration cycle, enabling the developer to quickly explore multiple choices and see which ones look/work better.

Most “hot reload” implementations call for two surfaces that are typically positioned side by side: one for writing the source code, and another for observing the results. In Breadboard today, the first one is typically an editor (I use VSCode) and the second one is our nascent Breadboard debugger.

As the name “hot reload” signifies, when I save the source code file in my editor, the debugger automatically reloads to reflect the recent changes.

The typical workflow I’ve settled into with Breadboard is that I start writing the board and get it to some point where it has enough to do at least something, I save the file and play with the board in the debugger. Playing with it both informs me whether I am on the right path and gives me ideas on what to do next.

I then act on this feedback, either fixing a problem that I am seeing, or progressing forth with one of the ideas that emerged through playing.

For my long-time readers: yes, we’ve baked the OODA loop right into Breadboard.

Overall, it’s a pretty fun experience. I get to see the board rising out of code by iteratively playing with it.

🪲 Debugger

To enable this fun, Paul Lewis has been doing the magician’s work bringing up the debugger. It’s very much a work in progress, though even what’s there now is already useful for board-making.

The main purpose of the debugger is to provide both the visualization of the AI recipe that is being developed and the process of it running. As I put it in chat, I love “seeing it run”: I get to see and understand what is happening during the run — as it happens! – and dig into every bit of detail.

There’s even a timeline that I can scrub through to better help me understand how events unfolded.

One thing that gives Breadboard its powers is that it’s built around a very flexible composition system. This means that my recipes may reach for other recipes during the run – sometimes a whole bunch of them. Reasoning about that can be rather challenging without help. As we all know, indirection is amazing until we have to spelunk the dependency chains.

To help alleviate this pain, the Breadboard debugger treats recipe invocations as the front-and-center concept in the UI. The timeline presents them as layers, similar to the layers in photo/video/sound editing tools. I can walk up and down the nested recipes or fold them away to reduce cognitive load.

There’s so much more work that we will be doing here in the coming weeks and my current description of the Breadboard debugger is likely to become obsolete very quickly. However, one thing will remain the same: the iteration speed is so fast that working with Breadboard is joyful.

🧸 Exploration through play

This leads me to the key point of what we’re trying to accomplish with the development cycle in Breadboard: exploring is fun and enjoyable when the iterations are fast and easily reversible.

This is something that is near and dear to my heart. 

When the stakes in exploration are high, we tend to adopt a defensive approach to exploring: we prepare, we train, and brace ourselves when it is the time to venture out into the wilderness. Every such exploration is a risky bet that must be carefully weighted, and uncertainty reduced as much as possible.

When the stakes of exploration are low, we have a much more playful attitude. We poke and prod, we try this and that, we goof around. We end up going down the paths that we would have never imagined. We end up discovering things that couldn’t have been found by an anxious explorer.

It is the second kind of exploration that I would love to unlock with Breadboard. Even when learning Breadboard itself, I want our users to have the “just try it” attitude: maybe we don’t know how this or that node or kit works. Maybe we don’t have an answer to what parameters a recipe accepts. So we just try it and see what happens – and get reasonable answers and insights with each try.

If you’re excited about the ideas I wrote about here, please come join us. We’re looking for enthusiastic dance partners who dream of making the exploration of AI frontiers more accessible and enjoyable.

Declarative vs. imperative

I recently had a really fun conversation with a colleague about “declarative vs. imperative” programming paradigms, and here’s a somewhat rambling riff that I captured here as a result.

When we make a distinction between “declarative” and “imperative” programming, we usually want to emphasize a certain kind of separation between the “what” and the “how”. The “what” is typically the intention of the developer, and the “how” is the various means of accomplishing the “what”.

For me, this realization came a while back, from a chat with Sebastian Markbåge in the early days of React, where he stated that React is declarative. I was completely disoriented, since to me, it was HTML and CSS that were declarative, and React was firmly in the imperative land.

It took a bit of flustering and a lot of Sebastian’s patience for me to grok that these terms aren’t not fixed. Rather, they are a matter of perspective. What might seem like “you’re in control of all the details” imperative from one vantage point will look completely declarative from another.

📏 The line between the “what” and the “how” 

Instead of trying to puzzle out whether a given programming language, framework, or paradigm is declarative or imperative, it might be more helpful to identify the line that separates the “what” and the “how”.

For instance, in CSS, the “what” is the presentation of an HTML document (technically a DOM tree), and the “how” is the method by which this presentation is applied. In CSS, we declare what styling attributes we would like each document element to look like, and leave the job of applying these styles to the document in the capable hands of CSS.

Similarly, in React, we declare the structure of our components and their relationship to the state, while leaving the actual rendering of the application up to the framework.

Every abstraction layer brings some “declarativeness” with it, shifting the burden of having to think about some the implementation details from the shoulders of the developer into the layer.

If we look carefully, we should be able to see the line drawn between the “how” and the “what” in every abstraction layer.

In drawing this line, the creators of an abstraction layer – whether they are intentional about it or not – make a value judgment. They decide what is important and must remain in developer’s control, and what is not as important and could be abstracted away. I called this value judgment “an opinion” earlier in my writings.

One way to view such a value judgment is as a bet: it is difficult to know ahead of time whether or not the new abstraction layer will find success among developers. The degree of opinion underlines the risk that the bet entails. More opinionated abstraction layers make riskier bets than less opinionated ones.

If we measure reward in adoption and long-term usage, then the higher risk bets also promise higher reward: the degree of difference in opinion can serve as a strong differentiator and could prove vital to the popularity of the layer. In other words, if our layer isn’t that different from the layer below, then its perceived value isn’t that great to a developer.

Therein lies the ancient dynamic that underlies bets (or any value judgments, for that matter): when designing layers of abstraction, we are called to find that balance of being different enough, yet not too different from the layer below.

🪢 The rickety bridge of uncertainty

While looking for that balance, one of the most significant and crucial exercises that any framework, language, or programming paradigm will undertake is the one of value-trading with their potential developers.

This exercise can be described by one question: What is it that a developer needs to give up in order to unlock the full value of our framework?

Very commonly, it’s the degree of control. We ask our potential developers to relinquish access to some lower-layer capabilities, or perhaps some means to influence control flow.

Sometimes (and usually alongside with lesser control), it’s the initial investment of learning time to fully understand and gain full dexterity of wielding the layer’s opinion.

Whatever the case, there is typically a rickety bridge of uncertainty between the developer first hearing of our framework and their full-hearted adoption.

I once had the opportunity to explain CSS to an engineer who spent most of their career drawing pixels in C++, and they were mesmerized by the amount of machinery that styling the Web entails. If all you want to draw is a green box in the middle of the screen, CSS is a massively over-engineered beast. It is a long walk across unevenly placed planks to the point where the full value of this machinery even starts to make sense, value-wise. Even then, we’re constantly doubting ourselves: is this the right opinion? Could there be better ways to capture the same value?

This bridge of uncertainty is something that every opinionated layer has to cross. Once the network effects take root, the severity of the challenge diminishes significantly. We are socialized species, and the more people adopt the opinion that the abstraction layer espouses, the more normal it becomes – perhaps even becoming the default, for better or worse.

🧰 Bridging techniques

If we are to build abstraction layers, we are better off learning various ways to make the bridge more robust and short.

One technique that my colleague Paul Lewis shared with me is the classic “a-ha moment”: structure our introductions and materials in such a way that the potential value is clearly visible as early as possible. Trading is easier when we know what we’re gaining.

This may look like a killer demo that shows something that nobody else can do (or do easily). It may look like a tutorial that begins with a final product that just begs to be hacked on and looks fun to play with. It could also be a set of samples that elegantly solve problems that developers have.

Another technique is something that Bernhard Seefeld is actively experimenting with: intentionally designing the layer in such a way that feels familiar at first glance, but allows cranking up to the next level – incrementally. You can see this work (in progress 🚧) in the new syntax for Breadboard: it looks just like a typical JS code at first, rapidly ramping up to graph serialization, composition, and all the goodies that Breadboard has to offer.

I am guessing that upon reading these examples, you immediately thought of a few others. Bridging techniques may vary and the technique playbook will keep growing, but one thing that unites them all is that they aim to help developers justify the trade of their usual conveniences for the brave new world of the layer’s opinion.

Designing new layers is an adventure with the indeterminate outcome. We might be right about our value judgments, or we might be wrong. It could be that no matter how much we believe in our rightness, nobody joins to share our opinion with us. No technique will guarantee the outcome we wish for. And that’s what makes API design and developer experience work in general so fun for me. 

Hourglass model and Breadboard

Architecturally, Breadboard aspires to have an hourglass shape to its stack. As Bernhard Seefeld points out, this architecture is fairly common in compilers, and other NLP tools, so we borrowed it for Breadboard as well.

In this brief essay, I will use the Breadboard project to expand a bit on the ins and outs of the hourglass-shaped architecture.

At the waist of the hourglass, there’s a  single entity, a common format or protocol. In Breadboard, it is a JSON-based format that we use to represent any AI recipe (I called them AI patterns earlier, but I like the word “recipe” better for its liveliness). When you look through the project, you’ll notice that all recipes are JSON files of the same format. 

The actual syntax of the JSON file may change, but the meaning it captures will stay the same. This is one of the key tenets of designing an hourglass stack: despite any changes around it, the semantics of the waist layer must stay relatively constant.

The top part of the hourglass is occupied by the recipe producers, or “frontends” in the compiler lingo. Because they all output to the same common format, there can be a great variety of these. For example, we currently have two different syntaxes for writing AI recipes in TypeScript and JavaScript. One can imagine a designer tool that allows creating AI recipes visually. 

Nothing stops someone from building a Python or Go or Kotlin or any other kind of frontend. As long as it generates the common format as its output, it’s part of the Breadboard hourglass stack.

The bottom part of the stack is where the recipe consumers live. The consumers, or “backends”, are typically runtimes: they take the recipe, expressed in the common format and run it. At this moment in Breadboard, there’s only a Javascript runtime that runs in both Node and Web environments. We hope that the number of runtimes expands. For instance, wouldn’t it be cool to load a Breadboard recipe within a colab? Or maybe run it in C++? Breadboard strives for all of these options to be feasible.

Runtimes aren’t the only kinds of backends. For instance, there may be an analysis backend, which studies the topography of the recipe and makes some judgments about its integrity or other kinds of properties. What sorts of inputs does this recipe take? What are its outputs? What are the runtime characteristics of this recipe?

Sometimes, it might be challenging to tell a backend apart from a frontend: an IDE may include both frontends (editor, compiler, etc.) and backends (runtime, debugger, etc.). However, it’s worth drawing this distinction at the system design stage.

The main benefit of an hourglass stack design is that it leaves a lot of room for experimentation and innovation at the edges of the stack, while providing enough gravitational pull to help the frontends and backends to stick together. Just like for any common language or protocol, the waist of the hourglass serves as a binding agreement between the otherwise diverse consumer and producer ideas. If this language or protocol is able to convey valuable information (which I believe we do with AI recipes in Breadboard), a network effect can emerge under the right conditions: each new producer or consumer creates combinatorial expansion of possibilities – and thus, opportunities for creating new value – within the stack.

It is far too early to tell whether we’ll be lucky enough to create these conditions for Breadboard, but I am certainly hopeful that the excitement around AI will serve as our tailwind.

The key risk for any hourglass stack is that it relies on interoperability: the idea that all consumers and producers interpret the common format in exactly the same way. Following the Hyrum’s Law, any ambiguity in the semantics of the format will eventually result in disagreements across the layers.

These disagreements are typically trailing indicators of the semantic ambiguities. By the time the bugs are identified and issues are filed, it is often too late to fix the underlying problems. Many languages committees spend eons fighting the early design mistakes of their predecessors.

As far as I know, the best way to mitigate this risk is two-fold. We must a) have enough experience in system design to notice and address semantic ambiguities early and b) have enough diversity of producers and consumers within the hourglass early in the game.

To be sure, a system design “spidey sense”, no matter how well-developed, and the ability to thoroughly exercise the stack early don’t offer 100% guarantees. Interoperability tends to be a long-term game. We can flatten the long tail of ambiguities, but we can’t entirely cut it off. Any hourglass-shaped architecture must be prepared to invest a bit of continuous effort into gardening its interoperability, slowly but surely shaving down the semantic ambiguities within the common format.

The protocol force

Here’s a riff on the “hourglass model” conversation I’ve had with Bernhard Seefeld a while back. I only recently connected it with my earlier mumblings about software layering, and here’s what came out.

If a technology stack is arranged in such a way that there are consumers of value on one end of the stack and producers of it on the other, then at a certain scale, there emerges a force that encourages one middle layer of the stack – the waist – to be as  thin as possible. There will be a distinct preference for lower diversity of offerings at that layer. IP, HTTP, and HTML are all great examples of this force’s effects.

I am going to call this force the “protocol force”, since the outcome of this force is usually a common protocol or format that is used for communicating across that layer. 

To sketch out the mechanics behind the protocol force, the upper layers want to have access to more consumers, and those are only accessible through the lower layers. The lower layers want to get more producers, and only the upper layers can provide those. To get either, the layers have to go through the waist, and they all want to minimize the cost of bridging, of implementing all the permutations that enable consumers and producers to interact with each other. Put differently, everyone wants to spend as little as possible to reach as many consumers/producers as possible.

For instance, if we have multiple interaction protocols available in the waist layer, a growth spurt of an ecosystem of producers and consumers around this layer will trigger the  “winner takes all” dynamics: a protocol that gets some critical mass of consumers (or producers) will reduce the appeal of using other protocols. As these other protocols will fall into disuse, the one that “won” will continue to gain new adopters  – thus thinning the waist layer.

Interestingly, this protocol force may not manifest in the adjacent layers, even if they look like the connectors between the upper and the lower layers. As long as there’s one layer that is thin, the adjacent layers may enjoy high diversity of offerings. 

A good example of this is the Web frameworks. There are always so many of them, right? Why is that? My explanation of this phenomenon, applying the reasoning above, is that they are protected from the effect of the protocol force by the Web Platform layer (the HTML/CSS/JS combo that we all love and enjoy) that sits right underneath them. This layer is incredibly thin: thanks to the set of circumstances and sheer designers’ luck, we only have one Web Platform.

Because of that, there can be a cornucopia of Web frameworks: as long as what they output is Web Platform-compatible, they won’t experience the pressure to agglomerate or race to the bottom.

The protocol force will be present in any layered stack that has producers and consumers as outer layers. One of the bridging layers will succumb to it. If we’re designing a layered system, we are better off preparing for this force to tap us on the shoulder and ask us to pick a layer to squish. Any resistance is futile. Might as well plan for it.

One of my favorite recent examples of such a design is the Protocol Buffers, a framework that Google uses internally to connect disparate pieces of functionality into one coherent whole. As long as a program speaks protos, it can mingle in Google-land. Whether intentionally or not, protocol buffers serve as the waist layer of Google infrastructure.

One of the key challenges of designing something as versatile as the Web Platform or protocol buffers is the fact that this layer will be the most used and, sadly. abused part of our stack. Once squished into a thin waist, this layer becomes the great enabler of our future – or the source of our doom. Its flexibility and capacity to convey the intentions of developers will be constantly tested and stretched and often broken. This is a sign of success and a good thing. As long as we’re prepared to invest into continuously improving and developing the waist of our stack, we can harness the protocol force to help us, rather than hinder.