AI Baklava

At its core, the process of layering resolves two counterposed forces, both originating from the need for predictability in behavior: one from below, where the means are concrete but the aims are abstract, and one from above, where the aims are concrete but the means are abstract.

Each layer is a translation of an abstraction: as we go up in the stack of layers, the means become more abstract and the aims more concrete. At each end, we want to have predictability in what happens next, but the object of our concern is different. Lower layers want to ensure that the means are employed correctly. Higher layers want to ensure that the aims are fulfilled.

For example, a user who presses a button to place an order in a food-ordering app has a very concrete aim in mind for this button: they want to satisfy their hunger. The means of doing that are completely opaque to the user; how this will happen is entirely hidden behind the veil of abstraction.

One layer below, the app has a somewhat more abstract aim (respond to the user’s button press), but the means are a bit more concrete: it needs to route the tap to the right handler, which will then initiate the process of placing the order.

The aims are even less concrete one layer further down. The button widget receives a tap and invokes the event handler that corresponds to it. Here, we are unaware of the user’s hunger. We don’t know why they want this button tapped, nor do we care: we just need to ensure that the right sequence of things transpires when a tap event is received. The means are very concrete.
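To make this stack of translations a bit more tangible, here is a minimal sketch in TypeScript. Every name in it (`Button`, `OrderScreen`, `orderService`) is a hypothetical stand-in, not a real app; the point is only that each layer holds very concrete means while the aims above it stay abstract.

```typescript
// A minimal sketch of the layers above; all names are hypothetical.

// Bottom layer: the widget. Its means are very concrete (route a tap
// to whatever handler was registered); it knows nothing about hunger.
class Button {
  constructor(private onTap: () => void) {}
  handleTapEvent(): void {
    this.onTap();
  }
}

// Middle layer: the app. Its aim is to respond to the button press;
// its means are routing the tap to the right domain action.
class OrderScreen {
  readonly placeOrderButton = new Button(() => this.placeOrder());
  private placeOrder(): void {
    orderService.submit(["falafel wrap"]); // stand-in for real cart state
  }
}

// Top layer, as far as the code is concerned: the domain service,
// closest to the user's aim, most abstract about the means below it.
const orderService = {
  submit(items: string[]): void {
    console.log(`Order placed: ${items.join(", ")}`);
  },
};

// Simulate the user's tap. Note that the actual aim (satisfying
// hunger) never appears anywhere in the code.
new OrderScreen().placeOrderButton.handleTapEvent();
```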

A reasonable question might be: why layer at all? Why not just connect the means and aims directly? This is where another interesting part of the story comes in. It appears that we humans have limits on the complexity of the mental models we can reasonably hold in our minds. These limits are well-known to software engineers.

For a seasoned engineer, the pull toward layering emerges nearly simultaneously with the first lines of code written. It’s such a habit that they do it almost automatically. The experience that drives this habit is the painful untangling of spaghetti after the code we wrote begins to resist change. This resistance, this unwillingness to cooperate with its own creator, is not the fault of the code. It is the limit of the engineer’s mental capacity to hold the entirety of the software in their mind.

When I talk about software with non-technical people, they are nearly always surprised by the number of bugs any mature software contains. It seems paradoxical that older software has more bugs than newer code. “What exactly are y’all doing there? Writing bugs?!” Yup. The mental model capacity needed to deeply grok a well-used, well-loved piece of software is typically way beyond that of any individual human.

So we come up with ways to break software into smaller chunks, allowing us to compartmentalize their mental models and to specialize. And because of the way the forces of the aims and the means face each other, this chunking results in layering. Whether we want it or not, layering will emerge in our code.

Put another way, layering is an artifact of the limits of the human capacity to hold coherent mental models. If we imagine a software engineer with near-infinite mental model capacity, they could likely write well-functioning, relatively bug-free code using few (if any) layers of abstraction.

The inverse is also true: a lesser capacity to hold the mental model of traversing from the aims to the means will lead to software with more layers of abstraction.

By now, you probably know where I am going with this. Let’s see if we can apply these insights to the realm of large language models. What kind of layering will yield better results when we ask LLMs to write software for us?

It is my guess that the mental model-holding capacity of an LLM is roughly proportional to the size of the model’s context window. It is not the parametric memory that matters here. The parametric memory reflects an LLM’s ability to choose and apply various layers of abstraction. It is the context window that places a hard limit on what a model can and cannot coherently perceive holistically.

Models with smaller context windows will have to rely on thinner layers and be more clever about the abstraction layers they choose. They will have to work harder and will need more assistance from their human coworkers. Models with larger context windows will be able to get by with fewer layers.

How will LLM-based software engineers compare to their human counterparts? Here’s my intuition. LLMs will continue to be abysmally bad at understanding large code bases. There are just way too many assumptions and too much tacit knowledge lurking in those lines of code. We will likely see an industry-wide spirited attempt to solve this problem, and the solution will likely involve thinning the abstraction layers within the code base to create safe, limited-scope lanes in which synthetic software engineers can be effective.

At the same time, LLMs will have a definite advantage over humans in learning the codebases that are well within their limits. Unlike humans, they will not get tired and are easily cloned. If I fine-tune a model on my codebase, all I need is the GPU/TPU capacity to scale it to a multitude of synthetic workers.

Putting these two together, I wonder if we’ll see the emergence of synthetic software engineering as a discipline. This discipline will encompass the best practices for the human software engineer to construct – and maintain – the scaffolding for the baklava of layers of abstraction populated by a hive of their synthetic kin.

Reinventing Organizations Redux

This one is a bit out there, if only to connect some dots and shake loose new insights. Let’s get that distant look in our eyes and contemplate a possibility that may or may not transpire. Let’s all suppose that the upcoming AI winter is mild, and we settle into the next local maximum of technological progress, surrounded by helpful semi-autonomous agents powered by large language models. What might that look like?

[Image: a teal butterfly sitting on a white keyboard, a tribute to Laloux’s book cover]

I am pretty sure I got the firmness of the performance asymptote wrong last May. The superlinear relationship between quality and cost is here to stay, and it will shape a step-ladder-like differentiation of models based on their size. There will be larger models that produce high-quality results for a wide diversity of tasks – and are also expensive to run. There will be smaller models that are much, much cheaper, but can only excel at narrow tasks. We are likely to see attempts to establish a common scale for model complexity, rather than one model to rule them all.

Given that, we are likely to see more emphasis on the scaffolding that connects models of varying sizes, in addition to the models themselves. For instance, many startups and larger companies are already experimenting with “inverted matryoshka” scaffolding, where a set of models is arranged so that the smaller, cheaper models are used frequently for the simpler tasks, and the larger models are reached progressively only for the more complex (and hopefully, rarer) tasks.
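Here is one way such scaffolding might be wired, sketched in TypeScript. The `Model` shape, the confidence score, and the threshold are all hypothetical; real systems might escalate based on task classification, validation failures, or user feedback instead.

```typescript
// A sketch of "inverted matryoshka" scaffolding: try the cheapest
// model first, escalate only when its answer is not confident enough.

interface Model {
  name: string;
  costPerCall: number; // relative cost units
  complete(prompt: string): Promise<{ text: string; confidence: number }>;
}

interface Completion {
  text: string;
  confidence: number;
  servedBy: string;
}

// `models` is ordered from smallest/cheapest to largest/most expensive.
async function cascade(
  models: Model[],
  prompt: string,
  threshold = 0.8,
): Promise<Completion> {
  let result: Completion | undefined;
  for (const model of models) {
    const { text, confidence } = await model.complete(prompt);
    result = { text, confidence, servedBy: model.name };
    // A confident answer from a small model means the larger ones
    // never run; that is where the cost savings come from.
    if (confidence >= threshold) break;
  }
  if (!result) throw new Error("cascade needs at least one model");
  return result; // the largest model's answer is accepted as-is
}
```

The design choice worth noticing: most traffic never reaches the expensive model, which stays in reserve as the rare escalation path.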

Sure, there will be projects that try to hide that scaffolding under a “universal model” which, upon examination, will reveal a trenchcoat filled with an assortment of models pretending to be one.

However, driven by the desire for agency, most builders will choose to rely on direct access to this scaffolding to get better results. The scaffolding will be the secret sauce of success. The way we arrange the models – and how we choose and train the models for particular tasks – will continue to be the subject of intense experimentation and optimization, even as the pace of model innovation slows down.

This last sentence holds a startling realization. If we consider that each model is a “knowledge worker” of sorts, we can view the aforementioned scaffolding as an organization. If that’s the case, we can now imagine the process of creating and managing a collection of models as organization development. Except in this organization, the majority of workers are large language models.

Already, we see academic papers suggesting waterfall-like approaches to tasks, where multiple models (also known as agents) are lined up in an assembly line of sorts, each passing its output to the next. I am also seeing experiments with parallel workstreams that converge to be ranked. Each of the juncture points in these flows is a “virtual knowledge worker”. Perhaps not in the way Frederic Laloux intended, but we are reinventing organizations.
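Both shapes are easy to caricature in code. In this TypeScript sketch, an `Agent` is just an async function standing in for a model-backed worker; both flows are hypothetical simplifications of what those papers actually build.

```typescript
// Hypothetical stand-in for a model-backed knowledge worker.
type Agent = (input: string) => Promise<string>;

// Waterfall: an assembly line where each agent passes its output
// to the next one.
async function waterfall(agents: Agent[], input: string): Promise<string> {
  let work = input;
  for (const agent of agents) {
    work = await agent(work);
  }
  return work;
}

// Parallel workstreams: several agents attempt the same task, and a
// ranking step (itself possibly another model) picks the winner.
async function converge(
  agents: Agent[],
  input: string,
  rank: (candidate: string) => Promise<number>,
): Promise<string> {
  const candidates = await Promise.all(agents.map((agent) => agent(input)));
  const scores = await Promise.all(candidates.map(rank));
  return candidates[scores.indexOf(Math.max(...scores))];
}
```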

It is quite possible (likely?) that the organizations we work in will include both human and non-human workers. These organizations will face the same challenges that any organization faces, and likely new challenges that we haven’t even considered. There will be levels. Simple tasks will be performed by armies of lower-tier, model-powered workers (we’ll probably call them bots). More complex tasks will be performed by more expensive models. People will likely be supervising, directing, or tuning the knowledge work. There might be an entirely new discipline of virtual organization development that emerges as a way of studying and finding more effective ways to run organizations that include model-based agents as part of their workforce.

This may not come to pass. However, what feels right in this picture is that humans will still be there. And because we are unpredictable, volatile humans, who come and go, who change our minds – there will always be a need to maintain a semblance of predictability around the business that owes its existence to the organization. And because of that, the relatively more predictable and malleable non-human workers might just serve as organization development putty: we will keep adjusting the mixture of non-human workers to retain the organization’s strengths as people leave and join – or change within it.

Perhaps in this future, we will ask not whether AI will replace humans, but how non-human knowledge workers can scaffold around us in a way that complements our gifts and gives us space to develop and grow.