This principle feels very intuitive at first blush but, in my experience, is rather challenging to adhere to, especially as a team.
I will present this principle as resolving a tension between two forces that are very familiar to me in the realm of developer experience. I am pretty sure these forces are present in any other kind of product development as well, though they may take on different specific traits.
When building developer products for others, we often have a situation where the development environments that our customers have are different from ours.
They may use different stacks, build systems, or apply different processes from ours. Conversely, we may have our special tools that we’re used to and our particular practices that we employ to be more productive.
This difference forms the basis for the tension. Clearly, to ship a product that my customer loves and is willing to adopt means that I need to understand their environment. I need to know deeply the problems that they are facing every day: what is difficult? What is easy? Where can I help?
At the same time, I have my own environment that I am very comfortable in, honed by years of practice and incremental improvements. This is the environment that works best for me. This is the environment that I understand deeply, with all its quirks and idiosyncrasies.
The platonic ideal here is that I have it both ways: I deeply understand both environments, am able to hold both of them in mind, and develop for one while working in another. If you can do this, kudos to you. But more than likely, there’s a bit of self-delusion going on. Based on my experience, this is simply not possible.
Instead, we subconsciously lean toward problems that we encounter in our environments, and tend to be blind toward the ones that our potential customers have. When we ship a thing, it looks like an alien artifact. It appears to solve problems that our customers don’t have, or tries to solve their problems in weird, unworkable ways.
Imagine you’re an alien who was hired to be a chef. You’re asked to cook for humans. You can’t eat human food, and some of it looks revolting, honestly. Fried eggs. Blegh. How likely are you to cook something that humans will like?
This tension grows stronger if the difference between the environments is large. Putting it very bluntly: if our developer experience is so different that it feels like an island, we can’t build developer experience products that others will love — or even understand.
To resolve this tension, we must live as our customers. We must strive to move into an environment as close to theirs as we can. If our customers use GitHub as their primary tool, we’d better use GitHub, as well. If the customers we target mostly use Fortran (bless them!), well then we must learn and adopt it as well.
Applying this principle is usually super-uncomfortable at first. Nobody wants to abandon their well-worn saddle. The new saddle will cause cramps and sore muscles for a while. Expect a lot of pushback and well-reasoned arguments to return to the trusted old saddle. “This bug tracker sucks! I can’t even use keyboard shortcuts to navigate between issues! Ours is so much better!” “OMG, this build system is arcane! What is all this XML stuff?! I thought we were in the 21st century?!”
There’s a kind of test that is built into this struggle. We signed up to build for these customers. Do we actually want to do that?
If the answer is still “yes”, we will find ourselves better off in the long term. We will have a much deeper sense of what our customers need, and where the pain points are. We will be able to spot them early and build things that they want to use.
It seems pretty obvious that there is some sort of cyclical nature to technological progress. There seems to be a rhythm to how things happen, and many words were written to discern the structure of the cycle. My introduction to the cycle came by way of Tim Wu’s The Master Switch. In that book, the author proposed a cyclical rhythm of open and closed ecosystems alternating as new innovations emerge. Echoes of similar dynamics are found in Crossing the Chasm, The Innovator’s Dilemma, and other books. It’s like everyone is pointing at the elephant and trying to glimpse the full shape of it.
A tiny part of this elephant is the rise of makers. The rise of makers is the time in this cycle of progress when those who tinker and prototype emerge as the best candidates for finding the Next Big Thing.
Every story of a tech giant’s humble beginning in a garage is the story of makers. My assertion is that the reason why we hear fewer of these stories in some periods of time than others is not because we run out of bright, entrepreneurial minds and certainly not because we run out of garages. Rather, the prominence of makers is a function of the cycle of technological progress.
Put differently, makers’ contributions result in significant outcomes in one part of the cycle and shift to the margins otherwise.
📻 Who are makers?
Perhaps I could back up a bit and start by describing what I mean by “makers”.
Makers are tinkerers and prototypers who are engaging with a new technology not because they have to as part of their job, but because they find it irresistibly interesting and fun. I like to say that developers are “9-to-5-ers” and makers are “9pm-to-early-morning-ers”. This might not be an entirely accurate description, but it captures the spirit. It also hints at the key property of a maker: it’s not a kind of person. It’s a mindset. The same person who puts in their work hours at their day job becomes a maker extraordinaire in the evening or weekends.
Makers make stuff. They aren’t here for entertainment, fame, or other promise of future boon. They get their hands dirty. They build stuff. In the language of Crossing the Chasm, they are the kind of early adopter who doesn’t just adopt the tech. They make new things with it.
Makers play with technology, rather than apply it to achieve business goals. They delight in finding weird quirks and twists, the same way gamers love finding glitches that lead to a speedrun. They find all of your design flaws and unintended side effects – and turn them into a feature. The whole process is messy and often in the “life, uh… finds a way” manner – what makers make may look very different from the technologists’ intended range of use.
Makers write crappy code and wire things on breadboards. They rarely care about the future enterprise strength of their project. They explore ideas non-linearly, random-walking. They scrap half-finished projects and repurpose them into new ones. All this contributes to the seeming disarray of the maker scene. A good sign of a maker project is fix-forward collaboration practices.
Makers are here to discover something new, to bravely explore. They crave being first to uncover some way to make technology do a thing that nobody else had seen before. Makers grow increasingly uninterested in a particular technology as it matures and becomes polished. Polish and reliability mean that the tech has become mainstream – and thus, less likely to yield a “holy crap!” moment.
Being a maker means being in constant search of that moment. When the thing finally works and goes viral on Twitter, and investors come knocking – it’s a maker’s dream come true. Often, it’s also the end of a maker’s journey. Once the new big thing is found, makers shift to become businesspeople. The fun hobby project turns into a rapidly growing team around the newfound thing. Not all makers choose that path. After all, the thrill of exploration does get replaced by the mundane concerns of running the business. Those who do, however, wield the power to reshape and create industries.
… At least, when the conditions are right.
🧫 Conditions for makers’ rise
When a novel technological capability moves forth to acquire an interface, and a broader audience begins to interact with it, there’s a question that hangs in the air: “What is this thing actually good for?”
This is the value question. Whoever answers this question first gains a temporary advantage: until everyone else also figures it out, they can seize the opportunity to acquire this value.
Makers arrive at the scene right about then. They start poking at the interface and make things with it. The amount of power makers will have at this point depends on whether or not they can answer the value question sooner than anyone else.
What are the properties of the technological capability (and the environment it is introduced into) that put makers in the driver’s seat?
I’ve been thinking about this a bit, and I keep coming back to these three: 🔓 access to technology, 🏞️ openness of space, and 🚀 iteration velocity.
All these properties interact with each other, so they aren’t exactly orthogonal.
🔓 Access to technology is both a property of the new technological capability and of its environment. It is probably best measured in the number of makers who could practically start using the capability.
It is a property of the capability because the technologists who introduce the interface can choose to make it more or less accessible. It is also a property of the environment, because other, adjacent technological advances may make the capability more accessible than before.
A good example of the capability becoming more accessible due to shifts in environment is the introduction of widely available high-speed Internet. Without any discernible change in how the Web worked (no change in capability itself), the increase in bandwidth created new opportunities for makers to spur what is known as “Web 2.0”.
🏞️ The openness of space is reflected by the number of new opportunities created by the introduction of the technology. A good marker of a wide-open space is that a typical market-sizing exercise keeps collapsing into a fractal mess, creating more questions than answers. It’s not just one thing that suddenly becomes possible, but a whole bunch of things – and there’s this feeling that we have only scratched the surface.
Open spaces favor makers, because they require random-walk and large quantities of participants to facilitate the ergodic exploration of the space. Well-established players tend to be subject to their embodied strategies, leaving them unaware of vast portions of the space, and thus highly vulnerable to your usual innovator’s dilemma and counter-positioning.
🚀 Finally, the iteration velocity is what gives makers the edge. Makers rule tight feedback loops. The shorter the lead times, the more likely makers will show up on the leaderboards. If something can be put together quickly, count on makers to stumble into a thing that actually works. Conversely, if the new technology requires lengthy supply chains and manufacturing processes, makers will play, at best, supporting roles.
Iteration velocity is also influenced by the level of stakes in the game. The higher the stakes, the less velocity we’ll see. For example, we are unlikely to see makers playing with passenger airplanes or power grids. Those are the areas where the lead times are necessarily long, and no matter how exciting the innovation, makers won’t play a pivotal role in those spaces.
📈 Makers rising
There is no doubt in my mind that we’re experiencing another moment of makers rising to prominence. The galaxy-sized spaces opened by generative AI, its accessibility, and the velocity with which one could put together an impressive prototype – all point to the notion that perhaps the next big thing will come from a tinkerer’s garage. My intuition is that we’re in a historic moment of the kind we haven’t seen since the birth of the Internet – or perhaps an even larger one.
We are living in the age of makers rising. I am not sure how long it will last. After all, the prominence of makers is relatively short-lived, just a moment in the large story of the technological advance. But oh boy, is this moment significant.
I may have written about this before. It’s a trick that I’ve learned intuitively over the years, and I’ve found it tremendously helpful. It tends to work best in situations where the outcome is uncertain.
When adventuring forth into the unknown, we usually focus on the thing that we want to achieve. Even if the chances of it happening are slim, we set our eyes on the prize and commit to persevere through whatever challenges are thrown at us.
It is just before this point that I usually pause and spend a bit of time contemplating the gifts of failure. The question I ask myself is simple: “What good can come out of failing in this venture?” There are usually plenty of platitudes about learning from failure and all that wonderful stuff that is talked about incessantly in books about innovation. This is not what I am looking for.
When I ask that question, I am looking for tangible benefits that might arise as a result of the failure. Once we’ve reached the threshold where we definitely know we’ve been defeated – by circumstances, a worthy foe, or our own limitations – what are we left with?
Put differently, when all that is left is to pick up the broken pieces of our failed enterprise, how quickly can we repurpose these pieces to start anew?
If we don’t contemplate this question earnestly, the answer is usually “not much”. In such cases, the collapse is dramatic and total. Very little of the effort put into the project is recoverable. If anything good comes of it, it’s sheer luck.
The stepping stones tactic is one way through which we can maximize this luck. In Why Greatness Cannot Be Planned, the authors argued that small increments just outside of the adjacent possible – the stepping stones! – are what ultimately feeds innovation and big discoveries.
By reflecting on the gifts of failure, we can sketch our path as a series of stepping stones that we create for ourselves. What stepping stones will we need to produce anyway as we walk toward our vision? Which ones can take on a life of their own, and serve us even after our initial project has failed? Through this lens, the product of our endeavor will no longer look like a monolithic thing. Instead, it will be a modular construct that can flexibly rearrange itself into something completely different, becoming highly resilient.
As another tactic more useful for smaller projects, I sometimes maintain a small portfolio of secondary benefits that could be promoted as primary. These benefits must be somewhat independent of each other, so that if the initial attempts prove fruitless, there’s another path forward that opens up by refocusing on another potentiality.
Both of these tactics arise from reflecting on the original question. When we accept that in highly uncertain, volatile environments, any exploration is much more likely to fail than to succeed, we stop seeing the target of our adventure as the mythical “end all be all”, and focus on accumulating opportunities and potentialities along the way.
I have been lucky enough to be part of the process of shaping several organizations, yet only uncovered this nugget of insight very recently. Very briefly, it helps to discern the forces of the broader environment that influence us without our awareness.
It is very rare that the leaders of a newly formed organization want it to behave exactly the same as the surrounding environment. If that is the case, we have a distinction without a difference, a label rather than an organization. Teams and organizations are created to do something (even if slightly) different from the rest of the environment. At the same time, they are still in that environment and are influenced by the forces of this environment.
Suppose I want to create a team that innovates daringly while situated within an organization that is designed for keeping an important chunk of infrastructure running. I can speak all I want about inspiring creativity and fearless questing for ideas to my team – but if I don’t account for the fact that most processes around me are designed and incentives are structured for a “keeping” mission, none of that may matter. Implacably, my attempts to encourage innovation will come out shaped like infrastructure improvement projects. It will feel like an invisible force being exerted on the newly born team, veering it off our intended course.
More often than not – especially in larger organizations – such forces are invisible until we start deviating from their direction. These forces are meant to organize, to bring various people’s intentions into alignment in service of the mission. Only when we try to do something different do we start experiencing friction.
Two questions might help you start making these more visible.
The first question is “what is our gravity?” This will take a little bit of self-reflection. We rarely ask ourselves why our feet stick magically to the ground. Similarly, organizations have gravity-like forces that everyone is used to. They tend to hide in structural and cultural bits of the org. If you’d like, use my fourplex of constraints to investigate. Which of these do we need to counteract when deviating from the default organization configuration? OKRs are probably my favorite example here. No matter how much we try to overcome it, the language of OKRs inexorably drives us to short-term thinking. If we try to build an organization that thinks in a longer timeframe than the rest of its OKR-driven environment, we will have a difficult time placing ourselves within it. “Wait, these people don’t have OKRs? Do they even do anything?”
The second question is “what is our water?”, echoing the parable presented by David Foster Wallace. What are the things that we take for granted to such a degree that we don’t even notice them anymore? How do these things influence how we think and act? Where are they out of alignment with our intentions? If our team has a strong engineering culture, we will naturally have biases that will filter out less “engineer-ey” ideas – and that might be alright, unless these ideas are essential for making progress toward our intended mission.
The answers to these two questions are nearly always tentative and wrong, at least initially. It is important to keep examining them as time goes on.
A while back, one of my colleagues was reflecting on their first year on a team at which we both worked at the time: “At first, I thought the hardest part would be to get creativity and inspiration going. Surely, the problem was that we weren’t thinking cleverly enough. Oh boy, was I wrong. It’s exactly the opposite! Everyone here is incredibly smart and is bursting with creative energy. Now I believe that where we struggle the most is in applying all that energy in a way that leads to shared learning that persists.” My colleague’s guess was even more refined and nuanced the next year.
Understanding what is stopping us from getting where we want to go is key to getting there. If we don’t stop to examine the forces that hold us in place, the best we can hope for is an external shock that jolts us out of our trap.
One of my colleagues quipped recently that instruction tuning might not be about teaching large language models (LLMs) what to do, but rather about reminding LLMs of the wealth of wisdom they already have at their foundation. Here’s a rather unkempt vignette on what wisdom I would like an LLM to recall.
I would like to help a model remember how to solve problems that are a bit more complex and require some rigor in reasoning. We are well underway in our first generation of LLM reasoning explorations, and I am looking a bit beyond the current horizon.
Here is my hypothesis: an LLM that is capable of reasoning about a broad variety of problems would be biased to structure its reasoning in a certain way. That is, when we ask it to “think step by step”, it will rely on some predefined structure of these steps while performing a text completion.
If this hypothesis is true, reasoning structures will be at the core of dramatically improving the reasoning ability of the models. Put differently, I anticipate that the next cycle of model evolution will be about exploring the space of reasoning structures and finding more and more capable local maxima of such structures.
The future of instruction tuning might not be about the quantity or even quality of the data, but rather about the rigor of the overarching reasoning structure in the data sets.
To make my “reasoning structure” concept a bit more clear, I tried to conjure one up relying on the problem understanding framework that I created a while back. This is just one rough sketch, but it should give you some idea of what the heck I am talking about.
LLMs are already able to address solved problems with moderate success. The implicit code execution shows a straightforward way to approach them: once the problem is deemed as something that was already solved before, we ask an LLM to write the minimal bit of code to produce the solution. Crunching numbers, reversing words, making charts, and various similar kinds of problems fall in this class.
LLMs are making decent progress on tackling solvable problems. Solvable problems have no well-established solution, but we know what the solution looks like – also known as “the ground truth”. The presence of this ground truth is what enables reasoning boxes, and their iterative approach to finding the solution.
The yet untapped potential of LLMs is in making sense of unsolvable problems. Unsolvable problems are highly adaptive, and often appear to be solving us rather than the other way around. Things like relationships, organizations, politics, and ecosystems all fall into this class of problems – and usually, we don’t end up solving them. Instead, we navigate through them as best we can. A good way to picture an unsolvable problem is thinking of an open ocean. We can’t solve the high seas, but we can navigate them.
The key to navigating an unsolvable problem is in investing a bit of time into sensing the various forces inside of the problem that make it impossible to solve once and for all. I call this process “framing the problem”. A successful framing produces one or more solvable problems that, when solved, help us navigate the unsolvable problem.
Let’s use this simple taxonomy as the foundation of a reasoning structure.
The structure will likely begin with making a guess at the class of the problem: are we looking at a solved, solvable, or unsolvable problem? I experimented a bit with asking LLMs about Cynefin, and was pleasantly surprised by the quality of these guesses. Perhaps this is the foundational wisdom of the model that my colleague was talking about?
Based on the outcome of the initial guess, the structure will guide the reasoning to branch and engage with the problem differently based on its class.
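To make the branching a bit more concrete, here is a minimal TypeScript sketch of the top of such a structure. Everything in it – the type names, the helpers, the hard-coded guess – is purely illustrative; an actual reasoning structure would live in prompts and tuning data, not in application code.

```typescript
// Illustrative only: the three problem classes from the taxonomy above.
type ProblemClass = "solved" | "solvable" | "unsolvable";

// Hypothetical first step: ask the model to guess the class of the problem.
// In a real system this would be a completion call; here it is hard-coded.
async function classifyProblem(question: string): Promise<ProblemClass> {
  return "solvable";
}

// Placeholder branches, one per problem class.
async function answerOrWriteCode(q: string) {
  return `retrieve or compute the known solution to: ${q}`;
}
async function iterateTowardSolution(q: string) {
  return `establish context, solve, and verify: ${q}`;
}
async function frameTheProblem(q: string) {
  return `map the forces that make this unsolvable: ${q}`;
}

// The reasoning structure branches on the initial guess.
const branches: Record<ProblemClass, (q: string) => Promise<string>> = {
  solved: answerOrWriteCode,
  solvable: iterateTowardSolution,
  unsolvable: frameTheProblem,
};

async function reason(question: string): Promise<string> {
  return branches[await classifyProblem(question)](question);
}
```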
🔨 Handling solved problems
For solved problems, an LLM would attempt to just answer them, or write a bit of code as I mentioned above – just as most question-answering LLM-based applications do today. Like so:
Question: What is the square root of 38495?
Response: The problem of determining a square root of a number is a solved problem. Here is a program that returns a square root of 38495:
```javascript
console.log(Math.sqrt(38495));
```
The answer is the output of the program.
For this problem class, this is not so much solving the problem as retrieving the solution from the depths of the model’s wisdom.
🧩 Actually solving problems
If the problem is solvable, the reasoning structure will assume an iterative stance with two interlinked lines of reasoning.
The first line of reasoning tries to answer the question of “Has the problem been solved?”. This is the actual process of solving the problem.
The second line of reasoning tries to answer the question “Do I know everything I need to know to verify that the problem has been solved?”. This is the process of establishing the problem context, which determines the constraints that could be used to verify that the solution is correct. This context serves as the ground truth for the problem.
Establishing problem context is a problem in itself, and the reasoning structure will branch further into various techniques that could be effective. For example, the LLM might decide to write a suite of tests that, when run, will prove or disprove that the answer is correct. It could also opt to ask the user to provide feedback on the answer and learn from that feedback.
It might even choose an “inner monologue” technique, where the ground truth is present only in a Socratic sense, with the LLM constantly questioning its own assumptions and making progress by spotting inconsistencies in them. As long as the assumptions are visible, there is an opportunity to question and correct them.
Whatever the choice, the reasoning structure will need to presume that both solving and context-finding processes are iterative. The LLM will no longer attempt to solve the problem in one completion, but rather focus on making progress toward the solution in multiple iterations and prefer frequent contact with the ground truth.
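To make the stance a bit more tangible, here is a rough sketch of such a loop, with a made-up `complete` function standing in for any call to a model. The two interlinked questions above become the two checks inside the loop; none of this is a prescription, just an illustration of the shape.

```typescript
// Illustrative sketch of the iterative stance for solvable problems.
// `complete` stands in for a call to an LLM; it is not a real API.
declare function complete(prompt: string): Promise<string>;

async function solveIteratively(problem: string, maxIterations = 10) {
  let context = ""; // accumulated ground truth: constraints, requirements, tests
  let candidate = ""; // the current attempt at a solution

  for (let i = 0; i < maxIterations; i++) {
    // Line of reasoning #2: do I know everything I need to verify a solution?
    const gap = await complete(
      `Problem: ${problem}\nKnown context: ${context}\n` +
        `What do I still need to know to verify a solution? Reply "nothing" if complete.`
    );
    if (gap.trim().toLowerCase() !== "nothing") {
      context += `\n${gap}`; // gather more context: ask the user, write tests, etc.
      continue;
    }

    // Line of reasoning #1: has the problem been solved?
    candidate = await complete(
      `Problem: ${problem}\nContext: ${context}\nPropose a solution.`
    );
    const verdict = await complete(
      `Context: ${context}\nSolution: ${candidate}\n` +
        `Does this solution satisfy every constraint in the context? Answer yes or no.`
    );
    if (verdict.trim().toLowerCase().startsWith("yes")) {
      return candidate; // frequent contact with the ground truth pays off
    }
  }
  return candidate; // best effort after running out of iterations
}
```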
This stance is different from the typical “question-answer” one of the solved problem class. I’ve been able to get the current generation of models to do so with mixed results, and it feels like an important capability to develop, perhaps through instruction tuning.
The model will also need to be capable of keeping track of where in the reasoning structure it is. Given that solving a problem might take multiple completions, there needs to be something that helps the model orient in relation to the overall progress of solving the problem.
Question: Write a to-do list app
Response: Writing a to-do list app is a solvable problem, because it is very clear when the app is built and whether it allows viewing the to-do list, as well as adding and removing items from the list. What information am I missing to solve this problem? I need to ask the following questions:
- Other than showing a to-do list, adding, and removing items from the to-do list, what other features are important to you?
- Should this app support multiple lists or is just one list sufficient?
- On what device will you use this app?
Then, a few interactions later:
Question: Write a to-do list app
Context: This is a solvable problem, and we are currently in the process of determining requirements. Previous interactions established the following requirements:
- app will support only one to-do list
- app will run on an iPhone
- app will allow viewing one to-do list
- it should be possible to add and remove items on the list
- it should be possible to sort items on the list.
Response: <completion continues here>
Today, this tracking is primarily done by recording the history of the problem-solving interactions, like a chat transcript. Most chat models will accept the history of the previous messages and are tuned to treat it as guidance for the text completion. I can’t wait for new, more effective methods of retaining problem context to be discovered.
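For example, with a typical chat-style model the tracking boils down to replaying an array of prior messages on every call. The sketch below uses the common role/content message shape; `chatComplete` is a stand-in, not any particular vendor’s API.

```typescript
// Sketch: the chat transcript is the problem-solving memory.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Stand-in for a chat completion call to some model.
declare function chatComplete(history: ChatMessage[]): Promise<string>;

const history: ChatMessage[] = [
  { role: "system", content: "Classify the problem, then reason step by step." },
  { role: "user", content: "Write a to-do list app" },
];

async function nextTurn(userReply: string): Promise<string> {
  history.push({ role: "user", content: userReply });
  const response = await chatComplete(history); // the whole transcript rides along
  history.push({ role: "assistant", content: response });
  return response;
}
```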
For instance, a more sophisticated reasoning structure might guide a model to detect that the problem could be split into multiple sub-problems that could be solved more easily, in parallel or in sequence. In such cases, the problem context would map into multiple sub-problems and eventually reduce back to one, once the sub-problems are solved. For example:
Question: Write an app that allows me to create a to-do list
Response: Writing a to-do list app is a solvable problem, because it can be tested if the app:
- successfully starts
- allows viewing the to-do list
- supports adding and removing items from the list.
To simplify the problem, I will break it up into two sub-problems:
- the backend that provides the to-do list API for storing and retrieving the list
- the frontend that consumes this API.
The context for the backend sub-problem will be:
- write a backend for a to-do list app <detailed backend spec goes here>
- write an API for the backend that supports three methods: retrieve the list, add a new item, and remove an existing item. <detailed spec of the API goes here>
The context for the frontend sub-problem will be:
- write a Web app frontend for a to-do list app
- the frontend will support viewing the to-do list on mobile devices, and specifically on Safari for iPhone <detailed frontend spec goes here>
- the frontend will integrate with the following API <detailed spec of the API goes here>
🌀 Framing unsolvable problems
When the LLM guesses that the problem is unsolvable, the reasoning structure guides the process toward discerning the tangle of forces that creates unsolvability. This part of the process is less about finding solutions and more about building a reasonable dynamic model of the problem.
I am not yet sure how such a model could be represented efficiently as text, but it will likely start with a list of statements that form into a causal graph. For example, here’s how a model may respond to a complex question:
Question: Should I stay in college or start my own AI company?
Response: Making a decision on whether to stay in college or start an AI company appears to be an unsolvable problem. There does not seem to be one right answer and it is hard to check the correctness of the answer. To better understand the forces at play, I will ask the following clarifying questions:
- What is difficult about making this decision?
- Why does it have to be an either-or decision?
- What holds you back from starting a company?
- What is so attractive about starting a company?
- Why is leaving college necessary?
- Why is staying in college important for you?
After a few rounds of questions and answers, the graph will begin to take shape, and the LLM might identify a few leverage points: solvable problems within the larger unsolvable problem.
Question: Should I stay in college or start my own AI company?
Context: This is a complex problem and we are currently in the process of building a model of the problem. Here are the edges of the model graph:
- AI is very exciting, and there are a lot of cool opportunities
- If I wait, I will miss these opportunities
- To graduate successfully, I need to wait
- To build a company, I need a lot of time
- To build a successful foundation for my life, I need to excel at school
- To excel at school, I need a lot of time
- Time is an extremely limited resource
<graph edges continue>
Response: Based on the graph I have so far, I believe that time might be a leverage point in this problem. I will recommend framing the original problem as that of optimizing for the most effective use of time and creating space for both college and company in the schedule.
Suggested problem statement: Optimize my schedule to enable both staying in college and starting my own AI company.
Once the framing emerges, the reasoning structure will follow the recipe for the solvable problem that popped out. While the original problem remains unsolved, there are now concrete steps that the user can take to navigate it.
The dynamic model that is created by the process of navigating an unsolvable problem is a valuable artifact. In fact, it probably needs to be retained. Depending on the success or failure of the framing it produced, it will need to be updated and refined.
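I don’t know what the right representation for retaining it is, but as a rough sketch, the model could be as simple as a list of typed edges, with leverage points falling out of simple bookkeeping. The shapes and names below are made up for illustration.

```typescript
// A guess at representing the dynamic model as a causal graph.
type CausalEdge = {
  cause: string;
  effect: string;
  polarity: "reinforces" | "inhibits";
};

// Edges loosely echoing the college-vs-company example above.
const model: CausalEdge[] = [
  { cause: "excelling at school", effect: "available time", polarity: "inhibits" },
  { cause: "building a company", effect: "available time", polarity: "inhibits" },
  { cause: "available time", effect: "graduating successfully", polarity: "reinforces" },
  { cause: "available time", effect: "a successful company", polarity: "reinforces" },
];

// Nodes that touch the most edges are candidate leverage points.
function byDegree(edges: CausalEdge[]): [string, number][] {
  const degree = new Map<string, number>();
  for (const e of edges) {
    degree.set(e.cause, (degree.get(e.cause) ?? 0) + 1);
    degree.set(e.effect, (degree.get(e.effect) ?? 0) + 1);
  }
  return [...degree.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(byDegree(model)[0]); // likely ["available time", 4] – time as the leverage point
```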
🩹 Detecting problem classification mistakes
Because the first step of the reasoning structure is just a guess, the structure will need to provide guidance to detect the mistake and shift the problem into a different class.
Some of these mistakes aren’t as bad as others. For example, mistaking a solved problem for a solvable one, or a solvable problem for an unsolvable one, will result in inefficient use of resources, but will still produce a satisfying answer.
There is another category of mistakes where the solution is produced, but isn’t valid, like when a solvable problem was diagnosed as solved, and the attempt to solve the problem results in the wrong answer. I wonder if some sort of “inner critic” reasoning might be necessary even for solved problems to mitigate that.
The mistake that is a bit more troublesome is when an unsolvable problem was guessed to be solvable. It will look like an inability to pin down constraints, creating a sense of divergence, rather than convergence toward the objective. Detecting this might be challenging, because the reasoning will continue finding new solvable sub-problems, and appear to make the infinite-regress sort of progress. Here, the reasoning structure will need some sort of a trip wire: a way to ask the model to lift its pencil and say “buddy, you’re spiraling out of control, this might be an unsolvable problem”.
😎 Mission impossible
Part of this essay was an exercise in trying to write a sketch of a reasoning structure in its entirety. Now, you and I could look at it and wonder whether or not something like a reasoning structure might even be feasible to imbue into a model.
It definitely feels a bit daunting. However, if the concept seems intriguing to you, please give it a try. Maybe hack together something very simple to start small, and expand from there. Who knows, maybe through this inquiry, the full depth of wisdom of a large language model will be revealed to us.
I’ve been playing with various software development configurations that might enable rapid experimentation, and landed on this particular one. I am pretty sure there are even more effective ways, and I can’t wait to learn from you about them. This is what I have so far.
A quick disclaimer: this is not yet another “how to set up a repo” tutorial. It’s mostly a capture of my learnings. I will refer to a couple of such tutorials, though.
To set things up: I was looking for a way to enable a small-ish team to develop rapid prototypes. That is, write something, see if it does what we need, test the waters, learn like crazy, and break no sweat if it doesn’t.
🧫 Ecosystem
The first question on my mind was that of the developer ecosystem. To unlock fertile learning through testing the waters, prototypes need to ship. They do not have to ship as polished products with firm SLAs, but they do need to reach some users who would be willing to mess with the prototypes, react to them, and provide feedback. To maximize the chances of serendipitous feedback, we must play in the most populous ecosystems of folks who like to mess with unpolished stuff. When choosing a place to learn, pick the rowdiest bazaar.
This requirement narrowed down the possible environments quite a bit. Looking at Stack Overflow survey results, two ecosystems stood out as by far the most eligible for the title: Javascript developers and Python developers. They form the fat head of the developer environment power curve. These are the places to play.
I then spent a bunch of time messing with both environments, and ended up deciding on the Javascript ecosystem. There were several reasons for that, not all of them entirely objective. Roughly, it all came down to two factors:
Javascript runs both in the browser and on the server, and the surprising amount of code and infrastructure that can be shared between the two allows for less jumping through hoops to make things go;
The overall state of the scaffolding and tooling in the Javascript ecosystem seems to be a touch less messy than that of Python, with Python still overcoming some of the legacy warts around package publishing, environment isolation, the transition to Python 3, and the addition of types. At least for me, I found that I ended up fighting Python more often than fighting Javascript.
🧰 Toolchain
After picking the environment, I wasted a bunch of time resisting TypeScript. As a Javascript old-timer and a known build step grump, I really didn’t want to like it. But after getting over my hang ups, I must admit: TypeScript is basically the best thing that could ever happen to unlock rapid prototyping. As long as I know where the layer gaps are (hint: the missing runtime type support), it’s basically the perfect tool for the job. Especially with the way it is integrated into VSCode, TypeScript hovers at just the right altitude to help me write the code quickly and have high confidence in this code working on the first run.
Which brings me to the next increment in my journey. If we choose TypeScript, we must go with VSCode as the development surface. I am sure there are other cool editors out there (I hear you, vim/emacs fans!), but if we’re looking for something that fits TypeScript like a glove, there is simply no substitute. Combined with eslint and prettier, the VSCode support for TypeScript makes development an enjoyable experience.
So… Node, Web, TypeScript, VSCode. These are the choices that came out of my exploration. I briefly played with the various Node package managers, and concluded that npm is likely the thing to stick with. I love the idea behind pnpm and yarn is super-fun, but at least for me, I decided to go with what comes in the box with Node. Deno is cool, too – but as a newcomer, it simply doesn’t meet the “rowdiest bazaar” bar.
The choices made so far define the basic shape of the prototypes we will develop and the sketch of the development flow. The prototypes will be either shipped as Web apps or libraries/tools as npm packages. Every prototype will start as an npm package. It might have server-only code, client-only code, or a mix of both. Prototypes that look like tools and libraries will be published on npm.
#️⃣ Runtime versions and settings
I invested a bit of time deciding on versions and settings of TypeScript and Node. One key guiding principle I chose was “as close to the metal as possible”. The TypeScript compiler is quite versatile, and it can output to a variety of targets to satisfy the needs of even the most bizarre deployments. Given that we’re prototyping and writing new code, we don’t need to concern ourselves with the full breadth of deployment possibilities – and we certainly can be choosy about the version of the browser in which we expect to present our experiments.
With this leeway and the recognition that TypeScript is mostly an implementation of ECMAScript (the standard behind Javascript) plus type annotations, we can configure the TypeScript compiler to mostly remove type annotations.
For Node, I chose to go with v18.16, primarily because this is the version that introduced the real fetch implementation, which matches what modern Web browsers ship.
So, if we have Node 18 and the config of the TypeScript below, we should minimize the amount of new code introduced by the TypeScript compiler and maximize the client/server code compatibility.
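I am not reproducing the exact config here, but based on the constraints above – strip types, target what Node 18 and evergreen browsers run natively, and (as the aside below explains) pull in the “DOM” lib for fetch typings – a tsconfig.json along these lines should behave similarly. Treat it as a sketch rather than the canonical setup:

```json
{
  "compilerOptions": {
    // Stay close to the metal: emit what Node 18 and modern browsers run natively.
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    // "DOM" supplies the fetch typings missing from the Node type annotations.
    "lib": ["ES2022", "DOM"],
    "strict": true,
    // Emit declarations so packages in the monorepo can depend on each other.
    "declaration": true,
    "outDir": "dist"
  },
  "include": ["src"]
}
```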
As an aside, there was a fun rabbit hole of a layer gap into which I fell while exploring this space. Turns out, Node TypeScript type annotations don’t have the declarations for the fetch implementation. So I ended up doing this funky thing with adding the “DOM” library to the TypeScript config. This worked better than I expected. As long as we remember that a) TypeScript types are not seen by the actual Javascript runtime and b) most of the actual DOM objects aren’t available in Node, one can get away with a lot of fun hacks. For example, we can run unit tests for client-side code on the server!
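As an illustration of that last point – a made-up example, not something from an actual repo – a “client-side” helper and its test can both run on the server with Node’s built-in test runner (`node --test`, available in Node 18):

```typescript
// todo-format.test.ts — a "client-side" helper and its test, both running on the
// server via `node --test`.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical client-side helper; note that it touches no DOM objects.
function formatTodo(item: string, done: boolean): string {
  return `${done ? "✓" : "•"} ${item}`;
}

test("formats a pending item", () => {
  assert.equal(formatTodo("buy milk", false), "• buy milk");
});
```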
🏠 Repository configuration and layout
With versions and runtime configs squared away, I proceeded to fiddle with configuring the repository itself. I first started with the “let a thousand tiny repos bloom” idea, but then quickly shifted toward the Node monorepo. This choice might seem weird given the whole rapid prototyping emphasis. The big realization for me was that we want to encourage our prototypes to mingle: we want them to easily reuse each other’s bits. It is out of those dependencies that interesting insights emerge. We might spot a library or a tool in a chunk of code that every other prototype seems to rely on. We might recognize patterns that change how we think about the boundaries around prototypes and would need space to reshape them. With each prototype isolated in its own tiny repo, the friction of dependency tracking would simply prevent that.
There are multitudes of ways in which one could bring up a TypeScript monorepo. I really liked this guide, or this setup that relies exclusively on the TypeScript compiler to track dependencies. Ultimately, I realized that I prefer to use separate build tools that track the dependency build graph, and invoke the compiler to do their bidding. This is the setup that Vercel’s Turborepo folks advocate, and this is the one I ended up choosing.
Any Node monorepo will loosely have this format: there will be a bunch of config files and other goop in the root of the repository, and then there will be a directory or two (usually called “packages” or “apps”) that contains directories for the individual packages.
My intuition is that to facilitate rapid prototyping, we need a convention that reflects the state of any package in the monorepo. For example, we could have two package-holding directories, one for “seeds” and one for “core”. In the “seeds” directory, we place packages that are early prototypes that we’re just playing around with. Once a package acquires dependents and becomes useful for other prototypes, we graduate it to the “core” directory.
Another useful convention when working with Node monorepos is that the npm package names are all scoped under the same npm organization and the name of that organization matches the name of the repo.
So for example, if our monorepo is named “awesome-crew-prototypes”, all packages are published under the “@awesome-crew-prototypes” npm organization. For example, a prototype for a library that does URL parsing will be published as “@awesome-crew-prototypes/url-parser”. This way, the fact that the “url-parser” is part of the “awesome-crew-prototypes” monorepo is reflected in its name.
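Putting these conventions together, the repository might look roughly like this (all names below are made up for illustration):

```
awesome-crew-prototypes/
├── package.json        # root: private, with "workspaces": ["seeds/*", "core/*"]
├── turbo.json          # build pipeline configuration for the monorepo
├── tsconfig.json       # shared compiler settings
├── seeds/              # early prototypes we're just playing around with
│   └── doodle/
│       └── package.json   # "name": "@awesome-crew-prototypes/doodle"
└── core/               # packages that other prototypes have come to rely on
    └── url-parser/
        └── package.json   # "name": "@awesome-crew-prototypes/url-parser"
```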
🚀 Team practices
As the final challenge, I worked out the best practices for the team that might be working in this repository. This section is the least well-formed, since typically, the practices emerge organically from collaborating together and depend quite a bit on the mix of the people on the team.
Having said that, the following rules of thumb felt right as the foundation for the practices:
Mingle – seek to reuse other packages that we build, but don’t panic if that doesn’t work out. Think of reuse as a very, very early indicator of a package’s usefulness.
Keep the rewrite count high – don’t sweat facing the possibility of rewriting the code we’re writing multiple times.
Duct tape and popsicle sticks – since we’re likely going to rewrite it, what lands does not need to be perfect or even all that great, as long as it gets the job done.
Ship many small things – rather than aiming for a definite product with a “wow” release moment, look to ship tiny tools and libraries that are actually helpful.
Armed with all of these, a team that is eager to experiment should be able to run forward quickly and explore the problem space that they’ve chosen for themselves, and have fun along the way. Who knows, maybe I’ll actually set up one of these myself one day. And if I do, I’ll let you know how it goes.
I also quickly put together a template for the environment that I described in this post. It probably has bugs, but should give you a more concrete idea of the actual setup.
There seems to be some layering rhythm to how software capabilities are harnessed to become applications. Every new technology tends to grow these four layers: Capabilities, Interfaces, Frameworks, and Applications.
There does not seem to be a way of skipping or short-cutting around this process. The four layers grow with or without us. We either develop these layers ourselves or they appear without our help. Understanding this rhythm and the cadence of layer emergence could be the difference between flopping around in bewilderment and growing a durable successful business. Here are the beats.
⚡ Capabilities
Every new technological capability usually spends a bit of time in a purgatory of sorts, waiting for its power to become accessible. It needs to traverse the crevasse of understanding: move from being grokable by only a handful of those who built it to some larger audience. Many technologies dwell in this space for a while, trapped in the minds of inventors or in the hallways of laboratories. I might be stating the obvious here: the process of inventing something is not enough for it to be adopted.
I will use large language models as the first example in this story, but if you look closely, most technological advances follow this same rhythm. The transformer paper and the general capability for building models have been around for a while, but until last year, they were mostly contained to the few folks who needed to understand the subject matter deeply.
🔌 Interfaces
The breakthrough usually comes in the form of a new layer that emerges on top of the capability: the Interfaces layer. This is typically what we see as the beginning of the technology adoption growth curve. The Interfaces layer can be literally the API for the technology or any other form of simplifying contract that enables more people to start using the technology.
The Interfaces layer serves as the democratizer of the Capabilities layer: what was previously only accessible to the select few – be that due to the complex nature of the technology, capital investment costs, or some other barrier – is now accessible to a much larger group. This new audience is likely still fractionally small compared to all the world’s population, but it must be numerous enough for the tinkering dynamic to emerge.
This tinkering dynamic is key to the success of the technology. Tinkerers aren’t super-familiar with how the technology works. They don’t have any of the deep knowledge or awareness of its limits. This gives them a tremendous advantage over the inventors of the technology – they aren’t trapped by the preconceived notions of what this technology is about. Tinkerers tinker. Operating at the Interfaces layer, they just try to apply the tech in this way and that and see what happens.
Many research and development organizations make a crucial mistake by presuming that tinkering is something that a small group of experts can do. This usually backfires, because for this phase of the process to play out successfully, we need two ingredients: 1) folks who have their own ideas about what they might do with the capabilities and 2) a large enough quantity of these folks to actually start introducing surprising new possibilities.
Because of this element of surprise, tinkering is a fundamentally unpredictable activity. This is why R&D teams tend to not engage in it. Especially in cases when the societal impact of technology is unclear, there could be a lot of downside hidden in this step.
In the case of large language models, OpenAI and StabilityAI were the ones who decided that this risk was worth it. By providing a simple API to its models, OpenAI significantly lowered the barrier to accessing the LLM capabilities. Similarly, by making their Stable Diffusion model easily accessible, StabilityAI ushered in a new era of tinkering with multimodal models. They were the first to offer the large language models Interfaces layer.
Because it’s born to facilitate the tinkering dynamic, the Interfaces layer tends to be opinionated in a very specific way. It is concerned with reducing the burden of barriers to entry. Just like any layer, it does so by eliding details: to simplify, some knobs and sliders become no longer accessible to the consumer of the interface.
If the usage of the Interfaces layer starts to grow, this indicates that the underlying Capabilities layer appears to have some inherent value, and there is a desire to capture as much of this value as possible.
🔋 Frameworks
This is the point at which a new layer begins to show up. This third layer, the Frameworks, focuses on utility. This layer asks: how might we utilize the underlying Interfaces layer in more effective ways, and make it even more accessible to an even broader audience?
Utility might mean different things in different situations: in some, the value of rapidly producing something that works is the most important thing. In others, it is the overall performance or reliability that matters most. Most often, it’s some mix of both and other factors.
Whatever it is, the search for maximizing utility results in development of frameworks, libraries, tools, and services that consume the Interfaces layer. Because there are many definitions of utility and many possible ways to achieve it, the Frameworks layer tends to be the most opinionated of the stack.
In my experience, the diversity of opinion introduced in the Frameworks layer depends on two factors: the inherent value of the capability and the Interfaces layer’s own opinion.
The first factor is fairly straightforward: the more valuable the capability, the more likely there will be a wealth of opinions growing in the Frameworks layer.
The second factor is more nuanced. When the Interfaces layer is introduced, its authors build it by applying their own mental model of how the capability will be used via their interface. Since there aren’t actual users of the layer yet, it is at best a bunch of guesses. Then, the process of tinkering puts these guesses to the test. Surprising new uses are discovered, and the broadly adopted mental models of the consumers of the interface usually differ from the original guesses.
This difference becomes the opinion of the Interfaces layer. The larger this difference, the more effort the Frameworks layer will have to put into compensating for this difference – and thus, create more opportunities for even more diversity of opinion.
An illustration of how this plays out is the abundance of Web frameworks. Since the Web browser started out as a document-viewing application, it still has all of those early guesses firmly entrenched. Indeed, the main API for web development is called the Document Object Model. We all have moved on from this notion, asking our browsers to help us conduct business, entertain us, and work with us in many more ways than the original designers of this API envisioned. Hence, the never-ending stream of new Web frameworks, each trying yet another way to close this mental model gap.
It is also important to call out a paradox that develops as a result of the interaction between the Frameworks and the Interfaces layer. The Frameworks layer appears to simultaneously apply two conflicting pressures to the Interfaces layer below: to change and to stay the same.
On one hand, it is very tempting for the Interfaces layer maintainers to change it, now that their initial guesses have been tested. And indeed, when talking to the Frameworks layer developers, the Interfaces layer maintainers will often hear requests for change.
At the same time, changing means breaking the existing contract, which creates all kinds of trouble for the Frameworks layer – these kinds of changes are usually a lot of work (see my deprecation two-step write-up from a while back) and take a long time.
The typical state for a viable Interfaces layer is that it is mired in a slow slog of constant change whose pace feels glacial from the outside. Once the Frameworks layer emerges, the API layer becomes increasingly more challenging to evolve. For those of you currently in this slog, wear it as a badge of honor: it is a strong signal that you’ve made something incredibly successful.
The Frameworks layer becomes the de-facto place where the best practices and patterns of applying the capability are developed and stored. This is why it is only after a decent Frameworks layer appears that we start seeing robust Applications actually utilizing the capability at the bottom of the stack.
📱 Applications
The Applications layer tops our four-stack of layers. This layer is where the technology finally faces its users – the consumers of the technological capability. These consumers might be the end users who aren’t at all technology-savvy, or they could be just another group of developers who are relieved to not have to think about how our particular bit of technology works on the inside.
The pressure toward maximizing utility develops at this layer. Consumer-grade software is serious business, and it often takes all available capacity to just stay in the game. While introducing new capabilities could be an appealing method to expand this business, at this layer, we seek the most efficient way possible to do so. The whole reason the Frameworks layer exists is to unlock this efficiency – and to further scale the availability of the technology.
This highlights another common pitfall of a research organization: trying to ram a brand new capability right into an application, without thinking about the Interfaces and Frameworks layers between them. This usually looks like a collaboration between the team that builds at the Capabilities layer and the team that builds at the Applications layer. It is usually a sordid mess. Even if the collaboration nominally succeeds, neither participant is happy in the end. The Capabilities layer folks feel like they’ve got the most narrow and unimaginative implementation of their big idea. The Application folks are upset because now they have a weird one-off turd in their codebase.
👏 All together now
Getting technology adopted requires cultivating all four layers. To connect Capabilities to Applications, we first need the Interfaces layer to undergo a significant amount of tinkering, with a non-trivial amount of use case exploration that helps map out the potential space of problems that the new technology can actually solve. Then, we need the Frameworks layer to capture and embody the practices and patterns that trace the shortest paths across the explored space.
This is exactly what is playing out with the large language models. While ChatGPT is getting all the attention, the actual interesting work is happening at the Frameworks layer that sits on top of the large language model Interfaces layer: the OpenAI, Anthropic, and PaLM APIs.
The all-too-common trough of disillusionment that many technological innovations encounter can be described as the period of time between the Capability layer becoming available and the Interfaces and Frameworks layers filling in to support the Applications layer.
For instance, if you want to make better guesses about the future of the most recent AI spring, pay attention to what happens with projects like LangChain, AutoGPT, and other tinkering adventures – they are the ones accumulating the recipes and practices that will form the foundation of the Frameworks layer. They will be the ones defining the shape of the Applications layer.
Here’s the advice I would give to any team developing a nascent technology:
Once the Capabilities layer exists, immediately focus on developing the Interfaces layer. For example, if you have a cool new way to connect devices wirelessly, offer an API for it.
Do make sure that your Interfaces layer encourages tinkering. Make the API as simple as possible, but still powerful enough to be interesting. Invest in capping the downside (misuse, abuse, etc.). For example, start with an invitation-only or rate-limited API.
Avoid the comforting idea that just playing with the Interfaces layer within your team or organization constitutes tinkering. Seek out a diverse group of tinkerers. Example: opt for a public preview program rather than an internal-only hackathon.
Prepare for the long slog of evolving the Interfaces layer. Consider maintaining the Interfaces layer as a permanent investment. Grow expertise on how to maintain the layer effectively.
Once the Interfaces layer usage starts growing, watch for the emergence of the Frameworks layer. Seed it with your own patterns and frameworks, but expect them not to take root. There will be other great tool or library ideas that you didn’t come up with. Give them all love.
Do invest in growing a healthy Frameworks layer. If possible, assume the role of the Frameworks layer facilitator and patron. Garden this layer and support those who are doing truly interesting things. Weed out grift and adversarial players. At the very least, be very familiar with the Frameworks landscape. As I mentioned before, this layer defines the shape of Applications to come.
Do build applications that utilize the technology, but only to learn more about the Frameworks layer. Use these insights to guide changes in the Interfaces and Capabilities layers.
Be patient. The key to finding valuable opportunities is in being present when these opportunities come up – and being the most prepared to pursue these opportunities.
If you orient your work around these four layers, you might find that the rhythm of the pattern begins to work for you, rather than against you.
Now that I play a bit more with open source projects, more observations crystallize into framings, and more previous experiences start making sense. I guess that’s the benefit of having done this technology thing for a long time – I get to compost all of my learnings and share the yummy framings that grow on top of them.
One such framing is the distinction between two stances that projects have in regard to bad code committed into the repository: the fix-forward stance and the rollback stance.
“Bad code” in this scenario is usually code that breaks something. It could be that the software we’re writing becomes non-functional. It could be as subtle as a single unit test beginning to fail. Of course, we try to ensure that there are strong measures to prevent bad code from ever sneaking into the repository. However, no matter how much continuous integration infrastructure we surround ourselves with, bad code still occasionally makes it through.
When a project has the fix-forward stance and bad code is found, we keep moving forward, fixing the code with further commits.
In the rollback stance, we identify and immediately revert the offending commit, removing the breakage.
⏩ The fix-forward stance
The fix-forward stance tends to work well in smaller projects, where there is a high degree of trust and collaboration between the members of the project. The breakage is treated as a “fire”, and everyone just piles on to try and repair the code base.
One way to think of the fix-forward stance is that it places the responsibility of fixing the bad code on the collective shoulders of the project members.
One of my favorite memories from working on the WebKit project was the “hyatt landed” moments, when one of the founding members of the project would land a massive chunk of code that introduced a cool new feature or capability. This chunk usually broke a bunch of things, and members of the project would jump on putting out the fires, letting the new code finish cooking in the repository.
The obvious drawback of the fix-forward stance is that it can be rather randomizing. Fixing bad code and firefighting can be exhilarating the first few times, but it grows increasingly frustrating, especially as the project grows in size and membership.
Another drawback of fixing forward is that it’s very common for more bad code to be introduced while fighting the fire, resulting in a katamari ball of bugs and a prolonged process of deconstructing this ball and weeding out all the bugs.
🔁 The rollback stance
This is where the rollback stance becomes more appealing. In this stance, the onus of responsibility for the breakage is on the individual contributor. If my code is deemed to be the culprit, it is simply ejected from the repository, and it’s on me to figure out the source of the brokenness.
In projects with the rollback stance, there is often a regular duty of “sheriffing”, where an engineer or two are deputized to keep an eye on the build, spot the effects of bad commits, hunt them down, and roll them back. The sheriff usually has the power to “close the tree”, where no new code is allowed to land until the problematic commit is reverted.
It is not fun to get a ping from a sheriff, letting me know that my patch was found to be the latest suspect in the crimes against the repository. There’s usually a brief investigation, with the author pleading innocence, and a quick action of removing the commit from the tree.
The key advantage of the rollback stance is that it’s trustless in nature, and so it scales rather well to large teams with diverse degrees of engagement. It doesn’t matter if I am a veteran who wrote most of the code in the project or someone making their first commit in their hobby time – everyone is treated the same way.
However, there are also several drawbacks. First, it could take a while for complicated changes to land. I’ve seen my colleagues orchestrate intricate multi-part maneuvers to ensure that all dependencies are properly adjusted and do not introduce breakages in the process.
There is also a somewhat unfortunate downside of the trustless environment: because it is on me to figure out the problem, it can be rather isolating. What could have been a brief firefighting swarm in a fix-forward project can turn into a long lonely slog of puzzling over the elusive bug. This tends to particularly affect less experienced and introverted engineers, who may spend weeks or even months trying to land a single patch, becoming more and more dejected with each rollback.
Similarly, it takes nuance and personal awareness to be an effective sheriff. A sheriff must constantly balance between quick action and proper diagnosis. Quite often, actually innocent code gets rolled back while the problematic bits remain – or the sheriff loses large chunks of time trying to diagnose the problem too deeply, holding up the entire project. While working on Chromium, I’ve seen folks who are genuinely good at this job – and folks who I would rather not be sheriffing at all.
Because it is trustless, a rollback stance can easily lead to confrontational and zero-sum dynamics. Be very careful here and cultivate the spirit of collaboration and sense of community, lest we end up with a project where everyone is out for themselves.
📚 Lessons learned
Which stance should you pick for your next project? I would say it really depends on the culture you’d like to hold up as the ideal for the project.
If this is a small tight-knit group of folks who already work together well, the fix-forward stance is pretty effective. Think startups, skunkworks, or prototyping shops that want to stay small and nimble.
If you’d like your project to grow and accept many contributors, a rollback stance is likely the right candidate – as long as it is combined with strong community-building effort.
What about mixing the two? My intuition is that a combination of both stances can work within the same project. For example, some of the more stable, broader bits of the project could adopt the rollback stance, and the more experimental parts could be in a fix-forward stance. As long as there is a clean dependency separation between them, this setup might work.
One thing to avoid is inconsistent application of the stance. For example, if for our project, we decide that seasoned contributors could be allowed to use the fix-forward stance, and the newcomers would be treated with rollbacks, we’ll have a full mess on our hands. Be consistent and be clear about the stance of your project – and stick with it.
To make the asymptote and value niches framing a bit more concrete, let’s apply it to the most fun (at least for me) emergent new area of developer experience: the various developer tools and services that are cropping up around large language models (LLMs).
As the first step, let’s orient. The layer above us is AI application developers. These are folks who aren’t AI experts, but are instead experienced full-stack developers who know how to build apps. Because of all the tantalizing promise of something new and amazing, they are excited about applying the shiny new LLM goodness.
The layer below us is the LLM providers, who build, host, and serve the models. We are in the middle, the emerging connective tissue between the two layers. Alright – this looks very much like a nice layered setup!
Below is my map of the asymptotes. This is not a complete list by any means, and it’s probably wrong. I bet you’ll have your own take on this. But for the purpose of exercising the asymptotes framing, it’ll do.
🚤 Performance
I will start with the easiest one. It’s actually several asymptotes bundled into one. Primarily because they are so tied together, it’s often difficult to tell which one we’re actually talking about. If you have a better way to untangle this knot, please go for it.
Cost of computation, latency, availability – all feature prominently in conversations with AI application developers. Folks are trying to work around all of them. Some are training smaller models to save costs. Some are sticking with cheaper models despite their more limited capabilities. Some are building elaborate fallback chains to mitigate LLM service interruptions. All of these represent opportunities for AI developer tooling. Anyone who can offer better-than-baseline performance will find a sound value niche.
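To give a flavor of what these workarounds look like, here’s a minimal sketch of a fallback chain. The provider names, endpoints, and response shape are hypothetical stand-ins, not any particular vendor’s API.

```typescript
// A minimal sketch of a fallback chain across LLM providers.
// Real implementations would add retries, backoff, and
// provider-specific request shapes.

type Provider = { name: string; endpoint: string };

// Hypothetical endpoints, ordered by preference.
const providers: Provider[] = [
  { name: "primary", endpoint: "https://api.example.com/v1/generate" },
  { name: "cheaper-backup", endpoint: "https://backup.example.com/v1/generate" },
];

async function callProvider(p: Provider, prompt: string): Promise<string> {
  const res = await fetch(p.endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`${p.name} failed with status ${res.status}`);
  const data = await res.json();
  return data.text; // assumed response shape
}

// Try each provider in order; surface the last error if all of them fail.
export async function generateWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return await callProvider(p, prompt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```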
Is this a firm asymptote or a soft one? My guess is that it’s fairly soft. LLM performance will continue to be a huge problem until, one day, it isn’t. All the compute shortages will continue to be a pain for a while, and then, almost without us noticing, they will just disappear, as the lower layers of the stack catch up with demand, reorient, optimize – in other words, do that thing they do.
If my guess is right, then if I were to invest around the performance asymptote, I would structure it in a way that would keep it relevant after the asymptote gives. For example, I would probably not make it my main investment. Rather, I would offer performance boosts as a complement to some other thing I am doing.
🔓 Agency
I struggled with naming this asymptote, because it is a bit too close to the wildly overused moniker of “Agents” that is floating around in AI applications space. But it still seems like the most appropriate one.
Alex Komoroske has an amazing framing around tools and services, and it describes the tension perfectly here. There is a desire for LLMs to be tools, not services, but the cost of making and serving a high-quality model is currently too high.
The agency asymptote clearly interplays with the performance asymptote, but I want to keep it distinct, because the motivations, while complementary, are different. When I have agency over LLMs, I can trace the boundary around it – what is owned by me, and what is not. I can create guarantees about how it’s used. I can elect to improve it, or even create a new one from scratch.
This is why we have a recent explosion of open source models, as well as the corresponding push to run models on actual user devices – like phones. There appears to be a lot of AI developer opportunities around this asymptote, from helping people serve their models to providing tools to train them.
Is this value niche permanent or temporary? I am just guessing here, but I suspect that it’s more or less permanent. No matter how low the costs and latency, there will be classes of use cases where agency always wins. My intuition is that this niche will get increasingly smaller as the performance asymptote gets pushed upward, but it will always remain. Unless of course, serving models becomes so inexpensive that they could be hosted from a toaster. Then it’s anyone’s guess.
💾 Memory
LLMs are weird beasts. If we do some first-degree sinning and pretend that LLMs are humans, we notice that they have long-term memory (the datasets on which they were trained) and short-term memory (the context window), but no way to bridge the two. They’re like that character from Memento: they know plenty of things, but can’t form new memories, and as soon as the context window is full, they can’t remember anything else in the moment.
This is one of the most prominent capability asymptotes that’s given rise to the popularity of vector stores, tuning, and the relentless push to increase the size of the context window.
Everyone wants to figure out how to make an LLM have a real memory – or at least, the best possible approximation of it. If you’re building an AI application and haven’t encountered this problem, you’re probably not really building an AI application.
Based on how I see it, this is a massive value niche. Because of how the models are currently designed, something else has to compensate for their lack of this capability. I fully expect a lot of smart folks to continue to spend a lot of time trying to figure out the best memory prosthesis for LLMs.
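To make the pattern concrete, here’s a minimal sketch of one such prosthesis: store past exchanges as vectors, retrieve the most similar ones, and stuff them back into the context window. The embed() function is a stand-in for whatever embedding model you use, and the brute-force similarity search is only there for clarity.

```typescript
// A minimal sketch of the "memory prosthesis" pattern: embed past
// exchanges, retrieve the most relevant ones, prepend them to the prompt.

declare function embed(text: string): Promise<number[]>; // assumed embedding call

type Memory = { text: string; vector: number[] };
const memories: Memory[] = [];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Remember an exchange for later.
export async function remember(text: string): Promise<void> {
  memories.push({ text, vector: await embed(text) });
}

// Pull the k most relevant memories and prepend them to the prompt,
// approximating long-term memory within a fixed context window.
export async function recallInto(prompt: string, k = 3): Promise<string> {
  const queryVector = await embed(prompt);
  const top = memories
    .map((m) => ({ m, score: cosine(queryVector, m.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ m }) => m.text);
  return ["Relevant context:", ...top, "", prompt].join("\n");
}
```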
What can we know about the firmness of this asymptote? Increasing the size of the context window might work. I want to see whether we’ll run into another feature of the human mind that we take for granted: separation between awareness and focus. A narrow context window neatly doubles as focus – “this is the thing to pay attention to”. I can’t wait to see and experiment with the longer context windows – will LLMs start experiencing the loss of focus as their awareness expands with the context window?
Overall, I would position the slider of the memory asymptote closer to “firm”. Until the next big breakthrough with LLM design properly bridges the capability gap, we’ll likely continue to struggle with this problem as AI application developers. Expect proliferation of tools that all try to fill this value niche, and a strong contentious dynamic between them.
📐 Precision
The gift and the curse of an LLM is the element of surprise. We never quite know what we’re going to get as the prediction plays out. This gives AI applications a fascinating quality: we can build a jaw-dropping, buzz-generating prototype with very little effort. It’s phenomenally easy to get to 80% or even 90% of the final product.
However, eking out even a single additional percentage point comes at an increasingly high cost. The darned thing either keeps barfing in rare cases, or it is susceptible to trickery (and inevitable subsequent mockery), making it clearly unacceptable for production. Trying to connect the squishy, funky epistemological tangle that is an LLM to the precise world of business requirements is a fraught proposition – and thus, a looming asymptote.
If everyone wants to ship an AI application, but faces the traversal of the “last mile” crevasse, there’s a large opportunity for a value niche around the precision asymptote.
There are already tools and services being built in this space, and I expect more to emerge as all those cool prototypes we’re seeing on Twitter and Bluesky struggle to get to shipping. Especially with the rise of agents, as we try to give LLMs access to more and more powerful capabilities, it seems that this asymptote will become even more prominent.
How firm is this asymptote? I believe that it depends on how the LLM is applied. The more precise the outcomes we need from the LLM, the more challenging they will be to attain. For example, for some use cases, it might be okay – or even a feature! – for an LLM to hallucinate. Products built to serve these use cases will feel very little of this asymptote.
On the other hand, if the use case requires an LLM to act in an exact manner, with a severe downside for failing to do so, we will experience the precision asymptote in spades. We will desperately look for someone to offer tools or services that provide guardrails and telemetry to keep the unruly LLM in check, and seek security and safety solutions to reduce abuse and incursion incidents.
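Here’s a minimal sketch of the simplest version of such a guardrail: ask the model for structured output, validate the shape, and retry a bounded number of times. The callModel() function is a stand-in for whichever LLM client you use; real guardrail tooling adds telemetry, schema languages, and fallbacks on top of this.

```typescript
// A minimal sketch of a guardrail around an imprecise model:
// request JSON, validate it, retry a bounded number of times.

declare function callModel(prompt: string): Promise<string>; // assumed LLM call

interface Order {
  product: string;
  quantity: number;
}

function parseOrder(raw: string): Order | null {
  try {
    const obj = JSON.parse(raw);
    if (typeof obj.product === "string" && Number.isInteger(obj.quantity)) {
      return { product: obj.product, quantity: obj.quantity };
    }
  } catch {
    // fall through: the model produced something that isn't JSON
  }
  return null;
}

export async function extractOrder(text: string, maxAttempts = 3): Promise<Order> {
  const prompt =
    `Extract the order from the text below as JSON with keys ` +
    `"product" (string) and "quantity" (integer).\n\nText:\n${text}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const order = parseOrder(await callModel(prompt));
    if (order) return order; // precise enough to hand to business logic
  }
  throw new Error("Model output never matched the required shape");
}
```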
I have very little confidence that a technological breakthrough will significantly alleviate this asymptote.
🧠 Reasoning
One of the key flaws in confusing what LLMs do with what humans do comes from the underlying assumption that thinking is writing. Unfortunately, it’s the other way around. Human brains appear to be multiplexed cognition systems. What we presume to be a linear process is actually an emergent outcome within a large network of semi-autonomous units that comprise our mind. Approximating thinking and reasoning as spoken language is a grand simplification – as our forays into using LLMs as chatbots so helpfully point out.
As we try to get the LLMs to think more clearly and more thoroughly, the reasoning asymptote begins to show up. Pretty much everyone I know who’s playing with LLMs is no longer using just one prompt. There are chains of prompts and nascent networks of prompts being wired to create a slightly better approximation of the reasoning process. You’ve heard me talk about reasoning boxes, so clearly I am loving all this energy, and it feels like stepping toward reasoning.
So far, all of this work happens on top of the LLMs, trying to frame the reasoning and introduce a semblance of causal theory. To me, this feels like a prime opportunity at the developer tooling layer.
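For a flavor of what this looks like, here’s a minimal sketch of a fixed three-step chain – draft, critique, revise – with callModel() again standing in for any LLM client. The frameworks emerging in this niche generalize the same idea into arbitrary graphs of prompt flows.

```typescript
// A minimal sketch of a prompt chain: instead of a single prompt,
// wire a few of them into a small fixed graph (draft -> critique -> revise).

declare function callModel(prompt: string): Promise<string>; // assumed LLM call

export async function reasonedAnswer(question: string): Promise<string> {
  // Step 1: a first attempt at an answer.
  const draft = await callModel(`Answer the question:\n${question}`);

  // Step 2: ask the model to critique its own draft.
  const critique = await callModel(
    `Question:\n${question}\n\nDraft answer:\n${draft}\n\n` +
      `List factual or logical problems with the draft.`
  );

  // Step 3: revise the draft in light of the critique.
  return callModel(
    `Question:\n${question}\n\nDraft answer:\n${draft}\n\n` +
      `Critique:\n${critique}\n\nWrite an improved final answer.`
  );
}
```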
This asymptote also seems fairly firm, primarily because of the nature of the LLM design. It would take something fundamentally different to produce a mind-like cognition system. I would guess that, unless such a breakthrough is made, we will see a steady demand and a well-formed value niche for tools that help arrange prompts into graphs of flows between them. I could be completely wrong, but if I’m not, I would also expect the products that aid in creating and hosting these graphs to become the emergent next layer in the LLM space, with many (most?) developers accessing LLMs through these products. Just like what happened with jQuery.
There are probably several different ways to look at this AI developer experience space, but I hope this map gives you: a) a sense of how to apply the asymptotes and value niches framing to your problem space and b) a quick lay of the land of where I think this particular space is heading.
I have been thinking lately about a framing that would help clarify where to invest one’s energy while exploring a problem space. I realized that my previous writing about layering might come in handy.
This framing might not work for problem spaces that aren’t easily viewed in terms of interactions between layers. However, if the problem space can be viewed in such a way, we can then view our investment of energy as an attempt to create a new layer on top of an existing one.
Typically, new layers tend to emerge to fill in the insufficient capabilities of the previous layers. Just like the jQuery library emerged to compensate for inconsistencies in querying and manipulating the document object model (DOM) across various browsers, new layers tend to crop up where there’s a distinct need for them.
This happens because of a fairly common dynamic playing out at the lower layer: no matter how much we try, we can’t get the desired results out of the current capabilities of that layer. Because of this growing asymmetry of effort to outcome, I call this dynamic “the asymptote” – we keep trying harder, but get results that are about the same.
Asymptotes can be soft or firm.
Firm asymptotes typically have something to do with the laws of physics. They’re mostly impossible to get around. Moore’s law appears to have run into one such asymptote, as the size of a transistor can no longer get any smaller.
Soft asymptotes tend to be temporary and give after enough pressure is applied to them. They are felt as temporary barriers, limitations that are eventually overcome through research and development.
One way to look at the same Moore’s law is that while the size of the transistor has a firm asymptote, all the advances in hardware and software keep pushing the soft asymptote of the overall computational capacity forward.
When we think about where to focus, asymptotes become a useful tool. Any asymmetry in effort-to-outcome is usually a place where a new layer of opinion will emerge. When there’s a need, there’s potential value to be realized by serving that need. There’s a potential value niche around every asymptote. The presence of an asymptote represents opportunities: needs that our potential customers would love us to address.
Depending on whether the asymptotes are soft or firm, the opportunities will look different.
When the asymptote is firm, the layer that emerges on top becomes more or less permanent. These are great to build a solid product on, but are also usually subject to strong five-force dynamics. Many others will want to try to play there, so the threat of “race to the bottom” will be ever-present. However, if we’re prepared for the long slog and have the agility to make lateral moves, this could be a useful niche to play in.
The jQuery library is a great example here. It wasn’t the first or last contender to make life easier for Web developers. Among Web platform engineers, there was a running quip about a new Web framework or library being born every week. Yet, jQuery found its place and is still alive and kicking.
When the asymptote is soft, the layer we build will need to be more mercurial, forced to adapt and change as the asymptote is pushed forward with new capabilities from the lower layer. These new capabilities of the layer below could make our layer obsolete, irrelevant – and sometimes the opposite.
A good illustration of the latter is how the various attempts to compile C++ into JavaScript were at best a nerdy oddity – until WebAssembly suddenly showed up as a Web platform primitive. Nerdy oddities quickly turned into essential tools of the emergent WASM stack.
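For reference, here’s roughly what that primitive looks like from the JavaScript side – a minimal sketch that loads a hypothetical add.wasm module (say, compiled from C++) and calls into it.

```typescript
// A minimal sketch of loading a WebAssembly module in the browser.
// "add.wasm" is a hypothetical module assumed to export an `add`
// function; real toolchains like Emscripten generate this glue for you.

async function runWasm(): Promise<void> {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("add.wasm"),
    {} // no imports needed for this hypothetical module
  );
  const add = instance.exports.add as (a: number, b: number) => number;
  console.log(add(2, 3)); // expected: 5
}

runWasm();
```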
Putting in sweat and tears around a soft asymptote usually brings more sweat and tears. But this investment might still be worth it if we have an intuition that we’ll hit the jackpot when the underlying layer changes again.
Having a keen intuition of how the asymptote will shift becomes important with soft asymptotes. When building around a soft asymptote, the trick is to look ahead to where it will shift, rather than grounding in its current state. We still might lose our investment if we guess the “where” wrong, but we’ll definitely lose it if we assume the asymptote won’t shift.
To bring this all together, here’s a recipe for mapping opportunities in a given problem space:
Orient yourself. Does the problem space look like layers? Try sketching out the layer that’s below you (“What are the tools and services that you’re planning to consume? Who are the vendors in your value chain?”), the layer where you want to build something, and the layer above where your future customers are.
Make a few guesses about the possible asymptotes. Talk to peers who are working in or around your chosen layer. Identify areas that appear to exhibit the diminishing returns dynamic. What part of the lower layer is in demand, but keeps bringing unsatisfying results? Map out those guesses into the landscape of asymptotes.
Evaluate firmness/softness of each asymptote. For firm asymptotes, estimate the amount of patience, grit, and commitment that will be needed for the long-term optimization of the niche. For soft asymptotes, see if you have any intuitions on when and how the next breakthrough will occur. Decide if this intuition is strong enough to warrant investment. Aim for the next position of the asymptote, not the current one.
At the very least, the output of this recipe can serve as fodder for a productive conversation about the potential problems we could collectively throw ourselves against.