Limits of applying AI

I’ve been thinking about the boundaries of what’s possible when applying large language models (LLMs) to various domains. Here’s a framing that might not be fully cooked, but is probably worth sharing. Help me make it better.

To set things up, I will use the limits lens from the problem understanding framework. Roughly, I identify three key limits that bound what’s possible for anyone who is put in front of a problem: capacity, or the actual ability to find one or more effective solutions to the problem; time, or the velocity at which effective solutions are discovered; and finally, attachment, or the resistance to incorporating interesting information into their understanding of the problem in order to find solutions.

These three limits govern what we humans can do. Let’s see how these limits could be applied to LLMs.

Let me start with capacity. The fact that this limit applies to LLMs is fairly self-evident, and can be illustrated easily with the progress we have made in the last year. What was impossible just 18 months ago is now a widely accepted capability. The progression to larger models and larger context windows feels like a constant drumbeat – this is the capacity limit being pushed further and further back. When will we run into the wall of the invisible asymptote? When will we be able to confidently say: “Aha! Here’s proof that LLMs can’t actually do that”? It’s not easy to tell. However, just like with all things of this world, this limit is still present with LLMs. We just haven’t quite found it yet.

Another interesting trick that LLMs have compared to humans is the ability to clone. A bot, an agent, or a network of reasoning boxes is easily reproducible. One does not need to spend a lifetime raising an LLM from infancy to adulthood. Once a pattern is established, clones of this pattern are easy to produce. This is a significant source of capacity. Being more numerous is easy, which means that brute force can compensate for smarts in a pinch.

The final component of the limit of capacity is computing power. The amount of energy it takes for an LLM to produce a potential answer to a given problem seems like an important factor. Again, we seem to be marching along the Moore’s Law curve here, and I expect each new breakthrough in atoms and bits to push the capacity limit out significantly.

Speaking of the future, let’s talk about the limit of time. I sprinkled references to time throughout my examination of the limit of capacity; the two seem closely interrelated. LLMs becoming more capable and efficient means that they will also be able to solve problems more quickly. At this moment, I am contemplating networks of reasoning boxes, which implies an era when invoking an LLM will take single-digit milliseconds. This seems to fall out naturally from the capacity limit advancing. There will likely be an asymptote we’ll hit with how fast an LLM can go, but we’re definitely not there yet.

The limit of attachment is the one that’s been most curious to me. As far as I can tell, LLMs don’t seem to have it at all. While people could get bored or tired, or have anxiety about doing things one way or another, express strong convictions and make rash decisions… LLMs don’t seem to have any of that. LLMs don’t have the “ooh, I wonder what <character> will do in the next episode of <TV show>” mindworm. They don’t need to rush home to put their kid to bed. They have no desire to spend the day just hanging out with friends. LLMs are unattached. There is no inner shame that they have to protect at all cost. There are no values that they passionately embrace. The limit of attachment for LLMs appears to be a bottomless abyss – or a blue sky, if you’re into more positive metaphors.

Will the limit of attachment manifest itself for LLMs? Probably. There’s something very significant about this lack of the limit that indicates the vastness of the space we’re in.

So, what does this mean in practice? 

As I was saying to my colleagues, we’re at the 8-track tape stage of the whole LLM story. What we think is cool and amazing now will look as silly and quaint as 8-track tapes just a few months from now. Prepare for more tectonic shifts as the previously understood lines of the capacity limit are redrawn. Avoid early firm bets on the final shape of things in this space.

Also, we’re likely very close to the moment when we will have a decent representation of an “information worker”, probably as some sort of reasoning network. This worker will never get tired, will never complain about the problem being too pedestrian for their talents, will never slack off or quiet-quit. This worker will continuously improve and get better at the tasks that they are given, performing them more efficiently each time.

Looking at the LLM bounding box – or more accurately, the lack thereof – my intuition is that we’re going to see terraforming of entire industries, especially where information plumbing is a significant cost of doing business. I have no idea how they will change and what they will look like after the dust settles, but it’s very likely that this change will happen.

More importantly, as this change grinds into our established understanding of how things are, we are likely to grapple with it as a society. It is very possible that it is our own limits of attachment that we will try to impose on the LLMs, no matter how fruitless this effort will end up being in the long term.

Concept miner

Here’s a concrete example of a reasoning box that I was talking about last week. It’s not super-flashy or cool – certainly not one of those viral demos – but it’s a useful example of recursively ground-truthing a reasoning box onto itself. The source code is here if you want to run it yourself.

The initial idea I wanted to play with was concept extraction: taking a few text passages and turning them into a graph of the interesting concepts that are present in the text. The vertices of the graph would be the concepts, and the edges would represent the logical connections between them.

I started with a fairly simple prompt:

Analyze the text and identify all core concepts defined within the text and connections between them.

Represent the mental concept and connections between them as JSON in the following format:

{
"concept name": {
"definition": "brief definition of the concept",
"connections": [ a list of of concept names that connect with this concept ]
}
}

TEXT TO ANALYZE:
${input}

This is mostly a typical reasoning box – take in some framing, context, and a problem statement (“identify all core concepts”), and produce a structured output that reflects the reasoning. In this particular case, I am not asking for the chain of reasoning, but rather for a network of reasoning.

The initial output was nice, but clearly incomplete. So I thought – hey, what if I feed the output back into the LLM, but with a different prompt? In this prompt, I would ask it to refine the list of concepts:

Analyze the text and identify all core concepts defined within the text and connections between them.

Represent the mental concept and connections between them as JSON in the following format:

{
"concept name": {
"definition": "brief definition of the concept",
"connections": [ a list of of concept names that connect with this concept ]
}
}

TEXT TO ANALYZE:
${input}

RESPONSE:
${concepts}

Identify all additional concepts from the provided text that are not yet in the JSON response and incorporate them into the JSON response. Add only concepts that are directly mentioned in the text. Remove concepts that were not mentioned in the text.

Reply with the updated JSON response.

RESPONSE:

Notice what is happening here. I am not only asking the reasoning box to identify the concepts. I am also providing the outcome of its previous reasoning and asking it to assess the quality of that reasoning.

Turns out, this is enough to spur a ground truth response in the reasoning box: when I run it recursively, the list of concepts grows and concept definitions get refined, while connections shift around to better represent the graph. I might start with five or six concepts in the first run, and then expand into a dozen or more. Each successive run improves the state of the concept graph.

This is somewhat different from the common agent pattern in reasoning boxes, where the outcomes of the agent’s actions serve as the ground truth. Instead, the ground truth is static – it’s the original text passages – and it is the previous response that is reasoned about. Think of it as the reasoning box making guesses against some ground truth that needs to be puzzled out and then repeatedly evaluating these guesses. Each new guess is based on the previous guess – it’s path-dependent reasoning.

Eventually, when the graph settles down and no more changes are introduced, the reasoning box produces a fairly well-reasoned representation of a text passage as a graph of concepts.
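For the curious, the recursive loop itself is tiny. Here’s a minimal sketch of it – not the actual source, just the gist – where `extractPrompt` and `refinePrompt` are hypothetical helpers standing in for the two prompts above, and `callLLM` stands in for whatever LLM API is at hand:

  // A minimal sketch of the concept miner loop. `extractPrompt`,
  // `refinePrompt`, and `callLLM` are hypothetical stand-ins for the two
  // prompts above and the LLM API, assumed to return JSON text.
  const mineConcepts = async (input, maxRuns = 5) => {
    // First pass: extract the initial concept graph.
    let concepts = await callLLM(extractPrompt(input));
    for (let run = 0; run < maxRuns; run++) {
      // Feed the previous response back in with the refinement prompt.
      const updated = await callLLM(refinePrompt(input, concepts));
      // Stop once the graph settles down and no changes are introduced.
      if (updated === concepts) break;
      concepts = updated;
    }
    return JSON.parse(concepts);
  };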

We could maybe incorporate it into our writing process and use it to see if the concepts in our writing connect in the way we intend, and if the definitions of these concepts are discernible. Because the reasoning box has no additional context, what we see in the concept graph can serve as a good way to gauge whether our writing will make sense to others.

We could maybe create multiple graphs for multiple passages and see if similar concepts emerge – and how they might connect. Or maybe use it to spot text that isn’t coherent, where the concepts are either not well-formed or too thinly connected.

Or we could just marvel at the fact that just a few lines of code and two prompts give us something that was largely inaccessible just a year ago. Dandelions FTW.

Separating cargo from the train

I’ve been puzzling over a problem that many engineering teams face and came up with this metaphor. It’s situated in the general space of attachment and could probably apply to things other than engineering teams. 

Here’s the setup. Imagine that we’re leading a team whose objective is to rapidly explore some newly opened space. Everyone gets their little area of the space, and armed with enthusiasm and skill, the teams venture off into the unknown. A few months later, a weird thing happens: now we have N teams that build technology or products in basically the same space where they started.

Instead of exploring, the team just settled into the first interesting thing they found. Exploration collapsed into optimizing for the newly found value niche.

This might not necessarily be a bad thing. If the new space is ripe with opportunities or the team is incredibly lucky, they might have struck gold on the first try.  Except my experience tells me that most of the time, the full value of the niche is grossly overestimated, and the teams end up organizing themselves into settlers of a tiny “meh” value space.

The events that follow are fairly predictable. There is a struggle between us and the individual team leads to “align”, where the word “align” really stands for “what the heck are y’all doing?! we were supposed to be exploring!!” from us and “stop distracting us with your silly ideas! we have customers to serve and things to ship!” from the sub-teams. The team becomes stuck.

I have seen various ways in which the resolution plays out. There’s one with the uneasy compromise, where the “exploration team” kayfabe is played out at the higher levels (mostly in slide decks), and the sub-teams are just left to do their thing. There’s one with the leader making a “quake”: a swift reorg that leaves the sub-team leads without a path forward. There’s one where a new stealth sub-team is started to actually explore (you can guess what happens next).

The lens that really helps here is “something will get optimized”. When we have engineers, we have people whose literal job description includes organizing code into something that lasts. Like a car with unbalanced wheels, engineers will, by default, veer toward elephant-land. Given no other optimization criteria, what will get optimized is the quality of the code base and the robustness of the technical solution it offers.

The problem is, when exploring, we don’t need any of that. We need messy, crappy code that somewhat works to get a good sense of whether there’s a there there. And then we need to throw that code out or leave it as-is and move on to the next part of the space.

This is not at all natural and intuitive for engineering teams. There are no tests! Not even a code review process! This dependency was made by a random person in Nebraska! Madness!

By the way, the opposite of this phenomenon is also true. If our engineering team does not have this tendency toward building code that lasts, we probably don’t have an engineering team. We might have some coders or programmers, but no engineering.

To shift an engineering team to be more amenable to exploration, we need to shift the target of the optimization.

That’s where the cargo-and-train metaphor comes in. Let’s pretend that an engineering team is a train that delivers cargo: the thing that it makes. The thing about cargo is that once it is delivered, it leaves the train, and the train takes on new cargo. The train is permanent. The cargo is transient.

To make our train go efficiently, we optimize for moving cargo as quickly as possible, and we optimize for keeping the train in its best condition. Figuring out which part of our work we optimize to keep and which one we optimize to move is what it’s all about.

If we follow this metaphor, there are two questions that an engineering team needs to ask itself: “What is our cargo?” and “What is our train?” We need to consciously separate our cargo from our train.

Which part of our business do we optimize to let go of as efficiently as possible, and which part of it do we keep and grow?

For a typical engineering team, the cargo is the software release and the code base is the train.

Each release is a snapshot of the code base in a certain state. Once that release cut is made, we mentally let go of it and start on the next release. Releasing well means being able to make a release cut like a Swiss train: always on time, with no hiccups.

The codebase is the train, since this is where releases come from. The codebase is the place where the product grows and matures. It is what we keep and improve and strive to make better with time. Terms like “technical debt” that we engineers invent reflect our anxiety about succeeding at this process.

When the engineering team is asked to explore a new space, the answers to the two questions are likely different.

It might very well be that the code we write is cargo. It’s just something we do as a byproduct of our exploration. We write a ton of prototypes, throw them in the wild, and see which ones stick.

What is the train then? My intuition is that it’s knowledge. After all, the whole point of exploration is mapping the unknown. If our continuous delivery of cargo – writing of prototypes – doesn’t light up more and more territory, we’re doing something wrong.

So when an engineering team is asked to explore a new space, we need to contemplate the cargo-and-train questions carefully and decide on our answers to them.

Then, we need to invest into making sure that everyone on the team optimizes for the right thing: the thing that we want to be our cargo is optimized to be delivered and let go of quickly, and the thing we want to be our train is carefully and lovingly grown and enriched with each delivery.

This includes everything from the mission and vision, where the cargo-and-train questions are clearly answered, to the culture, incentives, and structure of the team. Remember – most of the default engineering practices and processes were designed for default engineering teams. Which means that if we’re setting out to explore, they will be working against us.

Reasoning boxes

This story begins with the introduction of metacognition to large language models (LLMs). In the LLM days of yore (like a few months ago), we just saw them as things we could ask questions and get answers back. It was exciting. People wrote think pieces about the future of AI and all that jazz.

But then a few extra-curious folks (this is the paper that opened my eyes) realized that you could do something slightly different: instead of asking for an answer, we could ask for the reasoning that might lead to the answer.

Instead of “where do I buy comfortable shoes my size?”, we could inquire: “hey, I am going to give you a question, but don’t answer it. Instead, tell me how you would reason about arriving at the answer. Oh, and give me the list of steps that would lead to finding the answer. Here’s the question: where do I buy comfortable shoes my size?”

Do you sense the shift? It’s like an instant leveling up, the reshaping of the landscape. Instead of remaining hidden in the nethers of the model, the reasoning about the question is now out in the open. We can look at this reasoning and do what we would do with any reasoning that’s legible to us: examine it for inconsistencies and decide for ourselves if this reasoning and the steps supplied will indeed lead us toward the answer. Such legibility of reasoning is a powerful thing.

With reasoning becoming observable, we iterate to constrain and shape it. We could tell the LLM to only use specific actions of our choice as steps in the reasoning. We could also specify particular means of reasoning to use, like taking multiple perspectives or providing a collection of lenses to rely on.

To kick it up another notch, we could ask an LLM to reason about its own reasoning. We could ask it: “Alright, you came up with these steps to answer this question. What do you think? Will these work? What’s missing?” As long as we ask it to provide the reasoning back, we are still in metacognitive territory.

We could also give it the outcomes of some of the actions it suggested as part of the original reasoning and ask it to reason about these outcomes. We could specify that we tried one of the steps and it didn’t work. Or maybe that it worked, but made it impossible for us to go to the next step – and ask it to reason about that.

From the question-answering box, we’ve upleveled to the reasoning box.

All reasoning boxes I’ve noticed appear to have this common structure. A reasoning box has three inputs: context, problem, and framing. The output is the actual reasoning. 

The context is the important information that we believe the box needs to have to reason. It could be the list of the tools we would like it to use for reasoning, the log of prior attempts at reasoning (aka memory), information produced by these previous attempts at reasoning, or any other significant stuff that helps the reasoning process.

The problem is the actual question or statement that we would like our box to reason about. It could be something like the shoe-shopper question above, or anything else we would want to reason about, from code to philosophical dilemmas.

The final input is the framing. The reasoning box needs rails on which to reason, and the framing provides these rails. This is currently the domain of prompt engineering, where we discern resonant cues in the massive epistemological tangle that is an LLM, cues that give the reasoning box the perspective we’re looking for. It usually goes like “You are a friendly bot that …” or “Your task is to…”. Framing is sort of like a mind-seed for the reasoning box, defining the kind of reasoning output it will provide.

Given that most of the time we would want to examine the reasoning in some organized way, the framing usually also constrains the output to be easily parsed, be it a simple list, CSV, or JSON.
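Sketched as code, with a hypothetical `callModel` helper standing in for the actual LLM call, a reasoning box might look roughly like this:

  // A rough sketch of a reasoning box: framing, context, and problem in,
  // structured reasoning out. `callModel` is a hypothetical LLM call.
  const reasoningBox = async ({ framing, context, problem }) => {
    const prompt = [
      framing, // e.g. "You are a friendly bot that ..."
      `CONTEXT:\n${context}`, // tools, memory, prior reasoning, etc.
      `PROBLEM:\n${problem}`, // the thing we want reasoned about
      "Reply with the reasoning as a JSON list of steps.",
    ].join("\n\n");
    return JSON.parse(await callModel(prompt));
  };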

A reasoning box is certainly a neat device. But by itself, it’s just a fun little project. What makes reasoning boxes useful is connecting them to ground truth. Once we connect a reasoning box to a ground truth, we get the real sparkles. Ground truth gives us a way to build a feedback loop.

What is this ground truth? Well, it’s anything that can inform the reasoning box about the outcomes of its reasoning. For example, in our shoe example, a ground truth could be us informing the box of the successes or failures of actions the reasoning box supplied as part of its reasoning.

If we look at it as a device, a ground truth takes one input and produces one output. The input is the reasoning and the output is the outcomes of applying this reasoning. I am very careful not to call ground truth “the ground truth”, because what truths are significant may vary depending on the kinds of reasoning we seek.

For example, and as I implied earlier, a reasoning box itself is a perfectly acceptable ground truthing device. In other words, we could connect two reasoning boxes together, feeding one’s output into the other’s context – and see what happens. That’s the basic structure behind AutoGPT.

Connecting a reasoning box to a real-life ground truth is what most AI Agents are. They are reasoning boxes whose reasoning is used by a ground truthing device to take actions, like searching the web or querying data sources – and then feeding the outcomes of these actions back into the reasoning boxes. The ground truth connection is what gives reasoning boxes agency.
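As a sketch, the feedback loop is just a few lines wrapped around the reasoning box above, with a hypothetical `groundTruth` function standing in for whatever ground truthing device we pick – a tool runner, a search query, or another reasoning box:

  // A sketch of the feedback loop: the ground truthing device applies the
  // reasoning, and its outcomes flow back into the reasoning box's context.
  const agentLoop = async ({ framing, problem, groundTruth, rounds = 3 }) => {
    let context = "";
    for (let round = 0; round < rounds; round++) {
      const reasoning = await reasoningBox({ framing, context, problem });
      const outcomes = await groundTruth(reasoning);
      // Accumulate outcomes so the next round can reason about them.
      context += `\nREASONING: ${JSON.stringify(reasoning)}\nOUTCOMES: ${outcomes}`;
    }
    return context;
  };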

And I wonder if there’s more to this story?

My intuition is that the reasoning box and the ground truthing device are the two kinds of blocks we need to build what I call “socratic machines”: networks of reasoning boxes and ground truthing devices that are capable of independently producing self-consistent reasoning. That is, we can now build machines that can observe things around them, hypothesize, and despite all of the hallucinations that they may occasionally incur, arrive at well-reasoned conclusions about them.

The quality of these conclusions will depend very much on the type of ground truthing these machines have and the kind of framing they are equipped with. My guess is that socratic machines might even be able to detect ground truthing inconsistencies by reasoning about them, kind of like how our own minds are able to create the illusion of clear vision despite only receiving a bunch of semi-random blobs that our visual organs supply. And similarly, they might be able to discern, repair and enrich insufficient framings, similar to how our minds undergo vertical development.

This all sounds outlandish even to me, and I can already spot some asymptotes that this whole mess may bump into. However, it is already pretty clear that we are moving past the age of chatbots and into the age of reasoning boxes. Who knows, maybe the age of socratic machines is next to come? 

Porcelains

My friend Dion asked me to write this down. It’s a neat little pattern that I just recently uncovered, and it’s been delighting me for the last couple of days. I named it “porcelains”, partially as an homage to spiritually similar git porcelains, partially because I just love the darned word. Porcelains! ✨ So sparkly.

The pattern goes like this. When we build our own cool thing on top of an existing developer surface, we nearly always do the wrapping thing: we take the layer that we’re building on top of and wrap our code around it. In doing so, we immediately create another, higher layer. Now, the consumers of our thing are one layer up from the layer from which we started. This wrapping move is very intuitive and something that I used to do without thinking.

  // my API which wraps over the underlying layer.
  const callMyCoolService = async (payload) => {
    const myCoolServiceUrl = "example.com/mycoolservice";
    // the underlying layer that I wrap: `fetch`
    const response = await fetch(myCoolServiceUrl, {
      method: "POST",
      body: JSON.stringify(payload),
    });
    return await response.json();
  };
  // ...
  // at the consuming call site:
  const result = await callMyCoolService({ foo: "bar" });
  console.log(result);

However, as a result of creating this layer, I now become responsible for a bunch of things. First, I need to ensure that the layer doesn’t carry too much opinion and doesn’t accrue extra cost for developers. Second, I need to ensure that the layer doesn’t have gaps. Third, I need to carefully navigate the cheesecake-or-baklava tension and be cognizant of the layer’s thickness. All of a sudden, I am burdened with all of the concerns of a layer maintainer.

It’s alright if that’s what I am setting out to do. But if I just want to add some utility to an existing layer, this feels like way too much. How might we lower this burden?

This is where porcelains come in. The porcelain pattern refers to only adding code to supplement the lower layer functionality, rather than wrapping it in a new layer. It’s kind of like – instead of adding new plumbing, put a purpose-designed porcelain fixture next to it.

Consider the code snippet above. The fetch API is pretty comprehensive and – let’s admit it – elegantly designed. It comes with all kinds of bells and whistles, from signaling to streaming support. So why wrap it?

What if instead, we write our code like this:

  // my API which only supplies a well-formatted Request.
  const myCoolServiceRequest = (payload) =>
    Request("example.com/mycoolservice", {
      method: "POST",
      body: JSON.stringify(payload),
    });
  // ...
  // at the consuming call site:
  const result = await (
    await fetch(myCoolServiceRequest({ foo: "bar" }))
  ).json();
  console.log(result);

Sure, the call site is a bit more verbose, but check this out: it is now very clear which underlying API is being used and how. There is no doubt that fetch is being used. And our linter will tell us if we’re using it improperly.

We have more flexibility in how the results of the API are consumed. For example, if I don’t actually need to parse the response (like, if I just want to turn around and send it along to another endpoint), I don’t have to.

Instead of adding a new layer of plumbing, we just installed a porcelain that makes it more shiny for a particular use case.

Because they don’t call into the lower layer, porcelains are a lot more testable. The snippet above is very easy to interrogate for validity, without having to mock/fake the server endpoint. And we know that fetch will do its job well (we’re all in big trouble otherwise).

There’s also a really fun mix-and-match quality to porcelain. For instance, if I want to add support for streaming responses to my service, I don’t need to create a separate endpoint or have tortured optional arguments. I just roll out a different porcelain:

  // Same porcelain as above.
  const myCoolServiceRequest = (payload) =>
    Request("example.com/mycoolservice", {
      method: "POST",
      body: JSON.stringify(payload),
    });
  // New streaming porcelain.
  class MyServiceStreamer {
    writable;
    readable;
    // TODO: Implement this porcelain.
  }
  // ...
  // at the consuming call site:
  const result = (
    await fetch(myCoolServiceRequest({ foo: "bar", streaming: true }))
  ).body.pipeThrough(new MyServiceStreamer());

  for await (const chunk of result) {
    process.stdout.write(chunk);
  }
  process.stdout.write("\n");

I am using all of the standard Fetch API plumbing – except with my shiny porcelains, they are now specialized to my needs.
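To make the streaming porcelain a little less hand-wavy, here is one possible minimal take on it, assuming the service streams plain UTF-8 text. All pipeThrough needs is a { writable, readable } pair, which a TransformStream readily provides:

  // One possible minimal streaming porcelain (assuming the service streams
  // plain UTF-8 text): delegate to a TransformStream, which supplies the
  // { writable, readable } pair that pipeThrough expects.
  class MyServiceStreamer {
    constructor() {
      const decoder = new TextDecoder();
      const { writable, readable } = new TransformStream({
        transform(chunk, controller) {
          // Decode raw response bytes into text as they stream through.
          controller.enqueue(decoder.decode(chunk, { stream: true }));
        },
      });
      this.writable = writable;
      this.readable = readable;
    }
  }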

The biggest con of the porcelain pattern is that the plumbing is now exposed: all the bits that we typically tuck so neatly under succinct and elegant API call signatures are kind of hanging out.

This might put some API designers off. I completely understand. I’ve been of the same persuasion for a while. It’s just that I’ve seen the users of my simple APIs spend a bunch of time prying those beautiful covers and tiles open just to get to do something I didn’t expect them to do. So maybe exposed plumbing is a feature, not a bug?

Innovation frontier

So we decided to innovate. Great! Where do we begin? How do we structure our innovation portfolio? There are so many possibilities! AI is definitely hot right now. But so are advances in green technology – maybe that’s our ticket? I heard there’s stuff happening with biotech, too. And I bet there are some face-melting breakthroughs in metallurgy…

With so much happening everywhere all at once, it could be challenging to orient ourselves and innovate intentionally – or at least with enough intention to convince ourselves that we’re not placing random bets. A better question: what spaces do we not invest into when innovating?

Here’s a super-simple framing that I’ve found useful in choosing the space in which to innovate. It looks like a three-step process.

First, we need to know what our embodied strategy is. We need to understand what our capabilities are and where they will be taking us by default.

This is important, because some innovation may just happen as a result of us letting our embodied strategy play out. If we are an organization whose embodied strategy is strongly oriented toward writing efficient C++ code, then we are very likely to keep seeing amazing bits of innovation pop out in that particular space. We will likely lead some neat C++ standards initiatives and invent new cool ways to squeeze a few more drops of performance out of the code we write.

As I mentioned before, the embodied strategy is usually not the same as the stated strategy. I know very few teams who are brutally honest with themselves about what they are about. There’s usually plenty of daylight between where the organization states it is going and where it is actually going. The challenge of step 1 is to pierce the veil of the stated strategy.

As you may remember from my previous essays, this understanding will also include knowing our strategy aperture. How broad is our organization’s cone of embodied strategy?

At the end of the first step, we already have some insight into the question above. Spaces well outside of our cone of embodied strategy are not reachable for us. They are the first to go into the discard pile. If we are an organization whose strengths are firmly in software engineering, attempting to innovate in hardware is mostly like throwing money away – unless of course we first grow our hardware engineering competency.

The second step is to understand our innovation frontier. The innovation frontier is a thin layer around our cone of embodied strategy. Innovation ideas at the outer edge of this frontier are the ones we’ve just discarded as unreachable. Ideas at the inner edge of the frontier are obviously going to happen anyway: they are part of the team’s embodied strategy.

It is the ideas within this frontier that are worth paying closer attention to. They are the “likely-to-miss” opportunities. Because they are still on the fringe of the embodied strategy, the organization is capable of realizing them, but is unlikely to do so – they are on the fringe, after all.

It is these opportunities that are likely going to sting a lot for a team when missed. They are the ones that were clearly within reach, but were ignored because of the pressing fires and general everyday minutiae of running core business. They are the ones that will disrupt the business as usual, because – when they are big enough – they will definitely reshape the future opportunities for the organization.

The innovation frontier is likely razor-thin for well-optimized and specialized organizations. The narrower our strategy aperture, the less likely we will be to shift a bit to explore curious objects just outside of our main field of view.

In such cases, the best thing the leader of such an organization can do is to invest seriously into expanding their innovation frontier. Intentionally create spaces where thinking can happen at a slower pace, where wilder ideas can be prototyped and shared in a more dandelion environment. Be intentional about keeping the scope roughly within the innovation frontier, but add some fuzziness and slack to where these boundaries are.

The third step is to rearrange the old 70/20/10 formula and balance our innovation portfolio according to what we’ve learned above:

  • Put 70% into the ideas within the innovation frontier and the efforts to expand our innovation frontier.
  • Put 20% into the ideas that are within the strategy aperture.
  • Just in case we’re wrong about our understanding of our embodied strategy, put 10% into the ideas that are at the outer edge of the innovation frontier.

And who knows, my law of tightening strategy aperture could be proven wrong? Perhaps if an organization is intentional enough about expanding its innovation frontier, it could regain its ability to see and realize the opportunities that would have been previously unattainable?

Wait, did we forgo the whole notion of timelines in our innovation portfolio calculations? It’s still there, since the cone of embodied strategy does extend in time. It’s just not as significant as it was in the old formula. Why? That’s a whole different story and luckily, my friend Alex wrote this story down just a few days ago.

Learning from the nadir

I’ve talked before about traps: how to get into them, what they might look like, and even how to get out of them. This little story is about the developmental potential of traps.

To frame the story, I will draw another two-by-two. The axes signify our awareness of our limitations and our capacity to overcome them. To make things a bit more interesting, I will also turn this two-by-two 45 degrees clockwise, because I want to map it to another framing: the hero’s journey.

The axes form four quadrants that loosely correspond to the key segments of the hero’s journey. 

In the top quadrant, we are the happiest and perhaps even bored. We aren’t aware of any of the limitations that hinder us and we feel generally content with our abilities.

It is that boredom that gets us in trouble. At first reluctantly, but eventually with more gusto, we engage with a challenge and traverse into the right quadrant. This quadrant is characterized by weirdness. Campbell points out all kinds of odd stuff happening to us, from being visited by a quirky wizard to being tested in ways that make us unsure of ourselves.

In the right quadrant, we aren’t yet aware that the challenge exceeds our capacity to overcome it, so things feel bizarre, random, and generally not right. We might start a new job with enthusiasm, and after a few meetings, have our heads spinning, encountering unexpected politics and/or gargantuan technical debt: “What did I just get myself into?”

Eventually, as we puzzle things out, we arrive at the nadir of our journey, the bottom quadrant. We become aware of the fact that we’re in way over our heads. We are aware of our limitations and do not yet have the capacity to overcome them.

The bottom quadrant often feels like a trap. My colleagues and I sometimes apply the word “infohazard” to the insightful bits of knowledge that finally clear our vision and thrust us into this quadrant. It almost feels like it might have been better if we didn’t acquire that knowledge. Yeah, the previous quadrant was super-weird, but at least I didn’t feel so deficient in the face of the challenge.

This quadrant is also the most fertile ground for our inner development. When we have the right mindset, the awareness of our limitations creates a useful observation perch. Only when we are able to see our own limitations can we contemplate changing ourselves to overcome them.

This is not a given. Way too commonly and tragically, we never get to occupy this perch. Falling into the vicious cycle of not-learning, we form an inner false loop inside of our hero’s journey, spinning round and round inside of the bottom quadrant, and truly becoming trapped.

Whether we grasp onto the perch or not, one thing is guaranteed: the bottom quadrant is full of suffering. Even when we believe we’ve learned all there is to learn about self-development, and have all kinds of tools and tricks (and perhaps even write about it regularly) – the moment of discordance between what we’re seeing and what we believe will inevitably be painful.

It is on us to recognize that this pain can be processed in two ways: one is through the habitual entrapment of the not-learning cycle, and the other is by choosing to learn like crazy – hanging on for dear life to the perch of observation, examining our beliefs, and recognizing flexibility in bits that we previously thought immovable.

Only then can we emerge into the left quadrant, where we are still aware of our limitations but now have the capacity to overcome the challenge – and bring the boon back to the land of the living, as Campbell would probably say.

How we engage with LLMs

It seems popular to write about generative AI and large language models (aka LLMs) these days. There are a variety of ways in which people make sense out of this space and the whole phenomenon of “artificial intelligence” – I use double-quotes here, because the term has gotten quite blurry semantically.

I’ve been looking for a way to make sense of all of these bubbling insights, and here’s a sketch of a framework that is based on Adult Development Theory (ADT). The framework presumes that we engage with LLMs from different parts of our whole Selves, with some parts being at earlier stages of development and some at later ones. I call these parts “Minds”, since to us, they feel like our own minds, each with its own level of complexity and attributes. They change rapidly within us, often without us noticing.

These minds are loosely based on the ADT stages: the earliest and least complex Opportunist Mind, the glue-of-society Socialized Mind, the make-things-work Expert Mind, and the introspective Achiever Mind.

🥇The Opportunist Mind

When we engage with an LLM with an Opportunist Mind, we are mostly interested in poking at it and figuring out where its weaknesses and strengths lie. We are trying to trick it, to reveal its secrets, be it initial prompts or biases. From this stance, we just want to figure out what it’s made of and how we could potentially exploit it. Twitter is abuzz with individuals making LLMs act in ways that conveniently illustrate their arguments. All of those are symptoms of the Opportunist Mind approach to this particular technology.

There’s nothing wrong with engaging an LLM in this way. After all, vigorous product testing makes for a better product. Just beware that the Opportunist Mind perch has a very limited view, and the quality of insights gained from it is generally low. I typically steer clear of expert analyses that engage with LLMs from this mind. Those might as well be generated by LLMs themselves.

👥The Socialized Mind

When the LLM becomes our DM buddy or a game playing partner, we are engaging with an LLM with a Socialized Mind. When I do that, there’s often a threshold moment when I start seeing an LLM as another human being, with thoughts and wishes. I find myself falling into habits of human relationship-building, with all of the rules and ceremonies of socializing. If you ever find yourself trying to “be nice” to an LLM chat bot, it’s probably your Socialized Mind talking.

At the core of this stance is constructing, consciously or subconsciously, a mental model of the LLM as a person. This kind of mental model is not unique to the Socialized Mind, but when engaging with this mind, we want to relate to this perception of a human, to build a connection with it.

This can be wonderful when held lightly. Pouring our hearts out to a good listener convincingly played by an LLM can be rather satisfying. However, if we forget that our mental model is an illusion, we get into all sorts of trouble. Nowadays, LLMs are pretty good at pretending to be human, and the illusion of a human-like individual behind the words can be hard to shake off. And so we become vulnerable to the traps of “is it conscious/alive or not?” conversations. Any press publication or expert analysis in this vein is only mildly interesting to me, since the perch of the Socialized Mind is not much higher than that of the Opportunist Mind, and precludes seeing the larger picture.

🧰The Expert Mind

Our Expert Mind engages with an LLM at a utilitarian level. What can I get out of this thing? Can I figure out how the gears click on the inside – and then make it do my bidding? A very common signal of us engaging LLMs with our Expert Mind is asking for JSON output. When that’s the case, it is very likely we see the LLM as a cog in some larger machine of making. We spend a lot of time making the cog behave just right – and are upset when it doesn’t. A delightful example that I recently stumbled upon is AI Functions: a way to make an LLM pretend to execute a pretend function (specified only as inputs, outputs, and a rough description of what it should do) and return its result.
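For a flavor of that trick, here is a hypothetical sketch (the actual AI Functions project phrases it differently): a prompt that describes the pretend function and asks for nothing but its return value.

  // A hypothetical sketch of the "AI function" trick: describe a function
  // by signature and intent only, and ask the LLM to return its output.
  const aiFunction = (signature, description, args) => `
    You are now the following function:
    // ${description}
    ${signature}

    Given the arguments ${JSON.stringify(args)}, reply with only the return
    value of the function. Do not explain.`;

  // Example: aiFunction("stem(word: string): string",
  //   "Returns the linguistic stem of the word.", ["running"])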

Expert Minds are tinkerers – they produce actual prototypes of things other people can try and get inspired to do more tinkering. For this reason, I see Expert Mind engagements as the fertile ground for dandelion-like exploration of new idea spaces. Because they produce artifacts, I am very interested in observing Expert Mind engagements. These usually come as links to tiny Github repos and tweets of screen captures. They are the probes that map out the yet-unseen and shifting landscape, serving as data for broader insights.

📝The Achiever Mind

I wanted to finish my little story here, but there’s something very interesting in what looks like a potential Achiever Mind engagement. This kind of engagement includes the tinkering spirit of the Expert Mind and enriches it with the mental modeling of the Socialized Mind, transcending both into something more.

When we approach LLMs with the Achiever Mind, we recognize that the nature of this weird epistemological tangle created by an LLM creates opportunities that we can’t even properly frame yet. We can get even more interesting outcomes than the direct instruction-to-JSON approach of our Expert Mind engagement by considering this tangle and poking at it.

The ReAct paper shone a light on this kind of engagement for me. It revealed that, in addition to direct “do this, do that” requests, LLMs are capable of something that looks like metacognition: the ability to analyze a request and come up with a list of steps to satisfy it. This discovery took someone looking at the same thing that everyone else was looking at, and then carefully reframing what they were seeing into something entirely different.

Reframing is the Achiever Mind’s superpower, and it comes in handy in wild new spaces like LLM applications. Metaphorically, if Expert Mind engagements explore the rooms in the house, Achiever Mind engagements find and unlock doors to new rooms. The unlocking done by the ReAct paper allowed a whole bunch of useful artifacts, from LangChain to Fixie to ChatGPT plugins, to emerge.

This story feels a bit incomplete, but has been useful for me to write down. I needed a way to clarify why I intuitively gravitate toward some bits of insight in the wild more than others. This framework helped me see that. I hope it does the same for you.

Sailors and Pirates

Here’s a fun metaphor for you. I’ve been chatting with colleagues about the behavior patterns and habits of leaders that I’ve been observing, and we recognized that there are two loose groups that we can see: sailors and pirates.

The sailors are part of the crew. They are following orders and making things that have been deemed important happen. Ordinary sailors have little agency: they are part of the larger machine that is intent on moving in a certain direction. Sailors higher in the power structure (and there is usually a power structure when sailors get together) have more agency. They have more freedom in how the things happen, but they are still held responsible for whether they happen or not.

Organization leaders who are sailors are subject to the primary anxiety of things being out of control. Their catastrophic scenario is that all this wonderful energy that they have in the people they lead is not applied effectively to the problem at hand. They wake up in cold sweat after dreaming of being lost or late, of being disoriented and bewildered in some chaotic mess. 

This makes them fairly easy to spot. Listen to how they talk. They will nearly always speak of the need to align, to make better decisions, to be more efficient and better coordinated. Sailor leaders love organizing things. For a sailor leader, neat is good.

Every organization needs sailors. Particularly in scenarios where we know where we are going, sailors are the ones who will get us there. They are the reliable folks who take pride and honor in driving their particular ship (or part of the ship, no matter how small) toward the destination. Sailor leaders don’t have to be boring, but they prefer it that way. Excitement is best confined to a box where it doesn’t disrupt the forward movement.

Pirates are different. The word “pirate” conjures all kinds of imagery, some vividly negative. For our purposes, let’s take Jack Sparrow as the kind of pirate we’re talking about here.

As I mentioned, pirates are different. They loathe the orderly environment that the sailors thrive in. They yearn for a small ship that can move fast and make unexpected lateral moves.

A pirate’s driving anxiety is that of confinement. Whether consciously or not, their catastrophizing always involves being stuck. Their nightmares are filled with visions of being trapped or restrained, with no possibility of escape, of being pressed down by an immovable weight.

Pirates seek options and choose to play in environments where the options are many. This is why we often find them in chaotic environments, though chaos is not something they may seek directly. It’s just that when there’s chaos, many of the variables that were previously thought to be constant become changeable. It’s that space that is opened up by the chaos-induced shifts that the pirates thrive in. And sometimes, often unwittingly, they will keep causing a little chaos – or a lot of it – to create that option space.

Pirate leaders are also not difficult to detect. They are usually the weird ones. They keep resisting the organization’s desire to be organized. They usually shun positions of power and upward movement in the hierarchies. For the saddest pirate is the one who climbed through the ranks to arrive at a highly prestigious, yet extremely sailor position.

Pirate leaders are known to inject chaos. If you’ve ever been to a meticulously planned and organized meeting, where its key participant throws the script away right at the beginning and takes it in a completely different direction – you’ve met a pirate leader.

It’s easy to see how sailors and pirates are oil and water. Sailors despise the pirate’s incessant bucking of the system. Pirates hate the rigid order of the sailors and their desire to reduce the available options. 

Then, why are pirates even found in organizations? Aren’t they better off in their Flying Dutchman somewhere, doing their pirate things?

The thing is, pirates need sailors. A shipful of pirates is not really a ship. With everyone seeking options, the thing ain’t going anywhere. Pirates need sailors who are happy to organize the boring details of the pirate adventure. And the more ambitious the adventure, the more sailors are needed.

Conversely, sailors need pirates. A ship that doesn’t have a single pirate isn’t a ship either – it’s an island. The most organized and neat state of a ship is static equilibrium. When a pirate captain leaves a ship, and no pirate steps up, the ship may look functional for a while, and even look nicer, all of the cannons shining with bright polish and the sails finally washed and repaired.

But over time, it will become apparent that the reason for all these excellent looks is the lack of actual action. The safest, neatest course of action is to stay in place and preserve the glorious legends of the past.

The mutual disdain, combined with the mutual need, creates a powerful tension. Every team and organization has it. The tension can only be resolved dynamically – what could have been the right proportion of pirates and sailors yesterday might not be the same today. Sometimes we could use fewer pirates, and other times, we need more of them.

To resolve this tension well, organizations need this interesting flexibility, where pirates and sailors aren’t identities, but roles. Especially in leadership, the ability to play both roles well is a valuable skill. Being able to assume the role flexibly depending on the situation gives us the capacity to be both pirates and sailors – and gives the organization a much higher chance of acting in accordance with its intentions.

The most effective pirate is a meta-pirate: someone who can be both a pirate and a sailor in the moment as a way to keep the opportunity space maximally open.

We all have this capacity. The reason I described the nightmare plots for the sailor and the pirate is to help you recognize them in your own dreams. Experienced both kinds? You are likely both a little bit of a pirate and a sailor at heart. If one is more common than the other, that’s probably the indicator of where you are leaning currently. So, if you’re looking to become a meta-pirate, that’s an indicator of where to focus the work of detaching the role from your identity.