Why AI orchestration

Why do I find the problem of AI patterns and, more generally, AI orchestration so interesting that I literally started building a framework for it? Why do we even need graphs and chains in this whole AI thing? My colleagues with a traditional software engineering background have been asking me these questions a lot lately.

Put very briefly, at the height of the current AI spring that we’re experiencing, orchestration is a crucial tool for getting AI applications to the shipping point.

To elaborate, imagine that an idea for a software application takes a journey from inception to full realization through two gates.

First, it needs to pass the “hey… this might just work” gate. Let’s call this gate the “Once” gate, since it’s exactly how many times we need to see our prototype work to get through it.

Then, it needs to pass through the “okay, this works reasonably consistently” gate. We’ll call it the “Mostly” gate to reflect the confidence we have in the prototype’s ability to work. It might be missing some features, lack polish, and underwhelm in performance benchmarks, but it is something we can give to a small group of trusted users to play with and not be completely embarrassed.

Beyond these two gates, there’s some shipping point, where the prototype – now a fully-fledged user experience – passes our bar for shipping quality and we finally release it to our users.

A mistake that many traditional software developers, their managers, and sponsors/investors make is that, when looking at AI-based applications, they presume the typical cadence of passing through these gates.

Let’s first sketch out this traditional software development cadence as a sequence: inception, the “Once” gate, the “Mostly” gate, and then the shipping point.

The “Once” gate plays a significant role, since it requires finding and coding up the first realization of the idea. In traditional software development, passing this gate means that there exists a kernel of a shipping product, albeit still in dire need of growing and nurturing.

The trip to the “Mostly” gate represents this process of maturing the prototype. It is typically less about ideation and more about converging on a robust implementation of the idea. There may be some circuitous detours that await us, but more often than not, it’s about climbing the hill.

In traditional software development, this part of the journey is a matter of technical excellence and resilience. It requires discipline and often requires a certain kind of organizing skill. On more than one occasion, I’ve seen brilliant program managers brought in, who then help the team march toward their target with proper processes, burndown lists, and schedules. We grit our teeth and persevere, and are eventually rewarded with software that passes the shipping bar.

There’s still a lot of work to be done past that gate, like polish and further optimization. This is important work, but I will elide it from this story for brevity.

With AI applications, or at least in my and my friends’ and colleagues’ experience with them, this story looks startlingly different, and it definitely doesn’t fit into a neat sequential framing.

Passing the “Once” gate is often a matter of an evening project. Our colleagues wake up to a screencast of a thing that shouldn’t be possible, but somehow is. Everyone is thrilled and excited. Their traditional software developer instincts kick in: a joyful “let’s wrap this up and ship it!” is heard through the halls of the office.

Unfortunately, when we try to deviate even a little from the steps in the original screencast, we get perplexing and unsatisfying results. Uh oh.

We try boxing the squishy, weird nature of large language models into production software constraints. We spend a lot of time playing with prompts, chaining them, tuning models, quantizing, chunking, augmenting – it all starts to feel like alchemy at some point. Spells, chants, and incantations. Maaaybe – maybe – we get to coax a model to do what we want more frequently.

One of my colleagues calls it the “70% problem” – no matter how much we try, we can’t seem to get our application to produce consistent results more than 70% of the time. Even by generous software quality standards, that’s not “Mostly”.
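To make the “70% problem” a bit more concrete, here is a minimal sketch of how one might measure it: run the prototype many times and count how often the result is acceptable. Everything named here is hypothetical and only for illustration: `pass_rate`, `clears_mostly_gate`, and the stand-in prototype are my own placeholders, and the 0.9 threshold is an assumption, not a number from any standard. A real check would call the actual application and judge its output.

```python
from itertools import cycle
from typing import Callable, Iterable


def pass_rate(run_once: Callable[[], bool], trials: int = 100) -> float:
    """Return the fraction of trials in which the prototype's result was acceptable."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if run_once())
    return passes / trials


def clears_mostly_gate(rate: float, threshold: float = 0.9) -> bool:
    """Assumed bar for "Mostly"; the 0.9 default is illustrative only."""
    return rate >= threshold


def make_stand_in(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Hypothetical stand-in for a real prototype: replays a fixed pattern of outcomes."""
    replay = cycle(list(outcomes))
    return lambda: next(replay)


# A stand-in that succeeds 7 times out of every 10 runs.
prototype = make_stand_in([True] * 7 + [False] * 3)
rate = pass_rate(prototype, trials=100)
print(f"pass rate: {rate:.0%}, clears Mostly gate: {clears_mostly_gate(rate)}")
```

The point of counting like this is that a single impressive run tells us we passed “Once” and nothing more; only the rate over many runs says anything about “Mostly”.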

Getting to that next gate has little resemblance to the maturation process from traditional software development. Instead, it looks a lot more like looping back to “Once” over and over, where we rework the original idea entirely and change nearly everything.

When working with AI applications, the capacity to rearrange everything and stay loose about the details of the thing we build, this design flexibility, is what dramatically increases our chances of crossing the “Mostly” gate.

Teams that hinge their success on adhering to the demo they sold to pass through the “Once” gate are much more likely to never see the next gate. Teams that decide that they can just lay down some code and improve iteratively – as traditional software engineering practices would suggest – are the ones who will likely work themselves into a gnarly spaghetti corner. At least today, for many cases, no matter how exciting and tantalizing they are, the “70% problem” remains an impassable barrier. We are much better off relying on an orchestration framework to give us the space to change our approach and keep experimenting.
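As a rough illustration of what “space to change our approach” might look like in code, here is a minimal sketch of a chain whose steps are named and can be swapped or reordered without touching the rest. This is not the API of any real orchestration framework: the `Pipeline` class, its methods, and the step names are hypothetical, and the steps are plain string functions standing in for model calls.

```python
from typing import Callable, Dict, List

Step = Callable[[str], str]


class Pipeline:
    """A hypothetical chain of named steps that can be swapped or reordered."""

    def __init__(self) -> None:
        self._steps: Dict[str, Step] = {}
        self._order: List[str] = []

    def add(self, name: str, step: Step) -> None:
        if name in self._steps:
            raise ValueError(f"step {name!r} already exists")
        self._steps[name] = step
        self._order.append(name)

    def replace(self, name: str, step: Step) -> None:
        """Swap one step's implementation; the rest of the chain is untouched."""
        if name not in self._steps:
            raise KeyError(name)
        self._steps[name] = step

    def reorder(self, order: List[str]) -> None:
        """Rearrange the existing steps; the new order must name each step once."""
        if sorted(order) != sorted(self._order):
            raise ValueError("order must contain exactly the existing step names")
        self._order = list(order)

    def run(self, text: str) -> str:
        for name in self._order:
            text = self._steps[name](text)
        return text


# Stand-in steps: plain functions in place of retrieval and model calls.
pipeline = Pipeline()
pipeline.add("retrieve", lambda q: q + " [context]")
pipeline.add("draft", lambda s: "draft(" + s + ")")
first_try = pipeline.run("question")

# Rework the idea: a different second step, then a different order.
pipeline.replace("draft", lambda s: "summary(" + s + ")")
second_try = pipeline.run("question")
pipeline.reorder(["draft", "retrieve"])
third_try = pipeline.run("question")
```

The design choice worth noticing is that each rework is a one-line change to the arrangement, not a rewrite of the code around it. That is what makes looping back to “Once” cheap.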

This is a temporary state and it is not a novel phenomenon in technological innovation. Every new cycle of innovation goes through this. Every hype cycle eventually leads to the plateau of productivity, where traditional software development rules.

However, we are not at that plateau yet. My intuition is that we’re still climbing the slope toward the peak of inflated expectations. In such an environment, most of us will run into the “70% problem” barrier head-first. So, if you’re planning to build with large language models, be prepared to change everything many times over. Choose a robust orchestration framework to make that possible.
