One of my colleagues quipped recently that instruction tuning might not be about teaching large language models (LLMs) what to do, but rather about reminding LLMs of the wealth of wisdom they already have at their foundation. Here’s a rather unkempt vignette on what wisdom I would like an LLM to recall.
I would like to help a model remember how to solve problems that are a bit more complex, and require some rigor in reasoning. We are well underway in our first generation of LLM reasoning explorations, and I am looking a bit behind the current horizon.
Here is my hypothesis: an LLM that is capable of reasoning about a broad variety of problems would be biased to structure its reasoning in a certain way. That is, when we ask it to “think step by step”, it will rely on some predefined structure of these steps while performing a text completion.
If this hypothesis is true, reasoning structures will be at the core of dramatically improving the reasoning ability of the models. Put differently, I anticipate that the next cycle of model evolution will be about exploring the space of reasoning structures and finding more and more capable local maxima of such structures.
The future of instruction tuning might not be about the quantity or even quality of the data, but rather about the rigor of the overarching reasoning structure in the data sets.
To make my “reasoning structure” concept a bit clearer, I tried to conjure one up, relying on the problem understanding framework that I created a while back. This is just one rough sketch, but it should give you some idea of what the heck I am talking about.
🎱 Guessing the problem class
To quickly recap, I posit in my framework that there are three classes of problems: solved, solvable, and unsolvable.
LLMs are already able to address solved problems with moderate success. The implicit code execution shows a straightforward way to approach them: once the problem is deemed as something that was already solved before, we ask an LLM to write the minimal bit of code to produce the solution. Crunching numbers, reversing words, making charts, and various similar kinds of problems fall in this class.
LLMs are making decent progress on tackling solvable problems. Solvable problems have no well-established solution, but we know what the solution looks like – also known as “the ground truth”. The presence of this ground truth is what enables reasoning boxes, and their iterative approach to finding the solution.
The yet untapped potential of LLMs is in making sense of unsolvable problems. Unsolvable problems are highly adaptive, and often appear to be solving us rather than the other way around. Things like relationships, organizations, politics, and ecosystems all fall into this class of problems – and usually, we don’t end up solving them. Instead, we navigate through them as best we can. A good way to picture an unsolvable problem is thinking of an open ocean. We can’t solve the high seas, but we can navigate them.
The key to navigating an unsolvable problem is in investing a bit of time into sensing the various forces inside of the problem that make it impossible to solve once and for all. I call this process “framing the problem”. A successful framing produces one or more solvable problems that, when solved, help us navigate the unsolvable problem.
Let’s use this simple taxonomy as the foundation of a reasoning structure.
The structure will likely begin with making a guess at the class of the problem: are we looking at a solved, solvable, or unsolvable problem? I experimented a bit with asking LLMs about Cynefin, and was pleasantly surprised by the quality of these guesses. Perhaps this is the foundational wisdom of the model that my colleague was talking about?
Depending on the outcome of the initial guess, the structure will guide the reasoning to branch and engage with the problem in a way suited to its class.
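As a rough sketch of this branching, assuming the model is asked to reply with a single word naming the class. The names `ProblemClass`, `parse_guess`, and `route`, along with the three handlers, are hypothetical and only illustrate the shape of the dispatch; a real handler would prompt the model differently for each class.

```python
from enum import Enum
from typing import Callable, Dict


class ProblemClass(Enum):
    SOLVED = "solved"
    SOLVABLE = "solvable"
    UNSOLVABLE = "unsolvable"


def parse_guess(completion: str) -> ProblemClass:
    """Turn a one-word model reply such as "Solvable." into a ProblemClass.

    Assumes the prompt asked for exactly one of the three class names;
    anything else raises ValueError rather than guessing.
    """
    return ProblemClass(completion.strip().strip(".").lower())


Handler = Callable[[str], str]


def route(problem: str, guess: str, handlers: Dict[ProblemClass, Handler]) -> str:
    """Send the problem down the branch that matches the guessed class."""
    return handlers[parse_guess(guess)](problem)


# Hypothetical handlers standing in for class-specific prompting strategies.
handlers: Dict[ProblemClass, Handler] = {
    ProblemClass.SOLVED: lambda p: f"answer directly: {p}",
    ProblemClass.SOLVABLE: lambda p: f"iterate against ground truth: {p}",
    ProblemClass.UNSOLVABLE: lambda p: f"frame first: {p}",
}

print(route("Write a to-do list app", "Solvable.", handlers))
```

Failing loudly on an unrecognized reply is a deliberate choice here: since the first step is only a guess, a malformed guess seems better surfaced than silently routed.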
🔨 Handling solved problems
For solved problems, an LLM would attempt to just answer it, or write a bit of code as I mentioned above – just as most question-answering LLM-based applications do today. Like so:
For this problem class, this is not so much solving the problem as retrieving the solution from the depth of the model’s wisdom.
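For a sense of scale, here is the sort of minimal code a model might write for a solved problem like reversing words. The question and the function name `reverse_words` are made up for illustration.

```python
def reverse_words(sentence: str) -> str:
    """Reverse the order of whitespace-separated words in a sentence."""
    return " ".join(reversed(sentence.split()))


# Hypothetical question: "Reverse the words in 'think step by step'."
print(reverse_words("think step by step"))  # step by step think
```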
🧩 Actually solving problems
If the problem is solvable, the reasoning structure will assume an iterative stance with two interlinked lines of reasoning.
The first line of reasoning tries to answer the question of “Has the problem been solved?”. This is the actual process of solving the problem.
The second line of reasoning tries to answer the question “Do I know everything I need to know to verify that the problem has been solved?”. This is the process of establishing the problem context, which determines the constraints that could be used to verify that the solution is correct. This context serves as the ground truth for the problem.
Establishing problem context is a problem in itself, and the reasoning structure will branch further into various techniques that could be effective. For example, the LLM might decide to write a suite of tests that, when run, will prove or disprove that the answer is correct. It could also opt to ask the user to provide feedback on the answer and learn from that feedback.
It might even choose an “inner monologue” technique, where the ground truth is present only in a Socratic sense, with the LLM constantly questioning its own assumptions and making progress by spotting inconsistencies in them. As long as the assumptions are visible, there is an opportunity to question and correct them.
Whatever the choice, the reasoning structure will need to presume that both solving and context-finding processes are iterative. The LLM will no longer attempt to solve the problem in one completion, but rather focus on making progress toward the solution in multiple iterations and prefer frequent contact with the ground truth.
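The two interlinked lines of reasoning could be sketched as a loop that alternates between proposing a candidate and checking it against the ground truth. This is only an illustration under my own assumptions: `solve_iteratively`, `stub_propose`, and the toy checks are hypothetical, and the stub stands in for what would be a model call.

```python
from typing import Callable, Dict, List, Optional, Tuple

Check = Callable[[str], bool]
# propose(previous_candidate, names_of_failed_checks) -> next candidate
Propose = Callable[[Optional[str], List[str]], str]


def solve_iteratively(
    propose: Propose, checks: Dict[str, Check], max_iterations: int = 5
) -> Tuple[Optional[str], int]:
    """Alternate between proposing a candidate and testing it against the checks.

    Returns (candidate, iterations_used), or (None, max_iterations) if no
    candidate passed every check within the budget.
    """
    candidate: Optional[str] = None
    failures: List[str] = []
    for iteration in range(1, max_iterations + 1):
        candidate = propose(candidate, failures)
        failures = [name for name, check in checks.items() if not check(candidate)]
        if not failures:
            return candidate, iteration
    return None, max_iterations


# Stand-in for a model call: it repairs one failed check per iteration.
def stub_propose(previous: Optional[str], failures: List[str]) -> str:
    if previous is None:
        return "view"
    return f"{previous} {failures[0]}"


# Toy ground truth: the "app" must mention both features.
checks: Dict[str, Check] = {
    "add": lambda candidate: "add" in candidate,
    "remove": lambda candidate: "remove" in candidate,
}

solution, used = solve_iteratively(stub_propose, checks)
print(solution, used)  # view add remove 3
```

The checks play the role of the problem context: the loop makes contact with them on every iteration instead of betting on a single completion.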
This stance is different from the typical “question-answer” one of the solved problem class. I’ve been able to get the current generation of models to adopt it, with mixed results, and it feels like an important capability to develop, perhaps through instruction tuning.
The model will also need to be capable of keeping track of where in the reasoning structure it is. Given that solving a problem might take multiple completions, there needs to be something that helps the model orient in relation to the overall progress of solving the problem.
Question: Write a to-do list app
Response: Writing a to-do list app is a solvable problem, because it is very clear when the app is built and whether it allows viewing the to-do list, as well as adding and removing items from the list.
What information am I missing to solve this problem? I need to ask the following questions:
- Other than showing a to-do list, adding, and removing items from the to-do list, what other features are important to you?
- Should this app support multiple lists or is just one list sufficient?
- On what device will you use this app?
Then, a few interactions later:
Question: Write a to-do list app
Context: This is a solvable problem, and we are currently in the process of determining requirements. Previous interactions established the following requirements:
- app will support only one to-do list
- app will run on an iPhone
- app will allow viewing one to-do list
- it should be possible to add and remove items on the list
- it should be possible to sort items on the list.
Response: <completion continues here>
Today, this tracking is primarily done by recording the history of the problem-solving interactions, like a chat transcript. Most chat models will accept the history of the previous messages and are tuned to treat it as guidance for the text completion. I can’t wait to see new, more effective methods of retaining problem context emerge.
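One possible alternative to replaying a full transcript is to keep the problem context as a small piece of structured state and render it into the prompt on each turn. The sketch below is just that, a sketch: `ProblemContext` and its fields are hypothetical names, and the rendered layout simply mirrors the "Question / Context / Response" example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemContext:
    question: str
    problem_class: str  # "solved", "solvable", or "unsolvable"
    stage: str          # where in the reasoning structure we currently are
    requirements: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the state as a prompt that ends where the completion begins."""
        lines = [
            f"Question: {self.question}",
            f"Context: This is a {self.problem_class} problem, and we are "
            f"currently in the process of {self.stage}.",
        ]
        if self.requirements:
            lines.append("Previous interactions established the following requirements:")
            lines.extend(f"- {item}" for item in self.requirements)
        lines.append("Response:")
        return "\n".join(lines)


context = ProblemContext(
    question="Write a to-do list app",
    problem_class="solvable",
    stage="determining requirements",
)
# Each answered clarifying question becomes one more requirement.
context.requirements.append("app will support only one to-do list")
context.requirements.append("app will run on an iPhone")
print(context.render())
```

Compared with a raw transcript, this keeps the orientation ("solvable problem, determining requirements") explicit rather than leaving the model to infer it from the history.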
For instance, a more sophisticated reasoning structure might guide a model to detect that the problem could be split into multiple sub-problems that could be solved more easily, in parallel or in sequence. In such cases, the problem context would map into multiple sub-problems and eventually reduce back to one, once the sub-problems are solved. For example:
Question: Write an app that allows me to create a to-do list
Response: Writing a to-do list app is a solvable problem, because it can be tested if the app:
- successfully starts
- allows viewing the to-do list
- supports adding and removing items from the list.
To simplify the problem, I will break it up into two sub-problems:
- the backend that provides the to-do list API for storing and retrieving the list
- the frontend that consumes this API.
The context for the backend sub-problem will be:
- write a backend for a to-do list app <detailed backend spec goes here>
- write an API for the backend that supports three methods: retrieve the list, add a new item, and remove an existing item. <detailed spec of the API goes here>
The context for the frontend sub-problem will be:
- write a Web app frontend for a to-do list app
- the frontend will support viewing to-do list on mobile devices, and specifically on Safari for iPhone <detailed frontend spec goes here>
- the frontend will integrate with the following API <detailed spec of the API goes here>
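The split-and-reduce idea could be sketched as follows, covering only the "in sequence" case. Everything here is an assumption for illustration: `SubProblem`, `solve_in_order`, and `stub_solve` are hypothetical names, the stub replaces a model call, and I treat the frontend as depending on the backend because it consumes the backend's API. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class SubProblem:
    name: str
    context: List[str]
    depends_on: List[str] = field(default_factory=list)


Solve = Callable[[SubProblem, Dict[str, str]], str]


def solve_in_order(subproblems: List[SubProblem], solve: Solve) -> Dict[str, str]:
    """Solve sub-problems so that each one sees the solutions it depends on."""
    by_name = {sub.name: sub for sub in subproblems}
    # Map each sub-problem to its prerequisites; static_order() yields
    # prerequisites before the sub-problems that need them.
    order = TopologicalSorter({sub.name: sub.depends_on for sub in subproblems}).static_order()
    solutions: Dict[str, str] = {}
    for name in order:
        sub = by_name[name]
        upstream = {dep: solutions[dep] for dep in sub.depends_on}
        solutions[name] = solve(sub, upstream)
    return solutions


subproblems = [
    SubProblem("frontend", ["Web app for Safari on iPhone"], depends_on=["backend"]),
    SubProblem("backend", ["API: retrieve the list, add an item, remove an item"]),
]


# Stand-in for a model call that would receive the sub-problem context.
def stub_solve(sub: SubProblem, upstream: Dict[str, str]) -> str:
    uses = ", ".join(sorted(upstream)) or "nothing"
    return f"{sub.name} built using {uses}"


solutions = solve_in_order(subproblems, stub_solve)
print(list(solutions))  # ['backend', 'frontend']
```

The returned dictionary is the "reduce" half: once every sub-problem has a solution, the contexts collapse back into one result for the original problem.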
🌀 Framing unsolvable problems
When the LLM guesses that the problem is unsolvable, the reasoning structure guides the process toward discerning the tangle of forces that creates unsolvability. This part of the process is less about finding solutions and more about building a reasonable dynamic model of the problem.
I am not yet sure how such a model could be represented efficiently as text, but it will likely start with a list of statements that form into a causal graph. For example, here’s how a model may respond to a complex question:
Question: Should I stay in college or start my own AI company?
Response: Making a decision on whether to stay in college or start an AI company appears to be an unsolvable problem. There does not seem to be one right answer and it is hard to check the correctness of the answer. To better understand the forces at play, I will ask the following clarifying questions:
- What is difficult about making this decision?
- Why does it have to be an either-or decision?
- What holds you back from starting a company?
- What is so attractive about starting a company?
- Why is leaving college necessary?
- Why is staying in college important for you?
After a few rounds of questions and answers, the graph will begin to take shape, and the LLM might identify a few leverage points: solvable problems within the larger unsolvable problem.
Question: Should I stay in college or start my own AI company?
Context: This is a complex problem and we are currently in the process of building a model of the problem. Here are the edges of the model graph:
- AI is very exciting, and there are a lot of cool opportunities
- If I wait, I will miss these opportunities
- To graduate successfully, I need to wait
- To build a company, I need a lot of time
- To build a successful foundation for my life, I need to excel at school
- To excel at school, I need a lot of time
- Time is an extremely limited resource
<graph edges continue>
Response: Based on the graph I have so far, I believe that time might be a leverage point in this problem. I will recommend framing the original problem as that of optimizing for the most effective use of time and creating space for both college and company in the schedule. Suggested problem statement: Optimize my schedule to enable both staying in college and starting my own AI company.
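To make the graph idea slightly more concrete, here is one crude way the statements above could be encoded and searched for a leverage point. The encoding as (source, relation, target) triples and the heuristic of ranking nodes by how many other nodes need them are my own simplifications, not a claim about how a model would or should do this; `candidate_leverage_points` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)

# A hand-encoded subset of the statements from the example context.
edges: List[Edge] = [
    ("waiting", "causes", "missed opportunities"),
    ("graduating", "needs", "waiting"),
    ("building a company", "needs", "time"),
    ("a successful foundation for life", "needs", "excelling at school"),
    ("excelling at school", "needs", "time"),
]


def candidate_leverage_points(edges: List[Edge]) -> List[Tuple[str, int]]:
    """Rank nodes by how many other nodes need them, most-needed first."""
    needed = Counter(target for _, relation, target in edges if relation == "needs")
    return needed.most_common()


print(candidate_leverage_points(edges)[0])  # ('time', 2)
```

With this encoding, time comes out on top because both the company and school compete for it, which matches the framing suggested in the example response.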
Once the framing emerges, the reasoning structure will follow the recipe for the solvable problem that popped out. While the original problem remains unsolved, there are now concrete steps that the user can take to navigate it.
The dynamic model that is created by the process of navigating an unsolvable problem is a valuable artifact. In fact, it probably needs to be retained. Depending on the success and failure of the framing that was produced, it will need to be updated and refined.
🩹 Detecting problem classification mistakes
Because the first step of the reasoning structure is just a guess, the structure will need to provide guidance to detect a mistaken guess and shift the problem into a different class.
Some of these mistakes aren’t as bad as others. For example, mistaking a solved problem for a solvable one, or a solvable problem for an unsolvable one, will result in inefficient use of resources, but will still produce a satisfying answer.
There is another category of mistakes where the solution is produced, but isn’t valid, like when a solvable problem was diagnosed as solved, and the attempt to solve the problem results in the wrong answer. I wonder if some sort of “inner critic” reasoning might be necessary even for solved problems to mitigate that.
The mistake that is a bit more troublesome is when the unsolvable problem was guessed to be solvable. It will look like an inability to pin down constraints, creating a sense of divergence, rather than convergence toward the objective. Detecting this might be challenging, because the reasoning will continue finding new solvable sub-problems, and appear to make the infinite-regress sort of progress. Here, the reasoning structure will need some sort of a trip wire: a way to ask the model to lift its pencil and say “buddy, you’re spiraling out of control, this might be an unsolvable problem”.
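One very simple form such a trip wire could take is watching the number of open sub-problems after each iteration and tripping when it keeps growing. This is a hypothetical heuristic of mine: the function name `is_spiraling`, the choice of signal, and the window of three consecutive increases are all assumptions.

```python
from typing import Sequence


def is_spiraling(open_counts: Sequence[int], window: int = 3) -> bool:
    """Return True if the open sub-problem count rose on each of the last `window` steps.

    `open_counts[i]` is the number of unresolved sub-problems after iteration i.
    With fewer than `window + 1` observations there is not enough history to judge.
    """
    if len(open_counts) <= window:
        return False
    recent = open_counts[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


print(is_spiraling([3, 2, 2, 1]))     # False: converging
print(is_spiraling([2, 3, 5, 8]))     # True: three increases in a row
print(is_spiraling([5, 4, 5, 6, 7]))  # True: only the last three steps matter
```

When the wire trips, the reasoning structure would go back to the first step and reconsider whether the problem belongs in the unsolvable class.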
😎 Mission impossible
Part of this essay was an exercise in trying to write a sketch of a reasoning structure in its entirety. Now, you and I could look at it and wonder whether or not something like a reasoning structure might even be feasible to imbue into a model.
It definitely feels a bit daunting. However, if the concept seems intriguing to you, please give it a try. Maybe hack together something very simple to start small, and expand from there. Who knows, maybe through this inquiry, the full depth of wisdom of a large language model will be revealed to us.