LLMs need to ask more questions. There is too much focus on taking on a task and making autonomous decisions when there are gaps. Instead, the priority should be to help the user refine the task first.
I’ve seen more questions with recent models, but it’s still only scratching the surface.
In Cursor I have defaulted to Gemini 2.5 Pro as my planner model, which I prompt to ask me questions. It has a large context window, it reasons, and you can have iterative conversations with it, unlike the o-series models.
Interesting, I’m using a similar setup, but usually refine outside the IDE.
I also prompt Gemini in Google AI Studio to ask me the questions it needs in order to give me a good plan.
Rather than waiting for the model to ask you questions on its own, you can just change the default mode, because this is more ‘system prompt’ than model design (a minimal sketch of this follows below).
agreed
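To make the "change the default mode" point concrete, here is a minimal sketch using the OpenAI Python SDK. The system-prompt wording, the model name, and the example task are all illustrative assumptions, not a recommendation; the same idea works with any chat API or with Cursor's rules files.

```python
# Minimal sketch: making "ask questions first" the default via the system
# prompt. The prompt wording and model name below are illustrative.
from openai import OpenAI

client = OpenAI()

CLARIFY_FIRST = (
    "Before doing any work, ask the user the clarifying questions you need "
    "to fully specify the task. Do not produce a plan or an answer until "
    "the user has answered them."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model; the behavior comes from the prompt
    messages=[
        {"role": "system", "content": CLARIFY_FIRST},
        {"role": "user", "content": "Refactor our auth module to support SSO."},
    ],
)

# With a prompt like this, the first turn should come back as questions
# ("Which SSO provider? SAML or OIDC? ..."), not a plan.
print(response.choices[0].message.content)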
Considering how different models require different types of prompting, context, task framing, etc. to function optimally, is there a tool that can help select the appropriate model, tune the prompt, and then run it? Like a meta-reasoning model sitting atop a library of models.
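I'm not aware of a tool that does exactly this, but the shape of it is easy to sketch. Everything below is hypothetical: the model names, the `classify` heuristic, and the prompt templates are placeholders, not an existing product.

```python
# Hypothetical sketch of a "meta" router: classify the task, pick a model,
# adapt the prompt to that model's quirks, then dispatch. A real version
# might use a cheap LLM call for classification instead of keywords.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelConfig:
    name: str
    prompt_template: str  # each model gets prompting tuned to its quirks

LIBRARY = {
    "planning": ModelConfig("gemini-2.5-pro", "Ask clarifying questions, then plan:\n{task}"),
    "coding":   ModelConfig("claude-sonnet",  "Write tested, idiomatic code for:\n{task}"),
    "research": ModelConfig("o3",             "Cite sources for every claim. Task:\n{task}"),
}

def classify(task: str) -> str:
    # Placeholder heuristic; the interesting work lives here.
    lowered = task.lower()
    if "plan" in lowered:
        return "planning"
    if "code" in lowered or "implement" in lowered:
        return "coding"
    return "research"

def route(task: str, call_model: Callable[[str, str], str]) -> str:
    """Pick a model for the task and run its tuned prompt through call_model."""
    cfg = LIBRARY[classify(task)]
    return call_model(cfg.name, cfg.prompt_template.format(task=task))
```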
What have you seen re: hallucination? My biggest issue with using o3 vs Claude Sonnet/Opus is that even though the output from o3 is often great, I need to fact-check absolutely everything. The o-series seems much more willing to 'lie' / hallucinate than Claude or even 4o.
I don't have evals to prove this yet, but I think that without enough context, and without tools, it will definitely overthink and hallucinate more readily.
Very, very cool. How did you share context while keeping it usable to the model & within the window? How much pre-selection & pre-cleaning did you do, and how?
Does it have that same used-car-salesman feel as the other OAI models?
Would be nice to see a "let her rip" prompt and the output it produces.
Any clear alignment concerns, if you tested for that? A big issue with o3 was that it's unaligned, and it'd be real bad if that carried over to here.