llm-compiler
Generate a structured plan to address a user query by utilizing a set of predefined actions that maximize parallel execution, culminating in a final join action to consolidate results. This template is ideal for optimizing workflows in scenarios requiring efficient task management and result aggregation.
Prompt Text
Given a user query, create a plan to solve it with the utmost parallelizability. Each plan should comprise an action from the following {num_tools} types:
{tool_descriptions}
{num_tools}. join(): Collects and combines results from prior actions.
- An LLM agent is called upon invoking join() to either finalize the user query or wait until the plans are executed.
- join should always be the last action in the plan, and will be called in two scenarios:
(a) if the answer can be determined by gathering the outputs from tasks to generate the final response.
(b) if the answer cannot be determined in the planning phase before you execute the plans.
Guidelines:
- Each action described above contains input/output types and description.
- You must strictly adhere to the input and output types for each action.
- The action descriptions contain the guidelines. You MUST strictly follow those guidelines when you use the actions.
- Each action in the plan should strictly be one of the above types. Follow the Python conventions for each action.
- Each action MUST have a unique ID, which is strictly increasing.
- Inputs for actions can either be constants or outputs from preceding actions. In the latter case, use the format $id to denote the ID of the previous action whose output will be the input.
- Always call join as the last action in the plan. Say '<END_OF_PLAN>' after you call join.
- Ensure the plan maximizes parallelizability.
- Only use the provided action types. If a query cannot be addressed using these, invoke the join action for the next steps.
- Never introduce new actions other than the ones provided.
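For illustration, a plan conforming to the rules above might look like the following (the tool names and query are hypothetical, not part of the template):

```
1. search(query="population of France")
2. search(query="population of Germany")
3. compare(a=$1, b=$2)
4. join()
<END_OF_PLAN>
```

Actions 1 and 2 have no dependencies and can run in parallel; action 3 consumes their outputs via `$1` and `$2`, and `join()` closes the plan.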
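The `$id` dependency convention lends itself to a simple executor: parse the numbered plan, then repeatedly run every task whose inputs are already available in parallel. The sketch below is a minimal illustration of that idea, not the actual LLMCompiler runtime; the tool registry and plan text are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def parse_plan(plan_text):
    """Parse lines like '1. search(query="x")' into {id: {tool, args, deps}}."""
    tasks = {}
    for line in plan_text.strip().splitlines():
        m = re.match(r"(\d+)\.\s*(\w+)\((.*)\)", line.strip())
        if not m:
            continue  # skip non-action lines such as <END_OF_PLAN>
        idx, tool, args = int(m.group(1)), m.group(2), m.group(3)
        deps = {int(d) for d in re.findall(r"\$(\d+)", args)}
        tasks[idx] = {"tool": tool, "args": args, "deps": deps}
    return tasks

def run_plan(tasks, tools):
    """Execute tasks level by level; all tasks whose deps are met run in parallel."""
    results = {}
    pending = dict(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [i for i, t in pending.items() if t["deps"] <= results.keys()]
            if not ready:
                raise ValueError("cyclic or unknown dependency in plan")
            futures = {}
            for i in ready:
                t = pending.pop(i)
                # Substitute $id placeholders with the outputs of earlier tasks.
                args = re.sub(r"\$(\d+)",
                              lambda m: str(results[int(m.group(1))]), t["args"])
                futures[i] = pool.submit(tools[t["tool"]], args)
            for i, f in futures.items():
                results[i] = f.result()
    return results
```

Independent actions (those whose `deps` sets are empty or already satisfied) are dispatched in the same level, which is what "maximize parallelizability" buys: the plan's dependency graph, not its line order, determines the execution schedule.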
Remember, ONLY respond with the task list in the correct format! E.g.:
idx. tool(arg_name=args)

Evaluation Results
1/28/2026
Overall Score: 2.21/5 (average across all 3 models)
Best Performing Model: openai:gpt-5-mini — 2.43/5 (Low Confidence)

| Rank | Model | Score | adh | cla | com | In tokens | Out tokens | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #1 | openai:gpt-5-mini | 2.43/5.00 | 2.1 | 3.7 | 1.6 | 1,855 | 2,266 | $0.0050 |
| #2 | anthropic:claude-3-5-haiku | 2.28/5.00 | 1.3 | 4.6 | 1.0 | 2,120 | 582 | $0.0040 |
| #3 | google:gemini-2.5-flash-lite | 1.92/5.00 | 1.1 | 3.6 | 1.1 | 1,945 | 224 | $0.0003 |
Test Case:
