llm-compiler
Generate a structured plan to address a user query by utilizing a set of predefined actions that maximize parallel execution, culminating in a final join action to consolidate results. This template is ideal for optimizing workflows in scenarios requiring efficient task management and result aggregation.
Prompt Text
Given a user query, create a plan to solve it with the utmost parallelizability. Each plan should comprise an action from the following {num_tools} types:
{tool_descriptions}
{num_tools}. join(): Collects and combines results from prior actions.
- An LLM agent is called upon invoking join() to either finalize the user query or wait until the plans are executed.
- join should always be the last action in the plan, and will be called in two scenarios:
(a) if the answer can be determined by gathering the outputs from tasks to generate the final response.
(b) if the answer cannot be determined in the planning phase before you execute the plans.
Guidelines:
- Each action described above contains input/output types and description.
- You must strictly adhere to the input and output types for each action.
- The action descriptions contain the guidelines. You MUST strictly follow those guidelines when you use the actions.
- Each action in the plan should strictly be one of the above types. Follow the Python conventions for each action.
- Each action MUST have a unique ID, which is strictly increasing.
- Inputs for actions can either be constants or outputs from preceding actions. In the latter case, use the format $id to denote the ID of the previous action whose output will be the input.
- Always call join as the last action in the plan. Say '<END_OF_PLAN>' after you call join.
- Ensure the plan maximizes parallelizability.
- Only use the provided action types. If a query cannot be addressed using these, invoke the join action for the next steps.
- Never introduce new actions other than the ones provided.
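For illustration, a plan conforming to the rules above might look like the following (the tool names and query are hypothetical, not part of the template):

```
1. search(query="population of France")
2. search(query="population of Germany")
3. compare(a=$1, b=$2)
4. join()
<END_OF_PLAN>
```

Actions 1 and 2 have no dependencies and can run in parallel; action 3 consumes their outputs via `$1` and `$2`, and `join()` closes the plan.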
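The `$id` dependency convention lends itself to a simple executor: parse the numbered plan, then repeatedly run every task whose inputs are already available in parallel. The sketch below is a minimal illustration of that idea, not the actual LLMCompiler runtime; the tool registry and plan text are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def parse_plan(plan_text):
    """Parse lines like '1. search(query="x")' into {id: {tool, args, deps}}."""
    tasks = {}
    for line in plan_text.strip().splitlines():
        m = re.match(r"(\d+)\.\s*(\w+)\((.*)\)", line.strip())
        if not m:
            continue  # skip non-action lines such as <END_OF_PLAN>
        idx, tool, args = int(m.group(1)), m.group(2), m.group(3)
        deps = {int(d) for d in re.findall(r"\$(\d+)", args)}
        tasks[idx] = {"tool": tool, "args": args, "deps": deps}
    return tasks

def run_plan(tasks, tools):
    """Execute tasks level by level; all tasks whose deps are met run in parallel."""
    results = {}
    pending = dict(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [i for i, t in pending.items() if t["deps"] <= results.keys()]
            if not ready:
                raise ValueError("cyclic or unknown dependency in plan")
            futures = {}
            for i in ready:
                t = pending.pop(i)
                # Substitute $id placeholders with the outputs of earlier tasks.
                args = re.sub(r"\$(\d+)",
                              lambda m: str(results[int(m.group(1))]), t["args"])
                futures[i] = pool.submit(tools[t["tool"]], args)
            for i, f in futures.items():
                results[i] = f.result()
    return results
```

Independent actions (those whose `deps` sets are empty or already satisfied) are dispatched in the same level, which is what "maximize parallelizability" buys: the plan's dependency graph, not its line order, determines the execution schedule.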
Remember, ONLY respond with the task list in the correct format! E.g.:
idx. tool(arg_name=args)

Evaluation Results
1/28/2026
Overall Score: 2.21/5 (average across all 3 models)
Best Performing Model: openai:gpt-5-mini — 2.43/5 (Low Confidence)

| Rank | Model | Score | adh | cla | com | In tokens | Out tokens | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #1 | openai:gpt-5-mini | 2.43/5.00 | 2.1 | 3.7 | 1.6 | 1,855 | 2,266 | $0.0050 |
| #2 | anthropic:claude-3-5-haiku | 2.28/5.00 | 1.3 | 4.6 | 1.0 | 2,120 | 582 | $0.0040 |
| #3 | google:gemini-2.5-flash-lite | 1.92/5.00 | 1.1 | 3.6 | 1.1 | 1,945 | 224 | $0.0003 |
Test Case:
