Back to prompts

invoices_from_ocr

summarization0 savesSource

Generate an invoice based on OCR-interpreted data, utilizing expense accounts and historical invoice patterns to ensure accurate accounting. This prompt is designed for accountants to streamline invoice creation while adhering to specific rules for account selection and error handling.

Prompt Text

Act as an accountant and use the provided data to create an invoice and it's accounting.

## Input
- Data interpreted by OCR of the invoice:
{data}

- Chat of expense accounts:
{accounts}

Important: Each account may optionally include the field `includedInBudget?: boolean`. If `true`, always give it **priority** when selecting between multiple suitable accounts.

- Previous invoices for this supplier and client:
{previousInvoices}

- CUPS codes:
{CUPS}
Format: ["cups": "ES0021000016520242YY", "account": "Location description"]
Match by first 20 characters (ignore suffixes like "0F")


### How to use previous invoices:
1. Prefer the same `accountId` previously used for a similar concept.
2. If there is a clear pattern of account assignment, follow it.
3. If the previous invoices contain a pattern in their account assignment, follow that pattern when assigning accounts.
4. Use past invoice amounts as a reference for assigning tax rates and total calculations when applicable.

## Output
Answer using this JSON schema:
{{
  "type": "array",
   "properties": "{{
        "description":  ["string", "null"],
        "units": ["number", "null"],
        "tax_rate": ["number", "null"],
        "total":  ["number", "null"],
        "accountId":  ["number", "null"]
   }}",
  "required": ["description", "units", "tax_rate", "total", "accountId"]
}}

## Rare Conditions (always apply before Case Decision)
1. If ANY OCR line_item has `amount = null`, `total = null`, `total = 0`, or missing amount fields → you MUST execute Case 3. This rule helps with semantic or historical logic.
2. If the sum of all OCR item totals does NOT match the invoice net_amount or total_amount_without_tax → you MUST execute Case 3. No exceptions. You must ignore semantic matching and previous invoices in this situation.
3. When Case 3 is chosen you MUST output exactly one item (or two if IRPF applies). You MUST collapse all OCR items into a single summarized line. You MUST NOT return multiple items in Case 3.
4. When executing Case 3, you MUST apply the Case 3 description rules including:
   - “Factura – [supplier_name] – [service]”
   - If a matching previous invoice does NOT contain the current invoice_id inside its concept field, append “ – [invoice_id]” at the end.

##These rules override all other logic and are evaluated before selecting Case 1, Case 2, or Case 3. 
[Case 1] **Detailed version**
- Use for cases where there is a clear account for each item
- Return one output line per OCR item
- For each item copy from OCR: `description`, `units`, `tax_rate`, and `total` from each item data in OCR.
- Get the account related to each item by searching in for semantic matches with the item description.
- If there are multiple possible matching accounts:
-- Prioritize those that were used in similar past invoices.
-- **But always choose the account with `includedInBudget: true` over any other.**
-- If previous invoices show a strong pattern, follow that pattern for account selection.
-- If there are CUPS available, then match CUPS code to write the accountId Format: [“cups”: “ES0021000016520242YY”, “account”: “Location description”]
---Match by first 20 characters (ignore suffixes like “0F”)
- If still uncertain between multiple valid accounts, pick the most specific one aligned with the item description 
- If IRPF exists: append an additional line item:
--description: "IRPF retenido ([tax_rate]%)"
--units: 1
--tax_rate: null
--total: absolute value of IRPF amount
--accountId: "4751"

[Case 2] **when exact matches are not found**
- **Still return all items from the OCR** with their respective descriptions, units, tax_rate, and total.
- Instead of searching for a specific account for each item, **assign the same accountId to all items**.
- Choose the `accountId` that best represents the invoice based on past invoices or, if not available, the most generic expense account.
- The `tax_rate` and `total` should remain per item as extracted from the OCR.
- **If multiple options are possible, give priority to accounts with `includedInBudget: true`.**
- If no account clearly fits, use the most general expense account available
- Format the resulting description in sentence case (all lowercase except the first letter capitalized).


[Case 3] One account 
- Output one item:
- description: short summary of the bill and supplier, in the bill’s language.
- items set to 1
- In the case items don't match total, return only one item with the total and in the description a summary of the bill and provider using the language of the invoice
- accountId: choose using the same priority rules (budget flag > historical pattern > semantic match).
- Format the resulting description in sentence case (all lowercase except the first letter capitalized)
- After generating the Case 3 description: Look at all concept fields inside {previousInvoices}, If none of those concepts contain the current invoice_id from {data}, then append " - [invoice_id]" to the end of the final description. (example: "Factura - supplier_name - service - invoice_id")

### CASE DECISION (evaluate IN ORDER)
1. If sum of all OCR item "total" values does not match the invoice total, use case 3.
2. If there 6 or more items use case 3
3. If it's a complex invoice like electricity, gas, includes budgets and other data, use case 3.
4. Else if you can confidently map each item to an account use case 1
5. Else If you are unsure of any of the accounts to match use case 2
6. Else use case 2


##Description construction (applies ALWAYS). 
For every line item (including IRPF), build the final description this way:
--"Factura – [supplier_name] – [description_in_sentence_case] – [service]
-- Where:
---[description_in_sentence_case] = the OCR item’s description normalized to sentence case
(example: “Mantenimiento ANTENA TERRESTRE” → “mantenimiento antena terrestre”)
---[supplier_name] = data.supplier_name
---[service] = use data.service.
    - If data.service is null, empty, undefined, or not a valid string,
      replace it with: " "
- After building the description, if NONE of the concept fields in "previousInvoices" contain "[invoice_id]", append: "- [[invoice_id]]"
- The final description REPLACES the OCR description

## Final description rules
- Take the OCR item description.
- Normalize it to sentence case.
- If the description has 15 words or fewer, keep it as is.
- If it exceeds 15 words, generate a concise summary of at most 15 words, keeping the original meaning.
-- The summary must NOT invent technical details.
-- The summary must preserve the main action + object.
-- The style must remain neutral and factual.
- Use this summarized version as [description_in_sentence_case] in: "Factura – [supplier_name] – [description_in_sentence_case] – [service]"
- Then:If no previousInvoices.concept contains the current invoice_id, append: "– [invoice_id]".

## Important Instructions:
- The total field must represent the amount excluding tax.
- accountId: choose using the following priority rules (budget flag > historical pattern > semantic match).
- Prioritize accounts that are includedInBudget: true
- If any field is uncertain, set its value to null.
- Ensure the returned JSON is a single-line JSON array ready for parsing with JavaScript's JSON.parse method.
- **Do not include any extra formatting such as newlines, code block syntax, or wrappers like ```json.**
- Validate the output JSON to ensure it conforms to standard JSON syntax.

Evaluation Results

1/28/2026
Overall Score
2.63/5

Average across all 3 models

Best Performing Model
Low Confidence
anthropic:claude-3-5-haiku
2.90/5
anthropic:claude-3-5-haiku
#1 Ranked
2.90
/5.00
adh
2.1
cla
4.9
com
1.7
In
9,780
Out
722
Cost
$0.0107
openai:gpt-5-mini
#2 Ranked
2.60
/5.00
adh
1.8
cla
4.3
com
1.7
In
8,555
Out
3,078
Cost
$0.0083
google:gemini-2.5-flash-lite
#3 Ranked
2.38
/5.00
adh
1.3
cla
4.3
com
1.6
In
9,345
Out
1,201
Cost
$0.0014
Test Case:

Tags