Built with Anchor
by Agentic Labs
paper · openreview.net
code · github.com
data · harborframework.com

ERP-Bench

300 verifiable, long-horizon agent tasks in a real ERP.

Procurement and manufacturing workflows in Odoo 19, a production-grade open-source ERP.

Maksim Ivanov · Abhijay Rana  — Agentic Labs

300 verifiable tasks · 29 task patterns · 100+ steps per task · 50+ business rules per task

Covers procurement, manufacturing, sales, and finance. Ships in Harbor format.

Example task · multi-order fulfillment

Agent (operates the ERP workspace):
"40 widgets in 5–9 days. Stock 34. Buy parts, schedule build, send invoices — spend least."
ERP Workspace · db: widget-co · user: agent

Sales › Open orders (4), due in 5–9 days

#2401 · Acme Robotics · 12 widgets · 5 Aug
#2402 · Crown Distrib. · 10 widgets · 7 Aug
#2403 · Hartwell GmbH · 11 widgets · 7 Aug
#2404 · Voyager Co. · 7 widgets · 9 Aug

In stock: 34 · Ordered: 40 · Short: −6

Agent must: choose suppliers · buy parts · schedule build · send invoices · follow policy

Database end state: customer orders +4 · parts ordered +3 · products built +2 · items shipped +19 · invoices sent +4 · cash spent −$4.8k
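
The stock arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the `Order` class and variable names are hypothetical and are not part of ERP-Bench or Odoo; the quantities are the ones shown in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """One open sales order from the example (hypothetical structure)."""
    ref: str
    customer: str
    qty: int
    due: str  # day and month as shown in the example


orders = [
    Order("#2401", "Acme Robotics", 12, "5 Aug"),
    Order("#2402", "Crown Distrib.", 10, "7 Aug"),
    Order("#2403", "Hartwell GmbH", 11, "7 Aug"),
    Order("#2404", "Voyager Co.", 7, "9 Aug"),
]
in_stock = 34

# Total demand across the four open orders.
ordered = sum(o.qty for o in orders)

# Units that must be built or bought before every order can ship.
shortfall = max(0, ordered - in_stock)

print(ordered, shortfall)
```

The four orders add up to 40 widgets, so with 34 in stock the agent has to cover a shortfall of 6.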

pass@1 · harness: pi · halted early after >500 zero-point trials.

The method behind ERP-Bench

Anchor

Compile every task from one solved constraint program — so the instruction, environment, solution, and grader can't disagree.

§ 1 — Problem

Environments drift.

Authored independently, a task's pieces disagree — leading to unsolvable tasks, broken grading, and reward hacks.

Figure: four task artifacts, with arrows showing the pairwise inconsistency failure modes.

§ 2 — Method

Anchor keeps them aligned.

Formalize the workflow as a parametric CP-SAT program. The solver certifies an optimum per sample; deterministic compilers emit the task.

Figure: the Anchor pipeline, from parametric spec to solved instance to task.
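
To make the idea concrete, here is a minimal sketch of compiling all four artifacts from one solved instance. It is not the authors' implementation: an exhaustive search stands in for the CP-SAT solver so the example needs no external packages, and every name, parameter range, and the toy purchasing model (`sample_instance`, `solve`, `compile_task`, suppliers `A` and `B`) is an assumption made for illustration.

```python
import itertools
import random


def sample_instance(seed):
    """Draw one parameter sample. All names and ranges are hypothetical."""
    rng = random.Random(seed)
    return {
        "shortfall": rng.randint(4, 8),
        "suppliers": {
            "A": {"unit_price": rng.randint(90, 110), "max_qty": 5},
            "B": {"unit_price": rng.randint(100, 120), "max_qty": 5},
        },
    }


def solve(instance):
    """Exhaustive-search stand-in for CP-SAT: returns (optimal cost, plan)."""
    suppliers = instance["suppliers"]
    names = sorted(suppliers)
    ranges = [range(suppliers[n]["max_qty"] + 1) for n in names]
    best = None
    for qtys in itertools.product(*ranges):
        # Constraint: buy exactly the missing quantity.
        if sum(qtys) != instance["shortfall"]:
            continue
        cost = sum(q * suppliers[n]["unit_price"] for n, q in zip(names, qtys))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, qtys)))
    return best


def compile_task(instance, optimum):
    """Deterministically emit every artifact from the same solved instance."""
    cost, plan = optimum
    return {
        "instruction": f"Buy {instance['shortfall']} parts at the lowest total cost.",
        "environment": instance["suppliers"],
        "solution": plan,
        "grader": {"required_qty": instance["shortfall"], "optimal_cost": cost},
    }


instance = sample_instance(seed=0)
task = compile_task(instance, solve(instance))
```

Because the instruction, environment, reference solution, and grader thresholds are all read from the same `instance` and `optimum`, none of them can state a quantity or cost the others do not share.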

A certified optimum lets us score solutions on a sliding scale, separating good-enough solutions from perfect ones.
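
One way such a sliding scale could work is sketched below. The source does not give the scoring formula, so the ratio used here, the `score` function, and the sample costs are assumptions, not ERP-Bench's actual grader.

```python
def score(feasible, agent_cost, optimal_cost):
    """Return a score in [0, 1] (hypothetical formula).

    Infeasible end states score 0. Feasible ones are scored by how close
    their cost is to the certified optimum, reaching 1.0 only at the optimum.
    """
    if not feasible or agent_cost <= 0:
        return 0.0
    return min(1.0, optimal_cost / agent_cost)


# Hypothetical costs: the certified optimum is 4800.
perfect = score(True, 4800, 4800)      # matches the optimum
good_enough = score(True, 6000, 4800)  # valid, but 25% more expensive
invalid = score(False, 4000, 4800)     # cheaper, but breaks a constraint
print(perfect, good_enough, invalid)
```

Without a certified optimum, a grader could only check feasibility; with one, a valid but wasteful plan earns partial credit instead of being indistinguishable from the best plan.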

§ 3 — What you get

Tasks that are…

Citation
@misc{ivanov2026anchor,
  title  = {Anchor: Mitigating Artifact Drift in Agent Benchmark Generation},
  author = {Ivanov, Maksim and Rana, Abhijay},
  year   = {2026},
  url    = {https://openreview.net/forum?id=Vm6HkNyehc},
  note   = {Presented at the RLEval Workshop, ACM CAIS 2026 (non-archival)}
}