How long does it take to build an AI agent?
"How long?" is the second question after "how much?". And the honest answer is not "two weeks" and not "six months". For a production AI agent that does one real job for a real business, plan 6-10 weeks. Here is the week-by-week breakdown — what is happening, what is slowing down, what you control.
Week 1 — discovery + spec
I block 5-7 working days for this and I do not skip it. The week looks like:
- Day 1-2. Process interview. I sit with the people who do the job today and document every step, every decision, every edge case. Output: a 6-10 page process doc.
- Day 3. Stack decision. Which model goes where, which integrations, where data lives, where the agent runs.
- Day 4-5. Eval set. We write 30-80 example inputs with expected outputs. This is the contract — when the agent passes these, we ship.
- Day 6-7. Spec + budget lock. One document, signed by both sides. Scope-creep door closes here.
Skipping this week is the #1 reason projects double in duration. Without an eval set you do not know what "done" means, so you keep polishing forever.
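To make "the eval set is the contract" concrete, here is a minimal sketch of what such a set and its pass-rate check could look like. The names (`EvalCase`, `pass_rate`, `toy_agent`) are hypothetical, and exact-match comparison is an assumption: a real project may need fuzzy matching or rubric-based grading instead. The 90% ship threshold is the one used later in this article.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One example input with the output we expect from the agent."""
    input_text: str
    expected: str


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the agent output equals the expected output.

    Comparison is case-insensitive exact match (an assumption for this sketch).
    """
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1
        for case in cases
        if agent(case.input_text).strip().lower() == case.expected.strip().lower()
    )
    return passed / len(cases)


def is_done(rate: float, threshold: float = 0.90) -> bool:
    """'Done' means the pass rate reached the agreed threshold."""
    return rate >= threshold


# Hypothetical stand-in for the real agent: a trivial intent classifier.
def toy_agent(text: str) -> str:
    return "refund" if "money back" in text.lower() else "other"


# A real eval set would hold 30-80 cases; four are enough to show the idea.
EVAL_SET = [
    EvalCase("I want my money back", "refund"),
    EvalCase("Money back, please", "refund"),
    EvalCase("Where is my order?", "order_status"),
    EvalCase("Hello", "other"),
]

rate = pass_rate(toy_agent, EVAL_SET)
print(f"pass rate: {rate:.0%}, ship: {is_done(rate)}")
```

The toy agent misses the order-status case, so it lands at 3 of 4 and does not ship. That is the whole value of the set: "done" becomes a number instead of an opinion.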
Week 2-3 — MVP + first demo
Two weeks to ship something the client can break with their own hands. Goals:
- Week 2. Wire up the model, the system prompt, and the first 2-3 tool functions. Run against half of the eval set. Target: 70%+ pass rate.
- Week 3. Hook into one real integration (usually Telegram or the client's CRM). Live demo at the end of the week — the client puts in 20 real inputs, we look at outputs together.
The demo is honest, not polished. Things break. That is the point — what breaks here is what we fix in week 4.
Week 4-6 — integration + iteration
The longest, messiest phase, and the one where most projects slip. Every week:
- Add 1-2 integrations (database, ERP, payment, email, calendar). Each takes half a day to two days when the API is reasonably documented, and far longer when it is not.
- Add tool functions (send invoice, create lead, look up order). Each function gets tested, error-handled, logged.
- Tune prompts against new edge cases from real users.
- Eval re-run at the end of every week. We track pass rate over time and refuse to ship below 90%.
Typical pace: 1-2 integrations + up to 4 tool functions per week. Speed depends almost entirely on the quality of the client's existing APIs. A clean Stripe-style API is half a day. A 2017 PHP monolith with undocumented endpoints is two weeks of reluctant reverse engineering.
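"Tested, error-handled, logged" can be as simple as routing every tool call through one wrapper. The sketch below is one possible shape, not a prescribed implementation: `run_tool`, `ToolResult`, and the `look_up_order` tool with its in-memory `ORDERS` table are all hypothetical names made up for illustration.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger("agent.tools")


@dataclass
class ToolResult:
    """What the agent gets back from a tool: a value or a readable error."""
    ok: bool
    value: Any = None
    error: Optional[str] = None


def run_tool(name: str, func: Callable[..., Any], **kwargs: Any) -> ToolResult:
    """Call a tool function, log the call, and turn exceptions into a result."""
    logger.info("tool=%s args=%s", name, kwargs)
    try:
        value = func(**kwargs)
    except Exception as exc:  # broad on purpose: the agent must never crash on a tool
        logger.exception("tool=%s failed", name)
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    logger.info("tool=%s ok", name)
    return ToolResult(ok=True, value=value)


# Hypothetical tool backed by an in-memory table instead of a real order system.
ORDERS = {"A-100": "shipped"}


def look_up_order(order_id: str) -> str:
    return ORDERS[order_id]  # raises KeyError for unknown orders


found = run_tool("look_up_order", look_up_order, order_id="A-100")
missing = run_tool("look_up_order", look_up_order, order_id="Z-999")
print(found)
print(missing)
```

Returning a structured error instead of raising lets the model see what went wrong and respond to the user, while the log keeps the full traceback for whoever debugs it later.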
Week 7-8 — production hardening
This is the phase most freelancers cut. Don't. Production hardening is what separates an agent that works in a demo from one that survives Black Friday traffic.
- Rate limiting and retries — what happens when OpenAI is degraded for 4 hours? When the CRM is down? The agent should queue, retry, degrade gracefully, not panic.
- Observability — Langfuse or Helicone. Every agent turn logged, every tool call traced, every cost metered. Without this you are blind on day 30.
- Alerting — Slack ping when eval pass rate drops below 90%, when latency exceeds 5 sec p95, when token bill jumps unexpectedly.
- Documentation + handover — runbook for the team, one-pager for the owner, full architecture doc for whoever inherits the codebase.
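Two of the items above are easy to sketch in code: retrying a failing dependency with exponential backoff before falling back, and checking the alert thresholds (90% pass rate, 5 sec p95 latency). This is a simplified illustration under stated assumptions: the function names, the "queued for later" fallback, and the nearest-rank p95 calculation are my choices for the example, and a real setup would send the alerts to Slack and persist the queue.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    fallback: T,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry func with exponential backoff; return fallback if all attempts fail."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return fallback  # degrade gracefully instead of crashing


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def alerts(pass_rate: float, latencies_sec: list[float]) -> list[str]:
    """Return the alert messages that should fire for the current metrics."""
    found = []
    if pass_rate < 0.90:
        found.append("eval pass rate below 90%")
    if latencies_sec and p95(latencies_sec) > 5.0:
        found.append("p95 latency above 5 sec")
    return found


# Hypothetical flaky dependency: fails twice, then recovers.
calls = {"n": 0}


def flaky_crm() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("CRM is down")
    return "lead created"


def dead_api() -> str:
    raise TimeoutError("model provider degraded")


delays: list[float] = []  # record the backoff delays instead of really sleeping
result = call_with_retry(flaky_crm, fallback="queued for later", sleep=delays.append)
degraded = call_with_retry(dead_api, fallback="queued for later", sleep=lambda s: None)
fired = alerts(0.87, [1.0] * 18 + [9.0, 9.0])
print(result, delays, degraded, fired)
```

Passing `sleep` as a parameter keeps the retry logic testable without waiting for real delays.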
Total: 6-8 weeks for "one good agent"
That is the honest baseline for a single production agent with 2-4 integrations and 6-10 tool functions. Multi-agent systems (3+ agents with shared memory and orchestration) add 3-6 weeks on top.
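For a rough first estimate, the ranges quoted in this article can be added up. The sketch below does only that: it takes the 6-8 week baseline, the 3-6 weeks for multi-agent systems, and two figures from the sections further down (2-3 weeks for a self-hosted model, one week per undocumented legacy API). Treating these as simply additive is an assumption, and `Scope` and `estimate_weeks` are hypothetical names.

```python
from dataclasses import dataclass

# Ranges quoted in this article, in weeks (low, high).
BASE = (6, 8)         # one production agent
MULTI_AGENT = (3, 6)  # 3+ agents with shared memory and orchestration
SELF_HOSTED = (2, 3)  # infrastructure work for a self-hosted open model
LEGACY_API = (1, 1)   # per integration with an undocumented API


@dataclass
class Scope:
    """Hypothetical description of a project, for illustration only."""
    multi_agent: bool = False
    self_hosted_model: bool = False
    undocumented_integrations: int = 0


def estimate_weeks(scope: Scope) -> tuple[int, int]:
    """Sum the base range and every add-on that applies (assumes they add up)."""
    parts = [BASE]
    if scope.multi_agent:
        parts.append(MULTI_AGENT)
    if scope.self_hosted_model:
        parts.append(SELF_HOSTED)
    parts.extend([LEGACY_API] * scope.undocumented_integrations)
    return sum(p[0] for p in parts), sum(p[1] for p in parts)


simple = estimate_weeks(Scope())
complex_case = estimate_weeks(Scope(multi_agent=True, undocumented_integrations=2))
print(simple, complex_case)
```

A sum like this is a starting point for the conversation, not a replacement for the discovery week.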
What slows projects down (in order of damage)
- Undecided stakeholders. "Wait, let me ask the CFO" — every time this happens, 3-5 working days lost. Name a decision-maker on day one.
- Legacy APIs without docs. "We have an API but nobody wrote down the endpoints." Add a week per integration.
- Scope creep. "Can it also do X?" Yes, in v2. Locking scope at end of week 1 prevents this from killing the timeline.
- Holding out for a perfect MVP. The demo at end of week 3 should be ugly. Polishing here delays integration weeks by 5-10 days for zero business value.
- Eval set churn. Client keeps changing what "good" means. Lock the eval set in week 1 and only expand it, never redefine it.
What speeds projects up
- A real eval set on day 5. You know when you are done. Cuts polish-phase by 50%.
- One decision-maker reachable on Telegram within one hour during work hours. Decisions in minutes instead of days.
- Pre-existing clean APIs. If your CRM has a documented REST or GraphQL layer, you save 2-3 weeks.
- A sandbox environment. The agent can break things in staging without touching production data. Lets us iterate 5× faster.
When 6 weeks is not enough
Some projects honestly need 10-14 weeks. Common reasons:
- 5+ external integrations, each with auth flow + rate limits.
- Regulated industry (fintech, healthcare, gov) — audit log, encryption, role-based access on every action.
- Self-hosted open model (Hermes, Llama) — adds 2-3 weeks of infrastructure work alone.
- Multi-language agent (UA + EN + RU + PL) — eval set needs to triple in size, latency budget gets tighter.
When 6 weeks is too long
Some scopes can ship in 2-3 weeks. A Telegram bot that answers FAQ from a knowledge base and books a call when the user is qualified — I have shipped that in 12 working days. Single integration, single scenario, small eval set, no compliance bar.
Want a real timeline for your case?
30-minute call. I listen to the scope, the integrations, the team bandwidth, and tell you honestly: 3 weeks, 6 weeks, or "you are not ready yet, do these three things first". No padding, no hedge-the-bet schedule.