
When to Build (Fine-tune) vs. Orchestrate Third-Party APIs

A startup decision framework for choosing between custom models and best-in-class LLM APIs — orchestrate first, fine-tune only when evidence supports it.

Fine-tuning · LLM APIs · Decision framework

Most AI startups face the same fork early: invest in fine-tuning and custom models, or orchestrate third-party APIs and compete on product. The wrong default is expensive — fine-tuning too early locks in cost, complexity, and maintenance before you have proof it matters.

Figure: Decision flowchart in seven steps, from defining the problem through cost-benefit analysis, comparing fine-tuning with orchestrating third-party APIs, plus traps to avoid and a golden rule.

The 7 steps

Text summary of the flowchart above.

1. Define the problem
State the outcome and success metrics before picking a stack.
2. Do I need an LLM?
If rules, search, traditional ML, or workflow automation suffice, skip the LLM.
3. Is it a core differentiator?
If not, orchestrate APIs and focus on product, UX, and data.
4. Start with third-party APIs
Test on real customer data. Measure accuracy, latency, cost, failure modes, security, and post-processing effort.
5. Where are the gaps?
If APIs meet the bar, stay with them: optimize prompts, add RAG, and improve workflows. If gaps remain, move to step 6.
6. Consider fine-tuning
Only when you need specific formatting, domain jargon, tone, extraction patterns, or consistency that prompting cannot reach.
7. Cost-benefit analysis
Fine-tune only with clear ROI in accuracy, cost at scale, reliability, or defensibility.
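The seven steps read as an ordered series of gates, where the first gate that fails decides the outcome. The sketch below encodes them as a plain Python function. The `Assessment` fields and the returned strings are hypothetical names chosen for illustration, not part of any library, and each boolean stands in for a judgment your team makes from real evaluations rather than something code can determine.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical answers to the framework's questions (illustrative only)."""
    has_success_metrics: bool      # Step 1: outcome and success metrics defined
    needs_llm: bool                # Step 2: rules, search, classic ML, automation fall short
    core_differentiator: bool      # Step 3: is the model itself the advantage?
    apis_meet_bar: bool            # Steps 4-5: measured on real customer data
    gap_is_prompt_resistant: bool  # Step 6: gap persists after prompting and RAG
    clear_roi: bool                # Step 7: ROI in accuracy, cost, reliability, or defensibility


def recommend(a: Assessment) -> str:
    """Walk the seven steps in order and return the first decision reached."""
    if not a.has_success_metrics:
        return "define the problem and success metrics first"
    if not a.needs_llm:
        return "use rules, search, traditional ML, or workflow automation"
    if not a.core_differentiator:
        return "orchestrate APIs; focus on product, UX, and data"
    if a.apis_meet_bar:
        return "orchestrate APIs; optimize prompts, add RAG, improve workflows"
    if not a.gap_is_prompt_resistant:
        return "orchestrate APIs; keep iterating on prompts and RAG"
    if not a.clear_roi:
        return "orchestrate APIs; fine-tuning ROI not yet proven"
    return "fine-tune"


# Example: a real, prompt-resistant gap, but no proven ROI yet.
example = Assessment(has_success_metrics=True, needs_llm=True, core_differentiator=True,
                     apis_meet_bar=False, gap_is_prompt_resistant=True, clear_roi=False)
print(recommend(example))  # prints: orchestrate APIs; fine-tuning ROI not yet proven
```

Note that "fine-tune" is reachable only when every earlier gate passes, which mirrors the framework's default of orchestrating first.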

When each approach wins

Fine-tune / build

Measurable accuracy improvement
Lower total cost at scale
Reliable, consistent outputs
Core differentiator and defensible advantage

Orchestrate APIs (default)

Faster time to market
Lower upfront cost and maintenance
Adopt the best available models as providers release them
Focus on product, UX, and the data flywheel
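"Lower total cost at scale" is a break-even claim, and it helps to check it with arithmetic. Assume a simplified model, used here for illustration only, where fine-tuning adds a fixed upfront cost F (labeling, training runs, evals) and changes the per-request cost from c_api to c_ft. Fine-tuning then pays off only above N* = F / (c_api - c_ft) requests, and never if c_ft >= c_api. The function name and all figures below are hypothetical placeholders, not real vendor prices.

```python
import math


def break_even_requests(upfront_cost: float, api_cost_per_request: float,
                        ft_cost_per_request: float) -> float:
    """Requests needed before per-request savings repay the upfront fine-tuning cost.

    Returns math.inf when the fine-tuned model is not cheaper per request.
    Simplified model: ignores ongoing maintenance, drift, and retraining.
    """
    savings_per_request = api_cost_per_request - ft_cost_per_request
    if savings_per_request <= 0:
        return math.inf
    return upfront_cost / savings_per_request


# Hypothetical placeholder figures, not real prices.
upfront = 50_000.0   # labeling, training runs, eval harness
api_cost = 0.004     # per request via a third-party API
ft_cost = 0.0015     # per request for a self-hosted fine-tuned model

n = break_even_requests(upfront, api_cost, ft_cost)
print(f"break-even at about {n:,.0f} requests")
```

With these placeholder figures the break-even point is 50,000 / 0.0025 = 20 million requests. Below that volume, orchestrating APIs is cheaper even before counting the maintenance costs listed under the traps.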

Expensive traps to avoid

Fine-tuning too early: locking in cost and complexity before validating the need
Chasing small gains: tiny accuracy bumps that do not justify ongoing spend
Ignoring total cost of ownership: data labeling, training infra, evals, and monitoring add up
Owning the maintenance burden: models drift, and you own updates, evals, and incident response
Overfitting to current needs: today's data may not generalize to tomorrow's use cases
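"Chasing small gains" and "Ignoring total cost of ownership" reduce to one comparison: does the yearly value of the accuracy gain exceed the yearly cost of owning the fine-tuned model? The sketch below assumes a deliberately simple linear value model (each accuracy point is worth a fixed amount per year). The cost categories come from the list above; the class name, function name, and every figure are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class AnnualOwnershipCost:
    """Hypothetical yearly cost lines for owning a fine-tuned model."""
    data_labeling: float
    training_infra: float
    evals: float
    monitoring: float
    maintenance: float  # retraining after drift, updates, incident response

    def total(self) -> float:
        return (self.data_labeling + self.training_infra + self.evals
                + self.monitoring + self.maintenance)


def fine_tune_roi(accuracy_gain_pts: float, value_per_point: float,
                  cost: AnnualOwnershipCost) -> float:
    """Return (value - cost) / cost; a negative result means the gain does not pay for itself.

    Assumes a linear value model: each accuracy point is worth value_per_point per year.
    """
    total = cost.total()
    if total <= 0:
        raise ValueError("total ownership cost must be positive")
    value = accuracy_gain_pts * value_per_point
    return (value - total) / total


# Hypothetical placeholder figures.
costs = AnnualOwnershipCost(data_labeling=30_000, training_infra=20_000,
                            evals=10_000, monitoring=8_000, maintenance=32_000)
# A 1.5-point accuracy gain, assumed to be worth $40k per point per year.
print(f"ROI: {fine_tune_roi(1.5, 40_000, costs):.0%}")
```

Here the ownership cost totals $100k while the gain is worth $60k, so the ROI is -40%: a small bump that does not cover the ongoing spend. Real value curves are rarely linear, so treat this as a starting point for the cost-benefit step, not a verdict.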

The golden rule

Orchestrate first. Evaluate deeply. Fine-tune only when there is clear evidence of ROI in accuracy, cost, reliability, or defensibility.