When to Build (Fine-tune) vs. Orchestrate Third-Party APIs
A startup decision framework for choosing between custom models and best-in-class LLM APIs — orchestrate first, fine-tune only when evidence supports it.
Most AI startups face the same fork early: invest in fine-tuning and custom models, or orchestrate third-party APIs and compete on product. The wrong default is expensive — fine-tuning too early locks in cost, complexity, and maintenance before you have proof it matters.

The 7 steps
Text summary of the flowchart above.
1. Define the problem
State the outcome and success metrics before picking a stack.
2. Do I need an LLM?
If rules, search, traditional ML, or workflow automation suffice, skip the LLM.
3. Is it a core differentiator?
If not, orchestrate APIs and focus on product, UX, and data.
4. Start with third-party APIs
Test on real customer data. Measure accuracy, latency, cost, failures, security, and post-processing.
5. Where are the gaps?
If APIs meet the bar, stay on them: optimize prompts, add RAG, and improve workflows. If they fall short, pin down exactly where before moving on.
6. Consider fine-tuning
Only when you need specific formatting, domain jargon, tone, extraction patterns, or a level of consistency that prompting cannot reach.
7. Cost-benefit analysis
Fine-tune only with clear ROI in accuracy, cost at scale, reliability, or defensibility.
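The steps above can be read as a short decision function. The sketch below is one way to encode them in Python, not a prescribed implementation: the `Assessment` fields and the `recommend` function are hypothetical names, and each boolean stands in for a judgment you would make from your own evaluation data.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Answers to the framework's questions. All field names are hypothetical."""
    needs_llm: bool            # step 2: rules, search, or traditional ML are not enough
    core_differentiator: bool  # step 3: the capability is central to the product's edge
    api_meets_bar: bool        # steps 4-5: third-party APIs pass on real customer data
    gap_is_tunable: bool       # step 6: the gap is formatting, jargon, tone, extraction, or consistency
    clear_roi: bool            # step 7: measurable ROI in accuracy, cost, reliability, or defensibility


def recommend(a: Assessment) -> str:
    """Walk the steps in order and return the first recommendation that applies."""
    if not a.needs_llm:
        return "skip the LLM"
    if not a.core_differentiator:
        return "orchestrate APIs"
    if a.api_meets_bar:
        return "orchestrate APIs and keep optimizing"
    if a.gap_is_tunable and a.clear_roi:
        return "fine-tune"
    # Default: without a tunable gap and clear ROI, stay on third-party APIs.
    return "orchestrate APIs"
```

Fine-tuning is the last branch on purpose: every earlier exit keeps you on the cheaper path.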
When each approach wins
Fine-tune / build
- Measurable accuracy improvement
- Lower total cost at scale
- Reliable, consistent outputs
- Core differentiator and defensible advantage
Orchestrate APIs (default)
- Faster time to market
- Lower upfront cost and maintenance
- Leverage the best models continuously
- Focus on product, UX, and the data flywheel
Expensive traps to avoid
- Fine-tuning too early: locking in cost and complexity before validating the need
- Chasing small gains: tiny accuracy bumps that do not justify ongoing spend
- Ignoring total cost of ownership: data labeling, training infra, evals, and monitoring add up
- Owning the maintenance burden: models drift; you own updates, evals, and incident response
- Overfitting to current needs: today's data may not generalize to tomorrow's use cases
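To make the total-cost trap concrete, a rough break-even calculation helps. The sketch below assumes a simple linear cost model. Every parameter name and every number in the example is an illustrative assumption, not a benchmark.

```python
from typing import Optional


def breakeven_months(
    upfront_cost: float,            # labeling, training infra, initial evals (assumed one-off)
    api_cost_per_request: float,    # what the third-party API charges per request
    tuned_cost_per_request: float,  # serving cost per request for the fine-tuned model
    monthly_maintenance: float,     # retraining, monitoring, incident response
    requests_per_month: int,
) -> Optional[float]:
    """Months until fine-tuning becomes cheaper than the API, or None if it never does."""
    monthly_saving = (
        (api_cost_per_request - tuned_cost_per_request) * requests_per_month
        - monthly_maintenance
    )
    if monthly_saving <= 0:
        return None  # maintenance eats the per-request saving at this volume
    return upfront_cost / monthly_saving


# Hypothetical figures, chosen only to show how volume changes the answer.
high_volume = breakeven_months(60_000, 0.02, 0.005, 5_000, 1_000_000)
low_volume = breakeven_months(60_000, 0.02, 0.005, 5_000, 100_000)
```

With these made-up numbers, the fine-tuned option pays back in roughly six months at one million requests per month and never pays back at one hundred thousand, because maintenance alone exceeds the per-request saving.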
The golden rule
Orchestrate first. Evaluate deeply. Fine-tune only when there is clear evidence of ROI in accuracy, cost, reliability, or defensibility.
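"Evaluate deeply" in practice means running the same real examples through each candidate and recording the metrics from step 4. A minimal harness might look like the following. It assumes a hypothetical `call_model` callable that wraps whichever API or model you are testing and returns the output text together with its cost in dollars. Exact-match scoring is a simplification; most tasks need a task-specific scorer.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class EvalReport:
    accuracy: float          # correct / total, failed calls count as wrong
    failure_rate: float      # calls that raised / total
    p95_latency_s: float     # nearest-rank 95th percentile, in seconds
    cost_per_request: float  # total cost / total attempts


def evaluate(
    call_model: Callable[[str], Tuple[str, float]],  # hypothetical wrapper: prompt -> (output, cost)
    cases: Iterable[Tuple[str, str]],                # (input, expected output) pairs
) -> EvalReport:
    latencies: List[float] = []
    total = correct = failures = 0
    total_cost = 0.0
    for prompt, expected in cases:
        total += 1
        start = time.perf_counter()
        try:
            output, cost = call_model(prompt)
            failed = False
        except Exception:
            failed = True
        latencies.append(time.perf_counter() - start)
        if failed:
            failures += 1
            continue
        total_cost += cost
        correct += int(output.strip() == expected.strip())
    if total == 0:
        raise ValueError("no evaluation cases provided")
    latencies.sort()
    p95_index = math.ceil(0.95 * total) - 1
    return EvalReport(
        accuracy=correct / total,
        failure_rate=failures / total,
        p95_latency_s=latencies[p95_index],
        cost_per_request=total_cost / total,
    )
```

Running the same `cases` through an API-backed `call_model` and a fine-tuned one gives two reports you can compare directly before committing to either path.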