When to Build (Fine-tune) vs. Orchestrate Third-Party APIs

A startup decision framework for choosing between custom models and best-in-class LLM APIs — orchestrate first, fine-tune only when evidence supports it.

Fine-tuning · LLM APIs · Decision framework

Most AI startups face the same fork early: invest in fine-tuning and custom models, or orchestrate third-party APIs and compete on product. The wrong default is expensive — fine-tuning too early locks in cost, complexity, and maintenance before you have proof it matters.

[Figure: Decision flowchart for when to build (fine-tune) vs. orchestrate third-party APIs. Seven steps from defining the problem through cost-benefit analysis, with traps to avoid and a golden rule.]

The 7 steps

Text summary of the flowchart above.

  1. Define the problem

    State the outcome and success metrics before picking a stack.

  2. Do I need an LLM?

    If rules, search, traditional ML, or workflow automation suffice, skip the LLM.

  3. Is it a core differentiator?

    If not, orchestrate APIs and focus on product, UX, and data.

  4. Start with third-party APIs

    Test on real customer data. Measure accuracy, latency, cost, failures, security, and post-processing.

  5. Where are the gaps?

    If APIs meet the bar, stay with them: optimize prompts, add RAG, and improve workflows. If measurable gaps remain after that, move on to step 6.

  6. Consider fine-tuning

    Only when you need specific formatting, domain jargon, tone, extraction patterns, or a level of consistency that prompting cannot reach.

  7. Cost-benefit analysis

    Fine-tune only with clear ROI in accuracy, cost at scale, reliability, or defensibility.
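
The seven steps can be read as a short chain of early exits. The sketch below is one possible encoding, not a definitive rule set. The Assessment fields and the Recommendation names are hypothetical, and the fallback when a gap is not fine-tunable or lacks clear ROI (stay on APIs) is an assumption taken from the "orchestrate first" default.

```python
from dataclasses import dataclass
from enum import Enum


class Recommendation(Enum):
    NO_LLM = "skip the LLM"
    ORCHESTRATE = "orchestrate third-party APIs"
    OPTIMIZE_API_STACK = "keep APIs; optimize prompts, RAG, and workflows"
    FINE_TUNE = "fine-tune"


@dataclass
class Assessment:
    # Hypothetical inputs; each one maps to a step in the list above.
    success_metrics_defined: bool    # step 1
    simpler_approach_suffices: bool  # step 2: rules, search, traditional ML, automation
    core_differentiator: bool        # step 3
    apis_meet_bar: bool              # steps 4-5, measured on real customer data
    gap_is_fine_tunable: bool        # step 6: formatting, jargon, tone, extraction, consistency
    clear_roi: bool                  # step 7


def recommend(a: Assessment) -> Recommendation:
    if not a.success_metrics_defined:
        raise ValueError("Define the outcome and success metrics first (step 1).")
    if a.simpler_approach_suffices:
        return Recommendation.NO_LLM
    if not a.core_differentiator:
        return Recommendation.ORCHESTRATE
    if a.apis_meet_bar:
        return Recommendation.OPTIMIZE_API_STACK
    if a.gap_is_fine_tunable and a.clear_roi:
        return Recommendation.FINE_TUNE
    # Assumed fallback: without a fine-tunable gap and clear ROI, stay on APIs.
    return Recommendation.OPTIMIZE_API_STACK
```

For example, recommend(Assessment(True, False, True, False, True, True)) returns Recommendation.FINE_TUNE, while flipping clear_roi to False falls back to the API stack.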

When each approach wins

Fine-tune / build

  • Measurable accuracy improvement
  • Lower total cost at scale
  • Reliable, consistent outputs
  • Core differentiator and defensible advantage

Orchestrate APIs (default)

  • Faster time to market
  • Lower upfront cost and maintenance
  • Leverage the best models continuously
  • Focus on product, UX, and the data flywheel
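
"Lower total cost at scale" can be checked with a rough break-even estimate. The sketch below assumes a flat per-request price on both sides and spreads the one-time fine-tuning cost evenly over a planning horizon. The function name, the parameters, and every number in the example are hypothetical placeholders, not benchmarks.

```python
import math
from typing import Optional


def break_even_requests_per_month(
    upfront_cost: float,           # one-time: data labeling, training runs, initial evals
    monthly_fixed_cost: float,     # recurring: hosting, monitoring, re-evaluation
    api_cost_per_request: float,
    tuned_cost_per_request: float,
    horizon_months: int,
) -> Optional[int]:
    """Monthly volume above which the fine-tuned model is cheaper, or None if never."""
    if horizon_months <= 0:
        raise ValueError("horizon_months must be positive")
    saving_per_request = api_cost_per_request - tuned_cost_per_request
    if saving_per_request <= 0:
        return None  # the tuned model never pays back on cost alone
    monthly_overhead = upfront_cost / horizon_months + monthly_fixed_cost
    return math.ceil(monthly_overhead / saving_per_request)


# Illustrative numbers only.
volume = break_even_requests_per_month(
    upfront_cost=30_000,
    monthly_fixed_cost=2_500,
    api_cost_per_request=0.010,
    tuned_cost_per_request=0.004,
    horizon_months=12,
)
print(volume)
```

With these placeholder inputs the break-even sits at roughly 833,000 requests per month. Below that volume the API is cheaper on cost alone, so any case for fine-tuning would have to rest on accuracy, reliability, or defensibility instead.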

Expensive traps to avoid

  • Fine-tuning too early — locking in cost and complexity before validating the need
  • Chasing small gains — tiny accuracy bumps that do not justify ongoing spend
  • Ignoring total cost of ownership — data labeling, training infra, evals, and monitoring add up
  • Owning the maintenance burden — models drift; you own updates, evals, and incident response
  • Overfitting to current needs — today's data may not generalize to tomorrow's use cases

The golden rule

Orchestrate first. Evaluate deeply. Fine-tune only when there is clear evidence of ROI in accuracy, cost, reliability, or defensibility.
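
"Evaluate deeply" can start as a small harness that runs any candidate (an API call, a prompt variant, or a fine-tuned model) over the same labeled examples and reports accuracy, failures, latency, and cost. The sketch below makes several assumptions: model_fn is a hypothetical callable that takes a prompt and returns text, scoring is exact match, and cost is a flat per-call price, whereas real APIs typically bill per token.

```python
import math
import time
from typing import Callable, Dict, List, Tuple


def evaluate(
    model_fn: Callable[[str], str],
    examples: List[Tuple[str, str]],  # (prompt, expected output) pairs
    cost_per_call_usd: float,
) -> Dict[str, float]:
    if not examples:
        raise ValueError("Need at least one labeled example.")
    correct = 0
    failures = 0
    latencies: List[float] = []
    for prompt, expected in examples:
        start = time.perf_counter()
        try:
            output = model_fn(prompt)
        except Exception:
            failures += 1  # a failed call counts as an incorrect answer
            continue
        finally:
            latencies.append(time.perf_counter() - start)
        if output.strip() == expected.strip():
            correct += 1
    latencies.sort()
    p95_index = math.ceil(0.95 * len(latencies)) - 1  # nearest-rank percentile
    return {
        "accuracy": correct / len(examples),
        "failures": failures,
        "p95_latency_s": latencies[p95_index],
        "total_cost_usd": len(examples) * cost_per_call_usd,
    }


# Stand-in for a real model call, used only to exercise the harness.
def fake_model(prompt: str) -> str:
    if prompt == "boom":
        raise RuntimeError("simulated provider error")
    return prompt.upper()


report = evaluate(
    fake_model,
    [("a", "A"), ("b", "B"), ("boom", "X"), ("c", "wrong")],
    cost_per_call_usd=0.01,
)
```

Running the same examples through each candidate keeps the comparison like for like, which is what makes a later ROI claim credible.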