
Why retrieval beats fine-tuning when the corpus is alive.

Chapter 4 of 12
Lesson notes

When teams first reach for fine-tuning, they're usually answering the wrong question. The right question is rarely "how do I teach a model new facts?" — it's almost always "how do I keep the model honest about facts that change?"

The shape of the problem

If your knowledge base updates once a year, fine-tuning is fine. If it updates more than once a quarter, you'll spend the rest of your career re-training. Retrieval skips that loop entirely: index the new document, ship the change in seconds.
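
To make "index the new document, ship the change" concrete, here is a minimal sketch of an update against a live index. It assumes a hypothetical in-memory `LiveIndex` keyed by document id and a toy keyword-overlap score; a production system would use a vector store, but the point is the same: replacing a stale fact is one write, not a training run.

```python
import re


def tokenize(text):
    """Lowercase word/number tokens as a set (toy scoring only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LiveIndex:
    """Hypothetical in-memory index keyed by document id."""

    def __init__(self):
        self.docs = {}  # doc_id -> (text, token set)

    def upsert(self, doc_id, text):
        # Overwriting the entry is the whole "deployment": no training step.
        self.docs[doc_id] = (text, tokenize(text))

    def search(self, query, k=1):
        # Rank documents by how many query tokens they share.
        q = tokenize(query)
        ranked = sorted(self.docs.values(), key=lambda d: len(q & d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


index = LiveIndex()
index.upsert("fees", "The wire transfer fee is 25 USD.")
before = index.search("What is the wire transfer fee?")

index.upsert("fees", "The wire transfer fee is 30 USD.")  # the policy changed
after = index.search("What is the wire transfer fee?")

print(before, after)
```

The stale sentence is gone the moment the second `upsert` returns, which is the property a fine-tuned model cannot offer without another training cycle.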

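# A minimal retrieval loop
# (embed, index, rerank, template, and llm are placeholders; chunks are assumed to be strings)
q_emb   = embed(query)                # encode the query as a vector
hits    = index.search(q_emb, k=8)    # over-fetch 8 candidates
top     = rerank(hits, query)[:3]     # keep the 3 best after re-ranking
context = "\n\n".join(top)            # join chunks so the prompt gets text, not a list repr
prompt  = template.format(context=context, q=query)
answer  = llm.complete(prompt)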
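
The loop above leaves every component undefined, so here is a runnable toy version of the same steps up to prompt construction. Everything in it is an assumption for illustration: `embed` is a bag-of-words count vector rather than a real embedding model, `Index` is a brute-force in-memory search, `rerank` is a term-overlap sort standing in for a cross-encoder, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

DOCS = [
    "Refunds are processed within 14 days of the return request.",
    "Shipping to the EU takes 3 to 5 business days.",
    "Gift cards cannot be refunded or exchanged for cash.",
    "Support is available by email on weekdays.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy embedding: token counts (stand-in for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Index:
    """Brute-force in-memory index: scores every stored document per query."""

    def __init__(self):
        self.items = []  # list of (text, embedding)

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, q_emb, k):
        scored = sorted(self.items, key=lambda it: cosine(q_emb, it[1]), reverse=True)
        return [text for text, _ in scored[:k]]


def rerank(hits, query):
    """Toy re-ranker: order hits by the number of distinct query terms they contain."""
    q_terms = set(tokenize(query))
    return sorted(hits, key=lambda h: len(q_terms & set(tokenize(h))), reverse=True)


def build_prompt(query, index, k=8, keep=3):
    hits = index.search(embed(query), k=k)
    context = "\n".join(rerank(hits, query)[:keep])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


index = Index()
for doc in DOCS:
    index.add(doc)

prompt = build_prompt("How long do refunds take?", index)
print(prompt)
```

Note that the gift-card document does not match on "refunds" because the toy tokenizer does no stemming; a real embedding model would likely place "refunded" and "refunds" close together, which is part of what you are paying for.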

What you give up

Latency, mostly — retrieval adds 80–200ms before the model starts generating. You also surrender some elegance: now you have an index, a re-ranker, and a prompt template to maintain. The tradeoff is almost always worth it. Almost.
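
Before accepting or rejecting that latency cost, it helps to measure where the milliseconds actually go. The sketch below times each pre-generation stage with a small context manager. The stage names and the `time.sleep` calls are placeholders for your real embed, search, and re-rank calls, not measurements from any system.

```python
import time
from contextlib import contextmanager

timings_ms = {}


@contextmanager
def timed(stage):
    """Record the wall-clock duration of a stage in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[stage] = (time.perf_counter() - start) * 1000.0


with timed("embed"):
    time.sleep(0.01)  # placeholder for the embedding call
with timed("search"):
    time.sleep(0.02)  # placeholder for the index lookup
with timed("rerank"):
    time.sleep(0.03)  # placeholder for the re-ranker

# Total overhead added before the model can start generating.
retrieval_overhead_ms = sum(timings_ms.values())

for stage, ms in timings_ms.items():
    print(f"{stage:>7}: {ms:6.1f} ms")
print(f"  total: {retrieval_overhead_ms:6.1f} ms")
```

Per-stage numbers tell you which component to optimize first, for example whether a smaller `k` or a lighter re-ranker would buy back the most time.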

Figure · Diagram: retrieval pipeline with vector store, BM25 hybrid, cross-encoder re-rank, LLM generator.

Three places this breaks: (1) the corpus has no clean chunks, e.g. long-form contracts; (2) the queries don't look like the documents, a mismatch search engines famously suffer from; (3) you need the model to generalize, not recite, and retrieval can't help you with that.

Practice · 3 questions

Three short exercises.

01

A bank's compliance team updates regulatory PDFs weekly. Choose retrieval or fine-tuning. Defend.

Hint · Think about the cost of a single stale answer.

02

Your queries are all single tokens ("PE ratio", "LIBOR"). Why might semantic search underperform here?

Hint · What's the dimensionality of a 1-token query embedding?

03

Sketch the cost curve for retrieval vs. fine-tuning across 1M, 10M, and 100M token corpora.

Hint · Re-training cost is roughly linear in corpus size; retrieval is roughly constant per query.
