Embedding pipeline sends too much context for local Ollama embedders #20
Problem
The embedding pipeline sizes its per-input cap for hosted OpenAI `text-embedding-3-*` models (~8191 tokens / ~32 KiB of English). When the embedding tier is pointed at a local Ollama model — e.g. `nomic-embed-text` (default 2048 tokens / ~8 KiB context) — we routinely send inputs larger than the model's context window. The model silently truncates (or, depending on the Ollama version, errors), embedding quality degrades, and the local embedder spends much more compute than it needs to.

Current behaviour
- `crates/ar-index/src/embed.rs:42-52`: `EMBED_INPUT_CAP_BYTES = 24 * 1024` is hard-coded and explicitly justified by the OpenAI 8191-token limit in the doc comment (see the sketch below).
- `EMBED_BATCH_SIZE = 32` is the only batching control; there is no aggregate token/byte cap per request.
- Truncation via `truncate_at_char_boundary` happens at a flat byte boundary that is wildly wrong for embedders with smaller context windows (`nomic-embed-text`'s default `num_ctx=2048` is roughly 8 KiB, and we feed it up to 24 KiB).
- `crates/ar-llm/src/openai.rs` (the OpenAI-shaped client used for Ollama) sends no `options.num_ctx` and no Ollama-specific tuning; the per-input cap is the only guardrail.
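For orientation, a minimal sketch of that current shape: the constant names and values are taken from this issue, while the body of `truncate_at_char_boundary` is approximated for illustration and is not the actual code in `embed.rs`.

```rust
// Illustration of the current shape described above; constant names and values
// are from this issue, but the truncation helper's body is approximated rather
// than copied from embed.rs.

/// Hard-coded per-input cap, sized for OpenAI's ~8191-token limit
/// (~32 KiB of English), regardless of which embedder is configured.
pub const EMBED_INPUT_CAP_BYTES: usize = 24 * 1024;

/// The only batching control; no aggregate byte/token cap per request.
pub const EMBED_BATCH_SIZE: usize = 32;

/// Cuts `s` at the largest char boundary at or below `max_bytes`.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```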
Why this matters

With a typical local setup (`qwen3-coder:30b` for reasoning + `nomic-embed-text` for embedding on Ollama), every symbol whose snippet exceeds ~8 KiB exhausts the embedder's context window. The model emits an embedding that doesn't represent the full snippet, so RAG retrieval ranks worse on exactly the long functions where context matters most.

Proposed direction
- Make `EMBED_INPUT_CAP_BYTES` and `EMBED_BATCH_SIZE` configurable rather than `pub const`. Reasonable env-var names: `AR_EMBED_INPUT_CAP_BYTES`, `AR_EMBED_BATCH_SIZE` (see the sketch after this list).
- Add a hook on `LlmProvider` (e.g. `embedding_context_window()` returning `Option<usize>`) so the embed pass can size the cap from the actual model config when known.
- When talking to Ollama (`localhost:11434` or any non-OpenAI host), pass `options.num_ctx` explicitly so the server doesn't fall back to its default 2048 silently.
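A rough sketch of how the three pieces could fit together. The env-var names, the `LlmProvider` trait, and `embedding_context_window()` come from this issue; `EmbedConfig`, `env_usize`, `ollama_embed_body`, and the ~4-bytes-per-token heuristic are hypothetical illustration, not the existing code.

```rust
// Hypothetical sketch only. The AR_EMBED_* env vars, the LlmProvider trait
// name, and embedding_context_window() are taken from this issue; EmbedConfig,
// env_usize, ollama_embed_body, and the ~4 bytes/token heuristic are invented
// for illustration.
use std::env;

pub trait LlmProvider {
    // ...existing chat/embedding methods elided...

    /// New hook: context window of the embedding model, if the provider
    /// knows it (e.g. from the Ollama model config). `None` = unknown.
    fn embedding_context_window(&self) -> Option<usize> {
        None
    }
}

pub struct EmbedConfig {
    pub input_cap_bytes: usize,
    pub batch_size: usize,
}

impl EmbedConfig {
    /// Resolve the cap: explicit env override > model context window > old default.
    pub fn resolve(provider: &dyn LlmProvider) -> Self {
        let default_cap = provider
            .embedding_context_window()
            // crude tokens -> bytes estimate (~4 bytes/token for English text)
            .map(|tokens| tokens.saturating_mul(4))
            .unwrap_or(24 * 1024);

        Self {
            input_cap_bytes: env_usize("AR_EMBED_INPUT_CAP_BYTES").unwrap_or(default_cap),
            batch_size: env_usize("AR_EMBED_BATCH_SIZE").unwrap_or(32),
        }
    }
}

fn env_usize(key: &str) -> Option<usize> {
    env::var(key).ok()?.parse().ok()
}

/// For Ollama endpoints, send `options.num_ctx` explicitly; Ollama's native
/// `/api/embed` accepts an `options` map, so the server doesn't silently
/// fall back to its 2048-token default.
fn ollama_embed_body(model: &str, inputs: &[String], num_ctx: usize) -> serde_json::Value {
    serde_json::json!({
        "model": model,
        "input": inputs,
        "options": { "num_ctx": num_ctx }
    })
}
```

The precedence shown (explicit env override, then model-reported context window, then the old 24 KiB default) is one possible resolution order; the main point is that the cap stops being a compile-time constant tied to OpenAI's limits.

Out of scope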