fix(embed): size embedding pass for local Ollama (#20) #21
Resolves #20.
Summary
- Replace `pub const EMBED_INPUT_CAP_BYTES` / `EMBED_BATCH_SIZE` with an env-readable `EmbedConfig`. Default cap drops from 24 KiB → 6 KiB, sized for `nomic-embed-text` at Ollama's default `num_ctx=2048`.
- New env vars: `AR_EMBED_INPUT_CAP_BYTES`, `AR_EMBED_BATCH_SIZE`, `AR_EMBED_NUM_CTX`. Empty, unparseable, or zero values fall back to the default with a warn log (matches the existing `parse_env` pattern in the gateway).
- Send `options.num_ctx` when set, so a raised byte cap actually reaches the embedder when pointed at Ollama. Hosted OpenAI ignores extra fields.
- Add a `tracing::debug!` log with cap, batch size, max input bytes, and truncation count.

Why
For the documented dev setup (qwen3-coder reasoning + nomic-embed-text on Ollama, M-series Mac), every symbol with a snippet >~8 KiB exhausted the embedder's context window. The server silently truncated, RAG retrieval ranked worse on exactly the long functions where context matters, and the local embedder spent compute on bytes it couldn't actually use.
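The fallback rule in the summary (empty / unparseable / 0 falls back to the default with a warn log) can be sketched as below. This is a minimal sketch, not the PR's code: the field names follow the summary, the 6 KiB cap matches the stated default, but the batch-size default of 16 is purely illustrative, and the warn log is elided.

```rust
/// Env-readable embedding config. Field names follow the PR summary;
/// the batch-size default here is an assumption for illustration.
#[derive(Debug, Clone)]
pub struct EmbedConfig {
    pub input_cap_bytes: usize,
    pub batch_size: usize,
    pub num_ctx: Option<u32>,
}

impl Default for EmbedConfig {
    fn default() -> Self {
        Self { input_cap_bytes: 6 * 1024, batch_size: 16, num_ctx: None }
    }
}

/// Empty, unparseable, or zero values fall back to `default`
/// (the real code also emits a warn log at this point).
fn parse_usize_or(raw: Option<&str>, default: usize) -> usize {
    match raw.and_then(|v| v.trim().parse::<usize>().ok()) {
        Some(n) if n > 0 => n,
        _ => default,
    }
}

impl EmbedConfig {
    pub fn from_env() -> Self {
        let d = Self::default();
        let get = |key: &str| std::env::var(key).ok();
        Self {
            input_cap_bytes: parse_usize_or(
                get("AR_EMBED_INPUT_CAP_BYTES").as_deref(),
                d.input_cap_bytes,
            ),
            batch_size: parse_usize_or(get("AR_EMBED_BATCH_SIZE").as_deref(), d.batch_size),
            // Unset, invalid, or 0 means "don't send options.num_ctx at all".
            num_ctx: get("AR_EMBED_NUM_CTX")
                .and_then(|v| v.trim().parse::<u32>().ok())
                .filter(|&n| n > 0),
        }
    }
}
```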
Test plan
- `cargo test --workspace --lib` (873 passed)
- `cargo clippy --workspace --all-targets -- -D warnings` (clean)
- `cargo fmt --check` (clean)
- Unit tests: `options.num_ctx` serialised when set / omitted when unset.
- Manual: with `LLM_EMBEDDING_MODEL=nomic-embed-text` and `AR_EMBED_NUM_CTX=4096`, verify embed POSTs include `options.num_ctx` in the body via tcpdump/proxy.

🤖 Generated with Claude Code
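The set/unset behaviour those unit tests exercise can be sketched without any dependencies. The real `EmbedRequest` presumably relies on serde's `#[serde(skip_serializing_if = "Option::is_none")]`; the hand-rolled body below just makes the rule visible, and the function name is illustrative, not the PR's API.

```rust
/// Hand-rolled sketch of the embed request body: `options.num_ctx` is
/// serialised only when set, so hosted OpenAI never sees the field.
/// (`embed_request_body` is an illustrative name, not the PR's code.)
fn embed_request_body(model: &str, input: &str, num_ctx: Option<u32>) -> String {
    let mut body = format!(r#"{{"model":"{model}","input":"{input}""#);
    if let Some(n) = num_ctx {
        // Ollama reads options.num_ctx; omitting the field entirely when
        // unset keeps the body identical to what hosted OpenAI expects.
        body.push_str(&format!(r#","options":{{"num_ctx":{n}}}"#));
    }
    body.push('}');
    body
}
```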
This PR refactors the embedding pass to make the input cap and batch size configurable via environment variables, improving flexibility for different embedder backends like Ollama. It also adds support for sending `options.num_ctx` to Ollama to prevent silent truncation. The changes include new configuration handling, updated tests, and documentation updates.

Walkthrough

- Replaces `EMBED_INPUT_CAP_BYTES` and `EMBED_BATCH_SIZE` with a configurable `EmbedConfig` struct that reads from environment variables.
- New environment variables `AR_EMBED_INPUT_CAP_BYTES`, `AR_EMBED_BATCH_SIZE`, and `AR_EMBED_NUM_CTX` are introduced to control embedding behavior.
- `OpenAiProvider` is updated to include `options.num_ctx` in embedding requests when configured, which helps prevent silent truncation by Ollama.
- Documentation in `QUICKSTART.md` and `auto_review.env.example` is updated to reflect the new environment variables and their usage.

Pre-merge checks
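The review below centres on `truncate_at_char_boundary`. For reference, a char-boundary truncation helper of that shape might look like this stdlib-only sketch; it is a guess at the shape, not the crate's exact code.

```rust
/// Cut `s` to at most `max_bytes`, backing up to the previous UTF-8 char
/// boundary so a multi-byte character is never split mid-sequence.
/// Sketch only; the real function may differ in edge-case handling.
fn truncate_at_char_boundary(mut s: String, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // is_char_boundary is O(1); this loop backs up at most 3 bytes.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    s.truncate(end);
    s
}
```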
@ -64,39 +146,66 @@ fn truncate_at_char_boundary(s: String, max_bytes: usize) -> String {

🟡 Warning: Lines 146–150: The `tracing::debug!` log now includes `truncated_in_pass`, which tracks truncation across the entire pass, but this count is not reset between batches, potentially leading to misleading logs.

🟡 Warning: Lines 146–150: The
`tracing::debug!` log includes `max_input_bytes`, which is computed as the maximum length of strings in the current batch, but this may not reflect the actual maximum input size across all batches.

Accepted, fixed in 1192ab9. Was a process-wide running total surfaced inside a per-batch tracing log: the scope mismatch you flagged. Replaced with `truncated_in_batch`: pre-compute a parallel `Vec<bool>`, slice it alongside the snippet chunks, count per chunk. The log line is now scope-consistent (`cap_bytes`, `batch_size`, `max_input_bytes`, `truncated_in_batch` all per-batch).

Declined, intentional. `max_input_bytes` is per-batch by design. Issue #20's spec calls for "the largest input size in the batch", and the field name + doc-comment commit to that scope. A per-pass max would obscure exactly the diagnostic operators want ("which batch had the giant input"); per-batch lets you scan the log and see the offender.

@ -285,16 +399,18 @@ mod tests {

🟡 Warning: Lines 399–402: The test
`oversized_symbol_snippet_is_truncated_before_embed_call` uses `cfg.input_cap_bytes`, which is now configurable, but the test assumes a default value of 6 KiB. This could be made more robust by using a configurable default or explicit test values.

Accepted, fixed in 1192ab9. Replaced `EmbedConfig::default()` with an explicit `const CAP: usize = 1024` so the regression target is stated directly and stays valid if the default ever moves. The test reads `cfg.input_cap_bytes` only via the explicit `cfg` it constructs, not via `default()`.

Addressed in 1192ab9; replied to review feedback in the commit message: per-batch truncation count (#1845) and explicit cap in test (#1847) accepted; max_input_bytes scope (#1846) intentionally per-batch per the issue spec.

1192ab9a9c1530aeab66

This PR refactors the embedding pass to be more configurable and robust, especially for local Ollama setups. It introduces `EmbedConfig` with environment variable overrides, improves truncation handling, and ensures that Ollama's `num_ctx` option is properly sent to avoid silent truncation. The changes also include better test coverage for the new configuration logic.

Walkthrough

- Replaces `EMBED_INPUT_CAP_BYTES` and `EMBED_BATCH_SIZE` with a configurable `EmbedConfig` struct that reads from environment variables.
- The `EmbedConfig` struct allows operators to tune the byte cap and batch size for different embedding backends, particularly local Ollama models.
- A `with_embed_num_ctx` method is added to `OpenAiProvider` to explicitly send `options.num_ctx` to Ollama, preventing silent truncation.
- The `EmbedRequest` struct now conditionally serializes the `options` field to maintain compatibility with hosted OpenAI.
- Documentation in `QUICKSTART.md` and `auto_review.env.example` has been updated to reflect the new environment variables and their usage.

Pre-merge checks
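The `truncated_in_batch` fix accepted in the review exchange (pre-compute a parallel `Vec<bool>`, slice it alongside the snippet chunks, count per chunk) might be shaped like this sketch; the function name and the cap/batch parameters are illustrative, not the PR's code.

```rust
/// Per-batch truncation accounting: mark each input that exceeds the cap,
/// then count flags chunk by chunk so each batch's debug log reports only
/// its own truncations rather than a process-wide running total.
fn truncated_per_batch(inputs: &[String], cap_bytes: usize, batch_size: usize) -> Vec<usize> {
    let was_truncated: Vec<bool> = inputs.iter().map(|s| s.len() > cap_bytes).collect();
    was_truncated
        .chunks(batch_size)
        .map(|chunk| chunk.iter().filter(|&&t| t).count())
        .collect()
}
```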