fix(embed): size embedding pass for local Ollama (#20) #21

Merged
jwilger merged 2 commits from fix/embed-issue-20-config into main 2026-05-01 20:36:18 -07:00
Owner

Resolves #20.

Summary

  • Replace pub const EMBED_INPUT_CAP_BYTES / EMBED_BATCH_SIZE with an env-readable EmbedConfig. Default cap drops from 24 KiB → 6 KiB, sized for nomic-embed-text at Ollama's default num_ctx=2048.
  • Add AR_EMBED_INPUT_CAP_BYTES, AR_EMBED_BATCH_SIZE, AR_EMBED_NUM_CTX. Empty / unparseable / 0 fall back to the default with a warn log (matches the existing parse_env pattern in the gateway).
  • The OpenAI-compat embed client now serialises options.num_ctx when set, so a raised byte cap actually reaches the embedder when pointed at Ollama. Hosted OpenAI ignores extra fields.
  • Per-batch tracing::debug! log with cap, batch size, max input bytes, and truncation count.
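
The env-readable config described above can be sketched roughly as follows. This is a std-only illustration, not the PR's actual code: field names (`input_cap_bytes`, `batch_size`, `num_ctx`) and the helper `parse_or_default` are assumptions, and the real code emits a `tracing::warn!` where the comment notes it.

```rust
use std::env;

/// Sketch of the env-readable config (field names assumed, not the PR's).
#[derive(Debug, PartialEq)]
struct EmbedConfig {
    input_cap_bytes: usize,  // default 6 KiB
    batch_size: usize,       // default 32
    num_ctx: Option<usize>,  // only sent to the embedder when set
}

impl Default for EmbedConfig {
    fn default() -> Self {
        Self { input_cap_bytes: 6 * 1024, batch_size: 32, num_ctx: None }
    }
}

/// Empty, unparseable, or zero values fall back to the default;
/// the real code logs a tracing::warn! before falling back.
fn parse_or_default(raw: Option<&str>, default: usize) -> usize {
    match raw.and_then(|v| v.trim().parse::<usize>().ok()) {
        Some(n) if n > 0 => n,
        _ => default,
    }
}

impl EmbedConfig {
    fn from_env() -> Self {
        let d = Self::default();
        let get = |name: &str| env::var(name).ok();
        Self {
            input_cap_bytes: parse_or_default(
                get("AR_EMBED_INPUT_CAP_BYTES").as_deref(),
                d.input_cap_bytes,
            ),
            batch_size: parse_or_default(
                get("AR_EMBED_BATCH_SIZE").as_deref(),
                d.batch_size,
            ),
            num_ctx: get("AR_EMBED_NUM_CTX")
                .and_then(|v| v.trim().parse::<usize>().ok())
                .filter(|&n| n > 0),
        }
    }
}

fn main() {
    // Valid values win; empty, zero, and unset all fall through.
    assert_eq!(parse_or_default(Some("8192"), 6 * 1024), 8192);
    assert_eq!(parse_or_default(Some(""), 6 * 1024), 6 * 1024);
    assert_eq!(parse_or_default(Some("0"), 6 * 1024), 6 * 1024);
    assert_eq!(parse_or_default(None, 6 * 1024), 6 * 1024);
    println!("{:?}", EmbedConfig::from_env());
}
```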

Why

For the documented dev setup (qwen3-coder reasoning + nomic-embed-text on Ollama, M-series Mac), every symbol with a snippet >~8 KiB exhausted the embedder's context window. The server silently truncated, RAG retrieval ranked worse on exactly the long functions where context matters, and the local embedder spent compute on bytes it couldn't actually use.

Test plan

  • [x] cargo test --workspace --lib (873 passed)
  • [x] cargo clippy --workspace --all-targets -- -D warnings (clean)
  • [x] cargo fmt --check (clean)
  • [x] New unit tests cover: default values, env-var override, fall-through on empty/unparseable/0, custom cap honoured at runtime, custom batch size honoured at runtime, options.num_ctx serialised when set / omitted when unset.
  • [ ] Manual: point gateway at a local Ollama with LLM_EMBEDDING_MODEL=nomic-embed-text and AR_EMBED_NUM_CTX=4096, verify embed POSTs include options.num_ctx in the body via tcpdump/proxy.

🤖 Generated with Claude Code

fix(embed): size embedding pass for local Ollama (#20)
All checks were successful
CI / Nix flake check (pull_request) Successful in 1m44s
auto_review auto_review: 3 warnings
52becc0b44
Defaults assumed hosted OpenAI's 8191-token / ~32 KiB ceiling and
hard-coded the per-input cap accordingly. Pointed at Ollama's
nomic-embed-text (default num_ctx=2048, ~8 KiB), the pipeline
silently truncated server-side, embedding quality dropped on long
symbols, and the local embedder spent extra wall-clock on inputs
the model couldn't actually process.

- Replace pub const EMBED_INPUT_CAP_BYTES / EMBED_BATCH_SIZE with
  EmbedConfig (env-readable). Defaults drop to 6 KiB cap / batch
  32 — safe for nomic-embed-text. Operators on hosted embedders
  raise via AR_EMBED_INPUT_CAP_BYTES.
- AR_EMBED_BATCH_SIZE for batch tuning. Empty / unparseable / 0
  fall back to the default with a warn log.
- AR_EMBED_NUM_CTX wires options.num_ctx onto the OpenAI-compat
  /v1/embeddings POST; honoured by Ollama, ignored by hosted
  OpenAI. Lets a raised byte cap actually reach the embedder.
- Per-batch tracing::debug! log with the cap, batch size, max
  input bytes, and truncation count — so the next person
  debugging "why is RAG returning garbage on local Ollama" sees
  truncation immediately.
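
The "honoured by Ollama, ignored by hosted OpenAI" behaviour hinges on the options field only appearing in the request body when configured. A std-only sketch of that conditional inclusion (the actual client builds the body with a serde struct; the hand-assembled JSON and the function name here are illustrative):

```rust
/// Build an /v1/embeddings request body; "options":{"num_ctx":N} is
/// appended only when configured, so hosted OpenAI never sees the field.
fn embed_request_body(model: &str, inputs: &[&str], num_ctx: Option<usize>) -> String {
    // Debug-format each &str to get a quoted, escaped JSON string.
    let inputs_json: Vec<String> = inputs.iter().map(|s| format!("{s:?}")).collect();
    let mut body = format!(
        "{{\"model\":{model:?},\"input\":[{}]",
        inputs_json.join(",")
    );
    if let Some(n) = num_ctx {
        body.push_str(&format!(",\"options\":{{\"num_ctx\":{n}}}"));
    }
    body.push('}');
    body
}

fn main() {
    // Pointed at Ollama with AR_EMBED_NUM_CTX set: field present.
    let local = embed_request_body("nomic-embed-text", &["fn main() {}"], Some(4096));
    assert!(local.contains("\"options\":{\"num_ctx\":4096}"));

    // Hosted OpenAI with the knob unset: field absent entirely.
    let hosted = embed_request_body("text-embedding-3-small", &["hello"], None);
    assert!(!hosted.contains("options"));

    println!("{local}");
}
```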

QUICKSTART + systemd env example document the new knobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
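
An illustrative env-file fragment for the new knobs (values are examples, not recommendations):

```shell
# Raise the per-input byte cap past the 6 KiB default (hosted embedders)
AR_EMBED_INPUT_CAP_BYTES=12288
# Tune batch size for a memory-tight box
AR_EMBED_BATCH_SIZE=16
# Forwarded as options.num_ctx on the embed POST; honoured by Ollama
AR_EMBED_NUM_CTX=4096
```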
auto-review left a comment

This PR refactors the embedding pass to make the input cap and batch size configurable via environment variables, improving flexibility for different embedder backends like Ollama. It also adds support for sending options.num_ctx to Ollama to prevent silent truncation. The changes include new configuration handling, updated tests, and documentation updates.

Walkthrough

  • The PR replaces hardcoded constants EMBED_INPUT_CAP_BYTES and EMBED_BATCH_SIZE with a configurable EmbedConfig struct that reads from environment variables.
  • New environment variables AR_EMBED_INPUT_CAP_BYTES, AR_EMBED_BATCH_SIZE, and AR_EMBED_NUM_CTX are introduced to control embedding behavior.
  • The OpenAiProvider is updated to include options.num_ctx in embedding requests when configured, which helps prevent silent truncation by Ollama.
  • Tests are added to verify the new configuration logic and ensure that the embedding pass behaves correctly with custom settings.
  • Documentation in QUICKSTART.md and auto_review.env.example is updated to reflect the new environment variables and their usage.

Pre-merge checks

  • [ ] CHANGELOG updated — non-trivial code changed but CHANGELOG.md isn't in the diff
  • [ ] Tests touched — source changed but no test file appears in the diff
  • [x] No new TODO/FIXME comments — no new TODO/FIXME markers
@ -64,39 +146,66 @@ fn truncate_at_char_boundary(s: String, max_bytes: usize) -> String {
out
Collaborator

🟡 Warning: Lines 146–150: The tracing::debug! log now includes truncated_in_pass which tracks truncation across the entire pass, but this count is not reset between batches, potentially leading to misleading logs.

Collaborator

🟡 Warning: Lines 146–150: The tracing::debug! log includes max_input_bytes which is computed as the maximum length of strings in the current batch, but this may not reflect the actual maximum input size across all batches.

Author
Owner

Accepted, fixed in 1192ab9. Was a process-wide running total surfaced inside a per-batch tracing log — the scope mismatch you flagged. Replaced with truncated_in_batch: pre-compute a parallel Vec<bool>, slice it alongside the snippet chunks, count per chunk. The log line is now scope-consistent (cap_bytes, batch_size, max_input_bytes, truncated_in_batch all per-batch).

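
The fix described in the reply above — a parallel truncation-flag vector sliced alongside the snippet chunks — can be sketched like this (helper and variable names assumed; the real code emits `tracing::debug!` where the comment notes it):

```rust
/// Cap a snippet at `max_bytes`, backing up to a char boundary so the
/// slice never splits a multi-byte character.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    let cap = 8;
    let batch_size = 2;
    let snippets = ["short", "long enough to truncate", "also very long input"];

    // Pre-compute a parallel Vec<bool> once for the whole pass.
    let capped: Vec<&str> =
        snippets.iter().map(|s| truncate_at_char_boundary(s, cap)).collect();
    let truncated: Vec<bool> = snippets.iter().map(|s| s.len() > cap).collect();

    // Slice the flags alongside the snippet chunks; count per chunk, so
    // every value in the log line is scoped to the same batch.
    for (chunk, flags) in capped.chunks(batch_size).zip(truncated.chunks(batch_size)) {
        let truncated_in_batch = flags.iter().filter(|&&t| t).count();
        let max_input_bytes = chunk.iter().map(|s| s.len()).max().unwrap_or(0);
        // real code: tracing::debug!(cap, batch_size, max_input_bytes, truncated_in_batch)
        println!(
            "batch_size={} max_input_bytes={max_input_bytes} truncated_in_batch={truncated_in_batch}",
            chunk.len()
        );
    }
}
```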
Author
Owner

Declined, intentional. max_input_bytes is per-batch by design. Issue #20's spec calls for "the largest input size in the batch", and the field name + doc-comment commit to that scope. A per-pass max would obscure exactly the diagnostic operators want ("which batch had the giant input"); per-batch lets you scan the log and see the offender.

jwilger marked this conversation as resolved
@ -285,16 +399,18 @@ mod tests {
);
Collaborator

🟡 Warning: Lines 399–402: The test oversized_symbol_snippet_is_truncated_before_embed_call uses cfg.input_cap_bytes which is now configurable, but the test assumes a default value of 6 KiB. This could be made more robust by using a configurable default or explicit test values.

Author
Owner

Accepted, fixed in 1192ab9. Replaced EmbedConfig::default() with explicit const CAP: usize = 1024 so the regression target is stated directly and stays valid if the default ever moves. Test reads cfg.input_cap_bytes only via the explicit cfg it constructs, not via default().

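
The shape of the pinned test, roughly (names and the truncation helper are illustrative, not the repository's actual test code):

```rust
// Explicit regression target: stays valid even if EmbedConfig's
// default cap changes later.
const CAP: usize = 1024;

/// Minimal stand-in for the pass's truncation step.
fn truncate(s: &str, cap: usize) -> &str {
    let mut end = cap.min(s.len());
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    let oversized = "x".repeat(CAP * 4);
    let sent = truncate(&oversized, CAP);
    // Asserts against the explicit constant, never against default().
    assert_eq!(sent.len(), CAP);
    println!("truncated {} bytes down to {}", oversized.len(), sent.len());
}
```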
jwilger marked this conversation as resolved
fix(embed): per-batch truncation count + explicit cap in test
All checks were successful
CI / Nix flake check (pull_request) Successful in 1m45s
auto_review auto_review: no findings
1192ab9a9c
Address review feedback on PR #21:

- truncated_in_pass was a process-wide running total surfaced
  inside a per-batch tracing log, so an operator reading two
  consecutive lines would see the same number rise across
  batches without any way to attribute the rise to a particular
  batch. Pre-compute a parallel Vec<bool>, slice it alongside
  the snippet chunks, and emit truncated_in_batch — the value
  now agrees in scope with cap_bytes / batch_size /
  max_input_bytes already in the same log line.
- oversized_symbol_snippet_is_truncated_before_embed_call read
  EmbedConfig::default() and asserted against it; the test
  passed for the wrong reason if the default ever changed.
  Pin to an explicit CAP=1024 so the regression target stays
  stable independent of the default.

(max_input_bytes is intentionally per-batch — the issue spec
asks for "the largest input size in the batch" — so left
unchanged.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Owner

Addressed in 1192ab9: replied to review feedback in commit message — per-batch truncation count (#1845) and explicit cap in test (#1847) accepted; max_input_bytes scope (#1846) intentionally per-batch per the issue spec.

jwilger force-pushed fix/embed-issue-20-config from 1192ab9a9c
All checks were successful
CI / Nix flake check (pull_request) Successful in 1m45s
auto_review auto_review: no findings
to 1530aeab66
All checks were successful
CI / Nix flake check (pull_request) Successful in 4s
auto_review auto_review: no findings
2026-05-01 20:17:06 -07:00
Compare
auto-review left a comment

This PR refactors the embedding pass to be more configurable and robust, especially for local Ollama setups. It introduces EmbedConfig with environment variable overrides, improves truncation handling, and ensures that Ollama's num_ctx option is properly sent to avoid silent truncation. The changes also include better test coverage for the new configuration logic.

Walkthrough

  • The PR replaces fixed constants EMBED_INPUT_CAP_BYTES and EMBED_BATCH_SIZE with a configurable EmbedConfig struct that reads from environment variables.
  • The EmbedConfig struct allows operators to tune the byte cap and batch size for different embedding backends, particularly local Ollama models.
  • A new with_embed_num_ctx method is added to OpenAiProvider to explicitly send options.num_ctx to Ollama, preventing silent truncation.
  • The EmbedRequest struct now conditionally serializes the options field to maintain compatibility with hosted OpenAI.
  • Unit tests have been added to cover the new configuration logic and ensure correct behavior with various environment variable inputs.
  • Documentation in QUICKSTART.md and auto_review.env.example has been updated to reflect the new environment variables and their usage.

Pre-merge checks

  • [ ] CHANGELOG updated — non-trivial code changed but CHANGELOG.md isn't in the diff
  • [ ] Tests touched — source changed but no test file appears in the diff
  • [x] No new TODO/FIXME comments — no new TODO/FIXME markers
jwilger deleted branch fix/embed-issue-20-config 2026-05-01 20:36:18 -07:00