Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review; this behavior is configurable.
📝 Walkthrough

Adds a full "assessment" subsystem: DB models and migration, dataset upload/validation, batch payload builders and submitters for OpenAI/Google, async polling/processing, SSE event broker, export utilities, mappers/crud/service layers, FastAPI routes, and cron/router integration.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as API Routes
    participant Service as Assessment Service
    participant DB as Database
    participant Provider as Batch Provider
    participant Storage as Object Storage
    participant Cron as Cron Poller
    Client->>API: POST /assessment (dataset, configs)
    API->>Service: start_assessment(request)
    Service->>DB: create Assessment (manager)
    Service->>DB: resolve configs
    loop per config
        Service->>DB: create AssessmentRun
        Service->>Provider: submit_batch(payload)
        Provider-->>Service: batch_job_id
        Service->>DB: update run -> processing
    end
    Service-->>Client: AssessmentResponse
    Cron->>DB: fetch pending runs
    Cron->>Provider: check_batch_status(job_id)
    alt batch complete
        Provider-->>Cron: results
        Cron->>Storage: upload_raw_results()
        Cron->>Service: parse_assessment_output
        Cron->>DB: update run -> completed
        Cron->>API: publish SSE (results_ready)
    else batch failed
        Cron->>DB: update run -> failed
    else in-progress
        Cron-->>Cron: keep processing
    end
    Client->>API: GET /assessment/{id}/results?format=csv|json|xlsx
    API->>Service: build_export_response
    Service->>Storage: fetch_results()
    Service-->>Client: StreamingResponse (file)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 12
Note: Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (15)
backend/app/assessment/utils/parsing.py (1)

25-26: ⚠️ Potential issue | 🟡 Minor
Use explicit `None` checks to avoid masking zero token counts.

`usage.get("input_tokens") or usage.get("prompt_tokens")` will fall through to the alternate key when the primary key is present but `0`. While provider payloads with zero input/output tokens are uncommon, the fallthrough also makes the precedence between the two aliases inconsistent (the primary key only wins when truthy). Prefer an explicit `None` check so the `_tokens` key always wins when present.

♻️ Proposed fix

```diff
-input_tokens = usage.get("input_tokens") or usage.get("prompt_tokens")
-output_tokens = usage.get("output_tokens") or usage.get("completion_tokens")
+input_tokens = (
+    usage.get("input_tokens")
+    if usage.get("input_tokens") is not None
+    else usage.get("prompt_tokens")
+)
+output_tokens = (
+    usage.get("output_tokens")
+    if usage.get("output_tokens") is not None
+    else usage.get("completion_tokens")
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/parsing.py` around lines 25 - 26, The current token extraction uses truthy fallback which treats 0 as missing; replace the "or" fallbacks with explicit None checks so that input_tokens prefers usage["input_tokens"] when the key exists (even if 0) and only uses usage["prompt_tokens"] when usage.get("input_tokens") is None, and do the analogous change for output_tokens so usage["output_tokens"] takes precedence over usage["completion_tokens"]; update the two assignments in parsing.py that set input_tokens and output_tokens accordingly (use explicit is not None checks against usage.get(...)).

backend/app/models/user.py (1)
81: ⚠️ Potential issue | 🟡 Minor
PATCH `/me` should compute and return features like GET `/me`, or document the inconsistency.

The `features` field defaults to `[]` in most endpoints but is populated by flag computation only in GET `/me`. This creates an inconsistency for PATCH `/me` (line 97), which returns the user directly without computing features: clients calling PATCH `/me` will receive `features: []` even if the user has feature flags enabled, contradicting the behavior of GET `/me`.

Recommend either:
- Computing features in PATCH `/me` before returning, or
- Documenting that `features` reflects the current state only after GET `/me`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/user.py` at line 81, PATCH `/me` returns the User model with features defaulting to [] because it doesn't run the same flag computation used by GET `/me`; modify the PATCH `/me` handler to compute and populate the features field before returning (reuse the same flag computation logic used by GET `/me`, or extract that logic into a helper like compute_user_features(user) and call it from both GET `/me` and PATCH `/me`) so the returned User.features matches GET `/me`.

backend/app/api/permissions.py (1)
92-101: ⚠️ Potential issue | 🟡 Minor
Misleading 403 detail when organization context is absent.

When `org_id is None` (the caller has no organization on the auth context), the message still reads "is not enabled for this organization", which conflates "feature disabled" with "no organization scope". Consider distinguishing the two for clearer client-side handling.

🛡️ Suggested differentiation

```diff
-    if org_id is None or not is_enabled(
-        session=session,
-        flag=flag,
-        organization_id=org_id,
-        project_id=project_id,
-    ):
-        raise HTTPException(
-            status_code=403,
-            detail=f"Feature '{flag.name}' is not enabled for this organization.",
-        )
+    if org_id is None:
+        raise HTTPException(
+            status_code=403,
+            detail=f"Feature '{flag.name}' requires an organization context.",
+        )
+    if not is_enabled(
+        session=session,
+        flag=flag,
+        organization_id=org_id,
+        project_id=project_id,
+    ):
+        raise HTTPException(
+            status_code=403,
+            detail=f"Feature '{flag.name}' is not enabled for this organization.",
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/permissions.py` around lines 92 - 101, The current 403 message conflates missing org context and a disabled feature; update the check around org_id and is_enabled in the block containing org_id, is_enabled(...), and the raised HTTPException so that if org_id is None you raise a 403 (or appropriate 400/401 if preferred) with a clear message like "No organization context provided" (referencing org_id), otherwise keep the 403 with detail f"Feature '{flag.name}' is not enabled for this organization." (use session, project_id and flag.name as currently used); ensure the two paths are distinct and use the same HTTPException symbol for consistency.

backend/app/assessment/cron.py (1)
144-182: ⚠️ Potential issue | 🟡 Minor
The failure path itself is unprotected; a single bad cleanup aborts the entire poll.

`update_assessment_run_status` (commits, may raise on DB error/rollback) and `recompute_assessment_status` (raises `ValueError` if the assessment was concurrently deleted) are both invoked inside the `except` handler with no further protection. If either raises, the exception escapes the per-run loop and skips every remaining assessment in this cron tick.

Wrap the cleanup so a failure on one run can't stall the cron:

🛡️ Proposed defensive cleanup

```diff
 except Exception as e:
     logger.error(
         "[poll_all_pending_assessment_evaluations] "
         f"Failed run {run.id} | "
         ...
         exc_info=True,
     )
-    update_assessment_run_status(
-        session=session,
-        run=run,
-        status="failed",
-        error_message="Processing failed. Check server logs for details.",
-    )
-    refreshed_assessment = recompute_assessment_status(
-        session=session, assessment_id=assessment.id
-    )
+    try:
+        update_assessment_run_status(
+            session=session,
+            run=run,
+            status="failed",
+            error_message="Processing failed. Check server logs for details.",
+        )
+        refreshed_assessment = recompute_assessment_status(
+            session=session, assessment_id=assessment.id
+        )
+    except Exception as cleanup_err:
+        logger.error(
+            "[poll_all_pending_assessment_evaluations] "
+            f"Cleanup failed for run {run.id} | error={cleanup_err}",
+            exc_info=True,
+        )
+        failed += 1
+        continue
     failure_result = {
         ...
     }
     all_results.append(failure_result)
     assessment_event_broker.publish(
         _build_callback_payload(
             refreshed_assessment,
             run,
             failure_result,
         )
     )
     failed += 1
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/cron.py` around lines 144 - 182, The except handler for run processing must be made defensive so cleanup errors don't abort the whole poll: wrap calls to update_assessment_run_status, recompute_assessment_status, and assessment_event_broker.publish (and any related cleanup logic that may commit/raise) in their own try/except block(s), catch and log any exceptions (including the run.id, run.run_name and assessment_id) with exc_info=True, ensure you still append a failure_result and increment failed for this run even if cleanup fails, and swallow the exception so the per-run loop continues to the next run.

backend/app/assessment/models.py (1)
32-36: ⚠️ Potential issue | 🟡 Minor
`id: int = SQLField(default=None, ...)` mismatches the type; it should be `int | None`.

Both `Assessment.id` and `AssessmentRun.id` are typed `int` but defaulted to `None` (until the DB assigns one on insert). Pydantic v2 + SQLModel will raise validation errors when constructing instances pre-flush, and static type-checkers will (correctly) flag every place that reads `assessment.id` assuming non-None when the row is unflushed.

🛠️ Proposed fix

```diff
-    id: int = SQLField(
+    id: int | None = SQLField(
         default=None,
         primary_key=True,
         sa_column_kwargs={"comment": "Unique identifier for the assessment"},
     )
@@
-    id: int = SQLField(
+    id: int | None = SQLField(
         default=None,
         primary_key=True,
         sa_column_kwargs={"comment": "Unique identifier for the assessment run"},
     )
```

Also applies to: 144-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/models.py` around lines 32 - 36, The id fields are declared as `id: int = SQLField(default=None, ...)` but default None means the type should allow None; update the type annotations of Assessment.id and AssessmentRun.id to be nullable (e.g., `int | None` or `Optional[int]`) to match the SQLField default and avoid Pydantic/typing errors, and then review any call sites that assume non-None to handle the unflushed case or assert/coerce as needed.

backend/app/assessment/mappers.py (1)
10-22: ⚠️ Potential issue | 🟡 Minor
Type signature doesn't match call sites; `text` may be `None`.

`normalize_llm_text` is declared as `text: str -> str`, but it's called with `kaapi_params.get("instructions")` (lines 100, 185), which returns `Optional[str]`. The body already handles non-str values via the `isinstance(text, str)` early return, so widening the annotation correctly will keep the code honest and silence type-checkers.

🛠️ Proposed fix

```diff
-def normalize_llm_text(text: str) -> str:
-    if not isinstance(text, str) or not text:
-        return text
+def normalize_llm_text(text: str | None) -> str | None:
+    if not isinstance(text, str) or not text:
+        return text
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 10 - 22, The function normalize_llm_text has a too-narrow type signature: change its annotation from def normalize_llm_text(text: str) -> str to accept and return Optional[str] (e.g., def normalize_llm_text(text: Optional[str]) -> Optional[str]) and add the necessary Optional import from typing; this matches call sites like kaapi_params.get("instructions") and the existing early-return behavior that handles non-str/None values.

backend/app/api/routes/features.py (1)
26-30: ⚠️ Potential issue | 🟡 Minor
Missing return type annotation on helper.

`_parse_flag_or_400` returns a `FeatureFlag` enum value but has no return annotation. As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🛠️ Proposed fix

```diff
-def _parse_flag_or_400(flag_key: str):
+from app.core.feature_flags import FeatureFlag as FeatureFlagEnum
+
+def _parse_flag_or_400(flag_key: str) -> FeatureFlagEnum:
     try:
         return parse_feature_flag(flag_key)
     except ValueError as exc:
         raise HTTPException(status_code=400, detail=str(exc)) from exc
```

(Use whatever import path matches your existing `parse_feature_flag` return type.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/features.py` around lines 26 - 30, add a return type annotation to the helper _parse_flag_or_400 indicating it returns the FeatureFlag enum returned by parse_feature_flag; update the function signature to include -> FeatureFlag and import FeatureFlag from its module (or use the correct fully-qualified import used elsewhere) so the helper's return type is explicitly annotated and matches parse_feature_flag's return type.

backend/app/assessment/batch.py (1)
107-135: ⚠️ Potential issue | 🟡 Minor
Workbook handle is leaked on exceptions.

`openpyxl.load_workbook(...)` opens a file handle (especially in `read_only=True` mode, where it streams). If `next(rows_iter, None)`, the column build, or the row iteration raises, `wb.close()` is skipped and the underlying ZIP/file handle is leaked until GC. Wrap the body in `try/finally` or use a context-manager-style pattern.

🛠️ Proposed fix

```diff
-    wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
-    ws = wb.active
-    if ws is None:
-        wb.close()
-        return []
-
-    rows_iter = ws.iter_rows(values_only=True)
-    header = next(rows_iter, None)
-    if header is None:
-        wb.close()
-        return []
-
-    columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
-    result = []
-    for row in rows_iter:
-        if row and any(cell is not None for cell in row):
-            row_dict = {
-                columns[i]: str(cell) if cell is not None else ""
-                for i, cell in enumerate(row)
-                if i < len(columns)
-            }
-            result.append(row_dict)
-
-    wb.close()
-    return result
+    wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
+    try:
+        ws = wb.active
+        if ws is None:
+            return []
+
+        rows_iter = ws.iter_rows(values_only=True)
+        header = next(rows_iter, None)
+        if header is None:
+            return []
+
+        columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
+        result: list[dict[str, str]] = []
+        for row in rows_iter:
+            if row and any(cell is not None for cell in row):
+                result.append(
+                    {
+                        columns[i]: str(cell) if cell is not None else ""
+                        for i, cell in enumerate(row)
+                        if i < len(columns)
+                    }
+                )
+        return result
+    finally:
+        wb.close()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 107 - 135, The workbook handle opened by openpyxl.load_workbook in _parse_excel_rows can leak if any subsequent operation raises; wrap workbook usage in a try/finally or context-manager pattern so wb.close() always runs: after creating wb, enter a try block that performs ws = wb.active, header = next(rows_iter, ...), column construction and row iteration, and put wb.close() in the finally (or use openpyxl.load_workbook's context manager if available) to ensure the file/ZIP handle is closed on success or exception.

backend/app/assessment/processing.py (1)
37-70: ⚠️ Potential issue | 🟡 Minor
`_sanitize_json_output` doesn't escape `\b` or `\f`.

The JSON spec (RFC 8259) treats every U+0000–U+001F character as illegal inside string values. The current implementation handles `\n`, `\r`, `\t`, but model outputs occasionally contain literal `\b` (0x08) and `\f` (0x0C), most often when they're hallucinated. If those slip through, the second `json.loads` will still fail.

Tightening the sanitizer to escape any `ord(ch) < 0x20` while inside a string covers all illegal control characters in one shot.

🛠️ Proposed fix

```diff
-        elif in_string and ch == "\n":
-            result.append("\\n")
-        elif in_string and ch == "\r":
-            result.append("\\r")
-        elif in_string and ch == "\t":
-            result.append("\\t")
-        else:
-            result.append(ch)
+        elif in_string and ord(ch) < 0x20:
+            escapes = {"\n": "\\n", "\r": "\\r", "\t": "\\t",
+                       "\b": "\\b", "\f": "\\f"}
+            result.append(escapes.get(ch, f"\\u{ord(ch):04x}"))
+        else:
+            result.append(ch)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/processing.py` around lines 37 - 70, The sanitizer _sanitize_json_output currently only replaces newline, carriage return and tab inside JSON strings and misses other control characters like \b and \f; update the loop so that when inside a string (in_string is True) and the current character is not an escaped character (escape_next is False) you check if ord(ch) < 0x20 and, if so, append a JSON-safe escape (e.g. a specific mapping for '\n','\r','\t','\b','\f' or generically append "\\u{ord(ch):04x}") instead of the raw character; keep the existing handling for backslash (escape_next) and quote toggling so previously escaped characters are preserved.

backend/app/crud/feature_flag.py (1)
37-63: ⚠️ Potential issue | 🟡 Minor
Concurrent create can leak `IntegrityError` (500) instead of a clean conflict.

Between `get_feature_flag(...)` and `session.commit()`, two concurrent requests with the same `(key, organization_id, project_id)` can both pass the existence check and try to insert. The DB's `uq_feature_flag_key_org_project` constraint correctly rejects the second one, but as an uncaught `IntegrityError`, which surfaces as HTTP 500 in `create_feature_flag_route` (it expects `None` to translate to 409).

Catching the integrity error and returning `None` (or re-fetching) keeps behavior consistent with the existing None-on-conflict contract.

🛡️ Proposed fix

```diff
+from sqlalchemy.exc import IntegrityError
@@
     feature_flag = FeatureFlag(
         key=key,
         organization_id=organization_id,
         project_id=project_id,
         enabled=enabled,
     )
     session.add(feature_flag)
-    session.commit()
-    session.refresh(feature_flag)
-    return feature_flag
+    try:
+        session.commit()
+    except IntegrityError:
+        session.rollback()
+        return None
+    session.refresh(feature_flag)
+    return feature_flag
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/feature_flag.py` around lines 37 - 63, The create_feature_flag function can raise an uncaught IntegrityError when two requests concurrently pass get_feature_flag and both attempt to insert the same (key, organization_id, project_id); wrap the insert/commit in a try/except catching sqlalchemy.exc.IntegrityError, call session.rollback() in the except, then return None (or optionally re-fetch via get_feature_flag and return that) so the function preserves the contract of returning None on conflict instead of letting a 500 bubble up; update create_feature_flag to import IntegrityError and handle the exception around session.commit()/session.add() accordingly.

backend/app/assessment/routes.py (1)
332-336: ⚠️ Potential issue | 🟡 Minor
Missing return type annotation on `event_stream`.

```diff
-    async def event_stream():
+    async def event_stream() -> AsyncIterator[str]:
         async for chunk in assessment_event_broker.subscribe():
             yield chunk
             await asyncio.sleep(0)
```

You'll need `from collections.abc import AsyncIterator` (or `from typing import AsyncIterator` on older Pythons). As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/routes.py` around lines 332 - 336, Add a return type annotation to the async generator by changing the signature of event_stream to specify AsyncIterator[bytes] (e.g. async def event_stream() -> AsyncIterator[bytes]:) and import AsyncIterator from collections.abc at the top of the module; keep the body using assessment_event_broker.subscribe() unchanged so the generator type matches the yielded chunk type.

backend/app/assessment/batch.py (1)
395-420: ⚠️ Potential issue | 🟡 Minor
`google_params["reasoning"]` is set by the mapper but never consumed here.

`map_kaapi_to_google_params` (mappers.py lines 205–207) writes `google_params["reasoning"] = reasoning`, but the builder at lines 395–420 only reads `instructions`, `temperature`, `top_p`, `max_output_tokens`, `thinking_config`, and `output_schema`. The `reasoning` value is silently dropped and never included in the Gemini request.

Either map `reasoning` into the appropriate Gemini field (e.g., merge it with `thinkingConfig`) or remove the assignment from the mapper to eliminate dead code.
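If the merge route is chosen, a minimal sketch of the idea (the helper name, and the assumption that `reasoning` holds dict-shaped thinking options, are hypothetical; verify the actual Gemini request layout before adopting):

```python
from typing import Any


def merge_reasoning_into_thinking_config(google_params: dict[str, Any]) -> dict[str, Any]:
    """Fold the mapper's `reasoning` value into `thinking_config` so it isn't dropped.

    Hypothetical helper: assumes `reasoning` is a dict of thinking options.
    """
    params = dict(google_params)  # avoid mutating the caller's dict
    reasoning = params.pop("reasoning", None)
    if isinstance(reasoning, dict):
        thinking = dict(params.get("thinking_config") or {})
        for key, value in reasoning.items():
            # keys the mapper already set explicitly in thinking_config win
            thinking.setdefault(key, value)
        params["thinking_config"] = thinking
    return params
```

Running the params through this helper just before the builder consumes them would make `reasoning` reach the request without overriding an explicitly mapped `thinking_config`.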
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 395 - 420, The google_params["reasoning"] value produced by map_kaapi_to_google_params is never used in the Gemini request builder; update the builder in batch.py to consume google_params["reasoning"] by merging it into the outgoing generationConfig (e.g., if thinking_config exists, combine or extend it with reasoning and set generation_config["thinkingConfig"] accordingly; if not, set generation_config["thinkingConfig"] = reasoning) so the reasoning intent is sent as part of the request, or alternatively remove the assignment of google_params["reasoning"] in map_kaapi_to_google_params to eliminate dead code.

backend/app/assessment/mappers.py (1)
93-106: ⚠️ Potential issue | 🟡 Minor
Add a guard clause for `model` before calling `litellm.supports_reasoning()`.

If `kaapi_params` lacks `"model"`, the code calls `litellm.supports_reasoning(model="openai/None")`, which returns `False` (or raises an error, depending on the litellm version). This causes misleading warnings at lines 131–134 that claim the model "does not support reasoning" when the actual problem is that no model was specified.

🛠️ Proposed fix

```diff
-    support_reasoning = litellm.supports_reasoning(model=f"openai/{model}")
+    support_reasoning = (
+        litellm.supports_reasoning(model=f"openai/{model}") if model else False
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 93 - 106, The code calls litellm.supports_reasoning with model=f"openai/{model}" even when model is None; add a guard that checks whether model is present in kaapi_params before invoking litellm.supports_reasoning: if model is truthy call support_reasoning = litellm.supports_reasoning(model=f"openai/{model}"), otherwise set support_reasoning = False (and optionally log or raise a clear "model missing" message) so you don't pass "openai/None" into supports_reasoning or produce misleading reasoning-support warnings; update any subsequent logic that uses support_reasoning accordingly (look for the model variable and support_reasoning usage in this function).

backend/app/assessment/utils/export.py (1)
395-417: ⚠️ Potential issue | 🟡 Minor
`assessment_id=run.assessment_id or 0` uses 0 as a sentinel for runs without a parent assessment, inconsistent with other models.

`AssessmentExportRow.assessment_id` is typed as `int`, forcing the conversion of `None` to `0`. However, `AssessmentRunPublic`, `AssessmentRunSummary`, and the source field `run.assessment_id` all use `int | None`. Change `AssessmentExportRow.assessment_id` to `int | None` to match the source type and existing patterns in the codebase.
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 395 - 417, AssessmentExportRow.assessment_id is currently coerced to 0 in the export (assessment_id=run.assessment_id or 0) and typed as int; change the AssessmentExportRow dataclass/typing to accept int | None for assessment_id and stop converting None to 0 by passing run.assessment_id directly in the export_rows append. Update the type annotation (AssessmentExportRow.assessment_id: int | None), any related imports/typing declarations, and adjust any callers that assumed 0 to handle None where necessary (references: AssessmentExportRow and the export loop building export_rows using run.assessment_id).

backend/app/assessment/crud.py (1)
275-292: ⚠️ Potential issue | 🟡 Minor
Status taxonomy is inconsistent; `"in_progress"` is checked but never set, causing silent data loss.

`processing_runs` accepts both `"processing"` and `"in_progress"`, while the other counters and the model documentation accept single literals. The documented statuses are only "pending", "processing", "completed", and "failed"; yet the aggregator checks for a fifth value that is never assigned anywhere in the codebase.

If any run has a status outside the expected set (a typo, a legacy value, or some other unexpected value), it is excluded from all four counters but still contributes to `total_runs = len(runs)`. When `_determine_assessment_status` receives mismatched totals, it falls through to `"processing"`, silently masking the inconsistency.

Recommend (a) consolidating on a single canonical set of statuses (a `StrEnum`) enforced at assignment time, and (b) asserting when a run's status falls outside that set.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 275 - 292, The aggregator is mixing an unexpected "in_progress" literal into processing_runs and not counting unexpected statuses, causing total_runs mismatch and silent masking in _determine_assessment_status; fix by consolidating to the canonical status set (pending, processing, completed, failed) and stop checking "in_progress" in the processing_runs calculation (only check r.status == "processing"), and add a validation/assertion step when iterating runs (e.g., validate_run_status or inline check inside the block that computes pending_runs/processing_runs/completed_runs/failed_runs) that logs or raises if r.status is not in the canonical set so invalid/legacy values are caught at assignment time rather than ignored. Ensure assessment.total_runs still equals len(runs) and that the counters sum to total_runs unless a validation error is raised; update references to processing_runs and _determine_assessment_status accordingly.
🧹 Nitpick comments (16)
backend/app/assessment/utils/parsing.py (1)
13-17: Consider hardening JSON parsing against malformed payloads.

`json.loads` on the array branch and per-line in the JSONL branch will raise `JSONDecodeError` straight to callers. If batch outputs from providers occasionally contain a malformed line, the entire result load fails. Depending on caller expectations, you may want to wrap line parsing with a try/except + logger warning so a single bad record doesn't poison the whole run. Skip if upstream callers are already expected to handle this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/parsing.py` around lines 13 - 17, The JSON parsing currently calls json.loads directly on content (array branch using parsed) and per-line (JSONL branch), which will raise JSONDecodeError for malformed payloads; wrap the top-level json.loads(content) in a try/except JSONDecodeError and log a warning then return an empty list (or appropriate fallback), and for the JSONL branch catch JSONDecodeError around each json.loads(line) so you log the bad line and skip it instead of letting one bad record fail the whole parse; use the existing logger (or add one) to emit warnings that include the offending line and the exception.

backend/app/assessment/validators.py (1)
53-67: Simplify by reading once and checking `len(content)`.

The `file.file.seek`/`tell`/`seek` dance reaches into the underlying `SpooledTemporaryFile` synchronously, which works but bypasses Starlette's async `UploadFile` API and is brittle if the upload backend changes. Reading first and validating size on the bytes avoids the seek-around entirely and is also cheaper, since `read()` is being called anyway. The 10 MB upper bound provides a natural memory cap.

```diff
-    file.file.seek(0, 2)
-    file_size = file.file.tell()
-    file.file.seek(0)
-
-    if file_size > MAX_FILE_SIZE:
+    content = await file.read()
+    file_size = len(content)
+
+    if file_size > MAX_FILE_SIZE:
         raise HTTPException(
             status_code=413,
             detail=f"File too large. Maximum size: {MAX_FILE_SIZE / (1024 * 1024):.0f}MB",
         )
     if file_size == 0:
         raise HTTPException(status_code=422, detail="Empty file uploaded")
-    content = await file.read()
     return content, file_ext
```

Note: if you ever need to enforce the limit before reading (to prevent OOM on very large uploads), prefer streaming chunked reads with a running counter rather than the seek/tell approach; for a 10 MB cap this is fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/validators.py` around lines 53 - 67, Replace the synchronous file.file.seek/tell/seek usage with a single async read: call await file.read() once (using UploadFile.read()), then use len(content) to enforce MAX_FILE_SIZE and detect empty uploads instead of file_size; remove references to file.file.seek/tell and keep the existing HTTPException behavior and return (content, file_ext) from the same validator function.

backend/app/main.py (1)
70-75: Add a return type annotation to `lifespan`.

Per coding guidelines for `**/*.py`, all function parameters and return values should have type hints. The `app: FastAPI` parameter is annotated, but the return is not. Ruff's `ARG001` warning about an unused `app` is a false positive; FastAPI's lifespan protocol requires this signature.

```diff
-@asynccontextmanager
-async def lifespan(app: FastAPI):
-    """App startup / shutdown lifecycle."""
+@asynccontextmanager
+async def lifespan(app: FastAPI) -> AsyncIterator[None]:
+    """App startup / shutdown lifecycle."""
```

(Add `from collections.abc import AsyncIterator` to imports.) As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/main.py` around lines 70 - 75, The lifespan function lacks a return type hint; update its signature to annotate the return as an async iterator (e.g., AsyncIterator[None]) and add the required import from collections.abc (import AsyncIterator) so the signature becomes async def lifespan(app: FastAPI) -> AsyncIterator[None]:; keep the existing app parameter annotation (to avoid Ruff false-positive) and no other behavior changes.

backend/app/models/__init__.py (1)
100-100: Consider moving `Assessment`/`AssessmentRun` re-exports to a dedicated location or removing them entirely.

The import of `Assessment` and `AssessmentRun` from `app.assessment.models` at line 100 is currently safe (no circular import exists), but consolidating external model re-exports in a central `__init__` can create tight coupling if the assessment subpackage ever needs to import from `app.models` for FK relationships. A cleaner approach:

- Remove this re-export and have callers `from app.assessment.models import Assessment, AssessmentRun` directly (preferred: keeps subpackage models self-contained).
- If table discovery is required for Alembic, register models in your `env.py`/metadata module instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/__init__.py` at line 100, The __init__.py currently re-exports Assessment and AssessmentRun (from app.assessment.models) which can create tight coupling; remove the import line exporting Assessment and AssessmentRun from app.assessment.models in app.models.__init__ and update callers to import directly from app.assessment.models instead (or, if you need automatic table discovery for Alembic, register those models in your Alembic env.py/metadata module rather than re-exporting them here); ensure no remaining references rely on the central re-export and run tests/migrations to verify imports still resolve.

backend/app/models/feature_flag.py (1)
61-78: `FeatureFlagCreate` and `FeatureFlagUpdate` are identical — collapse them.

They define exactly the same fields and semantics; either alias one or have one inherit from the other so that future changes to one don't silently diverge.
♻️ Proposed simplification
```diff
 class FeatureFlagCreate(SQLModel):
     key: str = Field(min_length=1, max_length=128)
     organization_id: int
     project_id: int | None = None
     enabled: bool

-class FeatureFlagUpdate(SQLModel):
-    key: str = Field(min_length=1, max_length=128)
-    organization_id: int
-    project_id: int | None = None
-    enabled: bool
+class FeatureFlagUpdate(FeatureFlagCreate):
+    """Same payload shape as create; kept distinct for OpenAPI clarity."""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/feature_flag.py` around lines 61 - 78, `FeatureFlagCreate` and `FeatureFlagUpdate` are identical; collapse them by introducing a single shared model (e.g., `FeatureFlagBase`, or keep `FeatureFlagCreate` as canonical) and have the other name inherit from or alias it to avoid duplication; update references to use the unified model and leave `FeatureFlagDelete` unchanged; ensure the shared model preserves the existing Field validations (key, organization_id, project_id, enabled) so future changes apply in one place.

backend/app/assessment/events.py (1)
25-26: Use builtin `TimeoutError` (Ruff UP041).

In Python 3.11+, `asyncio.TimeoutError` is an alias of the builtin `TimeoutError`. Per the project's Python 3.11+ stance and the static analysis hint, prefer the builtin.

```diff
-    except asyncio.TimeoutError:
+    except TimeoutError:
         yield ": keep-alive\n\n"
         continue
```

As per coding guidelines: "Use Python 3.11+ with type hints throughout the codebase".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` around lines 25 - 26, replace the except that catches `asyncio.TimeoutError` with the builtin `TimeoutError` in the async block where you await `queue.get()` (the code that sets `payload = await asyncio.wait_for(queue.get(), timeout=15)`); locate the try/except around that await in events.py and change the exception class to `TimeoutError` to follow Python 3.11+ conventions and satisfy the Ruff UP041 lint.

backend/app/core/feature_flags/__init__.py (1)
22-23: Misleading "unused" discard.
`organization_id` and `project_id` are actually used below — only `user_id` is unused. The `_ = (organization_id, project_id, user_id)` line falsely signals all three are unused. If the intent is to silence linters for `user_id`, prefer renaming the parameter.

♻️ Proposed cleanup

```diff
 def is_enabled(
     session: Session,
     flag: FeatureFlag,
     organization_id: int,
     project_id: int | None = None,
-    user_id: int | None = None,
+    user_id: int | None = None,  # noqa: ARG001 — reserved for future user-scoped flags
 ) -> bool:
     """Check whether *flag* is enabled for the given scope."""
-    _ = (organization_id, project_id, user_id)
     resolved_flag = get_feature_flag_enabled(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/feature_flags/__init__.py` around lines 22 - 23, the tuple assignment `_ = (organization_id, project_id, user_id)` falsely marks all three parameters as unused; keep the real usage of `organization_id` and `project_id` and only silence the unused `user_id` by renaming it (e.g. `_user_id`) or explicitly assigning `_ = user_id`, and remove the tuple assignment so that `organization_id` and `project_id` are no longer masked as unused while only `user_id` is ignored.

backend/app/assessment/service.py (2)
57-62: Redundant `UUID(str(...))` round-trip.

`run.config_id` is already typed as `UUID | None` on the model; converting to `str` and back to `UUID` is a no-op (and would crash on `None`, but you already validate non-None on lines 52–56).

```diff
-        configs.append(
-            AssessmentConfigRef(
-                config_id=UUID(str(run.config_id)),
-                config_version=run.config_version,
-            )
-        )
+        configs.append(
+            AssessmentConfigRef(
+                config_id=run.config_id,
+                config_version=run.config_version,
+            )
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 57 - 62, The code unnecessarily wraps run.config_id in UUID(str(...)) before constructing AssessmentConfigRef; since run.config_id is already a UUID (and you validate it's non-None earlier in the block around lines 52–56), replace UUID(str(run.config_id)) with run.config_id when creating AssessmentConfigRef to avoid the redundant conversion and potential crash on None.
144-195: Redundant `recompute_assessment_status` calls inside the per-run loop.

`recompute_assessment_status` is invoked after each successful submit (line 178), after each failed submit (line 191), and again once at the end (line 195). For an assessment with N configs, that's `N + 1` full re-aggregations of the same rows; only the final one is observable to clients (the others are immediately overwritten).

Move the call out of the loop:
♻️ Proposed simplification
```diff
         run = update_assessment_run_status(
             session=session,
             run=run,
             status="processing",
             batch_job_id=batch_job.id,
             total_items=batch_job.total_items,
         )
-        recompute_assessment_status(session=session, assessment_id=assessment.id)
     except Exception as e:
         ...
         run = update_assessment_run_status(
             session=session,
             run=run,
             status="failed",
             error_message="Batch submission failed. Please try again or contact support.",
         )
-        recompute_assessment_status(session=session, assessment_id=assessment.id)
     runs.append(run)

 recompute_assessment_status(session=session, assessment_id=assessment.id)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 144 - 195, the code currently calls `recompute_assessment_status()` inside the per-config loop (after successful submit and in the except block), causing redundant re-aggregations; remove the two in-loop calls to `recompute_assessment_status` (the ones immediately after `update_assessment_run_status` in both the try and except paths around `submit_assessment_batch`) and keep only the single `recompute_assessment_status(session=session, assessment_id=assessment.id)` that runs once after the loop finishes, so aggregation runs once for the final state.

backend/app/api/routes/features.py (1)
33-170: Endpoints don't load Swagger descriptions from external markdown.

None of these new endpoints (`/features`, `/feature-flags` GET/POST/PATCH/DELETE) supply `description=load_description("features/<action>.md")`. The assessment dataset routes in this same PR (e.g., `backend/app/assessment/routes.py:93`) already follow this pattern.

As per coding guidelines: "Load Swagger endpoint descriptions from external markdown files instead of inline strings using `load_description("domain/action.md")`".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/features.py` around lines 33 - 170, the router endpoints `get_features`, `get_feature_flags`, `create_feature_flag_route`, `patch_feature_flag`, and `remove_feature_flag` are missing Swagger descriptions loaded from external Markdown; update each `@router.<method>` decorator to include `description=load_description("features/<action>.md")` (use names like "features/get.md", "features/list.md" or "features/create.md"/"patch.md"/"delete.md" matching your docs convention) so the OpenAPI docs pull content from external files rather than inline strings; ensure you import or have access to `load_description`.

backend/app/assessment/processing.py (1)
253-253: Log prefix uses run id rather than function name.
`log_prefix = f"[assessment_run={run.id}]"` is useful context, but the repo convention is to lead with the function name. Combining both keeps the run-scoped context while satisfying the guideline.

```diff
-    log_prefix = f"[assessment_run={run.id}]"
+    log_prefix = f"[check_and_process_assessment][run={run.id}]"
```

As per coding guidelines: "Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message ...")`".
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/processing.py` at line 253, update the `log_prefix` assignment so it begins with the current function name in square brackets followed by the existing run-scoped context; change the `log_prefix` variable (currently `log_prefix = f"[assessment_run={run.id}]"`) to the repo-standard pattern like `"[<function_name>][assessment_run=<run.id>]"` (use the actual enclosing function name rather than the literal `<function_name>`) so all subsequent logger calls follow the required convention.

backend/app/assessment/routes.py (1)
348-436: Repetitive `AssessmentPublic(...)`/`AssessmentRunPublic(...)` construction.

`list_assessment_managers`, `get_assessment_manager`, `list_evaluations`, and `get_evaluation` each manually copy 14–15 fields one by one. Since both `AssessmentPublic` and `AssessmentRunPublic` are pure Pydantic projections of fields already on the SQLModel rows, you can replace these blocks with `AssessmentPublic.model_validate(a, from_attributes=True)` (Pydantic v2) — or set `model_config = ConfigDict(from_attributes=True)` on the public schemas and pass the row directly.

This shrinks four endpoints to one-liners and removes the risk of drift when fields are added.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/routes.py` around lines 348 - 436, the endpoints `list_assessment_managers`, `get_assessment_manager` (and the analogous `list_evaluations`/`get_evaluation`) repetitively map fields into `AssessmentPublic`/`AssessmentRunPublic`; replace the manual field-by-field construction by either enabling attribute parsing on the Pydantic models (set `model_config = ConfigDict(from_attributes=True)` on `AssessmentPublic` and `AssessmentRunPublic`) and passing the SQLModel row directly, or calling `AssessmentPublic.model_validate(a, from_attributes=True)` (and `AssessmentRunPublic.model_validate(...)`) when building the APIResponse; update imports if needed and replace each block with the single `model_validate` (or direct model instantiation) call that returns the Pydantic projection.

backend/app/assessment/crud.py (2)
196-227: Constrain `status` to known values.

`status: str` allows any string to be persisted to `AssessmentRun.status`, which couples directly into the aggregation in `recompute_assessment_status` (see comment above). Tightening it to a `Literal["pending", "processing", "completed", "failed", ...]` (or the same StrEnum used elsewhere) gives compile-time safety and prevents drift between writers and the aggregator.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 196 - 227, The update_assessment_run_status function accepts status: str which lets arbitrary values be persisted to AssessmentRun.status and can drift from the aggregator logic in recompute_assessment_status; change the status parameter to the project-wide enum type (or a typing.Literal of the allowed values) used elsewhere (the same StrEnum/StatusEnum used for assessment runs) and update any call sites to pass that enum/value so only known statuses are stored and compile-time/type-checking will prevent drift; ensure type imports and annotations are updated where update_assessment_run_status is defined and referenced.
1-12: Place CRUD operations in `backend/app/crud/` per project convention.

This new module is a pure data-access layer for the `Assessment`/`AssessmentRun` tables but lives at `backend/app/assessment/crud.py` instead of the central `backend/app/crud/` package used elsewhere. Consider moving it (e.g. to `backend/app/crud/assessment.py`) so future contributors can locate DB code consistently and so it composes with the existing CRUD test/fixture conventions.

Based on learnings: "Use CRUD pattern for database access operations located in `backend/app/crud/`."
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 1 - 12, the CRUD code for `Assessment` and `AssessmentRun` lives in `app.assessment.crud` but should be placed in the central CRUD package; move the module into the project's crud package (e.g., create `app.crud.assessment`) and update any imports to reference `app.crud.assessment` instead of `app.assessment.crud`, keeping the same symbols (`Assessment`, `AssessmentRun`, `logger`, and any CRUD functions/classes) and ensure the new module exports the same public API so existing callers continue to work.

backend/app/assessment/utils/export.py (2)
160-171: Hard-coded `base_fields.index("result_status")` will raise `ValueError` if the field is renamed.

If `AssessmentExportRow` is later refactored to drop or rename `result_status`, every CSV/XLSX export call crashes with an opaque `ValueError`. Consider guarding it (fall back to `len(base_fields)`) or anchoring the column ordering on a stable constant declared in this module so the dependency is explicit.

♻️ Suggested guard
```diff
-    output_idx = base_fields.index("result_status") + 1  # after result_status
+    try:
+        output_idx = base_fields.index("result_status") + 1  # place output_keys after result_status
+    except ValueError:
+        output_idx = len(base_fields)
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 160 - 171, The code uses base_fields.index("result_status") which will raise ValueError if that field is renamed; update the logic in the block that computes output_idx/fieldnames (referencing base_fields, output_idx, fieldnames, has_unparsed_output) to guard the lookup: either define and use a stable constant (e.g., RESULT_STATUS_FIELD) and check membership before calling index, or wrap the index call in a try/except and fall back to len(base_fields) (or another sensible insertion point) so exports don't crash; ensure the subsequent insertion of "output_raw" uses the guarded output_idx when computing the insert position.
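The guarded lookup can also be factored into a tiny helper; a runnable sketch (field names are illustrative, not the module's actual constants):

```python
def insertion_index(fields: list[str], anchor: str = "result_status") -> int:
    """Column index just after *anchor*, or the end of the row if it's missing."""
    try:
        return fields.index(anchor) + 1
    except ValueError:
        # Anchor column renamed or dropped: append instead of crashing.
        return len(fields)

base = ["input", "result_status", "latency_ms"]
print(insertion_index(base))       # 2 -> output keys land right after result_status
print(insertion_index(["input"]))  # 1 -> anchor missing, append at the end
```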
200-228: Pin the Excel engine explicitly to `"openpyxl"`.

When `pd.ExcelWriter(buf)` is called with a `BytesIO` object, pandas infers the file format as .xlsx and selects an engine. Without an explicit `engine` parameter, pandas prefers xlsxwriter if installed, otherwise it falls back to openpyxl. Since the codebase only declares `openpyxl>=3.1.0` as a dependency and the error message explicitly references openpyxl support, the engine should be pinned to ensure deterministic behavior that matches the stated runtime requirement.

♻️ Suggested change
```diff
-    with pd.ExcelWriter(buf) as writer:
+    with pd.ExcelWriter(buf, engine="openpyxl") as writer:
         data_frame.to_excel(writer, index=False, sheet_name="results")
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 200 - 228, The Excel writer call should explicitly pin the engine to openpyxl to ensure deterministic .xlsx output; update the pd.ExcelWriter usage (the block that creates buf, data_frame = pd.DataFrame(expanded, columns=excel_fields) and the with pd.ExcelWriter(buf) as writer: ... block) to pass engine="openpyxl" so pandas uses openpyxl rather than xlsxwriter or another engine, keeping the rest of the export flow (_drop_empty_columns, expanded, excel_fields, data_frame.to_excel(...)) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 091c4f08-e02a-48f6-b5b8-7905058d0c10
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (34)
.gitignore
backend/app/alembic/versions/050_add_assessment_manager_table.py
backend/app/alembic/versions/055_create_feature_flag_table.py
backend/app/api/main.py
backend/app/api/permissions.py
backend/app/api/routes/cron.py
backend/app/api/routes/features.py
backend/app/api/routes/users.py
backend/app/assessment/__init__.py
backend/app/assessment/batch.py
backend/app/assessment/cron.py
backend/app/assessment/crud.py
backend/app/assessment/dataset.py
backend/app/assessment/events.py
backend/app/assessment/mappers.py
backend/app/assessment/models.py
backend/app/assessment/processing.py
backend/app/assessment/routes.py
backend/app/assessment/service.py
backend/app/assessment/utils/__init__.py
backend/app/assessment/utils/export.py
backend/app/assessment/utils/parsing.py
backend/app/assessment/validators.py
backend/app/core/feature_flags/__init__.py
backend/app/core/feature_flags/constants.py
backend/app/crud/feature_flag.py
backend/app/main.py
backend/app/models/__init__.py
backend/app/models/batch_job.py
backend/app/models/feature_flag.py
backend/app/models/llm/request.py
backend/app/models/user.py
backend/app/tests/api/test_permissions.py
backend/pyproject.toml
- Added `service.py` to handle assessment evaluation orchestration, including starting and retrying assessments.
- Introduced utility functions for exporting assessment results in various formats (CSV, XLSX, JSON) in `export.py`.
- Created parsing utilities for handling batch results in `parsing.py`.
- Implemented validation for dataset file uploads in `validators.py`.
- Updated batch job model to include assessment type.
- Enhanced evaluation run model to reference parent assessments.
- Added new fields for reasoning effort and summary preferences in LLM request model.
b0f5463 to bce0944
…n model

- Introduced AssessmentRun model to replace EvaluationRun for assessment-specific runs.
- Updated CRUD operations to manage Assessment and AssessmentRun entities.
- Refactored processing logic to accommodate new AssessmentRun structure.
- Adjusted logging and error handling to reflect changes in run management.
- Modified routes and services to utilize AssessmentRun for assessment evaluations.
- Enhanced export functionality to support new AssessmentRun model.
- Removed EvaluationRun references where applicable and ensured backward compatibility.
Actionable comments posted: 10
♻️ Duplicate comments (3)
backend/app/assessment/dataset.py (2)
78-101: ⚠️ Potential issue | 🟠 Major

Resource leak: `wb.close()` is not in a `finally` block.

If `iter_rows`/`next` or the count comprehension raises (corrupt xlsx, unexpected types), control jumps to `except` at line 99 and `wb.close()` is never reached, leaking the underlying zip file handle. Move cleanup into `finally` (or use `contextlib.closing`).

🛡️ Proposed fix
```diff
 def _count_excel_rows(content: bytes) -> int:
     """Count data rows in an Excel file (excluding header)."""
+    wb = None
     try:
         import openpyxl

         wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
         ws = wb.active
         if ws is None:
             return 0
         rows_iter = ws.iter_rows(values_only=True)
         header = next(rows_iter, None)
         if header is None:
-            wb.close()
             return 0
-        count = sum(
+        return sum(
             1 for row in rows_iter if row and any(cell is not None for cell in row)
         )
-        wb.close()
-        return count
     except Exception as e:
         logger.warning(f"[_count_excel_rows] Failed to count rows | {e}")
         return 0
+    finally:
+        if wb is not None:
+            try:
+                wb.close()
+            except Exception:
+                pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/dataset.py` around lines 78 - 101, The _count_excel_rows function can leak the workbook handle because wb.close() is only called on normal return; ensure the workbook is always closed by moving wb.close() into a finally block (or use contextlib.closing or openpyxl.Workbook context manager) so that after creating wb = openpyxl.load_workbook(...) and before returning from any path (including exceptions) the workbook is closed; update the logic around ws, header, and the count comprehension to use this guaranteed cleanup and remove duplicate close calls inside the try block.
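The shape of the fix can be exercised without openpyxl; a sketch using a hypothetical stand-in workbook, showing that `finally` closes the handle on both the success and the exception path:

```python
class FakeWorkbook:
    """Hypothetical stand-in for openpyxl's workbook; records close() calls."""

    def __init__(self, rows: list) -> None:
        self.rows = rows
        self.closed = False

    def close(self) -> None:
        self.closed = True

def count_rows(rows: list, explode: bool = False) -> tuple[int, FakeWorkbook]:
    wb = None
    try:
        wb = FakeWorkbook(rows)
        if explode:
            raise ValueError("corrupt file")  # simulate a mid-read failure
        return len(wb.rows), wb
    except Exception:
        return 0, wb
    finally:
        # Runs on every exit path (return or raise), so the handle never leaks.
        if wb is not None:
            wb.close()

n, wb = count_rows([1, 2, 3])
print(n, wb.closed)                 # 3 True
n, wb = count_rows([1, 2, 3], explode=True)
print(n, wb.closed)                 # 0 True — closed even on the error path
```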
140-163: ⚠️ Potential issue | 🟠 Major

Silent partial failure: dataset row persisted even when object-store upload fails.

`_upload_file_to_object_store` swallows exceptions and returns `None`, after which `create_evaluation_dataset` is invoked unconditionally — leaving an `EvaluationDataset` record whose `object_store_url` is `None`. Downstream batch processing (`backend/app/assessment/batch.py::_load_dataset_rows` raises `ValueError(f"Dataset {dataset.id} has no object_store_url")`) will then fail mysteriously, and the caller of `upload_dataset` has no signal of the failure.

Fail loudly on upload failure rather than persisting an orphan metadata row.
🛡️ Proposed behavior
```diff
     object_store_url = _upload_file_to_object_store(
         session=session,
         project_id=project_id,
         file_content=file_content,
         file_ext=file_ext,
         dataset_name=dataset_name,
     )
+    if object_store_url is None:
+        raise HTTPException(
+            status_code=502,
+            detail="Failed to upload dataset to object storage. Please retry.",
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/dataset.py` around lines 140 - 163, the code currently calls `_upload_file_to_object_store` and then unconditionally creates the DB row via `create_evaluation_dataset` even when the upload fails or returns None; change `upload_dataset` so it treats upload failures as fatal: call `_upload_file_to_object_store` inside a try/except, propagate or raise a clear exception when the upload raises or when the returned `object_store_url` is falsy/None, and only call `create_evaluation_dataset` when `object_store_url` is a valid non-empty string; ensure no `EvaluationDataset` is persisted if the upload did not succeed.

backend/app/assessment/events.py (1)
13-48: ⚠️ Potential issue | 🟠 Major

Unbounded subscriber queues — slow/idle SSE clients can OOM the process.

`asyncio.Queue()` defaults to unlimited size, and `put_nowait` here never blocks or drops. A subscriber that connects but stops draining (e.g., its TCP send buffer is backed up) will accumulate every published assessment event indefinitely. Under cron-driven publish bursts across many subscribers this is a real memory-pressure / reliability issue.

Use a bounded queue with explicit overflow handling (drop oldest, or evict the slow subscriber).
🛡️ Proposed bounded queue with overflow drop
```diff
+_QUEUE_MAXSIZE = 256
+
 class AssessmentEventBroker:
     def __init__(self) -> None:
         self._subscribers: set[asyncio.Queue[dict]] = set()

     async def subscribe(self) -> AsyncIterator[str]:
-        queue: asyncio.Queue[dict] = asyncio.Queue()
+        queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=_QUEUE_MAXSIZE)
         self._subscribers.add(queue)
@@
         for queue in list(self._subscribers):
-            queue.put_nowait(payload)
+            try:
+                queue.put_nowait(payload)
+            except asyncio.QueueFull:
+                logger.warning(
+                    "[publish] Subscriber queue full, dropping oldest | type=%s",
+                    payload.get("type"),
+                )
+                try:
+                    queue.get_nowait()
+                    queue.put_nowait(payload)
+                except (asyncio.QueueEmpty, asyncio.QueueFull):
+                    pass
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` around lines 13 - 48, The subscriber queues are unbounded (asyncio.Queue()) so slow/idle SSE clients can cause OOM; fix by creating bounded queues in subscribe (e.g., queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=XX)) and update publish to handle overflow: if queue.put_nowait(payload) raises asyncio.QueueFull, either drop the oldest item from that queue (queue.get_nowait() then retry put_nowait) or treat the subscriber as dead by discarding the queue from self._subscribers and logging/removing it; change symbols: modify subscribe (queue creation) and publish (the for loop over self._subscribers and use of queue.put_nowait) to implement the chosen overflow strategy and add appropriate debug/info logs.
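The drop-oldest policy is easy to verify in isolation; a runnable sketch (queue bound shrunk to 2 to force overflow; the broker class itself is elided):

```python
import asyncio

QUEUE_MAXSIZE = 2  # tiny bound so overflow is easy to demonstrate

def publish(queue: asyncio.Queue, payload: dict) -> None:
    # Drop-oldest overflow policy: a slow subscriber loses its oldest
    # event rather than growing its queue without bound.
    try:
        queue.put_nowait(payload)
    except asyncio.QueueFull:
        try:
            queue.get_nowait()         # evict the oldest event
            queue.put_nowait(payload)  # retry now that there is room
        except (asyncio.QueueEmpty, asyncio.QueueFull):
            pass

async def main() -> list[int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_MAXSIZE)
    for i in range(4):  # publish more events than the queue can hold
        publish(queue, {"seq": i})
    drained = []
    while not queue.empty():
        drained.append(queue.get_nowait()["seq"])
    return drained

print(asyncio.run(main()))  # [2, 3] — events 0 and 1 were dropped
```

The alternative policy (evicting the slow subscriber entirely) trades lost-but-recent events for a forced client reconnect; either is a deliberate bound, unlike the current unbounded default.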
🧹 Nitpick comments (18)
backend/app/api/routes/cron.py (2)
79-95: Be defensive about the shape of `result` before mutating it.

If `process_all_pending_evaluations` ever returns something other than a dict (e.g., `None` after an internal error path), `result["assessment"] = ...` will raise a `TypeError` outside the inner `try`/`except`, and the original error context will be lost. A quick `isinstance(result, dict)` guard (or initializing `result = result or {}`) keeps the merge robust.
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/cron.py` around lines 79 - 95, The code mutates result assuming it's a dict which can raise a TypeError if process_all_pending_evaluations (or the variable result) is None or not a dict; guard and normalize it before merging by checking isinstance(result, dict) (or using result = result or {}) and then safely set result["assessment"] = assessment_result and update totals from assessment_result; reference the variables result and assessment_result and the surrounding merge block so the normalization happens immediately before the lines that assign result["assessment"] and update total_processed/total_failed/total_still_processing.
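A sketch of the normalization (function and key names mirror the review's wording, not necessarily the handler's exact code):

```python
def merge_assessment(result: object, assessment_result: dict) -> dict:
    # Normalize before mutating: a None or non-dict result becomes {}
    # so the merge below cannot raise TypeError.
    if not isinstance(result, dict):
        result = {}
    result["assessment"] = assessment_result
    return result

print(merge_assessment(None, {"processed": 2}))
print(merge_assessment({"evaluations": 5}, {"processed": 2}))
```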
71-77: Lazy import inside the request handler — surface the reason or hoist it.

The runtime `from app.assessment.cron import poll_all_pending_assessment_evaluations` inside the handler is typically a circular-import workaround. If that's the case, leave a brief comment explaining why; otherwise hoist the import to module scope so it's loaded once at startup rather than on every cron invocation.
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/cron.py` around lines 71 - 77, the inline runtime import of `poll_all_pending_assessment_evaluations` inside the cron request handler should be resolved: either hoist the import to module scope so the function is imported once at startup (remove the in-handler `from app.assessment.cron import ...` and add it at top level), or if this was done to avoid a circular import, add a concise comment above the inline import explaining the circular-import reason and why it must remain lazy. Keep the existing call to `poll_all_pending_assessment_evaluations(session=session)` unchanged.

backend/app/assessment/cron.py (1)
73-76: Stray adjacent string concatenation produces an awkward log line.
`"[poll_all_pending_assessment_evaluations] " "No active assessments found"` (and the same pattern on line 75) needlessly splits one message into two adjacent string literals that the parser simply concatenates. Drop the split and use a single literal for a cleaner log call.

♻️ Proposed change
```diff
-    logger.info(
-        "[poll_all_pending_assessment_evaluations] " "No active assessments found"
-    )
+    logger.info(
+        "[poll_all_pending_assessment_evaluations] No active assessments found"
+    )
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/cron.py` around lines 73 - 76, the log call in `poll_all_pending_assessment_evaluations` uses two adjacent string literals that implicitly concatenate; update the `logger.info` invocation to use a single combined string (e.g., "[poll_all_pending_assessment_evaluations] No active assessments found") so the prefix and message are one contiguous literal that reads cleanly.

backend/app/assessment/events.py (1)
26-26: Use the builtin `TimeoutError` instead of the alias.

Per Ruff UP041, `asyncio.TimeoutError` is a deprecated alias of the builtin `TimeoutError` (since Python 3.11). Since this repo targets 3.11+, use the builtin directly.

♻️ Proposed change
```diff
-    except asyncio.TimeoutError:
+    except TimeoutError:
         yield ": keep-alive\n\n"
         continue
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` at line 26, replace the deprecated alias `asyncio.TimeoutError` with the builtin `TimeoutError` in the exception handler (the except block currently using `asyncio.TimeoutError`); update the except clause in the function where that try/except appears to use `except TimeoutError:` so it relies on the built-in exception instead of the asyncio alias.

backend/app/alembic/versions/055_add_assessment_manager_table.py (1)
121-132: Consider a DB-level `server_default` for `inserted_at`/`updated_at`.

`inserted_at` and `updated_at` are NOT NULL but have no `server_default`. The model relies on SQLModel's `default_factory=now`, so any direct SQL `INSERT` (e.g., from psql, a fixture script, or an Alembic data migration) that omits these columns will fail. Adding `server_default=sa.func.now()` keeps both ORM and raw inserts safe.
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/055_add_assessment_manager_table.py` around lines 121 - 132, the migration defines NOT NULL columns `inserted_at` and `updated_at` without DB defaults, which breaks raw INSERTs; update the Column definitions for "inserted_at" and "updated_at" to include a database server default (`server_default=sa.func.now()`) and add `server_onupdate=sa.func.now()` for "updated_at" so the DB will populate timestamps for direct SQL inserts and automatic updates from the database side.

backend/app/assessment/service.py (4)
10-10: Importing a private symbol (`_resolve_config`) across modules.

The leading underscore signals `_resolve_config` is `batch.py`'s implementation detail; importing it from `service.py` couples the two modules to that internal contract. Either rename it (e.g., `resolve_config`) and treat it as a real public helper, or move it to a shared `assessment._utils`/`assessment.config` module so both call sites depend on a stable name.
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` at line 10, service.py is importing the private function _resolve_config from batch.py which couples modules to an internal symbol; change this by promoting the helper to a public API (rename _resolve_config to resolve_config) or move it into a shared module (e.g., assessment.config or assessment_utils) and update both callers to import the new public name, then remove the underscore import from batch.py and service.py and ensure submit_assessment_batch and any other callers reference the new resolve_config location.
145-193: `start_assessment` does multi-row creates/updates without an explicit transactional boundary — partial state on failure.

Each of `create_assessment`, `create_assessment_run`, and `update_assessment_run_status` commits on its own (per the CRUD helpers). If something between iterations raises an unhandled error (e.g., a network blip during `submit_assessment_batch` that escapes the `try`, or a DB error during `recompute_assessment_status`), the parent `Assessment` plus a partial set of `AssessmentRun` rows will be persisted, with the user receiving a 500. The cron will then try to process orphaned `pending`/`processing` rows.

Consider either (a) wrapping the orchestration in an explicit transaction with `session.begin()` plus a single final commit, or (b) ensuring the failure path here marks the whole parent assessment as `failed` with a clear `error_message` so cron and the UI can reason about it. The current per-iteration `try`/`except` only handles `submit_assessment_batch` failures, not the surrounding CRUD ops.
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 145 - 193, The start_assessment flow performs multiple DB creates/updates (create_assessment, create_assessment_run, update_assessment_run_status, recompute_assessment_status) without an explicit transaction, allowing partial commits if an error occurs (e.g., during submit_assessment_batch or recompute_assessment_status); wrap the multi-step orchestration in a single transactional boundary (use session.begin() / session.begin_nested() as appropriate) so all create/update calls are atomic and commit once at the end, or alternatively, on any uncaught exception ensure you mark the parent Assessment as failed (update_assessment_run_status + update parent assessment status and error_message via the same session) before re-raising/returning to avoid leaving orphaned pending/processing runs; locate these changes in start_assessment and the exception handling around submit_assessment_batch/recompute_assessment_status to implement the chosen approach.
159-195: Per-iteration `recompute_assessment_status` is redundant and triples the DB writes on the parent row.

Each loop iteration calls `recompute_assessment_status` on the success path (line 178) or the failure path (line 191), then there's a final one at line 195. For an N-config assessment, that's N+1 recomputes (each does a `SELECT` over runs plus an `UPDATE`/`COMMIT` on the parent). The final one alone is sufficient, since the parent's status only needs to be correct after the loop completes — none of the per-iteration values are read between iterations.

Drop the inner calls and rely on line 195.
♻️ Proposed change
```diff
             run = update_assessment_run_status(
                 session=session,
                 run=run,
                 status="processing",
                 batch_job_id=batch_job.id,
                 total_items=batch_job.total_items,
             )
-            recompute_assessment_status(session=session, assessment_id=assessment.id)
         except Exception as e:
             ...
             run = update_assessment_run_status(
                 session=session,
                 run=run,
                 status="failed",
                 error_message="Batch submission failed. Please try again or contact support.",
             )
-            recompute_assessment_status(session=session, assessment_id=assessment.id)
         runs.append(run)

     recompute_assessment_status(session=session, assessment_id=assessment.id)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 159 - 195, The loop currently calls recompute_assessment_status inside both the try (after update_assessment_run_status) and the except blocks, causing redundant DB writes; remove the per-iteration calls to recompute_assessment_status within the try/except and keep only the final recompute_assessment_status after the loop completes so the parent's status is updated once; locate the calls around submit_assessment_batch, update_assessment_run_status, and the except block handling the exception for run processing and delete those inner recompute_assessment_status invocations while leaving the final recompute_assessment_status(session=session, assessment_id=assessment.id) at the end.
51-62: `UUID(str(run.config_id))` is a redundant round-trip.

`AssessmentRun.config_id` is already typed as `UUID | None` per the model, and the `if not run.config_id` guard above ensures it's a `UUID` here. `UUID(str(uuid_obj))` is a no-op conversion.

♻️ Proposed change

```diff
         configs.append(
             AssessmentConfigRef(
-                config_id=UUID(str(run.config_id)),
+                config_id=run.config_id,
                 config_version=run.config_version,
             )
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 51 - 62, The code is redundantly wrapping an already-UUID config_id with UUID(str(...)) in the loop building AssessmentConfigRef; change the AssessmentConfigRef call to use the existing UUID directly (e.g. config_id=run.config_id) and if the type checker complains, assert or cast the non-None type before use (e.g. assert run.config_id is not None then use run.config_id) so you remove the unnecessary UUID(str(run.config_id)) round-trip; update the instantiation in the loop that creates AssessmentConfigRef accordingly.

backend/app/assessment/mappers.py (3)
10-22: Type hint inaccuracy: `normalize_llm_text` can return `None`.

The signature is `(text: str) -> str`, but lines 11–12 return `text` unchanged when `text` is falsy or not a `str`, so a `None` input yields `None`. Callers at lines 100 and 185 pass `kaapi_params.get("instructions")` (possibly `None`) and rely on this. Tighten the annotation.

♻️ Proposed change

```diff
-def normalize_llm_text(text: str) -> str:
-    if not isinstance(text, str) or not text:
+def normalize_llm_text(text: str | None) -> str | None:
+    if not isinstance(text, str) or not text:
         return text
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 10 - 22, The type hint for normalize_llm_text is inaccurate because it can accept and return None; update its annotation to accept Optional[str] and return Optional[str] (from typing import Optional), i.e. change def normalize_llm_text(text: str) -> str to def normalize_llm_text(text: Optional[str]) -> Optional[str], and keep the existing early-return behavior so callers that pass kaapi_params.get("instructions") (which may be None) are correctly typed.
25-59: Strict-schema/strip-schema helpers don't recurse into `anyOf`/`oneOf`/`allOf`.

`_ensure_openai_strict_schema` and `_strip_additional_properties` only walk `properties` and `items`. JSON Schemas that compose subschemas via `anyOf`/`oneOf`/`allOf` (or nested under `$defs`) won't have `additionalProperties: false` injected (OpenAI strict mode will reject them) and won't have `additionalProperties` stripped on the Google side. Worth handling these for parity with the OpenAI strict-schema spec.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 25 - 59, Both _ensure_openai_strict_schema and _strip_additional_properties only recurse into "properties" and "items", so they miss subschemas under composition keywords and definitions; update both functions to also traverse "anyOf", "oneOf", "allOf" (iterating and recursively processing each subschema when they're lists) and "$defs" / "definitions" (recursively processing dict values), ensuring you treat nested dicts/lists consistently (use _ensure_openai_strict_schema in the former and _strip_additional_properties in the latter) so additionalProperties is injected or removed throughout composed/nested schemas.
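The recursive walk can be sketched as below; the function name and the exact injection rule (`setdefault` on every `type: object` node) are assumptions about what the module's helpers should do, not their current code:

```python
def ensure_strict_schema(schema: object) -> object:
    """Inject additionalProperties: false into every object schema,
    including subschemas composed via anyOf/oneOf/allOf and $defs."""
    if not isinstance(schema, dict):
        return schema
    if schema.get("type") == "object":
        schema.setdefault("additionalProperties", False)
    # Named subschemas: properties, $defs/definitions map name -> schema.
    for key in ("properties", "$defs", "definitions"):
        for sub in schema.get(key, {}).values():
            ensure_strict_schema(sub)
    # Array element schema.
    if isinstance(schema.get("items"), dict):
        ensure_strict_schema(schema["items"])
    # Composition keywords map to lists of subschemas.
    for key in ("anyOf", "oneOf", "allOf"):
        for sub in schema.get(key, []):
            ensure_strict_schema(sub)
    return schema
```

The Google-side stripper is the mirror image: `pop("additionalProperties", None)` at the same recursion points.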
5-5: Avoid depending on `google.genai._transformers` (private module).

The leading underscore signals it's not part of the public API; minor SDK upgrades can rename, restructure, or remove it without notice, breaking assessment schema conversion silently. The Google Genai SDK now natively supports JSON Schema via the documented `response_json_schema` parameter (announced Nov 2025) — pass raw JSON Schema directly without converting through `t_schema`. Alternatively, hand-roll the conversion in `_convert_json_schema_to_google` by implementing the transformation logic (uppercase types, resolve `$ref` from `$defs`, convert `anyOf` `[null, X]` to nullable) and pin the SDK version explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` at line 5, The import of google.genai._transformers in backend.app.assessment.mappers should be removed and the assessment->GenAI schema flow updated to not rely on the private module; instead pass the raw JSON Schema into the SDK's documented response_json_schema parameter when building requests, or implement a robust internal converter _convert_json_schema_to_google that performs the necessary transformations (uppercase/openAI-style type names, resolve $ref entries from $defs, convert anyOf [ "null", X ] into a nullable representation) and pin the SDK version if you must keep an internal transformer; update any references in this file (e.g., where genai_transformers was used) to call response_json_schema or the new _convert_json_schema_to_google helper.

backend/app/assessment/batch.py (2)
289-304: Shallow copy of `openai_params` shares nested `text`/`reasoning`/`tools` dicts across all rows.

`body = dict(openai_params)` is a shallow copy: every JSONL line built in this loop ends up referencing the same nested `text.format.json_schema`, `reasoning`, and `tools` objects. Today nothing mutates them after this point, but any future code that does (e.g. a per-row schema override) would silently affect every other row.

Use `copy.deepcopy(openai_params)` here (and on the Gemini side at the equivalent assignment) if these payloads are ever expected to diverge per row.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 289 - 304, The current shallow copy "body = dict(openai_params)" causes nested objects (like text.format.json_schema, reasoning, tools) to be shared across all rows; replace that with a deep copy (use copy.deepcopy(openai_params)) when constructing body inside the loop so each JSONL row gets an independent payload; apply the same deep-copy pattern at the equivalent Gemini-side assignment as well to avoid cross-row mutation later (look for the "body" variable, "openai_params", and the jsonl_data.append block/BATCH_KEY usage to locate the spots to change).
138-161: Placeholder-in-value injection in `_build_text_prompt`.

`prompt.replace("{col}", row[col])` replaces all occurrences in sequence. If a column value itself contains another placeholder (e.g., column `name` has the literal text `{policy}`), and `policy` is also a substituted column processed later, the value gets re-expanded, equivalent to chained substitution. For Indic-text or user-uploaded data this is unlikely, but worth either documenting the contract or using `str.format_map` with a defaultdict to do a single-pass substitution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 138 - 161, The _build_text_prompt function currently does sequential replace which allows placeholder-in-value re-expansion (e.g., a column value containing "{other}" gets re-substituted when that column is processed later); update _build_text_prompt to perform a single-pass, safe substitution by building a mapping of column->normalized value and using a single-call formatting approach (e.g., str.format_map with a default/missing-safe dict) instead of iterative prompt.replace, ensuring you still call normalize_llm_text for each value before substitution; keep the original behavior for the no-template branch unchanged.backend/app/assessment/models.py (1)
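The single-pass approach can be sketched as follows (the helper names are illustrative, not the actual `_build_text_prompt` signature); because `str.format_map` scans the template only once, substituted values containing `{...}` are never re-expanded:

```python
class _KeepPlaceholder(dict):
    """Leave unknown {placeholders} in the template untouched."""
    def __missing__(self, key: str) -> str:
        return "{" + key + "}"

def build_text_prompt(template: str, row: dict) -> str:
    values = _KeepPlaceholder({k: str(v) for k, v in row.items()})
    return template.format_map(values)  # single pass, no chained substitution
```

One behavioral note: `format_map` also honors `{{`/`}}` escapes and raises on malformed braces, which is a contract change worth documenting if adopted.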
32-36: Primary-key fields are typed as non-optional but default to `None`.

`id: int = SQLField(default=None, primary_key=True)` is internally inconsistent with strict type checking — the value is `None` until the DB assigns it. SQLModel's idiomatic form is `id: int | None = SQLField(default=None, primary_key=True)`.

Low-priority cleanup; behavior is unchanged.
Also applies to: 144-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/models.py` around lines 32 - 36, The primary-key fields (e.g., the id field declared as id: int = SQLField(default=None, primary_key=True)) are typed non-optional while defaulting to None; update the type annotations to allow None (use int | None or Optional[int]) so the signature matches the SQLField(default=None) behavior, e.g., change id: int to id: int | None = SQLField(...); apply the same fix to the other primary-key field(s) with the same pattern elsewhere in the file (the block around the second id declaration).

backend/app/assessment/crud.py (1)
255-315: Recompute is read-modify-write without locking — concurrent updates can leave stale counters.

`recompute_assessment_status` reads child runs, computes counters, and writes them back to the parent. If two workers run this concurrently for the same `assessment_id` (e.g. cron + an SSE-triggered recompute), the later commit overwrites the earlier one based on a possibly stale snapshot of child rows.

Today's call sites appear to be single-threaded (cron + sequential processing), so this is informational. If parallel polling is ever introduced, consider `SELECT ... FOR UPDATE` on the parent row, or recompute counters in SQL with a single `UPDATE ... FROM (SELECT ...)` aggregation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 255 - 315, recompute_assessment_status currently does a read-modify-write on the Assessment row which can be lost under concurrent recomputes; to fix, acquire a row lock on the parent Assessment (e.g., SELECT ... FOR UPDATE) before recomputing in recompute_assessment_status or replace the logic with a single atomic UPDATE ... FROM (SELECT ...) aggregation that computes totals from AssessmentRun and updates Assessment in one statement; reference the recompute_assessment_status function and the Assessment/AssessmentRun models and ensure you commit while holding the lock or use a transaction so concurrent workers cannot overwrite each other's results.backend/app/assessment/utils/export.py (1)
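The single-statement variant can be sketched against stdlib `sqlite3` with correlated subqueries (column and status names are assumptions about the schema); the database computes the counters inside one `UPDATE`, so no read-modify-write window exists:

```python
import sqlite3

RECOMPUTE_SQL = """
UPDATE assessment SET
    total_runs = (SELECT COUNT(*) FROM assessment_run
                  WHERE assessment_id = assessment.id),
    completed_runs = (SELECT COUNT(*) FROM assessment_run
                      WHERE assessment_id = assessment.id
                        AND status = 'completed')
WHERE id = ?
"""

def recompute_assessment_status(conn: sqlite3.Connection, assessment_id: int) -> None:
    # One atomic statement: counters are aggregated and written together.
    conn.execute(RECOMPUTE_SQL, (assessment_id,))
    conn.commit()
```

On PostgreSQL the same idea works with `UPDATE ... FROM (SELECT ... GROUP BY ...)`, which avoids re-scanning the child table per column.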
395-417: `assessment_id=run.assessment_id or 0` masks orphan rows.

Coercing a missing `assessment_id` to `0` produces an export row that looks like it belongs to assessment `0`. If the schema invariant is that exported runs always have a parent, fail loud instead; otherwise relax `AssessmentExportRow.assessment_id` to `int | None` in `models.py` and pass the real value through.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 395 - 417, The export is masking missing parent relationships by coercing assessment_id to 0; update the export to pass the real value (use assessment_id=run.assessment_id) and either (A) if a missing parent should be an error, add an explicit check in the export routine (raise ValueError or similar when run.assessment_id is None) before building AssessmentExportRow, or (B) if exports may legitimately have no parent, update AssessmentExportRow.assessment_id type in models.py to allow None (int | None) and keep passing run.assessment_id through; locate the construction of AssessmentExportRow in export_rows.append and implement one of these two fixes around that symbol.

backend/app/assessment/processing.py (1)
482-486: LGTM on the wrapper, but the local import is doing real work — call it out.

Importing `poll_all_pending_assessment_evaluations` inside the function avoids a circular import with `app.assessment.cron`. A short inline comment will save the next maintainer a debugging session if they "clean up" the import to module scope.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/processing.py` around lines 482 - 486, The local import of poll_all_pending_assessment_evaluations inside poll_all_pending_assessments is intentional to avoid a circular import; add a concise inline comment above the from app.assessment.cron import poll_all_pending_assessment_evaluations line explaining that the import must remain local (not module-level) to prevent circular imports and to warn future maintainers not to move it to top-level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/alembic/versions/055_add_assessment_manager_table.py`:
- Around line 15-16: The migration's revision identifier is not zero-padded and
mismatches the filename and down_revision; update the module-level variable
revision from "55" to "055" so it matches the filename and existing
down_revision = "054", keeping the three-digit convention used across migrations
and ensuring subsequent revisions align.
In `@backend/app/assessment/batch.py`:
- Around line 107-135: The workbook created in _parse_excel_rows (wb returned by
openpyxl.load_workbook) may never be closed if iter_rows, next(header) or the
row-to-dict conversion raises; wrap the workbook usage in a try/finally (or
context manager if available) so wb.close() is always called in the finally
block after opening wb and before returning, and move any early returns (e.g.,
when ws is None or header is None) into that try so they still hit the finally
cleanup; ensure variables referenced (wb, ws, rows_iter, header, result) are
initialized inside the try scope.
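The guaranteed-cleanup shape can be sketched with `contextlib.closing`; `load_workbook` is injected here so the example does not depend on `openpyxl` being installed, and the real code would pass `openpyxl.load_workbook`:

```python
from contextlib import closing

def parse_excel_rows(path: str, load_workbook) -> list[dict]:
    """Every exit path (early return, exception, normal return) closes wb."""
    with closing(load_workbook(path, read_only=True)) as wb:
        ws = wb.active
        if ws is None:
            return []  # early return still triggers wb.close()
        rows_iter = ws.iter_rows(values_only=True)
        header = next(rows_iter, None)
        if header is None:
            return []
        return [dict(zip(header, row)) for row in rows_iter]
```

`closing` works here because openpyxl workbooks expose `close()` but are not context managers in all versions; a plain `try/finally` around `wb.close()` is equivalent.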
- Around line 481-569: The code is defaulting unknown provider_name to "openai"
via _NATIVE_PROVIDERS.get(..., "openai"), causing unknown providers to silently
use the OpenAI path; change this to fail fast by using
_NATIVE_PROVIDERS.get(provider_name) (no default) or explicitly checking if
provider_name not in _NATIVE_PROVIDERS and raising ValueError, then use
base_provider for the if/elif branches (referencing the variables base_provider
and provider_name and the mapping _NATIVE_PROVIDERS); ensure the unreachable
else that raises ValueError remains reachable so start_batch_job is only called
for known providers.
In `@backend/app/assessment/cron.py`:
- Around line 144-182: The cleanup steps inside the broad except should be
isolated so a secondary DB error can't abort the whole cron loop: wrap the calls
to update_assessment_run_status, recompute_assessment_status, the
construction/appending of failure_result, assessment_event_broker.publish (and
the failed += 1) in their own try/except block; on exception catch and
logger.error with context (include run.id/run.assessment_id/run.run_name) and
exc_info=True, then continue the for-loop without re-raising so one failed
cleanup won't stop processing remaining runs.
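The isolation pattern can be sketched generically; the function names below are stand-ins for the cron's real helpers, and the inner `try` is what keeps a failing cleanup from aborting the rest of the loop:

```python
import logging

logger = logging.getLogger(__name__)

def poll_runs(runs: list[dict], check_status, mark_failed) -> int:
    failed = 0
    for run in runs:
        try:
            check_status(run)
        except Exception:
            # Cleanup gets its own try/except: a secondary DB error here
            # must not stop processing the remaining runs.
            try:
                mark_failed(run)
                failed += 1
            except Exception:
                logger.error(
                    "cleanup failed for run %s", run.get("id"), exc_info=True
                )
            continue
    return failed
```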
- Around line 41-61: The payload returned by _build_callback_payload nests
"type" inside APIResponse.success_response(...).data so events.py's subscribe()
(which reads payload.get("type", "message")) always falls back to "message"; fix
by promoting the event type to the root of the published payload: in
_build_callback_payload (or immediately after calling
APIResponse.success_response(...).model_dump()) ensure the returned dict
contains a top-level "type": "assessment.child_status_changed" key (while
keeping the detailed run/assessment info under "data"), so subscribe() can
detect the event without changing events.py.
In `@backend/app/assessment/crud.py`:
- Around line 192-223: The update_assessment_run_status function currently
treats None as "do not change", preventing callers from clearing fields; fix
this by introducing a unique sentinel (e.g., UNSET = object()) as the default
for error_message, batch_job_id, total_items, and object_store_url, then change
the checks to "is not UNSET" and assign the value (which may be None) to
run.error_message, run.batch_job_id, run.total_items, and run.object_store_url
inside update_assessment_run_status so callers can explicitly clear fields by
passing None; update any call sites accordingly to use the sentinel omission vs
explicit None semantics.
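The sentinel idiom can be sketched on a plain dict (the real function mutates an `AssessmentRun` row): omitting an argument means "leave unchanged", while an explicit `None` now clears the field.

```python
_UNSET = object()  # module-level sentinel, never equal to any caller value

def update_run_fields(run: dict, error_message=_UNSET, batch_job_id=_UNSET) -> dict:
    if error_message is not _UNSET:
        run["error_message"] = error_message  # may be None: explicit clear
    if batch_job_id is not _UNSET:
        run["batch_job_id"] = batch_job_id
    return run
```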
In `@backend/app/assessment/processing.py`:
- Line 253: Update the log_prefix to include the current function name so all
subsequent logger calls follow the project's convention; replace the line
setting log_prefix = f"[assessment_run={run.id}]" with something like log_prefix
= f"[{inspect.currentframe().f_code.co_name}][assessment_run={run.id}]" (and add
"import inspect" if missing) so functions like the one containing run.id will
prefix logs with the function name and the assessment_run id used in
logger.info/logger.error calls.
- Around line 198-231: In the Google/Gemini branch (the block handling
provider_name in ("google","google-native")), populate the "usage" field from
the response instead of hard-coding None (e.g., extract response.get("usage")
or, if Gemini batch responses use a different shape, parse the batch structure
to pull usage/metadata) and treat an empty string from
extract_text_from_response_dict(response) as an error by setting error="Empty
response output" (mirror the Anthropic branch behavior); update the logic around
response, text = extract_text_from_response_dict(response) and the
results.append entries to include the extracted usage and to set error when text
is empty.
In `@backend/app/assessment/utils/export.py`:
- Around line 60-66: The expansion loop in export.py silently overwrites input
columns that share names with export fields (e.g. output, error, row_id, run_id,
result_status, response_id) by setting new_row[k] from input_data then calling
new_row.update(row); modify the loop that builds new_row (the variables
row_payload, input_data, input_keys, new_row are present) to detect collisions
against a reserved_fields set and instead namespace conflicting input keys (e.g.
store as "input_<k>") or, alternatively, emit a single warning when any
collision is detected; ensure namespacing logic also avoids clobbering an
existing "input_<k>" in row and preserves original row fields when calling
new_row.update(row).
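A sketch of the namespacing variant; the reserved set and the `input_` prefix are assumptions about the export row shape, and the `while` loop is one way to avoid clobbering a pre-existing `input_<k>`:

```python
RESERVED_FIELDS = {"output", "error", "row_id", "run_id", "result_status", "response_id"}

def merge_export_row(input_data: dict, row: dict) -> dict:
    merged = {}
    for k, v in input_data.items():
        key = k
        if k in RESERVED_FIELDS or k in row:
            key = f"input_{k}"
            while key in row:          # don't clobber an existing input_<k> either
                key = f"input_{key}"
        merged[key] = v
    merged.update(row)                 # export fields always win; inputs survive namespaced
    return merged
```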
- Around line 422-433: The sort_export_rows function currently sorts by
row.row_id lexicographically which makes "row_10" come before "row_2"; update
sort_export_rows (and any callers expecting stable order like
load_export_rows_for_run / AssessmentExportRow) to extract the numeric index
from row.row_id (e.g. parse the suffix after "row_") and use that integer in the
sort key instead of the raw string, keeping the existing tie-breakers
(row.config_version or 0, row.run_id); ensure the parser handles missing/invalid
suffixes by falling back to 0 so sorting remains stable.
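The numeric key can be sketched as follows, assuming the `row_<n>` id convention described above:

```python
def row_index(row_id) -> int:
    """Parse the numeric suffix of ids like 'row_10'; fall back to 0."""
    try:
        return int(str(row_id).rsplit("_", 1)[-1])
    except (TypeError, ValueError):
        return 0

def sort_key(row) -> tuple:
    # Numeric index first, then the existing tie-breakers.
    return (row_index(row.row_id), row.config_version or 0, row.run_id)
```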
---
Duplicate comments:
In `@backend/app/assessment/dataset.py`:
- Around line 78-101: The _count_excel_rows function can leak the workbook
handle because wb.close() is only called on normal return; ensure the workbook
is always closed by moving wb.close() into a finally block (or use
contextlib.closing or openpyxl.Workbook context manager) so that after creating
wb = openpyxl.load_workbook(...) and before returning from any path (including
exceptions) the workbook is closed; update the logic around ws, header, and the
count comprehension to use this guaranteed cleanup and remove duplicate close
calls inside the try block.
- Around line 140-163: The code currently calls _upload_file_to_object_store and
then unconditionally creates the DB row via create_evaluation_dataset even when
the upload fails or returns None; change upload_dataset so it treats upload
failures as fatal: call _upload_file_to_object_store inside a try/except,
propagate or raise a clear exception when the upload raises or when the returned
object_store_url is falsy/None, and only call create_evaluation_dataset when
object_store_url is a valid non-empty string; reference
_upload_file_to_object_store and create_evaluation_dataset to locate the change
and ensure no EvaluationDataset is persisted if the upload did not succeed.
In `@backend/app/assessment/events.py`:
- Around line 13-48: The subscriber queues are unbounded (asyncio.Queue()) so
slow/idle SSE clients can cause OOM; fix by creating bounded queues in subscribe
(e.g., queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=XX)) and update
publish to handle overflow: if queue.put_nowait(payload) raises
asyncio.QueueFull, either drop the oldest item from that queue
(queue.get_nowait() then retry put_nowait) or treat the subscriber as dead by
discarding the queue from self._subscribers and logging/removing it; change
symbols: modify subscribe (queue creation) and publish (the for loop over
self._subscribers and use of queue.put_nowait) to implement the chosen overflow
strategy and add appropriate debug/info logs.
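The drop-oldest overflow strategy can be sketched as a minimal broker shape (not the project's `assessment_event_broker`):

```python
import asyncio

def publish(subscribers: list, payload: dict) -> None:
    """Bounded fan-out: on overflow, evict the oldest event and keep the newest."""
    for queue in list(subscribers):
        try:
            queue.put_nowait(payload)
        except asyncio.QueueFull:
            try:
                queue.get_nowait()        # drop the oldest buffered event
            except asyncio.QueueEmpty:
                pass
            queue.put_nowait(payload)     # retry; cannot be full after the drop
```

Subscribers would be created with `asyncio.Queue(maxsize=N)`; the alternative policy is to remove the queue from `subscribers` on `QueueFull` and treat the client as dead.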
---
Nitpick comments:
In `@backend/app/alembic/versions/055_add_assessment_manager_table.py`:
- Around line 121-132: The migration defines NOT NULL columns inserted_at and
updated_at without DB defaults, which breaks raw INSERTs; update the Column
definitions for "inserted_at" and "updated_at" to include a database server
default (server_default=sa.func.now()) and add server_onupdate=sa.func.now() for
"updated_at" so the DB will populate timestamps for direct SQL inserts and
automatic updates from the database side.
In `@backend/app/api/routes/cron.py`:
- Around line 79-95: The code mutates result assuming it's a dict which can
raise a TypeError if process_all_pending_evaluations (or the variable result) is
None or not a dict; guard and normalize it before merging by checking
isinstance(result, dict) (or using result = result or {}) and then safely set
result["assessment"] = assessment_result and update totals from
assessment_result; reference the variables result and assessment_result and the
surrounding merge block so the normalization happens immediately before the
lines that assign result["assessment"] and update
total_processed/total_failed/total_still_processing.
- Around line 71-77: The inline runtime import of
poll_all_pending_assessment_evaluations inside the cron request handler should
be resolved: either hoist the import to module scope so the function is imported
once at startup (remove the in-handler from app.assessment.cron import ... and
add it at top-level), or if this was done to avoid a circular import, add a
concise comment above the inline import explaining the circular-import reason
and why the import must remain lazy. Ensure you keep the existing call to
poll_all_pending_assessment_evaluations(session=session) unchanged and reference
that symbol in your change.
In `@backend/app/assessment/batch.py`:
- Around line 289-304: The current shallow copy "body = dict(openai_params)"
causes nested objects (like text.format.json_schema, reasoning, tools) to be
shared across all rows; replace that with a deep copy (use
copy.deepcopy(openai_params)) when constructing body inside the loop so each
JSONL row gets an independent payload; apply the same deep-copy pattern at the
equivalent Gemini-side assignment as well to avoid cross-row mutation later
(look for the "body" variable, "openai_params", and the jsonl_data.append
block/BATCH_KEY usage to locate the spots to change).
- Around line 138-161: The _build_text_prompt function currently does sequential
replace which allows placeholder-in-value re-expansion (e.g., a column value
containing "{other}" gets re-substituted when that column is processed later);
update _build_text_prompt to perform a single-pass, safe substitution by
building a mapping of column->normalized value and using a single-call
formatting approach (e.g., str.format_map with a default/missing-safe dict)
instead of iterative prompt.replace, ensuring you still call normalize_llm_text
for each value before substitution; keep the original behavior for the
no-template branch unchanged.
In `@backend/app/assessment/cron.py`:
- Around line 73-76: The log call in poll_all_pending_assessment_evaluations
uses two adjacent string literals ("[poll_all_pending_assessment_evaluations] "
"No active assessments found") which concatenates into a string with double
space; update the logger.info invocation in cron.py to use a single combined
string (e.g., "[poll_all_pending_assessment_evaluations] No active assessments
found") or an f-string so the prefix and message are one contiguous string with
a single space, ensuring the message reads cleanly.
In `@backend/app/assessment/crud.py`:
- Around line 255-315: recompute_assessment_status currently does a
read-modify-write on the Assessment row which can be lost under concurrent
recomputes; to fix, acquire a row lock on the parent Assessment (e.g., SELECT
... FOR UPDATE) before recomputing in recompute_assessment_status or replace the
logic with a single atomic UPDATE ... FROM (SELECT ...) aggregation that
computes totals from AssessmentRun and updates Assessment in one statement;
reference the recompute_assessment_status function and the
Assessment/AssessmentRun models and ensure you commit while holding the lock or
use a transaction so concurrent workers cannot overwrite each other's results.
In `@backend/app/assessment/events.py`:
- Line 26: Replace the deprecated alias asyncio.TimeoutError with the builtin
TimeoutError in the exception handler inside backend.app.assessment.events (the
except block currently using asyncio.TimeoutError); update the except clause in
the function/method where that try/except appears (look for the "except
asyncio.TimeoutError:" line) to use "except TimeoutError:" so it relies on the
built-in exception instead of the asyncio alias.
In `@backend/app/assessment/mappers.py`:
- Around line 10-22: The type hint for normalize_llm_text is inaccurate because
it can accept and return None; update its annotation to accept Optional[str] and
return Optional[str] (from typing import Optional), i.e. change def
normalize_llm_text(text: str) -> str to def normalize_llm_text(text:
Optional[str]) -> Optional[str], and keep the existing early-return behavior so
callers that pass kaapi_params.get("instructions") (which may be None) are
correctly typed.
- Around line 25-59: Both _ensure_openai_strict_schema and
_strip_additional_properties only recurse into "properties" and "items", so they
miss subschemas under composition keywords and definitions; update both
functions to also traverse "anyOf", "oneOf", "allOf" (iterating and recursively
processing each subschema when they're lists) and "$defs" / "definitions"
(recursively processing dict values), ensuring you treat nested dicts/lists
consistently (use _ensure_openai_strict_schema in the former and
_strip_additional_properties in the latter) so additionalProperties is injected
or removed throughout composed/nested schemas.
- Line 5: The import of google.genai._transformers in
backend.app.assessment.mappers should be removed and the assessment->GenAI
schema flow updated to not rely on the private module; instead pass the raw JSON
Schema into the SDK's documented response_json_schema parameter when building
requests, or implement a robust internal converter
_convert_json_schema_to_google that performs the necessary transformations
(uppercase/openAI-style type names, resolve $ref entries from $defs, convert
anyOf [ "null", X ] into a nullable representation) and pin the SDK version if
you must keep an internal transformer; update any references in this file (e.g.,
where genai_transformers was used) to call response_json_schema or the new
_convert_json_schema_to_google helper.
In `@backend/app/assessment/models.py`:
- Around line 32-36: The primary-key fields (e.g., the id field declared as id:
int = SQLField(default=None, primary_key=True)) are typed non-optional while
defaulting to None; update the type annotations to allow None (use int | None or
Optional[int]) so the signature matches the SQLField(default=None) behavior,
e.g., change id: int to id: int | None = SQLField(...); apply the same fix to
the other primary-key field(s) with the same pattern elsewhere in the file (the
block around the second id declaration).
In `@backend/app/assessment/processing.py`:
- Around line 482-486: The local import of
poll_all_pending_assessment_evaluations inside poll_all_pending_assessments is
intentional to avoid a circular import; add a concise inline comment above the
from app.assessment.cron import poll_all_pending_assessment_evaluations line
explaining that the import must remain local (not module-level) to prevent
circular imports and to warn future maintainers not to move it to top-level.
In `@backend/app/assessment/service.py`:
- Line 10: service.py is importing the private function _resolve_config from
batch.py which couples modules to an internal symbol; change this by promoting
the helper to a public API (rename _resolve_config to resolve_config) or move it
into a shared module (e.g., assessment.config or assessment_utils) and update
both callers to import the new public name, then remove the underscore import
from batch.py and service.py and ensure submit_assessment_batch and any other
callers reference the new resolve_config location.
- Around line 145-193: The start_assessment flow performs multiple DB
creates/updates (create_assessment, create_assessment_run,
update_assessment_run_status, recompute_assessment_status) without an explicit
transaction, allowing partial commits if an error occurs (e.g., during
submit_assessment_batch or recompute_assessment_status); wrap the multi-step
orchestration in a single transactional boundary (use session.begin() /
session.begin_nested() as appropriate) so all create/update calls are atomic and
commit once at the end, or alternatively, on any uncaught exception ensure you
mark the parent Assessment as failed (update_assessment_run_status + update
parent assessment status and error_message via the same session) before
re-raising/returning to avoid leaving orphaned pending/processing runs; locate
these changes in start_assessment and the exception handling around
submit_assessment_batch/recompute_assessment_status to implement the chosen
approach.
- Around line 159-195: The loop currently calls recompute_assessment_status
inside both the try (after update_assessment_run_status) and the except blocks,
causing redundant DB writes; remove the per-iteration calls to
recompute_assessment_status within the try/except and keep only the final
recompute_assessment_status after the loop completes so the parent's status is
updated once; locate the calls around submit_assessment_batch,
update_assessment_run_status, and the except block handling the exception for
run processing and delete those inner recompute_assessment_status invocations
while leaving the final recompute_assessment_status(session=session,
assessment_id=assessment.id) at the end.
- Around line 51-62: The code is redundantly wrapping an already-UUID config_id
with UUID(str(...)) in the loop building AssessmentConfigRef; change the
AssessmentConfigRef call to use the existing UUID directly (e.g.
config_id=run.config_id) and if the type checker complains, assert or cast the
non-None type before use (e.g. assert run.config_id is not None then use
run.config_id) so you remove the unnecessary UUID(str(run.config_id))
round-trip; update the instantiation in the loop that creates
AssessmentConfigRef accordingly.
In `@backend/app/assessment/utils/export.py`:
- Around line 395-417: The export is masking missing parent relationships by
coercing assessment_id to 0; update the export to pass the real value (use
assessment_id=run.assessment_id) and either (A) if a missing parent should be an
error, add an explicit check in the export routine (raise ValueError or similar
when run.assessment_id is None) before building AssessmentExportRow, or (B) if
exports may legitimately have no parent, update
AssessmentExportRow.assessment_id type in models.py to allow None (int | None)
and keep passing run.assessment_id through; locate the construction of
AssessmentExportRow in export_rows.append and implement one of these two fixes
around that symbol.
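The transactional-boundary option suggested for `start_assessment` above can be sketched with a stand-in session. `FakeSession` and the run labels are illustrative only; the real code would use the SQLAlchemy session's `begin()` context manager:

```python
class FakeSession:
    """Illustrative stand-in for a SQLAlchemy session; tracks commit/rollback."""

    def __init__(self) -> None:
        self.pending: list[str] = []
        self.committed = False
        self.rolled_back = False

    def add(self, obj: str) -> None:
        self.pending.append(obj)

    def begin(self) -> "FakeSession":
        return self

    def __enter__(self) -> "FakeSession":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.committed = True
        else:
            # Any failure inside the block rolls everything back,
            # so no orphaned pending/processing runs are left behind.
            self.rolled_back = True
            self.pending.clear()
        return False


def start_assessment(session: FakeSession, config_ids: list[str]) -> None:
    # One transactional boundary around all creates/updates; commit happens
    # once at the end, or everything is rolled back on error.
    with session.begin():
        session.add("assessment")
        for config_id in config_ids:
            session.add(f"run:{config_id}")
            if config_id == "broken":  # simulate submit_assessment_batch failing
                raise RuntimeError("submit_assessment_batch failed")
```

The same shape applies with a real session: wrap the create/update orchestration in one `with session.begin():` block instead of committing per step.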
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 072190d8-1f9b-4ad9-9f98-d507cb197d57
📒 Files selected for processing (21)
- backend/app/alembic/versions/055_add_assessment_manager_table.py
- backend/app/api/main.py
- backend/app/api/routes/cron.py
- backend/app/assessment/__init__.py
- backend/app/assessment/batch.py
- backend/app/assessment/cron.py
- backend/app/assessment/crud.py
- backend/app/assessment/dataset.py
- backend/app/assessment/events.py
- backend/app/assessment/mappers.py
- backend/app/assessment/models.py
- backend/app/assessment/processing.py
- backend/app/assessment/routes.py
- backend/app/assessment/service.py
- backend/app/assessment/utils/__init__.py
- backend/app/assessment/utils/export.py
- backend/app/assessment/utils/parsing.py
- backend/app/assessment/validators.py
- backend/app/models/__init__.py
- backend/app/models/batch_job.py
- backend/app/models/llm/request.py
✅ Files skipped from review due to trivial changes (5)
- backend/app/models/batch_job.py
- backend/app/assessment/__init__.py
- backend/app/api/main.py
- backend/app/assessment/utils/parsing.py
- backend/app/models/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/models/llm/request.py
- backend/app/assessment/utils/__init__.py
- backend/app/assessment/routes.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/assessment/dataset.py`:
- Around line 22-26: Remove support for legacy .xls by deleting the ".xls" entry
from the _MIME_TYPES dict and from ALLOWED_EXTENSIONS in validators.py so .xls
files are rejected at validation (do not route them to openpyxl). Also update
the validation routine that checks ALLOWED_EXTENSIONS to explicitly reject
".xls" (and return a clear validation error). Finally, harden the parsing code
that calls openpyxl (the read/parse functions in dataset.py and batch.py where
openpyxl is invoked) by replacing the broad silent except that returns 0 rows
with either re-raising the caught InvalidFileException or logging the exception
and returning a proper error response so legacy-file parsing failures aren’t
swallowed.
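The suggested validation could look like the following sketch; the extension set and error wording are assumptions, not the project's actual `ALLOWED_EXTENSIONS`:

```python
from pathlib import Path

# Assumed allowed set; legacy .xls is deliberately absent
ALLOWED_EXTENSIONS = {".csv", ".xlsx"}


def validate_dataset_extension(filename: str) -> str:
    """Return the lowercased file extension, rejecting legacy .xls explicitly."""
    extension = Path(filename).suffix.lower()
    if extension == ".xls":
        # Reject at validation time instead of routing to openpyxl,
        # which cannot read the legacy binary format.
        raise ValueError(
            "Legacy .xls files are not supported; re-save the file as .xlsx or .csv"
        )
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}")
    return extension
```

In the parsing path, the complementary change is to re-raise (or log and surface) `InvalidFileException` from openpyxl rather than returning 0 rows from a broad `except`.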
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7f0dd174-276b-4d14-9951-73c8a226e4b3
📒 Files selected for processing (3)
- backend/app/assessment/batch.py
- backend/app/assessment/dataset.py
- backend/app/models/llm/request.py
✅ Files skipped from review due to trivial changes (1)
- backend/app/assessment/batch.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/models/llm/request.py
…ssessment modules
…ous evaluation processing
Actionable comments posted: 4
🧹 Nitpick comments (7)
backend/app/assessment/events.py (2)
36-48: Tighten the `payload` type annotation.

`dict` without parameters loses information for callers and static analyzers. Since payloads are JSON-serialized and the type field is read via `.get("type")`, prefer `dict[str, Any]` to match the queue type and align with the Python 3.11+ typing convention used elsewhere in the codebase.

♻️ Proposed fix

```diff
- def publish(self, payload: dict) -> None:
+ def publish(self, payload: dict[str, Any]) -> None:
```

Add the import at the top:

```diff
-from collections.abc import AsyncIterator
+from collections.abc import AsyncIterator
+from typing import Any
```

Optionally also update the queue annotation on Line 13 / 16 to `asyncio.Queue[dict[str, Any]]` for consistency.

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code" and "Use Python 3.11+ with type hints throughout the codebase".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` around lines 36 - 48, Update the publish method signature to use a precise payload type: change payload: dict to payload: dict[str, Any] and add from typing import Any to the module imports; also update any asyncio.Queue annotations used for subscribers (e.g., asyncio.Queue[dict[str, Any]] or the _subscribers collection element type) so the queue and publish agree on dict[str, Any] payloads, keeping the logic unchanged.
24-24: Use builtin `TimeoutError` instead of the deprecated `asyncio.TimeoutError` alias.

Since Python 3.11, `asyncio.TimeoutError` is a deprecated alias for the builtin `TimeoutError` (per Ruff UP041). Given the codebase targets Python 3.11+, prefer the builtin.

♻️ Proposed fix

```diff
- except asyncio.TimeoutError:
+ except TimeoutError:
      yield ": keep-alive\n\n"
      continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` at line 24, Replace the deprecated asyncio.TimeoutError alias with the builtin TimeoutError in the exception handler: change the except clause currently written as "except asyncio.TimeoutError" in backend/app/assessment/events.py to "except TimeoutError" and remove any now-unused reference to asyncio if that's the only usage in this module; ensure the surrounding function/method (the exception handling block in events.py) still behaves the same.

backend/app/tests/api/routes/test_cron.py (1)
28-47: Consider asserting the new `assessment` merge behavior and aligning the mock shape with the production contract.

The three updated tests patch `poll_all_pending_assessment_evaluations` but never assert on the new behavior the cron handler adds (merging assessment counts into the totals and exposing `result["assessment"]`). With all assessment counts mocked to `0`, the existing `total_*` assertions would still pass even if the merge logic in `evaluation_cron_job` regressed (e.g., assessment counts ignored, key renamed, exception swallowed into `assessment_error`). Adding at least one case where assessment counts are non-zero would actually exercise the new code path this PR is testing.

Additionally, the real `poll_all_pending_assessment_evaluations` always returns `total` and `details` alongside the three counters (see `backend/app/assessment/cron.py` lines 189-195). The mock omitting those fields drifts from the production contract; if a future change reads `assessment["details"]` or `assessment["total"]`, these tests will silently keep passing.

♻️ Suggested tightening (apply to one or more scenarios)

```diff
 with patch(
     "app.api.routes.cron.process_all_pending_evaluations",
     new=AsyncMock(return_value=mock_result),
 ), patch(
     "app.assessment.cron.poll_all_pending_assessment_evaluations",
     new=AsyncMock(
-        return_value={"processed": 0, "failed": 0, "still_processing": 0}
+        return_value={
+            "total": 2,
+            "processed": 1,
+            "failed": 1,
+            "still_processing": 0,
+            "details": [],
+        }
     ),
 ):
     response = client.get(
         f"{settings.API_V1_STR}/cron/evaluations",
         headers={"X-API-KEY": superuser_api_key.key},
     )
     assert response.status_code == 200
     data = response.json()
     assert data["status"] == "success"
-    assert data["total_processed"] == 5
-    assert data["total_failed"] == 0
-    assert data["total_still_processing"] == 1
+    # Verify assessment counts are merged into the aggregated totals
+    assert data["total_processed"] == 5 + 1
+    assert data["total_failed"] == 0 + 1
+    assert data["total_still_processing"] == 1 + 0
+    assert data["assessment"]["processed"] == 1
+    assert data["assessment"]["failed"] == 1
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/test_cron.py` around lines 28 - 47, Update the tests that patch poll_all_pending_assessment_evaluations to return a realistic shape (include "processed", "failed", "still_processing" with at least one non-zero count plus the production fields "total" and "details") and then assert that evaluation_cron_job/cron handler merges those assessment counts into the response (check response.json()["assessment"] exists and that total_processed/total_failed/total_still_processing include the assessment counts). Ensure the mock uses AsyncMock(return_value=...) for poll_all_pending_assessment_evaluations and add assertions referencing the merged totals and the assessment sub-dictionary.

backend/app/assessment/crud.py (1)
1-12: CRUD module should be placed under `backend/app/crud/` per project convention.

This file is located at `backend/app/assessment/crud.py`, but the project convention places CRUD modules in `backend/app/crud/` (as noted in learnings and confirmed by the existing `backend/app/crud/` directory). If your team is intentionally adopting feature-folder organization for the assessment subsystem, document this exception in `CLAUDE.md` or contributor guidelines to keep the convention consistent for reviewers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 1 - 12, The CRUD module for Assessment is in the wrong location; move backend/app/assessment/crud.py into the shared CRUD package under backend/app/crud/ (keep filename crud.py) so that the Assessment and AssessmentRun operations live alongside other CRUD modules; update any imports that reference app.assessment.crud to app.crud.crud (or equivalent) and ensure logger = logging.getLogger(__name__) and references to Assessment and AssessmentRun continue to resolve; if you intentionally want feature-folder organization instead, add a clear exception note to CLAUDE.md or the contributor guidelines describing why assessment CRUD lives under app/assessment and list the impacted symbols (Assessment, AssessmentRun, functions in crud.py) so reviewers are aware.

backend/app/assessment/mappers.py (3)
117-118: Dead branch — `"null"` is no longer a valid `summary` value.

`TextLLMParams.summary` now only accepts `"auto" | "detailed" | "concise" | None`. The `summary == "null"` check can never be True for validated params and only adds confusion. Drop the special-case.

♻️ Suggested simplification

```diff
- if summary is not None:
-     reasoning_payload["summary"] = None if summary == "null" else summary
+ if summary is not None:
+     reasoning_payload["summary"] = summary
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 117 - 118, The `"null"` branch is dead because TextLLMParams.summary no longer allows "null"; inside the block that checks `if summary is not None` (where `reasoning_payload` and `summary` are used) remove the `summary == "null"` special-case and assign the `summary` value directly (e.g., set `reasoning_payload["summary"]` to `summary`), leaving None handling as-is; update any related comments if present.
96-98: Implicit aliasing of `reasoning` → `effort` is non-obvious.

`effort = kaapi_params.get("effort") or reasoning` quietly promotes `TextLLMParams.reasoning` (Literal low/medium/high) into the OpenAI `reasoning.effort` slot when `effort` isn't set. The two fields have different declared enums and semantics (`reasoning` is described as "Reasoning configuration or instructions," while `effort` is the OpenAI-specific reasoning effort). At minimum add a short comment explaining the fallback so future readers don't conflate them; ideally, decide whether `reasoning` is a deprecated alias for `effort` and document/log it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 96 - 98, The current fallback "effort = kaapi_params.get('effort') or reasoning" silently reuses TextLLMParams.reasoning for the OpenAI-specific reasoning.effort, which conflates different enums/semantics; update the mapping in backend/app/assessment/mappers.py to make this explicit by either (a) adding a clear comment above the lines referencing kaapi_params explaining that reasoning is used as a fallback for effort and why, or (b) preferably implement an explicit conversion/logging path: if kaapi_params.get("effort") is missing but reasoning is present, map/translate reasoning to the allowed effort values (or log a deprecation/warning that reasoning is being used as an alias) before assigning to effort; reference the symbols kaapi_params, reasoning, effort and ensure any enum translation or warning message clarifies the intended semantics.
1-7: Unused `logger`.

`logger = logging.getLogger(__name__)` is defined but never used in this module. Either drop it, or replace the silent `warnings.append(...)` calls with structured logs prefixed with the function name (per the repo logging convention) so warnings are observable in the runtime, not only in the returned tuple.

As per coding guidelines: "Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message ...")`".
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 1 - 7, The module defines logger = logging.getLogger(__name__) but never uses it; update the module to either remove the unused logger variable or (preferred) replace the silent warnings.append(...) calls in the functions in this file with structured logging using that logger (e.g., logger.warning) and follow repo convention by prefixing messages with the function name in square brackets (use the function's actual name in the message), e.g., logger.warning(f"[function_name] descriptive warning..."); keep the existing return tuple behavior but emit warnings via logger instead of only appending to the warnings list so runtime logs are observable.
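The convention described in this comment (return warnings and also emit them via the logger with a `[function_name]` prefix) might be sketched like this; the unsupported-field check, function name, and message wording are hypothetical:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def map_example_params(
    kaapi_params: dict[str, Any],
) -> tuple[dict[str, Any], list[str]]:
    """Map params; surface warnings via both the return value and the log."""
    mapped: dict[str, Any] = {}
    warnings: list[str] = []
    for unsupported in ("effort", "summary"):  # hypothetical unsupported fields
        if unsupported in kaapi_params:
            message = f"param '{unsupported}' is not supported and will be ignored"
            warnings.append(message)
            # Repo convention: prefix log messages with the function name
            logger.warning(f"[map_example_params] {message}")
    mapped["model"] = kaapi_params.get("model")
    return mapped, warnings
```

The returned warnings list stays intact for API consumers, while the log line makes the same information visible at runtime.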
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/assessment/crud.py`:
- Around line 270-289: Remove the dead "in_progress" branch from status checks:
anywhere the code computes processing_runs or inspects AssessmentRun.status
using r.status in {"processing", "in_progress"} (e.g., the processing_runs
calculation and the other conditional in the same module) change the set to just
"processing". Update the expressions that sum or filter runs (symbol:
processing_runs) and any related conditionals that pass values into
_determine_assessment_status so they only check for "processing" to eliminate
the unreachable "in_progress" case.
In `@backend/app/assessment/mappers.py`:
- Around line 203-213: Remove the silent passthrough of the unused "reasoning"
field (delete the assignment to google_params["reasoning"]) and instead emit
warnings when unsupported fields appear: check kaapi_params for "effort" and
"summary" and call the module logger's warning (e.g., logger.warning) to notify
users those params are ignored; preserve the existing handling for
thinking_level -> google_params["thinking_config"] and output_schema ->
google_params["output_schema"] (using _convert_json_schema_to_google) unchanged.
- Around line 95-108: The OpenAI mapper must validate the required model early:
check kaapi_params.get("model") (variable model) and if it's missing/None
immediately return the same error behavior used in the Google mapper (i.e.,
surface a validation error rather than continuing); do this before calling
litellm.supports_reasoning(f"openai/{model}") so you never call
supports_reasoning with None and so model will be present when building
openai_params. Locate the OpenAI mapper's use of model, support_reasoning, and
openai_params and add that early guard/return to mirror the Google mapper's
validation logic.
In `@backend/app/models/llm/request.py`:
- Around line 38-43: Update the Field description for the summary attribute in
the request model to remove the obsolete "null/" wording; change the text that
currently reads "Use null/None to disable." to something like "Use None to
disable." so callers aren’t misled into passing the string "null" — locate the
summary: Literal["auto", "detailed", "concise"] | None = Field(...) definition
and edit only the description string accordingly.
---
Nitpick comments:
In `@backend/app/assessment/crud.py`:
- Around line 1-12: The CRUD module for Assessment is in the wrong location;
move backend/app/assessment/crud.py into the shared CRUD package under
backend/app/crud/ (keep filename crud.py) so that the Assessment and
AssessmentRun operations live alongside other CRUD modules; update any imports
that reference app.assessment.crud to app.crud.crud (or equivalent) and ensure
logger = logging.getLogger(__name__) and references to Assessment and
AssessmentRun continue to resolve; if you intentionally want feature-folder
organization instead, add a clear exception note to CLAUDE.md or the contributor
guidelines describing why assessment CRUD lives under app/assessment and list
the impacted symbols (Assessment, AssessmentRun, functions in crud.py) so
reviewers are aware.
In `@backend/app/assessment/events.py`:
- Around line 36-48: Update the publish method signature to use a precise
payload type: change payload: dict to payload: dict[str, Any] and add from
typing import Any to the module imports; also update any asyncio.Queue
annotations used for subscribers (e.g., asyncio.Queue[dict[str, Any]] or the
_subscribers collection element type) so the queue and publish agree on
dict[str, Any] payloads, keeping the logic unchanged.
- Line 24: Replace the deprecated asyncio.TimeoutError alias with the builtin
TimeoutError in the exception handler: change the except clause currently
written as "except asyncio.TimeoutError" in backend/app/assessment/events.py to
"except TimeoutError" and remove any now-unused reference to asyncio if that's
the only usage in this module; ensure the surrounding function/method (the
exception handling block in events.py) still behaves the same.
In `@backend/app/assessment/mappers.py`:
- Around line 117-118: The `"null"` branch is dead because TextLLMParams.summary
no longer allows "null"; inside the block that checks `if summary is not None`
(where `reasoning_payload` and `summary` are used) remove the `summary ==
"null"` special-case and assign the `summary` value directly (e.g., set
`reasoning_payload["summary"]` to `summary`), leaving None handling as-is;
update any related comments if present.
- Around line 96-98: The current fallback "effort = kaapi_params.get('effort')
or reasoning" silently reuses TextLLMParams.reasoning for the OpenAI-specific
reasoning.effort, which conflates different enums/semantics; update the mapping
in backend/app/assessment/mappers.py to make this explicit by either (a) adding
a clear comment above the lines referencing kaapi_params explaining that
reasoning is used as a fallback for effort and why, or (b) preferably implement
an explicit conversion/logging path: if kaapi_params.get("effort") is missing
but reasoning is present, map/translate reasoning to the allowed effort values
(or log a deprecation/warning that reasoning is being used as an alias) before
assigning to effort; reference the symbols kaapi_params, reasoning, effort and
ensure any enum translation or warning message clarifies the intended semantics.
- Around line 1-7: The module defines logger = logging.getLogger(__name__) but
never uses it; update the module to either remove the unused logger variable or
(preferred) replace the silent warnings.append(...) calls in the functions in
this file with structured logging using that logger (e.g., logger.warning) and
follow repo convention by prefixing messages with the function name in square
brackets (use the function's actual name in the message), e.g.,
logger.warning(f"[function_name] descriptive warning..."); keep the existing
return tuple behavior but emit warnings via logger instead of only appending to
the warnings list so runtime logs are observable.
In `@backend/app/tests/api/routes/test_cron.py`:
- Around line 28-47: Update the tests that patch
poll_all_pending_assessment_evaluations to return a realistic shape (include
"processed", "failed", "still_processing" with at least one non-zero count plus
the production fields "total" and "details") and then assert that
evaluation_cron_job/cron handler merges those assessment counts into the
response (check response.json()["assessment"] exists and that
total_processed/total_failed/total_still_processing include the assessment
counts). Ensure the mock uses AsyncMock(return_value=...) for
poll_all_pending_assessment_evaluations and add assertions referencing the
merged totals and the assessment sub-dictionary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 259f6741-a7b6-438c-973c-4239e35527ce
📒 Files selected for processing (7)
- backend/app/assessment/crud.py
- backend/app/assessment/events.py
- backend/app/assessment/mappers.py
- backend/app/assessment/routes.py
- backend/app/models/__init__.py
- backend/app/models/llm/request.py
- backend/app/tests/api/routes/test_cron.py
✅ Files skipped from review due to trivial changes (2)
- backend/app/models/__init__.py
- backend/app/assessment/routes.py
Codecov Report
❌ Patch coverage is …
📢 Thoughts on this report? Let us know!
…nctionality with new routes and models
refactor: Clean up and optimize assessment-related code and imports
test: Add tests for evaluation cron job and assessment polling
chore: Update dependencies in pyproject.toml and uv.lock
…essing, and validators
- Updated assessment service to reject configs not tagged for assessment use.
- Enhanced ConfigCrud to handle default tags and added methods for tag scope filtering.
- Modified DocumentCrud to support tag filtering in read operations.
- Adjusted models to ensure default tags are applied where necessary.
- Added tests to verify behavior for default and explicit tag scenarios in config and document CRUD operations.
- Improved error handling and messaging for invalid config tags in assessments.
…ent CRUD and test files
```python
"""add tag column to config table

Revision ID: 056
Revises: 055
```
do we need two migrations?
```python
if not assessment:
    raise HTTPException(
        status_code=404,
        detail=f"Assessment {assessment_id} not found or not accessible",
    )
```
this should be inside `get_assessment_by_id` so we don't have to write this check in other parts
```python
    response_model=APIResponse[AssessmentDatasetResponse],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
)
async def upload_dataset(
```
while this is inside the assessment folder for better readability, can we name it `upload_assessment_dataset`? `upload_dataset` can be confused with the evaluation dataset. We should follow a similar convention there too so there is no confusion. Not something to be picked up immediately, but food for thought
```python
    f
    for f in AssessmentExportRow.model_fields.keys()
    if f not in ("output", "input_data")
```
replace `f` with a better variable name for readability
```python
for k in parsed:
    if k not in seen_keys:
        seen_keys[k] = None
        output_keys.append(k)

if not output_keys:
    # Keep original layout with output as a single column
    fieldnames = input_col_names + list(AssessmentExportRow.model_fields.keys())
    fieldnames = [f for f in fieldnames if f != "input_data"]
    return row_payload, fieldnames

# Build expanded rows
expanded: list[dict[str, Any]] = []
for row, parsed in zip(row_payload, parsed_outputs, strict=True):
    new_row = {k: v for k, v in row.items() if k != "output"}
    if parsed:
        for k in output_keys:
            new_row[k] = parsed.get(k)
    else:
        for k in output_keys:
            new_row[k] = None
    if row.get("output") is not None:
```
lots of single-letter variables here, which aren't clean to look at or read
```python
logger = logging.getLogger(__name__)

_IMAGE_MIME_BY_EXT = {
```
this handles MIME type detection, base64 decoding, Google Drive URL normalization, and data-URL parsing; that should not be in CRUD, so move it to services/assessment/utils/ maybe
```python
            ["organization_id"],
            ["organization.id"],
            ondelete="CASCADE",
        ),
        sa.ForeignKeyConstraint(
            ["project_id"],
            ["project.id"],
```
name is missing, as in line 79 above
```python
    return sanitized or "assessment_results"


def _expand_input_columns(
```
add type hints in this file
```python
for k in input_data:
    if k not in seen_keys:
```
use better variable names
```python
for k in input_keys:
    col = f"input_{k}" if k in reserved_fields else k
    key_map[k] = col

collisions = {k: v for k, v in key_map.items() if k != v}
```
better variable names
```python
    default=None,
    description="Reasoning configuration or instructions",
)
effort: Literal["none", "minimal", "low", "medium", "high", "xhigh"] | None = Field(
```
It is None by default, so it would not have much impact on a regular LLM call
… safety
- Added HTTPException raises for not found cases in `get_assessment_by_id`, `get_assessment_run_by_id`, and `get_assessment_dataset_by_id` functions to improve error handling.
- Updated return types for several functions to ensure they always return the expected model instead of None.
- Refactored logging statements in assessment evaluation polling to use parameterized logging for better readability and performance.
- Introduced utility functions for attachment resolution, including MIME type detection and URL normalization.
- Improved the handling of assessment run statuses and batch processing, ensuring proper error logging and status updates.
- Enhanced test coverage for CRUD operations and mappers, ensuring that exceptions are raised and handled correctly.
- Updated assessment models to use stricter type definitions for status fields and optional attributes.
…ments and core logic
…y in Assessment models
```python
)
raise HTTPException(
    status_code=500,
    detail=f"Failed to save assessment dataset metadata: {e}",
)
```
in the API response, we should return a human-readable error without leaking internal information about the schema and constraints
This will be the error produced by this code:
```json
{
  "success": false,
  "error": {
    "message": "Failed to save assessment dataset metadata: (psycopg2.errors.NotNullViolation) null value in column \"organization_id\" of relation \"evaluation_dataset\" violates not-null constraint\nDETAIL: Failing row contains (null, 'My Dataset', 'description', 'ASSESSMENT', '{}', null, null, ...).\n[SQL: INSERT INTO evaluation_dataset (name, description, type, dataset_metadata, ...) VALUES (%(name)s, %(description)s, ...)]\n[parameters: {'name': 'My Dataset', ...}]"
  }
}
```
Instead, add this information to the logs and return only a human-readable error in the API response.
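That split could be sketched as follows. `ApiError` stands in for `fastapi.HTTPException`, and the session/function names are illustrative: the full driver detail goes to the server log, while the client receives a generic message:

```python
import logging

logger = logging.getLogger(__name__)


class ApiError(Exception):
    """Stand-in for fastapi.HTTPException in this sketch."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def save_dataset_metadata(session, dataset) -> None:
    try:
        session.add(dataset)
        session.commit()
    except Exception as exc:
        session.rollback()
        # Full driver/SQL detail goes to the server log only
        logger.error(f"[save_dataset_metadata] Failed to save dataset metadata: {exc}")
        # The client sees a stable, human-readable message with no schema internals
        raise ApiError(
            status_code=500, detail="Failed to save assessment dataset metadata"
        ) from exc
```

In the real route, the `raise` would use FastAPI's `HTTPException` with the same generic `detail` string.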
Summary
Target issue is #791
Explain the motivation for making this change. What existing problem does the pull request solve?
This PR introduces a shared assessment pipeline for partner programs, replacing manual and fragmented evaluation workflows. It adds multimodal dataset ingestion, config-driven batch assessment runs, status tracking, retries, cron-based processing, and exportable results for easier comparison and operations.
Note
Adds a `tag` column with enum `['ASSESSMENT']`. Only configs and versions with `tag = 'ASSESSMENT'` will be shown in the assessment module. This separation is intentional, since assessment configs don't include system instructions, and we also don't want general configs to show up in the assessment config UI.

Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Behavioral Change
Documentation
Tests