Phase 1.3.3: Mask credentials in all log output#30
Phase 1.3.3: Mask credentials in all log output#30richard-devbot wants to merge 9 commits intoCursorTouch:mainfrom
Conversation
Review Summary by QodoAdd credential masking filter for secure log output
WalkthroughsDescription• Adds credential masking filter to prevent secrets leaking into logs • Masks API keys, Bearer tokens, passwords, JWTs, and auth headers • Installs masking on root logger at startup for all child loggers • Includes 18 comprehensive unit tests covering all credential patterns Diagramflowchart LR
A["Log Record"] --> B["CredentialMaskingFilter"]
B --> C["mask_credentials()"]
C --> D["Apply 5 Regex Patterns"]
D --> E["Redacted Log Output"]
F["setup_logging()"] --> G["install_credential_masking()"]
G --> H["Root Logger"]
H --> I["All Child Loggers Protected"]
File Changes1. operator_use/utils/log_masking.py
|
Code Review by Qodo
1. log_filter.py module missing
|
| # Install credential masking so no secrets leak into log files or console | ||
| from operator_use.utils.log_masking import install_credential_masking | ||
| install_credential_masking() |
There was a problem hiding this comment.
1. log_filter.py module missing 📎 Requirement gap ⚙ Maintainability
Compliance requires the credential-masking filter to be implemented at operator_use/utils/log_filter.py, but this PR implements it in operator_use/utils/log_masking.py and imports that module instead. This violates the required location/interface and can break expected integrations that rely on the mandated path.
Agent Prompt
## Issue description
The credential-masking logging filter is not implemented at the required path `operator_use/utils/log_filter.py`.
## Issue Context
The PR currently implements `CredentialMaskingFilter` in `operator_use/utils/log_masking.py` and imports it from startup, but the compliance checklist mandates the dedicated module path.
## Fix Focus Areas
- operator_use/utils/log_masking.py[51-71]
- operator_use/cli/start.py[55-57]
- operator_use/utils/__init__.py[4-15]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| # API keys / tokens with common prefixes (sk-, pk-, api-, token-, key-) | ||
| # Allows multi-segment keys like sk-proj-abc12345678 | ||
| ( | ||
| re.compile(r"(sk|pk|api|token|key)[-_][A-Za-z0-9\-_]{8,}", re.IGNORECASE), | ||
| r"\1-***REDACTED***", | ||
| ), |
There was a problem hiding this comment.
2. gsk_/aiza/nvapi- unmasked 📎 Requirement gap ⛨ Security
The masking regexes do not include required provider credential formats gsk_ (Groq), AIza (Google), or nvapi- (NVIDIA), so these can appear unmasked in logs. This violates the requirement to mask all specified provider patterns at all log levels.
Agent Prompt
## Issue description
Required provider credential patterns `gsk_`, `AIza`, and `nvapi-` are not masked by the current regex set.
## Issue Context
Current `_MASK_PATTERNS` includes `sk-` and other generic prefixes, but not these provider-specific formats mandated by compliance.
## Fix Focus Areas
- operator_use/utils/log_masking.py[7-41]
- tests/test_log_masking.py[12-66]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| # Patterns that match common credential formats in log strings. | ||
| # Order matters: more specific patterns should come before general ones. | ||
| _MASK_PATTERNS: list[tuple[re.Pattern[str], str]] = [ | ||
| # JWT-like strings (three base64url segments separated by dots) | ||
| ( | ||
| re.compile(r"eyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+"), | ||
| "***JWT_REDACTED***", | ||
| ), | ||
| # Bearer token header values | ||
| ( | ||
| re.compile(r"(Bearer\s+)[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE), | ||
| r"\1***REDACTED***", | ||
| ), | ||
| # API keys / tokens with common prefixes (sk-, pk-, api-, token-, key-) | ||
| # Allows multi-segment keys like sk-proj-abc12345678 | ||
| ( | ||
| re.compile(r"(sk|pk|api|token|key)[-_][A-Za-z0-9\-_]{8,}", re.IGNORECASE), | ||
| r"\1-***REDACTED***", | ||
| ), | ||
| # Authorization / x-api-key / x-auth-token headers | ||
| ( | ||
| re.compile( | ||
| r"(authorization|x-api-key|x-auth-token)\s*[:=]\s*\S+", re.IGNORECASE | ||
| ), | ||
| r"\1: ***REDACTED***", | ||
| ), | ||
| # password= / secret= / token= / api_key= patterns in query strings or log lines | ||
| ( | ||
| re.compile( | ||
| r"(password|secret|passwd|pwd|token|api_key|apikey)\s*[=:]\s*\S+", | ||
| re.IGNORECASE, | ||
| ), | ||
| r"\1=***REDACTED***", | ||
| ), | ||
| ] |
There was a problem hiding this comment.
3. Missing 32+ token masking 📎 Requirement gap ⛨ Security
The implementation does not mask generic high-entropy secrets matching [a-zA-Z0-9_-]{32,} in
key-value contexts, allowing such values to be logged unredacted. This fails the generic
credential-leakage mitigation requirement.
Agent Prompt
## Issue description
Generic high-entropy secrets (32+ chars) in key-value contexts are not masked.
## Issue Context
Compliance requires masking for `[a-zA-Z0-9_-]{32,}` when logged as `key=value` or `key: value` even if the key name is not in a predefined list.
## Fix Focus Areas
- operator_use/utils/log_masking.py[7-41]
- tests/test_log_masking.py[12-66]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| class TestMaskCredentials: | ||
| def test_masks_bearer_token(self): | ||
| text = "Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.abc.def" | ||
| result = mask_credentials(text) | ||
| assert "eyJhbGciOiJIUzI1NiJ9" not in result | ||
| assert "REDACTED" in result | ||
|
|
||
| def test_masks_api_key_pattern(self): | ||
| result = mask_credentials("Using api_key=sk-abcdefghijklmnop123456") | ||
| assert "sk-abcdefghijklmnop123456" not in result | ||
| assert "REDACTED" in result | ||
|
|
||
| def test_masks_sk_prefix_key(self): | ||
| result = mask_credentials("key is sk-proj-abc12345678") | ||
| assert "abc12345678" not in result | ||
| assert "REDACTED" in result | ||
|
|
||
| def test_masks_password_in_connection_string(self): | ||
| result = mask_credentials("Connecting to db with password=mysecretpassword123") | ||
| assert "mysecretpassword123" not in result | ||
| assert "REDACTED" in result | ||
|
|
||
| def test_masks_secret_value(self): | ||
| result = mask_credentials("secret=superSecretValue99") | ||
| assert "superSecretValue99" not in result | ||
|
|
||
| def test_masks_jwt(self): | ||
| jwt = ( | ||
| "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9." | ||
| "eyJzdWIiOiIxMjM0NTY3ODkwIn0." | ||
| "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c" | ||
| ) | ||
| result = mask_credentials(f"token={jwt}") | ||
| assert jwt not in result | ||
| assert "REDACTED" in result | ||
|
|
||
| def test_masks_standalone_jwt(self): | ||
| """JWT not preceded by a credential keyword should use JWT_REDACTED.""" | ||
| jwt = ( | ||
| "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9." | ||
| "eyJzdWIiOiIxMjM0NTY3ODkwIn0." | ||
| "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c" | ||
| ) | ||
| result = mask_credentials(f"Received {jwt} from upstream") | ||
| assert jwt not in result | ||
| assert "JWT_REDACTED" in result | ||
|
|
||
| def test_masks_authorization_header(self): | ||
| result = mask_credentials("authorization: Bearer mytoken123") | ||
| assert "mytoken123" not in result | ||
|
|
||
| def test_masks_x_api_key_header(self): | ||
| result = mask_credentials("x-api-key: abc123secret") | ||
| assert "abc123secret" not in result | ||
|
|
There was a problem hiding this comment.
4. Tests miss required patterns 📎 Requirement gap ☼ Reliability
Automated tests do not verify masking for all required patterns (gsk_, AIza, nvapi-, and the
generic [a-zA-Z0-9_-]{32,} key-value context pattern). This allows regressions where required
credentials could be logged unmasked.
Agent Prompt
## Issue description
Test coverage is missing for required provider patterns and the generic high-entropy key-value masking rule.
## Issue Context
Compliance requires explicit automated verification for `gsk_`, `AIza`, `nvapi-`, and `[a-zA-Z0-9_-]{32,}` in key-value contexts.
## Fix Focus Areas
- tests/test_log_masking.py[12-66]
- operator_use/utils/log_masking.py[7-41]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| def filter(self, record: logging.LogRecord) -> bool: | ||
| record.msg = mask_credentials(str(record.msg)) | ||
| if record.args: | ||
| if isinstance(record.args, dict): | ||
| record.args = { | ||
| k: mask_credentials(str(v)) for k, v in record.args.items() | ||
| } | ||
| elif isinstance(record.args, tuple): | ||
| record.args = tuple(mask_credentials(str(a)) for a in record.args) | ||
| return True |
There was a problem hiding this comment.
5. Logging args type broken 🐞 Bug ≡ Correctness
CredentialMaskingFilter.filter() converts all tuple/dict log args to strings, which can raise TypeError for existing %-style numeric formatting (e.g., %d, %.2f) during record formatting. This can crash logging calls or drop log output in normal execution paths.
Agent Prompt
### Issue description
`CredentialMaskingFilter.filter()` currently masks `record.msg` and then coerces `record.args` values to `str` to mask them. This breaks %-style logging for numeric placeholders (e.g. `%d`, `%.2f`) because the formatting step expects numbers, not strings.
### Issue Context
The repo already logs with numeric placeholders (iteration counters, timings, pids), so this will be hit in normal usage.
### Fix approach (safe)
- Avoid changing the types inside `record.args`.
- Instead, compute the final formatted message first, then mask that string:
- `rendered = record.getMessage()`
- `masked = mask_credentials(rendered)`
- Set `record.msg = masked` and `record.args = ()` so the formatter won’t re-apply `%` formatting.
- Optionally also consider masking exception text / stack info if present.
- Add/adjust tests to cover numeric placeholders (e.g., `logger.info('iter=%d', 3)` and `logger.info('took %.2f', 1.23)`) to ensure no exception is raised.
### Fix Focus Areas
- operator_use/utils/log_masking.py[54-63]
- tests/test_log_masking.py[76-118]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| def install_credential_masking() -> None: | ||
| """Install credential masking on the root logger. Call once at startup.""" | ||
| root_logger = logging.getLogger() | ||
| # Avoid double-installing | ||
| if not any(isinstance(f, CredentialMaskingFilter) for f in root_logger.filters): | ||
| root_logger.addFilter(CredentialMaskingFilter()) |
There was a problem hiding this comment.
6. Masking not globally enforced 🐞 Bug ⛨ Security
install_credential_masking() only adds the filter to the root logger, which is not a reliable way to ensure all emitted records are scrubbed (e.g., records from named loggers and/or handlers may bypass logger-level root filtering). This can leave credentials unmasked despite setup_logging() calling install_credential_masking().
Agent Prompt
### Issue description
`install_credential_masking()` currently adds `CredentialMaskingFilter` only to `logging.getLogger().filters` (root logger). This is not a robust way to guarantee redaction across the application, because masking must be applied at a point that all emitted records traverse (typically handler filters or a global LogRecordFactory).
### Issue Context
- Logging is configured with `logging.basicConfig(..., handlers=...)` in `setup_logging()`.
- Many modules log using named loggers (`logging.getLogger(__name__)`).
### Fix approach (robust)
Implement masking at one (or both) of these global enforcement points:
1) **Handler-level filters (recommended minimum):**
- In `install_credential_masking()`, iterate over `root = logging.getLogger()` and add a `CredentialMaskingFilter` to every handler in `root.handlers` (idempotently).
- Consider also patching `root.addHandler` or re-running installation after any dynamic handler additions if applicable.
2) **Global LogRecordFactory (strong guarantee):**
- Use `logging.setLogRecordFactory()` to wrap the default factory and scrub `record.msg/args` (preferably by masking `record.getMessage()` and then clearing args as described in the other finding).
### Tests to add/adjust
- Add an integration-style test that attaches a `StreamHandler` with a formatter, emits via a named logger (`logging.getLogger('operator_use.test')`), and asserts captured output is masked.
### Fix Focus Areas
- operator_use/utils/log_masking.py[66-71]
- operator_use/cli/start.py[42-58]
- tests/test_log_masking.py[120-144]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Ready for review & merge ✅Hey @Jeomon, PR #30 is ready to land. This closes Issue #22 (credential masking in all log output). What's in here:
Known limitation (flagged transparently): Novel credential formats not matching the 5 regex groups will pass through unmasked. The patterns cover the most common cases but aren't exhaustive. This PR is independent and can merge at any time. No conflicts with main. |
fc17f52 to
0ec2d9a
Compare
Response to Qodo Review — PR #30All 6 findings addressed. Branch updated and pushed. FixedReq Gap 1 — Module path (log_filter.py)
Req Gap 2 — Missing provider-specific patterns
Req Gap 3 — Missing 32+ token masking
Req Gap 4 — Tests miss required patterns
Bug 1 — Logging args type broken (TypeError on %d / %.2f)
Bug 2 — Masking not globally enforced
Test results: 28 passed, 0 failed on `tests/test_log_filter.py` All changes pushed to this branch. |
Add CredentialMaskingFilter that scrubs API keys, tokens, passwords, and JWT strings from all log records before emission. Installed on root logger at startup so all child loggers inherit protection. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
BrowserPlugin and ComputerPlugin no longer register hooks to the main agent — subagents manage their own state injection. Test assertions updated accordingly: - Remove stale XML-tag assertions from SYSTEM_PROMPT tests - Fix browser tool name: 'browser' -> 'browser_task' - Update hook tests: register_hooks() is now a no-op for main agent, so assertions verify hooks are NOT wired (not that they are)
…CursorTouch#22] Compliance mandates the filter live at operator_use/utils/log_filter.py. Renames the module and test file, and updates all import sites (utils/__init__.py, cli/start.py). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…CursorTouch#22] Must be called after all handlers are attached. Handlers added post-install do not automatically receive the filter. Adds a test documenting this contract and confirming the safe production path (root logger filter still applies).
…ursorTouch#22] Connection strings like postgresql://user:password@host leak passwords into logs. Adds a pattern that matches scheme://user:password@host and scheme://:password@host, inserted before JWT patterns to ensure DSN credentials are caught first.
…ursorTouch#22] "api-gateway-endpoint" and "hotkey-sequence" were incorrectly masked because the prefix pattern had no word guard. Adds \b at the start and a digit lookahead (?=...\d) so only credential-like suffixes (containing at least one digit) match, leaving infrastructure path components and English compound words untouched.
bed8b70 to
d0c496b
Compare
Summary
Closes #22.
operator_use/utils/log_masking.pywithCredentialMaskingFilterandmask_credentials()setup_logging()-- all child loggers inherit protection automaticallyoperator_use/utils/__init__.pyfor public API accessFiles Changed
operator_use/utils/log_masking.py(new) -- masking patterns, filter class, install functionoperator_use/utils/__init__.py-- re-export masking utilitiesoperator_use/cli/start.py-- callinstall_credential_masking()at end ofsetup_logging()tests/test_log_masking.py(new) -- 18 unit testsTest Plan
pytest tests/test_log_masking.py -v-- 18/18 tests passFailure Conditions
_MASK_PATTERNSwithout changing any other code.token=) wraps a JWT, the JWT-specific label gets overwritten by the keyword pattern. The secret is still masked; only the label differs.🤖 Generated with Claude Code