FEAT add Single-Turn Crescendo Attack (STCA) #1657
Draft
precognitivem0nk wants to merge 1 commit into microsoft:main from
Implements Aqrawi & Abbasi 2024 (arxiv 2409.03131) as a single-turn attack class under pyrit/executor/attack/single_turn/. An adversarial chat synthesizes a configurable number of prior question-and-response pairs, the attack packs them into one user message using transitional phrases, and defers to PromptSendingAttack for sending and scoring.
Closes microsoft#388
Signed-off-by: precognitivem0nk <rextedgorman@gmail.com>
Author comment: @microsoft-github-policy-service agree
What
Implements the Single-Turn Crescendo Attack (STCA) from Aqrawi & Abbasi 2024 (arXiv:2409.03131) as a new single-turn attack class. An adversarial chat synthesizes a configurable number of prior question-and-response pairs that escalate toward the objective. The attack packs those pairs into one user message using transitional phrases ("Earlier, I said", "You replied", "Then, I asked") and defers to PromptSendingAttack for sending and scoring. The objective target sees only that one composed message.
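For reviewers, the packing step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name compose_stca_prompt, the (question, response) tuple shape, the quoting style, and the choice to append the objective as a bare final line are all assumptions. Only the three transitional phrases come from the description.

```python
from typing import List, Tuple


def compose_stca_prompt(pairs: List[Tuple[str, str]], objective: str) -> str:
    """Pack synthesized (question, response) pairs into one user message.

    Hypothetical helper for illustration; the name, signature, and exact
    layout are assumptions and are not taken from the PR's implementation.
    """
    if not pairs:
        raise ValueError("at least one synthesized pair is required")

    parts = []
    for index, (question, response) in enumerate(pairs):
        # The first synthesized turn uses "Earlier, I said"; later ones use
        # "Then, I asked". Both phrases are quoted in the PR description.
        lead = "Earlier, I said" if index == 0 else "Then, I asked"
        parts.append(f'{lead}: "{question}"')
        parts.append(f'You replied: "{response}"')

    # Assumption: the real objective closes the message as a plain final line.
    parts.append(objective)
    return "\n".join(parts)


if __name__ == "__main__":
    demo_pairs = [
        ("first question", "first response"),
        ("second question", "second response"),
    ]
    print(compose_stca_prompt(demo_pairs, "final request"))
```

The point of the sketch is that the target receives one string and no conversation history, so the escalation lives entirely inside the message text.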
Why
STCA closes a gap in PyRIT's single-turn attack coverage. Multi-turn Crescendo already exists at pyrit/executor/attack/multi_turn/crescendo.py. The single-turn variant is useful against targets that do not expose a multi-turn API and as a baseline for measuring escalation effectiveness without conversation state.
Closes #388.
Files
New: pyrit/executor/attack/single_turn/single_turn_crescendo.py (attack class)
New: pyrit/datasets/executors/single_turn_crescendo/stca_variant_1.yaml (adversarial system prompt)
New: tests/unit/executor/attack/single_turn/test_single_turn_crescendo.py (27 unit tests)
New: doc/code/executor/attack/single_turn_crescendo_attack.py (jupytext notebook source)
New: doc/code/executor/attack/single_turn_crescendo_attack.ipynb (rendered notebook, no execution outputs per nbstripout policy)
Modified: pyrit/executor/attack/single_turn/__init__.py, pyrit/executor/attack/__init__.py (exports)
Modified: doc/references.bib (paper citation), doc/myst.yml (notebook TOC entry)
Open scoping question (also asked in the issue thread)
Default for num_synthesized_turns: I went with 3 to track the paper's STCA-3 variant. If the team prefers no default (require callers to set it explicitly), I will drop the default and adjust tests. Marked draft pending that decision.
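To make the trade-off concrete, here is a minimal sketch of the "default of 3" option with validation. The StcaSettings dataclass and the DEFAULT_NUM_SYNTHESIZED_TURNS constant are hypothetical names used only for this illustration; the actual attack class in this PR may carry the parameter differently.

```python
from dataclasses import dataclass

# Assumption: 3 mirrors the STCA-3 variant mentioned above.
DEFAULT_NUM_SYNTHESIZED_TURNS = 3


@dataclass(frozen=True)
class StcaSettings:
    """Hypothetical settings holder, not part of the PR's public API."""

    num_synthesized_turns: int = DEFAULT_NUM_SYNTHESIZED_TURNS

    def __post_init__(self) -> None:
        value = self.num_synthesized_turns
        # Reject bools explicitly because bool is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError("num_synthesized_turns must be a positive integer")


if __name__ == "__main__":
    print(StcaSettings())  # uses the default
    print(StcaSettings(num_synthesized_turns=5))  # explicit override
```

Dropping the default would amount to removing the "= DEFAULT_NUM_SYNTHESIZED_TURNS" assignment, which forces every caller to pass the value explicitly.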
Verification
pytest tests/unit/executor/attack/single_turn/: 175 passed (148 existing, 27 new)
pytest tests/unit/: 6973 passed, 102 skipped (env-gated live targets), 0 failures
ruff check, ruff format --check: clean on all new and modified files
No em-dashes or en-dashes in any new file
Notes
DCO sign-off included on the commit.
Notebook is checked in without execution outputs per .pre-commit-config.yaml nbstripout. Happy to re-render with real outputs if maintainers want them captured.