6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,12 @@

All notable changes to Chief are documented in this file.

## [Unreleased]

### Features
- New `bash.timeout` setting in `.chief/config.yaml` (and in the Settings TUI under **Bash → Command timeout**) optionally caps the runtime of external bash commands invoked by Chief, currently only `worktree.setup`. It accepts a Go duration string (`"30s"`, `"5m"`). **The default is no timeout**: setup commands run unbounded unless you opt in. When a configured timeout expires, Chief kills the setup command's entire process group with SIGKILL on Unix, so child processes (`npm install` → `node`, etc.) do not leak.
- New `agent.watchdogTimeout` setting (Settings TUI: **Agent → Watchdog timeout**) makes the agent silence watchdog configurable. It was previously hardcoded at 5 minutes, which killed long, quiet acceptance-test runs (e.g. integration suites that produce no stdout for several minutes). Set a higher value such as `"30m"` to allow them, or `"0s"` to disable the watchdog entirely. The default is unchanged at 5 minutes.

## [0.7.0] - 2026-03-08

### Features
31 changes: 29 additions & 2 deletions docs/reference/configuration.md
@@ -14,10 +14,13 @@ Chief stores project-level settings in `.chief/config.yaml`. This file is create

```yaml
agent:
  provider: claude          # or "codex", "opencode", or "cursor"
  cliPath: ""               # optional path to CLI binary
  watchdogTimeout: "20m"    # silence threshold before Chief kills a hung agent
worktree:
  setup: "npm install"
bash:
  timeout: ""               # empty = no timeout (default)
onComplete:
  push: true
  createPR: true
@@ -29,7 +32,9 @@ onComplete:
|-----|------|---------|-------------|
| `agent.provider` | string | `"claude"` | Agent CLI to use: `claude`, `codex`, `opencode`, or `cursor` |
| `agent.cliPath` | string | `""` | Optional path to the agent binary (e.g. `/usr/local/bin/opencode`). If empty, Chief uses the provider name from PATH. |
| `agent.watchdogTimeout` | string | `"5m"` | How long Chief waits without **any** output from the agent before killing it as hung, as a Go duration string (e.g. `"5m"`, `"30m"`). Raise this if your acceptance criteria run long, quiet commands, such as integration tests that produce no stdout for several minutes; the historical 5-minute default cuts those runs short. Set `"0s"` to disable the watchdog. Unparseable or negative values fall back to the 5-minute default. |
| `worktree.setup` | string | `""` | Shell command to run in new worktrees (e.g., `npm install`, `go mod download`) |
| `bash.timeout` | string | `""` (no timeout) | Maximum runtime for external bash commands invoked by Chief (currently `worktree.setup`), as a Go duration (e.g. `"30s"`, `"5m"`). Empty means no timeout — setup commands can run as long as needed. Unparseable or negative values are also treated as "no timeout" but surface a warning in the worktree spinner so a typo is not silently masked. |
| `onComplete.push` | bool | `false` | Automatically push the branch to remote when a PRD completes |
| `onComplete.createPR` | bool | `false` | Automatically create a pull request when a PRD completes (requires `gh` CLI) |

@@ -55,19 +60,41 @@ onComplete:
  createPR: true
```

**Cap a flaky setup that occasionally hangs:**

```yaml
worktree:
  setup: "npm install && docker compose build"
bash:
  timeout: "30m"   # kill the setup if it runs longer than 30 minutes
```

**Long-running test suites in acceptance criteria:**

```yaml
agent:
  watchdogTimeout: "30m"   # allow up to 30 minutes of silence (e.g. for slow integration tests)
```

> **Migration note:** the agent watchdog default is unchanged (5 minutes of silence kills the agent), but it is now configurable. If your acceptance tests run quietly for more than 5 minutes, raise `agent.watchdogTimeout`. The new `bash.timeout` is opt-in; setup commands have no timeout by default.

## Settings TUI

Press `,` from any view in the TUI to open the Settings overlay. This provides an interactive way to view and edit all config values.

Settings are organized by section:

- **Agent** — Watchdog timeout (string, editable inline; Go duration like `20m`)
- **Worktree** — Setup command (string, editable inline)
- **Bash** — Command timeout (string, editable inline; Go duration like `30s`, `5m`)
- **On Complete** — Push to remote (toggle), Create pull request (toggle)

Changes are saved immediately to `.chief/config.yaml` on every edit.

When toggling "Create pull request" to Yes, Chief validates that the `gh` CLI is installed and authenticated. If validation fails, the toggle reverts and an error message is shown with installation instructions.

When editing **Agent → Watchdog timeout** or **Bash → Command timeout**, the value is validated as a Go duration on save. Invalid or negative values are rejected inline (the editor stays open with an error message), so a typo cannot silently disable the bash timeout or reset the watchdog to its default. If `config.yaml` is hand-edited to contain an invalid value, Chief uses the field's fallback instead (5 minutes for `agent.watchdogTimeout`; no timeout for `bash.timeout`). For `bash.timeout`, the fallback also surfaces a one-line warning in the worktree spinner.
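
The save-time check described above can be sketched as a strict validator the editor runs before writing the file; the lenient load-time path is `parseDurationOrDefault` in `internal/config/config.go`. `validateTimeoutInput` is a hypothetical name, and accepting an empty string (unset, meaning the field's default) is an assumption about the TUI rather than something the source confirms.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// validateTimeoutInput is a strict save-time check for timeout fields
// (hypothetical helper). Empty input is accepted as "unset"; anything else
// must parse as a non-negative Go duration. "0s" passes, since 0 is a
// meaningful value (no timeout / watchdog disabled).
func validateTimeoutInput(s string) error {
	v := strings.TrimSpace(s)
	if v == "" {
		return nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return fmt.Errorf("%q is not a valid duration (examples: \"30s\", \"5m\", \"1h30m\")", v)
	}
	if d < 0 {
		return fmt.Errorf("%q is negative; use \"0s\" to disable instead", v)
	}
	return nil
}

func main() {
	for _, in := range []string{"20m", "0s", "", "ten-minutes", "-5m", "5"} {
		if err := validateTimeoutInput(in); err != nil {
			fmt.Printf("rejected: %v\n", err)
			continue
		}
		fmt.Printf("saved:    %q\n", in)
	}
}
```

Rejecting at save time and falling back at load time are complementary: the editor catches typos while the user is looking at them, and a hand-edited file still loads with predictable behaviour.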

Navigate with `j`/`k` or arrow keys. Press `Enter` to toggle booleans or edit strings. Press `Esc` to close.

## First-Time Setup
101 changes: 101 additions & 0 deletions internal/config/config.go
@@ -1,8 +1,11 @@
package config

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"

	"gopkg.in/yaml.v3"
)
@@ -14,12 +17,110 @@ type Config struct {
	Worktree   WorktreeConfig   `yaml:"worktree"`
	OnComplete OnCompleteConfig `yaml:"onComplete"`
	Agent      AgentConfig      `yaml:"agent"`
	Bash       BashConfig       `yaml:"bash"`
}

// BashConfig holds settings for external bash commands invoked by Chief
// (currently the worktree setup command).
type BashConfig struct {
	// Timeout is a Go duration string (e.g. "30s", "5m"). Empty disables
	// the timeout (no upper bound on bash command runtime). Unparseable or
	// negative values are also treated as "no timeout" and surface a
	// warning via Config.BashTimeoutWarning.
	Timeout string `yaml:"timeout"`
}

// BashTimeout returns the configured bash command timeout as a time.Duration.
// A return value of 0 means "no timeout" — callers (e.g. runSetupCommand) skip
// wrapping the command in a deadline context. Empty values, unparseable
// strings, and negative durations all return 0; BashTimeoutWarning describes
// the fallback for unparseable/negative inputs so a typo does not silently
// disable a configured limit.
//
// Nil-safe: returns 0 when c is nil.
func (c *Config) BashTimeout() time.Duration {
	if c == nil {
		return 0
	}
	// Default 0 = "no timeout": setup commands are unbounded unless the
	// user opts in by configuring an explicit duration.
	return parseDurationOrDefault(c.Bash.Timeout, 0)
}

// BashTimeoutWarning returns a human-readable warning when the configured
// bash.timeout value is non-empty but unparseable or negative. Returns "" when
// the value is empty, valid, or when c is nil.
func (c *Config) BashTimeoutWarning() string {
	if c == nil {
		return ""
	}
	v := strings.TrimSpace(c.Bash.Timeout)
	if v == "" {
		return ""
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return fmt.Sprintf("bash.timeout %q is not a valid duration; ignoring (no timeout)", v)
	}
	if d < 0 {
		return fmt.Sprintf("bash.timeout %q is negative; ignoring (no timeout)", v)
	}
	return ""
}

// parseDurationOrDefault parses value as a Go duration. Empty input,
// unparseable input, and negative durations all return def. Surrounding
// whitespace is ignored. An explicit "0s" returns 0 — callers interpret 0
// according to their own semantics (e.g. "no timeout" / "watchdog disabled").
func parseDurationOrDefault(value string, def time.Duration) time.Duration {
	v := strings.TrimSpace(value)
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil || d < 0 {
		return def
	}
	return d
}

// AgentConfig holds agent CLI settings (Claude, Codex, OpenCode, or Cursor).
type AgentConfig struct {
	Provider string `yaml:"provider"` // "claude" (default) | "codex" | "opencode" | "cursor"
	CLIPath  string `yaml:"cliPath"`  // optional custom path to CLI binary
	// WatchdogTimeout bounds how long Chief will wait without any output
	// from the agent before killing the process as hung. Go duration string
	// (e.g. "5m", "30m"). Empty, unparseable, and negative values use
	// DefaultAgentWatchdogTimeout. "0s" disables the watchdog.
	//
	// This is the right knob to bump when the agent runs long, quiet
	// commands as part of acceptance criteria (e.g. integration test
	// suites that produce no stdout for several minutes).
	WatchdogTimeout string `yaml:"watchdogTimeout"`
}

// DefaultAgentWatchdogTimeout is applied when agent.watchdogTimeout is
// unset, unparseable, or negative. Kept in sync with
// loop.DefaultWatchdogTimeout, which is what NewLoop initialises a fresh
// Loop with when no config is passed; this
// one is the value AgentWatchdogTimeout returns when the manager *does* have
// a config but the user did not configure the field. If you change one,
// change the other.
const DefaultAgentWatchdogTimeout = 5 * time.Minute

// AgentWatchdogTimeout returns the configured agent watchdog timeout.
// Empty, unparseable, and negative values all return DefaultAgentWatchdogTimeout
// so behaviour matches a fresh Loop initialised without config. An explicit
// "0s" returns 0, which loop.SetWatchdogTimeout interprets as "watchdog
// disabled".
//
// Nil-safe: returns DefaultAgentWatchdogTimeout when c is nil.
func (c *Config) AgentWatchdogTimeout() time.Duration {
	if c == nil {
		return DefaultAgentWatchdogTimeout
	}
	// Falling back to DefaultAgentWatchdogTimeout (5m) preserves the
	// historical hardcoded watchdog behaviour for users who don't configure
	// the field.
	return parseDurationOrDefault(c.Agent.WatchdogTimeout, DefaultAgentWatchdogTimeout)
}

// WorktreeConfig holds worktree-related settings.
141 changes: 141 additions & 0 deletions internal/config/config_test.go
@@ -3,7 +3,9 @@ package config
import (
	"os"
	"path/filepath"
	"strings"
	"testing"
	"time"
)

func TestDefault(t *testing.T) {
@@ -62,6 +64,145 @@ func TestSaveAndLoad(t *testing.T) {
	}
}

func TestBashTimeout(t *testing.T) {
	cases := []struct {
		name string
		in   string
		want time.Duration
	}{
		{"empty disables timeout", "", 0},
		{"valid seconds", "30s", 30 * time.Second},
		{"valid minutes", "5m", 5 * time.Minute},
		{"whitespace padded", " 5m ", 5 * time.Minute},
		{"invalid disables timeout", "not-a-duration", 0},
		{"negative disables timeout", "-10s", 0},
		{"zero disables timeout", "0s", 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			cfg := &Config{Bash: BashConfig{Timeout: tc.in}}
			got := cfg.BashTimeout()
			if got != tc.want {
				t.Errorf("BashTimeout(%q) = %v, want %v", tc.in, got, tc.want)
			}
		})
	}
}

func TestBashTimeout_NilSafe(t *testing.T) {
	var cfg *Config
	if got := cfg.BashTimeout(); got != 0 {
		t.Errorf("nil cfg BashTimeout() = %v, want 0", got)
	}
	if got := cfg.BashTimeoutWarning(); got != "" {
		t.Errorf("nil cfg BashTimeoutWarning() = %q, want empty", got)
	}
}

func TestBashTimeoutWarning_TrimsDisplayedValue(t *testing.T) {
	cfg := &Config{Bash: BashConfig{Timeout: " garbage "}}
	got := cfg.BashTimeoutWarning()
	if got == "" {
		t.Fatal("expected warning for unparseable value")
	}
	if !strings.Contains(got, `"garbage"`) {
		t.Errorf("expected warning to quote trimmed value, got %q", got)
	}
	if strings.Contains(got, `" garbage "`) {
		t.Errorf("expected leading/trailing whitespace stripped from warning, got %q", got)
	}
}

func TestBashTimeoutWarning(t *testing.T) {
	cases := []struct {
		name      string
		in        string
		wantEmpty bool
	}{
		{"empty -> no warning", "", true},
		{"valid -> no warning", "30s", true},
		{"invalid -> warning", "not-a-duration", false},
		{"negative -> warning", "-10s", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			cfg := &Config{Bash: BashConfig{Timeout: tc.in}}
			got := cfg.BashTimeoutWarning()
			if (got == "") != tc.wantEmpty {
				t.Errorf("BashTimeoutWarning(%q) = %q, wantEmpty=%v", tc.in, got, tc.wantEmpty)
			}
		})
	}
}

func TestAgentWatchdogTimeout(t *testing.T) {
	cases := []struct {
		name string
		in   string
		want time.Duration
	}{
		{"empty uses default", "", DefaultAgentWatchdogTimeout},
		{"valid minutes", "20m", 20 * time.Minute},
		{"valid hours", "1h", time.Hour},
		{"whitespace padded", " 20m ", 20 * time.Minute},
		{"invalid falls back to default", "ten-minutes", DefaultAgentWatchdogTimeout},
		{"negative falls back to default", "-5m", DefaultAgentWatchdogTimeout},
		{"zero disables watchdog", "0s", 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			cfg := &Config{Agent: AgentConfig{WatchdogTimeout: tc.in}}
			got := cfg.AgentWatchdogTimeout()
			if got != tc.want {
				t.Errorf("AgentWatchdogTimeout(%q) = %v, want %v", tc.in, got, tc.want)
			}
		})
	}
}

func TestAgentWatchdogTimeout_NilSafe(t *testing.T) {
	var cfg *Config
	if got := cfg.AgentWatchdogTimeout(); got != DefaultAgentWatchdogTimeout {
		t.Errorf("nil cfg AgentWatchdogTimeout() = %v, want %v", got, DefaultAgentWatchdogTimeout)
	}
}

func TestSaveAndLoadAgentWatchdogTimeout(t *testing.T) {
	dir := t.TempDir()
	cfg := &Config{Agent: AgentConfig{WatchdogTimeout: "20m"}}
	if err := Save(dir, cfg); err != nil {
		t.Fatalf("Save failed: %v", err)
	}
	loaded, err := Load(dir)
	if err != nil {
		t.Fatalf("Load failed: %v", err)
	}
	if loaded.Agent.WatchdogTimeout != "20m" {
		t.Errorf("expected agent.watchdogTimeout='20m', got %q", loaded.Agent.WatchdogTimeout)
	}
	if loaded.AgentWatchdogTimeout() != 20*time.Minute {
		t.Errorf("expected AgentWatchdogTimeout()=20m, got %v", loaded.AgentWatchdogTimeout())
	}
}

func TestSaveAndLoadBashTimeout(t *testing.T) {
	dir := t.TempDir()
	cfg := &Config{Bash: BashConfig{Timeout: "2m"}}
	if err := Save(dir, cfg); err != nil {
		t.Fatalf("Save failed: %v", err)
	}
	loaded, err := Load(dir)
	if err != nil {
		t.Fatalf("Load failed: %v", err)
	}
	if loaded.Bash.Timeout != "2m" {
		t.Errorf("expected bash.timeout='2m', got %q", loaded.Bash.Timeout)
	}
	if loaded.BashTimeout() != 2*time.Minute {
		t.Errorf("expected BashTimeout()=2m, got %v", loaded.BashTimeout())
	}
}

func TestExists(t *testing.T) {
	dir := t.TempDir()

6 changes: 5 additions & 1 deletion internal/loop/loop.go
@@ -27,7 +27,11 @@ type RetryConfig struct {
	Enabled bool // Whether retry is enabled (default: true)
}

// DefaultWatchdogTimeout is the default duration of silence before the
// watchdog kills a hung process. Kept in sync with
// config.DefaultAgentWatchdogTimeout — both must move together so behaviour
// is identical whether or not the manager was given a config (callers can
// override at runtime via Loop.SetWatchdogTimeout).
const DefaultWatchdogTimeout = 5 * time.Minute

// DefaultRetryConfig returns the default retry configuration.
10 changes: 9 additions & 1 deletion internal/loop/manager.go
@@ -240,6 +240,9 @@ func (m *Manager) Start(name string) error {
	instance.Loop.buildPrompt = promptBuilderForPRD(instance.PRDPath)
	m.mu.RLock()
	instance.Loop.SetRetryConfig(m.retryConfig)
	if m.config != nil {
		instance.Loop.SetWatchdogTimeout(m.config.AgentWatchdogTimeout())
	}
	m.mu.RUnlock()
	instance.ctx, instance.cancel = context.WithCancel(context.Background())
	instance.State = LoopStateRunning
@@ -443,7 +446,12 @@ func (m *Manager) GetState(name string) (LoopState, int, error) {
	return instance.State, instance.Iteration, instance.Error
}

// GetInstance returns a snapshot copy of the loop instance data for a
// specific PRD. The returned struct deliberately omits the Loop, ctx, and
// cancel fields so callers cannot mutate runtime state from outside the
// manager — use Pause/Stop/Start and the event channel for runtime queries
// and control. Tests that need the live *Loop should access m.instances
// directly under m.mu; see liveLoopFor in manager_test.go.
func (m *Manager) GetInstance(name string) *LoopInstance {
	m.mu.RLock()
	instance, exists := m.instances[name]