
qa: plumb RADIANCE_OUTBOUND_SOCKS_ADDRESS into the Android client#8708

Open
myleshorton wants to merge 102 commits into main from qa/emulator-outbound-socks

Conversation

@myleshorton
Contributor

Adds a debug-only path that lets a developer route every outbound network call radiance makes from the Android client through an upstream SOCKS5 (typically the local `pinger bridge`), so the bandit treats the client as a real Russia-residential user end-to-end.

Pairs with:

  • radiance: getlantern/radiance#445
  • lantern-cloud: https://github.com/getlantern/lantern-cloud/pull/2649

Summary

  • `lantern-core/mobile`: new gomobile-exported `SetQAEnvOverrides(socks, tz string) error` that does `os.Setenv` for `RADIANCE_OUTBOUND_SOCKS_ADDRESS` and `TZ`. Must be called before `SetupRadiance` / `StartIPCServer` to take effect (radiance reads them at init time).
  • `android/.../LanternApp.kt`: override `onCreate` and call the new setter with values from Android system properties:
    • `debug.lantern.outbound_socks` → `RADIANCE_OUTBOUND_SOCKS_ADDRESS`
    • `debug.lantern.tz` → `TZ`
    • Set with `adb shell setprop debug.lantern.outbound_socks 10.0.2.2:1080`. No-op when the props are unset, so production builds aren't affected unless someone deliberately sets them on the device.
  • `go.mod`: bump radiance to the qa/outbound-socks-egress branch tip. Should be re-bumped to the merged commit once radiance#445 lands.

Why two QA props

`RADIANCE_OUTBOUND_SOCKS_ADDRESS` makes radiance route through the bridge. `TZ` (`Europe/Moscow`) is needed because the API's MaxMind logic overrides the GeoIP-derived country with the timezone-derived one when they disagree (treats it as a VPN user) — see `cmd/api/maxmind.go` `LookupCountryASNState`. Without TZ also being Moscow, the API still says `country=US` even though the IP is Russia.
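The override logic being described can be sketched like this — a hypothetical illustration of the disagreement rule, not the actual `cmd/api/maxmind.go` code, with a tiny made-up timezone table:

```go
package main

// effectiveCountry illustrates the MaxMind behavior described above: when the
// timezone-derived country disagrees with the GeoIP-derived one, the API
// treats the client as a likely VPN user and trusts the timezone instead.
// The tzCountry table is a toy stand-in for a real tz database lookup.

var tzCountry = map[string]string{
	"Europe/Moscow":    "RU",
	"America/New_York": "US",
}

func effectiveCountry(geoipCountry, tz string) string {
	if c, ok := tzCountry[tz]; ok && c != geoipCountry {
		return c // disagreement: timezone wins
	}
	return geoipCountry
}
```

This is why setting only the SOCKS override is insufficient: with `TZ` unset (or US), the Russia IP still resolves to `country=US`.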

Verified end-to-end

Setup:
```
# 1. lantern-cloud bridge (separate PR)
./cmd/pinger/bridge.sh
# → SOCKS5 listening on 127.0.0.1:1080, egress: packetstream/Russia

# 2. emulator boot, install, configure, launch
~/Library/Android/sdk/emulator/emulator @lantern_test -no-snapshot-load -no-boot-anim
adb shell setprop debug.lantern.outbound_socks 10.0.2.2:1080
adb shell setprop debug.lantern.tz Europe/Moscow
adb install -r build/app/outputs/flutter-apk/app-debug.apk
adb shell am start -n org.getlantern.lantern/.MainActivity
```

What logcat shows:
```
LanternApp: QA env overrides applied: outbound_socks=10.0.2.2:1080 tz=Europe/Moscow
GoLog: QA: set RADIANCE_OUTBOUND_SOCKS_ADDRESS value=10.0.2.2:1080
GoLog: Skipping publicip.Detect because RADIANCE_OUTBOUND_SOCKS_ADDRESS is set
GoLog: routing all radiance HTTP through upstream SOCKS5 addr=10.0.2.2:1080
GoLog: dropping UDP-only protocols from config request kept=15 dropped=5
GoLog: every sing-box outbound will dial via this SOCKS5 addr=10.0.2.2:1080 rewritten_outbounds=7
GoLog: received config "country":"RU","ip":"85.172.81.50" # Volgograd, AS12389 Rostelecom
```

Browsing `https://api.ipify.org` in the emulator's Chrome shows the Lantern outbound's exit IP (Stockholm/Singapore depending on which the bandit's auto-pick converges on), not the Mac's home IP.

Test plan

  • Default (no QA props): Lantern boots normally, no `QA env` log lines, kindling uses its stacked transports as usual. Production behavior preserved.
  • With `debug.lantern.outbound_socks` set + bridge running: API sees `country=RU` and a Russia residential IP; `auto` URLTest converges on a TCP outbound from the Russia-tier set; browsing through the tunnel egresses from the entry-server location.
  • With invalid system prop value (e.g. unreachable host): Lantern logs the QA hook firing but radiance's first `/v1/config-new` retries until cancelled; no crash, no production-data leakage.
  • `adb shell setprop debug.lantern.outbound_socks ""` then re-launch: app reverts to default kindling/no-detour behavior.

Known caveats

  • The bridge SOCKS5 listener is TCP-only (no UDP ASSOCIATE), which is why the radiance PR also drops UDP-only protocols from the request. Hysteria-class protocols don't work in Russia anyway, so this is fine in scope. If we ever want UDP coverage, we'll need to add UDP ASSOCIATE to the bridge.
  • The first 10s after launch may use stale cached config from the previous boot (which can include UDP outbounds and emit transient `code=7` errors). Once the fresh `/v1/config-new` response arrives, only TCP outbounds are in rotation. Cleaner solution would be wiping `/.lantern/data/config.json` when the QA env is set; deferred.
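The UDP-only filtering described in the first caveat can be sketched as a simple partition over the requested protocol list — the protocol names here are illustrative assumptions, not the radiance PR's actual set:

```go
package main

// dropUDPOnly partitions a protocol list into TCP-capable (kept) and
// UDP-only (dropped) entries, mirroring the "kept=15 dropped=5" log line:
// UDP-only transports can't ride a TCP-only SOCKS5 bridge (no UDP ASSOCIATE).
// The udpOnly membership below is an illustrative guess.

var udpOnly = map[string]bool{
	"hysteria2": true,
	"tuic":      true,
	"wireguard": true,
}

func dropUDPOnly(protocols []string) (kept, dropped []string) {
	for _, p := range protocols {
		if udpOnly[p] {
			dropped = append(dropped, p)
		} else {
			kept = append(kept, p)
		}
	}
	return kept, dropped
}
```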

🤖 Generated with Claude Code

garmr-ulfr and others added 30 commits March 24, 2026 16:15
Server tags are determined by URL content, not caller-supplied names.
addServerBasedOnURLs now returns the tags of added servers so callers
can connect using the actual tag. Also sends VPN status updates from
connectToServer on Linux so the UI reflects connection state changes.
jigar-f and others added 21 commits April 23, 2026 16:38
* split tunneling: treat FFI "ok" response as success, not error

_runSplitTunnelCall was checking `result != nullptr` and treating any
non-null return as an error message. But the Go FFI
(lantern-core/ffi/ffi.go) returns C.CString("ok") on success for both
addSplitTunnelItem and removeSplitTunnelItem — a non-null C string.

As a result, every successful add/remove was being reported to the UI as
a failure with message "ok". Symptoms:

- Adding a website in split tunneling showed an unstyled default
  snackbar reading "OK" (the default Material SnackBar rendering
  failure.localizedErrorMessage).
- The website appeared to not be saved — but it actually was; the
  provider's `reloaded` flag was never set, so the on-screen list never
  re-fetched from the backend.
- Re-clicking "Add" with the same domain created a duplicate entry on
  disk (visible as repeated items in split-tunnel.json) because the
  provider's local "already-added" check worked against a stale copy
  that had never been refreshed.

Fix: mirror the checkAPIError convention — treat literal "ok" as
success, parse JSON {"error": "..."} bodies for the error message, and
fall back to the raw string otherwise.

Reported in getlantern/engineering#3291 against Windows 9.0.29 build 481
(Freshdesk #173656).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
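The fixed convention — literal success token, else try a `{"error": "..."}` body, else the raw string — translates to Go like this (the Dart fix itself lives in `_runSplitTunnelCall`; this is an illustrative mirror, and `parseFFIResult` is a hypothetical name):

```go
package main

import "encoding/json"

// ffiOkResults mirrors the Dart _ffiOkResults set ({"ok", "true"}).
var ffiOkResults = map[string]bool{"ok": true, "true": true}

// parseFFIResult returns ("", true) on success, or (message, false) on error:
// a literal success token is success, a {"error": "..."} body yields its
// message, and anything else falls back to the raw string.
func parseFFIResult(result string) (string, bool) {
	if ffiOkResults[result] {
		return "", true
	}
	var body struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal([]byte(result), &body); err == nil && body.Error != "" {
		return body.Error, false
	}
	return result, false
}
```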

* split tunneling: reuse _ffiOkResults for success-string check

Rather than hardcoding 'ok', use the existing _ffiOkResults set
({'ok', 'true'}) defined at the top of this file so the split-tunnel
path stays in sync with the other FFI success checks (e.g.
_setupRadiance at line 201).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local showSnackbar helper in website_domain_input was using
Material's default ScaffoldMessenger.showSnackBar(SnackBar(content:
Text(message))) — producing an unstyled grey/dark snackbar that the rest
of the app doesn't use. Every call site in this file is an error path
(empty input, invalid domain, already-added, backend failure), so route
them through context.showSnackBarError which applies the app's rounded,
floating, red-background error style.

Follow-up to #8691. Addresses the "unstyled snackbar" symptom in
getlantern/engineering#3291 issue 3 for any remaining error surface
after the FFI "ok" fix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 connectToServer previously always called ConnectVPN, which radiance
 rejects with ErrTunnelAlreadyConnected when the tunnel is up. Check
 VPNStatus first and route to SelectServer when Connected, falling
 back to ConnectVPN otherwise.
…UI (#8689)

* review: detach connect() scope so timeout actually unblocks the UI

Copilot flagged on #8689 that the existing coroutineScope { ... } still
hangs in exactly the scenario this change is meant to protect against.
Structured coroutineScope cancels its children on exception but then
waits for them to complete — so when withTimeout fires, we cancel the
deferred (which the JNI call ignores, since it has no suspension
points) and then block on it finishing anyway. Net effect: the UI is
still frozen, which is the symptom we're trying to prevent.

Switch to a DETACHED CoroutineScope(SupervisorJob() + Dispatchers.IO).
Its Job is not a child of the enclosing coroutine, so cancelling it
doesn't join — the orphan coroutine keeps running the JNI call in the
background until Go returns or the process exits, but the caller is
unblocked and the runCatching.onFailure path fires the timeout error
state for the UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: add single-flight gate to prevent orphan accumulation

Copilot correctly pointed out on #8689 that the detached-scope approach
can accumulate orphan coroutines if the user retries while a previous
connect() is still stuck in JNI. Each orphan pins a Dispatchers.IO
thread; enough retries against a truly deadlocked Go side could
pressure the IO pool.

Their suggested fix (Dispatchers.IO.limitedParallelism(1)) would
serialize retries behind the orphan, turning the 2nd retry into
another 60s hang. A simple single-flight AtomicBoolean gate with fast
rejection is the cleaner mitigation:

- compareAndSet rejects concurrent attempts with IllegalStateException
  (surfaces via the existing runCatching.onFailure → error state).
- The flag clears in a try/finally inside the async block, which runs
  when the JNI call eventually returns — cancellation alone can't
  break it out, but once Go completes the finally runs and a future
  retry is admitted.
- Process death (reboot, force-stop) resets the flag naturally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
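The same single-flight gate reads naturally in Go as a compare-and-swap on an atomic flag with a deferred (try/finally-style) clear — a sketch of the mitigation, not the Kotlin code itself:

```go
package main

import (
	"errors"
	"sync/atomic"
)

var inFlight atomic.Bool

var errAlreadyConnecting = errors.New("connect already in progress")

// singleFlightConnect admits one attempt at a time: a failed CAS rejects
// concurrent attempts fast (surfacing as an error state) instead of queueing
// them behind a stuck call, and the deferred Store clears the flag only when
// the underlying call eventually returns.
func singleFlightConnect(call func() error) error {
	if !inFlight.CompareAndSwap(false, true) {
		return errAlreadyConnecting
	}
	defer inFlight.Store(false)
	return call()
}
```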

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* android: make restartService block until restart completes

Two bugs in the platformIfce restart path that together let the tunnel
wedge in Restarting forever on Android, triggering the "Error in VPN
operation" on every subsequent Connect attempt
(getlantern/engineering#3297, Freshdesk #173681).

1. restartService() used serviceScope.launch { ... } and returned
   immediately. Radiance's Restart() treats the sync return as "restart
   succeeded" and leaves the tunnel at status=Restarting, expecting the
   platform coroutine to drive it through stopVPN → startVPN and
   transition status via Mobile.* side-effects. If the service is torn
   down before the coroutine completes (onDestroy, process pressure),
   nothing ever transitions the tunnel out of Restarting.

   Switch to runBlocking(Dispatchers.IO) so the return actually
   reflects completion. c.mu is released on the Go side before
   RestartService is invoked, so synchronous Mobile.* callbacks on
   this thread don't deadlock.

2. stopVPNTunnel() skipped Mobile.stopVPN() when Mobile.isVPNConnected()
   returned false. isVPNConnected is status == Connected — but at the
   point stopVPNTunnel is called from restartService, radiance has
   already set status=Restarting, so the guard always skips and the
   tunnel is never actually closed.

   Swap the guard for Mobile.isRadianceConnected() — i.e. only skip
   when the IPC server itself isn't up. Mobile.stopVPN() is a no-op
   when c.tunnel is nil on the Go side, so the original guard was
   redundant even for the Connected == true case.

Evidence from Freshdesk #173681 logs for the broken path:
- 15:17:34.826 Restart → 15:17:34.828 "Tunnel restarted successfully"
  (2ms total — consistent with fire-and-forget, not real teardown)
- No subsequent tunnel.init / Tunnel connection established
- 15:19:10 onDestroy logs "Skipping stopVPN — VPN tunnel was never
  started" (same isVPNConnected() check)
- 15:21:48 next Connect fails within 2ms of the IPC request with
  "tunnel is currently Restarting"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: drop isVPNConnected guard in onDestroy too

Same shape as the restart-path fix: if c.tunnel is non-nil on the Go
side but the tunnel status is anything other than Connected (Restarting
after a failed restart, Connecting mid-startup, Error from a prior
failure), isVPNConnected() returns false and the old guard skipped
Mobile.stopVPN(). That left the radiance tunnel state dangling across
service destroy.

Observed in Freshdesk #173681: "onDestroy — radianceConnected=true
vpnConnected=false, Skipping stopVPN — VPN tunnel was never started"
while the tunnel was actually alive at status=Restarting.

Swap the second guard for an unconditional call. Mobile.stopVPN() is a
no-op when c.tunnel is nil, so the guard was always redundant — it just
happened to also hide the non-Connected-but-non-nil case that's
load-bearing during restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* android: verify restart postcondition before returning to Go

launchVPN wraps its body in runCatching { ... }.onFailure { ... } and
returns normally regardless of whether Mobile.startVPN() threw — so a
nil return from startVPN() does not mean the restart succeeded. Without
a postcondition check, restartService would log "completed" and return
to radiance as if everything worked, even though the tunnel is still
stuck in Restarting, which defeats the whole point of making this
function block.

Check Mobile.isVPNConnected() at the end of the runBlocking block and
throw IllegalStateException if false. The exception propagates through
runBlocking → restartService → radiance's platformIfce.RestartService()
as a non-nil error, so Restart() hits the ErrorStatus branch and the
caller sees the failure.

Addresses Copilot review feedback on PR #8697.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Adam Fisk <afisk@mini.local>
The PacketTunnelExtension hosts the IPC server, so cancelTunnelWithError
tears down the daemon along with the tunnel. Inline MobileStartVPN in
restartService so a failed restart leaves the extension (and IPC socket)
alive; radiance's status events surface the failure for retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: atavism <atavism@users.noreply.github.com>
* main: don't block first paint on Updater.init()

Moving Updater.init() off the critical path to runApp. Investigating a
one-shot black-screen-on-startup report on a local macOS dev build
(9.0.29 build 487): flutter.log stopped at the last pre-runApp log line
with no Dart exception and no crash, while the Go side kept running
normally. The only awaited call between that last log and runApp is
Updater.init().

Inside init(), the actual update check is already deferred 45 s via
Future.delayed + unawaited. But setFeedURL and setScheduledCheckInterval
are awaited — both bridge into Sparkle via the auto_updater Flutter
plugin, and both can stall on first launch: feed URL resolution,
keychain access, or a previous launch's background worker still holding
a lock. Any of those becomes a main-isolate hang that prevents runApp,
which exactly matches the observed symptom.

Fix: drop the await so Updater.init() runs concurrently with the rest
of startup. All errors are already handled inside init() itself, so
unawaited is safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: guard sl<Updater>() lookup against failed service injection

Copilot flagged that if injectServices() throws above (caught at
main.dart:45), Updater is never registered (it's registered at
injection_container.dart:40, after storage init), and sl<Updater>()
throws synchronously. unawaited() doesn't help — the throw happens
before the Future is constructed, so it propagates out of main and
prevents runApp.

Wrap the call in try/catch + sl.isRegistered<Updater>() so any failure
to look up or start Updater.init logs and continues to runApp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the FFI path to radiance's ipc.Client.TailLogs and merges in-app
flutter.log records so the diagnostic logs view shows both sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up:
- refactor(vpn): own VPN status on the client so restarts span tunnels
- vpn: instrument tunnel.start phases + VPNClient.Restart (#443)

The VPN-status-ownership refactor moves setStatus calls out of
tunnel and onto VPNClient so a restart transitions Restarting →
Disconnecting → Disconnected → Connecting → Connected cleanly.

The instrumentation PR adds child spans around libbox.Setup,
libbox.NewServiceWithContext, libbox.BoxService.Start, and
newMutableGroupManager so SigNoz can attribute the 10s+ tail
on /service/start observed in Freshdesk #173696.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nnel (#8702)

* lantern-core: dispatch ConnectVPN to SelectServer on live tunnel

When the Flutter UI triggers an auto-select on a live tunnel — most
visibly Jigar's rewrite of onSmartLocation (server_selection.dart), which
routes "switch back to Smart" through startVPN(force: true) → Dart
lantern.startVPN() → ffi.go:startVPN → c.ConnectVPN("") — radiance's
/vpn/connect endpoint rejects the request with ErrTunnelAlreadyConnected
(radiance/vpn/vpn.go:126 in VPNClient.Connect). The error is returned to
the Dart UI as a snackbar, the tunnel stays pinned to the previously
selected manual server, and lantern.log is silent because neither
LocalBackend.ConnectVPN nor VPNClient.Connect slog the
ErrTunnelAlreadyConnected path.

Observed on 9.0.30 beta (internal tester, Freshdesk #173763, build from
commit 4054689 which includes Jigar's 2895072). After manually
picking Bogotá, clicking "Smart" at the top of the server-selection
screen surfaces the snackbar and the tunnel keeps routing traffic
through the Bogotá samizdat outbound.

Fix: when Status() == Connected, LanternCore.ConnectVPN dispatches the
request to /server/selected (the live-tunnel outbound swap) instead of
/vpn/connect. Empty tag normalizes to vpn.AutoSelectTag — Dart sends ""
for Smart, radiance recognizes only the literal "auto" and otherwise
falls into the manual-outbound branch of SelectServer, stranding Clash
in manual mode with an empty selector. The mapping is centralized in a
small normalizeAutoTag helper used by both ConnectVPN and SelectServer.

This puts the same dispatch logic that lives in ffi.go:connectToServer
onto every caller of LanternCore.ConnectVPN — including ffi.go:startVPN
(which Jigar's rewrite now funnels through) and any future FFI/mobile
entry point.

getlantern/engineering#3291 issue 3. Supersedes earlier work on
fisk/connect-dispatch-select-when-connected (485bf5a), which was
scoped to this same dispatch but predated the current refactor branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* vpn_tunnel: dispatch StartVPN to SelectServer on live tunnel (mobile path)

Mobile.StartVPN (the gomobile entry point for Android MainActivity and
iOS VPNManager) routes through vpn_tunnel.StartVPN(client), which calls
client.ConnectVPN(ctx, vpn.AutoSelectTag) directly — bypassing
lanterncore.Core. Jigar's onSmartLocation rewrite dispatches "switch
back to Smart" through startVPN(force: true), which on Android/iOS
lands here. Same ErrTunnelAlreadyConnected bug as the FFI path fixed in
the previous commit.

Mirror the VPNStatus dispatch pattern garmr already added to
vpn_tunnel.ConnectToServer in 4054689: when Status() == Connected,
swap outbound via /server/selected; otherwise fall through to the
existing /vpn/connect start.

Together with the LanternCore.ConnectVPN dispatch, this closes the
Smart-from-connected bug on every platform (Windows FFI, Android/iOS
gomobile).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ffi: drop now-redundant VPNStatus dispatch in connectToServer

LanternCore.ConnectVPN already routes to /server/selected when the
tunnel is live (added earlier in this PR), so ffi.go:connectToServer's
own VPNStatus check is duplicate work. Collapse to a single c.ConnectVPN
call — both the live-tunnel-swap and fresh-connect paths flow through
the dispatch one layer down.

Behavior unchanged. The "start service failed" error wrapper is kept
for Dart-side snackbar stability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* lantern-core: collapse dispatch to a single implementation in vpn_tunnel

Three functions had independent VPNStatus → SelectServer-vs-ConnectVPN
dispatches after the earlier commits: LanternCore.ConnectVPN,
vpn_tunnel.StartVPN (both added in this PR), and vpn_tunnel.ConnectToServer
(pre-existing from 4054689). Consolidate so vpn_tunnel.ConnectToServer
is the authoritative dispatch and the other two delegate.

- LanternCore.ConnectVPN → vpn_tunnel.ConnectToServer(lc.client, tag)
- vpn_tunnel.StartVPN → ConnectToServer(client, vpn.AutoSelectTag)

LanternCore.SelectServer keeps its own empty-tag normalization since its
scope is the one-shot SelectServer IPC, not the dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…089) (#8703)

Patrick's radiance fac9089 ("fix(vpn): treat the empty string as
AutoSelect in SelectServer") is now pinned on this branch via
72a6c62. Radiance normalizes tag == "" → AutoSelectTag on both
ConnectVPN and SelectServer, so the client-side normalizations we
added earlier (normalizeAutoTag helper in core.go, `if tag == ""` in
vpn_tunnel.ConnectToServer) are redundant — radiance handles the Dart
"" convention uniformly.

Remove:
- LanternCore.normalizeAutoTag helper + its use in SelectServer
- `if tag == "" { tag = vpn.AutoSelectTag }` branch in
  vpn_tunnel.ConnectToServer
- lantern-core/core_test.go (only tested the removed helper)

Behavior unchanged end-to-end: empty tag still means auto-select on
every path (FFI, gomobile, connectToServer, startVPN).

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Server empty-tag fix (#8705)

radiance@d5a1872 completes fac9089's empty-string → AutoSelectTag
normalization by extending it to LocalBackend.SelectServer, which
previously only matched the literal "auto" and fell through to the
srvManager lookup for tag == "" — producing "no server found with tag"
(HTTP 500, snackbar) on Smart-from-connected flows after the client-
side normalization was removed in this branch's 6de3c9a.

Reported on Lantern 9.0.30 beta via Freshdesk #173773.

go.mod + go.sum bump only; no lantern code changes. Pinned commit:
getlantern/radiance@d5a18726afbc (#444).

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(logs): stream diagnostic logs via ipc TailLogs on mobile

Adds a mobile gomobile binding for ipc.Client.TailLogs (TailLogs +
LogSubscription) and switches Android and iOS to consume it, replacing
the per-platform log-file tailers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(logs): stream diagnostic logs via ipc TailLogs on macos

Switches the macOS log stream to MobileTailLogs, matching iOS. Removes
the file-watching LogTailer (no remaining callers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(logs): harden TailLogs against nil, panics, and listener leaks

- Reject nil listener in mobile.TailLogs; recover from panics crossing
  the gomobile bridge so the stream survives unexpected bridge errors.
- Retain the Kotlin LogListener in a field so the Go side's reference
  stays strongly rooted on the JVM.
- On iOS/macOS, cancel any pre-existing subscription before starting a
  new one and clear the stored listener when MobileTailLogs errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(logs): share TailLogs plumbing across mobile and ffi

Adds lantern-core/logs.Subscribe wrapping ipc.Client.TailLogs so the
mobile and desktop integrations go through one helper. Drops the iOS
LogTailer dead code and the unused lantern-core/logging package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update log formatting

* Fix issue with ios

* Fix macos logs issue

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jigar-f <jigar@getlantern.org>
Copilot AI review requested due to automatic review settings April 27, 2026 21:25
…r-operation timeouts (#8707)

* ffi: skip the daemon-reachability preflight on Windows / macOS / mobile

The 300 ms preflight in lantern-core/core.go's CheckDaemonReachable
was originally tuned for the Linux flow (PR #8494 by atavism, commit
bf054f4), where the failure path falls back to `systemctl is-active
lanternd.service` for a rich diagnostic error. The 300 ms cap made
sense as "fast probe → systemd-rich-error", with the systemd query
adding the actual user-facing context.

Subsequent refactors (commit bd89bea Apr 7, then PR #8578 commit
4d4e06d Apr 16) generalized that preflight to all platforms but
the systemd fallback only survived in ffi_linux.go. On Windows /
macOS / mobile, ffi_nonlinux.go ended up running the same 300 ms
probe with no fallback — just an artificial guillotine in front of
ConnectVPN, which has its own "lanternd not reachable" error path
with equivalent precision.

Cold-start IPC on Windows regularly exceeds 300 ms (named-pipe dial
+ winio impersonation token dance + H2c connection preface +
goroutine scheduling on a 96-second-idle daemon), so the first VPN
toggle after launch reliably trips the timeout and shows the user a
"lanternd not reachable" error. Clicking again 10 seconds later
silently succeeds. Reproduced on the same Windows machine across
9.0.29 (Freshdesk #173696) and 9.0.30 (#173932).

Make the preflight a no-op on non-Linux. Linux keeps the original
fast-probe-then-systemdDiag flow unchanged. If we add Windows
(`sc query LanternSvc`) or macOS (`launchctl list`) diagnostics
later, restore the preflight and call them from here.

See getlantern/engineering#3382 for the full archaeology + design
discussion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ffi + lantern-core: bound IPC calls with per-operation timeouts

Companion to dropping the non-Linux daemon-reachability preflight in this
same PR. The preflight (ffi_nonlinux.go's `checkDaemonReachable`) was
introduced in commit bd89bea along with the *removal* of per-call
timeouts that used to live on the FFI layer:

    -    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    -    if err := c.Client().DisconnectVPN(ctx); err != nil { ... }
    +    if err := c.DisconnectVPN(); err != nil { ... }

After that change, the only IPC call with any deadline at all was the
300 ms preflight. Every other operation flowed lc.ctx
(context.WithCancel(context.Background())) straight through, meaning a
hung lanternd would freeze the UI indefinitely. Dropping the preflight
without restoring per-call timeouts removes the only line of defense.

Restore them at the LanternCore layer where they belong, with values
sized for the inherent work each operation does (state changes can run
into multi-second territory; status queries should be near-instant):

    ipcConnectTimeout     = 60 * time.Second   // ConnectVPN
    ipcStateChangeTimeout = 30 * time.Second   // SelectServer, DisconnectVPN
    ipcStatusTimeout      = 10 * time.Second   // VPNStatus, IsVPNRunning

These bound the worst case (hung daemon → user sees a clear error within
a minute, no indefinite spinner) without firing during normal slow paths.
The dialer's 10 s connect timeout (radiance/ipc/conn_windows.go) already
covers the lanternd-crashed case; these guard the lanternd-hung case.

vpn_tunnel.{StartVPN, StopVPN, ConnectToServer} take the ctx through
their signatures instead of building their own context.Background()
internally, so callers stay in charge of their own deadlines. mobile/
mobile.go updated to set 60 s / 30 s / 60 s contexts on its three
gomobile entry points.

CheckDaemonReachable's 300 ms timeout is kept untouched — Linux still
calls it from ffi_linux.go for the systemctl is-active fallback that's
the whole point of the fast probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@myleshorton myleshorton changed the base branch from main to garmr/radiance-daemon-refactor April 27, 2026 22:16
```kotlin
private fun systemProp(key: String): String = try {
    val cls = Class.forName("android.os.SystemProperties")
    cls.getMethod("get", String::class.java).invoke(null, key) as? String ?: ""
} catch (t: Throwable) {
    "" // reflection failure → treat the prop as unset
}
```
@myleshorton Let's make sure this works across Android versions? This can easily cause a crash. Did you run it on an older version?

myleshorton and others added 2 commits April 28, 2026 06:54
…ogging (#8709)

Two narrow fixes that together resolve Freshdesk #173774 / #173778 /
#173826 (Derek's "Failed to fetch installed apps" empty list on Windows
split tunneling). Split out from #8706 so they can land independently
of the broader app-discovery rework that PR also contained.

1. **GetEnabledApps returns []string{} instead of nil.**
   When no apps are split-tunneled, the previous code returned nil,
   which json.Marshal serialized as "null". Dart's jsonDecode("null")
   returns null; the receiving code does `as List`, which throws and
   the UI shows "Failed to fetch installed apps". Initializing as an
   empty slice serializes to "[]" — Dart parses that as an empty list,
   no exception, no error UI. THIS is the actual root cause of the
   empty-list reports we've been chasing; the apps-discovery scanner
   work was investigating a different (also-real but secondary) issue.

2. **UI-process slog wired up via common.Init.**
   On the refactor branch, the UI process never called common.Init.
   slog wrote to stderr (= nowhere on a GUI host), settings were
   uninitialized, no lantern.log was produced outside the daemon.
   Patrick caught this — it was a one-line miss in the refactor.

   Platform-aware so we don't double-init on platforms where the
   backend embeds in-process:
     - windows/linux: full common.Init (separate UI + daemon procs)
     - darwin/ios:    setupAppLogging into a distinct lantern-app.log
                      so the main-app slog doesn't race the tunnel
                      extension's lantern.log on lumberjack rotation
     - android:       Mobile.SetupRadiance already ran common.Init
                      upstream — fall through

3. **Auto-attach UI-process *.log to ReportIssue (windows/linux only).**
   Without it the daemon's archive glob only sees the daemon's logDir;
   UI-side lantern.log + flutter.log never reach the issue bundle. The
   daemon runs as SYSTEM on Windows; we keep UI logDir at
   %PUBLIC%\Lantern\logs so SYSTEM can read it.

The broader Windows app-discovery work from #8706 (App Paths scan, Run
keys, Squirrel pattern, isAppPathsNoise heuristic filters) is being
held in a separate PR for independent review.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
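Root cause 1 above is a one-line Go/JSON subtlety worth seeing directly — a nil slice marshals to JSON `null` (which Dart's `as List` cast then throws on), while an initialized empty slice marshals to `[]`:

```go
package main

import "encoding/json"

// marshalApps shows why GetEnabledApps must return []string{} rather than nil:
// encoding/json serializes a nil slice as null and an empty slice as [].
func marshalApps(apps []string) string {
	b, _ := json.Marshal(apps)
	return string(b)
}
```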
Adds a debug-only path that lets a developer route every outbound
network call radiance makes from the Android client through an
upstream SOCKS5 (typically the local pinger bridge), so the bandit
treats the client as a real Russia-residential user end-to-end.

Pairs with:
  * radiance:    getlantern/radiance#445
  * lantern-cloud: https://github.com/getlantern/lantern-cloud/pull/2649

  * `lantern-core/mobile`: new gomobile-exported `SetQAEnvOverrides(socks, tz)`
    that does `os.Setenv` for `RADIANCE_OUTBOUND_SOCKS_ADDRESS` and `TZ`.
    Must be called before `SetupRadiance`/`StartIPCServer` to take effect.
  * `android/.../LanternApp.kt`: override `onCreate` and call the new setter
    with values from Android system properties:
      `debug.lantern.outbound_socks` -> `RADIANCE_OUTBOUND_SOCKS_ADDRESS`
      `debug.lantern.tz`             -> `TZ`
    Set with `adb shell setprop debug.lantern.outbound_socks 10.0.2.2:1080`.
    No-op when the props are unset, so production builds aren't affected
    unless someone deliberately sets them on the device.
  * `go.mod`: bump radiance to the qa/outbound-socks-egress branch tip
    (will swap back to a pinned tag once that PR lands).

Verified end-to-end in an `lantern_test` AVD with packetstream + Russia
upstream:
  - LanternApp logs `QA env overrides applied: outbound_socks=10.0.2.2:1080`
  - Radiance's `/v1/config-new` response: `country=RU ip=85.172.81.50`
  - Bandit serves Russia-tier outbounds (samizdat / reflex in DE/SE/SG/etc.)
  - All sing-box outbound dials wrapped in `_dev_outbound_socks` detour
  - Browsing in the emulator's Chrome egresses from a Lantern entry server
    (e.g. Stockholm/Singapore — bandit-assigned, not the Mac's home IP)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@myleshorton myleshorton force-pushed the qa/emulator-outbound-socks branch from d5a0f05 to e326eab Compare April 28, 2026 13:09
Base automatically changed from garmr/radiance-daemon-refactor to main April 28, 2026 18:20