Incident debug sprint - NodalMerge

Goal

Practice a repeatable incident-debug workflow in under 45 minutes.

Scenario

Assume a report: “Peers connected, but collaborative updates looked inconsistent for a short period.”

Step 1: Reproduce quickly

Start the host and app surfaces from developer-experience/apps.
Open two windows in the same room.
Perform two or three shared actions (document update, presence update, map pin).

Capture timestamps for each action.
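
One lightweight way to capture timestamps is a small action log kept alongside the reproduction. The sketch below is not part of NodalMerge; the ActionRecord type, the record_action helper, and the window labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ActionRecord:
    window: str  # hypothetical label for the window that performed the action
    action: str  # e.g. "document update", "presence update", "map pin"
    at: str      # UTC timestamp in ISO 8601 form, millisecond precision


def record_action(log: List[ActionRecord], window: str, action: str,
                  now: Optional[datetime] = None) -> ActionRecord:
    """Append one timestamped action to the log and return it."""
    # Use the supplied time if given (useful for tests), otherwise the current UTC time.
    moment = now if now is not None else datetime.now(timezone.utc)
    entry = ActionRecord(window=window, action=action,
                         at=moment.isoformat(timespec="milliseconds"))
    log.append(entry)
    return entry
```

Recording in UTC with millisecond precision makes it easier to line the log up against the trace and snapshot windows captured in the later steps.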

Step 2: Capture protocol evidence

In protocol-inspector:

apply the Peer lifecycle, Presence, and Replay/query presets
export a trace snippet covering the reproduction window

Record:

the order in which events arrived
message types that are missing, or extra ones you did not expect
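
The two checks above can be sketched as a comparison between the message types you expected and those in the exported trace. This is a minimal sketch: the export format of protocol-inspector is not described here, so the code assumes you have already reduced the trace to a plain list of message-type strings, and the type names used in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def compare_message_types(expected: List[str], observed: List[str]) -> Dict[str, object]:
    """Report missing, extra, and reordered message types.

    Both arguments are plain lists of message-type strings in arrival order
    (an assumed, simplified view of the exported trace).
    """
    want, got = Counter(expected), Counter(observed)
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    missing = sorted((want - got).elements())
    extra = sorted((got - want).elements())
    # Reordering is only reported when the same set of messages arrived.
    same_messages = not missing and not extra
    reordered = same_messages and expected != observed
    return {"missing": missing, "extra": extra, "reordered": reordered}
```

For example, calling it with an expected list such as ["peer.join", "presence.update", "doc.update"] and the observed list from your trace gives a compact summary to paste into the incident note.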

Step 3: Capture replay evidence

In replay-lab:

capture a pre-action snapshot
capture a post-action snapshot
inspect both with the range start/size controls around the suspected window

Record:

differences between the pre-action and post-action event windows
signals that events were delayed or reordered
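
If you copy the two snapshot windows out of replay-lab, the comparison can be sketched as follows. This is an illustration under stated assumptions, not the replay-lab data model: each snapshot is assumed to be a list of events carrying a unique "id" field, and start and size mirror the range controls.

```python
from typing import Dict, List


def diff_windows(pre: List[dict], post: List[dict], start: int, size: int) -> Dict[str, object]:
    """Compare the same start/size window of two snapshots by event id."""
    # Slice the same range out of both snapshots (assumed: events are in log order).
    pre_ids = [event["id"] for event in pre[start:start + size]]
    post_ids = [event["id"] for event in post[start:start + size]]

    added = [i for i in post_ids if i not in pre_ids]
    removed = [i for i in pre_ids if i not in post_ids]

    # Events present in both windows should keep their relative order.
    common_pre = [i for i in pre_ids if i in post_ids]
    common_post = [i for i in post_ids if i in pre_ids]
    return {"added": added, "removed": removed, "reordered": common_pre != common_post}
```

A non-empty added list is expected after the shared actions; a reordered result of True, or anything in removed, is the kind of signal worth recording.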

Step 4: Run operator triage checklist

Apply the guidance in:

operators/troubleshooting
operators/metrics-and-observability
operators/replay-cli

Check:

connection lifecycle anomalies
close codes/reasons
suspicious timing gaps in observed events
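
To make "suspicious timing gaps" concrete, one option is to flag any pair of consecutive events separated by more than a chosen threshold. The sketch below assumes event timestamps are available as integer milliseconds; the find_timing_gaps name and the threshold value are illustrative choices, not NodalMerge defaults.

```python
from typing import List, Tuple


def find_timing_gaps(timestamps_ms: List[int], threshold_ms: int) -> List[Tuple[int, int, int]]:
    """Return (earlier, later, delta) for consecutive events further apart than threshold_ms."""
    ordered = sorted(timestamps_ms)  # compare in time order, regardless of input order
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        delta = later - earlier
        if delta > threshold_ms:
            gaps.append((earlier, later, delta))
    return gaps
```

Pick the threshold from what is normal for the room under test, then compare any flagged gaps against the action timestamps from Step 1.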

Step 5: Write your incident note

Template:

symptom
reproduction steps
protocol evidence (trace snippet summary)
replay evidence (snapshot window summary)
probable cause
next mitigation
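
If you prefer to keep notes in a consistent shape, the template can be sketched as a small data structure. The field names follow the template above; the IncidentNote class and its render method are hypothetical helpers, not part of NodalMerge.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentNote:
    symptom: str
    reproduction_steps: str
    protocol_evidence: str   # trace snippet summary
    replay_evidence: str     # snapshot window summary
    probable_cause: str
    next_mitigation: str

    def render(self) -> str:
        """Render the note as one 'label: value' line per field, in template order."""
        lines = []
        for field in fields(self):
            label = field.name.replace("_", " ")
            lines.append(f"{label}: {getattr(self, field.name)}")
        return "\n".join(lines)
```

Because every field is required, an incomplete note fails at construction time rather than being filed with a gap.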

Success criteria

You can produce an evidence-backed root-cause hypothesis.
Another engineer can replay your debug flow without additional context.
