R/tmux.R
experimentMonitor.Rd
Single read-only entry point for inspecting workers, regardless of which runner spawned them. Discovery is driven by what you pass:
experimentMonitor(ef = NULL, queue_paths = NULL, stats = FALSE)
ef
Optional "experimentFuture" object (or list of them) whose
queue_path and cores will be used for discovery. Switches
the function from tmux-scan mode to queue-scan mode.
queue_paths
Optional character vector of queue .rds paths. Use it when the
ef handle is no longer in scope (e.g. across R sessions); it is
equivalent to passing ef = NULL plus queue_paths. When
queue_paths is supplied without ef, the SSH-alias probe is
skipped and machine_name from the queue is used verbatim as the
SSH target, which only works if the OS hostname is itself a Host
entry in ~/.ssh/config or a name listed in /etc/hosts.
stats
Logical. When TRUE, queries ps per worker (locally
or via batched SSH) to append state, cpuAvg (percent CPU
averaged over the process's lifetime – not the instantaneous rate
htop shows), RAM (GB) (resident memory), availableCores
(total CPUs on the node, from nproc), and total RAM (GB)
(total RAM on the node, from /proc/meminfo). Default FALSE.
Data.frame whose columns depend on the discovery mode:
tmux mode – session, window, pane, pane_id,
pane_ref (the "session:window.pane" string), title,
node (first dash-separated token in title that matches a
cluster alias from /etc/hosts; falls back to
localHostLabel() when the title contains only the raw local
hostname; NA if no match).
queue mode – pid, machine, started_at, log_file
(NA when the worker isn't a callr::r_bg writer), queue_path,
runName.
With stats = TRUE, five additional columns appear in either
mode: state, cpuAvg, RAM (GB), availableCores,
total RAM (GB). Returns an empty data.frame (0 rows, same
columns) if no workers are found.
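The tmux-mode pane_ref and node columns described above can be illustrated with a minimal sketch. nodeFromTitle() and paneRef() are hypothetical helpers written for this example, not functions from the package, and the alias list, hostname and local label are made-up values. The sketch also assumes the local-hostname fallback is a simple token match, which may differ from what the package actually does.

```r
# Hypothetical helper (not part of the package): pick the first
# dash-separated token of a pane title that matches a known cluster alias.
nodeFromTitle <- function(title, aliases, local_hostname, local_label) {
  tokens <- strsplit(title, "-", fixed = TRUE)[[1]]
  hit <- tokens[tokens %in% aliases]
  if (length(hit) > 0) return(hit[[1]])
  # Assumed fallback: the title carries the raw local hostname instead of an alias.
  if (local_hostname %in% tokens) return(local_label)
  NA_character_
}

# Hypothetical helper: build the "session:window.pane" reference string.
paneRef <- function(session, window, pane) {
  sprintf("%s:%s.%s", session, window, pane)
}

aliases <- c("node01", "node03")                       # made-up cluster aliases
titles  <- c("node03-12345", "workstation-999", "scratch")

nodes <- vapply(
  titles, nodeFromTitle, character(1),
  aliases = aliases,
  local_hostname = "workstation",                      # made-up raw hostname
  local_label = "local",                               # stand-in for localHostLabel()
  USE.NAMES = FALSE
)
ref <- paneRef("exp", 0, 1)
```

Here the first title resolves to an alias, the second falls back to the local label, and the third yields NA because no token matches.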
Default (ef = NULL, queue_paths = NULL) – enumerates tmux
panes via tmux -S <socket> list-panes -a across every tmux server
under $TMUX_TMPDIR/tmux-<uid>/. This matches the behaviour of the
historical tmuxListPanes(). Per-socket failures are swallowed so one
broken socket cannot poison the rest; works outside a tmux pane
and across multiple tmux servers (e.g. sessions started under
different -L names). Cluster_Monitor panes are filtered out.
ef supplied (or queue_paths) – reads each queue file's
status == "RUNNING" rows, probes ssh <core> hostname -s once
per non-local entry in ef$cores to map OS hostnames (which is
what Sys.info()[["nodename"]] writes to the queue) back to SSH
aliases (~/.ssh/config / /etc/hosts entries), and verifies
each PID is alive (/proc/<pid> locally, batched
ssh <alias> "[ -d /proc/<pid> ]" remotely). This is the
experimentFuture() / experimentSBATCH() equivalent of the
tmux pane scan – workers there don't necessarily live in a
tmux pane, so the queue file is the authoritative record.
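The hostname-to-alias mapping in queue mode can be sketched as follows. Everything here is illustrative: the cores vector, the hostnames, and fakeProbe() (a stand-in for running ssh <core> hostname -s) are assumptions, as is treating "localhost" as the only local entry. The final fallback to the raw machine name mirrors the verbatim behaviour described for queue_paths without ef; whether the package does the same when ef is supplied is not stated.

```r
# Hypothetical stand-in for ef$cores: one entry per worker slot.
cores <- c("localhost", "n1", "n1", "n2")

# Stand-in for the SSH probe; a real probe might instead call
# system2("ssh", c(alias, "hostname", "-s"), stdout = TRUE).
fakeProbe <- function(alias) {
  c(n1 = "compute-a", n2 = "compute-b")[[alias]]
}

# Probe each distinct non-local alias once.
remote_aliases <- unique(cores[cores != "localhost"])
hostnames <- vapply(remote_aliases, fakeProbe, character(1))

# Invert the result: OS hostname -> SSH alias.
alias_by_hostname <- setNames(names(hostnames), hostnames)

# Machine names as they might appear in a queue file (made-up values).
queue_machines <- c("compute-b", "compute-a", "unknown-host")
ssh_target <- unname(alias_by_hostname[queue_machines])

# Assumed fallback: use the machine name verbatim when no alias is known.
ssh_target <- ifelse(is.na(ssh_target), queue_machines, ssh_target)
```

Probing each distinct alias once keeps the number of SSH round trips proportional to the number of nodes rather than the number of workers.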
Either way, stats = TRUE runs the same ps -o pid=,%cpu=,rss=,state=
batch (locally and via one SSH connection per remote node) to append
CPU / RSS / state plus per-node nproc / total RAM.
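As a rough illustration of how headerless ps -o pid=,%cpu=,rss=,state= output could be turned into the stats columns, the sketch below parses made-up sample lines rather than calling ps. It assumes RSS is reported in kibibytes (as on Linux) and that GB means RSS divided by 1024^2; the package's actual conversion and its handling of missing PIDs may differ.

```r
# Made-up sample of `ps -o pid=,%cpu=,rss=,state=` output (no header row).
ps_lines <- c(
  " 4123 97.3 2097152 R",
  " 4177  0.0  524288 D"
)

# Explicit colClasses keep a state such as "T" from being read as logical TRUE.
parsed <- read.table(
  text = ps_lines,
  col.names = c("pid", "cpuAvg", "rss_kb", "state"),
  colClasses = c("integer", "numeric", "numeric", "character")
)

# PIDs we asked about; 4200 is absent from the ps output.
requested <- c(4123L, 4177L, 4200L)
idx <- match(requested, parsed$pid)

stats <- data.frame(
  pid = requested,
  state = parsed$state[idx],
  cpuAvg = parsed$cpuAvg[idx],
  stringsAsFactors = FALSE
)
# Assumption: rss is in KiB, so dividing by 1024^2 gives GiB.
stats[["RAM (GB)"]] <- parsed$rss_kb[idx] / 1024^2

# Assumed rule: a PID that ps no longer reports is labelled "Closed".
stats$state[is.na(idx)] <- "Closed"
```

Matching on the requested PIDs, rather than on whatever ps returns, is what lets exited workers show up as rows instead of silently disappearing.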
The state column is the best single signal for hang-detection because
it is a snapshot (no time window needed). Values:
| State | Meaning |
|---|---|
| R | running on CPU right now |
| S | sleeping (waiting on I/O, timer, or lock) |
| D | uninterruptible sleep (usually disk I/O; persistent D can indicate a hang) |
| T | stopped (SIGSTOP or similar) |
| Z | zombie (dead but not yet reaped) |
| Closed | worker process has exited – PID no longer exists |
| NA | could not determine (machine unreachable, or no parseable <node>-<pid> in title) |
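A minimal triage sketch based on the state values above. The two data frames are mock snapshots standing in for successive experimentMonitor(stats = TRUE) results, and the choice of which states count as suspect is an assumption made for illustration, not a rule from the package.

```r
# Mock snapshots with the columns a stats = TRUE call is described as
# returning (only the ones needed here); all values are made up.
snap1 <- data.frame(
  pid = c(4123L, 4177L, 4200L, 4310L),
  machine = c("node01", "node01", "node02", "node03"),
  state = c("R", "D", "Closed", NA),
  stringsAsFactors = FALSE
)
snap2 <- snap1
snap2$state <- c("S", "D", "Closed", NA)   # taken some time later

# Assumed triage rule: stopped, zombie, exited, or undeterminable workers
# deserve a look immediately.
suspect_states <- c("T", "Z", "Closed")
suspects <- snap2[snap2$state %in% suspect_states | is.na(snap2$state), ]

# A single D is normal during disk I/O; only D in both snapshots is flagged.
persistent_d <- intersect(
  snap1$pid[snap1$state %in% "D"],
  snap2$pid[snap2$state %in% "D"]
)
```

Using %in% rather than == keeps NA states from propagating into the row filter.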
experimentFutureList() for the same queue-mode discovery
plus cluster-wide kill / queue refresh / GS demotion.
tmuxListPanes() is preserved as a thin alias that calls this
function with no ef.