Find (and optionally kill) live experimentFuture workers

Cross-session worker discovery for experimentFuture. Scans /proc for R processes whose redirected stdout points to a worker_<NN>.log file (the convention written by callr::r_bg(stdout = log_files[[i]]) in experimentFuture), regardless of which R session originally spawned them. This is the right tool when:

you re-ran the experimentFuture example in a new R session and a fresh tail -f is silent because the previous run's workers are still claiming queue rows;
you want to clean up orphans without remembering each ef handle;
you want a one-glance view of which row each worker is currently running (joined against the queue's status == "RUNNING" process_id).

Linux-only: the function reads /proc/<pid>/fd/1 to find the log file each worker is writing. On other Unix systems, use lsof -p <pid> or ps -ef | grep tmuxRunWorkerLoop as a manual substitute.
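The /proc scan described above can be approximated in a few lines of base R. This is a minimal sketch, not the package's implementation: find_worker_logs() and its proc_root argument are hypothetical names introduced here, and unlike the real function the sketch does not check that each matching process is an R process.

```r
# Sketch only: find_worker_logs() is a hypothetical helper, not part of the package.
# It lists PIDs whose stdout (fd 1) is redirected to a worker_<NN>.log file.
find_worker_logs <- function(proc_root = "/proc") {
  # Numeric directory names under /proc are process IDs.
  pids <- grep("^[0-9]+$", list.files(proc_root), value = TRUE)
  # fd/1 is a symlink to whatever stdout points at. Sys.readlink() returns ""
  # for a non-link and NA on error (process gone, permission denied).
  targets <- Sys.readlink(file.path(proc_root, pids, "fd", "1"))
  is_worker <- !is.na(targets) & grepl("worker_[0-9]+\\.log$", targets)
  data.frame(
    pid = as.integer(pids[is_worker]),
    log_file = targets[is_worker],
    stringsAsFactors = FALSE
  )
}

# Zero rows when no matching workers are visible (or when /proc does not exist).
workers <- find_worker_logs()
print(workers)
```

Processes owned by other users are usually unreadable under /proc/<pid>/fd, so a scan like this only sees workers the current user is allowed to inspect.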

experimentFutureList(
  ef = NULL,
  kill = FALSE,
  signal = c("TERM", "INT", "KILL"),
  queue_paths = NULL
)

Arguments

ef: Optional shorthand: an "experimentFuture" object (or list of them) whose queue_path will be added to the discovery set. Equivalent to passing queue_paths = ef$queue_path and handy when the result of experimentFuture() is still in scope.
kill: If TRUE, send signal to every worker found, wait up to 10 s for the processes to exit, then call tmuxRefreshQueueStatus() on each unique queue_path to demote the now-orphaned RUNNING rows back to PENDING. Default FALSE (list-only).
signal: One of "TERM" (15, default; graceful), "INT" (2; like Ctrl-C), or "KILL" (9; immediate).
queue_paths: Optional character vector of queue .rds paths to inspect for workers. Use this across R sessions when the ef handle is no longer in scope (e.g. you restarted R but the workers from a prior experimentFuture() call are still alive on mega and camas). Each queue's status == "RUNNING" rows are verified for liveness via /proc (local) or batched SSH (remote). When both queue_paths and ef are NULL (the default), the function uses only queue files auto-discovered from local /proc. That scan finds callr::r_bg workers only, not PSOCK cluster workers, so on a node with no r_bg workers it sees nothing unless ef or queue_paths is supplied.
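To make the kill and signal arguments concrete, here is a rough sketch of the signal-then-wait step using tools::pskill() from base R. The helpers signal_number() and signal_and_wait() are hypothetical names, the /proc liveness check is an assumption about how "wait up to 10 s" might be done, and the follow-up tmuxRefreshQueueStatus() call is not shown because its signature is not documented here.

```r
# Sketch only: both helpers are hypothetical, not part of the package.

# Map the documented signal names onto the constants exported by 'tools'.
signal_number <- function(signal = c("TERM", "INT", "KILL")) {
  signal <- match.arg(signal)
  switch(signal,
    TERM = tools::SIGTERM,  # graceful
    INT  = tools::SIGINT,   # like Ctrl-C
    KILL = tools::SIGKILL   # immediate
  )
}

# Send the signal, then poll /proc until the processes exit or the timeout hits.
# Returns the PIDs that are still alive afterwards.
signal_and_wait <- function(pids, signal = "TERM", timeout = 10) {
  if (length(pids) == 0) return(integer(0))
  tools::pskill(pids, signal_number(signal))
  alive <- function(p) dir.exists(file.path("/proc", p))  # Linux-only check
  deadline <- Sys.time() + timeout
  while (any(alive(pids)) && Sys.time() < deadline) Sys.sleep(0.2)
  pids[alive(pids)]
}

# Hypothetical usage, given a vector of worker PIDs:
# survivors <- signal_and_wait(worker_pids, signal = "TERM")
```

Starting with "TERM" and escalating to "KILL" only for survivors gives workers a chance to exit cleanly first.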

Value

A data.frame (one row per live worker) with columns:

pid: Worker process ID.
started_at: Approximate process start time (ctime of /proc/<pid>).
log_file: Path the worker is writing stdout/stderr to.
queue_path: The first *_queue.rds found in the log directory's parent (where experimentFuture puts it by default), or NA if not located.
runName: Hyphen-joined values of the data columns in the row this worker is currently running, taken from the queue entry with status == "RUNNING" whose process_id matches the worker's pid. NA if the worker is between jobs.

When kill = TRUE, the same data.frame is returned (invisibly) describing the workers that were signalled.
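The queue_path and runName columns can be illustrated with a small sketch. Everything here is an assumption made for illustration: locate_queue() and run_name_for() are hypothetical helpers, the mock queue is invented data, and which columns count as "data columns" (here model and seed) is a guess at the queue layout.

```r
# Sketch only: hypothetical helpers and mock data, not the package's code.

# First *_queue.rds in the parent of the log directory, or NA if none is found.
locate_queue <- function(log_file) {
  parent <- dirname(dirname(log_file))
  hits <- list.files(parent, pattern = "_queue\\.rds$", full.names = TRUE)
  if (length(hits) > 0) hits[[1]] else NA_character_
}

# Hyphen-join the data columns of the RUNNING row owned by this pid.
run_name_for <- function(queue, pid, data_cols) {
  # which() drops NA comparisons (e.g. rows with no process_id yet).
  idx <- which(queue$status == "RUNNING" & queue$process_id == pid)
  if (length(idx) == 0) return(NA_character_)  # worker is between jobs
  row <- queue[idx[1], data_cols, drop = FALSE]
  paste(unlist(lapply(row, as.character)), collapse = "-")
}

# Mock queue with two assumed data columns plus the documented status fields.
queue <- data.frame(
  model = c("lm", "rf"),
  seed = c(1L, 2L),
  status = c("RUNNING", "PENDING"),
  process_id = c(4242L, NA),
  stringsAsFactors = FALSE
)

run_name_for(queue, 4242L, c("model", "seed"))
run_name_for(queue, 9999L, c("model", "seed"))
```

The first call joins the values of the matching RUNNING row; the second returns NA because no RUNNING row carries that process_id.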

Examples

if (FALSE) { # \dontrun{
# Just list everything that's running (auto-discovery via /proc only)
experimentFutureList()

# Pass the ef handle to also pick up PSOCK cluster workers and remote
# workers (anything in the queue, on any machine in `cores`).
ef <- experimentFuture(df = df, global_path = "global.R",
                       cores = c("localhost", "camas"), ...)
experimentFutureList(ef)
experimentFutureList(ef, kill = TRUE)

# Across R sessions, when ef is gone, drive discovery off the queue path:
experimentFutureList(queue_paths = "/mnt/shared_cache/.../future_queue.rds")

# Hard kill (SIGKILL, no chance to update queue meta on the worker side --
# but the post-kill tmuxRefreshQueueStatus() still demotes the rows).
experimentFutureList(ef, kill = TRUE, signal = "KILL")
} # }
