Cross-session worker discovery for experimentFuture. Scans
/proc for R processes whose redirected stdout points to a
worker_<NN>.log file (the convention written by
callr::r_bg(stdout = log_files[[i]]) in
experimentFuture), regardless of which R session originally
spawned them. This is the right tool when:
- you re-ran the experimentFuture example in a new R session and
  a fresh tail -f is silent because the previous run's workers
  are still claiming queue rows;
- you want to clean up orphans without remembering each
  ef handle;
- you want a one-glance view of which row each worker is
  currently running (joined against the queue's
  status == "RUNNING" process_id).
Linux-only (uses /proc/<pid>/fd/1 to find the log file each
worker is writing). For other Unixes use lsof -p <pid> or
ps -ef | grep tmuxRunWorkerLoop as a manual substitute.
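The /proc scan described above can be approximated in a few lines of R. This is a minimal sketch, not the package's implementation: find_worker_logs() and its arguments are hypothetical names, and unlike the real function it does not check that the process is an R process, only that file descriptor 1 resolves to a worker_<NN>.log path.

```r
# Hypothetical helper (not part of the package): list PIDs whose stdout
# (file descriptor 1) is a symlink to a worker_<NN>.log file.
find_worker_logs <- function(proc_root = "/proc",
                             pattern = "worker_[0-9]+\\.log$") {
  # Numeric directory names under /proc are process IDs.
  pids <- list.files(proc_root, pattern = "^[0-9]+$")

  # fd/1 is a symlink to whatever stdout was redirected to.
  # Entries that cannot be read or are not symlinks come back as NA or "".
  targets <- Sys.readlink(file.path(proc_root, pids, "fd", "1"))

  keep <- !is.na(targets) & grepl(pattern, targets)
  data.frame(pid = as.integer(pids[keep]),
             log_file = targets[keep],
             stringsAsFactors = FALSE)
}
```

On a system without /proc the function simply returns a zero-row data.frame, which matches the Linux-only caveat above.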
experimentFutureList(
ef = NULL,
kill = FALSE,
signal = c("TERM", "INT", "KILL"),
queue_paths = NULL
)
ef: Optional shorthand. An "experimentFuture" object (or
list of them) whose queue_path will be added to the discovery
set. Equivalent to passing queue_paths = ef$queue_path and
handy when the result of experimentFuture() is still in
scope.
kill: If TRUE, send signal to every worker found,
wait up to 10 s for the processes to exit, then call
tmuxRefreshQueueStatus() on each unique
queue_path to demote the now-orphaned RUNNING rows
back to PENDING. Default FALSE (list-only).
signal: One of "TERM" (15, default; graceful), "INT"
(2; like Ctrl-C), or "KILL" (9; immediate).
queue_paths: Optional character vector of queue .rds paths to
inspect for workers. Use this across R sessions when the
ef handle is no longer in scope (e.g. you restarted R but
the workers from a prior experimentFuture() call are still
alive on mega and camas). Each queue's
status == "RUNNING" rows are verified for liveness via
/proc (local) or batched SSH (remote). When NULL
(default) and ef is also NULL, the function uses
only queue files auto-discovered from local /proc. That
discovery finds callr::r_bg workers but not PSOCK cluster
workers, so on a node with no r_bg workers it sees nothing
unless ef or queue_paths is supplied.
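As a rough illustration of the kill = TRUE path, the sketch below maps the documented signal names to numbers using the constants in the tools package, sends the signal, and polls /proc until the processes exit or a timeout elapses. signal_number() and signal_and_wait() are hypothetical helpers introduced for this example. The real function additionally calls tmuxRefreshQueueStatus() on each queue, which is not reproduced here.

```r
# Hypothetical helper: translate the documented signal names to numbers.
signal_number <- function(signal = c("TERM", "INT", "KILL")) {
  signal <- match.arg(signal)  # defaults to "TERM", the first choice
  switch(signal,
         TERM = tools::SIGTERM,
         INT  = tools::SIGINT,
         KILL = tools::SIGKILL)
}

# Hypothetical helper: signal the given PIDs, then wait up to `timeout`
# seconds for them to disappear from /proc (a Linux-only liveness check).
signal_and_wait <- function(pids, signal = "TERM", timeout = 10, poll = 0.2) {
  tools::pskill(pids, signal_number(signal))
  is_alive <- function(p) dir.exists(file.path("/proc", p))
  deadline <- Sys.time() + timeout
  while (any(is_alive(pids)) && Sys.time() < deadline) Sys.sleep(poll)
  pids[is_alive(pids)]  # PIDs still present after the wait
}
```

A non-empty return value means some workers outlived the timeout, in which case repeating the call with signal = "KILL" is the usual escalation.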
A data.frame (one row per live worker) with columns:
pid: Worker process ID.
started_at: Approximate process start time
(ctime of /proc/<pid>).
log_file: Path the worker is writing stdout/stderr to.
queue_path: The first *_queue.rds found in the
log directory's parent (where experimentFuture puts it by
default), or NA if not located.
runName: Hyphen-joined data column values of the row
this worker is currently running, derived from the queue's
status == "RUNNING" entry whose process_id matches.
NA if the worker is between jobs.
When kill = TRUE, the same data.frame is returned (invisibly)
describing the workers that were signalled.
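The queue_path and runName columns can be pictured with the following sketch. It assumes the queue is a data.frame with status and process_id columns plus some data columns; locate_queue(), run_names(), and the data_cols argument are hypothetical names, and "first" is taken here to mean first in list.files() order, which may differ from what the package does.

```r
# Hypothetical helper: look for a *_queue.rds file in the parent of the
# directory that holds the worker log; NA if none is found.
locate_queue <- function(log_file) {
  parent <- dirname(dirname(log_file))
  hits <- list.files(parent, pattern = "_queue\\.rds$", full.names = TRUE)
  if (length(hits) > 0) hits[[1]] else NA_character_
}

# Hypothetical helper: for each worker PID, hyphen-join the data columns
# of the RUNNING queue row whose process_id matches; NA when no row matches.
run_names <- function(workers, queue, data_cols) {
  running <- queue[queue$status %in% "RUNNING", , drop = FALSE]
  cols <- unname(as.list(running[data_cols]))
  labels <- do.call(paste, c(cols, sep = "-"))
  labels[match(workers$pid, running$process_id)]
}

# Toy queue with two data columns (model, fold), assumed for illustration.
queue <- data.frame(model = c("lm", "rf", "gbm"),
                    fold = 1:3,
                    status = c("RUNNING", "PENDING", "RUNNING"),
                    process_id = c(101L, NA, 103L),
                    stringsAsFactors = FALSE)
workers <- data.frame(pid = c(103L, 101L, 999L))
run_names(workers, queue, c("model", "fold"))
```

In this toy data the worker with pid 999 has no RUNNING row, which corresponds to the "between jobs" case and yields NA.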
if (FALSE) { # \dontrun{
  # Just list everything that's running (auto-discovery via /proc only)
  experimentFutureList()

  # Pass the ef handle to also pick up PSOCK cluster workers and remote
  # workers (anything in the queue, on any machine in `cores`).
  ef <- experimentFuture(df = df, global_path = "global.R",
                         cores = c("localhost", "camas"), ...)
  experimentFutureList(ef)
  experimentFutureList(ef, kill = TRUE)

  # Across R sessions, when ef is gone, drive discovery off the queue path:
  experimentFutureList(queue_paths = "/mnt/shared_cache/.../future_queue.rds")

  # Hard kill (SIGKILL, no chance to update queue meta on the worker side --
  # but the post-kill tmuxRefreshQueueStatus() still demotes the rows).
  experimentFutureList(ef, kill = TRUE, signal = "KILL")
} # }