setupProject calls a sequence of functions in this order:
setupOptions (first time), setupPaths, setupRestart,
setupFunctions, setupModules, setupPackages, setupSideEffects,
setupOptions (second time), setupParams, and setupGitIgnore.
This sequence will create folder structures, install missing packages from those
listed in the packages or require arguments or in the modules' reqdPkgs fields,
load packages (only those in the require argument), set options, and download or
confirm the existence of modules. It will also return elements that can be passed
directly to simInit or simInitAndSpades, specifically modules, params,
paths, times, and any named elements passed to .... This function will also,
if desired, change the .Rprofile file for this project so that every time
the project is opened, it has a specific .libPaths().
There are a number of convenience elements described in the section below. See Details.
Because of this sequence, users can take advantage of settings (i.e., objects)
that happen (are created) before others. For example, users can set paths
then use the paths list to set options that can update/change paths,
or set times and use the times list for certain entries in params.
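A minimal sketch of this ordering, not run or verified here. It assumes SpaDES.project is installed, and the project location and the option chosen are illustrative only. The call is guarded so that nothing happens in a non-interactive session.

```r
# Hypothetical project location inside the session's temporary directory.
projectDir <- file.path(tempdir(), "demoProject")

if (interactive() && requireNamespace("SpaDES.project", quietly = TRUE)) {
  out <- SpaDES.project::setupProject(
    paths = list(projectPath = projectDir),
    times = list(start = 2000, end = 2010),
    # Per the description above, options may refer to the paths list.
    options = list(reproducible.destinationPath = paths[["inputPath"]])
  )
  names(out)
}
```

The same pattern applies to params, where an entry can refer to times (for example, times$start) once times has been supplied.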
setupProject(
name,
paths,
modules,
packages,
times,
options,
params,
sideEffects,
functions,
config,
require = NULL,
studyArea = NULL,
Restart = getOption("SpaDES.project.Restart"),
useGit = getOption("SpaDES.project.useGit"),
setLinuxBinaryRepo = getOption("SpaDES.project.setLinuxBinaryRepo"),
standAlone = getOption("SpaDES.project.standAlone"),
libPaths = NULL,
updateRprofile = getOption("SpaDES.project.updateRprofile"),
overwrite = getOption("SpaDES.project.overwrite"),
verbose = getOption("Require.verbose", 1L),
defaultDots,
envir = parent.frame(),
dots,
...
)
Optional. If supplied, the name of the project. If not supplied, an
attempt will be made to extract the name from paths[["projectPath"]].
If this is a GitHub project, then it should indicate the full GitHub
repository and branch name, e.g., "PredictiveEcology/WBI_forecasts@ChubatyPubNum12".
A list with named elements, specifically, modulePath, projectPath,
packagePath and all others that are in SpaDES.core::setPaths()
(i.e., inputPath, outputPath, scratchPath, cachePath, rasterTmpDir).
Each of these has a sensible default, which will be overridden by any
user-supplied values.
See setup.
A character vector of modules to pass to getModule. These
should be one of: simple name (e.g., fireSense) which will be searched for locally
in the paths[["modulePath"]]; or a GitHub repo with branch (GitHubAccount/Repo@branch e.g.,
"PredictiveEcology/Biomass_core@development"); or a character vector that identifies
one or more module folders (local or GitHub) (not the module .R script).
If the entire project is a git repository,
then it will not try to re-get these modules; instead it will rely on the user
managing their git status outside of this function.
For convenience, these can also be in two other URL formats:
the raw.githubusercontent.com URL that points to the main module file or the folder, e.g.,
"https://raw.githubusercontent.com/PredictiveEcology/Biomass_core/refs/heads/main/Biomass_core.R"; or
the github.com URL used for cloning a git repository, with optional "@branch" specified, e.g.,
"https://github.com/PredictiveEcology/Biomass_speciesParameters.git@development".
See setup.
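To make the two simplest formats concrete, here is a minimal base R sketch that splits a module specification into account, module name, and branch. The helper splitModuleSpec is hypothetical and is not part of SpaDES.project; it does not handle the URL formats.

```r
# Hypothetical helper (not part of SpaDES.project): split "Account/Repo@branch"
# or a simple local module name into its parts.
splitModuleSpec <- function(spec) {
  hasBranch <- grepl("@", spec, fixed = TRUE)
  repoPart <- sub("@.*$", "", spec)            # drop "@branch" if present
  isGitHub <- grepl("/", repoPart, fixed = TRUE)
  data.frame(
    spec = spec,
    account = ifelse(isGitHub, dirname(repoPart), NA_character_),
    module = basename(repoPart),
    branch = ifelse(hasBranch, sub("^.*@", "", spec), NA_character_),
    stringsAsFactors = FALSE
  )
}

specs <- c("fireSense", "PredictiveEcology/Biomass_core@development")
splitModuleSpec(specs)
```

A simple name has no account or branch, which matches the description that it is searched for locally in paths[["modulePath"]].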
Optional. A vector of packages that must exist in the libPaths.
This will be passed to Require::Install, i.e., these will be installed, but
not attached to the search path. See also the require argument. To skip
package installation entirely (without assessing modules), set packages = NULL.
Optional. If supplied, this will be returned, and its values
can be used elsewhere, e.g., in params: params = list(mod = list(startTime = times$start)).
See help for SpaDES.core::simInit.
Optional. Either a named list to be passed to options
or a character vector indicating one or more file(s) to source,
in the order provided. These will be parsed locally (not in
the .GlobalEnv), so they will not create globally accessible objects. NOTE:
options is run twice within setupProject, once before setupPaths and once
after setupPackages. This occurs because many packages use options for their
behaviour (so the options must be set before, e.g., Require::Require is run), but many packages
also change options at startup. See Details.
See setup.
Optional. Similar to options, however, this named list will be
returned, i.e., there are no side effects.
See setup.
Optional. This can be an expression or one or more file names or
a code chunk surrounded by {...}.
If a non-text file name is specified (e.g., not .txt or .R currently),
these files will simply be downloaded, using their relative path as specified
in the github notation. They will be downloaded or accessed locally at that
relative path.
If these file names represent scripts (.txt or .R), they will be parsed and evaluated,
but nothing is returned (i.e., any assigned objects are not returned). This is intended
to be used for operations like cloud authentication or configuration functions
that are run for their side effects only.
A set of function definitions to be used within setupProject.
These will be returned as a list element. If function definitions require non-base
packages, prefix the function call with the package e.g., terra::rast. When
using setupProject, the functions argument is evaluated after paths, so
it cannot be used to define functions that help specify paths.
Reserved for future use. Currently unimplemented; supplying a value triggers an error.
Optional. A character vector of packages to install and attach
(with Require::Require). These will be installed and attached at the start
of setupProject so that a user can use these during setupProject.
See setup.
Optional. If a list, it will be passed to
geodata::gadm. To specify a country other than the default "CAN",
the list must have a named element, "country". All other named elements
will be passed to gadm. Two additional named elements can be passed for
convenience: subregion = "...", which will be grepped against the column
NAME_1, and epsg = "...", so a user can pass an epsg.io code to
reproject the studyArea. See examples.
Logical or character. If either TRUE or a character,
and if the projectPath is not the current path, and the session is in
RStudio and interactive, it will try to restart RStudio in the projectPath with
a new RStudio project. If character, it should represent the filename
of the script that contains the setupProject call that should be copied to
the new folder and opened. If TRUE, it will use the active file as the one
that should be copied to the new projectPath and opened in the RStudio project.
If successful, this will create an RStudio Project file (and .Rproj.user
folder), restart with a new RStudio session with that new project and with a root
path (i.e., working directory) set to projectPath. Default is FALSE, and no
RStudio Project is created.
(If not FALSE, this is still experimental.) There are two levels at which a project
can use GitHub, either the projectPath and/or the modules. Any given
project can have one or the other, or both of these under git control. If "both",
then this function will assume that git submodules will be used for the modules.
A logical or "sub" for submodule. If "sub", then this function
will attempt to clone the identified modules as git submodules. This will only
work if the projectPath is a git repository. If the project is already a git
repository because the user has set that up externally to this function call, then
this function will add the modules as git submodules. If it is not already,
it will use git clone for each module. After git clone or submodule add are run,
it will run git checkout for the named branch and then git pull
to get and change branch for each module, according to its specification in
modules. If FALSE, this function will download modules with getModule.
NOTE: CREATING A
GIT REPOSITORY AT THE PROJECT LEVEL AND SETTING MODULES AS GIT SUBMODULES IS
EXPERIMENTAL. IT IS FINE IF THE PROJECT HAS BEEN MANUALLY SET UP TO BE
A GIT REPOSITORY WITH SUBMODULES: THIS FUNCTION WILL ONLY EVALUATE PATHS. This can
be set with options(SpaDES.project.useGit = xxx).
Logical. Should the binary RStudio Package Manager be used on Linux (ignored on Windows)?
A logical. Passed to Require::standAlone. This keeps all
packages installed in a project-level library, if TRUE. Default is TRUE.
Deprecated. Use paths = list(packagePath = ...).
Logical. Should the paths$packagePath be set in the .Rprofile
file for this project. Note: if paths$packagePath is within the tempdir(),
then there will be a warning, indicating this won't persist. If the user is
using RStudio and the paths$projectPath is not the root of the current
RStudio project, then a warning will be given, indicating the .Rprofile may not
be read upon restart.
Logical vector or character vector; however, only getModule will respond
to a vector of values. If length-one TRUE, then all files that were previously downloaded
will be overwritten throughout the sequence of setupProject – including those downloaded via sideEffects.
If a length > 1 logical or character vector, these will be passed to getModule: only the named
modules will be overwritten or the logical vector of the modules.
NOTE: if length > 1, no other file specified anywhere in setupProject will be
overwritten except a module matching the vector names() (because
only setupModules is currently responsive to a vector). For fine-grained control,
a user can manually delete a file, then rerun.
Numeric or logical indicating how verbose the function should
be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE,
then minimal outputs; if 1 or TRUE, more outputs; if 2, even more. NOTE: in the
Require function, when verbose >= 2, it also returns details as if
returnDetails = TRUE (for backwards compatibility).
A named list of any arbitrary R objects.
These can be supplied to give default values to objects that
are otherwise passed in with the ..., i.e., not specifically named for these
setup* functions. If named objects are supplied as top-level arguments, then
the defaultDots will be overridden. This can be particularly useful if the
arguments passed to ... do not always exist, but rely on external e.g., batch
processing to optionally fill them. See examples.
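The override behaviour described here can be pictured with base R alone. This is only an analogy using utils::modifyList, not the package's implementation, and the object names .mode and .nodes are illustrative.

```r
# Defaults that apply when a batch script supplies nothing.
defaultDots <- list(.mode = "development", .nodes = 1L)

# Objects actually supplied at the top level (here only .nodes).
supplied <- list(.nodes = 4L)

# Supplied values replace the defaults; missing ones fall back to defaultDots.
resolved <- utils::modifyList(defaultDots, supplied)
str(resolved)
```

Here .mode keeps its default because nothing was supplied for it, while .nodes takes the supplied value.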
The environment from which setupProject is called. Defaults to
parent.frame(), which should be fine in most cases; users shouldn't need
to set this.
Any other named objects passed as a list a user might want for other elements.
Further named arguments that act like objects, but are a different
way to specify them. These can be anything. The general use case
is to create the objects that would be passed to
SpaDES.core::simInit, or SpaDES.core::simInitAndSpades,
(e.g. studyAreaName or objects) or additional objects to be passed to the simulation
(in older versions of SpaDES.core, these were passed as a named list
to the objects argument). Order matters. These are sequentially evaluated,
and also any arguments that are specified before the named arguments
e.g., name, paths, will be evaluated prior to any of the named arguments,
i.e., "at the start" of the setupProject.
If placed after the first named argument, then they will be evaluated at the
end of the setupProject, so can access all the packages, objects, etc.
setupProject will return a named list with elements modules, paths, params, and times.
The goal of this list is to contain list elements that can be passed directly
to simInit.
It will also append all elements passed by the user in the ....
This list can be passed directly to SpaDES.core::simInit() or
SpaDES.core::simInitAndSpades() using a do.call(). See example.
NOTE: both projectPath and packagePath will be omitted in the paths list
as they are used to set current directory (found with getwd()) and .libPaths()[1],
but are not accepted by simInit. setupPaths will still return these two paths as its
outputs are not expected to be passed directly to simInit (unlike setupProject outputs).
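The do.call() pattern can be sketched without installing anything by using stand-ins. Both the list out and the function fakeSimInit below are hypothetical placeholders for the real setupProject output and SpaDES.core::simInit; the values are illustrative.

```r
# Stand-in for the list returned by setupProject (illustrative values only).
out <- list(
  modules = "Biomass_core",
  paths = list(modulePath = "modules", inputPath = "inputs", outputPath = "outputs"),
  params = list(),
  times = list(start = 2000, end = 2010),
  studyAreaName = "demo" # an extra element, as if passed via ...
)

# Stand-in for simInit: named elements are matched to arguments by name,
# and anything else is collected by `...`.
fakeSimInit <- function(modules, paths, params, times, ...) {
  extras <- list(...)
  list(
    nModules = length(modules),
    span = times$end - times$start,
    extraNames = names(extras)
  )
}

sim <- do.call(fakeSimInit, out)
str(sim)

# With the real packages, the equivalent call would be:
# sim <- do.call(SpaDES.core::simInit, out)
```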
There are a number of checks that occur during setupProject. These take time, particularly
after an R restart (there is some caching in RAM that occurs, but this will only speed
things up if there is no restart of R). For the fastest setup, the following options or settings
will speed things up, at the expense of not being completely re-runnable.
You can add one or more of these to the arguments. These will only be useful after a project
is set up, i.e., setupProject and SpaDES.core::simInit has/have been run at least once
to completion (so packages are installed).
options = list(
  reproducible.useMemoise = TRUE, # For caching, use memory objects
  Require.cloneFrom = Sys.getenv("R_LIBS_USER"), # Use personal library as possible source of packages
  spades.useRequire = FALSE, # Won't install packages/update versions
  spades.moduleCodeChecks = FALSE, # moduleCodeChecks checks for metadata mismatches
  reproducible.inputPaths = "~/allData"), # For sharing data files across projects
packages = NULL, # Prevents any package installs with setupProject
useGit = FALSE # Prevents checks using git
These will be set early in setupProject, so will affect the running of setupProject.
If the user also sets any of these manually, the user's values will
override these.
The remaining cause of setupProject being "slow" will be loading the required packages.
These options/arguments can now be set all at once
(with caution as these changes will affect how your
script will be run) with options(SpaDES.project.fast = TRUE) or in the options argument.
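A short sketch of setting this option for the session in base R. Saving the previous value, as done here, is an optional convenience and not something the package requires.

```r
# Turn on the "fast" settings for this session, keeping the previous value.
old <- options(SpaDES.project.fast = TRUE)

# Confirm the option is now set.
getOption("SpaDES.project.fast")

# To undo the change later in the session:
# options(old)
```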
The overarching objectives for these functions are:
To prepare what is needed for simInit.
To help a user eliminate virtually all assignments to the .GlobalEnv,
as these create and encourage spaghetti code that becomes unreproducible
as the project increases in complexity.
To be very simple for beginners, but powerful enough to expand to almost any needs of arbitrarily complex projects, using the same structure.
To deal with the complexities of R package installation and loading when working with modules that may have been created by many users.
To create a common SpaDES project structure, allowing easy transition from one project to another, regardless of complexity.
Throughout these functions, efforts have been made to implement sequential evaluation,
within files and within lists. This means that a user can use the values from an
upstream element in the list. For example, the following where projectPath is
part of the list that will be assigned to the paths argument and it is then
used in the subsequent list element is valid:
setupPaths(paths = list(projectPath = "here",
                        modulePath = file.path(paths[["projectPath"]], "modules")))
Because of such sequential evaluation, paths, options, and params files
can be sequential lists that impose a hierarchy specified
by the order. For example, a user can first create a list of default options,
then several lists of user-desired options behind an if (user("emcintir"))
block that add new or override existing elements, followed by machine specific
values, such as paths.
setupOptions(
maxMemory <- 5e+9 # if (grepl("LandWeb", runName)) 5e+12 else 5e+9
# Example -- Use any arbitrary object that can be passed in the `...` of `setupOptions`
# or `setupProject`
if (.mode == "development") {
list(test = 2)
}
if (machine("A127")) {
list(test = 3)
}
)
Arguments that are not the named arguments (i.e., the ones passed in ...)
are evaluated in the order they are written. Subsequent arguments can use the
previous arguments. If "dot" arguments are declared before the first
standard arguments (the "formals") of the function, then they will be evaluated
prior to the formals. If they are after a single standard argument (i.e., not
necessarily after all the named arguments), then they will be evaluated after
all standard arguments. The exception to this is params, which will be evaluated
like the ... arguments, i.e., in order.
The arguments, paths, options, and params, can all
understand lists of named values, character vectors, or a mixture by using a list where
named elements are values and unnamed elements are character strings/vectors. Any unnamed
character string/vector will be treated as a file path. If that file path has an @ symbol,
it will be assumed to be a file that exists on a GitHub repository in https://github.com.
So a user can pass values, or pointers to remote and/or local paths that themselves have values.
The following will set an option as declared, plus read the local file (with relative path), plus download and read the cloud-hosted file.
setupProject(
options = list(reproducible.useTerra = TRUE,
"inst/options.R",
"PredictiveEcology/SpaDES.project@development/inst/options.R")
)
This approach allows for an organic growth of complexity, e.g., a user begins with only named lists of values, but then as the number of values increases, it may be helpful to put some in an external file.
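The rule for telling values from file pointers can be illustrated with a small base R sketch. The helper describeEntries is hypothetical and only mirrors the description above (named elements are values, unnamed character elements are file paths, and an @ marks a GitHub-hosted file); it does not read or download anything.

```r
# Hypothetical helper (not part of SpaDES.project): label each list element
# according to the rule described above.
describeEntries <- function(x) {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  vapply(seq_along(x), function(i) {
    el <- x[[i]]
    if (nzchar(nms[i])) {
      "named value"
    } else if (is.character(el) && any(grepl("@", el, fixed = TRUE))) {
      "file on GitHub"
    } else if (is.character(el)) {
      "local file"
    } else {
      "other"
    }
  }, character(1))
}

opts <- list(
  reproducible.useTerra = TRUE,
  "inst/options.R",
  "PredictiveEcology/SpaDES.project@development/inst/options.R"
)
describeEntries(opts)
```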
NOTE: if the GitHub repository is private the user must configure their GitHub
token by setting the GITHUB_PAT environment variable – unfortunately, the usethis
approach to setting the token will not work at this moment.
paths, options, params
If paths, options, and/or params are a character string
or character vector (or part of an unnamed list element) the string(s)
will be interpreted as files to parse. These files should contain R code that
specifies named lists, where the names are one or more paths, options,
or are module names, each with a named list of parameters for that named module.
This last named list for params follows the convention used for the params argument in
simInit(..., params = ).
These files can use paths, times, plus any previous list in the sequence of
params or options specified. Any functions that are used must be available,
e.g., prefixed Require::normPath if the package has not been loaded (as recommended).
If passing a file to options, it should not set options() explicitly;
only create named lists. This enables options checking/validating
to occur within setupOptions and setupParams. The simplest case would be a file with this:
opts <- list(reproducible.destinationPath = "~/destPath").
All named lists will be parsed into their own environment, and then will be
sequentially evaluated (i.e., subsequent lists will have access to previous lists),
with each named elements setting or replacing the previously named element of the same name,
creating a single list. This final list will be assigned to, e.g., options() inside setupOptions.
Because each list is parsed separately, they do not need to be assigned objects;
if they are, the object name can be any name, even if similar to another object's name
used to build the same argument's (i.e., paths, params, options) final list.
Hence, in a file passed to options, instead of incrementing the list as:
a <- list(optA = 1)
b <- append(a, list(optB = 2))
c <- append(b, list(optC = 2.5))
d <- append(c, list(optD = 3))
one can do:
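A minimal sketch of the shorter form, inferred from the description above (separately parsed lists, arbitrary or repeated object names, assignment optional). The file contents are held in a string so the sketch is self-contained, and the parse-and-merge loop is only an illustration of the idea in base R, not the package's own code.

```r
# Contents of a hypothetical options file: each top-level expression is its
# own list, and object names may repeat or be omitted.
optionsFileText <- '
a <- list(optA = 1)
a <- list(optB = 2)
b <- list(optC = 2.5)
list(optD = 3)
'

# Illustration only: evaluate each expression in turn and fold the resulting
# named lists into one, later names replacing earlier ones.
exprs <- parse(text = optionsFileText)
env <- new.env()
merged <- list()
for (i in seq_along(exprs)) {
  val <- eval(exprs[[i]], envir = env)
  if (is.list(val)) merged <- utils::modifyList(merged, val)
}
str(merged)
```

All four options end up in the single merged list even though the name a was reused and the last list was never assigned.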
NOTE: only atomics (i.e., character, numeric, etc.), named lists, or either of these that are protected by 1 level of "if" are parsed. This will not work, therefore, for other side-effect elements, like authenticating with a cloud service.
Several helper functions exist within SpaDES.project that may be useful, such
as user(...) and machine(...).
To allow for batch submission, a user can specify argument = value even if value
is missing. This type of specification will not work in normal parsing of arguments,
but it is designed to work here. In the next example, .mode = .mode can be specified,
but if R cannot find .mode for the right hand side, it will just skip with no error.
Thus a user can source a script with the following line from batch script where .mode
is specified. When running this line without that batch script specification, then this
will assign no value to .mode. We include .nodes which shows an example of
passing a value that does exist. The non-existent .mode will be returned in the out,
but as an unevaluated, captured list element.
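The behaviour described here can be imitated in base R to show the idea. The function captureDots below is a hypothetical stand-in, not the mechanism used inside setupProject: it evaluates each argument if it can and otherwise keeps the unevaluated expression.

```r
# Hypothetical stand-in (not part of SpaDES.project): evaluate each `...`
# argument where possible; keep the unevaluated expression where it fails.
captureDots <- function(..., envir = parent.frame()) {
  force(envir)
  exprs <- as.list(substitute(list(...)))[-1]
  lapply(exprs, function(e) {
    tryCatch(eval(e, envir = envir), error = function(err) e)
  })
}

.nodes <- 2 # exists, e.g., set by a batch script
# .mode is deliberately not defined in this session

out <- captureDots(.mode = .mode, .nodes = .nodes)
str(out)
```

In this sketch out$.nodes holds the value 2, while out$.mode holds the unevaluated symbol .mode, mirroring the "captured list element" described above.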
verbose is passed through to the inner setup* helpers. Notably, verbose >= 2
prints the modules' reqdPkgs grouped by module, and verbose >= 3 additionally
prints the dput() of the exact package vector passed to Require::Require (see
setupPackages()).
Inner setup* helpers (each has its own help page; see setup_family
for a one-page overview):
setupPaths(), setupFunctions(), setupSideEffects(),
setupOptions(), setupModules(), setupPackages(),
setupParams(), setupGitIgnore(), setupStudyArea(), setupFiles().
teardownProject() reverses setupProject() and restores the prior
.libPaths() (kept on the output as out$paths$.previousLibPaths).
Also, helpful functions such as user(), machine(), node().