SpaDES
is a collection of R packages designed to provide
a flexible framework for ecological analysis, simulations, forecasting,
and other ecological tasks. At its core, a SpaDES
analysis
revolves around the concept of a “module” - a self-contained unit of
code that performs a specific function and includes both
machine-readable and human-readable metadata. In a typical
SpaDES
project, users often employ multiple modules,
whether they’ve authored them or they’ve been contributed by others.
This scalability introduces several challenges, including managing file
paths (e.g., for data inputs, outputs, figures, modules, and packages),
handling package dependencies, and coordinating the development of
multiple modules simultaneously. While it is possible to develop a
SpaDES
project using methods familiar to the user, the
SpaDES.project
package streamlines this process,
facilitating a clean, reproducible and reusable project setup for any
project. Default settings ensure that project files and packages are
kept isolated, preventing potential conflicts with other projects on the
same computer.
In the following section, we present a range of examples showcasing
the utilization of the setupProject()
function. To ensure a
smooth understanding of these examples, we will begin by discussing the
necessary prerequisites.
To get started, users should have already installed R, RStudio, and
possibly other system-level dependencies.
See vignette here
Now that we’ve established the prerequisites, let’s explore various
examples of how to use the setupProject()
function, ranging
from simple to moderately complex scenarios. We will start by installing
the package.
SpaDES.project
if (!require("SpaDES.project")) {
{install.packages("SpaDES.project", repos = c("predictiveecology.r-universe.dev", getOption("repos")))
require("SpaDES.project")}
}
SpaDES.project
’s Main Function:
setupProject()
For more details, see ?setupProject
, especially for
argument-specific descriptions.
The primary five objectives of this function are:
1. Preparation for
SpaDES.core::simInit
: This functions is designed
to set the stage for a smooth transition into
SpaDES.core::simInit
, i.e., the initiation of a collection
of SpaDES modules. After a out <- setupProject()
call,
the return can be passed directly to
do.call(SpaDES.core::simInitAndSpades, out)
.
2. Simplicity for Beginners, Versatility for Experts: The functions are crafted to be approachable for beginners, offering simplicity, while at the same time, providing the power and flexibility needed to address the demands of highly intricate projects, all within the same structural framework.
3. Handling Package Complexity: These functions address the complexities associated with R package installation and loading, especially when working with modules created by various users.
4. Creating a Standardized Project Structure: They facilitate the establishment of a consistent SpaDES project structure. This uniformity eases the transition from one project to another, regardless of project complexity, promoting a seamless and efficient workflow.
5. Minimizing .GlobalEnv Assignments: An important goal is to encourage best practices by reducing the need for assignments to the .GlobalEnv. This practice fosters clean, maintainable code that remains reproducible even as project complexity grows.
This function performs a variety of tasks to set up a SpaDES project by assigning values to these arguments:
paths
- A standardized set of paths.modules
- Either downloads them from a cloud repository
or user identifies local ones.packages
- To install and/or load (using the
require
argument) packages not already identified in the
metadata of the modules
.params
- For setting parameter values for any of the
modules
.options
- To configure basic R
options
.More specifically, this function orchestrates a series of operations
in the following order: setupPaths
,
setupModules
, setupPackages
,
setupOptions
, setupSideEffects
,
setupParams
, and setupGitIgnore
. This sequence
accomplishes several tasks, including the creation of folder structures,
installation of missing packages listed in either the
packages
or require
arguments, loading of
packages (limited to those specified in the require
argument), configuration of options, and the download or validation of
modules. Additionally, it returns elements that can be directly passed
to simInit
or simInitAndSpades
, specifically
modules, params, paths, times, and any named elements passed to …
(Dots
). If desired, this function can also modify the
.Rprofile
file for the project, ensuring that each time the
project is opened, it adopts specific .libPaths()
.
This sequence allows users to leverage settings (i.e., objects) that are established before others. More on this below (section What Makes this Function so Special).
For user convenience, there are several auxiliary elements, as
described in the help file of the function:
?setupProject()
.
The output from setupProject()
is a list containing
several named elements. These elements are designed to be passed
directly to simInit
to initialize a simList
and, potentially, a spades
call.
In addition to its capability to specify paths, options, and parameters for the entire simulation or workflow, regardless of the modules used, this function offers three distinct advantages that enhance its convenience, detailed below:
1. Versatile Argument Types: Arguments can be provided as both values and/or URLs, allowing flexibility in how information is sourced and integrated into the setup.
2. Sequential Argument Processing: Arguments are sourced in a sequential manner during function evaluation. This sequential processing enables a logical flow and great flexibility in setting up the project.
3. Handling Missing Values: The function accommodates missing values gracefully, making it adaptable to various project scenarios where certain parameters or information may not be yet available.
The first important advantage of setupProject()
is the
type of inputs for arguments it can take, which can be lists of, and/or
paths. For most arguments, the values can be either named lists or
character vectors consisting of strings ending in .R
,
representing urls
to cloud or local R scripts to be
sourced. More specifically, the function arguments paths
,
options
, and params
, can all understand lists
of named values, character vectors, or a mixture by using a list where
named elements are values and unnamed elements are character
strings/vectors. Any unnamed character string/vector will be treated
as a file path. If that file path has an @ symbol, it will be
assumed to be a file that exists on https://github.com. So a user can pass values, or
pointers to remote and/or local paths that, themselves, have values.
The following will set an option as declared (i.e.,
reproducible.useTerra = TRUE
), plus read the local
inst/options.R
file (with relative path), plus download and
read the cloud-hosted
PredictiveEcology/SpaDES.project@transition/inst/options.R
file.
out <- setupProject(
options = list(reproducible.useTerra = TRUE,
"inst/options.R",
"PredictiveEcology/SpaDES.project@transition/inst/options.R"
)
)
attr(out, "projectOptions")
#> optionName
#> 1: reproducible.useTerra
#> 2: fftempdir
#> 3: future.globals.maxSize
#> 4: LandR.assertions
#> 5: LandR.verbose
#> 6: map.dataPath
#> 7: map.overwrite
#> 8: map.tilePath
#> 9: map.maxNumCores
#> 10: rasterMaxMemory
#> 11: reproducible.destinationPath
#> 12: reproducible.useGDAL
#> 13: reproducible.useNewDigestAlgorithm
#> 14: reproducible.inputPaths
#> 15: reproducible.devMode
#> 16: reproducible.overwrite
#> 17: reproducible.useCloud
#> 18: spades.inputPath
#> 19: spades.moduleCodeChecks
#> 20: spades.recoveryMode
#> 21: spades.useRequire
#> optionName
#> newValue
#> 1: TRUE
#> 2: C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project
#> 3: 1048576000
#> 4: FALSE
#> 5: 1
#> 6: C:/Eliot/GitHub/SpaDES.project/inputs
#> 7: FALSE
#> 8: <call[3]>
#> 9: 2
#> 10: 5e+09
#> 11: C:/Eliot/GitHub/SpaDES.project/inputs
#> 12: FALSE
#> 13: TRUE
#> 14: C:/Eliot/OneDrive - NRCan RNCan/Documents/data
#> 15: TRUE
#> 16: TRUE
#> 17: TRUE
#> 18: C:/Eliot/GitHub/SpaDES.project/outputs
#> 19: FALSE
#> 20: FALSE
#> 21: FALSE
#> newValue
#> oldValue
#> 1:
#> 2:
#> 3:
#> 4:
#> 5:
#> 6:
#> 7:
#> 8:
#> 9:
#> 10:
#> 11:
#> 12:
#> 13:
#> 14:
#> 15:
#> 16: FALSE
#> 17: FALSE
#> 18: C:\\Users\\emcintir\\AppData\\Local\\Temp\\Rtmp63PEH3/SpaDES/inputs
#> 19: <list[5]>
#> 20: 1
#> 21: TRUE
#> oldValue
This approach allows for an organic growth of complexity, e.g., a user begins with only named lists of values, but then as the number of values increases, it may be helpful for clarity to put some in an external file.
The second important advantage of setupProject()
,
briefly mentioned above, is that argument values within each of the
function arguments are sourced in a sequential manner. This implies that
a prior value can be leveraged by a subsequent one within the same
function call. For example, users can set paths, and subsequently,
employ the paths list to configure options that can update or modify
paths as needed. Similarly, users can set times and apply the times list
for specific entries in the parameters (params
)
configuration. For instance, consider the following example, where
projectPath
is included as part of the list assigned to the
paths
argument. This value is then utilized in the
subsequent list element:
# 'modulePaths' is defined based on the previous argument 'projectPath' in the same function call.
setupPaths(paths = list(projectPath = "here",
modulePath = file.path(paths[["projectPath"]], "modules")))
Because of such sequential evaluation, the content with R scripts can
be sequential lists, and also the arguments can themselves be
sequential. This creates a hierarchy specified by the order. In this
fairly rich example, a user can first create a set of default options in
a file, then custom elements, then a list of more changes protected by
an if
(user("userNamer")). In the following example,
setupOptionswill first parse the file (
inst/options.R), maintaining an active named list of options that will be set (lets call that
opts), then it will
modifyList(opts,
list(maxMemory =
6e+10)), i.e., modify the list with a new value, then if the user is "emcintir", it will further modify the
optslist. Finally, because this is
setupOptions, it will call
options(opts)`
with the new list values:
a <- setupOptions(options = list(
"inst/options.R", # many items set because this is a file; has "future.globals.maxSize" = 5.24288e8
future.globals.maxSize = 4e8, # this value overrides first, because it is 2nd
if(user("emcintir")) # an "if" statement is OK --> but must use list after
list(future.globals.maxSize = 6e8) # conditional on user, value is overridden again
))
To facilitate batch submission, users can specify arguments in the
form of argument = value
, even when value
is
missing (i.e., not previously defined in the Environment). This
unconventional approach may not typically work with standard argument
parsing, but it has been specifically designed to function within this
context. For instance, in the following example, you can specify
.mode = .mode
. If R cannot find a valid value for
.mode
on the right-hand side, it will gracefully skip
without raising an error. This feature enables users to source a script
with the following line from a batch script where .mode
is
specified. When running this line separately, without the batch script
specification, .mode
will have no value assigned to it.
Additionally, we include .nodes
as an example of passing a
value that does exist. In the output, the non-existent
.mode
will be returned as an unevaluated, captured list
element:
.nodes <- 2
out <- setupProject(.mode = .mode,
.nodes = .nodes)
out[c(".mode", ".nodes")]
#> $.mode
#> .mode
#>
#> $.nodes
#> [1] 2
setupProject()
and
SpaDES.core::simInit
Call
The output from setupProject()
is a list containing
several named elements. These elements are designed to be passed
directly to simInit
to initialize a simList
and, potentially, a spades
call. Every
setupProject
call can be used with do.call
to
pass arguments to a simInit
or
simInitAndSpades
call.
sim <- setupProject() |> do.call(what = SpaDES.core::simInitAndSpades)
## or
out <- setupProject()
sim = do.call(SpaDES.core::simInitAndSpades, out)
In R, checkout the help files for other functions such as:
?setupPaths
?setupOptions
?setupPackages
?setupModules
?setupGitIgnore
Other helpful functions that are worse going though are
?user
, ?machine
, and ?node
.
setupProject()
This will retrieve the module as specified, and place it in
modulePath
, which is unspecified here, thus taking the
default. This specification will not re-download the file if it exists
locally, with a message indicating this each time. This specification
will also automatically install all missing packages that are in
reqdPkgs
metadata of the module(s). NOTE: we can omit this
package installation step by specifying package = NULL
.
out <- setupProject(
modules = "PredictiveEcology/Biomass_borealDataPrep@development",
packages = NULL # for this example, skip installation of packages; normally don't do this
)
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
All paths are relative to the projectPath
, except for 2
specific exceptions: packagePath
is not placed within the
projectPath
and all “scratch” i.e., “temporary” folders,
are place in a subfolder of tempdir()
.
packagePath
is placed in the
tools::R_user_dir(name)
folder, which is project specific,
but does not clutter the projectPath
with hundreds of
folders that can interfere with many user actions especially in Rstudio.
For example, “Find in Files” will take much longer (especially on
Windows) if all the package files are in the projectPath
(there are ways around this, but they require custom setups).
In this example, modulePath
will be set to
file.path(packagePath, modulePath)
; since
packagePath
is unspecified, setupProject
assumes the current directory is the desired root of the project.
library(SpaDES.project)
setupProject(paths = list(modulePath = "myModules"))
#> $modules
#> named character(0)
#>
#> $paths
#> $paths$cachePath
#> [1] "C:/Eliot/GitHub/SpaDES.project/cache"
#>
#> $paths$inputPath
#> [1] "C:/Eliot/GitHub/SpaDES.project/inputs"
#>
#> $paths$modulePath
#> [1] "C:/Eliot/GitHub/SpaDES.project/myModules"
#>
#> $paths$outputPath
#> [1] "C:/Eliot/GitHub/SpaDES.project/outputs"
#>
#> $paths$rasterPath
#> [1] "C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project/raster"
#>
#> $paths$scratchPath
#> [1] "C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project"
#>
#> $paths$terraPath
#> [1] "C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project/terra"
#>
#>
#> $params
#> list()
#>
#> $times
#> $times$start
#> [1] 0
#>
#> $times$end
#> [1] 1
One compelling reason to use setupProject()
is the
availability of numerous default values that can be leveraged, leading
to cleaner and more concise code in many instances. This code sets all
default values for paths
.
library(SpaDES.project)
setupProject()
#> $modules
#> named character(0)
#>
#> $paths
#> $paths$cachePath
#> [1] "C:/Eliot/GitHub/SpaDES.project/cache"
#>
#> $paths$inputPath
#> [1] "C:/Eliot/GitHub/SpaDES.project/inputs"
#>
#> $paths$modulePath
#> [1] "C:/Eliot/GitHub/SpaDES.project/modules"
#>
#> $paths$outputPath
#> [1] "C:/Eliot/GitHub/SpaDES.project/outputs"
#>
#> $paths$rasterPath
#> [1] "C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project/raster"
#>
#> $paths$scratchPath
#> [1] "C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project"
#>
#> $paths$terraPath
#> [1] "C:/Users/emcintir/AppData/Local/Temp/Rtmp63PEH3/SpaDES.project/terra"
#>
#>
#> $params
#> list()
#>
#> $times
#> $times$start
#> [1] 0
#>
#> $times$end
#> [1] 1
The argument values within each of the arguments are sourced sequentially. This means that a previous value can be utilized by a subsequent one.
# Use the newValue in `secondValue`
out <- setupProject(
newValue = 1,
secondValue = newValue + 1)
out[c("newValue", "secondValue")]
#> $newValue
#> [1] 1
#>
#> $secondValue
#> [1] 2
The .GlobalEnv
in R
is special because it
is in the search path of any user-created function. This is very
convenient, but also provides the potential for many problems. For
example, a variable could be misspelled in a code chunck, but the code
doesn’t fail because the .GlobalEnv
has the same variable,
spelled correctly.
# Gets the wrong version of the module
githubRepo = "PredictiveEcology/Biomass_core@development" # incorrect one -- used in error
out <- setupProject(
gitHubRepo = "PredictiveEcology/Biomass_core@main", # The correct one --> not used
modules = githubRepo,
packages = NULL # no packages for this demonstration
)
out$modules # shows @development, which is unexpected
#> PredictiveEcology/Biomass_core@development
#> "Biomass_core"
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
We can start to increase the complexity of the
setupProject
call gradually. Once we have the module.
out <- setupProject(
options = list(reproducible.useTerra = TRUE),
params = list(Biomass_borealDataPrep = list(.plots = "screen")),
paths = list(modulePath = "m",
projectPath = "."), # within vignette, project dir cannot be changed; user likely will
modules = "PredictiveEcology/Biomass_borealDataPrep@development"
)
#> package 'pemisc' successfully unpacked and MD5 sums checked
#> package 'pkgbuild' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\emcintir\AppData\Local\Temp\Rtmp63PEH3\downloaded_packages
#> Warning: package 'SpaDES.project' is in use and will not be installed
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
For most arguments, the values can be either named lists or character
vectors consisting of strings ending in .R
, representing
urls
to cloud or local R scripts to be sourced. This
flexibility enables users to begin with simple argument values and
gradually introduce greater complexity as a project expands in
scale.
In this first example, options
is set by parsing a file
that is at a remote location. This file will be downloaded first, if it
does not exist locally, then parsed from the local location.
out <- setupProject(
options = c(
"PredictiveEcology/SpaDES.project@transition/inst/options.R"
),
params = list(
Biomass_borealDataPrep = list(.plots = "screen")
),
paths = list(modulePath = "m",
projectPath = "."), # within vignette, project dir cannot be changed; user likely will
modules = "PredictiveEcology/Biomass_borealDataPrep@development"
)
#> package 'pkgbuild' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\emcintir\AppData\Local\Temp\Rtmp63PEH3\downloaded_packages
#> Warning: package 'SpaDES.project' is in use and will not be installed
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
out <- setupProject(
name = "example_5", # puts in a folder with this name
modules = "PredictiveEcology/Biomass_borealDataPrep@development",
sideEffects = "PredictiveEcology/SpaDES.project@transition/inst/sideEffects.R",
# if mode and studyAreaName are not available in the .GlobalEnv, then will use these
defaultDots = list(mode = "development",
studyAreaName = "MB"),
mode = mode, # may not exist in the .GlobalEnv, so `setup*` will use the defaultDots above
studyAreaName = studyAreaName#, # same as previous argument.
# params = list("Biomass_borealDataPrep" = list(.useCache = mode))
)
#> package 'pkgbuild' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\emcintir\AppData\Local\Temp\Rtmp63PEH3\downloaded_packages
#> Warning: package 'SpaDES.project' is in use and will not be installed
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
studyAreaName <- "AB"
out <- setupProject(
paths = list(projectPath = "example_7"),
modules = "PredictiveEcology/Biomass_borealDataPrep@development",
defaultDots = list(mode = "development",
studyAreaName = "MB"),
mode = "development",
studyAreaName = studyAreaName # <----- pass it here, naming it
)
#> package 'pkgbuild' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\emcintir\AppData\Local\Temp\Rtmp63PEH3\downloaded_packages
#> Warning: package 'SpaDES.project' is in use and will not be installed
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
There are several ways to specifying an argument, as we have seen. We can also mix and match these ways:
out <- setupProject(
options = list(
reproducible.useTerra = TRUE, # direct argument # sets one option
"PredictiveEcology/SpaDES.project@transition/inst/options.R", # remote file -- sets many options
system.file("authentication.R", package = "SpaDES.project") # local file -- sets gargle options
)
)
studyArea
argument
?setupStudyArea
for more details. A user can choose to
use the internal setupStudyArea
function, which will use
the gadm
global database. Setting the
studyArea
argument to a list
will trigger the
use of setupStudyArea
.
In this example, we will select an area in northwestern Alberta and
the adjacent northeastern BC, Canada. We use the internal matching,
because we aren’t sure of the exact names: for the top level
(NAME_1
), we use either “Al” or “Brit”, separated by the
“or” symbol, |
. For the second level, we do the same, but
specify using the names in that column. If a user does not know which
values exist, they can not specify NAME_2
, then visualize
the resulting map and click on values, such as:
out <- setupStudyArea(list("Al|Brit", level = 2))
if (interactive()) {
library(quickPlot)
dev() # open a non-RStudio window; can't use RStudio for `clickValues` below
Plot(out)
val <- clickValues()
out <- setupStudyArea(list("Al|Brit", level = 2, NAME_2 = val$NAME_2))
Plot(out, new = TRUE) # need "new = TRUE" to clear previous map
}
# Can pick multiple parts with partial matching
out <- setupStudyArea(list("Al|Brit", level = 2, NAME_2 = "19|18|17|Peace"))
epsg
We can get the studyArea
in a different projection if we
specify the epsg
code.
out <- setupProject(
name = "example_10",
studyArea = list("Al|Brit|Sas", level = 2, epsg = "3005")
)
require
and objects
arguments
Using the require
argument, calls
Require::Require()
, which installs and loads
packages i.e., equivalent to install.packages(...)
and
require(...)
on the named packages). objects
accepts a list of arbitrary objects that will be returned at the end of
this function call. Any code that needs executing will run, and they
will have access to all the packages
and
require
packages, plus any other arguments (e.g.,
paths
or other named arguments in the ...
)
that preceded it in the call.
out <- setupProject(
paths = list(projectPath = "example_11"), # will deduce name of project from projectPath
standAlone = TRUE,
require = c(
"PredictiveEcology/reproducible@development (>= 1.2.16.9017)",
"PredictiveEcology/SpaDES.core@development (>= 1.1.0.9001)"
),
modules = c(
"PredictiveEcology/Biomass_speciesData@master",
"PredictiveEcology/Biomass_borealDataPrep@development",
"PredictiveEcology/Biomass_core@master",
"PredictiveEcology/Biomass_validationKNN@master",
"PredictiveEcology/Biomass_speciesParameters@development"
),
objects = list(
studyAreaLarge = terra::vect(
terra::ext(-598722, -557858,
776827, 837385),
crs = terra::crs("epsg:3978")
),
studyArea = terra::vect(
terra::ext(-598722, -578177,
779252, 809573),
crs = terra::crs("epsg:3978")
)
)
)
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
...
Sometimes it is just as easy to pass objects directly to named
arguments, skipping the use of the objects
argument. These
will be returned at the top level of the list, instead of within the
objects
list element. Recent versions of
SpaDES.core
allow ...
to be passed to
simInit
, so it can handle either using objects
(the original way to pass arguments to simInit
) or just
arbitrarily named arguments.
out <- setupProject(
paths = list(projectPath = "example_12"), # will deduce name of project from projectPath
standAlone = TRUE,
require = c(
"PredictiveEcology/reproducible@development (>= 1.2.16.9017)",
"PredictiveEcology/SpaDES.core@development (>= 1.1.0.9001)"
),
modules = c(
"PredictiveEcology/Biomass_speciesData@master",
"PredictiveEcology/Biomass_borealDataPrep@development",
"PredictiveEcology/Biomass_core@master",
"PredictiveEcology/Biomass_validationKNN@master",
"PredictiveEcology/Biomass_speciesParameters@development"
),
studyAreaLarge = terra::vect(
terra::ext(-598722, -557858, 776827, 837385),
crs = terra::crs("epsg:3978")
),
studyArea = terra::vect(
terra::ext(-598722, -578177, 779252, 809573),
crs = terra::crs("epsg:3978")
)
)
# Initiate a simInit and spades call:
# do.call(SpaDES.core::simInitAndSpades, out)
out <- setupProject(
name = "example_13",
packages = "terra",
updateRprofile = TRUE
)