Sets up a new or existing SpaDES project

setupProject calls a sequence of functions in this order: setupOptions (first time), setupPaths, setupModules, setupPackages, setupSideEffects, setupOptions (second time), setupParams, setupGitIgnore, and setupRestart.

This sequence will create folder structures, install missing packages from those listed in the packages or require arguments or in the modules' reqdPkgs fields, load packages (only those in the require argument), set options, and download or confirm the existence of modules. It will also return elements that can be passed directly to simInit or simInitAndSpades, specifically, modules, params, paths, times, and any named elements passed in the ... argument. If desired, this function will also change the .Rprofile file for this project so that every time the project is opened, it has a specific .libPaths().

There are a number of convenience elements, described in the Convenience elements section below. Because of this sequence, users can take advantage of settings (i.e., objects) that are created before others. For example, users can set paths, then use the paths list to set options that can update or change paths; or set times, then use the times list for certain entries in params.
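The ordering idea can be pictured with a small base-R sketch. The function `sketchSetup()` below is hypothetical (it is not part of SpaDES.project) and assumes the simplest case: `times` is evaluated first, and the unevaluated `params` expression is then evaluated in an environment where `times` already exists.

```r
# Hypothetical sketch (not SpaDES.project code): evaluate arguments in a fixed
# order so that later arguments can use objects created by earlier ones.
sketchSetup <- function(times, params) {
  # Step 1: evaluate `times` first.
  force(times)
  # Step 2: capture the `params` expression without evaluating it, then
  # evaluate it in an environment that already contains `times`.
  paramsExpr <- substitute(params)
  env <- new.env(parent = parent.frame())
  assign("times", times, envir = env)
  params <- eval(paramsExpr, envir = env)
  list(times = times, params = params)
}

out <- sketchSetup(
  times = list(start = 2011, end = 2031),
  params = list(mod = list(startTime = times$start))
)
str(out$params)
```

Capturing `params` with substitute() is what lets `times$start` resolve even though no `times` object exists in the calling environment.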

setupProject(
  name,
  paths,
  modules,
  packages,
  times,
  options,
  params,
  sideEffects,
  config,
  require = NULL,
  studyArea = NULL,
  Restart = getOption("SpaDES.project.Restart", FALSE),
  useGit = getOption("SpaDES.project.useGit", FALSE),
  setLinuxBinaryRepo = TRUE,
  standAlone = TRUE,
  libPaths = NULL,
  updateRprofile = getOption("Require.updateRprofile", FALSE),
  overwrite = FALSE,
  verbose = getOption("Require.verbose", 1L),
  defaultDots,
  dots,
  ...
)

Arguments

name: Optional. If supplied, the name of the project. If not supplied, an attempt will be made to extract the name from the paths[["projectPath"]]. If this is a GitHub project, then it should indicate the full GitHub repository and branch name, e.g., "PredictiveEcology/WBI_forecasts@ChubatyPubNum12"
paths: a list with named elements, specifically, modulePath, projectPath, packagePath, and all others that are in SpaDES.core::setPaths() (i.e., inputPath, outputPath, scratchPath, cachePath, rasterTmpDir). Each of these has a sensible default, which will be overridden by any user-supplied values. See setup.
modules: a character vector of modules to pass to getModule. Each should be one of: a simple name (e.g., fireSense), which will be searched for locally in paths[["modulePath"]]; a GitHub repository with branch (GitHubAccount/Repo@branch, e.g., "PredictiveEcology/Biomass_core@development"); or one or more .R file(s) (local or GitHub; the file extension is required) to parse, which will produce a character vector assigned to the name "modules". If the entire project is a git repository, then it will not try to re-get these modules; instead, it will rely on the user managing their git status outside of this function. See setup.
packages: Optional. A vector of packages that must exist in the libPaths. This will be passed to Require::Install, i.e., these will be installed, but not attached to the search path. See also the require argument. To skip package installation entirely (without assessing modules), set packages = NULL.
times: Optional. This will be returned if supplied; if supplied, its values can be used in other arguments, e.g., params = list(mod = list(startTime = times$start)). See help for SpaDES.core::simInit.
options: Optional. Either a named list to be passed to options or a character vector indicating one or more file(s) to source, in the order provided. These will be parsed locally (not in the .GlobalEnv), so they will not create globally accessible objects. NOTE: options is run twice within setupProject, once before setupPaths and once after setupPackages. This occurs because many packages use options for their behaviour (they need them set before, e.g., Require::require is run), but many packages also change options at startup. See details. See setup.
params: Optional. Similar to options, however, this named list will be returned, i.e., there are no side effects. See setup.
sideEffects: Optional. This can be an expression, one or more file names, or a code chunk surrounded by {...}. If a non-text file name is specified (currently, anything other than .txt or .R), these files will simply be downloaded, using their relative path as specified in the GitHub notation. They will be downloaded or accessed locally at that relative path. If these file names represent scripts (.txt or .R), they will be parsed and evaluated, but nothing is returned (i.e., any assigned objects are not returned). This is intended to be used for operations like cloud authentication or configuration functions that are run for their side effects only.
config: Still experimental linkage to the SpaDES.config package. Currently not working.
require: Optional. A character vector of packages to install and attach (with Require::Require). These will be installed and attached at the start of setupProject so that a user can use them during setupProject. See setup.
studyArea: Optional. If a list, it will be passed to geodata::gadm. To specify a country other than the default "CAN", the list must have a named element, "country". All other named elements will be passed to gadm. Two additional named elements can be passed for convenience: subregion = "...", which will be matched (with grep) against the NAME_1 column, and epsg = "...", so a user can pass an epsg.io code to reproject the studyArea. See examples.
Restart: If the projectPath is not the current path, and the session is in RStudio and interactive, it will create an RStudio Project file (and .Rproj.user folder) and restart a new RStudio session with that new project and with a root path (i.e., working directory) set to projectPath. Default is FALSE, and no RStudio Project is created.
useGit: A logical. If TRUE, it will use git clone and git checkout to get each module and change its branch, according to its specification in modules. Otherwise, it will download modules with getModule. NOTE: CREATING A GIT REPOSITORY AT THE PROJECT LEVEL AND SETTING MODULES AS GIT SUBMODULES IS NOT YET IMPLEMENTED. IT IS FINE IF THE PROJECT HAS BEEN MANUALLY SET UP TO BE A GIT REPOSITORY WITH SUBMODULES: THIS FUNCTION WILL ONLY EVALUATE PATHS. This can be set with options(SpaDES.project.useGit = ...).
setLinuxBinaryRepo: Logical. Should the binary RStudio Package Manager repository be used on Linux? (Ignored on Windows.)
standAlone: A logical. Passed to Require::standAlone. This keeps all packages installed in a project-level library, if TRUE. Default is TRUE.
libPaths: Deprecated. Use paths = list(packagePath = ...).
updateRprofile: Logical. Should the paths$packagePath be set in the .Rprofile file for this project? Note: if paths$packagePath is within the tempdir(), then there will be a warning, indicating this won't persist. If the user is using RStudio and the paths$projectPath is not the root of the current RStudio project, then a warning will be given, indicating the .Rprofile may not be read upon restart.
overwrite: Logical vector or character vector; however, only getModule will respond to a vector of values. If length-one TRUE, then all files that were previously downloaded will be overwritten throughout the sequence of setupProject. If a logical or character vector, it will be passed to getModule: only the named modules, or the modules selected by the logical vector, will be overwritten. NOTE: if a vector, no other file specified anywhere in setupProject will be overwritten except the named module(s), because only setupModules currently responds to a vector. For fine-grained control, a user can manually delete a file, then rerun.
verbose: Numeric or logical indicating how verbose the function should be. If -1 or -2, then as little verbosity as possible. If 0 or FALSE, minimal output; if 1 or TRUE, more output; 2, even more. NOTE: in Require functions, when verbose >= 2, the return object will have an attribute, attr(.., "Require"), which has detailed information about the installation processes.
defaultDots: A named list of any arbitrary R objects. These can be supplied to give default values to objects that are otherwise passed in with the ..., i.e., not specifically named for these setup* functions. If named objects are supplied as top-level arguments, then the defaultDots will be overridden. This can be particularly useful if the arguments passed to ... do not always exist, but rely on an external process (e.g., batch processing) to optionally fill them. See examples.
dots: Any other named objects, passed as a list, that a user might want for other elements.
...: further named arguments that act like objects, but provide a different way to specify them. These can be anything. The general use case is to create the objects that would be passed to SpaDES.core::simInit or SpaDES.core::simInitAndSpades (e.g., studyAreaName or objects), or additional objects to be passed to the simulation (in older versions of SpaDES.core, these were passed as a named list to the objects argument). Order matters: these are sequentially evaluated. Any arguments that are specified before the named arguments (e.g., name, paths) will be evaluated prior to any of the named arguments, i.e., "at the start" of setupProject. If placed after the first named argument, they will be evaluated at the end of setupProject, so they can access all the packages, objects, etc.
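The override rule for defaultDots can be pictured as a list merge. This is a minimal base-R sketch, assuming a supplied `...` value always wins over its default; the helper name `resolveDots()` is hypothetical and is not the package's implementation.

```r
# Hypothetical sketch: values supplied in `...` override `defaultDots` entries.
resolveDots <- function(defaultDots = list(), ...) {
  supplied <- list(...)
  # modifyList() replaces (or adds) each named element found in `supplied`.
  utils::modifyList(defaultDots, supplied)
}

# e.g., a batch script may or may not supply `.mode`
res1 <- resolveDots(defaultDots = list(.mode = "development", .nodes = 1))
res2 <- resolveDots(defaultDots = list(.mode = "development", .nodes = 1),
                    .mode = "production")
res1$.mode
res2$.mode
```

Here `res1$.mode` keeps the default, while `res2$.mode` takes the supplied value; `.nodes` falls back to its default in both.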

Value

setupProject will return a named list with elements modules, paths, params, and times. The goal of this list is to contain list elements that can be passed directly to simInit.

It will also append all elements passed by the user in the .... This list can be passed directly to SpaDES.core::simInit() or SpaDES.core::simInitAndSpades() using a do.call(). See example.

NOTE: both projectPath and packagePath will be omitted from the paths list, as they are used to set the current directory (found with getwd()) and .libPaths()[1], but are not accepted by simInit. setupPaths will still return these two paths, as its outputs are not expected to be passed directly to simInit (unlike setupProject outputs).
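Passing the returned list with do.call() spreads its named elements into the arguments of the called function. The sketch below uses a hypothetical stand-in, `stubSimInit()`, instead of SpaDES.core::simInit() so that it runs without SpaDES.core installed; the made-up `out` list only mimics the shape described above and is not guaranteed to match setupProject's output exactly.

```r
# Hypothetical stand-in for SpaDES.core::simInit(); it only reports what it got.
stubSimInit <- function(times, params, modules, paths, ...) {
  list(times = times, params = params, modules = modules,
       paths = paths, objects = list(...))
}

# A list shaped like setupProject() output (values are illustrative only);
# note that projectPath and packagePath are absent from `paths`.
out <- list(
  modules = "Biomass_core",
  paths = list(modulePath = "modules", inputPath = "inputs",
               outputPath = "outputs", cachePath = "cache"),
  params = list(Biomass_core = list(.plots = NA)),
  times = list(start = 0, end = 10),
  studyAreaName = "AB"   # an extra element, as if supplied via `...`
)

# With SpaDES.core installed, this would be: do.call(SpaDES.core::simInit, out)
res <- do.call(stubSimInit, out)
names(res$objects)
```

Any element that does not match a named argument (here `studyAreaName`) lands in `...` of the called function.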

Objective

The overarching objectives for these functions are:

Prepare what is needed for simInit.
Help a user eliminate virtually all assignments to the .GlobalEnv, as these create and encourage spaghetti code that becomes irreproducible as the project increases in complexity.
Be very simple for beginners, yet powerful enough to expand to almost any need of arbitrarily complex projects, using the same structure.
Deal with the complexities of R package installation and loading when working with modules that may have been created by many users.
Create a common SpaDES project structure, allowing easy transition from one project to another, regardless of complexity.

Convenience elements

Sequential evaluation

Throughout these functions, efforts have been made to implement sequential evaluation, within files and within lists. This means that a user can use the values from an upstream element in the list. For example, the following is valid: projectPath is part of the list that will be assigned to the paths argument, and it is then used in the subsequent list element.

setupPaths(paths = list(projectPath = "here",
                        modulePath = file.path(paths[["projectPath"]], "modules")))
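One way such within-list sequential evaluation could work is sketched below in base R. Each element of the unevaluated `list(...)` call is evaluated in turn, and the partially built list is visible under the argument's own name (here `paths`). The helper `seqList()` is hypothetical; SpaDES.project's internal approach may differ.

```r
# Hypothetical sketch: evaluate list elements one at a time so that later
# elements can refer to earlier ones, e.g., via paths[["projectPath"]].
seqList <- function(expr, name = "paths", envir = parent.frame()) {
  force(envir)
  cl <- substitute(expr)                   # the unevaluated list(...) call
  stopifnot(is.call(cl), identical(cl[[1]], as.name("list")))
  elems <- as.list(cl)[-1]                 # drop the `list` function name
  nms <- names(elems)
  stopifnot(!is.null(nms), all(nzchar(nms)))  # sketch handles named elements only
  env <- new.env(parent = envir)
  result <- list()
  for (i in seq_along(elems)) {
    assign(name, result, envir = env)      # expose what has been built so far
    result[[nms[i]]] <- eval(elems[[i]], envir = env)
  }
  result
}

paths <- seqList(list(projectPath = "here",
                      modulePath = file.path(paths[["projectPath"]], "modules")))
paths$modulePath
```

Iterating by position (rather than by name) means a repeated name later in the list replaces the earlier value, which matches the override behaviour described below.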

Because of such sequential evaluation, paths, options, and params files can be sequential lists that impose a hierarchy specified by their order. For example, a user can first create a list of default options, then several lists of user-desired options behind an if (user("emcintir")) block that add new or override existing elements, followed by machine-specific values, such as paths.

# Contents of an options file (e.g., "inst/options.R"), passed as
#   setupProject(options = "inst/options.R")
maxMemory <- 5e+9 # if (grepl("LandWeb", runName)) 5e+12 else 5e+9

# Example -- use any arbitrary object that can be passed in the `...` of
#  `setupOptions` or `setupProject`
if (.mode == "development") {
  list(test = 2)
}
if (machine("A127")) {
  list(test = 3)
}
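The override hierarchy amounts to merging the evaluated lists in order, with later named elements replacing earlier ones. A minimal base-R sketch, assuming each block has already been evaluated to a named list (or to NULL when its if condition was FALSE); the `.mode` value and the logical standing in for machine("A127") are illustrative.

```r
# Hypothetical sketch: later lists override earlier ones, element by element.
.mode <- "development"
onMachineA127 <- FALSE  # stand-in for machine("A127")

blocks <- list(
  list(test = 1, reproducible.useTerra = TRUE),   # defaults
  if (.mode == "development") list(test = 2),     # mode-specific
  if (onMachineA127) list(test = 3)               # machine-specific (NULL here)
)
blocks <- Filter(Negate(is.null), blocks)  # drop blocks whose `if` was FALSE
opts <- Reduce(utils::modifyList, blocks)
opts$test
```

Reduce() with modifyList() applies the blocks left to right, so the last block that sets `test` wins.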

Values and/or files

The arguments paths, options, and params can all accept lists of named values, character vectors, or a mixture: a list whose named elements are values and whose unnamed elements are character strings/vectors. Any unnamed character string/vector will be treated as a file path. If that file path has an @ symbol, it will be assumed to be a file that exists in a GitHub repository on https://github.com. So a user can pass values, or pointers to remote and/or local paths that themselves have values.

The following will set an option as declared, plus read the local file (with relative path), plus download and read the cloud-hosted file.

setupProject(
  options = list(reproducible.useTerra = TRUE,
                 "inst/options.R",
                 "PredictiveEcology/SpaDES.project@transition/inst/options.R")
)

This approach allows for an organic growth of complexity, e.g., a user begins with only named lists of values, but then as the number of values increases, it may be helpful to put some in an external file.
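How a mixed list might be split into values and file pointers can be sketched in base R. This illustrates the rule stated above (named elements are values; unnamed character elements are file paths; an @ marks a GitHub location) and is not SpaDES.project's code; `classifyArg()` is hypothetical.

```r
# Hypothetical sketch: separate named values from unnamed file paths, and flag
# which file paths point to GitHub (i.e., contain "@").
classifyArg <- function(x) {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  isValue <- nzchar(nms)
  files <- unlist(x[!isValue], use.names = FALSE)
  onGitHub <- grepl("@", files, fixed = TRUE)
  list(values = x[isValue],
       localFiles = files[!onGitHub],
       githubFiles = files[onGitHub])
}

parts <- classifyArg(list(reproducible.useTerra = TRUE,
                          "inst/options.R",
                          "PredictiveEcology/SpaDES.project@transition/inst/options.R"))
parts$localFiles
parts$githubFiles
```

A plain character vector (no names at all) is treated entirely as file paths, consistent with the description above.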

NOTE: if the GitHub repository is private, the user must configure their GitHub token by setting the GITHUB_PAT environment variable -- unfortunately, the usethis approach to setting the token will not work at the moment.
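A quick way to check, before calling setupProject, whether GITHUB_PAT is visible to the current R session. This is a generic base-R check, not a SpaDES.project function; placing the variable in a user-level .Renviron file is one common way to make it persist across sessions.

```r
# Check whether a GitHub token is visible to this R session.
hasPat <- nzchar(Sys.getenv("GITHUB_PAT"))
if (!hasPat) {
  message("GITHUB_PAT is not set; add it to ~/.Renviron, or for this session ",
          "only use Sys.setenv(GITHUB_PAT = \"<your token>\")")
}
hasPat
```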

Specifying `paths`, `options`, `params`

If paths, options, and/or params are a character string or character vector (or part of an unnamed list element), the string(s) will be interpreted as files to parse. These files should contain R code that specifies named lists, where the names are paths or options or, for params, module names, each with a named list of parameters for that module. This last named list for params follows the convention used for the params argument in simInit(..., params = ).

These files can use paths, times, plus any previous list in the sequence of params or options specified. Any functions that are used must be available, e.g., prefixed with the package name, as in Require::normPath, if the package has not been loaded (as recommended).

If passing a file to options, it should not set options() explicitly; it should only create named lists. This enables options checking/validation to occur within setupOptions and setupParams. The simplest case would be a file containing only this: opts <- list(reproducible.destinationPath = "~/destPath").

All named lists will be parsed into their own environment and then sequentially evaluated (i.e., subsequent lists have access to previous lists), with each named element setting or replacing the previously named element of the same name, creating a single list. This final list will be assigned to, e.g., options() inside setupOptions.

Because each list is parsed separately, the lists do not need to be assigned to objects; if they are, the object name can be anything, even a name already used by another object that contributes to the same argument's final list (i.e., paths, params, or options). Hence, in a file passed to options, instead of incrementing the list as:

a <- list(optA = 1)
b <- append(a, list(optB = 2))
c <- append(b, list(optC = 2.5))
d <- append(c, list(optD = 3))

one can do:

a <- list(optA = 1)
a <- list(optB = 2)
c <- list(optC = 2.5)
list(optD = 3)

NOTE: only atomics (i.e., character, numeric, etc.), named lists, or either of these wrapped in one level of if are parsed. Therefore, this will not work for other side-effect elements, such as authenticating with a cloud service; use the sideEffects argument for those.
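The file-parsing behaviour described above (each top-level expression evaluated in turn in a private environment, list results kept and merged, later names overriding earlier ones) can be sketched in base R. `parseOptionsFile()` is hypothetical and deliberately simplified; the package's own parser handles more cases. The sketch writes a temporary file using the second style shown above, with the last list wrapped in an if.

```r
# Hypothetical sketch of parsing an options file into one named list.
parseOptionsFile <- function(file, envir = parent.frame()) {
  exprs <- parse(file = file)
  env <- new.env(parent = envir)       # assignments stay out of .GlobalEnv
  out <- list()
  for (e in exprs) {
    val <- eval(e, envir = env)        # an assignment returns its value
    if (is.list(val) && !is.null(names(val))) {
      out <- utils::modifyList(out, val)  # later names override earlier ones
    }
  }
  out
}

tf <- tempfile(fileext = ".R")
writeLines(c("a <- list(optA = 1)",
             "a <- list(optB = 2)",
             "c <- list(optC = 2.5)",
             "if (TRUE) list(optD = 3)"), tf)
opts <- parseOptionsFile(tf)
str(opts)
```

Because each value is taken from the evaluated expression rather than from the object name, reusing `a` does not lose `optA`.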

Several helper functions that may be useful exist within SpaDES.project, such as user(...) and machine(...).
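The user() and machine() helpers let a single file serve several people and computers; consult their help pages for their exact matching rules. As a rough, hypothetical base-R analogue, tests like the ones below could gate user- or machine-specific lists (`isUser()` and `isMachine()` are illustrative names, not package functions).

```r
# Hypothetical analogues of user()/machine(); not SpaDES.project's code.
isUser <- function(name) identical(Sys.info()[["user"]], name)
isMachine <- function(pattern) grepl(pattern, Sys.info()[["nodename"]])

opts <- list(reproducible.cachePath = "cache")
if (isUser("emcintir")) opts$reproducible.cachePath <- "~/cache"
if (isMachine("A127")) opts$test <- 3
str(opts)
```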

Can hard code arguments that may be missing

To allow for batch submission, a user can specify argument = value even if value is missing. This type of specification will not work in normal argument parsing, but it is designed to work here. In the next example, .mode = .mode can be specified, but if R cannot find .mode for the right-hand side, it will skip it without an error. Thus a user can source a script containing the following line from a batch script in which .mode is specified; when the line is run without that batch script, no value is assigned to .mode. The example also includes .nodes, which shows passing a value that does exist. The non-existent .mode will still be returned in out, but as an unevaluated, captured list element.

.nodes <- 2
out <- setupProject(.mode = .mode,
                    .nodes = .nodes,
                    options = "inst/options.R")
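The "skip if missing" behaviour can be sketched with substitute() and tryCatch(): each `...` argument is captured unevaluated, evaluation is attempted, and if it fails (e.g., object '.mode' not found), the unevaluated expression is kept instead. `captureDots()` is hypothetical and only illustrates the idea; it is not how setupProject is implemented.

```r
# Hypothetical sketch: evaluate each `...` argument if possible; otherwise keep
# the captured, unevaluated expression.
captureDots <- function(..., envir = parent.frame()) {
  force(envir)
  exprs <- as.list(substitute(list(...)))[-1]  # named list of expressions
  lapply(exprs, function(e) {
    tryCatch(eval(e, envir = envir),
             error = function(err) e)   # e.g., object '.mode' not found
  })
}

.nodes <- 2
if (exists(".mode", inherits = FALSE)) rm(.mode)  # ensure .mode does not exist
out <- captureDots(.mode = .mode, .nodes = .nodes)
out$.nodes
class(out$.mode)
```

Here `out$.nodes` is the value 2, while `out$.mode` is the bare symbol `.mode` (class "name").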

Examples

## For more examples:
vignette("i-getting-started", package = "SpaDES.project")

library(SpaDES.project)

origDir <- getwd()
tmpdir <- Require::tempdir2() # for testing, tempdir2 is better
# \dontshow{
if (is.null(getOption("repos"))) {
  options(repos = c(CRAN = "https://cloud.r-project.org"))
}
setwd(tmpdir)
# }
## simplest case; just creates folders
out <- setupProject(
  paths = list(projectPath = ".")
)
#> setting up paths ...
#> Copying SpaDES.project, data.table, Require, rprojroot packages to paths$packagePath (C:/Users/emcintir/AppData/Roaming/R/data/R/Require/packages/x86_64-w64-mingw32/4.3)
#> Setting:
#>   options(
#>     reproducible.cachePath = 'C:/Users/emcintir/AppData/Local/Temp/RtmpOovltm/Require/cache'
#>     spades.inputPath = 'C:/Users/emcintir/AppData/Local/Temp/RtmpOovltm/Require/inputs'
#>     spades.outputPath = 'C:/Users/emcintir/AppData/Local/Temp/RtmpOovltm/Require/outputs'
#>     spades.modulePath = 'C:/Users/emcintir/AppData/Local/Temp/RtmpOovltm/Require/modules'
#>     spades.scratchPath = 'C:/Users/emcintir/AppData/Local/Temp/RtmpOovltm/Require'
#>   )
#>   done setting up paths
#> no packages to set up
#> .libPaths() are: C:/Users/emcintir/AppData/Roaming/R/data/R/Require/packages/x86_64-w64-mingw32/4.3, C:/Program Files/R/R-4.3.1/library
setwd(origDir)