---
title: "Modeling Options"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Modeling Options}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css" />


### <i class="fa-solid fa-address-card"></i> Modeling options 

`biomod2` is a wrapper that calls single-model functions from external packages. Modeling options are retrieved automatically from these packages, so every argument accepted by these functions can be used.  
**Default** parameter values are left unmodified and are often poorly suited to species distribution modeling in general, and to a specific dataset in particular. **Bigboss** options, provided by the `biomod2` team, aim to correct at least the species distribution modeling aspect, while **tuned** options try to find a more appropriate parameterization for the user's data, mainly through the `caret` package. Users can also define their own modeling options (**user.defined**).
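
As a small illustration of why unmodified defaults may not suit presence/absence data, the base R sketch below inspects the default `family` of `stats::glm` (one of the functions wrapped by `biomod2`) and compares it with a binomial fit. The simulated data frame and its variable names (`occ`, `temp`) are hypothetical, and the sketch says nothing about which arguments the `default` or `bigboss` sets actually change.

```R
## Default family of stats::glm: gaussian, i.e. ordinary linear regression
default.family <- deparse(formals(stats::glm)$family)
print(default.family)

## Hypothetical presence/absence data (simulated, for illustration only)
set.seed(42)
toy <- data.frame(temp = runif(200, 0, 30))
toy$occ <- rbinom(200, size = 1, prob = plogis(-3 + 0.2 * toy$temp))

## Same formula, two families
fit.default  <- glm(occ ~ temp, data = toy)                    # gaussian
fit.binomial <- glm(occ ~ temp, data = toy, family = binomial) # logistic

## Gaussian predictions are not constrained to [0, 1];
## binomial predictions on the response scale always are
new.temp <- data.frame(temp = c(-20, 60))
pred.default  <- predict(fit.default, newdata = new.temp)
pred.binomial <- predict(fit.binomial, newdata = new.temp, type = "response")
print(pred.default)
print(pred.binomial)
```
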

Note that only the binary data type and its associated models are currently allowed, but the package structure has been changed so that new data types, such as absolute or relative abundances, can be added in the near future.

The dataset [`ModelsTable`](../reference/ModelsTable.html) lists all the available algorithms with their packages and functions:

```R
           model   type      package         func       train
1            ANN binary         nnet         nnet      avNNet
2            CTA binary        rpart        rpart       rpart
3            FDA binary          mda          fda         fda
4            GAM binary          gam          gam    gamLoess
5            GAM binary         mgcv          bam         bam
6            GAM binary         mgcv          gam         gam
7            GBM binary          gbm          gbm         gbm
8            GLM binary        stats          glm         glm
9           MARS binary        earth        earth       earth
10        MAXENT binary       MAXENT       MAXENT ENMevaluate
11        MAXNET binary       maxnet       maxnet      maxnet
12            RF binary randomForest randomForest          rf
13           SRE binary      biomod2       bm_SRE      bm_SRE
14       XGBOOST binary      xgboost      xgboost     xgbTree
```
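
The option names used later in this vignette (for example `RF.binary.randomForest.randomForest` and `GLM.binary.stats.glm`) appear to join the `model`, `type`, `package` and `func` columns with dots. The base R sketch below copies a few rows of the table by hand, so that it runs without `biomod2`, and composes the names this way. Treat the naming rule as an assumption inferred from those two examples, and check `OptionsBigboss@models` for the actual list.

```R
## A few rows copied by hand from the table above (not loaded from biomod2)
toy.table <- data.frame(model   = c("GAM", "GAM", "GLM", "RF"),
                        type    = "binary",
                        package = c("mgcv", "mgcv", "stats", "randomForest"),
                        func    = c("bam", "gam", "glm", "randomForest"),
                        train   = c("bam", "gam", "glm", "rf"),
                        stringsAsFactors = FALSE)

## Assumed naming rule: model.type.package.func
opt.names <- with(toy.table, paste(model, type, package, func, sep = "."))
print(opt.names)

## Tuning function associated with one algorithm (column `train`)
rf.train <- toy.table$train[toy.table$model == "RF"]
print(rf.train)
```
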
<br/>

*All the examples are made with the data of the package.* <br/>
*For the beginning of the code, see the [main functions vignette](examples_1_mainFunctions.html).*

<br/>


### <i class="fa-solid fa-database"></i> Default options 

`biomod2` has a set of `default` options that mostly match the algorithms' own default values, with some minor modifications so that the [`BIOMOD_Modeling`](../reference/BIOMOD_Modeling.html) function runs smoothly. <br/>

*Please be aware that this strategy can often lead to poor models or even errors.*

```R
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'Example',
                                    models = c('RF', 'GLM'),
                                    CV.strategy = 'random',
                                    CV.nb.rep = 2,
                                    CV.perc = 0.8,
                                    OPT.strategy = 'default',
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 2,
                                    seed.val = 42)
```
You can retrieve the model options with [`get_options`](../reference/getters.out.html):

```R
get_options(myBiomodModelOut)
```
<br/>


### <i class="fa-solid fa-hand-fist"></i> Bigboss options 

The `bigboss` set of parameters is available in the dataset [`OptionsBigboss`](../reference/OptionsBigboss.html). This set should give better results than the default set and will continue to be optimized by the `biomod2` team. <br/>

*Keep in mind that this set is generic: depending on your case, the results may be no better than with the default set.*

```R
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'Example',
                                    models = c('RF', 'GLM'),
                                    CV.strategy = 'random',
                                    CV.nb.rep = 2,
                                    CV.perc = 0.8,
                                    OPT.strategy = 'bigboss',
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 2,
                                    seed.val = 42)
```
<br/>


### <i class="fa-solid fa-gears"></i> Tuned options

With `tuned` options, some algorithms can be trained on your dataset, and the optimized parameters are returned to be used within the [`BIOMOD_Modeling`](../reference/BIOMOD_Modeling.html) function. This tuning is mostly based on the [`caret`](http://topepo.github.io/caret/) package, which calls a specific function to tune each algorithm (see column `train` in `ModelsTable`). As exceptions, the `ENMevaluate` function of the [`ENMeval`](https://jamiemkass.github.io/ENMeval/) package is called for `MAXENT`, and the `biomod2` team wrote a dedicated function for `SRE`.

Here is the list of the parameters that can be tuned :

| algorithm     | parameters                                                                                 |
| --------------| :------------------------------------------------------------------------------------------|
| ANN           | `size`, `decay`, `bag`                                                                     |
| FDA           | `degree`, `nprune`                                                                         |
| GAM           | `select`, `method`                                                                         |
| GBM           | `n.trees`, `interaction.depth`, `shrinkage`, `n.minobsinnode`                              |
| MARS          | `degree`, `nprune`                                                                         |
| RF            | `mtry`                                                                                     |
| SRE           | `quant`                                                                                    |
| XGBOOST       | `nrounds`, `max_depth`, `eta`, `gamma`, `colsample_bytree`, `min_child_weight`, `subsample` |

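
Tuning of this kind typically amounts to evaluating a grid of candidate values and keeping the best combination. The base R sketch below builds such a grid for the four `GBM` parameters listed above. The candidate values are arbitrary and chosen only for illustration; this is not the grid that `bm_Tuning` or `caret` uses by default.

```R
## Candidate values for the four GBM parameters listed above (arbitrary)
gbm.grid <- expand.grid(n.trees = c(500, 1000, 2500),
                        interaction.depth = c(1, 3, 5),
                        shrinkage = c(0.001, 0.01),
                        n.minobsinnode = 10)

## One row per combination: 3 * 3 * 2 * 1 = 18 candidate settings
print(nrow(gbm.grid))
print(head(gbm.grid))
```

The grid grows multiplicatively with each tuned parameter, which is one reason tuning can take much longer than fitting with a fixed set of options.
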

For almost every algorithm (all except `MAXENT`, `MAXNET` and `SRE`), you can choose to optimize the formula by setting `do.formula = TRUE`. The optimized formula is chosen among the different types (`simple`, `quadratic`, `polynomial`, `s_smoother`) and different interaction levels.  
In the same way, variable selection can be run for `GLM` and `GAM` by setting `do.stepAIC = TRUE` (using `MASS::stepAIC` and `gam::step.Gam`, respectively). <br/>
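
To give a feel for these two ideas outside `biomod2`, the base R sketch below builds a linear and a quadratic formula by hand and then runs `MASS::stepAIC` on a binomial GLM. The data are simulated and the variable names (`occ`, `bio3`, `bio12`, `noise`) are hypothetical. The two formulas are one plausible reading of the `simple` and `quadratic` types; the exact terms generated by `biomod2` may differ.

```R
## Hypothetical data: response `occ` and three predictors (simulated)
set.seed(42)
toy <- data.frame(bio3 = rnorm(300), bio12 = rnorm(300), noise = rnorm(300))
toy$occ <- rbinom(300, size = 1,
                  prob = plogis(0.5 + 1.5 * toy$bio3 - toy$bio12^2))

vars <- c("bio3", "bio12", "noise")

## Linear terms only
form.simple <- reformulate(vars, response = "occ")

## Linear terms plus squared terms
form.quad <- reformulate(c(vars, paste0("I(", vars, "^2)")), response = "occ")

print(form.simple)
print(form.quad)

## Stepwise selection by AIC, starting from the quadratic model
fit.full <- glm(form.quad, data = toy, family = binomial)
fit.step <- MASS::stepAIC(fit.full, direction = "both", trace = 0)
print(formula(fit.step))
```

Because the search starts from the full model and only accepts steps that lower the AIC, the selected model never has a higher AIC than the starting one.
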

*More information about the training can be found in the documentation of the [`bm_Tuning`](../reference/bm_tuning.html) function.*


```R
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'Example',
                                    models = c('RF','SRE'),
                                    CV.strategy = 'random',
                                    CV.nb.rep = 2,
                                    CV.perc = 0.8,
                                    OPT.strategy = 'tuned',
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 2,
                                    seed.val = 42)

print(get_options(myBiomodModelOut), dataset = '_allData_RUN1')                                    
```

<br/>


### <i class="fa-solid fa-pen"></i> User defined

The `user.defined` option lets you set the parameters of every algorithm yourself.

*Note that you can find information about the parameters of MAXENT within the documentation of the [`bm_ModelingOptions`](../reference/bm_ModelingOptions.html) function.*

<br/>


**Example :** 

- You want to run 3 models: `RF`, `GLM` and `MARS`.
- You have your `BiomodData` and you set your [cross-validation table](vignette_crossValidation.html). 
- Globally, you want to use the `bigboss` parameters as a base. 


```R
myCVtable <- bm_CrossValidation(bm.format = myBiomodData,
                                strategy = "random",
                                nb.rep = 2,
                                perc = 0.8)


myOpt  <- bm_ModelingOptions(data.type = 'binary',
                             models = c('RF','GLM','MARS'),
                             strategy = 'bigboss',
                             bm.format = myBiomodData, 
                             calib.lines = myCVtable)

print(myOpt)
```
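
To make the `strategy = "random"`, `nb.rep = 2` and `perc = 0.8` settings concrete, the base R sketch below builds a logical table of the same general shape: one row per observation, one column per repetition, with `TRUE` marking calibration rows. It is an illustration under assumptions, not the implementation of `bm_CrossValidation` (which may, for instance, handle presences and absences separately). The column names mimic the `_allData_RUN1` and `_allData_RUN2` dataset names used in this vignette, and `n.obs` is a hypothetical sample size.

```R
set.seed(42)
n.obs  <- 100   # hypothetical number of observations
nb.rep <- 2
perc   <- 0.8

## For each repetition, draw 80% of the rows for calibration
toy.cv <- sapply(seq_len(nb.rep), function(i) {
  calib <- rep(FALSE, n.obs)
  calib[sample(n.obs, size = round(perc * n.obs))] <- TRUE
  calib
})
colnames(toy.cv) <- paste0("_allData_RUN", seq_len(nb.rep))

print(head(toy.cv))
print(colMeans(toy.cv))  # proportion of calibration rows per repetition
```
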


- You decide to tune the parameters for `RF` and you want to change the formula for `GLM`.


```R
tuned.rf <- bm_Tuning(model = 'RF',
                      tuning.fun = 'rf', ## see in ModelsTable
                      do.formula = TRUE,
                      bm.options = myOpt@options$RF.binary.randomForest.randomForest,
                      bm.format = myBiomodData, 
                      calib.lines = myCVtable)

form.GLM <- bm_MakeFormula(resp.name = myBiomodData@sp.name,
                           expl.var = head(myBiomodData@data.env.var),
                           type = 'simple',
                           interaction.level = 0)
                     
user.GLM <- list('_allData_RUN1' = list(formula = form.GLM),
                 '_allData_RUN2' = list(formula = form.GLM))

```
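
With more repetitions, typing one list entry per dataset becomes tedious. The base R sketch below builds the same kind of named list programmatically. The formula and the `run.names` vector are hypothetical stand-ins: in a real session the formula would come from `bm_MakeFormula`, and the names would presumably match the columns of your cross-validation table, which is an assumption to check against your own `myCVtable`.

```R
## Stand-in for the formula built with bm_MakeFormula (hypothetical variables)
toy.form <- occ ~ bio3 + bio12

## Dataset names, written out by hand here
run.names <- paste0("_allData_RUN", 1:2)

## One list(formula = ...) entry per dataset, named after the dataset
toy.user.GLM <- setNames(lapply(run.names, function(nm) list(formula = toy.form)),
                         run.names)
str(toy.user.GLM, max.level = 2)
```
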


- Now that all the parameters are ready, you can build your [`BIOMOD.models.options`](../reference/BIOMOD.models.options.html) object and run the [`BIOMOD_Modeling`](../reference/BIOMOD_Modeling.html) function.


```R
## Gather in one list
## Models names can be found in OptionsBigboss@models
user.val <- list(RF.binary.randomForest.randomForest = tuned.rf,
                 GLM.binary.stats.glm = user.GLM)

myOpt <- bm_ModelingOptions(data.type = 'binary',
                            models = c('RF','GLM','MARS'),
                            strategy = "user.defined",
                            user.val = user.val,
                            user.base = "bigboss",
                            bm.format = myBiomodData, 
                            calib.lines = myCVtable)
                            
print(myOpt)
print(myOpt, dataset = '_allData_RUN1')
print(myOpt, dataset = '_allData_RUN2')

myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'Example',
                                    models = c('RF','GLM','MARS'),
                                    CV.strategy = 'user.defined',
                                    CV.user.table = myCVtable,
                                    OPT.user = myOpt,
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 2)
```

*You can find more examples in the [Secondary functions vignette](examples_2_secundaryFunctions.html).*



<br/>