---
title: "Overview: Creating FFTs with FFTrees"
author: "Nathaniel D. Phillips and Hansjörg Neth"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: fft.bib
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Overview: Creating FFTs with FFTrees}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE,
                      comment = "#>",
                      prompt = FALSE,
                      tidy = FALSE,
                      echo = TRUE,
                      message = FALSE,
                      warning = FALSE,
                      # Default figure options:
                      dpi = 100,
                      fig.align = 'center',
                      fig.height = 6.0,
                      fig.width = 6.5,
                      out.width = "580px")
```

```{r pkgs, echo = FALSE, message = FALSE, results = 'hide'}
library(FFTrees)
library(dplyr)
library(testthat)
library(tidyselect)
library(magrittr)
library(knitr)
```

```{r urls, echo = FALSE, message = FALSE, results = 'hide'}
# URLs:
url_pkg_CRAN   <- "https://CRAN.R-project.org/package=FFTrees"
url_pkg_GitHub <- "https://github.com/ndphillips/FFTrees"
url_pkg_issues <- "https://github.com/ndphillips/FFTrees/issues"

url_JDM_issue <- "https://journal.sjdm.org/vol12.4.html"
url_JDM_html  <- "https://journal.sjdm.org/17/17217/jdm17217.html"
url_JDM_pdf   <- "https://journal.sjdm.org/17/17217/jdm17217.pdf"
url_JDM_doi   <- "https://doi.org/10.1017/S1930297500006239"

email_contact <- "Nathaniel.D.Phillips.is@gmail.com"
url_contact   <- "https://www.linkedin.com/in/nathanieldphillips/"
```

<!-- Brief intro: -->

The R package **FFTrees** [@phillips2017FFTrees; @FFTrees-pkg] makes it easy to create, visualize, and evaluate fast-and-frugal decision trees\ (FFTs).
FFTs are simple and transparent decision algorithms for solving binary classification problems in an\ effective and efficient fashion.

## Fast-and-Frugal Trees (FFTs)

<!-- Defining FFTs: -->

A _fast-and-frugal tree_ (FFT) [@martignon2003naive] is a set of hierarchical rules for solving binary classification tasks based on very few pieces of information (usually using\ 4 or fewer cues).
In contrast to more complex decision trees, each node of an\ FFT has exactly two branches.
A branch can either contain another cue (i.e., ask another question) or lead to an exit (i.e., yield a decision or prediction outcome).
Each non-final node of an\ FFT has one exit branch and the final node has two exit branches.

<!-- Characteristics and benefits of FFTs: -->

FFTs are simple and effective decision strategies that use minimal information for making decisions in binary classification problems [see @gigerenzer1999fast;@gigerenzer1999good].
FFTs are often preferable to more complex decision strategies (such as logistic regression, LR) because they rarely over-fit data [@gigerenzer2009homo] and are easy to interpret, implement, and communicate in real-world settings [@marewski2012heuristic].
FFTs have been designed to tackle many real-world tasks, from making fast decisions in emergency rooms [@green1997alters] to detecting depression [@jenny2013simple].

<!-- Emphasize transparency: -->

Whereas their performance and success are empirical questions, a key theoretical advantage of FFTs is their _transparency_ to decision makers and anyone aiming to understand and evaluate the details of an algorithm.
In the words of @burton2020, "human users could interpret, justify, control, and interact with a fast-and-frugal decision aid" (p.\ 229).

## Using the FFTrees package

The **FFTrees** package makes it easy to produce, display, and evaluate FFTs [@phillips2017FFTrees].
The package's main function is `FFTrees()`, which takes formula\ `formula` and dataset\ `data` arguments and returns several FFTs that attempt to classify training cases into criterion classes.
The FFTs created can then be used to predict new data to cross-validate their performance.

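To make the structure of an FFT concrete before turning to the package, the following hand-written sketch expresses a three-cue FFT as plain R code. The function name `classify_fft()`, the cues (`cue_a`, `cue_b`, `cue_c`), and their thresholds are hypothetical and chosen purely for illustration; they are neither part of **FFTrees** nor the result of fitting any data.

```{r fft-by-hand}
# A hypothetical 3-cue FFT written by hand (illustration only;
# cue names and thresholds are assumptions, not output of FFTrees()):
classify_fft <- function(cue_a, cue_b, cue_c) {

  # Node 1 (non-final): one exit branch, otherwise ask the next cue.
  if (cue_a > 5) {
    return(TRUE)
  }

  # Node 2 (non-final): one exit branch, otherwise ask the next cue.
  if (cue_b == "low") {
    return(FALSE)
  }

  # Node 3 (final): two exit branches, so every case receives a decision.
  cue_c >= 2
}

# Four made-up cases to classify:
cases <- data.frame(cue_a = c(7, 2, 2, 3),
                    cue_b = c("high", "low", "high", "high"),
                    cue_c = c(0, 3, 1, 4),
                    stringsAsFactors = FALSE)

# Apply the tree to each case (row):
decisions <- mapply(classify_fft, cases$cue_a, cases$cue_b, cases$cue_c)
print(decisions)
```

The first two nodes each offer one exit and the final node offers two, so no case requires more than three cues. In this sketch the cues, thresholds, and exit directions were set by hand; `FFTrees()` derives them from data.
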
Here is an example of using the main `FFTrees()` function to fit FFTs to `heart.train` data:

```{r fft-example, message = FALSE, results = 'hide'}
# Create a fast-and-frugal tree (FFT) predicting heart disease:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Healthy", "Diseased"))
```

The resulting `FFTrees` object `heart.fft` contains 7\ FFTs that were fitted to the `heart.train` data.

To evaluate a tree's predictive performance, we compare its predictions for the unseen `heart.test` data with their true criterion values.
Here is how we can apply the best training FFT to the `heart.test` data:

```{r fig-1, fig.width = 6.5, fig.height = 6.0, out.width = "600px", fig.align = 'center', fig.cap = "A fast-and-frugal tree (FFT) to predict heart disease status."}
# Visualize predictive performance:
plot(heart.fft, data = "test")
```

## Getting started

To start using the **FFTrees** package, we recommend studying the [Tutorial: Creating FFTs for heart disease](FFTrees_heart.html).
The tutorial illustrates the basic steps of creating, visualizing, and evaluating fast-and-frugal trees (FFTs).

The scientific background of FFTs and the development of **FFTrees** are described in @phillips2017FFTrees (doi\ [10.1017/S1930297500006239](`r url_JDM_doi`) | [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)).
The following vignettes provide details on related topics and corresponding examples.

### Vignettes

<!-- Table of all vignettes: -->

Here is a complete list of the vignettes available in the **FFTrees** package:

|   | Vignette | Description |
|--:|:------------------------------|:-------------------------------------------------|
|   | [Main guide: FFTrees overview](guide.html) | An overview of the **FFTrees** package |
| 1 | [Tutorial: FFTs for heart disease](FFTrees_heart.html) | An example of using `FFTrees()` to model heart disease diagnosis |
| 2 | [Accuracy statistics](FFTrees_accuracy_statistics.html) | Definitions of accuracy statistics used throughout the package |
| 3 | [Creating FFTs with FFTrees()](FFTrees_function.html) | Details on the main `FFTrees()` function |
| 4 | [Manually specifying FFTs](FFTrees_mytree.html) | How to directly create FFTs without using the built-in algorithms |
| 5 | [Visualizing FFTs](FFTrees_plot.html) | Plotting `FFTrees` objects, from full trees to icon arrays |
| 6 | [Examples of FFTs](FFTrees_examples.html) | Examples of FFTs from different datasets contained in the package |

### Datasets

The **FFTrees** package contains several datasets ---\ mostly from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/)\ --- that allow you to address interesting questions when exploring FFTs:

- `blood` -- Which people donate blood? [source](https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center)
- `breastcancer` -- Which patients suffer from breast cancer? [source](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
- `car` -- Which cars are acceptable? [source](http://archive.ics.uci.edu/ml/datasets/Car+Evaluation)
- `contraceptive` -- Which factors determine whether women use contraceptives? [source](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)
- `creditapproval` -- Which factors determine a credit card approval?
[source](https://archive.ics.uci.edu/ml/datasets/Credit+Approval)
- `fertility` -- Which factors predict a fertile sperm concentration? [source](https://archive.ics.uci.edu/ml/datasets/Fertility)
- `forestfires` -- Which environmental conditions predict forest fires? [source](https://archive.ics.uci.edu/ml/datasets/Forest+Fires)
- `heartdisease` -- Which patients suffer from heart disease? [source](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)
- `iris.v` -- Which iris belongs to the class "virginica"? [source](https://archive.ics.uci.edu/ml/datasets/Iris)
- `mushrooms` -- Which features predict poisonous mushrooms? [source](https://archive.ics.uci.edu/ml/datasets/Mushroom)
- `sonar` -- Did a sonar signal bounce off a metal cylinder (or a rock)? [source](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks))
- `titanic` -- Which passengers survived the Titanic? [source](https://www.encyclopedia-titanica.org)
- `voting` -- How did U.S. congressmen vote in 1984? [source](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)
- `wine` -- What determines ratings of wine quality? [source](https://archive.ics.uci.edu/ml/datasets/Wine)

#### Details about the datasets {-}

When preparing data to be predicted by FFTs, we usually distinguish between several (categorical or numeric) predictors and a (binary) criterion variable.
**Table\ 1** provides basic information on the datasets included in the **FFTrees** package (see their documentation for additional details).

**Table\ 1:** Key information on the datasets included in **FFTrees**.

```{r dataframe for overview data, echo = FALSE}
## Preparations for applying the describe_data() function to all data sets
## When new data sets are included, add their info so that they will also be shown in the vignette-table!

# List all data sets:
data_list <- list(blood, breastcancer, car, contraceptive, creditapproval, fertility, forestfires,
                  heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine)

# Vector with all names of the data sets:
data_names <- c("blood", "breastcancer", "car", "contraceptive", "creditapproval", "fertility", "forestfires",
                "heartdisease", "iris.v", "mushrooms", "sonar", "titanic", "voting", "wine")

# Vector with all criterion names:
criterion_names <- c("donation.crit", "diagnosis", "acceptability", "cont.crit", "crit", "diagnosis", "fire.crit",
                     "diagnosis", "virginica", "poisonous", "mine.crit", "survived", "party.crit", "type")

# Vector with criterion values of interest:
baseline_values <- c(1, TRUE, "acc", TRUE, TRUE, TRUE, TRUE,
                     TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, "red")

# Use combined lists/vectors and apply describe_data() to each:
result_list <- mapply(describe_data,
                      data = data_list,
                      data_name = data_names,
                      criterion_name = criterion_names,
                      baseline_value = baseline_values,
                      SIMPLIFY = FALSE)

# Combine results in df:
combined_result <- do.call(rbind, result_list)

# Round baseline and NA pct values for brevity:
combined_result$Baseline_pct <- round(combined_result$Baseline_pct, 1)
combined_result$NAs_pct <- round(combined_result$NAs_pct, 2)

# Rename columns:
colnames(combined_result) <- c("Dataset name", "Number of cases", "Criterion name",
                               "Baseline (`TRUE`,\\ in\\ %)", "Number of predictors",
                               "Number of NAs", "NAs (in\\ %)")

# Render the table from the data frame
# use as many items per page as we have data sets
# redefine column names as we like them:
knitr::kable(combined_result, format = "html")
```

## Citing **FFTrees**

We had a lot of fun creating **FFTrees** and hope you like it too!
For an accessible introduction to FFTs, we recommend reading our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)\ ).

**Citation** (in APA format):

- Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017).
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees.
_Judgment and Decision Making_, _12_(4), 344–368.
doi\ [10.1017/S1930297500006239](`r url_JDM_doi`)

<!-- PDF available at `r url_JDM_pdf` -->

When using **FFTrees** in your own work, please cite our article and spread the word, so that we can continue developing the package.

**BibTeX Citation**:

```{r bibtex-citation, eval = FALSE, highlight = FALSE}
@article{FFTrees,
  title = {FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees},
  author = {Phillips, Nathaniel D and Neth, Hansjörg and Woike, Jan K and Gaissmaier, Wolfgang},
  year = 2017,
  journal = {Judgment and Decision Making},
  volume = 12,
  number = 4,
  pages = {344--368},
  url = {https://journal.sjdm.org/17/17217/jdm17217.pdf},
  doi = {10.1017/S1930297500006239}
}
```

## Contact

- The latest release of **FFTrees** is available at [`r url_pkg_CRAN`](`r url_pkg_CRAN`).

- The latest developer version is available at [`r url_pkg_GitHub`](`r url_pkg_GitHub`).

- For comments, tips, and bug reports, please post at [`r url_pkg_issues`](`r url_pkg_issues`) or contact Nathaniel at `r email_contact` or [`r url_contact`](`r url_contact`).

<!-- Logo: -->

```{r logo, echo = FALSE, fig.align = "center", out.width="40%"}
knitr::include_graphics("../inst/FFTrees_Logo.jpg")
```

<!-- Automatic references: -->

## Bibliography

<!-- eof. -->