OpenML integration to the mlr3 ecosystem.
mlr3oml
OpenML is an open-source
platform that facilitates the sharing and dissemination of machine
learning research data. All entities on the platform have unique
identifiers and standardized (meta)data that can be accessed via an
open-access REST API or the web interface. mlr3oml allows users
to work with the REST API through R and integrates OpenML with the mlr3
ecosystem. Note that some upload options are currently not supported;
use the OpenML package for this.
As a brief demo, we show how to access an OpenML task, convert it to
an mlr3::Task and associated mlr3::Resampling, and conduct a simple
resample experiment.
library(mlr3oml)
library(mlr3)
# Download and print the OpenML task with ID 145953
oml_task = otsk(145953)
oml_task
## <OMLTask:145953>
## * Type: Supervised Classification
## * Data: kr-vs-kp (id: 3; dim: 3196x37)
## * Target: class
## * Estimation: crossvalidation (id: 1; repeats: 1, folds: 10)
# Access the OpenML data object on which the task is built
oml_task$data
## <OMLData:3:kr-vs-kp> (3196x37)
## * Default target: class
# Convert the OpenML task to an mlr3 task and resampling
task = as_task(oml_task)
resampling = as_resampling(oml_task)
# Conduct a simple resample experiment
rr = resample(task, lrn("classif.rpart"), resampling)
rr$aggregate()
## classif.ce
## 0.0319181
Besides working with objects with known IDs, data of interest can also be queried using listing functions. Below, we search for datasets with 10 to 20 features, 100 to 10000 observations, and 2 classes.
odatasets = list_oml_data(
  number_features = c(10, 20),
  number_instances = c(100, 10000),
  number_classes = 2
)
odatasets[, c("data_id", "name")]
## data_id
## 1: 13
## 2: 15
## 3: 29
## 4: 49
## 5: 50
## ---
## 238: 44767
## 239: 45039
## 240: 45063
## 241: 45562
## 242: 45568
## name
## 1: breast-cancer
## 2: breast-w
## 3: credit-approval
## 4: heart-c
## 5: tic-tac-toe
## ---
## 238: Click_prediction_small_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True
## 239: compas-two-years
## 240: credit-approval
## 241: seismic-bumps
## 242: telco-customer-churn
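The IDs returned by a listing can be passed back to the download functions. The following is a minimal sketch rather than an excerpt from the package documentation. It assumes that odt() is the data counterpart of otsk() shown earlier, that as_task() also accepts an OpenML data object and falls back to its default target, and that setting the mlr3oml.cache option to TRUE stores downloads locally so that repeated calls do not query the server again. The ID 13 (breast-cancer) is taken from the listing output above.

```r
library(mlr3oml)
library(mlr3)

# Assumption: TRUE enables caching of downloaded objects for this session
options(mlr3oml.cache = TRUE)

# Assumption: odt() creates an OpenML data object by ID, analogous to otsk()
odata = odt(13)

# Convert to an mlr3 task; the dataset's default target is assumed to be used
task = as_task(odata)
task
```

Working from a data object instead of an OpenML task means no predefined resampling is attached, so a resampling such as rsmp("cv") would have to be chosen separately.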
OpenML entities are represented as R6 classes.
Entities can be converted to their mlr3 counterpart.
Caching is controlled through the mlr3oml.cache option.
Both the arff and parquet file types for datasets are supported.
Further background is available in the mlr3 book.
mlr3oml is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!
In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).