---
title: "Long-Horizon Forecasting"
output: 
  rmarkdown::html_vignette:
    toc: true 
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Long-Horizon Forecasting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
library(httptest2)
.mockPaths("../tests/mocks")
start_vignette(dir = "../tests/mocks")

original_options <- options("NIXTLA_API_KEY"="dummy_api_key", digits=7)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width = 7, 
  fig.height = 4
)
```

```{r}
library(nixtlar)
```

## 1. Long-horizon forecasting 

For some problems, it is necessary to forecast long horizons. Here "long horizons" refer to predictions that exceed more than two seasonal periods. For instance, this means predicting more than 48 hours ahead for hourly data, and more than 7 days for daily data. The specific definition of "long horizon" varies according to the frequency of the data.

`TimeGPT` features a specialized model designed for long-horizon forecasting. This model is trained to predict far into the future, where the uncertainty increases as the forecast extends further. Here we'll explain how to use the long horizon model of `TimeGPT`. 

This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first. 

## 2. Load data 

For this vignette, we'll use the electricity consumption dataset that is included in `nixtlar`, which contains the hourly prices of five different electricity markets. 

```{r}
df <- nixtlar::electricity
head(df)
```

For every `unique_id`, we'll try to predict the last 96 hours. Hence, we'll separate the data into training and test datasets. 

```{r}
test <- df |> 
  dplyr::group_by(unique_id) |> 
  dplyr::slice_tail(n = 96) |> 
  dplyr::ungroup() 

train <- df[df$ds %in% setdiff(df$ds, test$ds), ]
```

## 3. Forecast with a long-horizon 

To use the long-horizon model of `TimeGPT`, set the `model` argument to `timegpt-1-long-horizon`. 

```{r}
fcst_long_horizon <- nixtlar::nixtla_client_forecast(train, h=96, model="timegpt-1-long-horizon")
head(fcst_long_horizon)
```

## 4. Plot the long-horizon forecast

`nixtlar` includes a function to plot the historical data and any output from `nixtlar::nixtla_client_forecast`, `nixtlar::nixtla_client_historic`, `nixtlar::nixtla_client_detect_anomalies` and `nixtlar::nixtla_client_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 

When using `nixtlar::nixtla_client_plot` with the output of `nixtlar::nixtla_client_detect_anomalies`, set `plot_anomalies=TRUE` to plot the anomalies. 

```{r}
nixtlar::nixtla_client_plot(train, fcst_long_horizon, max_insample_length = 200)
```

## 5. Evaluate the long-horizon model 

To evaluate the long-horizon forecast, we'll generate the same forecast with the default model of `TimeGPT`, which is `timegpt-1`, and then we'll compute and compare the [Mean Absolute Error](https://en.wikipedia.org/wiki/Mean_absolute_error) (MAE) of the two models. 

```{r}
fcst <- nixtlar::nixtla_client_forecast(train, h=96)
head(fcst)
```

We'll rename the `TimeGPT` long-horizon model to merge it with the default `TimeGPT` model. We'll then merge them with the actual values from the test set and compute the MAE. Note that in the output of the `nixtla_client_forecast` function, the `ds` column contains dates. This is because the `nixtla_client_plot` uses the dates to make the plot. However, to merge the actual values, we'll convert them to characters. 

```{r}
names(fcst_long_horizon)[which(names(fcst_long_horizon) == "TimeGPT")] <- "TimeGPT-long-horizon"

res <- merge(fcst, fcst_long_horizon) # merge TimeGPT and TimeGPT-long-horizon
res$ds <- as.character(res$ds)

res <- merge(test, res) # merge with actual values
head(res)
```

```{r}
print(paste0("MAE TimeGPT: ", mean(abs(res$y-res$TimeGPT))))
print(paste0("MAE TimeGPT long-horizon: ", mean(abs(res$y-res$`TimeGPT-long-horizon`))))
```

As we can see, the long-horizon version of `TimeGPT` produced a model with a lower MAE than the default `TimeGPT` model.