Introduction to hydroroute

Julia Haider

Introduction

The method developed by Greimel et al. (2016) detects and characterizes sub-daily flow fluctuations and is implemented in the R package hydropeak (available on CRAN). Based on the events detected by the method implemented in package hydropeak, hydroroute identifies associated events in hydrographs from neighboring gauging stations and models the translation and retention processes between them (Greimel et al., 2022).

This vignette presents the main function peaktrace() which given the events and relation information between gauging stations determines the associated events and based on these, estimates predictive models to trace initial values specified for the relevant metrics across the neighboring gauging stations. First, an overview on the input data required is given and the additional function get_lag() and variants thereof are presented which estimate the mean translation time of the hydrographs between adjacent gauging stations, if not already available. Then the individual function estimate_AE() is presented which is used by peaktrace() to identify the associated events (AEs). Finally the application of peaktrace() is illustrated and it is shown how the return value can be inspected.

Input data

Several data files are required to perform the analysis.

Dataset Q

If the mean translation time between hydrographs at neighboring gauging stations is not given or known, the raw dataset Q, containing (equispaced) date-time values and the corresponding flow fluctuations, is needed. Based on these data the mean translation time between hydrographs at neighboring gauging stations may be estimated using get_lag(). The dataset Q needs to contain three variables:

  1. ID Character string which refers to the identifier of the gauging station (in Austria: HZBCODE).
  2. Time Character string with date-time information of measurements.
  3. Q Numeric, flow measurements.

The combination of ID and Time must be unique. Functions that use the dataset Q assume that these variables are contained in this order with exactly these names. If this is not the case, i.e., if the columns have different names or are not in this order, the order can be specified in these functions with argument cols. The columns are then renamed internally to make the data processable. Also the date-time format must be specified if it is different from the function’s default format used.

The following code loads the sample dataset Q and shows the first few rows:

Q_path <- system.file("testdata", "Q.csv", package = "hydroroute")
Q <- read.csv(Q_path)
head(Q)
#>       ID                Time    Q
#> 1 200000 2014-01-01 00:00:00 4.07
#> 2 200000 2014-01-01 00:15:00 4.06
#> 3 200000 2014-01-01 00:30:00 4.05
#> 4 200000 2014-01-01 00:45:00 4.04
#> 5 200000 2014-01-01 01:00:00 4.03
#> 6 200000 2014-01-01 01:15:00 4.01

The sample dataset Q is a data frame with 22656 rows and 3 variables as described above. The station ID’s along the flow path are 100000, 200000, 300000, 400000. The time ranges from "2014-01-01 00:00:00" to "2014-02-28 23:45:00".

Dataset relation

The dataset relation provides information about the gauging stations of neighboring hydrographs. It contains:

  1. ID Character string which refers to the identifier of the gauging station (in Austria: HZBCODE).
  2. Type Character string which characterizes the source hydrograph (Turbine flow, Gauge, Basin outflow).
  3. Station Character string which indicates the order of the n hydrographs in relation in downstream direction (Si with i = 1, ..., n).
  4. fkm Numeric, position of hydrograph in km relative to the source.
  5. LAG Character string which contains the cumulative mean translation time (or estimated cumulative lag) between the source and a specific gauging station in the format HH:MM. For S1 this is either indicated as missing (NA) or always given as 00:00. It is either already provided in relation or can be estimated from the corresponding dataset Q with get_lag().

The following code loads an example dataset:

relation_path <- system.file("testdata", "relation.csv", package = "hydroroute")
relation <- read.csv(relation_path)
relation
#>       ID  Type Station   fkm   LAG
#> 1 100000 Gauge      S1  0.01 00:00
#> 2 200000 Gauge      S2  9.16 01:00
#> 3 300000 Gauge      S3 17.02 02:15
#> 4 400000 Gauge      S4 29.27 03:30

The dataset relation contains 4 adjacent gauging stations.

Event files

The output files from hydropeak’s get_events_*() function are used to identify AEs. The naming scheme of the output files is ID_event-type_date-time-from_date-time-to.csv. Event types are defined as follows:

The most important event types for the following analysis are 2 (increasing event; IC) and 4 (decreasing event; DC).

Package hydroroute includes 8 sample Event files for each gauging station ID contained in the sample dataset Q and event type 2 (IC) and 4 (DC) between "2014-01-01 00:00:00" and "2014-02-28 23:45:00". The increasing events for the station with ID 100000 are thus loaded using:

Sx <- system.file("testdata", "Events", "100000_2_2014-01-01_2014-02-28.csv",
                  package = "hydroroute")
Sx <- read.csv(Sx)
head(Sx)
#>       ID EVENT_TYPE                Time   AMP  MAFR       MEFR DUR     RATIO
#> 1 100000          2 2014-01-01 00:15:00  0.01  0.01 0.01000000   1  1.003546
#> 2 100000          2 2014-01-01 01:00:00 27.07 12.20 6.76750000   4 10.565371
#> 3 100000          2 2014-01-01 10:15:00 29.38 15.50 7.34500000   4 11.801471
#> 4 100000          2 2014-01-01 13:30:00  0.16  0.03 0.02666667   6  1.060837
#> 5 100000          2 2014-01-01 17:30:00 30.91 16.90 7.72750000   4 12.490706
#> 6 100000          2 2014-01-01 20:45:00  0.02  0.01 0.01000000   2  1.007812

Get mean translation time between hydrographs with get_lag()

For the identification of AEs, the translation time between neighboring hydrographs and the event amplitude have to be considered. For the first criterion, the mean translation time (LAG) between hydrographs has to be estimated and the cumulative values appended to the relation data for further processing, if not available yet.

Function get_lag_file() uses:

If the argument save is TRUE, the relation data with appended LAG column is written to a file specified in outfile. If a LAG column already exists, argument overwrite has to be set to TRUE to overwrite the existing column. The function can be applied to several relation files by iterating over file paths or if a single Q data file is available, get_lag_dir() can be used. relation files can be selected from a directory using regular expressions (argument relation_pattern).

The following code shows this for single file names:

Q_file <- system.file("testdata", "Q.csv", package = "hydroroute")
relation <- system.file("testdata", "relation.csv", package = "hydroroute")
(get_lag_file(Q_file, relation, inputsep = ",", format = "%Y-%m-%d %H:%M",
              save = FALSE, overwrite = TRUE))
#>       ID  Type Station   fkm   LAG
#> 1 100000 Gauge      S1  0.01 00:00
#> 2 200000 Gauge      S2  9.16 01:15
#> 3 300000 Gauge      S3 17.02 02:45
#> 4 400000 Gauge      S4 29.27 04:00

This code indicates the use with get_lag_dir() where the directory is specified:

Q_file <- system.file("testdata", "Q.csv", package = "hydroroute")
relations_path <- file.path(tempdir(), "testdata")
dir.create(relations_path)
file.copy(list.files(system.file("testdata", package = "hydroroute"),
                     full.name = TRUE),
          relations_path, recursive = FALSE, copy.mode = TRUE)
#> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
(get_lag_dir(Q_file, relations_path, inputsep = ",",
             inputdec = ".", format = "%Y-%m-%d %H:%M", overwrite = TRUE))
#> [[1]]
#>       ID  Type Station   fkm   LAG
#> 1 100000 Gauge      S1  0.01 00:00
#> 2 200000 Gauge      S2  9.16 01:15
#> 3 300000 Gauge      S3 17.02 02:45
#> 4 400000 Gauge      S4 29.27 04:00

Estimate settings with estimate_AE()

Greimel et al. (2022) propose the following algorithm to identify AEs:

“For every event x at the upstream hydrograph, the mean translation time between the neighboring hydrographs (calculated by an autocorrelation analysis) is subtracted from the downstream hydrograph. This then captures several events from the downstream hydrograph within a time slot +/- the translation time. Among those matches only those events are retained where the relative difference in amplitude is +/- one. In the following, a potential AE is the event y detected with the smallest time difference to x after accounting for the mean translation time meeting the time and amplitude criteria. […]

The relative difference in amplitude is then determined for these events, and parabolas are fitted to the histograms obtained for the relative difference data binned into intervals from -1 to 1 with a width 0.1 by fixing the vertex at the inner maximum of the histogram […]. The width of the parabola is determined by minimizing the average squared distances between the parabola and the histogram data along arbitrary symmetric ranges from the inner maximum. Based on the fitted parabola, cut points with the x-axis are determined so that only those potential AEs whose relative difference is within these cut points are retained. If this automatic scheme does not succeed in determining suitable cut points, e.g., because the estimated cut points are outside the defined intervals, a strict criterion for the relative difference in amplitude is imposed to identify AEs considering only deviations of at most 10%.”

estimate_AE() estimates suitable settings for the amplitude based on the method developed in Greimel et al. (2022) based on potential associated events identified using the specified time and metric deviations allowed for a match.

It performs this procedure for two neighboring hydrographs, i.e., it takes a subset of relation and the two corresponding Event files as input. The gauging station IDs in the subset of relation and in the Event files must match. Suitable settings for the amplitude are estimated as follows:

Note that the metric flow ratio (RATIO) does not make sense for S1 if the hydrograph is not of type Gauge. So metric RATIO is set to NA internally in this case.

The following code shows this procedure for two Event files:

# file paths
Sx <- system.file("testdata", "Events", "100000_2_2014-01-01_2014-02-28.csv",
                  package = "hydroroute")
Sy <- system.file("testdata", "Events", "200000_2_2014-01-01_2014-02-28.csv",
                  package = "hydroroute")
relation <- system.file("testdata", "relation.csv", package = "hydroroute")

# read data
Sx <- utils::read.csv(Sx)
Sy <- utils::read.csv(Sy)
relation <- utils::read.csv(relation)
relation <- relation[1:2, ]

# estimate AE, exact time matches
results <- estimate_AE(Sx, Sy, relation, timeLag = c(0, 1, 0))
results$settings
#>   station.x station.y bound lag    metric
#> 1        S1        S2 lower   0 0.9286046
#> 2        S1        S2 inner   1 1.0500000
#> 3        S1        S2 upper   0 1.1713954

Column bound represents the lower, inner and upper bounds that are used to subset potential AEs. lag represents the time lag. Only exact matches are used in this examples, which is specified by argument timeLag = c(0, 1, 0), which refers to 0 deviation at the lower and upper bound and 1 at the inner bound, meaning, that the mean translation time from relation is not altered when time matches are computed.

results$plot_threshold

head(results$real_AE)
#>     id.x event_type.x              time.x amp.x mafr.x   mefr.x dur.x   ratio.x
#> 1 100000            2 2014-01-11 17:15:00 39.01   20.4 13.00333     3 12.177650
#> 2 100000            2 2014-01-14 12:30:00 15.00    6.0  2.50000     6  1.262238
#> 3 100000            2 2014-01-15 06:15:00 77.14   38.3 12.85667     6 27.060811
#> 4 100000            2 2014-01-15 20:15:00 10.20    5.1  2.55000     4  1.312883
#> 5 100000            2 2014-01-19 06:45:00 74.41   23.7  9.30125     8 30.883534
#> 6 100000            2 2014-01-27 15:45:00  0.10    0.1  0.10000     1  1.001391
#>   station.x   id.y event_type.y              time.y amp.y mafr.y mefr.y dur.y
#> 1        S1 200000            2 2014-01-11 18:15:00 39.06   20.3  6.510     6
#> 2        S1 200000            2 2014-01-14 13:30:00 14.40    5.3  2.400     6
#> 3        S1 200000            2 2014-01-15 07:15:00 77.49   33.2 15.498     5
#> 4        S1 200000            2 2014-01-15 21:15:00 10.20    4.0  2.040     5
#> 5        S1 200000            2 2014-01-19 07:45:00 76.36   28.9  9.545     8
#> 6        S1 200000            2 2014-01-27 16:45:00  0.10    0.1  0.100     1
#>     ratio.y station.y   diff_metric
#> 1  7.804878        S2  1.281723e-03
#> 2  1.231140        S2 -4.000000e-02
#> 3 14.111675        S2  4.537205e-03
#> 4  1.280992        S2  0.000000e+00
#> 5 16.457490        S2  2.620616e-02
#> 6  1.001311        S2  1.469658e-13

Combine everything with peaktrace()

Function peaktrace combines the identification of potential AEs and the estimation of suitable amplitude settings for a whole river section as specified in a relation file. In addition, the flow metrics of the AEs are pictured by scatter plots and the translation and retention process between the hydrographs is described by linear models.

In the following, the input arguments of function peaktrace() are described:

initial_values_path <- system.file("testdata", "initial_value_routing.csv",
                                   package = "hydroroute")
initials <- read.csv(initial_values_path)
initials
#>    Station Metric    Value   Name
#> 1       S1    AMP 93.00000  Q_max
#> 2       S1    AMP 69.75000 Q_0.75
#> 3       S1    AMP 46.50000  Q_0.5
#> 4       S1    AMP 23.25000 Q_0.25
#> 5       S1   MAFR 93.00000  Q_max
#> 6       S1   MAFR 69.75000 Q_0.75
#> 7       S1   MAFR 46.50000  Q_0.5
#> 8       S1   MAFR 23.25000 Q_0.25
#> 9       S1   MEFR 93.00000  Q_max
#> 10      S1   MEFR 69.75000 Q_0.75
#> 11      S1   MEFR 46.50000  Q_0.5
#> 12      S1   MEFR 23.25000 Q_0.25
#> 13      S1    DUR  1.00000      N
#> 14      S2  RATIO 19.90099    Max

The columns must be identical to this example. The content may vary. The initial values used for prediction must not contain any missing values.

The return value of function peaktrace() is structured as follows:

relation_path <- system.file("testdata", "relation.csv", package = "hydroroute")
events_path <- system.file("testdata", "Events", package = "hydroroute")
initial_values_path <- system.file("testdata", "initial_value_routing.csv",
                                   package = "hydroroute")
res <- peaktrace(relation_path, events_path, initial_values_path)

The first list object refers to event type 2 (IC event).

res$`2`$settings
#>   station.x station.y bound lag    metric
#> 1        S1        S2 lower   1 0.7388856
#> 2        S1        S2 inner   1 0.9500000
#> 3        S1        S2 upper   1 1.1611144
#> 4        S2        S3 lower   1 0.8818457
#> 5        S2        S3 inner   1 1.0500000
#> 6        S2        S3 upper   1 1.2181543
#> 7        S3        S4 lower   1 0.8267117
#> 8        S3        S4 inner   1 0.9500000
#> 9        S3        S4 upper   1 1.0732883

The settings data frame contains the estimated time lag and metric settings computed with estimate_AE(). In this example with 4 stations, it contains nine rows where three rows describe the relation between two neighboring stations. Since only exact time matches were allowed, the lag values are 0 for lower and upper bound and 1 for inner. metric contains the range of relative values of the amplitude allowed.
Events, where the relative difference in amplitude and the relative difference in time are within these settings, are considered as “real” AE and therefore events caused by disruptive factors are excluded as far as possible.

The following plot shows the histograms obtained for the relative differences in amplitude for each pair of neighboring gauging stations binned into intervals from -1 to 1 of width 0.1. The dashed line shows the fitted parabola and the cut points of the parabola with the x-axis are indicated. Potential AEs where the relative difference is within these cut points are considered as “real” AEs.

grid::grid.draw(res$`2`$plot_threshold)

The “real” AEs can be inspected using:

head(res$`2`$real_AE)
#>     id.x event_type.x              time.x amp.x mafr.x   mefr.x dur.x   ratio.x
#> 1 100000            2 2014-01-01 01:00:00 27.07   12.2 6.767500     4 10.565371
#> 2 100000            2 2014-01-01 10:15:00 29.38   15.5 7.345000     4 11.801471
#> 3 100000            2 2014-01-01 17:30:00 30.91   16.9 7.727500     4 12.490706
#> 4 100000            2 2014-01-02 17:30:00 26.88   14.1 4.480000     6  8.636364
#> 5 100000            2 2014-01-03 05:00:00 27.34   12.9 9.113333     3 10.559441
#> 6 100000            2 2014-01-03 16:00:00 27.16   14.8 3.017778     9 10.912409
#>   station.x   id.y event_type.y              time.y amp.y mafr.y   mefr.y dur.y
#> 1        S1 200000            2 2014-01-01 02:15:00 25.83  13.11 5.166000     5
#> 2        S1 200000            2 2014-01-01 11:30:00 28.49  14.54 5.698000     5
#> 3        S1 200000            2 2014-01-01 18:45:00 29.15  14.98 7.287500     4
#> 4        S1 200000            2 2014-01-02 19:00:00 26.43  11.49 3.303750     8
#> 5        S1 200000            2 2014-01-03 06:15:00 26.69  13.70 5.338000     5
#> 6        S1 200000            2 2014-01-03 17:30:00 27.03  12.20 3.861429     7
#>    ratio.y station.y  diff_metric
#> 1 7.506297        S2 -0.045807167
#> 2 8.679245        S2 -0.030292716
#> 3 8.379747        S2 -0.056939502
#> 4 6.427105        S2 -0.016741071
#> 5 7.655860        S2 -0.023774689
#> 6 7.808564        S2 -0.004786451

The scatter plots of the metrics at the neighboring gauging stations for the “real” AEs are contained in plot_scatter. The scatter plots are arranged in a grid where each row contains scatter plots for a specific metric and each colums contains a different pair of neighboring gauging stations. The x-axis is the upstream hydrograph Sx, the y-axis is the downstream hydrograph Sy. A linear regression line and the corresponding \(R^2\) value are added to each plot. By default the aspect ratio is fixed and the axis limits are equal within each plot.

grid::grid.draw(res$`2`$plot_scatter)

The fitted regression models may also be inspected:

res$`2`$models
#>    station.x station.y metric type formula (Intercept)         x   n         r2
#> 1         S1        S2    amp   lm   y ~ x  -0.5542922 1.0194716 164 0.99658281
#> 2         S1        S2    dur   lm   y ~ x   1.6520198 0.8453943 164 0.64266715
#> 3         S1        S2   mafr   lm   y ~ x  -0.3694711 1.0066784 164 0.91454377
#> 4         S1        S2   mefr   lm   y ~ x   0.4813008 0.7150873 164 0.70223571
#> 5         S2        S3    amp   lm   y ~ x   0.7597112 0.9782935 100 0.99765823
#> 6         S2        S3    dur   lm   y ~ x   0.7959971 0.7868426 100 0.70537848
#> 7         S2        S3   mafr   lm   y ~ x  -1.0932362 1.5764049 100 0.94356887
#> 8         S2        S3   mefr   lm   y ~ x   0.5505522 1.0358693 100 0.76774263
#> 9         S2        S3  ratio   lm   y ~ x   0.5890385 0.6172956 100 0.97341252
#> 10        S3        S4    amp   lm   y ~ x  -0.1251017 0.9485448  30 0.98810523
#> 11        S3        S4    dur   lm   y ~ x   3.0299519 0.4642539  30 0.09809454
#> 12        S3        S4   mafr   lm   y ~ x   3.0964565 0.5820430  30 0.87656463
#> 13        S3        S4   mefr   lm   y ~ x   2.2653366 0.7352544  30 0.56098791
#> 14        S3        S4  ratio   lm   y ~ x   1.2112368 0.3257644  30 0.44545302

The models data frame contains the fitted (linear) models for each pair of neighboring stations and each metric.

Finally the fitted regression models are used to predict the values of the metrics along the longitudinal flow path given the initial values:

gridExtra::grid.arrange(grobs = res$`2`$plot_predict$grobs, nrow = 3, ncol = 2)

If a file with initial values is passed to peaktrace(), predictions along the longitudinal flow path are made and visualized in a plot. Each line in the plot represents a different scenario, e.g., the uppermost solid lines for AMP, MAFR and MEFR represent the values of Q_max in the initial file. Starting from these initial values, predictions are made with the corresponding models, e.g., the first value of the initial valus file is passed to the model that describes the relationship of AMP between S1 and S2 to predict the value at S2.

The initial value and the first fitted model are:

initials[1, ]
#>   Station Metric Value  Name
#> 1      S1    AMP    93 Q_max
res$`2`$models[1, ]
#>   station.x station.y metric type formula (Intercept)        x   n        r2
#> 1        S1        S2    amp   lm   y ~ x  -0.5542922 1.019472 164 0.9965828

The resulting predicted value is then passed to the next model along the flow path, i.e., the model of S2 and S3 to predict the value at S3.

res$`2`$models[6, ]
#>   station.x station.y metric type formula (Intercept)         x   n        r2
#> 6        S2        S3    dur   lm   y ~ x   0.7959971 0.7868426 100 0.7053785

Finally, this predicted value is passed to the last model along this river section: the model between S3 and S4 to predict the value at S4.

res$`2`$models[11, ]
#>    station.x station.y metric type formula (Intercept)         x  n         r2
#> 11        S3        S4    dur   lm   y ~ x    3.029952 0.4642539 30 0.09809454

This procedure is repeated for all metrics according to the initial values file. Note that in this initial values file metric RATIO starts at station S2. Therefore the first value of RATIO to predict is at station S3.

The second list object refers to event type 4 (DC event). The nested objects of this event type are shown below.

res$`4`$settings
#>   station.x station.y bound lag    metric
#> 1        S1        S2 lower   1 0.8168386
#> 2        S1        S2 inner   1 0.9500000
#> 3        S1        S2 upper   1 1.0831614
#> 4        S2        S3 lower   1 0.9202873
#> 5        S2        S3 inner   1 1.0500000
#> 6        S2        S3 upper   1 1.1797127
#> 7        S3        S4 lower   1 0.5814461
#> 8        S3        S4 inner   1 0.8500000
#> 9        S3        S4 upper   1 1.1185539
head(res$`4`$real_AE)
#>     id.x event_type.x              time.x amp.x mafr.x   mefr.x dur.x   ratio.x
#> 1 100000            4 2014-01-01 02:00:00 27.07  13.40 3.867143     7 10.565371
#> 2 100000            4 2014-01-01 11:15:00 29.47  13.90 3.274444     9 12.205323
#> 3 100000            4 2014-01-01 18:30:00 31.04  16.00 3.448889     9 13.125000
#> 4 100000            4 2014-01-02 07:45:00 27.44  13.98 6.860000     4 12.154472
#> 5 100000            4 2014-01-02 20:15:00 27.37  14.20 3.421250     8 10.033003
#> 6 100000            4 2014-01-03 18:15:00 26.87  13.90 5.374000     5  9.867987
#>   station.x   id.y event_type.y              time.y amp.y mafr.y    mefr.y
#> 1        S1 200000            4 2014-01-01 03:30:00 26.09    6.5 0.8153125
#> 2        S1 200000            4 2014-01-01 12:45:00 28.09    6.5 1.7556250
#> 3        S1 200000            4 2014-01-01 19:45:00 29.50    7.1 1.2826087
#> 4        S1 200000            4 2014-01-02 08:45:00 26.37    6.2 1.6481250
#> 5        S1 200000            4 2014-01-02 21:00:00 27.26    6.5 0.8793548
#> 6        S1 200000            4 2014-01-03 19:15:00 25.84    6.4 1.9876923
#>   dur.y  ratio.y station.y  diff_metric
#> 1    32 8.032345        S2 -0.036202438
#> 2    16 7.834550        S2 -0.046827282
#> 3    23 9.194444        S2 -0.049613402
#> 4    16 8.069705        S2 -0.038994169
#> 5    31 7.747525        S2 -0.004018999
#> 6    13 6.007752        S2 -0.038332713
grid::grid.draw(res$`4`$plot_threshold)

grid::grid.draw(res$`4`$plot_scatter)

res$`4`$models
#>    station.x station.y metric type formula (Intercept)         x   n        r2
#> 1         S1        S2    amp   lm   y ~ x  -0.6331382 1.0230064 159 0.9976993
#> 2         S1        S2    dur   lm   y ~ x   3.4889475 1.1941344 159 0.5660323
#> 3         S1        S2   mafr   lm   y ~ x   0.3670848 0.6093259 159 0.8288550
#> 4         S1        S2   mefr   lm   y ~ x   0.3152232 0.5218025 159 0.6858269
#> 5         S2        S3    amp   lm   y ~ x   0.8889574 0.9860819  89 0.9960296
#> 6         S2        S3    dur   lm   y ~ x   2.5702522 0.6747927  89 0.8660458
#> 7         S2        S3   mafr   lm   y ~ x   0.8424465 0.8700683  89 0.9305420
#> 8         S2        S3   mefr   lm   y ~ x   0.9748893 0.6441977  89 0.7688562
#> 9         S2        S3  ratio   lm   y ~ x   0.6994949 0.5659408  89 0.9776473
#> 10        S3        S4    amp   lm   y ~ x  -0.4977512 0.8877421  50 0.9569848
#> 11        S3        S4    dur   lm   y ~ x   1.6421232 0.6351157  50 0.7096118
#> 12        S3        S4   mafr   lm   y ~ x  -0.2972899 0.8213848  50 0.5844032
#> 13        S3        S4   mefr   lm   y ~ x   0.2811406 1.1317973  50 0.2960442
#> 14        S3        S4  ratio   lm   y ~ x   0.8553193 0.3178922  50 0.7922559
gridExtra::grid.arrange(grobs = res$`4`$plot_predict$grobs, nrow = 3, ncol = 2)

Extract AEs and perform routing with existing settings

If estimated settings are already available, it is possible to use the settings directly to extract “real” AEs from the Event data. For such an analysis, a path to a relation file must be provided, as well as a path to the Event data and paths to settings and initial values.

Extract AEs

The following code shows the extraction of “real” AEs based on the settings file which is included in the package after having been generated with estimate_AE().

relation_path <- system.file("testdata", "relation.csv", package = "hydroroute")
events_path <- system.file("testdata", "Events", package = "hydroroute")
settings_path <- system.file("testdata", "Q_event_2_AMP-LAG_aut_settings.csv",
                             package = "hydroroute")
initials_path <- system.file("testdata", "initial_value_routing.csv",
                             package = "hydroroute")
real_AE <- extract_AE(relation_path, events_path, settings_path)
head(real_AE)
#>     id.x event_type.x              time.x amp.x mafr.x   mefr.x dur.x   ratio.x
#> 1 100000            2 2014-01-01 01:00:00 27.07   12.2 6.767500     4 10.565371
#> 2 100000            2 2014-01-01 10:15:00 29.38   15.5 7.345000     4 11.801471
#> 3 100000            2 2014-01-01 17:30:00 30.91   16.9 7.727500     4 12.490706
#> 4 100000            2 2014-01-02 17:30:00 26.88   14.1 4.480000     6  8.636364
#> 5 100000            2 2014-01-03 05:00:00 27.34   12.9 9.113333     3 10.559441
#> 6 100000            2 2014-01-03 16:00:00 27.16   14.8 3.017778     9 10.912409
#>   station.x   id.y event_type.y              time.y amp.y mafr.y   mefr.y dur.y
#> 1        S1 200000            2 2014-01-01 02:15:00 25.83  13.11 5.166000     5
#> 2        S1 200000            2 2014-01-01 11:30:00 28.49  14.54 5.698000     5
#> 3        S1 200000            2 2014-01-01 18:45:00 29.15  14.98 7.287500     4
#> 4        S1 200000            2 2014-01-02 19:00:00 26.43  11.49 3.303750     8
#> 5        S1 200000            2 2014-01-03 06:15:00 26.69  13.70 5.338000     5
#> 6        S1 200000            2 2014-01-03 17:30:00 27.03  12.20 3.861429     7
#>    ratio.y station.y  diff_metric
#> 1 7.506297        S2 -0.045807167
#> 2 8.679245        S2 -0.030292716
#> 3 8.379747        S2 -0.056939502
#> 4 6.427105        S2 -0.016741071
#> 5 7.655860        S2 -0.023774689
#> 6 7.808564        S2 -0.004786451

Routing

With the extracted “real” AEs, the routing procedure can be performed to describe the translation and retention processes between neighboring hydrographs.

Therefore, the output from extract_AE() (or similar, the output $real_AE from estimate_AE()), the initial values data frame and the relation data frame have to be passed to function routing().

relation <- utils::read.csv(relation_path)
initials <- utils::read.csv(initials_path)
res <- routing(real_AE, initials, relation)

This produces the same scatter plot as before when peaktrace() was called as the events and the settings are the same.

grid::grid.draw(res$plot_scatter)

res$models
#>    station.x station.y metric type formula (Intercept)         x   n         r2
#> 1         S1        S2    amp   lm   y ~ x  -0.5542922 1.0194716 164 0.99658281
#> 2         S1        S2    dur   lm   y ~ x   1.6520198 0.8453943 164 0.64266715
#> 3         S1        S2   mafr   lm   y ~ x  -0.3694711 1.0066784 164 0.91454377
#> 4         S1        S2   mefr   lm   y ~ x   0.4813008 0.7150873 164 0.70223571
#> 5         S2        S3    amp   lm   y ~ x   0.7597112 0.9782935 100 0.99765823
#> 6         S2        S3    dur   lm   y ~ x   0.7959971 0.7868426 100 0.70537848
#> 7         S2        S3   mafr   lm   y ~ x  -1.0932362 1.5764049 100 0.94356887
#> 8         S2        S3   mefr   lm   y ~ x   0.5505522 1.0358693 100 0.76774263
#> 9         S2        S3  ratio   lm   y ~ x   0.5890385 0.6172956 100 0.97341252
#> 10        S3        S4    amp   lm   y ~ x  -0.1251017 0.9485448  30 0.98810523
#> 11        S3        S4    dur   lm   y ~ x   3.0299519 0.4642539  30 0.09809454
#> 12        S3        S4   mafr   lm   y ~ x   3.0964565 0.5820430  30 0.87656463
#> 13        S3        S4   mefr   lm   y ~ x   2.2653366 0.7352544  30 0.56098791
#> 14        S3        S4  ratio   lm   y ~ x   1.2112368 0.3257644  30 0.44545302
gridExtra::grid.arrange(grobs = res$plot_predict$grobs, nrow = 3, ncol = 2)