Title: | Tools to Conduct Meteorological Normalisation and Counterfactual Modelling for Air Quality Data |
---|---|
Description: | An integrated set of tools to allow data users to conduct meteorological normalisation and counterfactual modelling for air quality data. The meteorological normalisation technique uses predictive random forest models to remove variation of pollutant concentrations so trends and interventions can be explored in a robust way. For examples, see Grange et al. (2018) <doi:10.5194/acp-18-6223-2018> and Grange and Carslaw (2019) <doi:10.1016/j.scitotenv.2018.10.344>. The random forest models can also be used for counterfactual or business as usual (BAU) modelling by using the models to predict, from the model's perspective, the future. For an example, see Grange et al. (2021) <doi:10.5194/acp-2020-1171>. |
Authors: | Stuart K. Grange [cre, aut] |
Maintainer: | Stuart K. Grange <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.2.62 |
Built: | 2025-02-21 01:20:39 UTC |
Source: | https://github.com/skgrange/rmweather |
Pseudo-function to re-export magrittr's pipe.
Pseudo-function to re-export functions from the stats package.
These example data are daily means of NO2 and NOx observations at London Marylebone Road. The accompanying surface meteorological data are from London Heathrow, a major airport located 23 km west of Central London.
data_london
Tibble with 15676 observations and 11 variables. The variables are: date, date_end, site, site_name, variable, value, air_temp, atmospheric_pressure, rh, wd, and ws. The dates are in POSIXct format, the site and variable columns are characters, and all other variables are numeric.
The NO2 and NOx observations are sourced from the European Commission Air Quality e-Reporting repository which can be freely shared with acknowledgement of the source. The meteorological data are sourced from the Integrated Surface Data (ISD) database which cannot be redistributed for commercial purposes and are bound to the WMO Resolution 40 Policy.
Stuart K. Grange
# Load rmweather's example data and check
head(data_london)
These example data are derived from the observational data included in rmweather and represent meteorologically normalised NO2 concentrations at London Marylebone Road, aggregated to monthly resolution.
data_london_normalised
Tibble with 258 observations and 5 variables. The variables are: date, date_end, site, site_name, and value_predict. The dates are in POSIXct format, the site variables are characters, and value_predict is numeric.
Stuart K. Grange
# Load rmweather's meteorologically normalised example data and check
head(data_london_normalised)
Pseudo-function to re-export dplyr's common functions.
This example object was created from the observational data included in rmweather and is a random forest model returned by rmw_train_model. The forest contains only one tree to keep the file size small and is only used for the package's examples.
model_london
A ranger object, a named list with 16 elements.
Stuart K. Grange
# Load rmweather's ranger model example data and see what elements it contains
names(model_london)

# Print ranger object
print(model_london)
Function to calculate observed-predicted error statistics.
rmw_calculate_model_errors( df, value_model = "value_predict", value_observed = "value", testing_only = TRUE, as_long = FALSE )
df | Data frame with observed-predicted variables.
value_model | The modelled/predicted variable in df.
value_observed | The observed variable in df.
testing_only | Should only the testing set be used for the calculation of errors?
as_long | Should the returned tibble be in "long" format? This is useful for plotting.
Tibble.
Stuart K. Grange
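No example accompanies this entry. Below is a minimal sketch using the package's example objects; it assumes the tibble returned by rmw_predict_the_test_set contains the value and value_predict variables expected here, and it sets testing_only = FALSE because that input is already restricted to the test set.

# Load packages
library(rmweather)
library(dplyr)

# Keep the training/testing split reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Predict the withheld test set, then calculate observed-predicted error
# statistics in "long" format
rmw_predict_the_test_set(model_london, df = data_london_prepared) %>%
  rmw_calculate_model_errors(testing_only = FALSE, as_long = TRUE)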
rmw_clip clips the start and end of a time series, usually after normalisation with rmw_normalise. It helps if the random forest model behaves strangely at the beginning and end of the time series during prediction.
rmw_clip(df, seconds = 31536000/2)
df | Data frame from rmw_normalise.
seconds | Number of seconds to clip from the start and end of the time series. The default is half a year.
Data frame.
Stuart K. Grange
rmw_normalise, rmw_plot_normalised
# Clip the edges of a normalised time series, default is half a year
data_normalised_clipped <- rmw_clip(data_london_normalised)
rmw_do_all
is a user-level function to conduct the meteorological
normalisation process in one step.
rmw_do_all( df, variables, variables_sample = NA, n_trees = 300, min_node_size = 5, mtry = NULL, keep_inbag = TRUE, n_samples = 300, replace = TRUE, se = FALSE, aggregate = TRUE, n_cores = NA, verbose = FALSE )
df | Input data frame after preparation with rmw_prepare_data.
variables | Independent/explanatory variables used to predict value.
variables_sample | Variables to use for the normalisation step. If not used, the default of all variables used for training the model with the exception of date_unix will be used.
n_trees | Number of trees to grow to make up the forest.
min_node_size | Minimal node size.
mtry | Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.
keep_inbag | Should in-bag data be kept in the ranger model object? This needs to be TRUE if standard errors are to be calculated when predicting with the model.
n_samples | Number of times to sample the input data and then predict with the model.
replace | Should the sampling be done with replacement?
se | Should the standard error of the predictions be calculated too? The standard error method is the "infinitesimal jackknife for bagging" and will slow down the predictions significantly.
aggregate | Should all the n_samples predictions be aggregated?
n_cores | Number of CPU cores to use for the model calculation. Default is the system's total minus one.
verbose | Should the function give messages?
Named list.
Stuart K. Grange
rmw_prepare_data, rmw_train_model, rmw_normalise
# Load package
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Use the example data to conduct the steps needed for meteorological
# normalisation
list_normalised <- rmw_do_all(
  df = data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300,
  n_samples = 300
)
rmw_find_breakpoints will generally be applied to a data frame after rmw_normalise. rmw_find_breakpoints is rather slow.
rmw_find_breakpoints(df, h = 0.15, n = NULL)
df | Tibble from rmw_normalise.
h | Minimal segment size either given as a fraction relative to the sample size or as an integer giving the minimal number of observations in each segment.
n | Number of breaks to detect. Default is the maximum number allowed by h.
Tibble with a date variable indicating where the breakpoints are.
Stuart K. Grange
# Test for breakpoints in an example normalised time series
data_breakpoints <- rmw_find_breakpoints(data_london_normalised)
Function to train random forest models using a nested tibble.
rmw_model_nested_sets( df_nest, variables, n_trees = 10, mtry = NULL, min_node_size = 5, n_cores = NA, verbose = FALSE, progress = FALSE )
df_nest | Nested tibble created by rmw_nest_for_modelling.
variables | Independent/explanatory variables used to predict value.
n_trees | Number of trees to grow to make up the forest.
mtry | Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.
min_node_size | Minimal node size.
n_cores | Number of CPU cores to use for the model calculations.
verbose | Should the function give messages?
progress | Should a progress bar be displayed?
Nested tibble.
Stuart K. Grange
rmw_nest_for_modelling, rmw_predict_nested_sets, rmw_train_model
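This entry ships without an example. Below is a minimal sketch of the nested workflow, using the package's example data and the same explanatory variable names used elsewhere in this documentation; the small n_trees value is only to keep the sketch quick.

# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Nest the example data and train a small random forest for each nested set
data_london_models <- data_london %>%
  rmw_nest_for_modelling(by = c("site", "variable"), n = 1) %>%
  rmw_model_nested_sets(
    variables = c(
      "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
    ),
    n_trees = 10,
    verbose = TRUE
  )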
Functions to extract model statistics from a model calculated with rmw_calculate_model.
rmw_model_statistics(model)
rmw_model_importance(model, date_unix = TRUE)
model | A ranger model object from rmw_train_model.
date_unix | Should the date_unix variable be included in the returned importances?
The variable importances are defined as "the permutation importance differences of prediction errors". This measure is unit-less and the values are not useful when comparing among data sets.
Tibble.
Stuart K. Grange
# Extract statistics from the example random forest model
rmw_model_statistics(model_london)

# Extract importances from a model object
rmw_model_importance(model_london)
rmw_nest_for_modelling will resample the observations if desired, test and prepare the data (with rmw_prepare_data), and return a nested tibble ready for modelling.
rmw_nest_for_modelling( df, by = "resampled_set", n = 1, na.rm = FALSE, fraction = 0.8 )
df | Input data frame. Generally a time series of air quality data with pollutant concentrations and meteorological variables.
by | Variables within df to nest the observations by.
n | Number of resampling sets to create.
na.rm | Should missing values (NA) be removed from value?
fraction | Fraction of the observations to make up the training set.
Nested tibble.
Stuart K. Grange
rmw_prepare_data, rmw_model_nested_sets, rmw_predict_nested_sets
# Load package
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data for modelling, replicate observations twice too
data_london %>%
  rmw_nest_for_modelling(by = c("site", "variable"), n = 2)
Function to normalise a variable for "average" meteorological conditions.
rmw_normalise( model, df, variables = NA, n_samples = 300, replace = TRUE, se = FALSE, aggregate = TRUE, keep_samples = FALSE, n_cores = NA, verbose = FALSE )
model | A ranger model object from rmw_train_model.
df | Input data used to calculate model with rmw_train_model.
variables | Variables to randomly sample. Default is all variables used for training the model with the exception of date_unix.
n_samples | Number of times to sample df and then predict with the model.
replace | Should the sampling be done with replacement?
se | Should the standard error of the predictions be calculated too? The standard error method is the "infinitesimal jackknife for bagging" and will slow down the predictions significantly.
aggregate | Should all the n_samples predictions be aggregated?
keep_samples | When aggregate is FALSE, should the individual samples be kept in the return?
n_cores | Number of CPU cores to use for the model predictions. Default is the system's total minus one.
verbose | Should the function give messages and display a progress bar?
Tibble.
Stuart K. Grange
rmw_prepare_data, rmw_train_model
# Load package
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Normalise the example no2 data
data_normalised <- rmw_normalise(
  model_london,
  df = data_london_prepared,
  n_samples = 300,
  verbose = TRUE
)
Function to normalise a variable for "average" meteorological conditions in a nested tibble.
rmw_normalise_nested_sets( df_nest, variables = NA, n_samples = 10, replace = TRUE, se = FALSE, aggregate = TRUE, keep_samples = FALSE, n_cores = NA, verbose = FALSE, progress = FALSE )
df_nest | Nested tibble created by rmw_model_nested_sets.
variables | Variables to randomly sample. Default is all variables used for training the model with the exception of date_unix.
n_samples | Number of times to sample the observations and then predict with the models.
replace | Should the sampling be done with replacement?
se | Should the standard error of the predictions be calculated too? The standard error method is the "infinitesimal jackknife for bagging" and will slow down the predictions significantly.
aggregate | Should all the n_samples predictions be aggregated?
keep_samples | When aggregate is FALSE, should the individual samples be kept in the return?
n_cores | Number of CPU cores to use for the model predictions. Default is the system's total minus one.
verbose | Should the function give messages?
progress | Should a progress bar be displayed?
Nested tibble.
Stuart K. Grange
rmw_nest_for_modelling, rmw_model_nested_sets, rmw_normalise
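No example is provided for this entry. Below is a minimal sketch that continues the nested workflow shown for rmw_nest_for_modelling and rmw_model_nested_sets; the variable names and small n_trees/n_samples values are illustrative only.

# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Nest, train a small model for each set, and then normalise each set
data_london %>%
  rmw_nest_for_modelling(by = c("site", "variable"), n = 1) %>%
  rmw_model_nested_sets(
    variables = c(
      "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
    ),
    n_trees = 10
  ) %>%
  rmw_normalise_nested_sets(n_samples = 10, verbose = TRUE)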
rmw_partial_dependencies calculates partial dependencies for the variables used in a random forest model; the results can be plotted with rmw_plot_partial_dependencies. The calculation is rather slow.
rmw_partial_dependencies( model, df, variable, training_only = TRUE, resolution = NULL, n_cores = NA, verbose = FALSE )
model | A ranger model object from rmw_train_model.
df | Input data frame after preparation with rmw_prepare_data.
variable | Vector of variables to calculate partial dependencies for.
training_only | Should only the training set be used for prediction? The default is TRUE.
resolution | The number of points that should be predicted for each independent variable. If left as NULL, a default resolution will be used.
n_cores | Number of CPU cores to use for the model calculation. The default is the system's total minus one.
verbose | Should the function give messages?
Tibble.
Stuart K. Grange
# Load packages
library(dplyr)
# Ranger package needs to be loaded
library(ranger)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Calculate partial dependencies for wind speed
data_partial <- rmw_partial_dependencies(
  model = model_london,
  df = data_london_prepared,
  variable = "ws",
  verbose = TRUE
)

# Calculate partial dependencies for all independent variables used in model
data_partial <- rmw_partial_dependencies(
  model = model_london,
  df = data_london_prepared,
  variable = NA,
  verbose = TRUE
)
Function to plot random forest variable importances after training by rmw_train_model.
rmw_plot_importance(df, colour = "black")
df | Data frame created by rmw_model_importance.
colour | Colour of point and segment geometries.
ggplot2 plot with point and segment geometries.
Stuart K. Grange
rmw_train_model, rmw_model_importance
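No example accompanies this entry. A minimal sketch using the example model included in the package is shown below; the pipe is available because rmweather re-exports magrittr's pipe.

# Load package
library(rmweather)

# Extract variable importances from the example model and plot them
rmw_model_importance(model_london) %>%
  rmw_plot_importance()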
Function to plot a normalised time series after rmw_normalise. If the input data contain a standard error variable named "se", this will be plotted as a ribbon (+ and -) around the mean.
rmw_plot_normalised(df, colour = "#6B186EFF")
df | Tibble created by rmw_normalise.
colour | Colour for the line geometry.
ggplot2 plot with line and ribbon geometries.
Stuart K. Grange
# Plot normalised example data
rmw_plot_normalised(data_london_normalised)
Function to plot partial dependencies after calculation by rmw_partial_dependencies.
rmw_plot_partial_dependencies(df)
df | Tibble created by rmw_partial_dependencies.
ggplot2 plot with a point geometry.
Stuart K. Grange
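No example is given for this entry. Below is a minimal sketch that calculates partial dependencies with the package's example objects, mirroring the rmw_partial_dependencies example above, and then plots them.

# Load packages
library(rmweather)
library(dplyr)
# Ranger package needs to be loaded for the predictions
library(ranger)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Calculate partial dependencies for wind speed and plot them
rmw_partial_dependencies(
  model = model_london,
  df = data_london_prepared,
  variable = "ws"
) %>%
  rmw_plot_partial_dependencies()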
Function to plot the test set and predicted set after rmw_predict_the_test_set.
rmw_plot_test_prediction(df, bins = 30, coord_equal = TRUE)
df | Tibble created by rmw_predict_the_test_set.
bins | Numeric vector giving the number of bins in both vertical and horizontal directions.
coord_equal | Should the axes be forced to be equal?
ggplot2 plot with a hex geometry.
Stuart K. Grange
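No example is shown here; the usage mirrors the second example given under rmw_predict_the_test_set below.

# Load packages
library(rmweather)
library(dplyr)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Predict the test set, then produce a hex plot of the predictions
rmw_predict_the_test_set(model_london, df = data_london_prepared) %>%
  rmw_plot_test_prediction()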
Function to predict using a ranger random forest.
rmw_predict(model, df = NA, se = FALSE, n_cores = NULL, verbose = FALSE)
model | A ranger model object from rmw_train_model.
df | Input data to be used for predictions.
se | If TRUE, the standard error of the predictions will also be calculated and a named list will be returned.
n_cores | Number of CPU cores to use for the model predictions.
verbose | Should the function give messages?
Numeric vector or a named list containing two numeric vectors.
Stuart K. Grange
# Load package
library(dplyr)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Make a prediction with the examples
vector_prediction <- rmw_predict(
  model_london,
  df = data_london_prepared
)

# Make a prediction with standard errors too
list_prediction <- rmw_predict(
  model_london,
  df = data_london_prepared,
  se = TRUE
)
Function to calculate partial dependencies from random forest models using a nested tibble.
rmw_predict_nested_partial_dependencies( df_nest, variables = NA, n_cores = NA, training_only = TRUE, rename = FALSE, verbose = FALSE, progress = FALSE )
df_nest | Nested tibble created by rmw_model_nested_sets.
variables | Vector of variables to calculate partial dependencies for.
n_cores | Number of CPU cores to use for the model calculations.
training_only | Should only the training set be used for prediction?
rename | Within the returned partial dependencies, should the variables be renamed?
verbose | Should the function give messages?
progress | Should a progress bar be displayed?
Nested tibble.
Stuart K. Grange
rmw_nest_for_modelling, rmw_model_nested_sets, rmw_partial_dependencies
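No example accompanies this entry. Below is a minimal sketch continuing the nested workflow; the variable names and small n_trees value are illustrative only.

# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Nest, train small models, then calculate partial dependencies for wind speed
data_london %>%
  rmw_nest_for_modelling(by = c("site", "variable"), n = 1) %>%
  rmw_model_nested_sets(
    variables = c(
      "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
    ),
    n_trees = 10
  ) %>%
  rmw_predict_nested_partial_dependencies(variables = "ws")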
Function to make predictions from random forest models using a nested tibble.
rmw_predict_nested_sets( df_nest, se = FALSE, n_cores = NULL, keep_vectors = FALSE, model_errors = FALSE, as_long = TRUE, partial = FALSE, verbose = FALSE, progress = FALSE )
df_nest | Nested tibble created by rmw_model_nested_sets.
se | Should the standard error of the predictions be calculated?
n_cores | Number of CPU cores to use for the model calculations.
keep_vectors | Should the prediction vectors be kept in the return? This is usually not needed because these vectors have been added to the nested observations.
model_errors | Should model error statistics between the observed and predicted values be calculated and returned?
as_long | For when model_errors is TRUE, should the error statistics be returned in "long" format?
partial | Should the model's partial dependencies also be calculated? This will increase the execution time of the function.
verbose | Should the function give messages?
progress | Should a progress bar be displayed?
Nested tibble.
Stuart K. Grange
rmw_nest_for_modelling, rmw_model_nested_sets, rmw_predict, rmw_calculate_model_errors, rmw_partial_dependencies
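No example is provided. Below is a minimal sketch continuing the nested workflow, with model error statistics requested; the variable names and small n_trees value are illustrative only.

# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Nest, train small models, then predict each nested set and return error
# statistics between the observed and predicted values
data_london %>%
  rmw_nest_for_modelling(by = c("site", "variable"), n = 1) %>%
  rmw_model_nested_sets(
    variables = c(
      "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
    ),
    n_trees = 10
  ) %>%
  rmw_predict_nested_sets(model_errors = TRUE)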
Function to make predictions by meteorological year from random forest models using a nested tibble.
rmw_predict_nested_sets_by_year( df_nest, variables = NA, n_samples = 10, aggregate = TRUE, n_cores = NULL, verbose = FALSE )
df_nest | Nested tibble created by rmw_model_nested_sets.
variables | Variables to randomly sample. Default is all variables used for training the model with the exception of date_unix.
n_samples | Number of times to sample the observations from each meteorological year and then predict.
aggregate | Should all the n_samples predictions be aggregated?
n_cores | Number of CPU cores to use for the model calculations.
verbose | Should the function give messages?
Nested tibble.
Stuart K. Grange
rmw_nest_for_modelling, rmw_model_nested_sets
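No example accompanies this entry. Below is a minimal sketch continuing the nested workflow; the small n_trees and n_samples values are illustrative only.

# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Nest, train small models, then predict using each meteorological year's
# conditions in turn
data_london %>%
  rmw_nest_for_modelling(by = c("site", "variable"), n = 1) %>%
  rmw_model_nested_sets(
    variables = c(
      "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
    ),
    n_trees = 10
  ) %>%
  rmw_predict_nested_sets_by_year(n_samples = 2)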
rmw_predict_the_test_set uses data withheld from the training of the model and therefore can be used for investigating overfitting.
rmw_predict_the_test_set(model, df)
model | A ranger model object from rmw_train_model.
df | Input data used to calculate model.
Tibble.
Stuart K. Grange
# Load package
library(dplyr)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Use the test set for prediction
rmw_predict_the_test_set(
  model_london,
  df = data_london_prepared
)

# Predict, then produce a hex plot of the predictions
rmw_predict_the_test_set(
  model_london,
  df = data_london_prepared
) %>%
  rmw_plot_test_prediction()
rmw_prepare_data
will test and prepare a data frame for further use
with rmweather.
rmw_prepare_data( df, value = "value", na.rm = FALSE, replace = FALSE, fraction = 0.8 )
df | Input data frame. Generally a time series of air quality data with pollutant concentrations and meteorological variables.
value | Name of the dependent variable. Usually a pollutant, for example, "no2".
na.rm | Should missing values (NA) be removed from value?
replace | When adding the date variables to the set, should they replace the versions already contained in the data frame if they exist?
fraction | Fraction of the observations to make up the training set. Default is 0.8 (80 %).
rmw_prepare_data will check if a date variable is present and is of the correct data type, impute missing numeric and categorical values, randomly split the input into training and testing sets, and rename the dependent variable to "value". The date variable will also be used to calculate new variables such as date_unix, day_julian, weekday, and hour which can be used as independent variables. These attributes are needed for other rmweather functions to operate. Use set.seed in an R session to keep results reproducible.
Tibble, the input data transformed ready for modelling with rmweather.
Stuart K. Grange
set.seed, rmw_train_model, rmw_normalise
# Load package
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data for modelling, only use no2 data here
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()
Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables.
rmw_train_model( df, variables, n_trees = 300, mtry = NULL, min_node_size = 5, keep_inbag = TRUE, n_cores = NA, verbose = FALSE )
df | Input tibble after preparation with rmw_prepare_data.
variables | Independent/explanatory variables used to predict value.
n_trees | Number of trees to grow to make up the forest.
mtry | Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.
min_node_size | Minimal node size.
keep_inbag | Should in-bag data be kept in the ranger model object? This needs to be TRUE if standard errors are to be calculated when predicting with the model.
n_cores | Number of CPU cores to use for the model calculation. Default is the system's total minus one.
verbose | Should the function give messages?
A ranger model object, a named list.
Stuart K. Grange
rmw_prepare_data, rmw_normalise
# Load package
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Calculate a model using common meteorological and time variables
model <- rmw_train_model(
  data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300
)
Function to return the system's number of CPU cores.
system_cpu_core_count(logical_cores = TRUE, max_cores = NA)
logical_cores | Should logical cores be included in the core count?
max_cores | Should the return have a maximum value? This can be useful when there are very many cores and logic is being built.
Stuart K. Grange
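No example is given. A minimal sketch of the two documented arguments follows; the cap of four cores is an arbitrary illustrative value.

# Load package
library(rmweather)

# Count the system's CPU cores, including logical cores
system_cpu_core_count()

# Count only physical cores and cap the return at four
system_cpu_core_count(logical_cores = FALSE, max_cores = 4)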
Function to get the weekday number from a date where 1 is Monday and 7 is Sunday.
wday_monday(x, as.factor = FALSE)
x | Date vector.
as.factor | Should the return be a factor?
Numeric vector.
Stuart K. Grange
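No example is provided. A minimal sketch follows; the chosen dates are illustrative (2019-01-07 was a Monday and 2019-01-13 a Sunday).

# Load package
library(rmweather)

# Get weekday numbers where 1 is Monday and 7 is Sunday
wday_monday(as.Date(c("2019-01-07", "2019-01-13")))

# Return a factor rather than a numeric vector
wday_monday(as.Date("2019-01-07"), as.factor = TRUE)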
Squash the global variable notes when building a package.