Package 'RLT' reference manual

Title:	Reinforcement Learning Trees
Description:	Random forest with a variety of additional features for regression, classification, survival analysis and graphical models. New features include parallel computing with OpenMP, reproducibility with random seeds, variance and confidence band estimation, embedded models for selecting splitting variables and constructing linear combination splits, observation and variable weights, setting and tracking subjects used in each tree, etc.
Authors:	Ruoqing Zhu [aut, cre, cph] , Sarah Formentini [aut], Haowen Zhou [ctb], Tianning Xu [ctb], Zhechao Huang [ctb]
Maintainer:	Ruoqing Zhu <[email protected]>
License:	GPL (>= 2)
Version:	4.2.6
Built:	2025-03-08 03:30:53 UTC
Source:	https://github.com/teazrq/rlt

C-index

Description

Calculate c-index for survival data

Usage

cindex(y, censor, pred)

Arguments

`y`	survival time
`censor`	The censoring indicator if survival model is used
`pred`	the predicted value for each subject

Value

c-index
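
To make the quantity concrete, here is a minimal base-R sketch of Harrell's concordance index. It is not the package's implementation. The function name `cindex_sketch` is hypothetical, and two conventions are assumed rather than taken from this manual: `censor == 1` marks an observed failure, and a larger `pred` means higher risk (shorter expected survival).

```r
# Sketch of Harrell's concordance index in base R.
# Assumptions (not confirmed by the manual): censor == 1 marks an observed
# failure, and a larger `pred` means higher risk.
cindex_sketch <- function(y, censor, pred) {
  n <- length(y)
  concordant <- 0
  comparable <- 0
  for (i in seq_len(n)) {
    if (censor[i] != 1) next            # subject i must have an observed failure
    for (j in seq_len(n)) {
      if (y[i] < y[j]) {                # i failed before j's observed time
        comparable <- comparable + 1
        if (pred[i] > pred[j]) {
          concordant <- concordant + 1
        } else if (pred[i] == pred[j]) {
          concordant <- concordant + 0.5  # tied predictions count as half
        }
      }
    }
  }
  concordant / comparable
}

y      <- c(2, 4, 6, 8)
censor <- c(1, 1, 0, 1)
risk   <- c(0.9, 0.7, 0.4, 0.1)   # risk decreases as survival time increases

perfect  <- cindex_sketch(y, censor, risk)    # every comparable pair is ordered correctly
reversed <- cindex_sketch(y, censor, -risk)   # every comparable pair is ordered wrongly
```

A value near 1 indicates that predicted risk ranks subjects in the same order as their failure times, and 0.5 corresponds to random ranking.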

random forest kernel

Description

Get the random forest induced kernel weight matrix of testing samples or between any two sets of data. This is an experimental feature. Use at your own risk.

Usage

forest.kernel(
  object,
  X1 = NULL,
  X2 = NULL,
  vs.train = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`object`	A fitted RLT object.
`X1`	The dataset for prediction. This calculates an $n_1 \times n_1$ kernel matrix of `X1`.
`X2`	The dataset for reference/training. If `X2` is supplied, then calculate an $n_1 \times n_2$ kernel matrix. If `vs.train` is used, then this must be the original training data.
`vs.train`	Whether to calculate the kernel weights with respect to the training data. This is slightly different from supplying the training data to `X2` because of the re-sampling in the training process. To use this feature, you must specify `resample.track = TRUE` in `param.control` when fitting the forest.
`verbose`	Whether fitting information should be printed.
`...`	Additional arguments.

Value

A kernel matrix that contains kernel weights for each observation in `X1` with respect to `X1`, or with respect to `X2` when `X2` is supplied.
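
The idea behind a forest-induced kernel can be sketched in base R: two observations receive weight in proportion to how often they land in the same terminal node. The sketch below is an illustration only. The helper `kernel_sketch` and its terminal-node input matrices are hypothetical, and the package may scale the weights differently (for example as counts) and, with `vs.train`, additionally account for the resampling of each tree.

```r
# nodes1[i, b] is the terminal node id of observation i in tree b
# (hypothetical input; the package computes this internally).
kernel_sketch <- function(nodes1, nodes2 = nodes1) {
  stopifnot(ncol(nodes1) == ncol(nodes2))
  K <- matrix(0, nrow(nodes1), nrow(nodes2))
  for (b in seq_len(ncol(nodes1))) {
    # add 1 wherever the two observations share a terminal node in tree b
    K <- K + outer(nodes1[, b], nodes2[, b], "==")
  }
  K / ncol(nodes1)   # proportion of trees with a shared terminal node
}

# Three observations, two trees
nodes <- cbind(c(1, 1, 2), c(1, 2, 2))
K  <- kernel_sketch(nodes)                           # 3 x 3, analogous to X1 only
K2 <- kernel_sketch(nodes[1:2, , drop = FALSE], nodes)  # 2 x 3, analogous to X1 vs X2
```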

Print a single tree

Description

Print a single fitted tree from a forest object

Usage

get.one.tree(x, tree = 1, ...)

Arguments

`x`	A fitted RLT object
`tree`	The tree number, from 1 to `ntrees`.
`...`	...

get.surv.band

Description

Calculate the (two-sided) confidence band of the survival function from an RLT survival prediction.

Usage

get.surv.band(
  x,
  i = 0,
  alpha = 0.05,
  approach = "naive-mc",
  nsim = 1000,
  r = 3,
  ...
)

Arguments

`x`	An RLT prediction object. This must be an object calculated from a forest with `var.ready = TRUE`.
`i`	Observation number in the prediction. The default ($i = 0$) calculates the band for all observations.
`alpha`	Alpha level for the interval $(\alpha/2, 1 - \alpha/2)$.
`approach`	The approach used to calculate the confidence band. Can be one of:
- `naive-mc`: positive-definite projection of the covariance matrix. The resulting confidence band is non-smooth.
- `smoothed-mc`: use a smoothed marginal variance to perform the Monte Carlo approximation of the critical value. This is only recommended for a large number of time points.
- `smoothed-lr`: use a smoothed low-rank approximation of the covariance matrix and apply an adaptive Bonferroni correction to derive the critical values. Note that this approach relies on the assumption that the covariance matrix is smooth and low rank.
`nsim`	Number of simulations for estimating the Monte Carlo critical value. Set this to be a large number. Default is 1000.
`r`	Maximum number of ranks used in the `smoothed-lr` approximation. Usually 5 is enough for approximating the covariance matrix due to smoothness.
`...`	...
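
As a rough illustration of what a Monte Carlo critical value for a simultaneous band means, the base-R sketch below simulates from a mean-zero Gaussian with a given covariance and takes the $1 - \alpha$ quantile of the largest standardized absolute deviation. This is a generic construction under stated assumptions, not the package's procedure: the function `mc_critical_value` and the toy covariance `Sigma` are hypothetical, and the covariance is assumed to be positive definite already.

```r
# Generic Monte Carlo critical value for a simultaneous (sup-type) band.
# Assumes Sigma is positive definite; this is an illustration only.
mc_critical_value <- function(Sigma, alpha = 0.05, nsim = 1000) {
  p   <- nrow(Sigma)
  sds <- sqrt(diag(Sigma))
  R   <- chol(Sigma)                                # Sigma = t(R) %*% R
  Z   <- matrix(rnorm(nsim * p), nsim, p) %*% R     # each row ~ N(0, Sigma)
  sup_stat <- apply(abs(sweep(Z, 2, sds, "/")), 1, max)
  unname(quantile(sup_stat, 1 - alpha))
}

set.seed(1)
tt    <- seq(0, 1, length.out = 20)                   # toy time grid
Sigma <- 0.04 * exp(-abs(outer(tt, tt, "-")) / 0.3)   # smooth toy covariance

cv <- mc_critical_value(Sigma, alpha = 0.05, nsim = 2000)

# Band around a toy estimate: estimate +/- critical value * marginal sd
est   <- tt
lower <- est - cv * sqrt(diag(Sigma))
upper <- est + cv * sqrt(diag(Sigma))
```

Because the band must hold at all time points at once, the critical value exceeds the pointwise normal quantile `qnorm(1 - alpha / 2)`. A larger `nsim` reduces the Monte Carlo error of the estimated quantile.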

mytest

Description

my function

Usage

mytest(n, ...)
mytest(n, ...)

Arguments

`n`	n
`...`	other arguments

Value

output

prediction using RLT

Description

Predict the outcome (regression, classification or survival) using a fitted RLT object

Usage

## S3 method for class 'RLT'
predict(
  object,
  testx = NULL,
  var.est = FALSE,
  keep.all = FALSE,
  ncores = 1,
  verbose = 0,
  ...
)

Arguments

`object`	A fitted RLT object
`testx`	The testing samples, must have the same structure as the training samples
`var.est`	Whether to estimate the variance of each testing data. The original forest must be fitted with `var.ready = TRUE`. For survival forests, calculates the covariance matrix over all observed time points and calculates critical value for the confidence band.
`keep.all`	Whether to keep the predictions from all trees. Warning: this can occupy a large amount of storage, especially in survival models.
`ncores`	number of cores
`verbose`	print additional information
`...`	...

Value

An RLT prediction object, constructed as a list consisting of

`Prediction`	Prediction
`Variance`	Variance of the prediction, provided if `var.est = TRUE` and the fitted object is `var.ready = TRUE`

For Survival Forests

`hazard`	predicted hazard functions
`CumHazard`	predicted cumulative hazard function
`Survival`	predicted survival function
`Allhazard`	if `keep.all = TRUE`, the predicted hazard function for each observation and each tree
`AllCHF`	if `keep.all = TRUE`, the predicted cumulative hazard function for each observation and each tree
`Cov`	if `var.est = TRUE` and the fitted object is `var.ready = TRUE`. For each test subject, a matrix of size NFail $\times$ NFail where NFail is the number of observed failure times in the training data
`Var`	if `var.est = TRUE` and the fitted object is `var.ready = TRUE`. Marginal variance for each subject
`timepoints`	ordered observed failure times from the training data
`MarginalVar`	if `var.est = TRUE` and the fitted object is `var.ready = TRUE`. Marginal variance for each subject from the Cov matrix projected to the nearest positive definite matrix
`MarginalVarSmooth`	if `var.est = TRUE` and the fitted object is `var.ready = TRUE`. Marginal variance for each subject from the Cov matrix projected to the nearest positive definite matrix and then smoothed using Gaussian kernel smoothing
`CVproj`	if `var.est = TRUE` and the fitted object is `var.ready = TRUE`. Critical values to calculate confidence bands around cumulative hazard predictions at several confidence levels. Calculated using `MarginalVar`
`CVprojSmooth`	if `var.est = TRUE` and the fitted object is `var.ready = TRUE`. Critical values to calculate confidence bands around cumulative hazard predictions at several confidence levels. Calculated using `MarginalVarSmooth`
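
The following sketch shows one way the fitting and prediction steps might be combined to obtain variance estimates in a regression forest. The simulated data, the choice of `ntrees = 2000`, and the normal-approximation interval are assumptions made for illustration. The calls use only the arguments documented in this manual, and the fitting step is skipped when the package is not installed.

```r
set.seed(1)
n <- 300
p <- 5
trainx <- matrix(rnorm(n * p), n, p)
trainy <- trainx[, 1] + rnorm(n)      # toy regression signal in the first column
testx  <- matrix(rnorm(10 * p), 10, p)

if (requireNamespace("RLT", quietly = TRUE)) {
  # var.ready = TRUE prepares the forest for variance estimation
  fit <- RLT::RLT(trainx, trainy, model = "regression", ntrees = 2000,
                  param.control = list(var.ready = TRUE), seed = 1)

  pred <- predict(fit, testx, var.est = TRUE)

  # Assumed normal-approximation 95% intervals; negative variance
  # estimates, if any, are truncated at zero before taking the root.
  half <- qnorm(0.975) * sqrt(pmax(pred$Variance, 0))
  ci   <- cbind(lower = pred$Prediction - half,
                upper = pred$Prediction + half)
  print(head(ci))
} else {
  message("RLT is not installed; skipping the fitting step.")
}
```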

Print an RLT object

Description

Print an RLT object

Usage

## S3 method for class 'RLT'
print(x, ...)

Arguments

`x`	A fitted RLT object
`...`	...

Reinforcement Learning Trees

Description

Fit models for regression, classification and survival analysis using reinforced splitting rules. The model fits regular random forest models by default unless the parameter `reinforcement` is set to `TRUE`. Using `reinforcement = TRUE` activates an embedded model for splitting variable selection and allows linear combination splits. To specify parameters of the embedded models, see the definition of `param.control` for details.

Usage

RLT(
  x,
  y,
  censor = NULL,
  model = NULL,
  ntrees = if (reinforcement) 100 else 500,
  mtry = max(1, as.integer(ncol(x)/3)),
  nmin = max(1, as.integer(log(nrow(x)))),
  split.gen = "random",
  nsplit = 1,
  resample.replace = TRUE,
  resample.prob = if (resample.replace) 1 else 0.8,
  resample.preset = NULL,
  obs.w = NULL,
  var.w = NULL,
  importance = FALSE,
  reinforcement = FALSE,
  param.control = list(),
  ncores = 0,
  verbose = 0,
  seed = NULL,
  ...
)

Arguments

`x`	A `matrix` or `data.frame` of features. If `x` is a data.frame, then all factors are treated as categorical variables, which will go through an exhaustive search of splitting criteria.
`y`	Response variable. A `numeric`/`factor` vector.
`censor`	Censoring indicator if survival model is used.
`model`	The model type: `"regression"`, `"classification"`, `"quantile"`, `"survival"` or `"graph"`.
`ntrees`	Number of trees, `ntrees = 100` if reinforcement is used and `ntrees = 500` otherwise.
`mtry`	Number of randomly selected variables used at each internal node.
`nmin`	Terminal node size. Splitting will stop when the internal node size is less than or equal to `nmin`.
`split.gen`	How the cutting points are generated: `"random"`, `"rank"` or `"best"`. If minimum child node size is enforced (`alpha` $> 0$), then `"rank"` and `"best"` should be used.
`nsplit`	Number of random cutting points to compare for each variable at an internal node.
`resample.replace`	Whether the in-bag samples are obtained with replacement.
`resample.prob`	Proportion of in-bag samples.
`resample.preset`	A pre-specified in-bag indicator/count matrix. It must be an $n \times$ `ntrees` matrix with integer entries. A positive number indicates the number of copies of that observation (row) in the corresponding tree (column); zero indicates out-of-bag; a negative value indicates that the observation is used in neither. Extremely large counts should be avoided. The sum of each column should not exceed $n$.
`obs.w`	Observation weights. The weights will be used for calculating the splitting scores, such as a weighted variance reduction or weighted gini index. But they will not be used for sampling observations. In that case, one can pre-specify `resample.preset` instead for balanced sampling, etc. For survival analysis, observation weights are not implemented in the `"logrank"` or `"suplogrank"` tests, due to the difficulty of calculating the variance of test statistic. However, it is used in the `"coxgrad"` splitting rule. For other models, this feature is currently not available.
`var.w`	Variable weights. If this is supplied, the default is to perform weighted sampling of `mtry` variables. For other usage, see the details of `split.rule` in `param.control`.
`importance`	Whether to calculate variable importance measures. When set to `TRUE` (or `"permute"`), the calculation follows Breiman's original permutation strategy. If set to `"distribute"`, then it sends the oob data to both child nodes with weights proportional to their sample sizes. Hence the final prediction is a weighted average of all possible terminal nodes that a perturbed observation could fall into. This feature is currently only available in regression and classification models.
`reinforcement`	Whether the reinforcement splitting rule should be used. Default is `FALSE`, i.e., regular random forests with marginal search of the splitting variable. When it is activated, an embedded model is fitted to find the best splitting variable or a linear combination of them, if `linear.comb` $> 1$. They can also be specified in `param.control`.
`param.control`	A list of additional parameters. This can be used to specify other features in a random forest or to set embedded model parameters for reinforcement splitting rules. Using `reinforcement = TRUE` automatically generates some default tuning parameters for the embedded model; they are not necessarily optimized. This mode is currently only available in regression. The embedded model parameters are:
- `embed.ntrees`: number of trees in the embedded model
- `embed.mtry`: number or proportion of variables
- `embed.nmin`: terminal node size
- `embed.split.gen`: random cutting point search method (`"random"`, `"rank"` or `"best"`)
- `embed.nsplit`: number of random cutting points
- `embed.resample.replace`: whether to sample with replacement
- `embed.resample.prob`: proportion of samples (of the internal node) in the embedded model
- `embed.mute`: muting rate
- `embed.protect`: number of protected variables
- `embed.threshold`: threshold, as a fraction of the best VI, for being included in the protected set at an internal node
`linear.comb` is a separate feature that can be activated with or without using reinforcement. It creates a linear combination of features as the splitting rule. Currently only available for regression.
- In reinforcement mode, a linear combination is created using the top continuous variables from the embedded model. If a categorical variable is the best, then a regular split will be used. The splitting point will be searched based on the `split.rule` of the model.
- In non-reinforcement mode, a marginal screening is performed and the top features are used to construct the linear combination. This is an experimental feature.
`split.rule` specifies the criterion used to compare different splits. The available choices are listed below; the first one for each model is the default:
- Regression: `"var"` (variance reduction); `"pca"` and `"sir"` can be used for linear combination splits
- Classification: `"gini"` (gini index)
- Survival: `"logrank"` (log-rank test), `"suplogrank"`, `"coxgrad"`
- Quantile: `"ks"` (Kolmogorov-Smirnov test)
- Graph: `"spectral"` (spectral embedding with variance reduction)
`resample.track` indicates whether to keep track of the observations used in each tree.
`var.ready` allows calculating the variance (hence confidence intervals) of the random forest prediction. Currently only available for regression (Xu, Zhu & Shao, 2023) and for confidence bands in survival models (Formentini, Liang & Zhu, 2022). Note that this only prepares the model fitting so that it is ready for the calculation; to obtain the confidence intervals, see the prediction function. Specifying `var.ready = TRUE` has the following effects if these parameters are not already provided (for details of their restrictions, see the original paper):
- `resample.preset` is constructed automatically
- `resample.replace` is set to `FALSE`
- `resample.prob` is set to $n / 2$
- `resample.track` is set to `TRUE`
It is recommended to use a very large `ntrees`, e.g., 10000 or larger. For `resample.prob` greater than $n / 2$, one should consider the bootstrap approach in Xu, Zhu & Shao (2023).
`alpha` forces a minimum proportion of samples (of the parent node) in each child node.
`failcount` specifies the number of unique failure time points used in a survival model. By default, all failure time points are used. A smaller number may speed up the computation. The time points are chosen uniformly on the quantiles of the failure times and always include the minimum and the maximum.
`ncores`	Number of CPU logical cores. Default is 0 (using all available cores).
`verbose`	Whether info should be printed.
`seed`	Random seed number to replicate a previously fitted forest. Internally, the `xoshiro256++` generator is used. If not specified, a seed will be generated automatically and recorded.
`...`	Additional arguments.
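
The coding of `resample.preset` described above can be illustrated with a small base-R construction. The sizes, the held-out rows, and the variable names below are hypothetical choices for the example: each tree draws 50 observations without replacement (count 1), the remaining usable observations are out-of-bag (0), and five observations are excluded from every tree (-1).

```r
set.seed(42)
n      <- 100   # number of training observations (assumed)
ntrees <- 10    # must match the `ntrees` passed to the forest
k      <- 50    # in-bag size per tree

held_out <- 96:100                         # rows used in neither in-bag nor oob
usable   <- setdiff(seq_len(n), held_out)

preset <- matrix(0L, n, ntrees)            # 0 = out-of-bag
preset[held_out, ] <- -1L                  # negative = not used at all
for (b in seq_len(ntrees)) {
  preset[sample(usable, k), b] <- 1L       # positive = number of in-bag copies
}

table(preset[, 1])                         # counts of -1, 0 and 1 in the first tree
```

Such a matrix would then be supplied as `resample.preset = preset`, together with the same `ntrees`, when fitting the forest.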

Value

An RLT fitted object, constructed as a list consisting of

`FittedForest`	Fitted tree structures
`VarImp`	Variable importance measures, if `importance = TRUE`
`Prediction`	Out-of-bag prediction
`Error`	Out-of-bag prediction error, adaptive to the model type
`ObsTrack`	Provided if `resample.track = TRUE`, `var.ready = TRUE`, or if `resample.preset` was supplied. This is an $n \times$ `ntrees` matrix that has the same meaning as `resample.preset`.

For classification forests, these items are further provided or will replace the regression version

`NClass`	The number of classes
`Prob`	Out-of-bag predicted probability

For survival forests, these items are further provided or will replace the regression version

`timepoints`	Ordered observed failure times
`NFail`	The number of observed failure times
`Prediction`	Out-of-bag prediction of the hazard function

References

Zhu, R., Zeng, D., & Kosorok, M. R. (2015) "Reinforcement Learning Trees." Journal of the American Statistical Association. 110(512), 1770-1784.
Xu, T., Zhu, R., & Shao, X. (2023) "On Variance Estimation of Random Forests with Infinite-Order U-statistics." arXiv preprint arXiv:2202.09008.
Formentini, S. E., Liang, W., & Zhu, R. (2022) "Confidence Band Estimation for Survival Random Forests." arXiv preprint arXiv:2204.12038.
