Title: | Reinforcement Learning Trees |
---|---|
Description: | Random forest with a variety of additional features for regression, classification, survival analysis and graphical models. New features include parallel computing with OpenMP, reproducibility with random seeds, variance and confidence band estimation, an embedded model for selecting splitting variables and constructing linear combination splits, observation and variable weights, setting and tracking subjects used in each tree, etc. |
Authors: | Ruoqing Zhu [aut, cre, cph] |
Maintainer: | Ruoqing Zhu <[email protected]> |
License: | GPL (>= 2) |
Version: | 4.2.6 |
Built: | 2025-03-08 03:30:53 UTC |
Source: | https://github.com/teazrq/rlt |
Calculate c-index for survival data
cindex(y, censor, pred)
y |
Survival time |
censor |
The censoring indicator if survival model is used |
pred |
The predicted value for each subject |
c-index
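A minimal sketch of computing the c-index on simulated survival data, assuming the RLT package is installed (the data and risk predictions below are purely illustrative):

```r
library(RLT)

set.seed(1)
n <- 200
y <- rexp(n)                  # observed survival times
censor <- rbinom(n, 1, 0.7)   # censoring indicator: 1 = event observed
pred <- rnorm(n)              # a hypothetical risk prediction for each subject

# concordance between the predictions and the observed survival outcomes
cindex(y, censor, pred)
```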
random forest kernel
Get random forest induced kernel weight matrix of testing samples or between any two sets of data. This is an experimental feature. Use at your own risk.
forest.kernel( object, X1 = NULL, X2 = NULL, vs.train = FALSE, verbose = FALSE, ... )
object |
A fitted RLT object. |
X1 |
The dataset for prediction. This calculates an |
X2 |
The dataset for reference/training.
If |
vs.train |
Whether to calculate the kernel weights with respect to the training data.
This is slightly different from supplying the training data to `X2`. |
verbose |
Whether fitting should be printed. |
... |
Additional arguments. |
A kernel matrix that contains kernel weights for each observation in X1
with respect to X2
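Since this is an experimental feature, the following is only a hedged sketch of the intended use, on simulated regression data (all argument names are taken from this page):

```r
library(RLT)

set.seed(1)
trainx <- matrix(rnorm(200 * 5), 200, 5)
trainy <- trainx[, 1] + rnorm(200)
testx <- matrix(rnorm(10 * 5), 10, 5)

# fit a regression forest; resample.track = TRUE is assumed here so
# that tree membership of the training observations is recorded
fit <- RLT(trainx, trainy, model = "regression",
           param.control = list(resample.track = TRUE))

# kernel weights of each testing observation against the training data
K <- forest.kernel(fit, X1 = testx, X2 = trainx)
```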
Print a single fitted tree from a forest object
get.one.tree(x, tree = 1, ...)
x |
A fitted RLT object |
tree |
The tree number, from 1 to `ntrees` |
... |
... |
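For illustration, printing a single tree from a small simulated regression forest might look like this (a sketch; the data are made up):

```r
library(RLT)

set.seed(1)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- x[, 1] + rnorm(100)

fit <- RLT(x, y, model = "regression", ntrees = 10)

# print the structure of tree number 1 (valid values run from 1 to ntrees)
get.one.tree(fit, tree = 1)
```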
get.surv.band
Calculate the survival function (two-sided) confidence band from an RLT survival prediction.
get.surv.band( x, i = 0, alpha = 0.05, approach = "naive-mc", nsim = 1000, r = 3, ... )
x |
An RLT prediction object. This must be calculated from a forest
fitted with `var.ready = TRUE` |
i |
Observation number in the prediction. The default (`i = 0`) calculates all observations |
alpha |
Alpha level for the confidence band |
approach |
What approach is used to calculate the confidence band. Can be
|
nsim |
Number of simulations for estimating the Monte Carlo critical value. Set this to a large number; default is 1000. |
r |
maximum number of ranks used in the |
... |
... |
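A sketch of the intended workflow on simulated survival data; per the argument descriptions on this page, the forest must be fitted with `var.ready = TRUE`, and a large `ntrees` is recommended for variance-related features:

```r
library(RLT)

set.seed(1)
n <- 400
x <- matrix(rnorm(n * 5), n, 5)
y <- rexp(n, rate = exp(x[, 1] / 5))  # survival times
censor <- rbinom(n, 1, 0.8)           # 1 = event observed

# var.ready = TRUE prepares the forest for confidence band calculations
fit <- RLT(x, y, censor = censor, model = "survival",
           ntrees = 5000, param.control = list(var.ready = TRUE))

pred <- predict(fit, x, var.est = TRUE)

# two-sided 95% confidence band for the first observation
band <- get.surv.band(pred, i = 1, alpha = 0.05)
```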
my function
mytest(n, ...)
n |
n |
... |
other arguments |
output
Predict the outcome (regression, classification or survival) using a fitted RLT object
## S3 method for class 'RLT' predict( object, testx = NULL, var.est = FALSE, keep.all = FALSE, ncores = 1, verbose = 0, ... )
object |
A fitted RLT object |
testx |
The testing samples, must have the same structure as the training samples |
var.est |
Whether to estimate the variance of each testing observation.
The original forest must be fitted with `var.ready = TRUE` |
keep.all |
Whether to keep the predictions from all trees. Warning: this can occupy a large amount of storage space, especially in survival models |
ncores |
Number of cores |
verbose |
Whether to print additional information |
... |
... |
An RLT prediction object, constructed as a list consisting of:
Prediction |
Prediction |
Variance |
Provided if `var.est = TRUE` |
For Survival Forests
hazard |
predicted hazard functions |
CumHazard |
predicted cumulative hazard function |
Survival |
predicted survival function |
Allhazard |
Provided if `keep.all = TRUE` |
AllCHF |
Provided if `keep.all = TRUE` |
Cov |
if |
Var |
if |
timepoints |
ordered observed failure times from the training data |
MarginalVar |
if |
MarginalVarSmooth |
if |
CVproj |
if |
CVprojSmooth |
if |
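A short sketch putting fitting and prediction together on simulated regression data (names are illustrative; `Prediction` is the list element documented above):

```r
library(RLT)

set.seed(1)
x <- matrix(rnorm(300 * 5), 300, 5)
y <- x[, 1] + x[, 2] + rnorm(300)

fit <- RLT(x[1:200, ], y[1:200], model = "regression")

# testx must have the same structure (columns) as the training samples
pred <- predict(fit, testx = x[201:300, ], ncores = 1)

# testing mean squared error
mean((pred$Prediction - y[201:300])^2)
```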
Print a RLT object
## S3 method for class 'RLT' print(x, ...)
x |
A fitted RLT object |
... |
... |
Reinforcement Learning Trees
Fit models for regression, classification and survival analysis using reinforced splitting rules. By default the model fits a regular random forest unless `reinforcement = TRUE` is set. Setting `reinforcement = TRUE` activates the embedded model for splitting variable selection and allows linear combination splits. To specify parameters of the embedded models, see the definition of `param.control` for details.
RLT( x, y, censor = NULL, model = NULL, ntrees = if (reinforcement) 100 else 500, mtry = max(1, as.integer(ncol(x)/3)), nmin = max(1, as.integer(log(nrow(x)))), split.gen = "random", nsplit = 1, resample.replace = TRUE, resample.prob = if (resample.replace) 1 else 0.8, resample.preset = NULL, obs.w = NULL, var.w = NULL, importance = FALSE, reinforcement = FALSE, param.control = list(), ncores = 0, verbose = 0, seed = NULL, ... )
x |
A |
y |
Response variable. A |
censor |
Censoring indicator if survival model is used. |
model |
The model type: |
ntrees |
Number of trees, |
mtry |
Number of randomly selected variables used at each internal node. |
nmin |
Terminal node size. Splitting will stop when the internal
node size is less than or equal to `nmin` |
split.gen |
How the cutting points are generated: |
nsplit |
Number of random cutting points to compare for each variable at an internal node. |
resample.replace |
Whether the in-bag samples are obtained with replacement. |
resample.prob |
Proportion of in-bag samples. |
resample.preset |
A pre-specified matrix for in-bag data indicator/count
matrix. It must be an `n` × `ntrees` matrix |
obs.w |
Observation weights. The weights are used for calculating
the splitting scores, such as a weighted variance reduction
or weighted Gini index, but they are not used for
sampling observations. For that purpose, one can pre-specify
`resample.preset` |
var.w |
Variable weights. If this is supplied, the default is to
perform weighted sampling of `mtry` variables |
importance |
Whether to calculate variable importance measures. When
set to |
reinforcement |
Whether the reinforcement splitting rule should be used. Default
is `FALSE` |
param.control |
A list of additional parameters. This can be used to
specify other features in a random forest or to set embedded
model parameters for reinforcement splitting rules.

`linear.comb` is a separate feature that can be activated with or without reinforcement. It creates a linear combination of features as the splitting rule. Currently only available for regression.

- In reinforcement mode, a linear combination is created using the top continuous variables from the embedded model. If a categorical variable is the best, a regular split is used instead. The splitting point is searched based on the `split.rule` of the model.
- In non-reinforcement mode, a marginal screening is performed and the top features are used to construct the linear combination. This is an experimental feature.

`split.rule` specifies the criterion used to compare different splits. The available choices are listed below; the first in each model type is the default:

- Regression: `"var"` (variance reduction); `"pca"` and `"sir"` can be used for linear combination splits
- Classification: `"gini"` (Gini index)
- Survival: `"logrank"` (log-rank test), `"suplogrank"`, `"coxgrad"`
- Quantile: `"ks"` (Kolmogorov-Smirnov test)
- Graph: `"spectral"` (spectral embedding with variance reduction)

`resample.track` indicates whether to keep track of the observations used in each tree.

`var.ready` allows calculating the variance (hence confidence intervals) of the random forest prediction. Currently only available for regression (Xu, Zhu & Shao, 2023) and confidence bands in survival models (Formentini, Liang & Zhu, 2022). Note that this only prepares the model fitting so that it is ready for the calculation; to obtain the confidence intervals, see the prediction function. Specifying `var.ready = TRUE` has the following effects if these parameters are not already provided (for details of their restrictions, see the original papers):

- `resample.preset` is constructed automatically
- `resample.replace` is set to `FALSE`
- `resample.prob` is set to n/2
- `resample.track` is set to `TRUE`

It is recommended to use a very large `ntrees`, e.g., 10000 or larger. For `resample.prob` greater than n/2, one should consider the bootstrap approach in Xu, Zhu & Shao (2023).

`alpha` forces a minimum proportion of samples (of the parent node) in each child node.

`failcount` specifies the number of unique failure time points used in the survival model. By default, all failure time points are used. A smaller number may speed up the computation. The time points are chosen uniformly on the quantiles of the failure times, and must include the minimum and the maximum. |
ncores |
Number of CPU logical cores. Default is 0 (using all available cores). |
verbose |
Whether info should be printed. |
seed |
Random seed number to replicate a previously fitted forest.
Internally, the |
... |
Additional arguments. |
An RLT fitted object, constructed as a list consisting of:
FittedForest |
Fitted tree structures |
VarImp |
Variable importance measures, if `importance = TRUE` |
Prediction |
Out-of-bag prediction |
Error |
Out-of-bag prediction error, adaptive to the model type |
ObsTrack |
Provided if `resample.track = TRUE`, `var.ready = TRUE`, or if `resample.preset` was supplied. This is an `n` × `ntrees` matrix that has the same meaning as `resample.preset` |
For classification forests, these items are further provided or will replace the regression version
NClass |
The number of classes |
Prob |
Out-of-bag predicted probability |
For survival forests, these items are further provided or will replace the regression version
timepoints |
Ordered observed failure times |
NFail |
The number of observed failure times |
Prediction |
Out-of-bag prediction of the hazard function |
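To illustrate the reinforcement mode described above, a minimal sketch on simulated regression data (all settings shown are arguments documented on this page; the data are made up):

```r
library(RLT)

set.seed(1)
p <- 20
x <- matrix(rnorm(300 * p), 300, p)
y <- x[, 1] + x[, 2] + rnorm(300)   # only the first two variables matter

# reinforcement = TRUE activates the embedded model for splitting
# variable selection; note the smaller default ntrees in this mode
fit <- RLT(x, y, model = "regression", reinforcement = TRUE,
           ntrees = 100, importance = TRUE, seed = 42)

fit$VarImp   # variable importance measures
fit$Error    # out-of-bag prediction error
```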
Zhu, R., Zeng, D., & Kosorok, M. R. (2015) "Reinforcement Learning Trees." Journal of the American Statistical Association. 110(512), 1770-1784.
Xu, T., Zhu, R., & Shao, X. (2023) "On Variance Estimation of Random Forests with Infinite-Order U-statistics." arXiv preprint arXiv:2202.09008.
Formentini, S. E., Liang, W., & Zhu, R. (2022) "Confidence Band Estimation for Survival Random Forests." arXiv preprint arXiv:2204.12038.