| Title: | Semi-Parametric Dimension Reduction Models Using Orthogonality Constrained Optimization |
|---|---|
| Description: | Utilize an orthogonality constrained optimization algorithm of Wen & Yin (2013) <DOI:10.1007/s10107-012-0584-1> to solve a variety of dimension reduction problems in the semiparametric framework, such as Ma & Zhu (2012) <DOI:10.1080/01621459.2011.646925>, Ma & Zhu (2013) <DOI:10.1214/12-AOS1072>, Sun, Zhu, Wang & Zeng (2017) <doi:10.48550/arXiv.1704.05046> and Zhou & Zhu (2018+) <doi:10.48550/arXiv.1802.06156>. It also serves as a general purpose optimization solver for problems with orthogonality constraints. Parallel computing for approximating the gradient is enabled through `OpenMP'. |
| Authors: | Ruilin Zhao [aut], Wenzhuo Zhou [aut], Ruoqing Zhu [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-0753-5716>), Jiyang Zhang [ctb], Peng Xu [ctb] |
| Maintainer: | Ruoqing Zhu <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.0.0 |
| Built: | 2026-06-01 16:59:09 UTC |
| Source: | https://github.com/teazrq/orthodr |
Fits the CP-SIR model for right-censored survival outcomes. This model is
correct only under strong assumptions, but since it requires only an SVD,
its solution is commonly used as the initial value in
orthoDr_surv optimization.
CP_SIR(x, y, censor, bw = silverman(1, length(y)))CP_SIR(x, y, censor, bw = silverman(1, length(y)))
x |
A matrix (n x p) of features (continuous only). |
y |
A numeric vector of observed survival times. |
censor |
A numeric vector of censoring indicators (1 = event, 0 = censored). |
bw |
Numeric; kernel bandwidth for nonparametric estimation
(one-dimensional). Default: Silverman's rule via |
A list with components:
valuesEigenvalues of the estimation matrix.
vectorsEstimated directions (columns), ordered by eigenvalues.
Sun, Q., Zhu, R., Wang, T. and Zeng, D. (2017). Counting process based dimension reduction method for censored outcomes. arXiv preprint doi:10.48550/arXiv.1704.05046.
# This is setting 1 in Sun et. al. (2017) with reduced sample size library(MASS) set.seed(1) N = 200; P = 6 V=0.5^abs(outer(1:P, 1:P, "-")) dataX = as.matrix(mvrnorm(N, mu=rep(0,P), Sigma=V)) failEDR = as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P-5))) censorEDR = as.matrix(c(0, 0, 0, 1, 1, rep(0, P-5))) T = rexp(N, exp(dataX %*% failEDR)) C = rexp(N, exp(dataX %*% censorEDR - 1)) ndr = 1 Y = pmin(T, C) Censor = (T < C) # fit the model cpsir.fit = CP_SIR(dataX, Y, Censor) distance(failEDR, cpsir.fit$vectors[, 1:ndr, drop = FALSE], "dist")# This is setting 1 in Sun et. al. (2017) with reduced sample size library(MASS) set.seed(1) N = 200; P = 6 V=0.5^abs(outer(1:P, 1:P, "-")) dataX = as.matrix(mvrnorm(N, mu=rep(0,P), Sigma=V)) failEDR = as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P-5))) censorEDR = as.matrix(c(0, 0, 0, 1, 1, rep(0, P-5))) T = rexp(N, exp(dataX %*% failEDR)) C = rexp(N, exp(dataX %*% censorEDR - 1)) ndr = 1 Y = pmin(T, C) Censor = (T < C) # fit the model cpsir.fit = CP_SIR(dataX, Y, Censor) distance(failEDR, cpsir.fit$vectors[, 1:ndr, drop = FALSE], "dist")
Cross distance matrix. An extension to the dist() function. Calculate the Gaussian kernel distance between rows of X1 and rows of X2
dist_cross(x1, x2)dist_cross(x1, x2)
x1 |
first data matrix |
x2 |
second data matrix |
A distance matrix, with its (i, j)th element being the Gaussian kernel distance between ith row of X1 jth row of X2.
# two matrices set.seed(1) x1 = matrix(rnorm(10), 5, 2) x2 = matrix(rnorm(6), 3, 2) dist_cross(x1, x2)# two matrices set.seed(1) x1 = matrix(rnorm(10), 5, 2) x2 = matrix(rnorm(6), 3, 2) dist_cross(x1, x2)
Calculate a distance metric between two linear subspaces spanned by the
columns of s1 and s2. Several metrics are supported.
distance(s1, s2, type = "dist", x = NULL)distance(s1, s2, type = "dist", x = NULL)
s1 |
A matrix whose columns span the first subspace. |
s2 |
A matrix whose columns span the second subspace. |
type |
Character; the distance measure to use:
|
x |
A matrix of covariate values, required only when
|
A numeric scalar: the distance (or correlation) between the two
subspaces. Interpretation depends on type.
# two spaces failEDR = as.matrix(cbind(c(1, 1, 0, 0, 0, 0), c(0, 0, 1, -1, 0, 0))) B = as.matrix(cbind(c(0.1, 1.1, 0, 0, 0, 0), c(0, 0, 1.1, -0.9, 0, 0))) distance(failEDR, B, "dist") distance(failEDR, B, "trace") N=300 P=6 dataX = matrix(rnorm(N*P), N, P) distance(failEDR, B, "canonical", dataX)# two spaces failEDR = as.matrix(cbind(c(1, 1, 0, 0, 0, 0), c(0, 0, 1, -1, 0, 0))) B = as.matrix(cbind(c(0.1, 1.1, 0, 0, 0, 0), c(0, 0, 1.1, -0.9, 0, 0))) distance(failEDR, B, "dist") distance(failEDR, B, "trace") N=300 P=6 dataX = matrix(rnorm(N*P), N, P) distance(failEDR, B, "canonical", dataX)
An almost direct R translation of Xia, Zhang & Xu's (2010) hMAVE Matlab
code, with additional options for supplying a custom initial value.
This algorithm does not use the orthogonality constrained
optimization framework of ortho_optim.
hMave(x, y, censor, m0, B0 = NULL)hMave(x, y, censor, m0, B0 = NULL)
x |
A matrix (n x p) of features (continuous only). |
y |
A numeric vector of observed survival times. |
censor |
A numeric vector of censoring indicators (1 = event, 0 = censored). |
m0 |
Integer; number of structural dimensions to estimate. |
B0 |
An optional initial |
A list with components:
BThe estimated direction matrix (p x m0).
cvLeave-one-out cross-validation error.
Xia, Y., Zhang, D., & Xu, J. (2010). Dimension reduction and semiparametric estimation of survival models. Journal of the American Statistical Association, 105(489), 278–290. doi:10.1198/jasa.2009.tm09372.
# generate some survival data set.seed(1) P = 7 N = 150 dataX = matrix(runif(N*P), N, P) failEDR = as.matrix(cbind(c(1, 1.3, -1.3, 1, -0.5, 0.5, -0.5, rep(0, P-7)))) T = exp(dataX %*% failEDR + rnorm(N)) C = runif(N, 0, 15) Y = pmin(T, C) Censor = (T < C) # fit the model hMave.fit = hMave(dataX, Y, Censor, 1)# generate some survival data set.seed(1) P = 7 N = 150 dataX = matrix(runif(N*P), N, P) failEDR = as.matrix(cbind(c(1, 1.3, -1.3, 1, -0.5, 0.5, -0.5, rep(0, P-7)))) T = exp(dataX %*% failEDR + rnorm(N)) C = runif(N, 0, 15) Y = pmin(T, C) Censor = (T < C) # fit the model hMave.fit = hMave(dataX, Y, Censor, 1)
Calculate the Gaussian kernel weights between rows of X1 and rows of X2
kernel_weight(x1, x2, kernel = "gaussian", dist = "euclidean")kernel_weight(x1, x2, kernel = "gaussian", dist = "euclidean")
x1 |
first data matrix |
x2 |
second data matrix |
kernel |
the kernel function, currently only using Gaussian kernel |
dist |
the distance metric, currently only using the Euclidean distance |
A distance matrix, with its (i, j)th element being the kernel weights for the ith row of X1 jth row of X2.
# two matrices set.seed(1) x1 = matrix(rnorm(10), 5, 2) x2 = matrix(rnorm(6), 3, 2) kernel_weight(x1, x2)# two matrices set.seed(1) x1 = matrix(rnorm(10), 5, 2) x2 = matrix(rnorm(6), 3, 2) kernel_weight(x1, x2)
A general-purpose optimization solver for problems with orthogonality
constraints on the columns of B. Implements the curvilinear search
algorithm of Wen & Yin (2013), which ensures that B remains on the
Stiefel manifold throughout optimization.
If no analytical gradient is provided, a numerical approximation is used.
ortho_optim( B, fn, grad = NULL, ..., maximize = FALSE, control = list(), maxitr = 500, verbose = 0 )ortho_optim( B, fn, grad = NULL, ..., maximize = FALSE, control = list(), maxitr = 500, verbose = 0 )
B |
Initial |
fn |
Objective function. The first argument must be |
grad |
Gradient function. The first argument must be |
... |
Additional named arguments passed to |
maximize |
Logical; if |
control |
A list of tuning parameters for the optimizer:
|
maxitr |
Maximum number of iterations. Default: |
verbose |
Integer; if > 0, prints iteration progress. Default: 0. |
An object of class c("orthoDr", "fit", "optim"), which is a
list containing:
BThe optimized matrix (columns are orthonormal).
fnThe final objective function value.
fn_SeqNumeric vector of objective values at each iteration
(length maxitr, padded with zeros after convergence).
itrNumber of iterations performed.
convergeConvergence code.
method"true gradient" or "approx. gradient".
Wen, Z. and Yin, W. (2013). A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142(1-2), 397–434. doi:10.1007/s10107-012-0584-1.
Zhang, H. and Hager, W. W. (2004). A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4), 1043–1056. doi:10.1137/S1052623403426556.
# Eigenvalue problem: minimize -0.5 * tr(B'A B) s.t. B'B = I library(pracma) set.seed(1) n <- 100; k <- 6 A <- matrix(rnorm(n * n), n, n) A <- t(A) %*% A B <- gramSchmidt(matrix(rnorm(n * k), n, k))$Q fx <- function(B, A) -0.5 * sum(diag(t(B) %*% A %*% B)) gx <- function(B, A) -A %*% B fit <- ortho_optim(B, fx, gx, A = A) fx(fit$B, A) # Compare with the analytical solution from eigen() sol <- eigen(A)$vectors[, 1:k] fx(sol, A)# Eigenvalue problem: minimize -0.5 * tr(B'A B) s.t. B'B = I library(pracma) set.seed(1) n <- 100; k <- 6 A <- matrix(rnorm(n * n), n, n) A <- t(A) %*% A B <- gramSchmidt(matrix(rnorm(n * k), n, k))$Q fx <- function(B, A) -0.5 * sum(diag(t(B) %*% A %*% B)) gx <- function(B, A) -A %*% B fit <- ortho_optim(B, fx, gx, A = A) fx(fit$B, A) # Compare with the analytical solution from eigen() sol <- eigen(A)$vectors[, 1:k] fx(sol, A)
Fits a personalized dose model using the direct learning or pseudo-direct
learning method of Zhou & Zhu (2021). The direction matrix B
identifies the subspace of covariates that informs optimal dosing.
orthoDr_pdose( x, a, r, ndr = 2, B.initial = NULL, bw = NULL, lambda = 0.1, K = sqrt(length(r)), method = c("direct", "pseudo_direct"), keep.data = FALSE, control = list(), maxitr = 500, verbose = 0, ncore = 0 )orthoDr_pdose( x, a, r, ndr = 2, B.initial = NULL, bw = NULL, lambda = 0.1, K = sqrt(length(r)), method = c("direct", "pseudo_direct"), keep.data = FALSE, control = list(), maxitr = 500, verbose = 0, ncore = 0 )
x |
A matrix or data.frame of features (continuous only). |
a |
A numeric vector of observed dose levels. |
r |
A numeric vector of observed rewards (outcomes). |
ndr |
Number of directions to estimate. |
B.initial |
Initial |
bw |
Kernel bandwidth, assuming each variable has unit variance.
If |
lambda |
Penalty level for kernel ridge regression. If a vector of
values is provided, GCV is used to select the best value.
Default: |
K |
Number of grid points for dose levels. Default:
|
method |
The method: |
keep.data |
Logical; if |
control |
A list of optimizer tuning parameters (see
|
maxitr |
Maximum number of iterations. Default: |
verbose |
Integer; if > 0, prints iteration progress. Default: 0. |
ncore |
Integer; number of cores for parallel computation via OpenMP.
0 = auto-detect. Default: |
An object of class c("orthoDr", "fit", "pdose"), which is a
list containing:
BThe estimated direction matrix (columns are orthonormal).
fnThe final objective function value.
itrNumber of iterations performed.
convergeConvergence code.
methodThe method used ("direct" or
"pseudo_direct").
keep.dataWhether original data was retained.
Zhou, W. and Zhu, R. (2021). A parsimonious personalized dose model via dimension reduction. Biometrika, 108(3), 643–659. doi:10.1093/biomet/asaa094.
# generate personalized dose data exampleset <- function(size, ncov) { X <- matrix(runif(size * ncov, -1, 1), ncol = ncov) A <- runif(size, 0, 2) Edr <- as.matrix(c(0.5, -0.5)) D_opt <- X %*% Edr + 1 mu <- 2 + 0.5 * (X %*% Edr) - 7 * abs(D_opt - A) R <- rnorm(length(mu), mu, 1) R <- R - min(R) list(X = X, A = A, R = R, D_opt = D_opt, mu = mu) } set.seed(123) n <- 150; p <- 2; ndr <- 1 train <- exampleset(n, p) test <- exampleset(500, p) # direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = sqrt(n), keep.data = TRUE, maxitr = 150, verbose = 0, ncore = 2) dose <- predict(orthofit, test$X) mean((test$D_opt - dose$pred)^2) # pseudo-direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1, 0.2, 0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, ncore = 2) dose <- predict(orthofit, test$X) mean((test$D_opt - dose$pred)^2)# generate personalized dose data exampleset <- function(size, ncov) { X <- matrix(runif(size * ncov, -1, 1), ncol = ncov) A <- runif(size, 0, 2) Edr <- as.matrix(c(0.5, -0.5)) D_opt <- X %*% Edr + 1 mu <- 2 + 0.5 * (X %*% Edr) - 7 * abs(D_opt - A) R <- rnorm(length(mu), mu, 1) R <- R - min(R) list(X = X, A = A, R = R, D_opt = D_opt, mu = mu) } set.seed(123) n <- 150; p <- 2; ndr <- 1 train <- exampleset(n, p) test <- exampleset(500, p) # direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = sqrt(n), keep.data = TRUE, maxitr = 150, verbose = 0, ncore = 2) dose <- predict(orthofit, test$X) mean((test$D_opt - dose$pred)^2) # pseudo-direct learning method orthofit <- orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1, 0.2, 0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, ncore = 2) dose <- predict(orthofit, test$X) mean((test$D_opt - dose$pred)^2)
Fits a semiparametric sufficient dimension reduction model for continuous
outcomes using the methods of Ma & Zhu (2012, 2013). The direction matrix
B is estimated via orthogonality-constrained optimization on the
Stiefel manifold, ensuring .
Kernel density estimation operates on the scaled projections ,
so the procedure is robust to the scale of x. However, raw feature
values enter the Nadaraya–Watson estimator unscaled — users with features of
very different variance should standardise x beforehand. The outcome
y is internally scaled.
orthoDr_reg( x, y, method = c("sir", "save", "phd", "local", "seff"), ndr = 2, B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = 0, ncore = 0 )orthoDr_reg( x, y, method = c("sir", "save", "phd", "local", "seff"), ndr = 2, B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = 0, ncore = 0 )
x |
A numeric matrix of predictors (n observations |
y |
A numeric vector of continuous outcomes (length n). |
method |
The dimension reduction method. One of |
ndr |
Integer; number of directions to estimate (default 2). Must be
|
B.initial |
Initial |
bw |
X-kernel bandwidth. |
keep.data |
Logical; if |
control |
A named list of optimizer tolerances. Recognised entries and their defaults:
|
maxitr |
Maximum number of iterations (default 500). Clamped to
|
verbose |
Integer; set > 0 to print iteration progress and total elapsed time. Default 0. |
ncore |
Integer; number of CPU threads for parallel gradient approximation. 0 (default) uses all available threads. |
Semiparametric sliced inverse regression (Ma & Zhu 2012).
Semiparametric sliced average variance estimation.
Semiparametric principal Hessian directions.
Locally efficient estimator (Ma & Zhu 2013,
Section~3.1) with a normal posited model for the conditional
density. This yields a mean-regression estimator — it captures
dependence of on through the conditional
mean only, not higher moments. For with default
initialisation, a multi-start strategy runs LOCAL from SIR, SAVE,
and PHD starters and selects the result that maximises the
nonparametric of the Nadaraya–Watson regression of
on .
Semiparametric efficient estimator (Ma & Zhu 2013,
Section 3.2). Uses a two-step procedure: (1) obtain a root-
consistent initial via SAVE; (2) estimate the
nonparametric nuisance components (conditional density
and its derivative, plus ) once at ; (3) optimise the
efficient score with those estimates held fixed.
An object of class c("orthoDr", "fit", "reg"), a list with
components:
BEstimated direction matrix with
orthonormal columns.
fnFinal objective value.
itrNumber of iterations performed.
convergeConvergence code (0 = success).
methodMethod used (e.g., "sir").
keep.dataWhether original data was stored.
If keep.data = TRUE, the list also contains x, y,
and bw for prediction.
Ma, Y. and Zhu, L. (2012). A semiparametric approach to dimension reduction. Journal of the American Statistical Association, 107(497), 168–179. doi:10.1080/01621459.2011.646925.
Ma, Y. and Zhu, L. (2013). Efficient estimation in sufficient dimension reduction. Annals of Statistics, 41(1), 250–268. doi:10.1214/12-AOS1072.
set.seed(1) N <- 100; P <- 4 X <- matrix(rnorm(N * P), N, P) # Mean model — SIR Y <- -1 + X[, 1] + rnorm(N) fit_sir <- orthoDr_reg(X, Y, ndr = 1, method = "sir") fit_sir$B # Variance model — PHD Y <- -1 + X[, 1]^2 + rnorm(N) fit_phd <- orthoDr_reg(X, Y, ndr = 1, method = "phd") fit_phd$B # Efficient estimator (SEFF) — works on both mean and variance models fit_seff <- orthoDr_reg(X, Y, ndr = 1, method = "seff") fit_seff$B # Custom initial + prediction B0 <- matrix(c(1, 0, 0, 0), 4, 1) fit <- orthoDr_reg(X, Y, ndr = 1, B.initial = B0, keep.data = TRUE) predict(fit, X[1:5, ])set.seed(1) N <- 100; P <- 4 X <- matrix(rnorm(N * P), N, P) # Mean model — SIR Y <- -1 + X[, 1] + rnorm(N) fit_sir <- orthoDr_reg(X, Y, ndr = 1, method = "sir") fit_sir$B # Variance model — PHD Y <- -1 + X[, 1]^2 + rnorm(N) fit_phd <- orthoDr_reg(X, Y, ndr = 1, method = "phd") fit_phd$B # Efficient estimator (SEFF) — works on both mean and variance models fit_seff <- orthoDr_reg(X, Y, ndr = 1, method = "seff") fit_seff$B # Custom initial + prediction B0 <- matrix(c(1, 0, 0, 0), 4, 1) fit <- orthoDr_reg(X, Y, ndr = 1, B.initial = B0, keep.data = TRUE) predict(fit, X[1:5, ])
Fits a counting-process-based semiparametric dimension reduction (IR-CP)
model for right-censored survival outcomes (Sun, Zhu, Wang & Zeng, 2017).
Three estimating equations are available: "forward", "dn",
and "dm".
orthoDr_surv( x, y, censor, method = "dm", ndr = ifelse(method == "forward", 1, 2), B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = 0, ncore = 0 )orthoDr_surv( x, y, censor, method = "dm", ndr = ifelse(method == "forward", 1, 2), B.initial = NULL, bw = NULL, keep.data = FALSE, control = list(), maxitr = 500, verbose = 0, ncore = 0 )
x |
A matrix or data.frame of features. Columns are not automatically scaled to unit variance. |
y |
A numeric vector of observed survival times. |
censor |
A numeric vector of censoring indicators (1 = event, 0 = censored). |
method |
The estimating equation: |
ndr |
Number of directions to estimate. Default: |
B.initial |
Initial |
bw |
Kernel bandwidth, assuming each variable has unit variance.
If |
keep.data |
Logical; if |
control |
A list of optimizer tuning parameters (see
|
maxitr |
Maximum number of iterations. Default: |
verbose |
Integer; if > 0, prints iteration progress. Default: 0. |
ncore |
Integer; number of cores for parallel computation via OpenMP.
0 = auto-detect. Default: |
An object of class c("orthoDr", "fit", "surv"), which is a
list containing:
BThe estimated direction matrix (columns are orthonormal).
fnThe final objective function value.
itrNumber of iterations performed.
convergeConvergence code.
methodThe method used (e.g., "dm").
keep.dataWhether original data was retained.
Sun, Q., Zhu, R., Wang, T. and Zeng, D. (2017). Counting process based dimension reduction method for censored outcomes. arXiv preprint doi:10.48550/arXiv.1704.05046.
orthoDr_pdose, CP_SIR, view_dr_surv
# Setting 1 from Sun et al. (2017), reduced sample size library(MASS) set.seed(1) N <- 200; P <- 6 V <- 0.5^abs(outer(1:P, 1:P, "-")) dataX <- as.matrix(mvrnorm(N, mu = rep(0, P), Sigma = V)) failEDR <- as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P - 5))) censorEDR <- as.matrix(c(0, 0, 0, 1, 1, rep(0, P - 5))) T <- rexp(N, exp(dataX %*% failEDR)) C <- rexp(N, exp(dataX %*% censorEDR - 1)) ndr <- 1 Y <- pmin(T, C) Censor <- (T < C) # forward model (1-d only) forward.fit <- orthoDr_surv(dataX, Y, Censor, method = "forward") distance(failEDR, forward.fit$B, "dist") # counting process model dn.fit <- orthoDr_surv(dataX, Y, Censor, method = "dn", ndr = ndr) distance(failEDR, dn.fit$B, "dist") # martingale model dm.fit <- orthoDr_surv(dataX, Y, Censor, method = "dm", ndr = ndr) distance(failEDR, dm.fit$B, "dist")# Setting 1 from Sun et al. (2017), reduced sample size library(MASS) set.seed(1) N <- 200; P <- 6 V <- 0.5^abs(outer(1:P, 1:P, "-")) dataX <- as.matrix(mvrnorm(N, mu = rep(0, P), Sigma = V)) failEDR <- as.matrix(c(1, 0.5, 0, 0, 0, rep(0, P - 5))) censorEDR <- as.matrix(c(0, 0, 0, 1, 1, rep(0, P - 5))) T <- rexp(N, exp(dataX %*% failEDR)) C <- rexp(N, exp(dataX %*% censorEDR - 1)) ndr <- 1 Y <- pmin(T, C) Censor <- (T < C) # forward model (1-d only) forward.fit <- orthoDr_surv(dataX, Y, Censor, method = "forward") distance(failEDR, forward.fit$B, "dist") # counting process model dn.fit <- orthoDr_surv(dataX, Y, Censor, method = "dn", ndr = ndr) distance(failEDR, dn.fit$B, "dist") # martingale model dm.fit <- orthoDr_surv(dataX, Y, Censor, method = "dm", ndr = ndr) distance(failEDR, dm.fit$B, "dist")
The prediction function for orthoDr fitted models
## S3 method for class 'orthoDr' predict(object, testx, ...)## S3 method for class 'orthoDr' predict(object, testx, ...)
object |
A fitted orthoDr object |
testx |
Testing data |
... |
... |
The predicted object
# generate some survival data N = 100; P = 4; dataX = matrix(rnorm(N*P), N, P) Y = exp(-1 + dataX[,1] + rnorm(N)) Censor = rbinom(N, 1, 0.8) # fit the model with keep.data = TRUE orthoDr.fit = orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm", keep.data = TRUE) #predict 10 new observations predict(orthoDr.fit, matrix(rnorm(10*P), 10, P)) # generate some personalized dose scenario exampleset <- function(size,ncov){ X = matrix(runif(size*ncov,-1,1),ncol=ncov) A = runif(size,0,2) Edr = as.matrix(c(0.5,-0.5)) D_opt = X %*% Edr + 1 mu = 2 + 0.5*(X %*% Edr) - 7*abs(D_opt-A) R = rnorm(length(mu),mu,1) R = R - min(R) datainfo = list(X=X,A=A,R=R,D_opt=D_opt,mu=mu) return(datainfo) } # generate data set.seed(123) n = 150 p = 2 ndr =1 train = exampleset(n,p) test = exampleset(500,p) # the direct learning method orthofit = orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2) predict(orthofit,test$X) # the pseudo direct learning method orthofit = orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1,0.2,0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2) predict(orthofit,test$X)# generate some survival data N = 100; P = 4; dataX = matrix(rnorm(N*P), N, P) Y = exp(-1 + dataX[,1] + rnorm(N)) Censor = rbinom(N, 1, 0.8) # fit the model with keep.data = TRUE orthoDr.fit = orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm", keep.data = TRUE) #predict 10 new observations predict(orthoDr.fit, matrix(rnorm(10*P), 10, P)) # generate some personalized dose scenario exampleset <- function(size,ncov){ X = matrix(runif(size*ncov,-1,1),ncol=ncov) A = runif(size,0,2) Edr = as.matrix(c(0.5,-0.5)) D_opt = X %*% Edr + 1 mu = 2 + 0.5*(X %*% Edr) - 7*abs(D_opt-A) R = rnorm(length(mu),mu,1) R = R - min(R) datainfo = list(X=X,A=A,R=R,D_opt=D_opt,mu=mu) return(datainfo) } # generate data set.seed(123) n = 150 p = 2 ndr =1 train = exampleset(n,p) test = exampleset(500,p) # the direct learning method orthofit = orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = 0.1, method = "direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2) predict(orthofit,test$X) # the pseudo direct learning method orthofit = orthoDr_pdose(train$X, train$A, train$R, ndr = ndr, lambda = seq(0.1,0.2,0.01), method = "pseudo_direct", K = as.integer(sqrt(n)), keep.data = TRUE, maxitr = 150, verbose = FALSE, ncore = 2) predict(orthofit,test$X)
Fits the partial SAVE model for dose-response settings. This model relies
on strong assumptions; its solution is primarily used as an initial value
in orthoDr_pdose optimization.
pSAVE(x, a, r, ndr = 2, nslices0 = 2)pSAVE(x, a, r, ndr = 2, nslices0 = 2)
x |
A matrix (n x p) of features (continuous only). |
a |
A numeric vector of observed dose levels. |
r |
A numeric vector of rewards (outcomes). |
ndr |
Integer; number of structural dimensions to estimate.
Default: |
nslices0 |
Integer; number of slices used for SAVE.
Default: |
A matrix whose columns are the estimated basis vectors of the central subspace, ordered by eigenvalues.
Feng, Z., Wen, M.X., Yu, Z. and Zhu, L. (2013). On partial sufficient dimension reduction with applications to partially linear multi-index models. Journal of the American Statistical Association, 108(501), 237–256. doi:10.1080/01621459.2013.849167.
set.seed(1) N <- 200; P <- 4 X <- matrix(rnorm(N * P), N, P) dose <- runif(N, 0, 2) reward <- X[, 1] + rnorm(N) pSAVE(X, dose, reward, ndr = 1)set.seed(1) N <- 200; P <- 4 X <- matrix(rnorm(N * P), N, P) dose <- runif(N, 0, 2) reward <- X[, 1] + rnorm(N) pSAVE(X, dose, reward, ndr = 1)
Compute the Silverman rule-of-thumb bandwidth for kernel density estimation.
silverman(d, n)silverman(d, n)
d |
Integer; number of dimensions. |
n |
Integer; number of observations. |
A numeric scalar: the Silverman bandwidth.
silverman(1, 300)silverman(1, 300)
The clinical variables of the SKCM dataset. The original data was obtained from The Cancer Genome Atlas (TCGA).
skcm.clinicalskcm.clinical
Contains 469 subjects with 156 failures. Each row contains one subject, subject ID is indicated by row name. Variables include Time, Censor, Gender and Age. Age has 8 missing values.
https://www.cancer.gov/ccg/research/genome-sequencing/tcga
The expression of top 20 genes of cutaneous melanoma literature based on the MelGene Database.
skcm.melgeneskcm.melgene
Each row contains one subject, subject ID is indicated by row name. Gene names in the columns. The columns are scaled.
Chatzinasiou, Foteini, Christina M. Lill, Katerina Kypreou, Irene Stefanaki, Vasiliki Nicolaou, George Spyrou, Evangelos Evangelou et al. "Comprehensive field synopsis and systematic meta-analyses of genetic association studies in cutaneous melanoma." Journal of the National Cancer Institute 103, no. 16 (2011): 1227-1235.
https://www.cancer.gov/ccg/research/genome-sequencing/tcga
Produce 2D or 3D plots of right censored survival data based on a given dimension reduction space
view_dr_surv( x, y, censor, B = NULL, bw = NULL, FUN = "log", type = "2D", legend.add = TRUE, xlab = "Reduced Direction", ylab = "Time", zlab = "Survival" )view_dr_surv( x, y, censor, B = NULL, bw = NULL, FUN = "log", type = "2D", legend.add = TRUE, xlab = "Reduced Direction", ylab = "Time", zlab = "Survival" )
x |
A matrix or data.frame for features (continuous only). The algorithm will not scale the columns to unit variance |
y |
A vector of observed time |
censor |
A vector of censoring indicator |
B |
The dimension reduction subspace, can only be 1 dimensional |
bw |
A Kernel bandwidth (3D plot only) for approximating the survival function, default is the Silverman's formula |
FUN |
A scaling function applied to the time points |
type |
|
legend.add |
Should legend be added (2D plot only) |
xlab |
x axis label |
ylab |
y axis label |
zlab |
z axis label |
Invisible NULL. Called for its side-effect (producing a plot).
Sun, Q., Zhu, R., Wang, T. and Zeng, D. (2017). Counting process based dimension reduction method for censored outcomes. arXiv preprint doi:10.48550/arXiv.1704.05046.
# generate some survival data N = 100; P = 4; dataX = matrix(rnorm(N*P), N, P) Y = exp(-1 + dataX[,1] + rnorm(N)) Censor = rbinom(N, 1, 0.8) orthoDr.fit = orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm") view_dr_surv(dataX, Y, Censor, orthoDr.fit$B)# generate some survival data N = 100; P = 4; dataX = matrix(rnorm(N*P), N, P) Y = exp(-1 + dataX[,1] + rnorm(N)) Censor = rbinom(N, 1, 0.8) orthoDr.fit = orthoDr_surv(dataX, Y, Censor, ndr = 1, method = "dm") view_dr_surv(dataX, Y, Censor, orthoDr.fit$B)