Title: | Density Convoluted Support Vector Machines |
---|---|
Description: | Implements an efficient algorithm for solving sparse-penalized support vector machines with kernel density convolution. This package is designed for high-dimensional classification tasks, supporting lasso (L1) and elastic-net penalties for sparse feature selection and providing options for tuning kernel bandwidth and penalty weights. The 'dcsvm' is applicable to fields such as bioinformatics, image analysis, and text classification, where high-dimensional data commonly arise. Learn more about the methodology and algorithm at Wang, Zhou, Gu, and Zou (2023) <doi:10.1109/TIT.2022.3222767>. |
Authors: | Boxiang Wang [aut, cre], Le Zhou [aut], Yuwen Gu [aut], Hui Zou [aut] |
Maintainer: | Boxiang Wang <[email protected]> |
License: | GPL-2 |
Version: | 0.0.1 |
Built: | 2025-01-11 05:52:06 UTC |
Source: | https://github.com/cran/dcsvm |
This package provides tools to perform density-convoluted support vector machine (DCSVM) modeling for high-dimensional data classification.
This package implements the density-convoluted SVM for high-dimensional classification.
Package: | dcsvm |
Type: | Package |
Version: | 0.0.1 |
Date: | 2025-01-08 |
License: | GPL-2 |
The package dcsvm contains five main functions:
dcsvm
cv.dcsvm
coef.dcsvm
plot.dcsvm
plot.cv.dcsvm
Boxiang Wang, Le Zhou, Yuwen Gu, and Hui Zou
Maintainer:
Boxiang Wang <[email protected]>
Wang, B., Zhou, L., Gu, Y., and Zou, H. (2023) Density-Convoluted Support Vector Machines for High-Dimensional Classification, IEEE Transactions on Information Theory, Vol. 69(4), 2523-2536, doi:10.1109/TIT.2022.3222767.
Computes the coefficients at specified lambda values for a cv.dcsvm object.
## S3 method for class 'cv.dcsvm'
coef(object, s = c("lambda.1se", "lambda.min"), ...)
object |
A fitted |
s |
Value(s) of the L1 tuning parameter |
... |
Other arguments that can be passed to |
Compute Coefficients from a "cv.dcsvm" Object
Computes coefficients at chosen values of lambda from the cv.dcsvm object.
This function computes the coefficients for lambda values suggested by cross-validation.
The returned object depends on the choice of s and any additional arguments passed to the dcsvm method.
cv.dcsvm and predict.cv.dcsvm methods.
data(colon)
colon$x <- colon$x[ ,1:100] # Use only the first 100 columns for this example
set.seed(1)
cv <- cv.dcsvm(colon$x, colon$y, lam2=1, nfolds=5)
c1 <- coef(cv, s="lambda.1se")
Computes the coefficients or indices of nonzero coefficients at specified lambda values from a fitted dcsvm model.
## S3 method for class 'dcsvm'
coef(object, s = NULL, type = c("coefficients", "nonzero"), ...)
object |
A fitted |
s |
Value(s) of the L1 tuning parameter |
type |
|
... |
Not used. Other arguments to |
Compute Coefficients for Sparse Density-Convoluted SVM
Computes the coefficients or returns the indices of nonzero coefficients at chosen values of lambda from a fitted dcsvm object.
s is the vector of lambda values at which coefficients are requested. If s is not in the lambda sequence used for fitting the model, the coef function uses linear interpolation: each new value is a weighted combination of the coefficients at the adjacent lambda values to its left and right in the fitted sequence.
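The interpolation described above can be illustrated with a small base-R sketch. The helper `interp_coef` and the toy `lambda` and `beta` values are hypothetical and are not part of the package; the sketch assumes a decreasing lambda sequence and interpolation on the lambda scale, which may differ in detail from what coef.dcsvm does internally.

```r
# Hypothetical helper (not a dcsvm function): linearly interpolate
# coefficients at a new lambda value s.
# `lambda` is a decreasing sequence; `beta` has one column per lambda value.
interp_coef <- function(beta, lambda, s) {
  stopifnot(ncol(beta) == length(lambda),
            s <= max(lambda), s >= min(lambda))
  right <- which(lambda <= s)[1]   # nearest fitted lambda at or below s
  left <- max(right - 1, 1)        # nearest fitted lambda above s
  if (lambda[left] == lambda[right]) return(beta[, right])
  # Weight given to the left neighbour; 1 - frac goes to the right neighbour.
  frac <- (s - lambda[right]) / (lambda[left] - lambda[right])
  frac * beta[, left] + (1 - frac) * beta[, right]
}

# Toy solution path with three lambda values and two coefficients.
lambda <- c(0.10, 0.05, 0.01)
beta <- cbind(c(0, 0), c(0.4, 0), c(1.0, -0.2))

# s = 0.03 lies halfway between 0.05 and 0.01 on the lambda scale.
interp_coef(beta, lambda, s = 0.03)
```

When s coincides with a fitted lambda value, the sketch returns that column of `beta` unchanged.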
Either the coefficients at the requested values of lambda, or a list of the indices of the nonzero coefficients for each lambda.
data(colon)
fit <- dcsvm(colon$x, colon$y, lam2=1)
c1 <- coef(fit, type="coefficients", s=c(0.1, 0.005))
c2 <- coef(fit, type="nonzero")
This dataset contains 62 colon tissue samples with 2000 gene expression levels. Among these samples, 40 are tumor tissues (coded as 1) and 22 are normal tissues (coded as -1).
data(colon)
Simplified Gene Expression Data from Alon et al. (1999)
Gene expression data (2000 genes for 62 samples) from a DNA microarray experiment of colon tissue samples (Alon et al., 1999).
A list with the following elements:
x |
A matrix of 62 rows and 2000 columns representing the gene expression levels of 62 colon tissue samples. Each row corresponds to a sample, and each column corresponds to a gene. |
y |
A numeric vector of length 62 representing the tissue type (1 for tumor; -1 for normal). |
The data were introduced in Alon et al. (1999).
Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., and Levine, A.J. (1999). “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proceedings of the National Academy of Sciences, 96(12), 6745–6750.
# Load the dcsvm library
library(dcsvm)
# Load the dataset
data(colon)
# Check the dimensions of the data
dim(colon$x)
# Count the number of samples in each class
sum(colon$y == -1)
sum(colon$y == 1)
Performs cross-validation for the sparse density-convoluted SVM to estimate the optimal tuning parameter lambda.
cv.dcsvm(x, y, lambda = NULL, hval = 1,
  pred.loss = c("misclass", "loss"), nfolds = 5, foldid, ...)
x |
A matrix of predictors, i.e., the |
y |
A vector of binary class labels, i.e., the |
lambda |
Default is |
hval |
The bandwidth parameter for kernel smoothing. Default is 1. |
pred.loss |
|
nfolds |
The number of folds. Default is 5. The allowable range is from 3 to the sample size. Larger |
foldid |
An optional vector with values between 1 and |
... |
Other arguments that can be passed to |
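The nfolds and foldid arguments above determine how observations are split into folds. As a rough base-R sketch (an assumption modeled on the usual glmnet-style convention, not taken from the package source), a balanced random fold assignment can be built as follows; the sample size 62 matches the colon data.

```r
set.seed(1)
n <- 62        # number of observations (as in the colon data)
nfolds <- 5

# Repeat the fold labels 1..nfolds until n labels exist, then shuffle them,
# so fold sizes differ by at most one observation.
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)

# A vector like this could then be supplied through the documented
# foldid argument, e.g. cv.dcsvm(x, y, foldid = foldid).
```

Supplying foldid explicitly keeps the folds identical across repeated calls, which helps when comparing fits with different hval or lam2 values.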
Cross-Validation for Sparse Density-Convoluted SVM
Conducts a k-fold cross-validation for dcsvm and returns the suggested values of the L1 parameter lambda.
This function fits dcsvm once per fold, excluding each fold in turn, then computes the mean cross-validation error and its standard deviation. It is adapted from the cv functions in the gcdnet and glmnet packages.
A cv.dcsvm object is returned, which includes the cross-validation fit:
lambda |
The |
cvm |
A vector of length |
cvsd |
A vector of length |
cvupper |
The upper curve: |
cvlower |
The lower curve: |
nzero |
Number of non-zero coefficients at each |
name |
"Mis-classification error", for plotting purposes. |
dcsvm.fit |
A fitted |
lambda.min |
The |
lambda.1se |
The largest value of |
cv.min |
The minimum cross-validation error. |
cv.1se |
The cross-validation error associated with |
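To make the relationship among the components listed above concrete, here is a minimal base-R sketch that derives them from a matrix of per-fold errors. The error values are invented for illustration, and two conventions are assumptions borrowed from glmnet rather than confirmed for this package: cvsd is taken as the standard error of the fold mean, and lambda.1se is the largest lambda whose error is within one cvsd of the minimum.

```r
# Toy per-fold errors: rows are folds, columns are lambda values
# (made-up numbers, for illustration only).
lambda <- c(0.10, 0.05, 0.02, 0.01)
err <- rbind(c(0.40, 0.20, 0.10, 0.20),
             c(0.50, 0.20, 0.20, 0.20),
             c(0.40, 0.10, 0.20, 0.30),
             c(0.50, 0.20, 0.10, 0.20),
             c(0.40, 0.20, 0.20, 0.30))
nfolds <- nrow(err)

cvm <- colMeans(err)                       # mean CV error per lambda
cvsd <- apply(err, 2, sd) / sqrt(nfolds)   # assumption: standard error of the mean
cvupper <- cvm + cvsd
cvlower <- cvm - cvsd

imin <- which.min(cvm)
lambda.min <- lambda[imin]
# Assumption: one-standard-error rule as used in glmnet-style packages.
lambda.1se <- max(lambda[cvm <= cvm[imin] + cvsd[imin]])

c(lambda.min = lambda.min, lambda.1se = lambda.1se)
```

In this toy example the one-standard-error rule selects a larger lambda than the minimizer, which corresponds to a sparser model.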
dcsvm, plot.cv.dcsvm, predict.cv.dcsvm, and coef.cv.dcsvm methods.
data(colon)
colon$x <- colon$x[ ,1:100] # Use only the first 100 columns for this example
n <- nrow(colon$x)
set.seed(1)
id <- sample(n, trunc(n / 3))
cvfit <- cv.dcsvm(colon$x[-id, ], colon$y[-id], lam2=1, nfolds=5)
plot(cvfit)
predict(cvfit, newx=colon$x[id, ], s="lambda.min")
Fits the density-convoluted support vector machine (DCSVM) through kernel density convolutions.
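As a rough illustration of what "kernel density convolution" means here, the sketch below smooths the hinge loss max(1 - v, 0) by convolving it with a Gaussian density of bandwidth h, for which a closed form exists. The function names `smoothed_hinge` and `conv_numeric` are hypothetical, and the sketch is not the package's internal code; it only checks the closed form against numerical integration.

```r
# Hinge loss convolved with a Gaussian density of bandwidth h.
# Closed form: a * pnorm(a / h) + h * dnorm(a / h), where a = 1 - v.
smoothed_hinge <- function(v, h = 1) {
  a <- 1 - v
  a * pnorm(a / h) + h * dnorm(a / h)
}

# The same quantity by numerical integration of hinge(v - h * z) * dnorm(z),
# truncated to z in [-10, 10] where the Gaussian mass is essentially complete.
hinge <- function(u) pmax(1 - u, 0)
conv_numeric <- function(v, h = 1) {
  integrate(function(z) hinge(v - h * z) * dnorm(z), -10, 10)$value
}

# The two agree up to integration error.
c(closed_form = smoothed_hinge(0.5, h = 1), numeric = conv_numeric(0.5, h = 1))

# As h shrinks, the smoothed loss approaches the ordinary hinge loss.
smoothed_hinge(0, h = 1e-6)
```

A smaller bandwidth keeps the loss closer to the original hinge, while a larger one gives a smoother objective.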
dcsvm(
  x,
  y,
  nlambda = 100,
  lambda.factor = ifelse(nobs < nvars, 0.01, 1e-04),
  lambda = NULL,
  lam2 = 0,
  kern = c("gaussian", "uniform", "epanechnikov"),
  hval = 1,
  pf = rep(1, nvars),
  pf2 = rep(1, nvars),
  exclude,
  dfmax = nvars + 1,
  pmax = min(dfmax * 1.2, nvars),
  standardize = TRUE,
  eps = 1e-08,
  maxit = 1e+06,
  istrong = TRUE
)
x |
A numeric matrix with |
y |
A numeric vector of length |
nlambda |
Number of |
lambda.factor |
Ratio of the smallest to the largest |
lambda |
An optional user-specified sequence of |
lam2 |
Users may tune |
kern |
Type of kernel method for smoothing. Options are |
hval |
The bandwidth parameter for kernel smoothing. Default is 1. |
pf |
A numeric vector of length |
pf2 |
A numeric vector of length |
exclude |
Indices of predictors to exclude from the model. Equivalent to assigning an infinite penalty factor. Default is none. |
dfmax |
Maximum number of nonzero coefficients allowed in the model. Default is |
pmax |
Maximum number of variables allowed to ever be nonzero during the computation. Default is |
standardize |
Logical indicating whether predictors should be standardized to unit variance. Default is |
eps |
Convergence threshold. The algorithm stops when |
maxit |
Maximum number of iterations allowed. Default is |
istrong |
Logical indicating whether to use the strong rule for faster computation. Default is |
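The nlambda and lambda.factor arguments above jointly define the lambda sequence when lambda is not supplied. The following sketch assumes the log-spaced convention used by glmnet-style packages; the helper `lambda_seq` and the starting value `lambda_max` are hypothetical, since the package derives its own largest lambda from the data.

```r
# Hypothetical helper: a decreasing, log-spaced lambda sequence whose
# smallest value is lambda.factor times its largest value.
lambda_seq <- function(lambda_max, lambda.factor, nlambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * lambda.factor),
          length.out = nlambda))
}

# With more variables than observations, the documented default
# for lambda.factor is 0.01.
lam <- lambda_seq(lambda_max = 0.5, lambda.factor = 0.01, nlambda = 100)
range(lam)
```

A smaller lambda.factor extends the path toward less-penalized, denser models at the cost of more computation.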
An object of class dcsvm containing the following components:
b0 |
Intercept values for each |
beta |
Sparse matrix of coefficients for each |
df |
Number of nonzero coefficients for each |
dim |
Dimensions of the coefficient matrix. |
lambda |
Sequence of |
npasses |
Total number of iterations across all |
jerr |
Warnings and errors. 0 if no errors. |
call |
The matched call. |
print.dcsvm, predict.dcsvm, coef.dcsvm, plot.dcsvm, and cv.dcsvm.
# Load the data
data(colon)
# Fit the elastic-net penalized DCSVM with lam2 set to 1
fit <- dcsvm(colon$x, colon$y, lam2 = 1)
print(fit)
# Coefficients at some lambda value
c1 <- coef(fit, s = 0.005)
# Make predictions
predict(fit, newx = colon$x[1:10, ], s = c(0.01, 0.005))
Depicts the cross-validation curves for the sparse density-convoluted SVM.
## S3 method for class 'cv.dcsvm'
plot(x, sign.lambda, ...)
x |
A fitted |
sign.lambda |
Specifies whether to plot against |
... |
Other graphical parameters to |
Plot the Cross-Validation Curve of Sparse Density-Convoluted SVM
Plots the cross-validation curve against a function of lambda values, including upper and lower standard deviation curves.
This function visualizes the cross-validation curves for a cv.dcsvm object, showing the relationship between lambda values and cross-validation error.
No return value, only called for plots.
data(colon)
colon$x <- colon$x[ ,1:100] # Use only the first 100 columns for this example
set.seed(1)
cv <- cv.dcsvm(colon$x, colon$y, lam2=1, nfolds=5)
plot(cv)
Plots the solution paths as a coefficient profile plot for a fitted dcsvm model.
## S3 method for class 'dcsvm'
plot(x, xvar = c("norm", "lambda"), color = FALSE, label = FALSE, ...)
x |
A fitted |
xvar |
Specifies the X-axis. If |
color |
If |
label |
If |
... |
Other graphical parameters to |
Plot Coefficients for Sparse Density-Convoluted SVM
Plots the solution paths for a fitted dcsvm object.
This function generates a coefficient profile plot showing the solution paths of the sparse density-convoluted SVM.
No return value, only called for plots.
print.dcsvm, predict.dcsvm, coef.dcsvm, plot.dcsvm, and cv.dcsvm.
data(colon)
fit <- dcsvm(colon$x, colon$y)
oldpar <- par(mfrow = c(1,3)) # changes par() and stores original par()
# Plots against the L1-norm of the coefficients
plot(fit)
# Plots against the log-lambda sequence
plot(fit, xvar="lambda", label=TRUE)
# Plots with colors
plot(fit, color=TRUE)
# Reset to user's option
par(oldpar)
Predicts class labels for new data based on the cross-validated lambda values from a cv.dcsvm object.
## S3 method for class 'cv.dcsvm'
predict(object, newx, s = c("lambda.1se", "lambda.min"), ...)
object |
A fitted |
newx |
A matrix of new values for |
s |
Value(s) of the L1 tuning parameter |
... |
Not used. Other arguments to |
Make Predictions from a "cv.dcsvm" Object
This function predicts the class labels of new observations using the sparse density-convoluted SVM at the lambda values suggested by cv.dcsvm.
This function uses the cross-validation results to make predictions. It is adapted from the predict.cv function in the glmnet and gcdnet packages.
Predicted class labels or fitted values, depending on the choice of s and any arguments passed to the dcsvm method.
cv.dcsvm and coef.cv.dcsvm methods.
data(colon)
colon$x <- colon$x[ , 1:100] # Use only the first 100 columns for this example
set.seed(1)
cv <- cv.dcsvm(colon$x, colon$y, lam2=1, nfolds=5)
predict(cv$dcsvm.fit, newx=colon$x[2:5, ], s=cv$lambda.1se, type="class")
Predicts binary class labels or fitted values for a dcsvm model using new data.
## S3 method for class 'dcsvm'
predict(object, newx, s = NULL, type = c("class", "link"), ...)
object |
A fitted |
newx |
A matrix of new values for |
s |
Value(s) of the L1 tuning parameter |
type |
|
... |
Not used. Other arguments to |
Make Predictions for Sparse Density-Convoluted SVM
This function predicts the binary class labels or the fitted values of a dcsvm object.
s represents the new lambda values for making predictions. If s is not part of the original lambda sequence generated by dcsvm, predict.dcsvm uses linear interpolation to compute predictions by combining adjacent lambda values in the original sequence. This functionality is adapted from the predict methods in the glmnet and gcdnet packages.
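The two prediction types in the usage above can be pictured with a small base-R sketch. It assumes, without confirmation from the package source, that type = "link" corresponds to the linear predictor b0 + x %*% beta and that type = "class" is its sign, coded as 1 and -1 like the labels in the colon data. The values of `b0`, `beta`, and `newx` are made up.

```r
# Toy intercept and coefficient vector at a single lambda value (made-up).
b0 <- -0.2
beta <- c(0.7, -0.1, 0)

# Two new observations with three predictors each.
newx <- rbind(c(1.0, 2.0, 0.5),
              c(-1.0, 0.5, 3.0))

# Assumed meaning of type = "link": the linear predictor.
link <- drop(b0 + newx %*% beta)

# Assumed meaning of type = "class": the sign of the linear predictor.
pred_class <- ifelse(link > 0, 1, -1)

cbind(link = link, class = pred_class)
```

For a value of s between two fitted lambda values, the same computation would use interpolated b0 and beta.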
Returns either the predicted class labels or the fitted values, depending on the choice of type.
data(colon)
fit <- dcsvm(colon$x, colon$y, lam2=1)
print(predict(fit, type="class", newx=colon$x[2:5, ]))
Prints a summary of the dcsvm object, showing the solution paths.
## S3 method for class 'dcsvm'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
A fitted |
digits |
Specifies the significant digits to use in the output. Default is |
... |
Additional arguments to |
Print a DCSVM Object
Print a summary of the dcsvm solution paths.
This function prints a two-column matrix with columns Df and Lambda. The Df column shows the number of nonzero coefficients, and the Lambda column displays the corresponding lambda value. It is adapted from the print function in the gcdnet and glmnet packages.
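The two-column summary described above can be reproduced by hand from a coefficient matrix. The sketch below uses a made-up dense `beta` matrix with one column per lambda value; the real fitted object stores a sparse matrix, so this is only an illustration of how Df relates to the coefficients.

```r
# Toy solution path: three lambda values, three coefficients (made-up).
lambda <- c(0.10, 0.05, 0.01)
beta <- cbind(c(0, 0, 0), c(0.4, 0, 0), c(1.0, -0.2, 0))

# Df counts the nonzero coefficients in each column of beta.
path <- cbind(Df = colSums(beta != 0), Lambda = signif(lambda, 3))
print(path)
```

Df grows as lambda decreases, reflecting the weaker L1 penalty.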
A two-column matrix with one column showing the number of nonzero coefficients and the other column showing the lambda values.
print.dcsvm, predict.dcsvm, coef.dcsvm, plot.dcsvm, and cv.dcsvm.
data(colon)
fit <- dcsvm(colon$x, colon$y)
print(fit)