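How to tune the PCA threshold in caret
I am trying to build a classifier from some data using caret. One of the approaches I want to try is a simple LDA on PCA-preprocessed data. I figured out how to do this with caret: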
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10, preProcOptions = list(thresh = 0.9))
ldaFit1 <- train(label ~ ., data = tab, method = "lda2", preProcess = c("center", "scale", "pca"), trControl = fitControl)
As expected, caret compares the accuracy of the LDA across the different values of dimen:
Linear Discriminant Analysis
158 samples
1955 predictors
3 classes: '1','2','3'
Pre-processing: centered (1955),scaled (1955),principal component
signal extraction (1955)
Resampling: Cross-Validated (10 fold,repeated 10 times)
Summary of sample sizes: 142,142,143,...
Resampling results across tuning parameters:
  dimen  Accuracy   Kappa
  1      0.5498987  0.1151681
  2      0.5451340  0.1298590
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was dimen = 1.
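What I would like to do is add the PCA threshold to the tuning parameters, but I cannot find a way to do that.
Is there a simple solution with caret? Or do I need to repeat the training step with different preprocessing options and pick the best value at the end?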
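For reference, the second option (repeating the training step once per threshold) could look roughly like the sketch below. It is only an illustration under assumptions: the built-in iris data stands in for the real data set, the resampling settings are reduced to keep it quick, and the names thresholds, fits and best_acc are made up for this example.

```r
library(caret)

# Candidate values for the PCA cumulative-variance threshold (illustrative)
thresholds <- c(0.8, 0.95, 0.99)

# Fit one caret model per threshold; iris is a stand-in for the real data
fits <- lapply(thresholds, function(th) {
  ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2,
                       preProcOptions = list(thresh = th))
  set.seed(42)  # same seed before each call so the resampling folds match
  train(Species ~ ., data = iris, method = "lda2",
        preProcess = c("center", "scale", "pca"),
        trControl = ctrl)
})
names(fits) <- paste0("thresh_", thresholds)

# Best cross-validated accuracy (over dimen) for each threshold
best_acc <- sapply(fits, function(f) max(f$results$Accuracy))
print(best_acc)

# Name of the threshold with the highest accuracy
print(names(which.max(best_acc)))
```

This works, but every threshold needs its own train() call and the comparison has to be done by hand, which is why a way to treat the threshold as a tuning parameter is preferable.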
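Solution
Thanks to the link pointed out by the user misuse, I managed to integrate the PCA variance-explained threshold into the parameter tuning: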
library(caret)
library(recipes)
library(modelgrid)  # provides model_grid(), share_settings() and add_model()
library(MASS)

# Vector of PCA variance-explained thresholds to try out
pca_varex <- c(0.8, 0.9, 0.95, 0.97, 0.98, 0.99, 0.995, 0.999)

# Base recipe: center and scale all predictors
# (`train` is the training data frame, with the class in column `label`)
initial_recipe <- recipe(train, formula = label ~ .) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

# Define the model grid with the settings shared by every model
models <- model_grid() %>%
  share_settings(
    data = train,
    trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 10),
    method = "lda2"
  )

# Add one model per PCA threshold
for (i in pca_varex) {
  models <- models %>%
    add_model(
      model_name = sprintf("varex_%s", i),
      x = initial_recipe %>% step_pca(all_predictors(), threshold = i)
    )
}

# Train all models in the grid
models <- models %>% train(.)
That said, after looking through the modelgrid and recipes documentation, the tidymodels packages are probably the simplest way to do this (https://www.tidymodels.org/).
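A minimal sketch of what the tidymodels route might look like is given below. It is an assumption-laden illustration, not the code from the answer above: iris again stands in for the real data, the LDA comes from the discrim package with the MASS engine, only the PCA threshold is tuned (not dimen), and the object names folds, rec, lda_spec, wf and res are invented for this example.

```r
library(tidymodels)
library(discrim)  # provides discrim_linear()

set.seed(1)
# 5-fold cross-validation on the stand-in data
folds <- vfold_cv(iris, v = 5, strata = Species)

# Recipe: normalize, then PCA with the variance threshold marked for tuning
rec <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), threshold = tune())

# LDA model specification using the MASS engine
lda_spec <- discrim_linear() %>%
  set_engine("MASS") %>%
  set_mode("classification")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lda_spec)

# Evaluate each candidate threshold on the same resamples
res <- tune_grid(
  wf,
  resamples = folds,
  grid = tibble(threshold = c(0.8, 0.95, 0.99)),
  metrics = metric_set(accuracy)
)

print(collect_metrics(res))
print(select_best(res, metric = "accuracy"))
```

Here the threshold is an ordinary tuning parameter of the recipe, so no loop over separate models is needed.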