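How to perform repeated cross-validation more efficiently
I want to run 5-fold cross-validation repeated 10 times and output the accuracy. Consider my function: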
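library(caret)
cross_validation <- function(y, x) {
  acc <- c()
  idx <- 1:length(y)
  for (j in 1:10) {
    # new random 5-fold split (held-out indices) for every repeat
    folds <- createFolds(y, k = 5, list = TRUE, returnTrain = FALSE)
    for (i in 1:5) {
      training <- idx[-folds[[i]]]
      model <- glm(y[training] ~ ., data = x[training, ], family = binomial())
      preds <- predict(model, newdata = x[folds[[i]], ], type = "response")
      # threshold the predicted probabilities at 0.5
      preds <- ifelse(preds > 0.5, 1, 0)
      # unique index per fold and repeat, so later repeats do not overwrite earlier ones;
      # fixed levels avoid an error when a fold's predictions contain only one class
      acc[(j - 1) * 5 + i] <- confusionMatrix(factor(preds, levels = c(0, 1)),
                                              factor(y[folds[[i]]], levels = c(0, 1)))$overall[1]
    }
  }
  mean(acc, na.rm = TRUE)
}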
Note that it does work on an example:
set.seed(42)
y <- sample(0:1,100,T)
df <- data.frame("norm" = rnorm(100),"Exp" = rexp(100))
> cross_validation(y, df)
[1] 0.45
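But I find it somewhat inconvenient and inefficient. I think I coded it in the slowest possible (though also the most intuitive) way. The problem with this code is the double loop. I was wondering whether it could be omitted, but I don't know how. Do you think the double loop can be avoided?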
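One way to drop the explicit nesting, sketched under the assumption that you still want to fit `glm` yourself rather than use `train`: caret's `createMultiFolds(y, k, times)` returns all k × times training-index sets in a single named list, so one `vapply` over that list replaces both loops. `cv_accuracy_flat` and its `threshold` argument are hypothetical names introduced here. This only flattens the loop; the model is still fitted 50 times, so the run time is roughly the same.

```r
library(caret)

# cv_accuracy_flat is a hypothetical helper, not part of caret.
# It replaces the two nested for-loops with a single pass over all
# k * times training-index sets produced by caret::createMultiFolds().
cv_accuracy_flat <- function(y, x, k = 5, times = 10, threshold = 0.5) {
  # Put the response and the predictors in one data frame so glm() can use `.`
  d <- data.frame(.outcome = y, x)
  # List of *training* row indices, named "Fold1.Rep01", "Fold2.Rep01", ...
  train_sets <- createMultiFolds(factor(y), k = k, times = times)
  accs <- vapply(train_sets, function(train_idx) {
    # Held-out rows are everything not used for training
    test_idx <- setdiff(seq_len(nrow(d)), train_idx)
    fit <- glm(.outcome ~ ., data = d[train_idx, ], family = binomial())
    p <- predict(fit, newdata = d[test_idx, ], type = "response")
    # Accuracy = share of held-out rows whose thresholded prediction matches y
    mean(as.integer(p > threshold) == d$.outcome[test_idx])
  }, numeric(1))
  mean(accs)
}
```

With the question's data, `cv_accuracy_flat(y, df)` returns the mean accuracy over all 50 held-out folds.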
Solution
You can use caret's train function with repeated CV:
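y = sample(0:1, 100, TRUE)
df = data.frame("Norm" = rnorm(100), "Exp" = rexp(100))
# 5-fold cross-validation repeated 10 times (50 resamples in total)
ctrl = trainControl(method = "repeatedcv", number = 5, repeats = 10)
set.seed(42)
# train() creates the folds, fits the model and scores each held-out fold
res = train(y = factor(y), x = df, trControl = ctrl, method = "glm")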
You get the overall accuracy:
> res
Generalized Linear Model

100 samples
  2 predictor
  2 classes: '0', '1'

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 10 times)
Summary of sample sizes: 80, 81, 79, 80, ...
Resampling results:

  Accuracy   Kappa
  0.5196291  -0.07190123
This gives the accuracy on each held-out fold (one row per fold per repeat):
head(res$resample)
   Accuracy      Kappa parameter    Resample
1 0.5500000  0.0000000      none Fold1.Rep01
2 0.5789474  0.0000000      none Fold2.Rep01
3 0.5714286  0.0000000      none Fold3.Rep01
4 0.5500000  0.0000000      none Fold4.Rep01
5 0.4000000 -0.2903226      none Fold5.Rep01
6 0.4000000 -0.2631579      none Fold1.Rep02
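A sketch for checking how the summary relates to the per-fold table, assuming caret's default aggregation: the overall Accuracy printed by `res` should be the mean of `res$resample$Accuracy`. The block also sets `savePredictions = "final"` (an existing `trainControl` option) so the held-out predictions in `res$pred` can be rescored by hand, mirroring the thresholding step in the question's loop. The variable names (`n_resamples`, `per_fold`, etc.) are illustrative only.

```r
library(caret)

set.seed(42)
y <- sample(0:1, 100, TRUE)
df <- data.frame(Norm = rnorm(100), Exp = rexp(100))

# Keep the held-out predictions of every resample in res$pred
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10,
                     savePredictions = "final")
res <- train(x = df, y = factor(y), method = "glm", trControl = ctrl)

# One row per held-out fold per repeat: 5 folds x 10 repeats = 50 rows
n_resamples <- nrow(res$resample)

# The summary Accuracy printed by `res` is the mean over those rows
overall_from_folds <- mean(res$resample$Accuracy)
overall_reported <- res$results$Accuracy

# Rescore by hand: share of correct class predictions within each resample
per_fold <- tapply(res$pred$pred == res$pred$obs, res$pred$Resample, mean)

print(c(n_resamples = n_resamples,
        from_folds = overall_from_folds,
        reported = overall_reported,
        by_hand = mean(per_fold)))
```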