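How do I get the coefficients, z-scores, and p-values for each fold of a k-fold cross-validation in R?
I am running 5-fold cross-validation with glm to fit a logistic regression. Here is a reproducible example using the built-in mtcars dataset: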
library(caret)
data("mtcars")
str(mtcars)
mtcars$vs<-as.factor(mtcars$vs)
df0<-na.omit(mtcars)
set.seed(123)
train.control <- trainControl(method = "cv",number = 5)
# Train the model
model <- train(vs ~.,data = mtcars,method = "glm",trControl = train.control)
print(model)
summary(model)
model$resample
confusionMatrix(model)
pred.mod <- predict(model)
confusionMatrix(data=pred.mod,reference=mtcars$vs)
> print(model)
Generalized Linear Model
32 samples
10 predictors
2 classes: '0','1'
No pre-processing
resampling: Cross-Validated (5 fold)
Summary of sample sizes: 25,26,25,27,25
resampling results:
Accuracy Kappa
0.9095238 0.8164638
> summary(model)
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-1.181e-05 -2.110e-08 -2.110e-08 2.110e-08 1.181e-05
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.117e+01 1.589e+07 0 1
mpg 2.451e+00 5.979e+04 0 1
cyl -3.908e+01 2.947e+05 0 1
disp -1.927e-02 8.518e+03 0 1
hp 3.129e-01 2.283e+04 0 1
drat -2.735e+01 9.696e+05 0 1
wt -1.248e+01 6.437e+05 0 1
qsec 1.565e+01 3.845e+05 0 1
am -4.562e+01 3.632e+05 0 1
gear -2.835e+01 5.448e+05 0 1
carb 1.788e+01 2.971e+05 0 1
(dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 7.2154e-10 on 21 degrees of freedom
AIC: 22
Number of Fisher Scoring iterations: 25
> model$resample
Accuracy Kappa Resample
1 0.8571429 0.6956522 Fold1
2 0.8333333 0.6666667 Fold2
3 0.8571429 0.7200000 Fold3
4 1.0000000 1.0000000 Fold4
5 1.0000000 1.0000000 Fold5
> confusionMatrix(model)
Cross-Validated (5 fold) Confusion Matrix
(entries are percentual average cell counts across resamples)
Reference
Prediction 0 1
0 50.0 3.1
1 6.2 40.6
Accuracy (average) : 0.9062
> pred.mod <- predict(model)
> confusionMatrix(data=pred.mod,reference=mtcars$vs)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 18 0
1 0 14
Accuracy : 1
95% CI : (0.8911,1)
No information Rate : 0.5625
P-Value [Acc > NIR] : 1.009e-08
Kappa : 1
Mcnemar's Test P-Value : NA
Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.5625
Detection Rate : 0.5625
Detection Prevalence : 0.5625
Balanced Accuracy : 1.0000
'Positive' Class : 0
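This all works well, but I would like to get the summary(model) information for each fold (that is, the coefficients, p-values, z-scores, and so on that summary() returns) and, if possible, the sensitivity and specificity for each fold as well. Can anyone help?
Answer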
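That's an interesting question. The values you are looking for cannot be obtained directly from the model object, but they can be recomputed once you know which observations of the training data belong to which fold. This information can be extracted from model (as model$pred) if you specify savePredictions = "all" in the trainControl function. With the predictions for each of the k folds, you can do the following: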
# First, save the predictions from all folds
set.seed(123)
train.control <- trainControl(method = "cv", number = 5, savePredictions = "all")
# Train the model
model <- train(vs ~ ., data = mtcars, method = "glm", trControl = train.control)
# Now we can extract the statistics you are looking for
fold <- unique(model$pred$Resample)
mystat <- function(model, x) {
  pred <- model$pred
  # Held-out predictions for fold x
  df <- pred[pred$Resample == x, ]
  # Confusion matrix for fold x (includes sensitivity and specificity)
  cm <- confusionMatrix(df$pred, df$obs)
  # Refit on the rows that were NOT held out in fold x, without resampling
  control <- trainControl(method = "none")
  newdat <- mtcars[-df$rowIndex, ]
  fit <- train(vs ~ ., data = newdat, method = "glm", trControl = control)
  # Coefficient table: estimate, standard error, z value, p-value
  summ <- summary(fit)
  coefs <- summ$coefficients
  return(list(cm, coefs))
}
stat <- lapply(fold, mystat, model = model)
names(stat) <- fold
Note that specifying method = "none" in trainControl forces train to fit the model to the entire data set passed to it, without any resampling or parameter tuning.
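To make the effect of method = "none" concrete, here is a minimal sketch that compares a caret fit without resampling to a plain glm() fit on the same rows. The two-predictor formula (vs ~ wt + hp) is an assumption chosen only to keep the illustration small; it is not part of the original question.

```r
library(caret)

data("mtcars")
mtcars$vs <- as.factor(mtcars$vs)

# method = "none": train() fits one model on all supplied rows, no resampling
fit_none <- train(vs ~ wt + hp, data = mtcars, method = "glm",
                  trControl = trainControl(method = "none"))

# The same logistic regression fitted directly with base R
fit_glm <- glm(vs ~ wt + hp, data = mtcars, family = binomial)

# Compare the estimated coefficients of the two fits
same_coefs <- all.equal(unname(coef(fit_none$finalModel)),
                        unname(coef(fit_glm)))
print(same_coefs)
```

If the two fits agree, summary(fit_none) reports the same coefficient table as summary(fit_glm), which is why refitting with method = "none" on one fold's training rows recovers that fold's model.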
In this form it is not a pretty function, but it does what you need, and you can always adapt it to make it more general.
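As one possible way to generalize it, the sketch below collects the per-fold results into a single data frame, with one row per fold and coefficient. It assumes that model$control$index holds the training-row indices of each fold under the same names used in model$pred$Resample ("Fold1", "Fold2", ...), and it refits with base glm() rather than train(); the names idx and per_fold are hypothetical. With this small dataset, glm() is likely to warn about fitted probabilities of 0 or 1, as the near-zero residual deviance in the question suggests.

```r
library(caret)

data("mtcars")
mtcars$vs <- as.factor(mtcars$vs)

set.seed(123)
ctrl <- trainControl(method = "cv", number = 5, savePredictions = "all")
model <- train(vs ~ ., data = mtcars, method = "glm", trControl = ctrl)

# Assumption: training-row indices per fold, named like model$pred$Resample
idx <- model$control$index

per_fold <- do.call(rbind, lapply(names(idx), function(f) {
  # Refit the logistic regression on this fold's training rows only
  fit <- glm(vs ~ ., data = mtcars[idx[[f]], ], family = binomial)
  co <- as.data.frame(coef(summary(fit)))

  # Held-out predictions saved by caret for this fold
  held_out <- model$pred[model$pred$Resample == f, ]
  cm <- confusionMatrix(held_out$pred, held_out$obs)

  data.frame(fold = f, term = rownames(co), co,
             sensitivity = cm$byClass[["Sensitivity"]],
             specificity = cm$byClass[["Specificity"]],
             row.names = NULL, check.names = FALSE)
}))

head(per_fold)
```

Each row carries the estimate, standard error, z value, and p-value of one term in one fold, alongside that fold's held-out sensitivity and specificity, so the result can be filtered or reshaped with ordinary data frame tools.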