Why do I get different Specificity results from confusionMatrix for the same ranger model on the same train data in R caret?
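I am new to R and am implementing ML using caret. I am working on a binary classification (yes/no) problem using the bank marketing response data from UCI.
Issue: For the model trained on the train data, the model summary shows a maximum Specificity of 0.4974, but when I compute a confusionMatrix on the same train data I get a Specificity of 0.9684. Such a large difference between the Specificity in the model summary and the Specificity in the confusion matrix looks strange, and I cannot understand it. (Since this is imbalanced data, I am looking at Specificity on the yes class rather than accuracy.)
Any help or pointers toward an explanation would be appreciated.
The code below is for reference: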
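Downloading data from UCI
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"
temp <- tempfile()
download.file(url, temp)
# The CSV is semicolon-separated with a header row.
# Assumption: inside the archive the file sits at bank-additional/bank-additional.csv
data <- read.table(unz(temp, "bank-additional/bank-additional.csv"), sep = ";", header = TRUE, stringsAsFactors = TRUE)
unlink(temp)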
Train/test split
library(tidyverse)
library(caret)
set.seed(123)
data_split_index <- createDataPartition(data$y, p = 0.7, list = FALSE)
train <- data[data_split_index, ]
test <- data[-data_split_index, ]
y_train <- subset(train, select = y)
X_train <- subset(train, select = -y)
# y_test is used further down but was never defined; built the same way as y_train
y_test <- subset(test, select = y)
# Data imbalance
table(y_train)
table(y_test)
Outlier treatment
# Outlier capping function: clamps values to the 1st and 99th percentiles
outlier_cap <- function(x, na.rm = TRUE) {
  upper_cap <- quantile(x, 0.99, na.rm = na.rm)
  lower_cap <- quantile(x, 0.01, na.rm = na.rm)
  x[x > upper_cap] <- upper_cap
  x[x < lower_cap] <- lower_cap
  x  # return the capped vector
}
# Capping outliers (applied to the train set only)
train <- train %>% mutate_if(is.numeric, outlier_cap)
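To make the effect of the capping step concrete, here is a minimal self-contained sketch on a made-up vector. The names `cap_demo` and `x_demo` and the data values are illustrative assumptions, not part of the bank data or the original code; the rule itself is the same 1st/99th percentile clamp used by outlier_cap.

```r
# Same winsorizing rule as outlier_cap, redefined so the snippet runs on its own.
# cap_demo and x_demo are hypothetical names used only for this illustration.
cap_demo <- function(x, lower = 0.01, upper = 0.99, na.rm = TRUE) {
  caps <- quantile(x, c(lower, upper), na.rm = na.rm)
  # Raise values below the lower cap, then lower values above the upper cap
  pmin(pmax(x, caps[[1]]), caps[[2]])
}

x_demo <- c(-500, 1:98, 1000)  # made-up data with one extreme value at each end
capped <- cap_demo(x_demo)

range(x_demo)  # original extremes
range(capped)  # extremes pulled in to the 1st and 99th percentiles
```

Only the two extreme values change; everything between the caps is left as it was.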
One-hot encoding
pp_dummy <- dummyVars(y ~ ., data = train)
train <- predict(pp_dummy, newdata = train)
train <- data.frame(train)
test <- predict(pp_dummy, newdata = test)
test <- data.frame(test)
Data preprocessing
pp_standardize <- preProcess(train, method = c("center", "scale"))
train <- predict(pp_standardize, newdata = train)
pp_range <- preProcess(train, method = c("range"))
train <- predict(pp_range, newdata = train)
Appending y to train
# y_train is a one-column data frame, which is why pull() is used on y later
train$y <- y_train # as.factor(Y)
Tuning
fitCtrl <- trainControl(
  method = "cv",  # repeats = 5,
  number = 10, savePredictions = "final",
  classProbs = TRUE, summaryFunction = twoClassSummary
)
Model ranger py
set.seed(123)
system.time(
  model_ranger_py <- train(
    pull(y) ~ cons.conf.idx + contact.telephone + duration + emp.var.rate +
      job.self.employed + month.jun + month.mar + month.may + month.nov +
      poutcome.success + previous,
    data = train, metric = "Spec", method = "ranger", trControl = fitCtrl
  )
)
model_ranger_py
(Update) I am getting even stranger results when comparing train and test data in the output of the two confusionMatrix calls below.
CM Train
confusionMatrix(predict(model_ranger_py, newdata = train), pull(train$y))
################################ output ################################
Confusion Matrix and Statistics

          Reference
Prediction   no  yes
       no  2562   10
       yes    6  306

               Accuracy : 0.9945
                 95% CI : (0.991, 0.9968)
    No Information Rate : 0.8904
    P-Value [Acc > NIR] : <0.0000000000000002

                  Kappa : 0.9714
 Mcnemar's Test P-Value : 0.4533

            Sensitivity : 0.9977
            Specificity : 0.9684
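As a sanity check on the train output, Sensitivity and Specificity can be recomputed by hand from the four cell counts. The sketch below assumes, as caret does by default for two-class problems, that the first factor level (no) is the positive class, so Specificity here is the share of actual yes cases that were predicted as yes. The object names (`cm_train`, `sens`, `spec`, `acc`) are illustrative and not part of the original code.

```r
# Cell counts copied from the train confusion matrix
# (rows = Prediction, columns = Reference; matrix() fills column by column).
cm_train <- matrix(
  c(2562, 6,     # Reference = no:  predicted no, predicted yes
    10,   306),  # Reference = yes: predicted no, predicted yes
  nrow = 2,
  dimnames = list(Prediction = c("no", "yes"), Reference = c("no", "yes"))
)

# Assumption: "no" is the positive class (first factor level, caret's default).
sens <- cm_train["no", "no"]   / sum(cm_train[, "no"])   # recall of "no"
spec <- cm_train["yes", "yes"] / sum(cm_train[, "yes"])  # recall of "yes"
acc  <- sum(diag(cm_train)) / sum(cm_train)

round(c(Sensitivity = sens, Specificity = spec, Accuracy = acc), 4)
```

The rounded values agree with the Sensitivity, Specificity and Accuracy lines printed by confusionMatrix for the train data.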
CM Test
plot_pred_type_distribution(df_train)
confusionMatrix(predict(model_ranger_py,newdata = test),pull(y_test))
################################ output ################################
Confusion Matrix and Statistics

          Reference
Prediction   no  yes
       no   166    8
       yes  934  127

               Accuracy : 0.2372
                 95% CI : (0.2138, 0.262)
    No Information Rate : 0.8907
    P-Value [Acc > NIR] : 1

                  Kappa : 0.0229
 Mcnemar's Test P-Value : <0.0000000000000002

            Sensitivity : 0.1509
            Specificity : 0.9407