
Why do I get different Specificity results from confusionMatrix for the same ranger model on the same train data in R caret?


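I am new to R and have implemented ML using caret. I am working on a binary classification (yes/no) problem with the UCI bank marketing response data.

Issue: the model trained on the train data shows a maximum Specificity of 0.4974 in its summary, but when I compute confusionMatrix on the same train data I get a Specificity of 0.9684. Such a large gap between the Specificity in the model summary and in the confusion matrix looks odd, and I cannot make sense of it. (Since the data is imbalanced, I am looking at Specificity for the yes class rather than accuracy.)

Any help or explanation would be appreciated.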


Code below for reference:

Download data from UCI

(I could not download the data through R code, but it can be downloaded manually via the link.)

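url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"

temp <- tempfile()
download.file(url, temp, mode = "wb")  # binary mode so the zip is not corrupted on Windows

# Assumption: inside the archive the file sits under bank-additional/ and is
# semicolon-separated with a header row; adjust if the layout differs
data <- read.table(unz(temp, "bank-additional/bank-additional.csv"),
                   sep = ";", header = TRUE, stringsAsFactors = TRUE)
unlink(temp)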

Train/test split

library(tidyverse)
library(caret)

set.seed(123)

# Stratified 70/30 split on the outcome y
data_split_index <- createDataPartition(data$y, p = 0.7, list = FALSE)

train <- data[data_split_index, ]
test <- data[-data_split_index, ]

# One-column data frames holding the outcome
y_train <- subset(train, select = y)
X_train <- subset(train, select = -y)

y_test <- subset(test, select = y)

# Data imbalance
table(y_train)
table(y_test)

Outlier treatment

# Outlier capping function: clamp values to the 1st and 99th percentiles
outlier_cap <- function(x, na.rm = TRUE) {
  upper_cap <- quantile(x, 0.99, na.rm = na.rm)
  lower_cap <- quantile(x, 0.01, na.rm = na.rm)

  x[x > upper_cap] <- upper_cap
  x[x < lower_cap] <- lower_cap

  x
}

# Cap outliers in the numeric columns of train (test is not capped)
train <- train %>% mutate_if(is.numeric, outlier_cap)

One-hot encoding

# Encoder is fitted on train; the outcome y is dropped from the encoded output
pp_dummy <- dummyVars(y ~ ., data = train)

train <- predict(pp_dummy, newdata = train)
train <- data.frame(train)

test <- predict(pp_dummy, newdata = test)
test <- data.frame(test)

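Data preprocessing

# Center and scale, fitted on and applied to train only
pp_standardize <- preProcess(train, method = c("center", "scale"))

train <- predict(pp_standardize, newdata = train)

# Rescale to [0, 1], again fitted on and applied to train only
pp_range <- preProcess(train, method = c("range"))

train <- predict(pp_range, newdata = train)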

Attaching y to train

# y_train is a one-column data frame, hence the pull() calls further down
train$y <- y_train # as.factor(Y)

Tuning

fitCtrl <- trainControl(
  method = "cv",
  # repeats = 5,
  number = 10,
  savePredictions = "final",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

Model ranger py

set.seed(123)

system.time(
  model_ranger_py <- train(
    pull(y) ~ cons.conf.idx + contact.telephone + duration + emp.var.rate +
      job.self.employed + month.jun + month.mar + month.may + month.nov +
      poutcome.success + previous,
    data = train,
    metric = "Spec",
    method = "ranger",
    trControl = fitCtrl
  )
)
model_ranger_py
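For context on the two numbers being compared: with `method = "cv"`, the Spec that caret prints in the model summary is averaged over held-out folds, whereas `confusionMatrix(predict(model, newdata = train), ...)` scores rows the model was fitted on. The sketch below puts both kinds of estimate side by side on simulated data. `twoClassSim()` and the names `sim`, `ctrl`, `fit`, `cm_resub`, and `cm_cv` are stand-ins that do not appear in the original code, and the sketch assumes caret and ranger (plus the helper packages caret asks for) are installed.

```r
library(caret)

set.seed(123)
# Hypothetical stand-in for the bank data: simulated two-class data
sim <- twoClassSim(n = 500)

ctrl <- trainControl(
  method = "cv", number = 5,
  savePredictions = "final",   # keep hold-out predictions for the best tune
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

fit <- train(Class ~ ., data = sim, method = "ranger",
             metric = "Spec", trControl = ctrl)

# Resubstitution: predict the same rows the model was trained on
cm_resub <- confusionMatrix(predict(fit, newdata = sim), sim$Class)

# Cross-validation: pool the hold-out predictions stored in fit$pred
cm_cv <- confusionMatrix(fit$pred$pred, fit$pred$obs)

c(resubstitution  = unname(cm_resub$byClass["Specificity"]),
  cross_validated = unname(cm_cv$byClass["Specificity"]))
```

The pooled hold-out figure can differ slightly from the fold-averaged Spec in the printed summary, but both come from rows the model did not see during fitting.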

(UPDATE)

I get even stranger results when comparing train and test data in the output of the two confusionMatrix calls below.

CM train

confusionMatrix(predict(model_ranger_py, newdata = train), pull(train$y))

################################ output ################################
Confusion Matrix and Statistics

          Reference
Prediction   no  yes
       no  2562   10
       yes    6  306
                                             
               Accuracy : 0.9945             
                 95% CI : (0.991,0.9968)    
    No information Rate : 0.8904             
    P-Value [Acc > NIR] : <0.0000000000000002
                                             
                  Kappa : 0.9714             
                                             
 Mcnemar's Test P-Value : 0.4533             
                                             
            Sensitivity : 0.9977             
            Specificity : 0.9684
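As a check on how to read this output: caret's confusionMatrix treats the first factor level (`no` here) as the positive class by default, so Sensitivity describes `no` and Specificity describes `yes`. A minimal base-R sketch that recomputes the figures from the four counts printed above (the object name `cm_train` is made up for illustration):

```r
# Counts copied from the train confusion matrix above
# (rows = prediction, columns = reference; matrix() fills column by column)
cm_train <- matrix(c(2562, 6, 10, 306), nrow = 2,
                   dimnames = list(Prediction = c("no", "yes"),
                                   Reference  = c("no", "yes")))

# Positive class is "no" (first level), so:
sensitivity <- cm_train["no", "no"]   / sum(cm_train[, "no"])   # true "no" predicted "no"
specificity <- cm_train["yes", "yes"] / sum(cm_train[, "yes"])  # true "yes" predicted "yes"
accuracy    <- sum(diag(cm_train)) / sum(cm_train)

round(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy), 4)
# sensitivity specificity    accuracy
#      0.9977      0.9684      0.9945
```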


CM test

plot_pred_type_distribution(df_train)
confusionMatrix(predict(model_ranger_py,newdata = test),pull(y_test)) 

################################ output ################################
Confusion Matrix and Statistics

          Reference
Prediction  no yes
       no  166   8
       yes 934 127
                                             
               Accuracy : 0.2372             
                 95% CI : (0.2138,0.262)    
    No information Rate : 0.8907             
    P-Value [Acc > NIR] : 1                  
                                             
                  Kappa : 0.0229             
                                             
 Mcnemar's Test P-Value : <0.0000000000000002
                                             
            Sensitivity : 0.1509             
            Specificity : 0.9407 
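One thing visible in the code above is that `test` only goes through `pp_dummy`, while `train` is also capped, centered and scaled, and range-normalized before fitting. Whether that explains the test matrix is not established here. The sketch below only shows, on made-up numbers, how a preprocessing object fitted on the training rows can be applied to both sets so that they share one scale. The data frames `toy_train` and `toy_test` and the object `pp_toy` are invented for illustration.

```r
library(caret)

# Invented toy data, reusing two column names from the model formula
toy_train <- data.frame(duration = c(10, 200, 350, 900), previous = c(0, 1, 0, 3))
toy_test  <- data.frame(duration = c(50, 400),           previous = c(2, 0))

# Fit the transformation on the training rows only
pp_toy <- preProcess(toy_train, method = "range")

# Apply the same fitted object to both sets
toy_train_scaled <- predict(pp_toy, newdata = toy_train)
toy_test_scaled  <- predict(pp_toy, newdata = toy_test)

range(toy_train_scaled$duration)  # training values now span 0 to 1
toy_test_scaled                   # test values on the training scale
```

In the original pipeline the analogous calls would be `predict(pp_standardize, newdata = test)` followed by `predict(pp_range, newdata = test)`, assuming the omission is unintended.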
