Error when running randomForest in R: "Error in y - ymean : non-numeric argument to binary operator"

How can I fix the error "Error in y - ymean : non-numeric argument to binary operator" when running randomForest in R? My code:

library(rio)           # import()
library(caTools)       # sample.split()
library(randomForest)
birth <- import("smoker_data1.xlsx")


## Split the dataset into training and test sets

mysplit <- sample.split(birth$smoke, SplitRatio = 0.65)
train <- subset(birth, mysplit == TRUE)
test  <- subset(birth, mysplit == FALSE)

## Build the random forest model on the training set

mod1 <- randomForest(smoke~.,train)

Error message: Error in y - ymean : non-numeric argument to binary operator

Solution

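I think the best approach is to first check the data type of the smoke variable, for example with class(birth$smoke) or str(birth). randomForest() assumes classification only when the response is a factor; otherwise it fits a regression, and the regression path subtracts the mean of the response (ymean) from it, which fails when smoke was read in as character. Convert the response with as.factor() before splitting and fitting.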
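To see why the type matters, here is a small base-R sketch that reproduces the failing arithmetic without calling randomForest() at all. The data frame is hypothetical toy data standing in for smoker_data1.xlsx, and the mean/subtraction step only mimics what the error message points at; it is not the package's actual code.

```r
# Hypothetical toy data standing in for smoker_data1.xlsx: a yes/no column
# read from a spreadsheet usually arrives as character, not factor.
birth <- data.frame(
  smoke   = c("yes", "no", "no", "yes", "no", "yes"),
  baby_wt = c(110, 125, 130, 102, 121, 108),
  stringsAsFactors = FALSE
)

# Step 1: inspect the response type before fitting.
print(class(birth$smoke))  # "character"

# Step 2: mimic the failing step. mean() of a character vector warns and
# returns NA, and subtracting it from a character vector is an error.
ymean <- suppressWarnings(mean(birth$smoke))
err <- tryCatch(birth$smoke - ymean,
                error = function(e) conditionMessage(e))
print(err)  # "non-numeric argument to binary operator" (English locale)

# Step 3: convert the response to a factor so that randomForest()
# treats the task as classification instead of regression.
birth$smoke <- as.factor(birth$smoke)
print(levels(birth$smoke))  # "no" "yes"
```

After step 3, class(birth$smoke) reports "factor", which is what randomForest() needs to build a classification forest.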

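library(readxl)
library(caTools)       # sample.split()
library(randomForest)

birth <- as.data.frame(read_excel("smoker_data1.xlsx"))

## Make the response a factor so randomForest() does classification
birth$smoke <- as.factor(birth$smoke)

## Split the dataset into training and test sets
## (sample.split() expects the response vector, not the whole data frame)
mysplit <- sample.split(birth$smoke, SplitRatio = 0.65)
train <- subset(birth, mysplit == TRUE)
test  <- subset(birth, mysplit == FALSE)

## Build the random forest model on the training set
mod1 <- randomForest(smoke ~ ., data = train)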
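One more pitfall in the split step: length() of a data frame counts its columns, not its rows, so a split vector built from the whole data frame has one entry per column. The sketch below shows this on hypothetical toy data and then does a row-level stratified 65/35 split in plain base R. It only illustrates the idea behind sample.split(birth$smoke, SplitRatio = 0.65); it is not caTools' exact algorithm.

```r
# Hypothetical toy data: 10 rows, 3 columns.
set.seed(42)
birth <- data.frame(
  smoke   = factor(rep(c("no", "yes"), times = 5)),
  baby_wt = round(rnorm(10, mean = 120, sd = 10)),
  income  = factor(rep(c("low", "high"), each = 5))
)

# length() counts columns; nrow() counts rows.
print(length(birth))  # 3
print(nrow(birth))    # 10

# Stratified split: within each class of the response, mark about 65%
# of that class's rows as training rows.
in_train <- logical(nrow(birth))
for (cls in levels(birth$smoke)) {
  idx <- which(birth$smoke == cls)
  n_keep <- round(0.65 * length(idx))
  in_train[idx[sample.int(length(idx), n_keep)]] <- TRUE
}
train <- birth[in_train, ]
test  <- birth[!in_train, ]

# Both classes keep the same share in the training set.
print(table(train$smoke))
```

Splitting within each class keeps the yes/no proportions similar in the training and test sets, which matters when one class is rare.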

I tried it with the data you provided: you only need to set each column's data type correctly before fitting with randomForest().

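library(readxl)
data1 <- as.data.frame(read_excel("smoker_data1.xlsx"))  # assumed source of data1
## Numeric columns
data1$baby_wt   <- as.numeric(data1$baby_wt)
data1$mother_a  <- as.numeric(data1$mother_a)
data1$gestation <- as.numeric(data1$gestation)
data1$mother_wt <- as.numeric(data1$mother_wt)
## Categorical columns (smoke is the response, so it must be a factor)
data1$income <- as.factor(data1$income)
data1$smoke  <- as.factor(data1$smoke)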
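The same conversions can be written once per group of columns with lapply(). This is a sketch on a hypothetical stand-in for data1 in which every column arrived as character; only the column names come from the answer. One caution: if a column is already a factor, as.numeric() returns its level codes, so use as.numeric(as.character(x)) in that case.

```r
# Hypothetical stand-in for data1: every column read in as character.
data1 <- data.frame(
  baby_wt   = c("120", "98", "134"),
  income    = c("low", "high", "mid"),
  mother_a  = c("25", "31", "28"),
  smoke     = c("yes", "no", "yes"),
  gestation = c("280", "265", "284"),
  mother_wt = c("130", "145", "120"),
  stringsAsFactors = FALSE
)

num_cols    <- c("baby_wt", "mother_a", "gestation", "mother_wt")
factor_cols <- c("income", "smoke")

# Convert each group of columns in one assignment.
data1[num_cols]    <- lapply(data1[num_cols], as.numeric)
data1[factor_cols] <- lapply(data1[factor_cols], as.factor)

str(data1)
```

Keeping the column groups in two vectors makes it harder to forget a column when the spreadsheet changes.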


library(caret)          # createDataPartition(), confusionMatrix()
library(randomForest)

## Stratified 70/30 split on the response
predictors <- names(data1)[!names(data1) %in% "smoke"]
inTrainingSet <- createDataPartition(data1$smoke, p = 0.7, list = FALSE)
train <- data1[inTrainingSet, ]
test  <- data1[-inTrainingSet, ]

## mtry: square root of the number of predictors, rounded down
m.rf <- randomForest(smoke ~ ., data = train,
                     mtry = floor(sqrt(length(predictors))),
                     ntree = 5000, importance = TRUE, proximity = TRUE)
m.rf
#############################################
# Test Performance
#############################################
## For a classification forest, type = "response" returns predicted classes
m.pred <- predict(m.rf, newdata = test[, predictors], type = "response")
m.table <- table(m.pred, test$smoke)
confusionMatrix(m.table)
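The "Accuracy" figure that confusionMatrix() reports is the share of test rows on the diagonal of the table. A base-R sketch with hypothetical predictions and labels for six rows:

```r
# Hypothetical predictions and true labels for six test rows.
m.pred <- factor(c("yes", "no", "no", "yes", "no", "yes"), levels = c("no", "yes"))
truth  <- factor(c("yes", "no", "yes", "yes", "no", "no"), levels = c("no", "yes"))

m.table <- table(m.pred, truth)
print(m.table)

# Overall accuracy: correctly classified rows (the diagonal) over all rows.
accuracy <- sum(diag(m.table)) / sum(m.table)
print(accuracy)  # 4 of 6 rows correct
```

Giving both factors the same levels keeps the table square, which confusionMatrix() requires.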