How to improve the quality/graphics of R plots for Naive Bayes classifier visualisation

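I built a Naive Bayes classifier to see whether I can predict, from a person's age and estimated salary, whether they will purchase a particular vehicle. The plot I get in the visualisation step does not look smooth or clean, and white lines run through it. I assume the graphics/resolution is the problem, but I am not sure.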

Here is a snippet of the dataset

Age EstimatedSalary Purchased
19         19000         0
35         20000         0
26         43000         0
27         57000         0
19         76000         0
27         58000         0

Here is the code

# Loading the data set

data <- read.csv(" *A csv sheet on people's age,salaries and whether or not they will purchase a certain vehicle* ")
data <- data[,3:5]
attach(data)

# Encoding the dependent variable

data$Purchased <- factor(data$Purchased,levels = c(0,1))
attach(data)

# Splitting the dataset

library(caTools)
set.seed(404)
split <- sample.split(Purchased,SplitRatio = 0.75)
train_set <- subset(data,split == T)
test_set <- subset(data,split == F)

# Feature scaling

train_set[-3] <- scale(train_set[-3])
test_set[-3] <- scale(test_set[-3])

# Training the model

library(e1071)
classifier <- naiveBayes(x = train_set[-3],y = train_set$Purchased)

# Predicting test results

y_pred <- predict(classifier,newdata = test_set[-3])

# Construct the confusion matrix

(cm <- table(test_set[,3],y_pred))

Below is the code I used to visualise the results

# Visualising the results

library(ElemStatLearn)
set <- test_set
x1 <- seq(min(set[,1]) - 1,max(set[,1]) + 1,by = 0.01)
x2 <- seq(min(set[,2]) - 1,max(set[,2]) + 1,by = 0.01)
grid_set <- expand.grid(x1,x2)
colnames(grid_set) <- c("Age","EstimatedSalary")
y_grid <- predict(classifier,newdata = grid_set)
plot(set[,-3],main = "Naive Bayes: Test set",xlab = "Age",ylab = "EstimatedSalary",xlim = range(x1),ylim = range(x2))
contour(x1,x2,matrix(as.numeric(y_grid),length(x1),length(x2)),add = T)
points(grid_set,pch = ".",col = ifelse(y_grid == 1,"Springgreen3","tomato"))
points(set,pch = 21,bg = ifelse(set[,3] == 1,"green4","red3"))

[Figure: Naive Bayes classifier plot on the test set predictions]

I would like to know why white lines run up and down the plot, and why it does not look smooth.

Solution

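So I figured out what was giving me the odd lines and the low-quality resolution. Adding the "cex = n" argument, with n = 5, to the "points()" call that draws the prediction grid solved the problem.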

Modified code block

set <- test_set
x1 <- seq(min(set[,1]) - 1,max(set[,1]) + 1,by = 0.01)
x2 <- seq(min(set[,2]) - 1,max(set[,2]) + 1,by = 0.01)
grid_set <- expand.grid(x1,x2)
colnames(grid_set) <- c("Age","EstimatedSalary")
y_grid <- predict(classifier,newdata = grid_set)
plot(set[,-3],main = "Naive Bayes: Test set",xlab = "Age",ylab = "EstimatedSalary",xlim = range(x1),ylim = range(x2))
contour(x1,x2,matrix(as.numeric(y_grid),length(x1),length(x2)),add = T)
points(grid_set,pch = ".",col = ifelse(y_grid == 1,"Springgreen3","tomato"),cex = 5)
points(set,pch = 21,bg = ifelse(set[,3] == 1,"green4","red3"))

Modified line of code

points(grid_set,pch = ".",col = ifelse(y_grid == 1,"Springgreen3","tomato"),cex = 5)
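
As an alternative to enlarging the dots, the predicted regions could be painted with image(), which fills one rectangular cell per grid value and so should leave no gaps whatever the device size. This is a swapped-in technique, not what the code above does. The sketch below is self-contained and makes some assumptions: toy_predict() is a hypothetical stand-in for predict(classifier, newdata = grid_set), and the axis ranges and the coarser 0.05 step are made up for illustration.

```r
# Grid over two scaled features (ranges are assumed, for illustration only).
x1 <- seq(-3, 3, by = 0.05)
x2 <- seq(-3, 3, by = 0.05)
grid_set <- expand.grid(Age = x1, EstimatedSalary = x2)

# Hypothetical stand-in for predict(classifier, newdata = grid_set):
# returns a factor with levels "0" and "1", like the real classifier would.
toy_predict <- function(newdata) {
  factor(as.integer(newdata$Age + newdata$EstimatedSalary > 0), levels = c(0, 1))
}
y_grid <- toy_predict(grid_set)

# expand.grid() varies its first column fastest, so filling a matrix
# column-wise gives z[i, j] = prediction at (x1[i], x2[j]), as image() expects.
z <- matrix(as.numeric(y_grid), nrow = length(x1), ncol = length(x2))

# as.numeric() on the factor yields 1 for class "0" and 2 for class "1",
# so the first colour is used for class "0" and the second for class "1".
image(x1, x2, z, col = c("tomato", "springgreen3"),
      main = "Naive Bayes: Test set", xlab = "Age", ylab = "EstimatedSalary")
contour(x1, x2, z, add = TRUE)
```

The test-set observations can then be drawn on top with the same points(set, pch = 21, ...) call as before.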

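However, I would still like to understand why "cex" fixes this, because the explanation of the functions and arguments in the R documentation is not that clear to me.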
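
One possible reading of the documentation, offered as a sketch and not a definitive answer: ?points describes pch = "." as a special case, a rectangle of side 0.01 inch scaled by cex (and at least one pixel when cex = 1). If neighbouring grid points land further apart on the device than that, the background can show through as thin lines, and a larger cex closes the gaps. The helper dots_cover_grid() and the figures used below (an axis spanning 6 units, a plot region 7 inches wide) are hypothetical and only illustrate the arithmetic.

```r
# Does a pch = "." dot span the gap to the next grid point?
# Assumption: the dot side is 0.01 inch * cex, as described in ?points.
dots_cover_grid <- function(axis_range, step, plot_width_in, cex = 1) {
  n_steps <- axis_range / step            # grid intervals along the axis
  spacing_in <- plot_width_in / n_steps   # distance between neighbours, in inches
  dot_side_in <- 0.01 * cex               # side of one dot, in inches
  dot_side_in >= spacing_in
}

# Hypothetical figures: axis spanning 6 units, 0.01 step, 7 inch wide plot region.
dots_cover_grid(axis_range = 6, step = 0.01, plot_width_in = 7, cex = 1)  # FALSE
dots_cover_grid(axis_range = 6, step = 0.01, plot_width_in = 7, cex = 5)  # TRUE
```

On a real device, the width and height of the plot region in inches can be read from par("pin") after plot() has been called, which would replace the assumed 7 inches.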

Any help you can offer is greatly appreciated!