How do I solve the problem of finding the optimal number of clusters?

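I am working with a data set in which some values are zero (0), and we do not want to ignore them. We therefore scaled the data first and then tried to compute the optimal number of clusters, but the computation failed.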

The data we used

dput(read.csv("LHWC.csv"))
structure(list(Station = c("Petroleum","Sulfide","PAH","PES","Acrylic","PP","Rayon"),NDDZ91 = c(20.27,15.16,0.05,74.23500995),NDDZ92 = c(35.13,47.67,0.06,44.15139004,58.86852005),NDDZ94 = c(38.01,30.23,0.13,17.922209,53.766627),NDDZ95 = c(485.92,75.05,35.47228577,70.94457153),NDDZ96 = c(6.8,10.51,5.263920851,26.31960425),NDDZ97 = c(130.26,38.75,0.1,66.9289217,0),NDDZ98 = c(26.42,21.64,0.27,11.62692114,69.76152682
),NDDZ99 = c(21.47,35.97,0.07,30.51552678,61.03105356),NDDZ103 = c(75.24,122.53,0.12,57.2039725,114.407945),NDDZ105 = c(9.09,6.95,35.11363161,70.22726323),NDDZ106 = c(17.84,58.52,0.21,49.43776022,49.43776022),NDDZ107 = c(106.44,75.92,83.9088046,41.9544023,41.9544023),NDDZ112 = c(367.74,92.23,186.2662335,NDDZ113 = c(26.96,10.85,0.08,248.4375,NDDZ114 = c(42.02,72.1,0.14,45.56461801,136.693854),NDDZ115 = c(17.57,28.46,0.04,112.339267
),NDDZ116 = c(72,81.19,0.03,36.33232944),NDDZ117 = c(382.57,195.59,8.461490717,4.230745359,16.92298143),NDDZ119 = c(150,42.54,29.69151766,59.38303531),NDDZ122 = c(452.87,277.4,76.46222766,305.8489106),NDDZ123 = c(61.51,81.67,34.05399385,102.1619815)),class = "data.frame",row.names = c(NA,-7L))

The code we applied

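# Read the data and use the Station column as row names
df <- read.csv("LHWC.csv")
rownames(df) <- df$Station
data <- df[, -1]
# Standardize each column (mean 0, standard deviation 1)
d <- scale(data)
library(NbClust)
nbclust_out <- NbClust(d, distance = "euclidean", min.nc = 2, max.nc = 20, method = "ward.D")
# Row 1 of Best.nc holds the number of clusters proposed by each index
nbclust_plot <- data.frame(clusters = nbclust_out$Best.nc[1, ])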
# select only indices which select between 2 and 20 clusters
nbclust_plot <- subset(nbclust_plot,clusters >= 2 & clusters <= 20)
library(ggplot2)
# create plot
ggplot(nbclust_plot) +
  aes(x = clusters) +
  geom_histogram(bins = 30L,fill = "#0c4c8a") +
  labs(x = "Number of clusters",y = "Frequency among all indices",title = "Optimal number of clusters") +
  theme_minimal()

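The problem is that running the line nbclust_out <- NbClust(d,method = "ward.D") fails with the following message: "Error in NbClust(data, distance = "euclidean", min.nc = 2, max.nc = 5, : The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated."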

But we have already scaled the data to remove
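
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the table has 7 rows (stations) but 21 NDDZ columns, so the total sum of squares (TSS) matrix of the scaled data cannot have full rank, even when no value is missing. The sketch below uses made-up random numbers of the same shape (not the LHWC.csv values) to check whether scaling introduces missing values and to look at the rank and eigenvalues of the TSS matrix. All variable names in it are hypothetical.

```r
# Hypothetical stand-in data: 7 observations x 21 variables,
# the same shape as the table in the question (values are random).
set.seed(1)
x <- matrix(rexp(7 * 21), nrow = 7, ncol = 21)
x[5, 3] <- 0  # a zero is an ordinary value, not a missing one

# Standardize the columns, as in the question.
d <- scale(x)

# scale() only produces NaN when a column is constant (sd = 0).
n_missing <- sum(is.na(d))

# TSS matrix of the centred data (21 x 21).
tss <- crossprod(d)

# Centred data with n = 7 rows has rank at most n - 1 = 6, so at least
# 21 - 6 = 15 eigenvalues are zero up to floating-point rounding.
ev <- eigen(tss / (nrow(d) - 1), symmetric = TRUE, only.values = TRUE)$values
rank_d <- qr(d)$rank

cat("missing values:", n_missing, "\n")
cat("rank of scaled data:", rank_d, "of", ncol(d), "columns\n")
cat("smallest eigenvalue:", min(ev), "\n")
```

If the real data behaves the same way, the eigenvalues that should be exactly zero can come out as tiny negative numbers, which may be what triggers the "indefinite" message. Under that assumption, the zeros in the data and the scaling step are not the cause; the number of variables relative to the number of observations is.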
