
How do I solve this cluster analysis problem?


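I am trying to determine the optimal number of clusters with the NbClust function, but it reports that there are too many missing values. I don't understand why, because my data contain no missing values.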

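The error reads: "Error in NbClust(data = df, distance = "euclidean", min.nc = 2, max.nc = 20, : The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated."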

The data I am using

dput(read.csv("cluster.csv"))
df = structure(list(St = c("PE","SU","PA","OC","PE","AC","PP","RA"),NDDZ91 = c(0.253576604,0.0551232,-0.53169303,-0.533246481,-0.533634844,-0.529751216,2.349376982),NDDZ92 = c(0.4633855,0.952926247,-0.905688982,-0.908031282,0.815565566,-0.904127448,1.390097848
),NDDZ94 = c(0.971257769,0.602251213,-0.82539626,-0.831562179,0.018490857,-0.826819164,1.718596929),NDDZ95 = c(2.428086592,-0.050766856,-0.502772844,-0.503557157,-0.289546405,-0.502953839,-0.075535652),NDDZ96 = c(0.073650972,0.482511184,-0.669130113,-0.675742407,-0.664721917,-0.09563249,2.224807178),NDDZ97 = c(2.108725851,0.193018074,-0.616096838,-0.618190279,0.782927149,-0.618190279
),NDDZ98 = c(0.422792635,0.224274925,-0.66324044,-0.674453783,-0.191577267,-0.670300693,2.222805316),NDDZ99 = c(-0.045504148,0.621635607,-1.030110408,-1.033331082,0.370677267,-1.028730119,1.774685616),NDDZ103 = c(0.543822029,1.4294128,-0.862935822,-0.865183039,0.206064797,-0.863310358,1.277312632),NDDZ105 = c(-0.242116717,-0.327002284,-0.599905416,-0.602682046,0.790140631,-0.598715431,2.18296331
),NDDZ106 = c(-0.394116657,1.166937427,-1.070650174,-1.078708713,0.81841561,0.81841561),NDDZ107 = c(1.493844177,0.766047601,-1.041282102,-1.04295136,0.956552995,-0.043914579,-1.044382153,-0.043914579),NDDZ112 = c(2.137032432,0.085031825,-0.601376567,-0.601897927,-0.601153126,0.785414418,-0.601153126),NDDZ113 = c(-0.102481763,-0.288855624,-0.41345193,-0.41414606,-0.414377436,-0.413220553,2.45975392,-0.413220553
),NDDZ114 = c(0.100876842,0.716344963,-0.756031568,-0.758896113,0.173403417,-0.756850009,2.038002477),NDDZ115 = c(-0.058558995,0.221455542,-0.509307832,-0.505965142,-0.510336352,-0.507765052,2.378242882),NDDZ116 = c(1.377841856,1.640112838,-0.676090962,-0.676661736,-0.676947124,-0.67409325,0.359931628),NDDZ117 = c(2.177231217,0.849368214,-0.539426784,-0.539639833,-0.479549446,-0.53892967,-0.509594639,-0.41945906
),NDDZ119 = c(2.215308855,0.141088501,-0.679450372,-0.680029439,-0.106916185,-0.678099214,0.466197068),NDDZ122 = c(1.743810041,0.768581504,-0.772598602,-0.773098804,-0.348192016,0.926695082),NDDZ123 = c(0.634144889,1.11554263,-0.833927192,-0.834643558,-0.021473135,-0.832255672,1.60486771)),class = "data.frame",row.names = c(NA,-8L))

The code I have so far

# Use the St column as row names, then drop it from the data
rownames(df) <- df$St
df <- df[, -1]
library(NbClust)
nbclust_out <- NbClust(
  data = df, distance = "euclidean", min.nc = 2, max.nc = 20, method = "ward.D")

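But this fails with the error "Error in NbClust(data = df, distance = "euclidean", min.nc = 2, max.nc = 20, : The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated."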

Solution

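max.nc is larger than the number of rows in your data set (20 versus 8), which is probably what causes the problem. An alternative using other packages: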

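library(factoextra)  # provides fviz_nbclust()
library(psych)       # provides fa.parallel()

# Remove the label column
df$St <- NULL

# Standardize the variables
df.scaled <- scale(df)

# Scree plot (total within-cluster sum of squares for k = 1..7)
scree <- fviz_nbclust(df.scaled, FUNcluster = kmeans, method = "wss", k.max = 7)

# Parallel analysis on principal components
paral <- fa.parallel(df.scaled, fa = "pc")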
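The point about max.nc can be checked directly before calling any clustering function. The following is only a minimal sketch in base R: `df_demo` is a hypothetical stand-in with the same number of rows (8) as the question's data, and `requested_max` and `safe_max` are names introduced here for illustration.

```r
# Hypothetical stand-in for the question's data: 8 observations, 5 numeric variables
set.seed(1)
df_demo <- as.data.frame(matrix(rnorm(8 * 5), nrow = 8))

# 1. Confirm that there really are no missing values
n_missing <- sum(is.na(df_demo))

# 2. Compare the requested upper bound with the number of rows;
#    n observations cannot be split into more than n - 1 non-trivial clusters
requested_max <- 20
safe_max <- min(requested_max, nrow(df_demo) - 1)

cat("missing values:", n_missing, "\n")
cat("rows:", nrow(df_demo), "\n")
cat("requested max.nc:", requested_max, " capped max.nc:", safe_max, "\n")
```

A value like `safe_max` could then be passed as `max.nc`. Whether every NbClust index can be computed on so few rows is not guaranteed, so some indices may still fail.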

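Based on the plots below, I would suggest using 3 clusters. Note, however, that the parallel analysis reports an ultra-Heywood case in your data set, so check your results carefully.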

[Figure: scree plot]

[Figure: parallel analysis]
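Once a number of clusters has been chosen, the assignment itself can be obtained with base R alone. This is a rough sketch rather than the answer's own method: it assumes the same Euclidean distance and "ward.D" linkage as the question, uses k = 3 as suggested above, and runs on a hypothetical stand-in matrix `demo` (the row names `site1`..`site8` are invented for illustration).

```r
# Hypothetical stand-in data: 8 labelled observations, 5 numeric variables
set.seed(42)
demo <- matrix(rnorm(8 * 5), nrow = 8,
               dimnames = list(paste0("site", 1:8), paste0("v", 1:5)))

# Standardize, then compute pairwise Euclidean distances
demo_scaled <- scale(demo)
d <- dist(demo_scaled, method = "euclidean")

# Hierarchical clustering with the linkage used in the question
hc <- hclust(d, method = "ward.D")

# Cut the tree into 3 clusters and show the cluster sizes
groups <- cutree(hc, k = 3)
print(table(groups))
```

`groups` is a named integer vector giving a cluster label for each row, and `plot(hc)` draws the dendrogram if a visual check is wanted.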
