How can I apply hierarchical cluster analysis with the Hellinger metric to an LDA model?
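I am running an LDA analysis. I have the topics, but I need to cluster them by Hellinger distance. Specifically, I need to group the 20 topics generated by the LDA model and present them as a dendrogram. I am sharing part of my code.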
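# Packages assumed from the functions used below
library(dplyr)       # select, mutate, count, anti_join, %>%
library(tibble)      # tibble
library(tidytext)    # unnest_tokens, cast_dtm, tidy
library(tm)          # stopwords(kind = ...)
library(topicmodels) # LDA
textos <- select(Base_Articulos, Articulo, Evento, Ano)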
textorder <- textos[order(textos$Ano),]
bd_duplicados <- textos[duplicated(textos),]
bd_unicos <- unique (textos)
bd_unicos <- na.omit(bd_unicos)
ap_td <- tibble(textos)
ap_td
tidy_articulo <- ap_td %>% unnest_tokens(word,Evento)
espstopwords <- tibble(word = c(stopwords(kind = "es")))
enpstopwords <- tibble(word = c(stopwords(kind = "en")))
miastopwords <- tibble(word = c("colombia","study","bogota","colombian","colombiano","t","medellin","n","k","b","hom","cc","92","85","m","1","l","sp","50","155.000","155","59","64","70","80","18","ri","2","3","4","5","6","7","8","9"))
tidy_articulo <- tidy_articulo %>% anti_join(espstopwords)
tidy_articulo <- tidy_articulo %>% anti_join(enpstopwords)
tidy_articulo <- tidy_articulo %>% anti_join(miastopwords)
ap_td <- mutate(ap_td,Evento = as.character(ap_td$Evento))
tidy_articulo %>% count(word,sort = TRUE)
word_counts <- tidy_articulo %>% count(Articulo,word,sort = TRUE) %>% ungroup()
word_counts
# cast_dtm needs the document, term, and count columns, in that order
desc_dtm <- word_counts %>% cast_dtm(Articulo, word, n)
desc_dtm
ap_lda <- LDA(desc_dtm,k = 20,control = list(seed = 1234))
ap_lda
ap_topics <- tidy(ap_lda,matrix = "beta")
ap_documents <- tidy(ap_lda,matrix = "gamma")
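
One possible way to get from the fitted model to a dendrogram is sketched below. It relies on the fact that the Hellinger distance between two discrete distributions equals the Euclidean distance between their element-wise square roots, divided by sqrt(2), so base R's `dist()` and `hclust()` are enough. The helper name `hellinger_dist`, the simulated `beta` matrix, the "average" linkage, and the choice of 4 groups are all assumptions made for illustration; with the model above, the topic-by-term probability matrix could presumably be taken from `topicmodels::posterior(ap_lda)$terms` instead of the simulated data.

```r
# Hellinger distance between the rows of a probability matrix (base R only).
# Each row must be a probability distribution (non-negative, sums to 1).
hellinger_dist <- function(prob) {
  # Euclidean distance between square-rooted rows, scaled by 1/sqrt(2)
  dist(sqrt(prob)) / sqrt(2)
}

# Hypothetical stand-in for the 20 x vocabulary topic-term matrix.
# With the fitted model, one would likely use instead:
#   beta <- topicmodels::posterior(ap_lda)$terms
set.seed(1234)
beta <- matrix(rgamma(20 * 50, shape = 0.5), nrow = 20)
beta <- beta / rowSums(beta)            # normalise each row to sum to 1
rownames(beta) <- paste("Topic", 1:20)

# Pairwise Hellinger distances between topics, then hierarchical clustering
topic_dist <- hellinger_dist(beta)
topic_hc <- hclust(topic_dist, method = "average")  # linkage is an assumption

# Dendrogram of the 20 topics
plot(topic_hc, main = "Topics clustered by Hellinger distance",
     xlab = "", sub = "")

# Optional: cut the tree into a chosen number of groups (4 is arbitrary)
topic_groups <- cutree(topic_hc, k = 4)
```

The linkage method and the number of groups both change the resulting clusters, so it is worth trying a few settings and checking whether the grouped topics make sense from their top terms.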