
How can I apply hierarchical cluster analysis with the Hellinger metric to an LDA model?


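I'm running an LDA analysis and already have the topics, but I need to cluster them by Hellinger distance. Specifically, I need to group the 20 topics produced by the LDA model and present them as a dendrogram. Here is part of my code: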

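library(dplyr)       # select, mutate, count, anti_join, tibble
library(tidytext)    # unnest_tokens, cast_dtm, tidy
library(tm)          # stopwords(kind = ...); assumed source of stopwords()
library(topicmodels) # LDA

textos <- select(Base_Articulos, Articulo, Evento, Ano)
textorder <- textos[order(textos$Ano), ]

bd_duplicados <- textos[duplicated(textos), ]
bd_unicos <- unique(textos)
bd_unicos <- na.omit(bd_unicos)
ap_td <- tibble(textos)
ap_td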

tidy_articulo <- ap_td %>% unnest_tokens(word,Evento)

espstopwords <- tibble(word = c(stopwords(kind = "es")))
enpstopwords <- tibble(word = c(stopwords(kind = "en")))

miastopwords <- tibble(word = c("colombia","study","bogota","colombian","colombiano","t","medellin","n","k","b","hom","cc","92","85","m","1","l","sp","50","155.000","155","59","64","70","80","18","ri","2","3","4","5","6","7","8","9"))

tidy_articulo <- tidy_articulo %>% anti_join(espstopwords)
tidy_articulo <- tidy_articulo %>% anti_join(enpstopwords)
tidy_articulo <- tidy_articulo %>% anti_join(miastopwords)

ap_td <- mutate(ap_td,Evento = as.character(ap_td$Evento))

tidy_articulo %>% count(word,sort = TRUE)

word_counts <- tidy_articulo %>% count(Articulo,word,sort = TRUE) %>% ungroup()

word_counts

# cast_dtm() takes the document, term, and count columns, in that order
desc_dtm <- word_counts %>% cast_dtm(Articulo, word, n)

desc_dtm

ap_lda <- LDA(desc_dtm,k = 20,control = list(seed = 1234))

ap_lda

ap_topics <- tidy(ap_lda,matrix = "beta") 

ap_documents <- tidy(ap_lda,matrix = "gamma")
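One possible way to get from the fitted model to a dendrogram is sketched below. It is not a definitive answer, only an illustration under stated assumptions: the topic-term probabilities are assumed to be available as a matrix with one row per topic (for a topicmodels fit this would typically be `posterior(ap_lda)$terms`), and a random matrix stands in for it here so the example runs on its own. The names `beta`, `hellinger`, `topic_hc`, and `topic_groups`, the average linkage, and the cut into 4 groups are all illustrative choices, not part of the original code.

```r
# Sketch: Hellinger distance between topics, then hierarchical clustering.
# With the fitted model, the topic-term matrix would come from something like
#   beta <- topicmodels::posterior(ap_lda)$terms   # 20 topics x vocabulary
# A random stand-in matrix is used here so the example is self-contained.
set.seed(1234)
n_topics <- 20
n_terms  <- 50
beta <- matrix(rgamma(n_topics * n_terms, shape = 0.5), nrow = n_topics)
beta <- beta / rowSums(beta)   # each row becomes a probability distribution
rownames(beta) <- paste("Topic", seq_len(n_topics))

# Hellinger distance: H(p, q) = sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2),
# i.e. the Euclidean distance between square-rooted rows, scaled by 1/sqrt(2).
hellinger <- dist(sqrt(beta), method = "euclidean") / sqrt(2)

# Agglomerative clustering; "average" linkage is an arbitrary choice here.
topic_hc <- hclust(hellinger, method = "average")

# Dendrogram of the 20 topics.
plot(topic_hc, main = "Topics clustered by Hellinger distance",
     xlab = "", sub = "", ylab = "Hellinger distance")

# Optional: cut the tree into a fixed number of groups (4 is hypothetical).
topic_groups <- cutree(topic_hc, k = 4)
```

Because the Hellinger distance is just a scaled Euclidean distance on square-rooted probabilities, base R's `dist()` is enough and no extra package is needed. The linkage method and the number of groups affect the resulting tree and should be chosen to suit the data.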
