How to find and plot the combined frequency of multiple phrases?
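I have a corpus and am trying to find the frequency of multiple phrases, totaled by year, and plot it. For example, if the phrases "American economy" and "Canadian economy" were each mentioned 2 times in 2004, I want the frequency for 2004 to be 4.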
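To pin down the target arithmetic, here is a minimal base-R sketch (deliberately not quanteda) on a hypothetical three-document data frame. It counts literal, case-insensitive occurrences of each phrase per document, adds the counts across phrases, then sums within each year, so two mentions of each phrase in 2004 give 4. The data frame `df`, its `Year`/`text` columns and the helper `count_phrase()` are illustrative assumptions. Note that plain substring matching ignores word boundaries, unlike quanteda's token-based phrase matching.

```r
# Hypothetical toy corpus: two documents from 2004, one from 2005
df <- data.frame(
  Year = c(2004, 2004, 2005),
  text = c("The American economy grew. The American economy slowed.",
           "The Canadian economy and the Canadian economy again.",
           "Nothing about the economy here."),
  stringsAsFactors = FALSE
)
phrases <- c("American economy", "Canadian economy")

# Hypothetical helper: number of literal (lowercased) matches of one phrase in each text.
# gregexpr() returns -1 when there is no match, so counting positions > 0 gives 0 then.
count_phrase <- function(texts, phrase) {
  vapply(gregexpr(tolower(phrase), tolower(texts), fixed = TRUE),
         function(m) sum(m > 0), integer(1))
}

# Per-document total across all phrases, then summed within each year
df$phrase_total <- Reduce(`+`, lapply(phrases, count_phrase, texts = df$text))
yearly <- aggregate(phrase_total ~ Year, data = df, FUN = sum)
print(yearly)
```

With this toy input, 2004 gets 2 + 2 = 4 and 2005 gets 0, which is the behaviour the question describes.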
I have managed to do this for single tokens, but I'm having trouble with phrases. Here is the code I used for single tokens:
library("quanteda")
library("quanteda.textstats") # textstat_frequency() moved here in quanteda v3
library("dplyr")
library("ggplot2")
a_corpus <- corpus(df, text_field = "text")
my_dict <- dictionary(list(america = c("America", "President")))
# per-feature frequencies within each year
freq_grouped_creators <- textstat_frequency(dfm(tokens(a_corpus)), groups = a_corpus$Year)
# keep only the features listed under the "america" key
freq_word_creators <- subset(freq_grouped_creators, feature %in% my_dict$america)
# collapsing rows by year to total frequencies for tokens
freq_word_creators_2 <- freq_word_creators %>%
  group_by(group) %>%
  summarize(Sum_frequency = sum(frequency))
# plotting
ggplot(freq_word_creators_2, aes(x = group, y = Sum_frequency)) +
  geom_point() +
  scale_y_continuous(limits = c(0, 300), breaks = seq(0, 300, 30)) +
  xlab(NULL) +
  ylab("Frequency") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Solution
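There is no need to manipulate the frequencies in dplyr. A simpler approach is to select the phrases, build a dfm from them, and convert it to a data.frame that can be used directly with ggplot2.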
library("quanteda")
## Package version: 3.0.9000
## Unicode version: 13.0
## ICU version: 69.1
## Parallel computing: 12 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
library("quanteda.textstats")
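a_corpus <- tail(data_corpus_inaugural, 10) # the last 10 inaugural addresses
economic_phrases <- c("middle class", "social security", "strong economy")
toks <- tokens(a_corpus)
# join each multi-word phrase into a single token (e.g. "middle class"), then keep only those tokens
toks <- tokens_compound(toks, phrase(economic_phrases), concatenator = " ") %>%
  tokens_keep(economic_phrases)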
dfmat <- dfm(toks)
dfmat
## Document-feature matrix of: 10 documents, 2 features (65.00% sparse) and 4 docvars.
##                features
## docs            middle class social security
##   1985-Reagan              0               0
##   1989-Bush                0               0
##   1993-Clinton             0               0
##   1997-Clinton             2               0
##   2001-Bush                0               1
##   2005-Bush                0               1
## [ reached max_ndoc ... 4 more documents ]
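# per-document phrase counts plus the Year docvar
freq_word_creators_2 <- data.frame(convert(dfmat, to = "data.frame"), Year = dfmat$Year)
# plotting
library("ggplot2")
# y = middle.class plots the "middle class" counts only
# (column names are made syntactic, so the space becomes a dot)
ggplot(freq_word_creators_2, aes(x = Year, y = middle.class)) +
  geom_point() +
  # scale_y_continuous(limits = c(0, 300), breaks = seq(0, 300, 30)) +
  xlab(NULL) +
  ylab("Frequency") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))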
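The plot above shows a single phrase. To get the combined count the question asks for, the phrase columns can be added row-wise and then summed within each year. The following is a minimal base-R sketch, assuming a data frame shaped like `freq_word_creators_2` (a `doc_id` column, one column per phrase, and `Year`). The numbers are made up for illustration, and `phrase_cols`, `total` and `freq_by_year` are hypothetical names. In a corpus with several documents per year, the `aggregate()` step is what merges them.

```r
# Hypothetical stand-in for freq_word_creators_2 (toy numbers, not real output)
freq_word_creators_2 <- data.frame(
  doc_id = c("doc1", "doc2", "doc3"),
  middle.class = c(1, 2, 0),
  social.security = c(1, 0, 3),
  Year = c(2004, 2004, 2005)
)

# every column except the id and the year is treated as a phrase count
phrase_cols <- setdiff(names(freq_word_creators_2), c("doc_id", "Year"))

# combined count per document, then summed over documents within each year
freq_word_creators_2$total <- rowSums(freq_word_creators_2[, phrase_cols, drop = FALSE])
freq_by_year <- aggregate(total ~ Year, data = freq_word_creators_2, FUN = sum)
print(freq_by_year)

# plot the combined yearly totals, if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq_by_year, aes(x = Year, y = total)) +
    geom_point() +
    xlab(NULL) +
    ylab("Frequency")
  print(p)
}
```

Summing after conversion keeps the quanteda part unchanged; only the y aesthetic switches from one phrase column to the combined `total`.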