How to fix a problem with collapsing factor levels in R
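I have a messy factor variable with more levels than it should have. The cases come from an open survey, and many participants misspelled their answer or simply wrote the same answer in a different way.
Here is an example df that reproduces my problem: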
df <- data.frame(ID = 1:10,
                 Nationality = c("espanol", "spaniol", "ESPANOL", "spanish", "colombia",
                                 "Colombian", "British", "brit", "ESPanol", "UK"))
The output I want looks like this:
> df
ID Nationality
1 1 Spanish
2 2 Spanish
3 3 Spanish
4 4 Spanish
5 5 Colombian
6 6 Colombian
7 7 British
8 8 British
9 9 Spanish
10 10 British
To reduce these 10 artificial factor levels to the 3 they should be (Spanish, Colombian, British), I tried this:
library(forcats)
levels(df$Nationality) <- fct_collapse(df$Nationality,
                                       Spanish = c("espanol", "ESPanol"),
                                       Colombian = c("colombia", "Colombian"),
                                       British = c("British", "UK"))
This does reduce my Nationality factor to 3 levels, but the output looks like this and no longer matches the original rows:
> df
ID Nationality
1 1 Colombian
2 2 British
3 3 British
4 4 Spanish
5 5 Spanish
6 6 Spanish
7 7 Spanish
8 8 Spanish
9 9 Colombian
10 10 British
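In the larger dataset I'm working with it doesn't work either, and the result is even worse: every case becomes "Spanish", and I have no idea why this happens.
Thanks in advance for your help! Best, Lucas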
Solutions
Have you tried converting Nationality to a factor first?
df <- data.frame(ID = 1:10,
                 Nationality = c("espanol", "spaniol", "ESPANOL", "spanish", "colombia",
                                 "Colombian", "British", "brit", "ESPanol", "UK"))
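library(dplyr)   # provides %>% and mutate()
library(forcats)
df2 <- df %>%
  mutate(Nationality = factor(Nationality)) %>%
  mutate(Nationality = fct_collapse(Nationality,
                                    Spanish = c("espanol", "spaniol", "ESPANOL", "spanish", "ESPanol"),
                                    Colombian = c("colombia", "Colombian"),
                                    British = c("British", "brit", "UK")))
# more concise
df3 <- df %>%
  mutate(across(Nationality, ~ fct_collapse(factor(.),
                                            Spanish = c("espanol", "spaniol", "ESPANOL", "spanish", "ESPanol"),
                                            Colombian = c("colombia", "Colombian"),
                                            British = c("British", "brit", "UK"))))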
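Why the original attempt scrambles the rows: fct_collapse() returns one collapsed label per row, but `levels(x) <- value` renames the i-th existing level (in sorted order) to the i-th element of value, so each level picks up the label of some unrelated row. Below is a minimal base-R sketch of that mechanism, assuming Nationality is a factor (as in the question's output, e.g. under the pre-R 4.0.0 default stringsAsFactors = TRUE). The named vector `lookup` is a hypothetical stand-in for fct_collapse() so the example needs no packages.

```r
# Raw answers from the question, stored as a factor (levels are sorted)
nat <- factor(c("espanol", "spaniol", "ESPANOL", "spanish", "colombia",
                "Colombian", "British", "brit", "ESPanol", "UK"))

# Hypothetical stand-in for fct_collapse(): one cleaned label per ROW
lookup <- c(espanol = "Spanish", spaniol = "Spanish", ESPANOL = "Spanish",
            spanish = "Spanish", ESPanol = "Spanish",
            colombia = "Colombian", Colombian = "Colombian",
            British = "British", brit = "British", UK = "British")
collapsed <- unname(lookup[as.character(nat)])

# Wrong: levels<- treats `collapsed` as new names for the sorted LEVELS,
# so the i-th level gets the label of the i-th row
wrong <- nat
levels(wrong) <- collapsed

# Right: replace the vector itself with the collapsed labels
right <- factor(collapsed)

print(data.frame(raw = nat, wrong = wrong, right = right))
```

The exact scrambled result depends on the locale's sort order for the levels, but it does not match the rows. Assigning the collapsed factor to the column itself, as the answer above does, avoids the problem.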
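Here are a couple of solutions using base R functions:
Solution 1
This solution assumes that the Nationality column is a character variable. If it is a factor, convert it first with as.character(), because indexing a named vector with a factor uses the factor's integer codes rather than its labels.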
cases <- c(espanol = "Spanish",spaniol = "Spanish",ESPANOL = "Spanish",spanish = "Spanish",British = "British",brit = "British",ESPanol = "Spanish",UK = "British",colombia = "Colombian",Colombian = "Colombian")
df$Nationality <- factor(cases[df$Nationality])
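With free-text survey answers, any spelling that is missing from `cases` comes back as NA, so it can help to list the unmatched answers before converting to a factor. A small sketch; the vector `raw` holds hypothetical answers, not data from the question.

```r
# Lookup table from Solution 1: raw answer -> cleaned label
cases <- c(espanol = "Spanish", spaniol = "Spanish", ESPANOL = "Spanish",
           spanish = "Spanish", ESPanol = "Spanish",
           British = "British", brit = "British", UK = "British",
           colombia = "Colombian", Colombian = "Colombian")

# Hypothetical new answers, including spellings the table does not cover
raw <- c("espanol", "Spain", "UK", "colombiano")

cleaned <- unname(cases[raw])
# Answers with no entry in the table come back as NA
unmatched <- unique(raw[is.na(cleaned)])
print(unmatched)
```

Each spelling reported in `unmatched` can then be added to `cases` before running the conversion again.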
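Solution 2
df$Nationality <- as.factor(df$Nationality)
levels(df$Nationality) <- list(Spanish = c("espanol", "spaniol", "ESPANOL", "spanish", "ESPanol"),
                               Colombian = c("colombia", "Colombian"),
                               British = c("British", "brit", "UK"))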
Output:
# ID Nationality
# 1 1 Spanish
# 2 2 Spanish
# 3 3 Spanish
# 4 4 Spanish
# 5 5 Colombian
# 6 6 Colombian
# 7 7 British
# 8 8 British
# 9 9 Spanish
# 10 10 British
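Two properties of the named-list form used in Solution 2 are worth knowing: the new levels follow the order of the list, and any old level not mentioned in the list becomes NA. A minimal sketch checking both on the question's data; the `partial` factor is a made-up example.

```r
nat <- factor(c("espanol", "spaniol", "ESPANOL", "spanish", "colombia",
                "Colombian", "British", "brit", "ESPanol", "UK"))
levels(nat) <- list(Spanish   = c("espanol", "spaniol", "ESPANOL", "spanish", "ESPanol"),
                    Colombian = c("colombia", "Colombian"),
                    British   = c("British", "brit", "UK"))
print(levels(nat))        # new levels, in list order
print(as.character(nat))  # one cleaned label per row

# Leaving "brit" out of the list turns those rows into NA
partial <- factor(c("brit", "UK"))
levels(partial) <- list(British = "UK")
print(as.character(partial))
```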