R: conditional rowSums, replacing small rows with a combined sum based on percentage
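I would like to sum rows together if those rows represent less than 1% of the data (the cutoff implied by `prop = 0.01` in the solution).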
Example data:
name | Year1 | Year2 | Year3 | Total | Percent |
---|---|---|---|---|---|
John | 1 | 2 | 1 | 4 | 0.7029877 |
Paul | 230 | 100 | 150 | 480 | 84.358524 |
George | 41 | 30 | 10 | 81 | 14.235501 |
Ringo | 2 | 1 | 1 | 4 | 0.7029877 |
# Code for example data
library(dplyr) # provides select()
name <- c("John","Paul","George","Ringo")
Year1 <- c(1,230,41,2)
Year2 <- c(2,100,30,1)
Year3 <- c(1,150,10,1)
df <- data.frame(name,Year1,Year2,Year3)
df$Total <- rowSums(select(df,Year1:Year3))
df$Percent <- df$Total/sum(df$Total)*100
In the desired result, John and Ringo are combined into a single "Other" row, because each of them accounts for less than 1% of the total.
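To check which rows would be merged, here is a minimal base R sketch. It assumes the cutoff is 1 percent; the names `threshold` and `small` are introduced here for illustration and are not part of the original question.

```r
# Rebuild the example data (base R only, no packages needed)
df <- data.frame(
  name  = c("John", "Paul", "George", "Ringo"),
  Year1 = c(1, 230, 41, 2),
  Year2 = c(2, 100, 30, 1),
  Year3 = c(1, 150, 10, 1),
  stringsAsFactors = FALSE
)
df$Total   <- rowSums(df[, c("Year1", "Year2", "Year3")])
df$Percent <- df$Total / sum(df$Total) * 100

threshold <- 1                     # assumed cutoff, in percent
small <- df$Percent < threshold    # TRUE for rows that would be merged

df$name[small]                     # names of the rows under the cutoff
sum(small)                         # how many rows would go into "Other"
```

With the example data, only John and Ringo fall under the cutoff, each at roughly 0.70%.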
# Code for example solution
name <- c("Paul","George","Other(n=2)")
Year1 <- c(230,41,3)
Year2 <- c(100,30,3)
Year3 <- c(150,10,2)
df2 <- data.frame(name,Year1,Year2,Year3)
df2$Total <- rowSums(select(df2,Year1:Year3))
df2$Percent <- df2$Total/sum(df2$Total)*100
Example solution:
name | Year1 | Year2 | Year3 | Total | Percent |
---|---|---|---|---|---|
Paul | 230 | 100 | 150 | 480 | 84.358524 |
George | 41 | 30 | 10 | 81 | 14.235501 |
Other(n=2) | 3 | 3 | 2 | 8 | 1.405975 |
Solution
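library(tidyverse) # or just library(dplyr) and call forcats::fct_lump(...)
df %>%
  # lump names whose weighted share (w = Percent) is below 1% into "Other"
  mutate(name_lumped = fct_lump(name, w = Percent, prop = 0.01)) %>%
  group_by(name_lumped) %>%
  # sum every numeric column within each lumped group
  summarize(across(Year1:Percent, sum))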
# A tibble: 3 x 6
  name_lumped Year1 Year2 Year3 Total Percent
  <fct>       <dbl> <dbl> <dbl> <dbl>   <dbl>
1 George         41    30    10    81   14.2
2 Paul          230   100   150   480   84.4
3 Other           3     3     2     8    1.41
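The `fct_lump` result labels the merged row simply as "Other" and orders rows by factor level. If the label should also carry the number of merged rows, as in the requested "Other(n=2)", one possible variation is sketched below. It is not part of the original answer: it uses `if_else` and `sprintf` instead of `fct_lump`, and the `threshold` variable is an assumption introduced for illustration.

```r
library(dplyr)

# Rebuild the example data so the sketch runs on its own
df <- data.frame(
  name  = c("John", "Paul", "George", "Ringo"),
  Year1 = c(1, 230, 41, 2),
  Year2 = c(2, 100, 30, 1),
  Year3 = c(1, 150, 10, 1),
  stringsAsFactors = FALSE
)

threshold <- 1  # assumed cutoff, in percent

result <- df %>%
  mutate(
    Total   = Year1 + Year2 + Year3,
    Percent = Total / sum(Total) * 100,
    # rows under the cutoff share one label that records how many were merged
    name = if_else(Percent < threshold,
                   sprintf("Other(n=%d)", sum(Percent < threshold)),
                   name)
  ) %>%
  group_by(name) %>%
  # add up the year columns, Total and Percent within each label
  summarize(across(Year1:Percent, sum), .groups = "drop") %>%
  # largest groups first, so the merged row ends up last here
  arrange(desc(Total))

result
```

Summing `Percent` within a group is valid here because each row's percentage is its share of the same grand total, so the shares add up.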