Percent histogram with facet_grid when the x variable is a factor: the panels do *not* each sum to 100%

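I want to use facet_grid to split a percent histogram (one that sums to 100%) into two facets. Once the plot is split, however, the facets do not each sum to 100% on their own. A similar problem has been resolved before, but I cannot adapt that solution to my case, where x is a factor, so a histogram based on stat(density) does not work.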

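My data

A data frame with two columns: equipment indicates whether a household has enough home-education equipment, and children_n is the number of children in the household.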

library(tidyverse)
library(magrittr)

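# Note: the vectors in this copy of the post are abbreviated; the printed tibble below has 1,059 rows.
df <-
  structure(list(equipment = c(1,1,1),children_n = c(4,4,2,3,7,5,8,6,9,4)),row.names = c(NA,-1059L),class = c("tbl_df","tbl","data.frame"))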


df

## # A tibble: 1,059 x 2
##    equipment children_n
##        <dbl>      <dbl>
##  1         1          4
##  2         0          4
##  3         1          2
##  4         1          2
##  5         0          2
##  6         1          1
##  7         1          1
##  8         1          3
##  9         1          2
## 10         1          3
## # ... with 1,049 more rows

If a household has 6 or more children, I want to lump those cases into a "6+" category.

df %<>%
  mutate_at(vars(children_n),as.character) %>%
  mutate_at(vars(children_n),recode,"9" = "6_plus","8" = "6_plus","7" = "6_plus","6" = "6_plus") %>%
  mutate_at(vars(children_n),fct_relevel,"1","2","3","4","5","6_plus")

glimpse(df)

## Rows: 1,059
## Columns: 2
## $ equipment  <dbl> 1,...
## $ children_n <fct> 4,6_plus,...

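Now I want to plot the proportion of each number of children in two separate panels: one for households with enough equipment and one for households without: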

df %>%
  ggplot(data = .,aes(x = children_n,y = equipment)) + 
  geom_histogram(aes(y = (..count..)/sum(..count..)),stat = "count",fill = "darkblue") +
  geom_text(aes(label = scales::percent(((..count..)/sum(..count..)),accuracy = 1),y = ((..count..)/sum(..count..)) ),stat= "count",vjust = -.5,color = "darkblue") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~ equipment,labeller = as_labeller(c("1" = "have enough equipment","0" = "don't have enough equipment")))

This produces two panels that do *not* each sum to 100%:

[Figure: two_panels_dont_integrate]
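The likely cause can be sketched outside ggplot2. Assuming that sum(..count..) is evaluated over the whole layer rather than per panel, each bar is divided by the grand total, so each panel only adds up to its share of all rows. The toy data frame below is hypothetical and is not the question's data; it contrasts dividing by the grand total with dividing by each panel's own total in base R.

```r
# Toy data (hypothetical, not the data from the question)
toy <- data.frame(
  equipment  = c(1, 1, 1, 1, 0, 0),
  children_n = factor(c("1", "2", "2", "3", "1", "2"))
)

# Rows are children_n levels, columns are equipment values (the "panels")
counts <- table(toy$children_n, toy$equipment)

# Dividing by the grand total: each column sums only to its share of all rows
share_of_total <- counts / sum(counts)
colSums(share_of_total)

# Dividing by each column's own total: every column sums to 1
share_within_panel <- prop.table(counts, margin = 2)
colSums(share_within_panel)
```

The second normalization is the one the question asks for: the denominator has to be the per-panel total, not the overall total.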

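Attempts to solve the problem

I found another question that describes the same goal and the same problem. The accepted solution suggests defining geom_histogram as a density so that it integrates to 100%. That does not work in my case, because stat(density) requires a continuous x variable, whereas my x is a factor.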

df %>%
  ggplot(data = .,aes(x = children_n,y = equipment)) + 
  geom_histogram(aes(y = stat(density) * 6),binwidth = 6,fill = "darkblue") +
  facet_grid(~ equipment,labeller = as_labeller(c("1" = "have enough equipment","0" = "don't have enough equipment")))

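Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?

Some answers suggest using ..PANEL.., while others strongly advise against it. What is the proper way to make the two facets show percentages that each sum to 100% independently?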

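Solution

This can be achieved as follows:

  1. Map the faceting variable to the group aesthetic
  2. Use, for example, tapply to get the total count for each group (that is, each facet)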
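The normalization in step 2 can be tried on plain vectors before it goes into a plot. This is a minimal sketch with hypothetical counts and group ids; it indexes the totals by name via as.character(), which is an assumption added here so the lookup does not depend on the groups being numbered 1, 2, ... in order.

```r
# Hypothetical bar counts for two groups (facets), three bars each
count <- c(10, 30, 60, 5, 5, 10)
group <- c(1, 1, 1, 2, 2, 2)

# Total count per group, returned with the group ids as names
totals <- tapply(count, group, sum)

# Look up each bar's group total so there is one denominator per bar
per_bar_total <- totals[as.character(group)]

# Proportion of each bar within its own group
prop <- count / per_bar_total

# Check: the proportions sum to 1 within every group
tapply(prop, group, sum)
```

Because every bar is divided by the total of its own group, the bars of each facet add up to 100% regardless of how many rows the other facet has.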

By the way: I put the normalization code in a helper function to reduce code duplication and improve readability.

library(tidyverse)
library(magrittr)

df %<>%
  mutate_at(vars(children_n),as.character) %>%
  mutate_at(vars(children_n),recode,"9" = "6_plus","8" = "6_plus","7" = "6_plus","6" = "6_plus") %>%
  mutate_at(vars(children_n),fct_relevel,"1","2","3","4","5","6_plus")

help <- function(count,group) {
  count / tapply(count,group,sum)[group]
}

df %>%
  ggplot(data = .,aes(x = children_n,y = equipment,group = equipment)) + 
  geom_histogram(aes(y = help(..count..,..group..)),stat = "count",fill = "darkblue") +
  geom_text(aes(label = scales::percent(help(..count..,..group..),accuracy = 1),y = help(..count..,..group..) ),stat= "count",vjust = -.5,color = "darkblue") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~ equipment,labeller = as_labeller(c("1" = "have enough equipment","0" = "don't have enough equipment")))
#> Warning: Ignoring unknown parameters: binwidth,bins,pad
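A different route, not part of the answer above, is to compute the per-facet proportions before plotting and draw them with geom_col, which avoids the ..count.. and ..group.. variables altogether. The sketch below uses a small hypothetical tibble named toy in place of the recoded df; the names toy, props and p are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Toy data (hypothetical), standing in for the recoded df
toy <- tibble(
  equipment  = c(1, 1, 1, 1, 0, 0),
  children_n = factor(c("1", "2", "2", "6_plus", "1", "2"))
)

# Count each (equipment, children_n) cell, then divide by the facet total
props <- toy %>%
  count(equipment, children_n) %>%
  group_by(equipment) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Plot the precomputed proportions; one panel per equipment value
p <- ggplot(props, aes(x = children_n, y = prop)) +
  geom_col(fill = "darkblue") +
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            vjust = -0.5, color = "darkblue") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~ equipment)
p
```

One trade-off of this design: count() only returns combinations that occur in the data, so a category with no rows in one facet gets no bar and no 0% label there.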
