微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

如何在ggplot2中应用带有分面的after_stat? 正确的图表无分层:g在移动平均线的导入期内带有“幽灵”条的图形:g + facet_grid(race~., scales="free_y")我的代码

如何解决如何在ggplot2中应用带有分面的after_stat? 正确的图表无分层:g在移动平均线的导入期内带有“幽灵”条的图形:g + facet_grid(race~., scales="free_y")我的代码

我正在使用移动平均线来消除疫苗分配中的星期几影响,以查看按各种因素分层的总体趋势。我可以创建一个滚动平均值条形图,以正确显示整体数据。但是当我分层或创建构面时,高度下降的“幽灵”条形出现在导入期(应该没有条形)。我怎样才能避免这种情况?

正确的图表(无分层):g

enter image description here

在移动平均线的导入期内带有“幽灵”条的图形:g + facet_grid(race~.,scales="free_y")

enter image description here

我的代码

library(tidyverse)
# Make fake data: count of doses per day for 70 days,increasing over the 70 days,with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10,70,replace=T) * (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
# Use that to create a data frame where one record is the administration of one dose,giving the dose,vaccine brand,1st or 2nd dose,pt race,& pt gender
doses <- data.frame(Admin_date = rep(as.Date("2020-12-31") + 1:70,nPerDay),whichDose = factor(c(rep(1,sum(nPerDay[1:30])),sample(1:2,sum(nPerDay[31:70]),replace=T))),gender=sample(c("F","M"),sum(nPerDay),replace=T),race=sample(LETTERS[1:5],c(.45,.25,.15,.1,.05),brand=sample(c("Pf","Mo"),replace=T)
)

# plot the doses administered each day,with stacked bars',with bars' color indicating # of 1st or second dose
(ggplot(data=doses,mapping=aes(x=Admin_date))#,fill=whichDose))
  + geom_bar(position = "stack")
  + geom_line(aes(y=..count..,fill=NULL),stat = "bin",binwidth=1)
)

# Change the bars in the prior plot into rolling 7-day averages,but keep the line as a daily total count.
g <- (
  ggplot(data=doses,fill=whichDose)) 
  + geom_bar(position = "stack",mapping = aes(y=zoo::rollmean(..count..,7,align="right",fill=NA)),stat="bin",binwidth=1
  )
  + geom_line(aes(y=..count..,binwidth=1)
  + labs(y="doses",fill="Which dose,\n7d avg count")
)
g # display this base graph

# explore tha data
g + facet_grid(race~.,scales="free_y") # See if the increasing trend and 1st vs 2nd dose porportions or similar across races.

我知道我可以通过创建一个中间数据框来避免这种情况,该框预先计算了我想要的分层的移动平均线。但是必须有一种方法可以在 R 中根据 https://yjunechoe.github.io/posts/2020-09-26-demystifying-stat-layers-ggplot2/after_stat() 函数动态执行此操作。但我想不通。我希望有一个简单的解决方案,我可以与我的工作团队分享,以便其他人(R 经验较少的人)可以向基本图添加分面函数以探索我们拥有的许多因素 - 我们拥有的不仅仅是性别、种族、品牌和whichDose。如果我能摆脱鬼条,他们可以添加这样的代码来获得其他分层:

# look at other stratifications
g + facet_grid(gender,scales="free_y")
g + facet_grid(race~brand,scales="free_y")
g + facet_grid(race~gender,scales="free_y")

解决方法

问题在于,在计算完统计数据之后,在统计数据之后进行的任何计算都不一定会考虑面板。这会给 zoo::rollmean 带来问题,因为它只看到单个值向量。因此,您必须按面板循环数据。

library(tidyverse)

nPerDay <- floor(sample(5:10,70,replace=T) * 
                   (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
doses <- data.frame(
  Admin_date = rep(as.Date("2020-12-31") + 1:70,nPerDay),whichDose = factor(c(rep(1,sum(nPerDay[1:30])),sample(1:2,sum(nPerDay[31:70]),replace=T))),gender=sample(c("F","M"),sum(nPerDay),replace=T),race=sample(LETTERS[1:5],c(.45,.25,.15,.1,.05),brand=sample(c("Pf","Mo"),replace=T)
)


ggplot(data=doses[order(doses$race,doses$Admin_date),],mapping=aes(x=Admin_date)) +
  geom_bar(position = "identity",mapping = aes(y=after_stat(
             unlist(lapply(split(count,PANEL),zoo::rollmean,7,align = "right",fill = NA))
           )),stat="bin",binwidth=1
  ) +
  geom_line(aes(y=..count..,fill=NULL),stat = "bin",binwidth=1) + 
  labs(y="doses",fill="Which dose,\n7d avg count") +
  facet_grid(race ~ .,scales = "free_y")
#> Warning: Removed 30 rows containing missing values (geom_bar).

reprex package (v1.0.0) 于 2021 年 2 月 20 日创建

,

类似于@teunbrand 的方法(他对问题的简明解释值得称赞,我没有什么可补充的)但是利用 dplyr 和辅助函数可以实现您想要的结果,例如所以:

library(tidyverse)

set.seed(42)

# Make fake data: count of doses per day for 70 days,increasing over the 70 days,with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10,replace = T) * (1 + ((1:70) * 3 / 70)) * (.5 + (.5 * (1:70 %% 7) / 6)))
# Use that to create a data frame where one record is the administration of one dose,giving the dose,vaccine brand,1st or 2nd dose,pt race,& pt gender
doses <- data.frame(
  Admin_date = rep(as.Date("2020-12-31") + 1:70,replace = T))),gender = sample(c("F",replace = T),race = sample(LETTERS[1:5],brand = sample(c("Pf",replace = T)
)

my_rollmean <- function(count,group) {
  data.frame(group = group,count = count) %>% 
    group_by(group) %>% 
    mutate(roll = zoo::rollmean(count,fill = NA)) %>% 
    pull(roll)
}

# Change the bars in the prior plot into rolling 7-day averages,but keep the line as a daily total count.
g <- ggplot(data = doses,mapping = aes(x = Admin_date)) +
  geom_bar(
    position = "stack",mapping = aes(y = my_rollmean(..count..,..PANEL..)),binwidth = 1
  ) +
  geom_line(aes(y = ..count..,fill = NULL),binwidth = 1) +
  labs(y = "doses",fill = "Which dose,\n7d avg count")

# explore tha data
g + facet_grid(race ~ .,scales = "free_y") # See if the increasing trend and 1st vs 2nd dose porportions or similar across races.
#> Warning: Removed 30 rows containing missing values (position_stack).

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。