如何在ggplot2中应用带有分面的after_stat？正确的图表无分层：g在移动平均线的导入期内带有“幽灵”条的图形：g + facet_grid(race~., scales="free_y")我的代码

如何解决如何在ggplot2中应用带有分面的after_stat？正确的图表无分层：g在移动平均线的导入期内带有“幽灵”条的图形：g + facet_grid(race~., scales="free_y")我的代码

我正在使用移动平均线来消除疫苗分配中的星期几影响，以查看按各种因素分层的总体趋势。我可以创建一个滚动平均值条形图，以正确显示整体数据。但是当我分层或创建构面时，高度下降的“幽灵”条形出现在导入期（应该没有条形）。我怎样才能避免这种情况？

正确的图表（无分层）：g

在移动平均线的导入期内带有“幽灵”条的图形：g + facet_grid(race~.,scales="free_y")

我的代码

library(tidyverse)
# Make fake data: count of doses per day for 70 days,increasing over the 70 days,with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10,70,replace=T) * (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
# Use that to create a data frame where one record is the administration of one dose,giving the dose,vaccine brand,1st or 2nd dose,pt race,& pt gender
doses <- data.frame(Admin_date = rep(as.Date("2020-12-31") + 1:70,nPerDay),whichDose = factor(c(rep(1,sum(nPerDay[1:30])),sample(1:2,sum(nPerDay[31:70]),replace=T))),gender=sample(c("F","M"),sum(nPerDay),replace=T),race=sample(LETTERS[1:5],c(.45,.25,.15,.1,.05),brand=sample(c("Pf","Mo"),replace=T)
)

# plot the doses administered each day,with stacked bars',with bars' color indicating # of 1st or second dose
(ggplot(data=doses,mapping=aes(x=Admin_date))#,fill=whichDose))
  + geom_bar(position = "stack")
  + geom_line(aes(y=..count..,fill=NULL),stat = "bin",binwidth=1)
)

# Change the bars in the prior plot into rolling 7-day averages,but keep the line as a daily total count.
g <- (
  ggplot(data=doses,fill=whichDose)) 
  + geom_bar(position = "stack",mapping = aes(y=zoo::rollmean(..count..,7,align="right",fill=NA)),stat="bin",binwidth=1
  )
  + geom_line(aes(y=..count..,binwidth=1)
  + labs(y="doses",fill="Which dose,\n7d avg count")
)
g # display this base graph

# explore tha data
g + facet_grid(race~.,scales="free_y") # See if the increasing trend and 1st vs 2nd dose porportions or similar across races.

我知道我可以通过创建一个中间数据框来避免这种情况，该框预先计算了我想要的分层的移动平均线。但是必须有一种方法可以在 R 中根据 https://yjunechoe.github.io/posts/2020-09-26-demystifying-stat-layers-ggplot2/ 和 after_stat() 函数动态执行此操作。但我想不通。我希望有一个简单的解决方案，我可以与我的工作团队分享，以便其他人（R 经验较少的人）可以向基本图添加分面函数以探索我们拥有的许多因素 - 我们拥有的不仅仅是性别、种族、品牌和whichDose。如果我能摆脱鬼条，他们可以添加这样的代码来获得其他分层：

# look at other stratifications
g + facet_grid(gender,scales="free_y")
g + facet_grid(race~brand,scales="free_y")
g + facet_grid(race~gender,scales="free_y")

解决方法

问题在于，在计算完统计数据之后，在统计数据之后进行的任何计算都不一定会考虑面板。这会给 zoo::rollmean 带来问题，因为它只看到单个值向量。因此，您必须按面板循环数据。

library(tidyverse)

nPerDay <- floor(sample(5:10,70,replace=T) * 
                   (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
doses <- data.frame(
  Admin_date = rep(as.Date("2020-12-31") + 1:70,nPerDay),whichDose = factor(c(rep(1,sum(nPerDay[1:30])),sample(1:2,sum(nPerDay[31:70]),replace=T))),gender=sample(c("F","M"),sum(nPerDay),replace=T),race=sample(LETTERS[1:5],c(.45,.25,.15,.1,.05),brand=sample(c("Pf","Mo"),replace=T)
)


ggplot(data=doses[order(doses$race,doses$Admin_date),],mapping=aes(x=Admin_date)) +
  geom_bar(position = "identity",mapping = aes(y=after_stat(
             unlist(lapply(split(count,PANEL),zoo::rollmean,7,align = "right",fill = NA))
           )),stat="bin",binwidth=1
  ) +
  geom_line(aes(y=..count..,fill=NULL),stat = "bin",binwidth=1) + 
  labs(y="doses",fill="Which dose,\n7d avg count") +
  facet_grid(race ~ .,scales = "free_y")
#> Warning: Removed 30 rows containing missing values (geom_bar).

^{由 reprex package (v1.0.0) 于 2021 年 2 月 20 日创建}

类似于@teunbrand 的方法（他对问题的简明解释值得称赞，我没有什么可补充的）但是利用 dplyr 和辅助函数可以实现您想要的结果，例如所以：

library(tidyverse)

set.seed(42)

# Make fake data: count of doses per day for 70 days,increasing over the 70 days,with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10,replace = T) * (1 + ((1:70) * 3 / 70)) * (.5 + (.5 * (1:70 %% 7) / 6)))
# Use that to create a data frame where one record is the administration of one dose,giving the dose,vaccine brand,1st or 2nd dose,pt race,& pt gender
doses <- data.frame(
  Admin_date = rep(as.Date("2020-12-31") + 1:70,replace = T))),gender = sample(c("F",replace = T),race = sample(LETTERS[1:5],brand = sample(c("Pf",replace = T)
)

my_rollmean <- function(count,group) {
  data.frame(group = group,count = count) %>% 
    group_by(group) %>% 
    mutate(roll = zoo::rollmean(count,fill = NA)) %>% 
    pull(roll)
}

# Change the bars in the prior plot into rolling 7-day averages,but keep the line as a daily total count.
g <- ggplot(data = doses,mapping = aes(x = Admin_date)) +
  geom_bar(
    position = "stack",mapping = aes(y = my_rollmean(..count..,..PANEL..)),binwidth = 1
  ) +
  geom_line(aes(y = ..count..,fill = NULL),binwidth = 1) +
  labs(y = "doses",fill = "Which dose,\n7d avg count")

# explore tha data
g + facet_grid(race ~ .,scales = "free_y") # See if the increasing trend and 1st vs 2nd dose porportions or similar across races.
#> Warning: Removed 30 rows containing missing values (position_stack).

如何在ggplot2中应用带有分面的after_stat？ 正确的图表无分层：g在移动平均线的导入期内带有“幽灵”条的图形：g + facet_grid(race~., scales="free_y")我的代码