如何从模型预测中解释 confci 函数？为什么这可能与 ggplot2 上的引导 CI 图不匹配？

如何解决如何从模型预测中解释 confci 函数？为什么这可能与 ggplot2 上的引导 CI 图不匹配？

我正在尝试协调 ggplot 上看到的置信区间（通过使用自举 CI）和我从 lmer 模型计算 CI 时的置信区间。我不确定如何计算 CI。那么我将如何使用新的均值和预测的 CI 绘制原始点？

set.seed(111)
oviposition.index <- rnorm(20,2,1.3)
species <- rep(c("A","B"),each = 10)
month <- rep(c("Jan","Feb"),times = 10)
plot <- rep(c("1","2"),times = 10)
df <- data.frame(oviposition.index,species,month,plot)

mod <- lmer(oviposition.index ~ species + (1|month/plot),df)
summary(mod)
confint(mod)

模型摘要和置信区间

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)  
(Intercept)   1.1303     0.3684  3.5822   3.068   0.0432 *
speciesB      0.8198     0.5131 17.0000   1.598   0.1285 



                2.5 %   97.5 %
.sig01       0.0000000 1.130819
.sig02       0.0000000 1.130819
.sigma       0.8232305 1.540289
(Intercept)  0.3600376 1.900525
speciesB    -0.1836920 1.823253

我的看法：物种 A：

下 CI = 1.1303 - 0.3600376 = 0.7702624（与图表不匹配）

上 CI = 1.1303 + 1.900525 = 3.030825（不匹配图表）

物种 B：

下 CI = 0.8198 - (-0.1836920 )= 1.003492（大致匹配图表）

上置信区间 = 0.8198 + (1.823253) = 2.643053（大致匹配图表）

剧情展示

ggplot(df,aes(x = species,y = oviposition.index,color = species)) + geom_point() +
   geom_hline(yintercept = 1) + 
   stat_summary(fun.data=mean_cl_boot,geom="errorbar",width=0.2,colour="black") + 
   stat_summary(fun = mean,color = "black",geom ="point",size = 3,show.legend = FALSE)

解决方法

还有几点要添加到@Nate 的答案中。

认为 Hmisc 的 bootstrap 函数（mean_cl_boot 使用的）是错误的，因为它没有考虑分组结构的想法基本上是正确的。

我稍微修改了您的拟合函数，以便更方便地查看物种 A 的置信区间（通过在公式中包含 -1 来抑制截距。我也尝试了使用和不使用 lmerTest ，以便在下面更详细地讨论一些比较。

library(lme4)
mod0 <- lmer(oviposition.index ~ species-1 + (1|month/plot),df)
library(lmerTest)
mod1 <- as(mod0,"lmerModLmerTest")

library(broom.mixed)

f <- function(m,mod = mod0,...) {
  tt <- tidy(mod,conf.int = TRUE,effects = "fixed",conf.method = m,...)
  as.data.frame(tt)[1,c("estimate","conf.low","conf.high")]
}
ctab <- rbind(
    hmboot = Hmisc::smean.cl.boot(oviposition.index[1:10]),hmwald = Hmisc::smean.cl.normal(oviposition.index[1:10]),wald = f("Wald"),wald_t_satt = f("Wald",mod1),wald_t_kr = f("Wald",mod1,ddf.method = "Kenward-Roger"),profile = f("profile"),pboot = f("boot")
)
print(ctab,digits =3)

Wald 检验基于 ML 估计中似然曲面的估计曲率。它通常最快但最不准确；它总是提供对称 CI。它可以基于估计的正态抽样分布的假设或基于 t 分布；如果是后者，那么您需要指定一些近似 t 分布的“自由度”参数的方法。
轮廓似然基于测量整个似然面。它比 Wald 更可靠（也更慢），但不会考虑小样本量
参数引导是最可靠但最慢的方法。它基于从模型模拟新数据集。

这里的结论是，这些方法都大约为 CI 提供了相同的估计。 naive bootstrap（如您在上面使用的）给出了（稍微）最窄的 CI，而具有 Kenward-Roger 自由度的 Wald 估计给出了最宽的（可能过于保守，因为参数 bootstrap (pboot) 可能给出最佳答案）。（Satterthwaite ddf 近似在本例中完全失效。）

            estimate conf.low conf.high  
hmboot          1.13   0.4397      1.68 ## naive bootstrap
hmwald          1.13   0.4005      1.86 ## naive Wald (t-distrib)
wald_lmer       1.13   0.4082      1.85 ## mixed-model Wald (Z-distrib)
wald_t_satt     1.13      NaN       NaN ## mixed-model Wald (Satterthwaite)
wald_t_kr       1.13   0.0586      2.20 ## mixed-model Wald (Kenward-Roger)
profile         1.13   0.3600      1.90 ## likelihood profile CI
pboot           1.13   0.4111      1.82 ## parametric bootstrap CI

如果我们更高级一点（下面的代码），我们可以获得两组的 CI：

library(Hmisc)
f <- function(m,w = 1:2,...)
  tt[1:2,c("term","estimate","conf.high")]
}

h <- function(sfun) {
  tab <- do.call(rbind,lapply(split(df,species),function(d) sfun(d$oviposition.index)))
  tab <- data.frame(term = paste0("species",c("A","B")),setNames(as.data.frame(tab),"conf.high")))
  return(tab)
}
h(smean.cl.normal)

tab2 <- dplyr::bind_rows(list(
    hmisc_boot = h(smean.cl.boot),hmisc_normal = h(smean.cl.normal),wald_lmer = f("Wald"),boot = f("boot")),.id = "method")

tab2$method <- factor(tab2$method,levels = unique(tab2$method))
ggplot(tab2,aes(x=term,y = estimate,colour = method)) +
  geom_pointrange(aes(ymin=conf.low,ymax = conf.high),position = position_dodge(width=0.25)) +
  geom_hline(yintercept = 1,lty = 2)

置信区间不会相同，因为您的混合效应模型具有 ggplot（实际上是 Hmisc）引导 CI 函数没有的分组变量。最终，这会导致混合效应模型在这种情况下估计更多的错误，我们在 CI 中看到了这一点。

也就是说 lmer 的 CI 与您已经绘制的接近。 groupA（截距）为 1.1303 亩和 0.3684 se，而 groupB 为 ~1.94 亩（1.13 + 0.81）和更大的方差为 0.5131 se。我认为您对组差异的解释不会随着任何一种 CI 计算而改变。