Summarise multiple columns using a weighted t-test

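I have the following data and want to compute weighted p-values. I looked at "dplyr summarise multiple columns using t.test", but my version needs to use the weights. I can do it for one column with Code 2, but there are more than 30 columns. How can I compute the weighted p-values for all of them efficiently?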

Code 1

# A tibble: 877 x 5
   cat     population farms farmland weight
   <chr>        <dbl> <dbl>    <dbl>  <dbl>
 1 Treated       9.89  8.00     12.3  1    
 2 Control      10.3   7.81     12.1  0.714
 3 Control      10.2   8.04     12.4  0.156
 4 Control      10.3   7.97     12.1  0.340
 5 Control      10.9   8.87     12.7  2.85 
 6 Control      10.4   8.35     12.5  0.934
 7 Control      10.5   8.58     12.9  0.193
 8 Control      10.6   8.57     12.6  0.276
 9 Control      10.2   8.54     12.5  0.344
10 Control      10.5   8.76     12.6  0.625
# … with 867 more rows

Code 2

library(weights)
# p-value (third coefficient) for the population column only
wtd.t.test(x = df$population[df$cat == "Treated"], y = df$population[df$cat == "Control"],
           weight = df$weight[df$cat == "Treated"], weighty = df$weight[df$cat == "Control"])$coefficients[3]

Solution

We can use summarise together with across (dplyr 1.0.0 or later):

library(dplyr)
df %>%
  summarise(across(population:farmland,
                   ~ weights::wtd.t.test(x = .[cat == "Treated"], y = .[cat == "Control"],
                                         weight = weight[cat == "Treated"],
                                         weighty = weight[cat == "Control"])$coefficients[3]))

Or use lapply/sapply, selecting the columns to test by position (here columns 2 to 4):

sapply(df[2:4], function(v)
  weights::wtd.t.test(x = v[df$cat == "Treated"], y = v[df$cat == "Control"],
                      weight = df$weight[df$cat == "Treated"],
                      weighty = df$weight[df$cat == "Control"])$coefficients[3])
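If you would rather avoid the weights dependency, the following is a minimal base-R sketch of a Welch-type weighted t-test applied to every column with vapply. It is an approximation under stated assumptions, not a reproduction of weights::wtd.t.test: weights are rescaled to mean 1 within each group so that their sum acts as the sample size, missing values are not handled, and the helper names (weighted_welch_p, weighted_pvals) and the simulated data are hypothetical. Its p-values may differ from those of wtd.t.test.

```r
# Hypothetical helper: p-value of a Welch-type weighted two-sample t-test.
# Assumption: weights are rescaled to mean 1 within each group, so sum(w)
# plays the role of the sample size. Not guaranteed to match weights::wtd.t.test.
weighted_welch_p <- function(x, y, wx, wy) {
  group_stats <- function(v, w) {
    w <- w / mean(w)                    # rescale weights to mean 1
    n <- sum(w)                         # "sample size" implied by the weights
    m <- sum(w * v) / n                 # weighted mean
    s2 <- sum(w * (v - m)^2) / (n - 1)  # weighted variance
    c(n = n, m = m, se2 = s2 / n)       # se2: squared standard error of the mean
  }
  a <- group_stats(x, wx)
  b <- group_stats(y, wy)
  se2 <- a[["se2"]] + b[["se2"]]
  t_stat <- (a[["m"]] - b[["m"]]) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  dof <- se2^2 / (a[["se2"]]^2 / (a[["n"]] - 1) + b[["se2"]]^2 / (b[["n"]] - 1))
  2 * pt(-abs(t_stat), dof)             # two-sided p-value
}

# Hypothetical wrapper: apply the helper to every column named in `cols`
# (this can list 30+ columns); returns a named numeric vector of p-values.
weighted_pvals <- function(data, cols, group = "cat", weight = "weight") {
  treated <- data[[group]] == "Treated"
  w <- data[[weight]]
  vapply(cols, function(col) {
    v <- data[[col]]
    weighted_welch_p(v[treated], v[!treated], w[treated], w[!treated])
  }, numeric(1))
}

# Simulated stand-in data (hypothetical, not the poster's data).
set.seed(1)
df <- data.frame(
  cat = rep(c("Treated", "Control"), times = c(40, 60)),
  population = rnorm(100, 10.3, 0.4),
  farms = rnorm(100, 8.3, 0.4),
  farmland = rnorm(100, 12.5, 0.3),
  weight = runif(100, 0.1, 3)
)
cols <- setdiff(names(df), c("cat", "weight"))  # every column except group and weight
weighted_pvals(df, cols)
```

When all weights are equal, this helper computes the same statistic as base R's Welch t.test(x, y), which makes that comparison a quick sanity check; comparing a few columns against wtd.t.test on the real data would show how far the two approaches diverge.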