How to write a function that runs multiple t-tests to find main effects
df <- data.frame (rating1 = c(1,5,2,4,5),rating2 = c(2,1,2),rating3 = c(0,0),race = c("black","asian","white","black","white"),gender = c("male","female","male","female")
)
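I want to run a t-test comparing a group mean (e.g., the mean for Asians on rating1) with the overall mean for each rating (e.g., rating1). Here is my code for Asians on rating1.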
asian_df <- df %>%
filter(race == "asian")
t.test(asian_df$rating1,mean(df$rating1))
Then I run it for the black group on rating2:
black_df <- df %>%
filter(race == "black")
t.test(black_df$rating2,mean(df$rating2))
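How can I write a function that automates the t-test for every group? So far I have had to change the variable names by hand to run it for essentially every race, every gender, and every rating (rating1 through rating3). Thanks!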
Solution
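Running multiple t-tests increases the risk of a Type I error, so you will need to adjust for multiple comparisons for your results to be valid/meaningful. You can run the t-tests by looping over the variables, e.g.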
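library(tidyverse)
df <- data.frame(rating1 = c(5,8,7,8,9,6,9,7,8,5,8,5),
                 rating2 = c(2,7,8,4,9,3,6,1,7,3,9,1),
                 rating3 = c(0,6,1,2,7,2,9,1,6,2,3,1),
                 race = c("asian","asian","asian","black","asian","black","white","black","white","black","white","black"),
                 gender = c("male","female","female","male","female","male","female","male","female","male","female","male")
)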
for (rac in unique(df$race)){
tmp_df <- df %>%
filter(race == rac)
print(rac)
print(t.test(tmp_df$rating1,rep(mean(df$rating1),length(tmp_df$rating1))))
}
[1] "asian"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1),length(tmp_df$rating1))
t = 0.19518,df = 3,p-value = 0.8577
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.550864 2.884198
sample estimates:
mean of x mean of y
7.250000 7.083333
[1] "black"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1),length(tmp_df$rating1))
t = -1.5149,df = 4,p-value = 0.2044
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.5022651 0.7355985
sample estimates:
mean of x mean of y
6.200000 7.083333
[1] "white"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1),length(tmp_df$rating1))
t = 3.75,df = 2,p-value = 0.06433
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1842176 2.6842176
sample estimates:
mean of x mean of y
8.333333 7.083333
for (gend in unique(df$gender)){
tmp_df <- df %>%
filter(gender == gend)
print(gend)
print(t.test(tmp_df$rating1,rep(mean(df$rating1),length(tmp_df$rating1))))
}
[1] "male"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1),length(tmp_df$rating1))
t = -2.0979,df = 5,p-value = 0.09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4107761 0.2441094
sample estimates:
mean of x mean of y
6.000000 7.083333
[1] "female"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1),length(tmp_df$rating1))
t = 3.5251,df = 5,p-value = 0.01683
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.2933469 1.8733198
sample estimates:
mean of x mean of y
8.166667 7.083333
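The two loops above can be folded into one helper that covers every grouping variable and every rating. What follows is only a minimal sketch, not the answerer's code: `group_vs_overall` and its column names are hypothetical, and it swaps in a one-sample `t.test(x, mu = ...)` for the `rep(mean(...))` call. Because the second sample in that call is constant, both forms should give the same t, df and p-value. The data frame is assumed to be the 12-row example from this answer.

```r
# Example data, assumed to match the 12-row data frame used in this answer.
df <- data.frame(
  rating1 = c(5, 8, 7, 8, 9, 6, 9, 7, 8, 5, 8, 5),
  rating2 = c(2, 7, 8, 4, 9, 3, 6, 1, 7, 3, 9, 1),
  rating3 = c(0, 6, 1, 2, 7, 2, 9, 1, 6, 2, 3, 1),
  race    = c("asian", "asian", "asian", "black", "asian", "black",
              "white", "black", "white", "black", "white", "black"),
  gender  = c("male", "female", "female", "male", "female", "male",
              "female", "male", "female", "male", "female", "male")
)

# Hypothetical helper: for every grouping variable, level and rating,
# test the group's mean against the overall mean of that rating.
group_vs_overall <- function(data, group_vars, rating_vars) {
  rows <- list()
  for (g in group_vars) {
    for (r in rating_vars) {
      overall <- mean(data[[r]])
      for (lvl in unique(data[[g]])) {
        x  <- data[[r]][data[[g]] == lvl]
        tt <- t.test(x, mu = overall)  # one-sample t-test against the overall mean
        rows[[length(rows) + 1]] <- data.frame(
          grouping     = g,
          level        = lvl,
          rating       = r,
          n            = length(x),
          group_mean   = mean(x),
          overall_mean = overall,
          t            = unname(tt$statistic),
          df           = unname(tt$parameter),
          p_value      = tt$p.value
        )
      }
    }
  }
  out <- do.call(rbind, rows)
  # Bonferroni-adjust across all tests that were run in this call.
  out$p_bonferroni <- p.adjust(out$p_value, method = "bonferroni")
  out
}

results <- group_vs_overall(df,
                            group_vars  = c("race", "gender"),
                            rating_vars = c("rating1", "rating2", "rating3"))
print(results)
```

Returning one row per test keeps the results in a single data frame, which makes it straightforward to adjust all p-values together. Note that `t.test` stops with an error when a group has fewer than two observations or constant values, so small groups may need to be filtered out first.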
Because you are running multiple tests (5 t-tests in this example), your chance of getting at least one false positive is 1 - (1 - 0.05)^5 = 22.62%.
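The arithmetic behind that figure can be checked directly. This small sketch assumes the tests are independent and each uses a significance level of 0.05; the variable names are illustrative only.

```r
alpha   <- 0.05   # per-test significance level
n_tests <- 1:5    # number of tests run

# Probability of at least one false positive across n independent tests.
fwer <- 1 - (1 - alpha)^n_tests

print(data.frame(n_tests = n_tests, fwer = round(fwer, 4)))
```

The last row corresponds to the 5 t-tests in this example.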
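One way to adjust for this is the Bonferroni correction, which basically takes your desired p-value threshold (in this case, p < 0.05) and divides it by the number of tests you are running; with 5 tests, the threshold for statistical significance becomes p < 0.01.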
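In R, the Bonferroni correction can be applied with the base function `p.adjust`, which multiplies each p-value by the number of tests (capped at 1) instead of shrinking the threshold; the two views are equivalent. The sketch below simply copies the five p-values printed in the output above into a named vector, and the vector names are illustrative.

```r
# p-values taken from the five t-test outputs shown above (rating1 only).
p_raw <- c(asian = 0.8577, black = 0.2044, white = 0.06433,
           male = 0.09, female = 0.01683)

# Bonferroni: each p-value is multiplied by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(p_bonf)
print(p_bonf < 0.05)  # which tests remain significant after adjustment
```

With this adjustment, the female group, which was below 0.05 before correction, no longer meets the threshold.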
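Another approach is to use ANOVA to compare the means across all conditions, and then use Tukey's HSD to determine which groups differ. Tukey's HSD is a post-hoc test that already accounts for the multiple comparisons it makes, so you don't need a separate adjustment and the results are valid. Adapting this approach to your problem is probably the better solution, e.g.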
anova_one_way <- aov(rating1 + rating2 + rating3 ~ race + gender,data = df)
summary(anova_one_way)
            Df Sum Sq Mean Sq F value  Pr(>F)
race         2 266.70  133.35   14.01 0.00243 **
gender       1 140.08  140.08   14.72 0.00497 **
Residuals    8  76.13    9.52
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
TukeyHSD(anova_one_way)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = rating1 + rating2 + rating3 ~ race + gender,data = df)
$race
                 diff        lwr       upr     p adj
black-asian -7.050000 -12.963253 -1.136747 0.0224905
white-asian  4.416667  -2.315868 11.149201 0.2076254
white-black 11.466667   5.029132 17.904201 0.0023910

$gender
                 diff       lwr       upr     p adj
male-female -3.416667 -7.523829 0.6904958 0.0913521
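The model above uses the sum of the three ratings as its response. Since the question asks about each rating separately, one possible variation is to fit one ANOVA per rating and run Tukey's HSD on each fit. This is a sketch under assumptions rather than part of the original answer: the data frame is assumed to be the 12-row example from this answer, and `tukey_by_rating` is a hypothetical name.

```r
# Example data, assumed to match the 12-row data frame used in this answer.
df <- data.frame(
  rating1 = c(5, 8, 7, 8, 9, 6, 9, 7, 8, 5, 8, 5),
  rating2 = c(2, 7, 8, 4, 9, 3, 6, 1, 7, 3, 9, 1),
  rating3 = c(0, 6, 1, 2, 7, 2, 9, 1, 6, 2, 3, 1),
  race    = c("asian", "asian", "asian", "black", "asian", "black",
              "white", "black", "white", "black", "white", "black"),
  gender  = c("male", "female", "female", "male", "female", "male",
              "female", "male", "female", "male", "female", "male")
)

# TukeyHSD expects factors, so convert the grouping columns explicitly.
df$race   <- factor(df$race)
df$gender <- factor(df$gender)

tukey_by_rating <- list()
for (r in c("rating1", "rating2", "rating3")) {
  # Build the formula "<rating> ~ race + gender" for the current rating.
  fit <- aov(reformulate(c("race", "gender"), response = r), data = df)
  cat("\n==", r, "==\n")
  print(summary(fit))
  # Pairwise race comparisons with family-wise error control.
  tukey_by_rating[[r]] <- TukeyHSD(fit, which = "race")
  print(tukey_by_rating[[r]])
}
```

Tukey's HSD controls the error rate within each fit only. Running three separate models still means three families of comparisons, so some adjustment across ratings may be worth considering.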