How to decide which formula to use for a mixed model
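Edit: I have posted a more detailed version of this question on Cross Validated, since I was told it is better suited there.
I am trying to fit a mixed model to repeated-measures data, but I am struggling to write the formula.
My dataset consists of exam results from different schools. Each row contains one result, from 0 to 100. For each result I know the school, the year the exam was taken, and the subject. I have classified the schools into two types (A and B); my goal is to measure whether a school's type (A or B) affects its results. However, as I said, I have repeated measures for each school: 20 results per school, one per year (2015 to 2019) for each of four subjects. Here is a sample of my data: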
Result School_ID Year Subject School_type
1 19 1 2015 math A
2 35 1 2015 english A
3 4 1 2015 history A
4 16 1 2015 philosophy A
5 55 1 2016 math A
6 62 1 2016 english A
7 74 1 2016 history A
8 66 1 2016 philosophy A
9 32 1 2017 math A
10 16 1 2017 english A
11 42 1 2017 history A
12 52 1 2017 philosophy A
13 95 1 2018 math A
14 8 1 2018 english A
15 35 1 2018 history A
16 41 1 2018 philosophy A
17 12 1 2019 math A
18 40 1 2019 english A
19 56 1 2019 history A
20 65 2 2019 philosophy B
21 12 2 2015 math B
22 23 2 2015 english B
23 45 2 2015 history B
24 90 2 2015 philosophy B
25 3 2 2016 math B
26 66 2 2016 english B
27 51 2 2016 history B
28 26 2 2016 philosophy B
29 4 2 2017 math B
Because I have repeated measures, I know I need a mixed model (fitted with lmer, for example).
However, I am struggling to find a good formula.
First I tried this:
Result ~ School_type * Subject + (1|Year)
But I am not sure this is a good formula. I also tried this:
Result ~ School_type * Subject + (Year|School_ID)
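But it did not give any significant results. At first I concluded that my variable School_type has no significant effect on the results, which was really unexpected. Then I thought that, by treating school as a random effect, this formula suppresses the effect of the school on the results, which would obviously reduce the effect of School_type, since it is attached to the school.
In other words, I want to understand how a characteristic of a school affects its results, but if I "suppress" the effect of the individual (here, the school) by putting it in the random effects, I obviously will not find anything.
However, I feel that my first formula: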
Result ~ School_type * Subject + (1|Year)
which gives me the results I expected, does not account for the fact that the data are repeated within schools.
As requested, here is dput(my_data_frame).
structure(list(result = c(69.9,72.4,67.1,84.4,84.9,68,78.1,65.1,69.9,77.5,80.2,84.7,89.3,82.6,63.8,40.8,72.2,71.4,77.2,79.9,93.5,65,67.7,91.8,79.6,73.4,80.9,85.7,66.5,84.6,80.8,87.3,94,87.8,86.2,36.6,37.6,18,30,34.8,32.5,21.9,29,22.7,47.3,70,60.8,42.1,18.6,49.2,33.9,34.9,47.1,29.2,34.5,70.3,56,67.8,60.9,50.3,40.4,20.8,45.4,57.7,26.5,40.1,49.1,52.4,22.8,46.5,42.4,54.1,51.3,27.2,42.9,61.6,45.1,86.5,69.2,58.4,46.8,77,46.9,73.1,50.1,61.4,75,75.4,53.2,71.9,49.5,27.4,48.3,51.9,68.8,69,44.6,39,59.3,70.9,80.1,73.5,77.9,57.3,76.8,67,63.2,89.8,79.5,70.8,78.5,79.4,80.5,72,68.6,91.7,75.6,77.8,73,85.3,64.6,88.2,76.9,88.5,76.6,81.8,26.6,30.9,27.9,33.5,27.8,8.7,31.8,23.5,54.8,42.7,46.1,49.6,27,35.2,32,62.9,56.6,70.2,44.3,39.3,37,24,52.2,44.2,30.4,33.8,65.7,42.8,36.5,43.1,49.7,34.3,10.4,47.4,43.4,58.5,93.2,81.7,66.3,76.2,68.2,45,46.7,57.4,89.6,81.6,73.8,61.1,43.2,41.9,74.8,56.9,71.5,59.6,39.8,76.3,67.3,54.5,78,92.6,64,75.2,62.3,75.9,87.1,64.8,79.3,85.4,81.3,74.4,90.6,68.7,71.3,92.4,72.1,86,90.8,66.6,62.8,85.6,80.6,49.4,28,45.6,35.6,38.1,38.8,32.2,26.7,55.2,26.8,21.8,55.4,44,65.2,27.6,51.1,63,32.9,52.3,44.8,58.1,15.2,53.3,25,57.8,51.7,59.4,83.6,74.2,63.4,47.2,69.4,87.9,78.7,70.7,34.6,42.3,54.7,62.1,33.6,51,31.1,72.5,66.9,70.5,63.6,96.6,78.4,57.1,59.9,61.7,65.9,90.1,83.2,83.5,83.8,87,67.2,60.7,76.1,82,82.2,48.1,33,13.5,35.7,42.6,23.6,35.4,41.6,24.6,78.8,73.6,41.1,68.5,38.6,23.9,55.6,67.9,41.3,50.6,44.9,46.4,54.6,41.5,53.6,81,37.2,48,56.8,77.4,59.2,77.3,63.5,72.7,40.2,66,58.3,80.4,72.8,54.2,54,65.3,68.4,79,54.4,84.8,74.9,64.9,74.1,76,91.2,81.2,71,64.5,84.3,83.3,85.1,38.5,46,44.1,49.3,37.9,26.9,36.9,32.3,45.2,43.3,29.4,40.5,46.3,28.7,31.7,66.1,49.9,55.7,64.4,88.3,53.4,60.6,57.2,62.6,65.5,66.7,52.7,56.3,73.7,34.7,50,48.2,59.8,53,53.7,52.8,90,64.1,2.5,57.9,42.2,53.1,51.5,80.7,61.8,54.9,69.7,48.8,59.5,58.7,54.3,83.9,52.1,55.1,63.3,57.5,68.3,47.7,70.4,61,89.2,60.3,50.2,83.1,64.7,58.8,62,48.5,64.3,58,78.9,59.7,59.1,96.5,47.9,60,39.7,63.1,44.7,56.4,56.2,42,47.5,38.4,41.7,48.4,53.5,53.9,47.8,
74,61.5,47,48.9,74.6,52.9,64.8),school_ID = structure(c(1L,2L,3L,4L,5L,6L,7L,8L,9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,20L,21L,22L,23L,24L,25L,26L,27L,28L,29L,30L,31L,32L,33L,34L,35L,36L,37L,1L,37L),.Label = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37"),class = "factor"),year = structure(c(5L,1L),.Label = c("2015","2016","2017","2018","2019"),subject = structure(c(1L,4L),.Label = c("Math","History","Philosiphy","English"),school_type = structure(c(1L,3L),.Label = c("A","B","C"),class = "factor")),row.names = c(NA,-664L),class = c("tbl_df","tbl","data.frame"))
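Solution
This is probably better suited to CrossValidated, since it is more about which kind of model you should set up than about how to set it up, but here goes.
Let's go through your models.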
result ~ school_type * subject + (1|year)
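This says that the result depends on school type, subject, and their interaction (i.e., the differences among subjects vary by school type), and that the result varies with a random effect of year (i.e., you are interested only in the variation among years, not in making statistical comparisons between particular years).
Problems:
- This fit warns you of a "singular fit": the among-year variance is estimated as zero. This is probably because you do not have many years in your sample. It is not necessarily fatal, but it does suggest that the model may not be optimal (see e.g. here and here).
- As you hint/suggest, this model does not account for the fact that you have multiple measurements per school. It assumes that there is no variation among schools and no correlation within schools, which is almost certainly a problem.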
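The singular-fit check described above can be reproduced on stand-in data. The following is only a sketch: the data frame d is simulated (30 hypothetical schools, 5 years, 4 subjects, with made-up effect sizes), because the dput output in the question is incomplete. isSingular() and VarCorr() are lme4 functions; everything about the simulated numbers is an assumption.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the real data (assumption): 30 schools x 5 years x 4 subjects
d <- expand.grid(school_ID = factor(1:30),
                 year      = factor(2015:2019),
                 subject   = c("Math", "History", "Philosophy", "English"))
d$school_type <- factor(ifelse(as.integer(d$school_ID) <= 15, "A", "B"))
school_eff <- rnorm(30, sd = 8)  # assumed among-school variation
d$result <- 50 + 5 * (d$school_type == "B") +
  school_eff[as.integer(d$school_ID)] +
  rnorm(nrow(d), sd = 10)        # assumed residual noise

# First model from the question: year is the only grouping factor
m1 <- lmer(result ~ school_type * subject + (1 | year), data = d)

# TRUE means some variance component is estimated at (or very near) zero
print(isSingular(m1))
# Estimated standard deviations of the random effects and the residual
print(VarCorr(m1))
```

Because the simulation contains no year effect at all, a singular fit would not be surprising here; on the real data the result may of course differ.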
result ~ school_type * subject + (year|school_ID)
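In addition to the school type by subject interaction listed above, this model assumes that there is variation among schools, variation among years, and that the year effects within a school may be correlated (e.g., a school with a higher-than-average effect in year 1 may also have a higher-than-average effect in year 2).
- This model is also singular.
- It is (almost) the maximal model for this observational design; in principle you could also add (1|year) to allow for year-to-year variation that is consistent across schools (the current model only allows year-to-year variation that differs among schools).
- You say that "nothing is significant". That is not what I see (and it is not a good way to decide whether you have the right model! It is disappointing, but entirely possible, to have the right model and find nothing significant). Do you mean that the effect of school type is not significant at the conventional p-value threshold?
For example, here are the results of car::Anova() applied to this fit: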
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: result
                        Chisq Df Pr(>Chisq)
school_type            5.3273  2   0.069692 .
subject             1030.9423  3  < 2.2e-16 ***
school_type:subject   22.1616  6   0.001132 **
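Note that you should be very careful when interpreting tests of main effects: read the "Details" section of ?car::Anova (especially if you are going to do type 3 tests).
In general, I would also recommend not looking at significance tests until you are satisfied with the model structure.
I would suggest two more models: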
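A table like the one above can be produced along the following lines. This is a sketch on simulated stand-in data (the schools, effect sizes, and the two-level school_type are assumptions, so the numbers will not match the table), and it assumes the car package is installed alongside lme4. With the default arguments, car::Anova() on an lmer fit reports Type II Wald chi-square tests.

```r
library(lme4)
library(car)

set.seed(1)
# Simulated stand-in for the real data (assumption): 30 schools x 5 years x 4 subjects
d <- expand.grid(school_ID = factor(1:30),
                 year      = factor(2015:2019),
                 subject   = c("Math", "History", "Philosophy", "English"))
d$school_type <- factor(ifelse(as.integer(d$school_ID) <= 15, "A", "B"))
school_eff <- rnorm(30, sd = 8)  # assumed among-school variation
d$result <- 50 + 5 * (d$school_type == "B") +
  school_eff[as.integer(d$school_ID)] +
  rnorm(nrow(d), sd = 10)        # assumed residual noise

# Second model from the question: year effects vary (and may correlate) within schools.
# On data like these the fit may be singular or warn about convergence.
m2 <- lmer(result ~ school_type * subject + (year | school_ID), data = d)

# Type II Wald chi-square tests for the fixed-effect terms
a2 <- Anova(m2)
print(a2)
```

With only two simulated school types, the school_type row has 1 degree of freedom rather than the 2 shown in the table above.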
result ~ school_type * subject + (1|year) + (1|school_ID) + (1|year:school_ID)
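This allows for variation among years, variation among schools, and independent variation across years within schools. It is sensible (and more parsimonious than the maximal model above; it ignores possible correlations among the year-to-year variations within schools), but it still gives a singular fit.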
result ~ school_type * subject + year + (1|school_ID/year)
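This changes the year effect from random to fixed, which is a reasonable strategy when you do not have many levels from which to estimate a variance. (The (1|school_ID/year) term means "variation among schools, plus variation among years nested within schools"; it is the same as the combination of the other two random-effect terms in the previous model.)
This model is not singular.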
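The claim that (1|school_ID/year) is shorthand for a school term plus a school-by-year term can be checked by fitting both spellings and comparing them. The sketch below uses simulated stand-in data; the school and school-by-year standard deviations are assumptions chosen only so that both variance components are non-zero.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the real data (assumption): 30 schools x 5 years x 4 subjects
d <- expand.grid(school_ID = factor(1:30),
                 year      = factor(2015:2019),
                 subject   = c("Math", "History", "Philosophy", "English"))
d$school_type <- factor(ifelse(as.integer(d$school_ID) <= 15, "A", "B"))
school_eff      <- rnorm(30, sd = 8)      # assumed among-school variation
school_year_eff <- rnorm(30 * 5, sd = 4)  # assumed school-by-year variation
sy <- interaction(d$school_ID, d$year)    # one level per school-year combination
d$result <- 50 + 5 * (d$school_type == "B") +
  school_eff[as.integer(d$school_ID)] +
  school_year_eff[as.integer(sy)] +
  rnorm(nrow(d), sd = 10)                 # assumed residual noise

# Nested shorthand, as in the answer
m_nested <- lmer(result ~ school_type * subject + year + (1 | school_ID/year),
                 data = d)
# The same random-effects structure written out explicitly
m_explicit <- lmer(result ~ school_type * subject + year +
                     (1 | school_ID) + (1 | school_ID:year),
                   data = d)

# Number of levels of each grouping factor in the nested fit
print(ngrps(m_nested))
# The two fits should agree up to optimizer tolerance
print(all.equal(as.numeric(logLik(m_nested)),
                as.numeric(logLik(m_explicit)),
                tolerance = 1e-5))
```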
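All of the models except the first (which ignores among-school variation) give nearly identical results for the school_type*subject part of the model. This is not surprising, since we are basically just partitioning the (school × year) variation in slightly different ways.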
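One way to see this comparison is to fit several of the candidate models and put the school-type coefficient and its standard error side by side. Again this is a sketch on simulated stand-in data with assumed effect sizes, not a reanalysis of the question's data; the list names (year_only, crossed, fixed_year) are labels made up for this example.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the real data (assumption): 30 schools x 5 years x 4 subjects
d <- expand.grid(school_ID = factor(1:30),
                 year      = factor(2015:2019),
                 subject   = c("Math", "History", "Philosophy", "English"))
d$school_type <- factor(ifelse(as.integer(d$school_ID) <= 15, "A", "B"))
school_eff      <- rnorm(30, sd = 8)      # assumed among-school variation
school_year_eff <- rnorm(30 * 5, sd = 4)  # assumed school-by-year variation
sy <- interaction(d$school_ID, d$year)
d$result <- 50 + 5 * (d$school_type == "B") +
  school_eff[as.integer(d$school_ID)] +
  school_year_eff[as.integer(sy)] +
  rnorm(nrow(d), sd = 10)                 # assumed residual noise

fits <- list(
  # ignores schools entirely
  year_only  = lmer(result ~ school_type * subject + (1 | year), data = d),
  # year, school, and school-by-year as random intercepts
  crossed    = lmer(result ~ school_type * subject + (1 | year) +
                      (1 | school_ID) + (1 | year:school_ID), data = d),
  # year as a fixed effect, years nested within schools
  fixed_year = lmer(result ~ school_type * subject + year +
                      (1 | school_ID/year), data = d)
)

# Estimate and standard error of the B-vs-A contrast (at the reference subject)
cmp <- t(sapply(fits, function(m) {
  coef(summary(m))["school_typeB", c("Estimate", "Std. Error")]
}))
print(round(cmp, 2))
```

In a simulation like this one, the model that ignores schools would be expected to report a noticeably smaller standard error for school type than the models that include a school term. Including school as a random effect does not remove the school-type effect; it changes how much uncertainty is attached to it.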