
R: Selecting item subsamples while controlling for differences in several variables


I have a dataset containing a list of words that belong to 3 groups (A, H or V), plus 2 continuous variables (word length and word frequency):

mydata = structure(list(word = c("elastisch","rutschig","verklebt","dumpf","hallend","formbar","gelb","braun","blond","klebrig","blass","blendend","schlaff","bunt","singend","lauwarm","strahlend","biegsam","durchsichtig","verbal","erleuchtet","schrill","erloschen","dehnbar","beige","farbig","gepunktet","heiser","musikalisch","schweigend","schreiend","schwer","transparent","flackernd","blinkend","stumpf","gedimmt","lautlos","gefleckt","pappig","feucht","stumm","eisig","taub","steif","weich","leise","kalt","fein","laut","warm","still"),group = c("H","H","A","V","A"),length = c(9L,8L,5L,7L,4L,9L,12L,6L,10L,11L,5L),frequency = c(1.114,1.519,1.176,0.903,1.079,2.328,2.305,2.255,1.716,2.199,1.944,1.505,1.724,1.146,1.699,1.255,1.633,1.204,1.591,1.23,1.531,1.041,1.447,1.477,1.544,1.845,3.72,1,0.699,1.756,0.301,1.982,0.477,2.241,2.064,1.431,2.718,2.236,2.651,2.877,3.311,2.838,3.333,2.937,3.435)),class = "data.frame",row.names = c(NA,-52L))

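Now I need to select a subsample of 5 items from each group (A, V and H) so that the differences in length and frequency between these 3 new subsamples are as small as possible, ideally not statistically significant. I usually do this by hand and it takes a lot of time. Is there any way to automate the process? Thanks for any tips/ideas.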

Solution

Well, one inelegant brute-force approach, suggested by Andy Eggers above, is to draw random samples until the conditions are met, for example:

library(dplyr) ## provides %>%, group_by() and slice_sample()

max_cycles <- 1000 ## after how many cycles the script should stop if no solution is found
cycle <- 0

repeat {
  cycle <- cycle + 1
  print(cycle)

  ## draw n random items from each group (how many items should be selected from each group)
  subsample <- mydata %>% group_by(group) %>% slice_sample(n = 5) %>% ungroup()

  ## save the one-way ANOVA p-values for frequency and length
  res.aov.freq.p <- anova(aov(frequency ~ group, data = subsample))$"Pr(>F)"[1]
  res.aov.len.p <- anova(aov(length ~ group, data = subsample))$"Pr(>F)"[1]

  ## required p-values: no significant group difference on either variable
  cond <- (res.aov.freq.p > .05) && (res.aov.len.p > .05)

  if (cond || cycle >= max_cycles) {
    break
  }
}

## TRUE if an acceptable subsample was found, FALSE if the cycle limit was reached
cond
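The same brute-force idea can be wrapped into a reusable function. Below is a minimal base-R sketch (no dplyr needed), assuming a data frame with one grouping column and any number of numeric matching variables. The name `balanced_subsample()` is hypothetical and not part of any package, and the `toy` data frame is invented purely to make the example runnable in place of `mydata`.

```r
# Hypothetical helper: rejection sampling until a one-way ANOVA across groups
# is non-significant (p > alpha) for every variable listed in `vars`.
balanced_subsample <- function(data, group_col, vars, n = 5,
                               alpha = 0.05, max_cycles = 1000) {
  for (cycle in seq_len(max_cycles)) {
    # draw n row indices at random within each group (each group needs >= n rows)
    idx <- unlist(lapply(split(seq_len(nrow(data)), data[[group_col]]),
                         function(rows) rows[sample.int(length(rows), n)]))
    sub <- data[idx, , drop = FALSE]
    # one-way ANOVA p-value for each matching variable, e.g. length ~ group
    p <- vapply(vars, function(v) {
      fit <- aov(reformulate(group_col, response = v), data = sub)
      anova(fit)[["Pr(>F)"]][1]
    }, numeric(1))
    # isTRUE() guards against NaN p-values (e.g. zero variance in the subsample)
    if (isTRUE(all(p > alpha))) {
      return(list(subsample = sub, p_values = p, cycles = cycle))
    }
  }
  NULL  # no acceptable subsample found within max_cycles
}

# Invented toy data standing in for mydata (3 groups of 20 words each)
set.seed(1)
toy <- data.frame(
  word = paste0("w", 1:60),
  group = rep(c("A", "H", "V"), each = 20),
  length = sample(4:12, 60, replace = TRUE),
  frequency = round(runif(60, 0.3, 3.7), 3)
)

res <- balanced_subsample(toy, "group", c("length", "frequency"), n = 5)
res$p_values
table(res$subsample$group)
```

Returning `NULL` instead of the last rejected draw makes it explicit when the cycle limit was reached without success. Note that p > .05 only means no significant difference was detected; it does not prove the groups are equivalent.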

It turns out there is a dedicated R package for exactly this problem (LexOPS): https://jackedtaylor.github.io/LexOPSdocs/index.html
