
How do I get subsamples with almost the same mean and standard deviation as the population?


Suppose this is my data (a numeric vector named length):

> length <- rep(11:17,200)
> mean(length)
[1] 14
> sd(length)
[1] 2.001

How can I draw random subsamples from this data (length) that have almost the same mean and standard deviation as the full data?

Solution

You can draw repeatedly from length until you have found enough samples that meet your requirements. It is not pretty, but it works.

length <- rep(11:17,200)

# save mean and sd the subsamples should have
aimed_mean <- mean(length)
aimed_sd <- sd(length)

# set number of replications / iterations
n_replication <- 1000

# set size of sample
size_sample <- 40

# set desired number of samples
n_sample <- 3

# set deviation from mean and sd you can accept
deviation_mean <- 1.5
deviation_sd <- 1.5

# create empty container for resulting samples
samples <- list()

# Repeatedly sample from length
i <- 0
sample_count <- 0

repeat {
  
  i <- i+1
  
  # take a sample from length
  sample_length <- sample(length,size_sample)
  
  # keep the sample when it is close enough in both mean and sd
  if(abs(aimed_mean - mean(sample_length)) < deviation_mean &&
     abs(aimed_sd - sd(sample_length)) < deviation_sd){
    
    # store accepted samples consecutively so the list has no NULL gaps
    sample_count <- sample_count + 1
    samples[[sample_count]] <- sample_length
    
  }
  
  if(i == n_replication || sample_count == n_sample){
    break
  }
  
}

# your samples
samples

# test whether it worked
lapply(samples,function(x){abs(mean(x)-aimed_mean)<deviation_mean})
lapply(samples,function(x){abs(sd(x)-aimed_sd)<deviation_sd})
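
The same repeated-draw idea can be wrapped in a function so that the tolerances and sample size are easy to vary. This is only a minimal sketch, not part of the original answer: the function name draw_matching_samples, its arguments, the seed, and the tolerance of 0.1 are all assumptions chosen for illustration. With 40 values per sample, the standard error of the mean is roughly 2 / sqrt(40), about 0.32, so a tolerance of 1.5 accepts nearly every draw; a tighter value such as 0.1 filters more strictly.

```r
# Hypothetical helper (not from the original answer): keep drawing random
# subsamples and accept those whose mean and sd are close to the population's.
draw_matching_samples <- function(x, size, n_wanted, tol_mean, tol_sd,
                                  max_iter = 10000) {
  target_mean <- mean(x)
  target_sd <- sd(x)
  accepted <- list()
  for (iter in seq_len(max_iter)) {
    candidate <- sample(x, size)
    if (abs(mean(candidate) - target_mean) < tol_mean &&
        abs(sd(candidate) - target_sd) < tol_sd) {
      accepted[[length(accepted) + 1]] <- candidate
      if (length(accepted) == n_wanted) break
    }
  }
  # may hold fewer than n_wanted samples if max_iter was reached first
  accepted
}

set.seed(1)  # arbitrary seed, only for reproducibility
population <- rep(11:17, 200)
result <- draw_matching_samples(population, size = 40, n_wanted = 3,
                                tol_mean = 0.1, tol_sd = 0.1)

# inspect how close each accepted subsample is
sapply(result, mean)
sapply(result, sd)
```

If the function returns fewer samples than requested, either loosen the tolerances or raise max_iter.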
