如何使用 sensobol 检查 Sobol 敏感度指数的收敛性？

如何解决如何使用 sensobol 检查 Sobol 敏感度指数的收敛性？

我想使用 sensobol 库，通过使用从原始样本中提取的大小递减的子样本重新计算敏感度指数来检查 Sobol 敏感度指数的收敛性。

在这里，我展示了一个使用 Ishigami 函数作为模型的示例代码。由于使用我实际使用的模型计算模型输出需要很长时间，因此我想避免重新计算不同样本大小的模型输出，但希望使用整个样本的子样本进行此检查。

我已经编写了运行通过的代码，但是，一旦样本大小不等于初始样本大小，结果似乎“不正确”。

初始设置

library(sensobol)

# Define settings
matrices <- c("A","B","AB","BA")
N <- 1000
params <- paste("X",1:3,sep = "")
first <- total <- "azzini"
order <- "first"
R <- 10
type <- "percent"
conf <- 0.95                                                                    

# Create sample matrix using Sobol' (1967) quasi-random numbers
mat <- sobol_matrices(matrices = matrices,N = N,params = params,order = order,type = "QRN")   

# Compute model output using Ishigami function as model
Y <- ishigami_Fun(mat)

将 Sobol 指数作为基准结果进行修正

# Compute and bootstrap Sobol' indices for entire sample N
ind <- sobol_indices(matrices = c("A","BA"),Y = Y,boot = TRUE,first = "azzini",total = "azzini",order = "first",R = R,type = type,conf = conf)
cols <- colnames(ind)[1:length(params)]
ind[,(cols):= round(.SD,3),.SDcols = (cols)]

检查收敛

现在，为了分析是否达到收敛，我想使用从原始样本中提取的大小递减的子样本重新计算敏感度指数

# function to compute sensitivity indices,depending on the sample size and the model output vector
fct_conv <- function(N,Y) {
  # compute how many model runs are performed in the case of the Azzini estimator
  nr_model_runs <- 2*N*(length(params)+1)   # length(params) = k
  
  # extract sub-sample of model output
  y_sub <- Y[1:nr_model_runs]
  
  # compute and bootstrap Sobol' indices
  ind_sub <- sobol_indices(matrices = c("A",Y = y_sub,conf = conf)
  cols <- colnames(ind_sub)[1:length(params)]
  ind_sub[,.SDcols = (cols)]
  
  return(ind_sub)
}

让我们将基准测试结果 (ind) 与其他两个输出进行比较：使用完整样本 (fct_conv) 运行 ind_full_sample 和使用稍微减少的样本运行 fct_conv (ind_red_sample)。

ind_full_sample <- fct_conv(1000,Y)
ind_red_sample  <- fct_conv(999,Y)

ind
ind_full_sample
ind_red_sample

看来，一旦样本量减少，结果就没有意义了。这是为什么？我很乐意提供任何提示或想法！

解决方法

结果没有意义，因为您在抽样时没有考虑样本矩阵的顺序。试试下面的

# Load the required packages:
library(sensobol) 
library(data.table)
library(ggplot2)

# Create function to swiftly check convergence (you do not need bootstrap)

sobol_convergence <- function(Y,N,sample.size,seed = 666) {
  dt <- data.table(matrix(Y,nrow = N))
  set.seed(seed) # To permit replication
  subsample <- unlist(dt[sample(.N,sample.size)],use.names = FALSE)
  ind <- sobol_indices(matrices = matrices,Y = subsample,N = sample.size,params = params,first = first,total = total,order = order)
  return(ind)
}

# Define sequence of sub-samples at which you want to check convergence

sample.size <- seq(100,1000,50) # every 50

# Run function

ind.list <- lapply(sample.size,function(n) 
  sobol_convergence(Y = Y,N = N,sample.size = n))

# Extract total number of model runs C and results in each run

Cost <- indices <- list()
for(i in 1:length(ind.list)) {
  Cost[[i]] <- ind.list[[i]]$C
  indices[[i]] <- ind.list[[i]]$results
}

names(indices) <- Cost

# Final dataset

final.dt <- rbindlist(indices,idcol = "Cost")[,Cost:= as.numeric(Cost)]

# Plot results 

ggplot(final.dt,aes(Cost,original,color = sensitivity)) +
  geom_line() +
  labs(x = "Total number of model runs",y = "Sobol' indices") +
  facet_wrap(~parameters) + 
  theme_bw()