R: Computing row sums for MERSQI scores, adjusted for missing values / not-applicable categories

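I want to compute row sums, including an adjustment for missing data.

The row sum is the actual "MERSQI" score (it rates the quality of a study, one study per row). Each column is a quality question with its own maximum number of points. In some cases, however, a question does not apply to a study, which produces a missing value. The row sum should then be rescaled to the standard denominator of 18, the maximum score, where the maximum achievable points for a study is the sum of the maximum points of the questions/columns that apply to it:

MERSQI total score = row sum / maximum achievable points * 18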
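
As a minimal sketch of this formula for a single study: the function name `mersqi_adjusted`, its arguments, and the ten maxima below are assumptions made for illustration (the maxima were simply chosen to sum to 18), not values from the question.

```r
# Adjusted MERSQI score for one study (illustrative sketch).
# `points` and `max_points` are hypothetical inputs with one entry per question;
# an NA in `points` marks a question that does not apply to the study.
mersqi_adjusted <- function(points, max_points, scale_max = 18) {
  stopifnot(length(points) == length(max_points))
  applicable <- !is.na(points)
  achievable <- sum(max_points[applicable])  # maximum over applicable questions only
  sum(points[applicable]) / achievable * scale_max
}

max_points <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 2, 3)        # made-up maxima, sum to 18
complete   <- c(1.5, 0.5, 1.5, 3, 0, 1, 1, 1, 1, 3)      # no missing values
with_na    <- c(1, 1.5, 1.5, 3, NA, NA, NA, 1, 1, 3)     # three questions not applicable

mersqi_adjusted(complete, max_points)  # row sum 13.5, nothing to adjust
mersqi_adjusted(with_na, max_points)   # row sum 12, achievable maximum 15
```

With no missing values the score is just the row sum; with missing values the row sum is scaled up by 18 divided by the points that were actually achievable.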

For example:

questions <- c(1,2,3,4,5,6,7,8,9,10) #number of question or col number
max_quest <- c(3,1.5,1,3) #maximum of every single question
study1 <- c(1.5,0.5,3) #points for every single questions for study1
study2 <- c(1,NA,3) # for study2
study3 <- c(2,3) #for study3
df <- rbind (questions,max_quest,study1,study2,study3)

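For study1 the row sum is 10.5 and there are no missing values, so the score is 10.5. For study2 the row sum is 10 and there are three NAs; the maximum achievable score for study2 is 15 (= 18 maximum points - 3*1 points for the NA questions), so the adjusted MERSQI score is 12 (= 10*18/15). For study3: row sum = 12.5, maximum achievable points = 14.5 (= 18 - (1.5+1+1)), adjusted MERSQI score = 15.52 (= 12.5*18/14.5).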

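Do you know how to compute the row sums with this adjustment for missing values? Perhaps by iterating over each row with a for loop and an if with is.na?

Thanks!

PS: Links/explanations for the MERSQI score: https://www.aliem.com/article-review-how-do-you-assess/ and https://pubmed.ncbi.nlm.nih.gov/26107881/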

Solutions

There is a problem with the lengths of the vectors. I edited the dataset so that they all have length 9, but this should work:

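# For each study column (columns 3 to 5 of mat), rescale the total to the full maximum
apply(mat[, 3:5], 2, FUN = function(x) {
  tot <- sum(x, na.rm = TRUE)    # points actually scored
  nas <- which(is.na(x))         # positions of the non-applicable questions
  total_max <- sum(max_quest)    # maximum if every question applied
  if (!length(nas))
    return(tot)
  else
    return(tot * total_max / (total_max - sum(max_quest[nas])))
})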

Data:

questions <- c(1,3,4,5,6,7,8,9) #number of question or col number
max_quest <- c(3,1.5,1,3) #maximum of every single question
study1 <- c(1.5,0.5,3) #points for every single questions for study1
study2 <- c(1,NA,1) # for study2
study3 <- c(2,1) #for study3

## rename mat because cbind(...) of vectors returns matrix.
mat <- cbind (questions,max_quest,study1,study2,study3)
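
The vectors as posted here have unequal lengths, so the call cannot be checked against them directly. The following is a self-contained sketch of the same column-wise `apply` idea on made-up, complete data; `study_a`, `study_b`, `study_c`, `scores`, and every value in them are assumptions for illustration only.

```r
# Hypothetical data: one row per question, one column per study.
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 2, 3)      # made-up maxima, sum to 18
study_a   <- c(1.5, 0.5, 1.5, 3, 0, 1, 1, 1, 1, 3)
study_b   <- c(1, 1.5, 1.5, 3, NA, NA, NA, 1, 1, 3)
study_c   <- c(2, NA, 1, 3, NA, NA, 1, 1, 2, 3)
scores    <- cbind(study_a, study_b, study_c)

adjusted <- apply(scores, 2, function(x) {
  total_max  <- sum(max_quest)
  # Maximum points of the questions that are NA for this study (0 if none are NA).
  missing_max <- sum(max_quest[is.na(x)])
  sum(x, na.rm = TRUE) * total_max / (total_max - missing_max)
})

round(adjusted, 2)
```

Because summing an empty selection gives 0, the sketch needs no separate branch for studies without NAs: the scaling factor is then exactly 1.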

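For each study column, compute its sum, multiply it by the total of max_quest, and divide by the total of max_quest minus the max_quest values of the questions that are NA.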

library(dplyr)

val <- sum(df$max_quest)

df %>%
  summarise(across(starts_with('study'),~sum(.,na.rm = TRUE)* val/(val - sum(max_quest[is.na(.)]))))

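Data

The shared data is incomplete because the vector lengths are incompatible. It would also make more sense to store these values column-wise rather than row-wise.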

questions <- c(1,9,10) 
max_quest <- c(3,3)
study1 <- c(1.5,0) 
study2 <- c(1,3)
study3 <- c(2,3)
df <- data.frame(questions,study3)
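
For readers who want to try the per-column calculation on a data frame without dplyr, here is a base R sketch. The data frame `df_demo` and all of its values are made up for illustration (they are not the asker's data), and selecting columns with `startsWith` is just one way to mimic `starts_with('study')`.

```r
# Hypothetical data frame: one row per question, studies stored column-wise.
df_demo <- data.frame(
  questions = 1:10,
  max_quest = c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 2, 3),   # made-up maxima, sum to 18
  study1    = c(1.5, 0.5, 1.5, 3, 0, 1, 1, 1, 1, 3),
  study2    = c(1, 1.5, 1.5, 3, NA, NA, NA, 1, 1, 3)
)

val        <- sum(df_demo$max_quest)
study_cols <- startsWith(names(df_demo), "study")

# One adjusted score per study column.
adjusted <- vapply(df_demo[study_cols], function(x) {
  sum(x, na.rm = TRUE) * val / (val - sum(df_demo$max_quest[is.na(x)]))
}, numeric(1))

adjusted
```

`vapply` with `numeric(1)` returns a named numeric vector, one entry per study, and fails loudly if a column ever produces something other than a single number.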

This can be done in a vectorized way.

First compute the row sums and the number of NAs in each row:

row_sums <- apply(df, 1, function(x) sum(x, na.rm = TRUE))  # MARGIN = 1: one sum per row

row_NAs <- apply(df, 1, function(x) sum(is.na(x)))          # number of NAs per row

Then pull out the studies and the maximum score:

studies <- row_sums[3:length(row_sums)]

max <- row_sums[2]

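Compute the MERSQI score from the adjusted maximum, based on the NAs. Note that subtracting row_NAs deducts one point per NA, so this is only correct when every non-applicable question has a maximum of 1 point: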

adjusted_max <- rep(max,length(studies)) - row_NAs[3:length(row_NAs)]

MERSQI <- studies * max / adjusted_max
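
When the non-applicable questions can have different maxima, the count of NAs is not enough: each NA has to be weighted by the maximum of its question. A vectorized sketch of that variant follows, with studies stored row-wise as in the question; the matrix `studies` and all numbers are hypothetical.

```r
# Hypothetical data: one row per study, one column per question.
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 2, 3)   # made-up maxima, sum to 18
studies <- rbind(
  study_a = c(1.5, 0.5, 1.5, 3, 0, 1, 1, 1, 1, 3),
  study_b = c(1, 1.5, 1.5, 3, NA, NA, NA, 1, 1, 3),
  study_c = c(2, NA, 1, 3, NA, NA, 1, 1, 2, 3)
)

row_totals <- rowSums(studies, na.rm = TRUE)

# Points that were not achievable per study: 0/1 NA indicator times the maxima.
lost_points  <- drop((is.na(studies) * 1) %*% max_quest)
adjusted_max <- sum(max_quest) - lost_points

mersqi <- row_totals * sum(max_quest) / adjusted_max
round(mersqi, 2)
```

The matrix product sums, for each study, the maxima of exactly those questions that are NA, so no loop or `if` with `is.na` is needed.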