Run a Wilcoxon rank sum test on each row of a data frame

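I have a large amount of biological data in a data frame, as shown below. Each row has a condition, identifiers (plate and well), and three replicates each of the expected phenotype (EP) and the observed phenotype (OP).

I want to add a column holding the p-value of a Wilcoxon rank sum test that checks, for each row/well, whether EP and OP differ significantly from each other.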

head(df)

  Temp Plate Well      EP1      EP2      EP3    OP1    OP2    OP3
1 30°C    31  A01 1.395874 1.323633 1.130804 0.1352 0.1632 0.1130
2 30°C    31  A02 1.449596 1.501810 1.111663 1.1474 1.1314 1.0628
3 30°C    31  A03 1.332983 1.416245 1.081833 1.0604 1.0947 1.0790
4 30°C    31  A04 1.333371 1.556057 1.091200 0.9786 1.0009 1.0127
5 30°C    31  A05 1.362556 1.343878 1.042433 1.0152 1.0534 1.0143
6 30°C    31  A06 1.542448 1.430897 1.031030 1.0266 1.0076 0.9785

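I found these posts: "Run a wilcox function for each row in each group" and "Trying to run many anovas and get an F value for each row", but I can't seem to combine them into a working script. I find the mapply() function in the first link completely incomprehensible, and I can't figure out how to get a Wilcoxon test instead of f.stat in the second link.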

Any help would be greatly appreciated. Thanks!

Solution

First, let's use dput(head(df)) to get the data into R in a form that is easy to copy:

df <- structure(list(
  Temp = c("30°C", "30°C", "30°C", "30°C", "30°C", "30°C"),
  Plate = c(31L, 31L, 31L, 31L, 31L, 31L),
  Well = c("A01", "A02", "A03", "A04", "A05", "A06"),
  EP1 = c(1.395874, 1.449596, 1.332983, 1.333371, 1.362556, 1.542448),
  EP2 = c(1.323633, 1.50181, 1.416245, 1.556057, 1.343878, 1.430897),
  EP3 = c(1.130804, 1.111663, 1.081833, 1.0912, 1.042433, 1.03103),
  OP1 = c(0.1352, 1.1474, 1.0604, 0.9786, 1.0152, 1.0266),
  OP2 = c(0.1632, 1.1314, 1.0947, 1.0009, 1.0534, 1.0076),
  OP3 = c(0.113, 1.0628, 1.079, 1.0127, 1.0143, 0.9785)
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))

The wilcox.test for a single row (EP in columns 4 to 6, OP in columns 7 to 9) is then:

wilcox.test(unlist(df[1, 4:6]), unlist(df[1, 7:9]))
# 
#   Wilcoxon rank sum exact test
# 
# data:  unlist(df[1, 4:6]) and unlist(df[1, 7:9])
# W = 9, p-value = 0.1
# alternative hypothesis: true location shift is not equal to 0

To get only the p-value:

wilcox.test(unlist(df[1, 4:6]), unlist(df[1, 7:9]))$p.value
# [1] 0.1
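
If the column positions might change, the single-row test could be wrapped in a small helper that picks the replicate columns by name instead of by index. This is only a sketch: row_pvalue is a hypothetical name, and it assumes that every replicate column starts with "EP" or "OP". Two rows of the example data are rebuilt so that the snippet runs on its own.

```r
# Two rows of the example data, rebuilt so this snippet is self-contained
df <- data.frame(
  Well = c("A01", "A02"),
  EP1 = c(1.395874, 1.449596),
  EP2 = c(1.323633, 1.501810),
  EP3 = c(1.130804, 1.111663),
  OP1 = c(0.1352, 1.1474),
  OP2 = c(0.1632, 1.1314),
  OP3 = c(0.1130, 1.0628)
)

# Hypothetical helper: p-value of the rank sum test for row i.
# Assumes replicate columns are named EP* and OP*.
row_pvalue <- function(data, i) {
  ep <- unlist(data[i, grep("^EP", names(data))])
  op <- unlist(data[i, grep("^OP", names(data))])
  wilcox.test(ep, op)$p.value
}

row_pvalue(df, 1)
# [1] 0.1
```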

So we can use apply() to run the test on every row:

p <- apply(df[,4:9],1,function(x) wilcox.test(x[1:3],x[4:6])$p.value)
p
#   1   2   3   4   5   6 
# 0.1 0.4 0.2 0.1 0.2 0.1
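
Since the question asks for the p-values as a new column, the vector returned by apply() can be assigned straight back to the data frame. A minimal sketch, assuming the column name p_value (my choice, not from the question) and using a three-row copy of the data so that it runs standalone:

```r
# First three wells of the example data
df <- data.frame(
  Well = c("A01", "A02", "A03"),
  EP1 = c(1.395874, 1.449596, 1.332983),
  EP2 = c(1.323633, 1.501810, 1.416245),
  EP3 = c(1.130804, 1.111663, 1.081833),
  OP1 = c(0.1352, 1.1474, 1.0604),
  OP2 = c(0.1632, 1.1314, 1.0947),
  OP3 = c(0.1130, 1.0628, 1.0790)
)

# Select the replicate columns by name so the order is explicit
rep_cols <- c("EP1", "EP2", "EP3", "OP1", "OP2", "OP3")

# Positions 1:3 of each row are EP, positions 4:6 are OP
df$p_value <- apply(
  df[, rep_cols], 1,
  function(x) wilcox.test(x[1:3], x[4:6])$p.value
)

df[, c("Well", "p_value")]
```

Note that with three replicates per group and no ties, the smallest two-sided exact p-value this test can return is 0.1 (2 of the 20 possible rank arrangements), so no row can reach p < 0.05 however large the difference is.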