How to create a contingency table in R with multiple columns, sub-columns and statistical tests?
Data
df <- structure(list(`Column 1` = c(4.6875,4.35625,4.62083333333333,3.625,4.125,4.16666666666667,4.41071428571429,3.78125,4.77083333333333,4.90625,4.75,3.85,4,4.78125,4.44791666666667,3.66666666666667,3.95833333333333,4.17916666666667,4.33333333333333,4.25634920634921
),`Column 2` = c(4.16666666666667,4.2,3.38888888888889,3.33333333333333,4.06666666666667,3.87857142857143,4.66666666666667,3.58333333333333,4.34722222222222,5,2.77777777777778,2.8,3.54166666666667,3.86666666666667,3.83888888888889),`Column 3` = c(4.42857142857143,4.2952380952381,4.57619047619048,3.64285714285714,4.26190476190476,3.6421768707483,3.17857142857143,4.76190476190476,4.58333333333333,4.85714285714286,4.22857142857143,4.14285714285714,4.53571428571429,4.31666666666667,3.71428571428571,4.09523809523809,3.48571428571429,4.19047619047619,3.83741496598639),`Column 4` = c(4.375,4.43333333333333,4.375,3.5,3.75,4.08333333333333,4.30952380952381,3.25,4.9375,4.875,3.88333333333333,4.25,4.4375,4.27083333333333,3.275,4.05),`Column 5` = c(5,4.27777777777778,4.44444444444444,4.5,3.77777777777778,4.38095238095238,4.91666666666667,4.13333333333333,4.55555555555556,4.65,3.26666666666667,4.77777777777778,4.3),`Column 6` = c(4.33333333333333,4.52222222222222,3.83333333333333,4.52380952380952,4.36111111111111,4.41666666666667,4.06666666666667
),`Column 7` = c(4.11538461538461,4.18461538461538,4.46153846153846,3.92307692307692,3.72727272727273,4.11666666666667,3.68131868131868,3.55128205128205,4.70512820512821,4.71153846153846,4.69871794871795,3.93333333333333,4.61538461538461,4.37121212121212,3.75925925925926,3.80555555555556,3.70512820512821,4.43589743589744,4.01813186813187
),`Column 8` = c(4,4.05833333333333,4.70833333333333,3.875,3.96428571428571,2.9375,4.6875,3.8375,4.1875,4.6,3.5625,3.96706349206349),`Column 9` = c(4.4,4.1,3.8,4.37142857142857,3.9,4.73333333333333,4.9,4.28,4.56,3.4,3.56,4.00444444444444),`Column 10` = c(4,4.22777777777778,4.45555555555556,3.7,4.26587301587302,3.91666666666667,4.94444444444444,4.79166666666667,4.81944444444444,4.11111111111111,4.61111111111111,4.88888888888889,4.08888888888889,4.19444444444444,4.43234126984127
),`Column 11` = c(4.33333333333333,4.05555555555556,4.71428571428571,3.88888888888889,3.22222222222222,3.44444444444444,4.11666666666667),`Column 12` = c(4.16666666666667,4.83333333333333,4.13888888888889,4.005291005291
),`Column 13` = c(4.22222222222222,4.43518518518519,4.31481481481481,3.63227513227513,4.72222222222222,4.84259259259259,4.07777777777778,4.2962962962963,4.69444444444444,4.40625,3.97916666666667,3.825,4.08086419753086),`Column 14` = c(4.11111111111111,4.22962962962963,4.3,4.07407407407407,3.94179894179894,4.75925925925926,3.80740740740741,3.98148148148148,3.35185185185185,3.82592592592593,4.22222222222222,4.18302469135802),`Column 15` = c(4.125,4.55833333333333,4.59166666666667,4.39285714285714,4.35833333333333,4.45833333333333,4.52916666666667),`Column 16` = c(4.21428571428571,4.31428571428571,4.5047619047619,3.85714285714286,3.90068027210884,3.53571428571429,4.90476190476191,4.89285714285714,3.87222222222222,4.61944444444444,3.51111111111111,4.23809523809524,4.28174603174603),`Column 17` = c(NaN,1.02777777777778,NaN,1.17142857142857,2.07142857142857,1.52777777777778,1.11111111111111,1.27777777777778,1,1.11428571428571,1.32258064516129,1.78125,1.69444444444444,2.05714285714286,1.48571428571429,2.22222222222222,1.38235294117647),`Topic 1` = c(1,0),`Topic 2` = c(0,`Topic 3` = c(0,1),`Topic 4` = c(1,`Topic 5` = c(1,`Topic 6` = c(1,`Topic 7` = c(1,`Topic 8` = c(1,1)),row.names = c(NA,-20L
),class = c("tbl_df","tbl","data.frame"))
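Data overview
There are 17 columns named Column 1 to Column 17 and 8 columns named Topic 1 to Topic 8. The real data has more than 15,000 rows, but for simplicity I have sampled only 20.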
Rows: 20
Columns: 25
$ `Column 1` <dbl> 4.687500,4.356250,4.620833,3.625000,4.125000,4.166667,4.410714,...
$ `Column 2` <dbl> 4.166667,4.200000,3.388889,4.000000,3.333333,4.066667,3.878571,...
$ `Column 3` <dbl> 4.428571,4.295238,4.576190,3.642857,4.261905,3.642177,...
$ `Column 4` <dbl> 4.375000,4.433333,4.375000,3.500000,3.750000,4.083333,4.309524,...
$ `Column 5` <dbl> 5.000000,4.277778,4.444444,4.500000,3.777778,4.380952,...
$ `Column 6` <dbl> 4.333333,4.522222,3.833333,4.523810,...
$ `Column 7` <dbl> 4.115385,4.184615,4.461538,3.923077,3.727273,4.116667,3.681319,...
$ `Column 8` <dbl> 4.000000,4.058333,4.708333,3.875000,4.250000,3.964286,...
$ `Column 9` <dbl> 4.400000,4.333333,4.600000,4.100000,3.800000,4.371429,...
$ `Column 10` <dbl> 4.000000,4.227778,4.455556,3.700000,4.265873,...
$ `Column 11` <dbl> 4.333333,4.055556,4.714286,...
$ `Column 12` <dbl> 4.166667,3.250000,...
$ `Column 13` <dbl> 4.222222,4.435185,4.314815,3.632275,...
$ `Column 14` <dbl> 4.111111,4.229630,4.300000,4.074074,3.941799,...
$ `Column 15` <dbl> 4.125000,4.558333,4.591667,4.392857,...
$ `Column 16` <dbl> 4.214286,4.314286,4.504762,3.857143,3.900680,...
$ `Column 17` <dbl> NaN,1.027778,1.171429,2.071429,1.527778,1.111111,1.277778,...
$ `Topic 1` <dbl> 1,0
$ `Topic 2` <dbl> 0,0
$ `Topic 3` <dbl> 0,1
$ `Topic 4` <dbl> 1,1
$ `Topic 5` <dbl> 1,1
$ `Topic 6` <dbl> 1,0
$ `Topic 7` <dbl> 1,0
$ `Topic 8` <dbl> 1,1
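As reproduced here, the dput() output above is truncated (several vectors are cut short and the Topic columns are incomplete), so it does not parse. As a stand-in, the sketch below simulates a data frame with the same shape: 17 numeric Column variables and 8 binary Topic indicators. The values are random and purely hypothetical; they only exist so that code in this thread can be run.

```r
# Hypothetical stand-in data: same column names as the question, random values.
set.seed(42)
n_rows <- 20

value_cols <- setNames(
  replicate(17, round(runif(n_rows, min = 1, max = 5), 2), simplify = FALSE),
  paste("Column", 1:17)
)
topic_cols <- setNames(
  replicate(8, rbinom(n_rows, size = 1, prob = 0.5), simplify = FALSE),
  paste("Topic", 1:8)
)

# check.names = FALSE keeps the spaces in names such as "Column 1"
df <- data.frame(c(value_cols, topic_cols), check.names = FALSE)

# Mimic the NaN values seen in `Column 17`
df[["Column 17"]][c(1, 3)] <- NaN

str(df[, c("Column 1", "Column 17", "Topic 1")])
```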
With Column 1 to Column 17 as rows, and Topic 1 to Topic 8 as columns that each have the sub-columns 1 and 0, I want N, mean (SD) and median (IQR) for each group.
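For a single group, the requested statistics can be computed with base R alone. The helper below, describe_group() (a hypothetical name, not part of any package), returns N, mean (SD) and the median with its first and third quartiles, in a bracketed style similar to the tableone output further down. Dropping missing values first is an assumption about how the NaN entries in Column 17 should be treated; base R's IQR() gives the single width Q3 - Q1 instead, if one number is preferred.

```r
# Hypothetical helper: summarise one numeric vector as N, mean (SD), median [Q1, Q3].
describe_group <- function(x, digits = 2) {
  x <- x[!is.na(x)]  # is.na() is also TRUE for NaN
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  c(
    N = as.character(length(x)),
    `mean (SD)` = paste0(fmt(mean(x)), " (", fmt(sd(x)), ")"),
    `median [Q1, Q3]` = paste0(fmt(median(x)), " [", fmt(q[1]), ", ", fmt(q[2]), "]")
  )
}

describe_group(c(1, 2, 3, 4, NaN))
```

Applied per Column and per 0/1 sub-group of each Topic, these strings are exactly the cells of the desired table.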
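I'm struggling with this a bit. I have used the tableone package in R before, but I couldn't figure out how to get the sub-groups with it either.
The closest I got was with CreateTableOne, but it still isn't quite what I need.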
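library(tidyr); library(tableone)
comps <- paste("Column", 1:17)  # assumed: the 17 value columns (not defined in the original post)
df.long <- df %>% pivot_longer(cols = `Topic 1`:`Topic 8`, names_to = "Topics", values_to = "state")
tabs <- CreateTableOne(vars = comps, strata = c("Topics", "state"), data = df.long)
print(tabs, nonnormal = TRUE)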
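I get this, but it's not what I want. Note that it is stratified by topic and state, but within each topic it does not group the two states together, which is what I'm looking for.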
Stratified by Topics:state
Topic 1:0 Topic 2:0 Topic 3:0
n 10 13 14
Column 1 (median [IQR]) 4.29 [4.04,4.55] 4.45 [4.12,4.75] 4.39 [4.03,4.73]
Column 2 (median [IQR]) 3.71 [3.43,4.12] 4.17 [3.58,4.67] 4.03 [3.60,4.59]
Column 3 (median [IQR]) 4.17 [3.90,4.51] 4.32 [4.10,4.58] 4.29 [4.15,4.57]
Column 4 (median [IQR]) 4.17 [3.73,4.42] 4.38 [4.05,4.44] 4.26 [3.93,4.42]
Column 5 (median [IQR]) 4.44 [4.31,4.72] 4.56 [4.44,4.67] 4.60 [4.24,4.75]
Column 6 (median [IQR]) 4.21 [4.02,4.47] 4.36 [4.00,4.52] 4.33 [4.00,4.50]
Column 7 (median [IQR]) 4.10 [3.85,4.46] 4.18 [4.00,4.62] 4.24 [3.95,4.58]
Column 8 (median [IQR]) 4.17 [3.99,4.60] 4.25 [4.00,4.69] 4.22 [4.04,4.67]
Column 9 (median [IQR]) 4.20 [3.67,4.53] 4.40 [4.07,4.73] 4.37 [4.07,4.70]
Column 10 (median [IQR]) 4.25 [4.20,4.45] 4.43 [4.11,4.79] 4.15 [3.94,4.75]
Column 11 (median [IQR]) 4.09 [3.82,4.46] 4.33 [4.06,4.60] 4.33 [3.97,4.57]
Column 12 (median [IQR]) 4.39 [4.00,4.67] 4.36 [4.14,4.61] 4.43 [4.15,4.65]
Column 13 (median [IQR]) 4.29 [4.09,4.41] 4.30 [4.12,4.69] 4.31 [4.15,4.63]
Column 14 (median [IQR]) 4.20 [3.86,4.28] 4.23 [3.98,4.69] 4.17 [3.87,4.65]
Column 15 (median [IQR]) 4.63 [4.54,4.75] 4.59 [4.25,4.94] 4.47 [4.14,4.94]
Column 16 (median [IQR]) 4.30 [4.24,4.46] 4.33 [4.21,4.71] 4.30 [4.12,4.69]
Column 17 (median [IQR]) 1.43 [1.09,1.85] 1.43 [1.13,1.65] 1.43 [1.20,1.65]
Stratified by Topics:state
Topic 4:0 Topic 5:0 Topic 6:0
n 11 6 10
Column 1 (median [IQR]) 4.18 [3.82,4.70] 4.15 [4.03,4.31] 4.38 [4.19,4.67]
Column 2 (median [IQR]) 3.67 [3.39,4.17] 3.56 [3.43,4.05] 4.13 [3.91,4.59]
Column 3 (median [IQR]) 4.19 [3.68,4.56] 4.07 [3.61,4.26] 4.28 [3.94,4.48]
Column 4 (median [IQR]) 3.88 [3.54,4.41] 4.00 [3.39,4.39] 4.29 [4.06,4.44]
Column 5 (median [IQR]) 4.44 [4.15,4.67] 4.39 [4.19,4.54] 4.34 [4.17,4.66]
Column 6 (median [IQR]) 4.17 [4.00,4.54] 4.12 [4.02,4.31] 4.39 [4.02,4.52]
Column 7 (median [IQR]) 3.93 [3.78,4.54] 3.86 [3.71,4.14] 4.15 [3.95,4.55]
Column 8 (median [IQR]) 4.17 [3.71,4.44] 4.15 [3.69,4.31] 4.11 [3.96,4.50]
Column 9 (median [IQR]) 4.28 [3.73,4.67] 3.98 [3.82,4.27] 4.35 [4.14,4.69]
Column 10 (median [IQR]) 4.25 [4.00,4.53] 4.10 [3.96,4.20] 4.35 [4.06,4.77]
Column 11 (median [IQR]) 4.17 [3.82,4.62] 3.86 [3.83,4.01] 4.42 [4.13,4.69]
Column 12 (median [IQR]) 4.50 [4.17,4.67] 4.31 [4.01,4.46] 4.21 [4.00,4.55]
Column 13 (median [IQR]) 4.11 [4.02,4.56] 4.20 [3.90,4.29] 4.30 [4.08,4.62]
Column 14 (median [IQR]) 3.83 [3.81,4.50] 3.91 [3.83,4.17] 4.21 [3.97,4.67]
Column 15 (median [IQR]) 4.67 [4.41,4.84] 4.40 [4.19,4.70] 4.54 [4.37,4.94]
Column 16 (median [IQR]) 4.24 [3.86,4.61] 4.20 [3.67,4.33] 4.30 [3.99,4.69]
Column 17 (median [IQR]) 1.40 [1.25,1.66] 1.53 [1.16,2.00] 1.32 [1.11,1.53]
Stratified by Topics:state
Topic 7:0 Topic 8:0 Topic 1:1
n 11 5 10
Column 1 (median [IQR]) 4.17 [3.90,4.29] 4.12 [4.00,4.45] 4.29 [3.92,4.63]
Column 2 (median [IQR]) 3.67 [3.47,3.87] 3.58 [3.33,4.67] 4.03 [3.72,4.30]
Column 3 (median [IQR]) 4.10 [3.68,4.21] 4.14 [4.00,4.32] 4.25 [3.73,4.40]
Column 4 (median [IQR]) 4.05 [3.60,4.17] 4.25 [3.75,4.27] 4.18 [3.78,4.36]
Column 5 (median [IQR]) 4.33 [3.96,4.44] 4.56 [4.50,4.65] 4.44 [4.14,4.66]
Column 6 (median [IQR]) 4.00 [4.00,4.21] 4.00 [4.00,4.42] 4.25 [4.00,4.50]
Column 7 (median [IQR]) 3.93 [3.78,4.07] 4.00 [3.76,4.37] 4.02 [3.78,4.31]
Column 8 (median [IQR]) 3.97 [3.86,4.17] 4.33 [4.25,4.60] 4.08 [3.90,4.23]
Column 9 (median [IQR]) 4.07 [3.78,4.31] 4.07 [3.80,4.56] 4.33 [4.07,4.52]
Column 10 (median [IQR]) 4.19 [4.04,4.26] 4.25 [4.11,4.79] 4.00 [3.83,4.52]
Column 11 (median [IQR]) 4.12 [3.84,4.33] 3.89 [3.83,4.69]
Column 12 (median [IQR]) 4.00 [3.90,4.58] 4.14 [4.00,4.50] 4.25 [4.03,4.47]
Column 13 (median [IQR]) 4.08 [4.02,4.31] 4.30 [4.12,4.41] 4.17 [4.06,4.38]
Column 14 (median [IQR]) 3.94 [3.82,4.13] 3.98 [3.83,4.50] 4.01 [3.83,4.40]
Column 15 (median [IQR]) 4.46 [4.26,4.63] 4.46 [4.25,4.94] 4.30 [4.03,4.80]
Column 16 (median [IQR]) 4.24 [3.89,4.27] 4.33 [4.25,4.62] 4.15 [3.88,4.53]
Column 17 (median [IQR]) 1.49 [1.32,1.78] 1.78 [1.69,2.06] 1.43 [1.25,1.57]
Stratified by Topics:state
Topic 2:1 Topic 3:1 Topic 4:1
n 7 6 9
Column 1 (median [IQR]) 4.17 [3.82,4.26] 4.22 [4.01,4.33] 4.36 [4.17,4.45]
Column 2 (median [IQR]) 3.67 [3.47,3.87] 3.69 [2.99,3.87] 4.07 [3.84,4.20]
Column 3 (median [IQR]) 3.71 [3.56,4.21] 3.78 [3.66,4.03] 4.26 [4.00,4.32]
Column 4 (median [IQR]) 3.88 [3.43,4.08] 3.84 [3.59,4.24] 4.27 [4.08,4.38]
Column 5 (median [IQR]) 4.17 [3.96,4.36] 4.32 [4.28,4.37] 4.50 [4.30,4.65]
Column 6 (median [IQR]) 4.08 [4.00,4.25] 4.07 [4.02,4.40] 4.33 [4.00,4.42]
Column 7 (median [IQR]) 3.76 [3.69,4.03] 3.78 [3.72,3.96] 4.12 [4.00,4.18]
Column 8 (median [IQR]) 3.96 [3.70,4.17] 3.97 [3.68,4.04] 4.17 [4.00,4.33]
Column 9 (median [IQR]) 4.07 [3.73,4.31] 3.78 [3.44,4.25] 4.33 [4.07,4.40]
Column 10 (median [IQR]) 4.09 [3.96,4.22] 4.25 [4.23,4.26] 4.23 [4.00,4.43]
Column 11 (median [IQR]) 3.90 [3.82,4.33] 3.93 [3.53,4.10] 4.33 [4.06,4.50]
Column 12 (median [IQR]) 4.00 [3.96,4.50] 4.00 [4.00,4.28]
Column 13 (median [IQR]) 3.98 [3.73,4.20] 4.03 [3.86,4.10] 4.28 [4.12,4.31]
Column 14 (median [IQR]) 3.83 [3.81,4.01] 3.88 [3.71,4.12] 4.11 [3.98,4.23]
Column 15 (median [IQR]) 4.39 [4.18,4.60] 4.54 [4.48,4.64] 4.39 [4.17,4.56]
Column 16 (median [IQR]) 3.90 [3.70,4.24] 4.20 [3.96,4.27] 4.28 [4.21,4.33]
Column 17 (median [IQR]) 1.43 [1.29,1.92] 1.43 [1.18,1.91] 1.46 [1.11,1.72]
Stratified by Topics:state
Topic 5:1 Topic 6:1 Topic 7:1
n 14 10 9
Column 1 (median [IQR]) 4.37 [4.01,4.67] 4.15 [3.97,4.55] 4.69 [4.36,4.77]
Column 2 (median [IQR]) 3.94 [3.71,4.30] 3.47 [3.35,3.80] 4.35 [4.17,4.67]
Column 3 (median [IQR]) 4.25 [3.90,4.51] 4.12 [3.79,4.37] 4.43 [4.30,4.58]
Column 4 (median [IQR]) 4.18 [3.92,4.38] 3.92 [3.59,4.34] 4.43 [4.27,4.88]
Column 5 (median [IQR]) 4.44 [4.31,4.67] 4.47 [4.36,4.72] 4.67 [4.50,4.92]
Column 6 (median [IQR]) 4.33 [4.00,4.52] 4.12 [4.00,4.33] 4.42 [4.33,4.56]
Column 7 (median [IQR]) 4.12 [3.93,4.46] 3.90 [3.74,4.36] 4.37 [4.12,4.70]
Column 8 (median [IQR]) 4.17 [3.96,4.50] 4.17 [3.69,4.31] 4.25 [4.06,4.69]
Column 9 (median [IQR]) 4.35 [4.07,4.59] 3.98 [3.62,4.38] 4.56 [4.33,4.75]
Column 10 (median [IQR]) 4.26 [4.05,4.57] 4.15 [4.02,4.25] 4.61 [4.00,4.82]
Column 11 (median [IQR]) 4.33 [4.13,4.69] 3.86 [3.81,4.33] 4.50 [4.06,4.75]
Column 12 (median [IQR]) 4.26 [4.00,4.65] 4.42 [4.04,4.62] 4.36 [4.28,4.61]
Column 13 (median [IQR]) 4.27 [4.08,4.43] 4.17 [4.01,4.31] 4.41 [4.22,4.72]
Column 14 (median [IQR]) 4.15 [3.86,4.45] 3.91 [3.81,4.19] 4.50 [4.11,4.72]
Column 15 (median [IQR]) 4.56 [4.37,4.89] 4.53 [4.19,4.73] 4.94 [4.25,4.94]
Column 16 (median [IQR]) 4.26 [4.16,4.59] 4.23 [4.10,4.31] 4.62 [4.21,4.89]
Column 17 (median [IQR]) 1.43 [1.21,1.53] 1.78 [1.38,2.06] 1.28 [1.07,1.61]
Stratified by Topics:state
Topic 8:1 p test SMD
n 15
Column 1 (median [IQR]) 4.33 [4.06,4.65] 0.641 nonnorm 0.372
Column 2 (median [IQR]) 3.88 [3.60,4.18] 0.093 nonnorm 0.512
Column 3 (median [IQR]) 4.23 [3.74,4.48] 0.291 nonnorm 0.420
Column 4 (median [IQR]) 4.08 [3.75,4.40] 0.422 nonnorm 0.413
Column 5 (median [IQR]) 4.38 [4.15,4.67] 0.365 nonnorm 0.418
Column 6 (median [IQR]) 4.33 [4.03,4.51] 0.904 nonnorm 0.306
Column 7 (median [IQR]) 4.12 [3.86,4.45] 0.403 nonnorm 0.434
Column 8 (median [IQR]) 4.00 [3.86,4.18] 0.314 nonnorm 0.439
Column 9 (median [IQR]) 4.33 [4.04,4.50] 0.283 nonnorm 0.471
Column 10 (median [IQR]) 4.23 [4.00,4.44] 0.856 nonnorm 0.352
Column 11 (median [IQR]) 4.33 [3.98,4.50] 0.272 nonnorm 0.434
Column 12 (median [IQR]) 4.33 [4.00,4.64] 0.943 nonnorm 0.251
Column 13 (median [IQR]) 4.22 [4.07,4.38] 0.299 nonnorm 0.460
Column 14 (median [IQR]) 4.11 [3.83,4.26] 0.261 nonnorm 0.456
Column 15 (median [IQR]) 4.56 [4.24,4.75] 0.936 nonnorm 0.285
Column 16 (median [IQR]) 4.24 [3.89,4.41] 0.371 nonnorm 0.463
Column 17 (median [IQR]) 1.32 [1.14,1.51] 0.974 nonnorm 0.327
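One way to get the per-topic grouping out of tableone itself is to build a separate table for each topic, stratified only by that topic's 0/1 indicator, and bind the printed matrices side by side. The sketch below runs on simulated stand-in data with random values, and relies on print() returning the formatted matrix when printToggle = FALSE. The combined column names (for example "Topic 1: 0") are built by this sketch, not by tableone.

```r
library(tableone)

# Hypothetical stand-in for the question's wide data (random values).
set.seed(1)
df <- data.frame(
  c(setNames(replicate(17, runif(20, 1, 5), simplify = FALSE), paste("Column", 1:17)),
    setNames(replicate(8, rbinom(20, 1, 0.5), simplify = FALSE), paste("Topic", 1:8))),
  check.names = FALSE
)

comps  <- paste("Column", 1:17)
topics <- paste("Topic", 1:8)

per_topic <- lapply(topics, function(tp) {
  # One table per topic, stratified by that topic's 0/1 indicator only
  tab <- CreateTableOne(vars = comps, strata = tp, data = df)
  # nonnormal = comps: median [IQR] and nonparametric p-values for every variable
  mat <- print(tab, nonnormal = comps, printToggle = FALSE)
  # Drop the "test" label column and prefix the remaining columns with the topic
  mat <- mat[, colnames(mat) != "test", drop = FALSE]
  colnames(mat) <- paste0(tp, ": ", colnames(mat))
  mat
})

# Every per-topic matrix has the same rows (n, Column 1, ..., Column 17)
combined <- do.call(cbind, per_topic)
combined[1:3, 1:6]
```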
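So, to reiterate, these are the things I want in the table:
- Column 1 : Column 17 as rows
- Topic 1 : Topic 8 as columns
- For each topic and sub-topic:
  - N
  - mean (SD)
  - median (IQR)
- For each column, I'd also like a statistical test. Maybe a Kruskal-Wallis test? Any suggestions would be appreciated.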
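On the testing question: with only two groups (0 and 1) per topic, a Kruskal-Wallis test is a reasonable nonparametric choice, and with two groups it is essentially a Wilcoxon rank-sum (Mann-Whitney) test using the normal approximation. The base-R sketch below, on simulated stand-in data, runs kruskal.test() for every Column × Topic pair and collects the p-values in a 17 × 8 matrix. With 136 tests, some multiple-comparison adjustment is worth considering; Holm via p.adjust() is used here as one illustrative choice.

```r
# Hypothetical stand-in data (random values), same names as the question.
set.seed(2)
df <- data.frame(
  c(setNames(replicate(17, runif(20, 1, 5), simplify = FALSE), paste("Column", 1:17)),
    setNames(replicate(8, rbinom(20, 1, 0.5), simplify = FALSE), paste("Topic", 1:8))),
  check.names = FALSE
)

comps  <- paste("Column", 1:17)
topics <- paste("Topic", 1:8)

# Rows = Columns, columns = Topics; each cell is a Kruskal-Wallis p-value (0 vs 1)
p_raw <- sapply(topics, function(tp) {
  sapply(comps, function(cl) {
    # kruskal.test() drops incomplete (value, group) pairs such as NaN itself
    kruskal.test(x = df[[cl]], g = factor(df[[tp]]))$p.value
  })
})

# Holm adjustment over all 136 tests; p.adjust() returns a plain vector,
# so the matrix shape and names are restored afterwards
p_adj <- matrix(p.adjust(p_raw, method = "holm"),
                nrow = nrow(p_raw), dimnames = dimnames(p_raw))

round(p_adj[1:3, 1:4], 3)
```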
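Solution
This gets the information into the right structure. (The sub_topic
column is simply the long-format representation of the wide Topic columns.)
Once the state values are laid out per column and topic, you can run
whichever statistical tests you need.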
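library(dplyr)
library(tidyr)

df %>%
  # one row per original row and Column: the value goes into `state`, the column number into `column`
  pivot_longer(cols = `Column 1`:`Column 17`, names_to = "column", names_pattern = " (\\d+)$", values_to = "state") %>%
  # one row per original row, Column and Topic: the 0/1 flag goes into `sub_topic`
  pivot_longer(-c(column, state), names_to = "topic", names_pattern = " (\\d+)$", values_to = "sub_topic") %>%
  group_by(topic, sub_topic, column) %>%
  summarise(n = n(), m = mean(state, na.rm = TRUE), std = sd(state, na.rm = TRUE),
            median = median(state, na.rm = TRUE), iqr = IQR(state, na.rm = TRUE)) %>%
  pivot_longer(-c(column, topic, sub_topic)) %>%
  pivot_wider(names_from = topic, names_prefix = "topic_", values_from = value) %>%
  # note: `column` is character here, so it sorts as "1", "10", "11", ..., "2"
  arrange(column, sub_topic) %>%
  select(column, name, everything())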
Output:
# A tibble: 170 x 11
# Groups: sub_topic [2]
column sub_topic name topic_1 topic_2 topic_3 topic_4 topic_5 topic_6 topic_7 topic_8
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 n 10 13 14 11 6 10 11 5
2 1 0 m 4.30 4.41 4.35 4.22 4.20 4.34 4.10 4.23
3 1 0 std 0.380 0.393 0.424 0.482 0.331 0.388 0.310 0.471
4 1 0 median 4.29 4.45 4.39 4.18 4.15 4.38 4.17 4.12
5 1 0 iqr 0.510 0.625 0.703 0.880 0.281 0.485 0.391 0.448
6 1 1 n 10 7 6 9 14 10 9 15
7 1 1 m 4.26 4.06 4.14 4.36 4.32 4.23 4.51 4.30
8 1 1 std 0.422 0.289 0.280 0.251 0.421 0.407 0.370 0.378
9 1 1 median 4.29 4.17 4.22 4.36 4.37 4.15 4.69 4.33
10 1 1 iqr 0.709 0.441 0.318 0.281 0.660 0.580 0.415 0.592
# … with 160 more rows
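Steps:
- Pivot the Column variables to long format
- Pivot the Topic / sub-topic variables to long format as well
- Group by topic -> sub_topic -> column and summarise
- Pivot wider again so that the topics are spread back out into columns
- Arrange the order of the rows and columns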
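Building on these steps, the sketch below is one possible way to reach the layout described in the question: one row per Column, a pair of sub-columns (0 and 1) per topic holding formatted "mean (SD)" strings, and a Kruskal-Wallis p-value per topic. It reuses the answer's long-format idea, assumes dplyr >= 1.0 (for .groups) and tidyr >= 1.1 (for names_transform), and runs on simulated stand-in data. The column-naming scheme (1_0, 1_1, 1_p, ...) is made up for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in data (random values), same names as the question.
set.seed(3)
df <- data.frame(
  c(setNames(replicate(17, runif(20, 1, 5), simplify = FALSE), paste("Column", 1:17)),
    setNames(replicate(8, rbinom(20, 1, 0.5), simplify = FALSE), paste("Topic", 1:8))),
  check.names = FALSE
)

# Long format: one row per original row x Column x Topic
long <- df %>%
  pivot_longer(`Column 1`:`Column 17`, names_to = "column", names_pattern = " (\\d+)$",
               names_transform = list(column = as.integer), values_to = "state") %>%
  pivot_longer(`Topic 1`:`Topic 8`, names_to = "topic", names_pattern = " (\\d+)$",
               values_to = "sub_topic")

# Formatted "mean (SD)" cell for every column x topic x sub-topic
cells <- long %>%
  group_by(column, topic, sub_topic) %>%
  summarise(cell = sprintf("%.2f (%.2f)", mean(state, na.rm = TRUE), sd(state, na.rm = TRUE)),
            .groups = "drop") %>%
  mutate(sub_topic = as.character(sub_topic))

# One Kruskal-Wallis p-value per column x topic (0 vs 1), stored as sub-column "p"
tests <- long %>%
  group_by(column, topic) %>%
  summarise(cell = sprintf("%.3f", kruskal.test(state, factor(sub_topic))$p.value),
            .groups = "drop") %>%
  mutate(sub_topic = "p")

# Sort so the wide columns come out as 1_0, 1_1, 1_p, 2_0, ...
table_wide <- bind_rows(cells, tests) %>%
  arrange(column, as.integer(topic), sub_topic) %>%
  pivot_wider(id_cols = column, names_from = c(topic, sub_topic), values_from = cell)

table_wide[1:3, 1:7]
```

Using names_transform makes column an integer, so the rows sort 1 to 17 rather than as character strings; swapping the sprintf() format for the N or median strings gives the other requested statistics.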