Melting data while keeping certain columns paired

I have the following data:

DT <- structure(list(
  ECOST = c("Choice_01", "Choice_02", "Choice_03", "Choice_04", "Choice_05", "Choice_06",
            "Choice_07", "Choice_08", "Choice_09", "Choice_10", "Choice_11", "Choice_12"),
  control = c(18, 30, 47, 66, 86, 35, 31, 46, 55, 39, 55, 41),
  treatment = c(31, 35, 46, 68, 86, 36, 32, 42, 52, 39, 58, 43),
  control_p = c(0.163636363636364, 0.272727272727273, 0.427272727272727, 0.6,
                0.781818181818182, 0.318181818181818, 0.281818181818182, 0.418181818181818,
                0.5, 0.354545454545455, 0.5, 0.372727272727273),
  treatment_p = c(0.319587628865979, 0.360824742268041, 0.474226804123711, 0.701030927835051,
                  0.88659793814433, 0.371134020618557, 0.329896907216495, 0.43298969072165,
                  0.536082474226804, 0.402061855670103, 0.597938144329897, 0.443298969072165)),
  row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 12 x 5
   ECOST     control treatment control_p treatment_p
   <chr>       <dbl>     <dbl>     <dbl>       <dbl>
 1 Choice_01      18        31     0.164       0.320
 2 Choice_02      30        35     0.273       0.361
 3 Choice_03      47        46     0.427       0.474
 4 Choice_04      66        68     0.6         0.701
 5 Choice_05      86        86     0.782       0.887
 6 Choice_06      35        36     0.318       0.371
 7 Choice_07      31        32     0.282       0.330
 8 Choice_08      46        42     0.418       0.433
 9 Choice_09      55        52     0.5         0.536
10 Choice_10      39        39     0.355       0.402
11 Choice_11      55        58     0.5         0.598
12 Choice_12      41        43     0.373       0.443

I want to melt this data, but keep the columns control and control_p together, and the columns treatment and treatment_p together, producing a table with 24 rows and 4 columns.

Desired result:

# A tibble: 24 x 4
   ECOST     count percentage group
   <chr>     <dbl>      <dbl> <chr>
 1 Choice_01    18      0.164 control
 2 Choice_02    30      0.273 control
 3 Choice_03    47      0.427 control
 4 Choice_04    66      0.6   control
 5 Choice_05    86      0.782 control
 6 Choice_06    35      0.318 control
 7 Choice_07    31      0.282 control
 8 Choice_08    46      0.418 control
 9 Choice_09    55      0.5   control
10 Choice_10    39      0.355 control
11 Choice_11    55      0.5   control
12 Choice_12    41      0.373 control
13 Choice_01    31      0.320 treatment
14 Choice_02    35      0.361 treatment
15 Choice_03    46      0.474 treatment
16 Choice_04    68      0.701 treatment
17 Choice_05    86      0.887 treatment
18 Choice_06    36      0.371 treatment
19 Choice_07    32      0.330 treatment
20 Choice_08    42      0.433 treatment
21 Choice_09    52      0.536 treatment
22 Choice_10    39      0.402 treatment
23 Choice_11    58      0.598 treatment
24 Choice_12    43      0.443 treatment

Solutions

Using pivot_longer, a little data wrangling, and then pivot_wider, you can get the result you want like this:

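library(tidyr)
library(dplyr)

DT %>% 
  # long format: one row per ECOST x original column
  pivot_longer(-ECOST) %>% 
  # "control" -> group "control", what NA; "control_p" -> group "control", what "p"
  # (fill = "right" silences the "missing pieces" warning for names without "_")
  separate(name, into = c("group", "what"), fill = "right") %>% 
  # no suffix means a count column, the "_p" suffix means a percentage column
  mutate(what = ifelse(is.na(what), "count", "percentage")) %>% 
  # spread count and percentage back into their own columns
  pivot_wider(names_from = "what", values_from = "value")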


#> # A tibble: 24 x 4
#>    ECOST     group     count percentage
#>    <chr>     <chr>     <dbl>      <dbl>
#>  1 Choice_01 control      18      0.164
#>  2 Choice_01 treatment    31      0.320
#>  3 Choice_02 control      30      0.273
#>  4 Choice_02 treatment    35      0.361
#>  5 Choice_03 control      47      0.427
#>  6 Choice_03 treatment    46      0.474
#>  7 Choice_04 control      66      0.6  
#>  8 Choice_04 treatment    68      0.701
#>  9 Choice_05 control      86      0.782
#> 10 Choice_05 treatment    86      0.887
#> # … with 14 more rows

Created on 2021-02-21 by the reprex package (v1.0.0)
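The separate() step is what keeps the columns paired: by default it splits on any run of non-alphanumeric characters, so each original name yields the group as its first piece, and only the _p columns have a second piece. The base-R sketch below reproduces that name mapping on its own, with strsplit() standing in for separate(); the variable names nm, parts, group and what are hypothetical and only for illustration.

```r
# Value column names of DT (the ECOST id column is excluded)
nm <- c("control", "treatment", "control_p", "treatment_p")

# Split each name on "_", as separate() does for these names
parts <- strsplit(nm, "_")

# The first piece is the group the column belongs to
group <- vapply(parts, function(p) p[1], character(1))

# No second piece (NA after separate()) marks a count column;
# a second piece ("p") marks a percentage column
what <- vapply(parts, function(p) if (length(p) > 1) "percentage" else "count",
               character(1))

print(data.frame(name = nm, group = group, what = what))
```

Because both members of a pair map to the same group, pivot_wider() can then put each count next to its percentage in a single row.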

Alternatively, you can rename the columns so that the count and percentage columns are explicitly distinguished, and then use pivot_longer:

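library(dplyr)
library(tidyr)

DT %>%
  # control/treatment -> *_count, control_p/treatment_p -> *_percentage
  rename_with(~ paste(sub('_.*', '', .), rep(c('count', 'percentage'), each = 2), sep = '_'), -1) %>%
  # split e.g. "control_count" into group = "control" and a value column named "count"
  pivot_longer(cols = -ECOST, names_to = c('group', '.value'), names_sep = '_')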

# A tibble: 24 x 4
#   ECOST     group     count percentage
#   <chr>     <chr>     <dbl>      <dbl>
# 1 Choice_01 control      18      0.164
# 2 Choice_01 treatment    31      0.320
# 3 Choice_02 control      30      0.273
# 4 Choice_02 treatment    35      0.361
# 5 Choice_03 control      47      0.427
# 6 Choice_03 treatment    46      0.474
# 7 Choice_04 control      66      0.6  
# 8 Choice_04 treatment    68      0.701
# 9 Choice_05 control      86      0.782
#10 Choice_05 treatment    86      0.887
# … with 14 more rows
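Note that rep(c('count', 'percentage'), each = 2) assumes there are exactly four value columns ordered counts first, then percentages, as they are in DT. The base-R sketch below (the helper names by_position and by_suffix are hypothetical) prints the names that rule produces and compares it with a variant that classifies each column by its _p suffix instead, so it does not depend on column order.

```r
old <- c("control", "treatment", "control_p", "treatment_p")

# Rule used in the answer: the column's position decides count vs percentage
by_position <- function(x) {
  paste(sub("_.*", "", x), rep(c("count", "percentage"), each = 2), sep = "_")
}

# Order-independent rule: the "_p" suffix decides count vs percentage
by_suffix <- function(x) {
  paste(sub("_.*", "", x), ifelse(grepl("_p$", x), "percentage", "count"), sep = "_")
}

print(by_position(old))
print(by_suffix(old))

# With a different column order the positional rule mislabels columns
shuffled <- c("control", "control_p", "treatment", "treatment_p")
print(by_position(shuffled))
print(by_suffix(shuffled))
```

If your real data may have its columns in another order, rename_with(by_suffix, -1) should be a drop-in replacement for the lambda in the pipeline above.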

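Here is a data.table approach, with a workaround for a limitation (or feature) of melt.data.table(): when several sets of measure columns are melted at once via patterns(), the variable column only holds the index of each set ("1", "2") rather than a column name, so the factor levels are replaced afterwards.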

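library(data.table)
setDT(DT)
# get the group names: strip the "_p" suffix and de-duplicate -> "control", "treatment"
suffix <- unique(sub("(^.*)(_[a-z])", "\\1", names(DT[, -1])))
# melt: names ending in a letter other than "p" go to count, names ending in "_p" to percentage
DT2 <- melt(DT, id.vars = "ECOST", measure.vars = patterns(count = "[a-oq-z]$", percentage = "_p$"))
# replace the factor levels "1", "2" with the group names (setattr() modifies in place)
setattr(DT2$variable, "levels", suffix)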

        ECOST  variable count percentage
 1: Choice_01   control    18  0.1636364
 2: Choice_02   control    30  0.2727273
 3: Choice_03   control    47  0.4272727
 4: Choice_04   control    66  0.6000000
 5: Choice_05   control    86  0.7818182
 6: Choice_06   control    35  0.3181818
 7: Choice_07   control    31  0.2818182
 8: Choice_08   control    46  0.4181818
 9: Choice_09   control    55  0.5000000
10: Choice_10   control    39  0.3545455
11: Choice_11   control    55  0.5000000
12: Choice_12   control    41  0.3727273
13: Choice_01 treatment    31  0.3195876
14: Choice_02 treatment    35  0.3608247
15: Choice_03 treatment    46  0.4742268
16: Choice_04 treatment    68  0.7010309
17: Choice_05 treatment    86  0.8865979
18: Choice_06 treatment    36  0.3711340
19: Choice_07 treatment    32  0.3298969
20: Choice_08 treatment    42  0.4329897
21: Choice_09 treatment    52  0.5360825
22: Choice_10 treatment    39  0.4020619
23: Choice_11 treatment    58  0.5979381
24: Choice_12 treatment    43  0.4432990
        ECOST  variable count percentage
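To see what the setattr() line changes, the base-R sketch below mimics it without data.table: a factor with levels "1" and "2" stands in for DT2$variable straight out of melt(), and assigning levels() relabels it with the group names computed by the same sub() call. The object names cols, groups and variable are hypothetical; levels<- makes a copy, whereas setattr() in the answer updates DT2 by reference.

```r
# Value columns of DT (ECOST excluded)
cols <- c("control", "treatment", "control_p", "treatment_p")

# Strip a trailing "_<letter>" to recover the group names, then de-duplicate
groups <- unique(sub("(^.*)(_[a-z])", "\\1", cols))

# Stand-in for DT2$variable: one measure-set index per melted row
variable <- factor(rep(c("1", "2"), each = 12))

# Relabel level "1" as the first group and level "2" as the second
levels(variable) <- groups
print(table(variable))
```

This relabelling is only correct because the first column matched by each pattern (control and control_p) belongs to the first group, which is also the first name returned by unique().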