R - Recoding multiple columns of one scale

How to solve: R - Recoding multiple columns of one scale

A researcher and I are trying to find a way to make our data frame cleaner and tidier. Here is a reprex:

> head(Dummy1)
# A tibble: 6 x 18
     A0    A1    A2    A3    A4    A5    B0    B1    B2    B3    B4    B5    C0    C1    C2    C3    C4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0
2     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1
3     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0
4     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0
5     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0
6     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0
# … with 1 more variable: C5 <dbl>
>

Because of the way our software records answers, we get A0 to A5, B0 to B5, and so on, instead of this:

> head(Dummy2)
# A tibble: 6 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     5     5     5
2     4     4     4
3     3     3     3
4     2     2     2
5     1     1     1
6     0     0     0
>

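Is there code that would turn the first version, where every possible answer is its own column coded as binary (0 = no, 1 = yes), into a single column per item holding the numeric result? The scale we are trying to analyze has more than 50 items, each ranging from 0 to 8.

Thanks for your help!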

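Solution

You can use split.default to split the data frame into groups of columns that belong to the same item. Then use sapply with max.col to get, for each row, the position of the column holding the largest value. I subtract 1 because your column numbering starts at 0.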

sapply(split.default(Dummy1,sub('\\d+','',names(Dummy1))),max.col) - 1

sub('\\d+', '', names(Dummy1)) strips the digits from the column names, so it returns "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" ..., which serves as the grouping vector for split.default.

You can also try this:
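To make the intermediate steps visible, here is a minimal sketch of the same idea on a small toy data frame. The names demo, item, groups and recoded are made up for illustration and are not part of the question's data. The sketch assumes that the columns of each item appear in level order starting at 0 with no gaps, and that every row has exactly one 1 per item. The stopifnot guard is an addition: max.col still returns a column for an all-zero row, so an unanswered item would otherwise be recoded silently.

```r
# Hypothetical toy data: two items (A, B), each with levels 0 to 2
demo <- data.frame(
  A0 = c(0, 0, 1), A1 = c(0, 1, 0), A2 = c(1, 0, 0),
  B0 = c(1, 0, 0), B1 = c(0, 0, 1), B2 = c(0, 1, 0)
)

# Grouping key: column names with the trailing digits removed
item <- sub("\\d+$", "", names(demo))

# One sub data frame per item; columns keep their original order
groups <- split.default(demo, item)

# Guard (an assumption about the data): exactly one 1 per row within each item,
# otherwise max.col would still pick some column without warning
stopifnot(all(sapply(groups, rowSums) == 1))

# Position of the 1 in each row, minus 1 because levels start at 0
recoded <- as.data.frame(sapply(groups, max.col) - 1)
recoded
```

On this toy input, column A comes out as 2, 1, 0 and column B as 0, 2, 1.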

library(tidyverse)

d1 %>% 
  pivot_longer(cols=everything(),names_to='col') %>% 
  # reshape to long format: column names go into 'col', cell values into the default 'value' column
  filter(value != 0) %>% 
  # keep only the non-zero cells, i.e. the answers that were selected
  mutate(newval = as.numeric(str_extract(col,'\\d+$')),col = str_replace(col,'\\d+','')) %>% 
  # take the trailing number as the score, then strip the number from the column name
  select(-value) %>% 
  # drop 'value' since it is now constant, then go back to wide format (one list-column per item) and unnest every column
  pivot_wider(names_from = col,values_from =newval,values_fn = list) %>% 
  unnest(everything())

Input data:

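d1 <- data.frame(
  A0 = c(0,0,0,0,0,1), A1 = c(0,0,0,0,1,0), A2 = c(0,0,0,1,0,0),
  A3 = c(0,0,1,0,0,0), A4 = c(0,1,0,0,0,0), A5 = c(1,0,0,0,0,0),
  B0 = c(0,0,0,0,0,1), B1 = c(0,0,0,0,1,0), B2 = c(0,0,0,1,0,0),
  B3 = c(0,0,1,0,0,0), B4 = c(0,1,0,0,0,0), B5 = c(1,0,0,0,0,0),
  C0 = c(0,0,0,0,0,1), C1 = c(0,0,0,0,1,0), C2 = c(0,0,0,1,0,0),
  C3 = c(0,0,1,0,0,0), C4 = c(0,1,0,0,0,0), C5 = c(1,0,0,0,0,0)
)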

Output:

# A tibble: 6 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     5     5     5
2     4     4     4
3     3     3     3
4     2     2     2
5     1     1     1
6     0     0     0
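As a possible variant of the tidyverse approach, the item name and the level can be split directly in pivot_longer via names_pattern, with an explicit row id so that rows stay aligned even if some item has no answer in a row (it would then come out as NA). This is only a sketch under assumptions: demo and the helper column row are hypothetical names, the column names are assumed to be letters followed by digits, and names_transform needs a reasonably recent tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two items (A, B), each with levels 0 to 2
demo <- data.frame(
  A0 = c(0, 0, 1), A1 = c(0, 1, 0), A2 = c(1, 0, 0),
  B0 = c(1, 0, 0), B1 = c(0, 0, 1), B2 = c(0, 1, 0)
)

recoded <- demo %>%
  # explicit row id so answers are matched back to the right respondent
  mutate(row = row_number()) %>%
  # split each column name into item ("A") and level (0, 1, 2, ...)
  pivot_longer(
    cols = -row,
    names_to = c("item", "level"),
    names_pattern = "^([A-Za-z]+)(\\d+)$",
    names_transform = list(level = as.integer)
  ) %>%
  # keep only the selected answers
  filter(value == 1) %>%
  select(-value) %>%
  # one column per item, holding the selected level
  pivot_wider(names_from = item, values_from = level) %>%
  select(-row)

recoded
```

Compared with values_fn = list followed by unnest(everything()), the row id avoids relying on every item having the same number of selected answers.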