
Reshape multiple columns into 2 timevars with different times?


I have the following data frame:

date         clinic   MALE_0_1   MALE_1_2   MALE_2_3   ...   MALE_94_95   MALE_95+   FEMALE_0_1   FEMALE_1_2   ...   FEMALE_95+
2017-01-01     A         30         25         40      ...       70          90          28            22      ...       40
2017-01-01     B         21         15         30      ...       45          27          31            40      ...       55
2017-02-01     C         29         35         45      ...       34          25          33            38      ...       45

How can I reshape it into a long format like this?

date        clinic    GENDER      AGE    NUMBER_PATIENTS
2017-01-01     A      MALE       0          30
2017-01-01     A      FEMALE     0          28
2017-01-01     A      MALE       1          25
2017-01-01     A      FEMALE     1          22
                   ....
2017-01-01     A      MALE       95+        90
2017-01-01     A      FEMALE     95+        40
2017-01-01     B      MALE       0          21
2017-01-01     B      FEMALE     0          31
                   ....
2017-02-01     C      MALE       0          29
2017-02-01     C      FEMALE     0          33

MALE_0_1 corresponds to AGE = 0, MALE_1_2 corresponds to AGE = 1, and so on.

In the code below, what should I pass to times so that it covers both MALE and FEMALE for "GENDER" and 0:95 for "AGE"?

df <- reshape(df,direction = "long",varying = list(names(df)[3:194]),v.names = "NUMBER_OF_PATIENTS",idvar = c("date","clinic"),timevar = c("GENDER","AGE"),times = ???)

Solution

Try this approach, which gets close to what you want:

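library(tidyverse)

newdf <- df %>%
  # Convert every column except date to character so all value columns share one type
  mutate(across(-date, ~as.character(.))) %>%
  # Stack the MALE_*/FEMALE_* columns into name/value pairs
  pivot_longer(-c(date, clinic)) %>%
  # Split e.g. "MALE_0_1" into Gender, lower bound (V1) and upper bound (V2);
  # fill = 'right' pads the open-ended 95+ band with NA instead of warning
  separate(name, c('Gender', 'V1', 'V2'), sep = '_', fill = 'right') %>%
  mutate(value = as.numeric(value))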

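Output (the `95.` entries suggest that the `+` in the column names had been converted to `.` when this data was loaded):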

# A tibble: 24 x 6
   date       clinic Gender V1    V2    value
   <date>     <chr>  <chr>  <chr> <chr> <dbl>
 1 2017-01-01 A      MALE   0     1        30
 2 2017-01-01 A      MALE   1     2        25
 3 2017-01-01 A      MALE   2     3        40
 4 2017-01-01 A      MALE   94    95       70
 5 2017-01-01 A      MALE   95.   NA       90
 6 2017-01-01 A      FEMALE 0     1        28
 7 2017-01-01 A      FEMALE 1     2        22
 8 2017-01-01 A      FEMALE 95.   NA       40
 9 2017-01-01 B      MALE   0     1        21
10 2017-01-01 B      MALE   1     2        15
# ... with 14 more rows
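
To get from this intermediate result to the layout asked for in the question, the upper-bound column can be dropped and the remaining columns named as requested. The following is only a sketch: it assumes `dplyr` and `tidyr` are installed, uses a shortened stand-in for the sample data so that it runs on its own, and the column name `upper` is a placeholder introduced here.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for the sample data (assumption: only two age bands kept)
df <- data.frame(
  date = c("2017-01-01", "2017-01-01", "2017-02-01"),
  clinic = c("A", "B", "C"),
  MALE_0_1 = c(30L, 21L, 29L),
  `MALE_95+` = c(90L, 27L, 25L),
  FEMALE_0_1 = c(28L, 31L, 33L),
  `FEMALE_95+` = c(40L, 55L, 45L),
  check.names = FALSE  # keep the "+" in the column names
)

newdf <- df %>%
  # Stack the value columns and name the value column as in the question
  pivot_longer(-c(date, clinic), values_to = "NUMBER_PATIENTS") %>%
  # "MALE_0_1" -> MALE / 0 / 1; "MALE_95+" has no upper bound, so pad with NA
  separate(name, c("GENDER", "AGE", "upper"), sep = "_", fill = "right") %>%
  # The lower bound alone identifies the age band, so drop the upper bound
  select(-upper)

print(newdf)
```

AGE stays a character column here because the `95+` band cannot be stored as a number.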

Alternatively, you can specify the pattern to extract directly in pivot_longer:

tidyr::pivot_longer(df,
                    cols = -c(date, clinic),
                    names_to = c('GENDER', 'AGE'),
                    names_pattern = '(.*?)_(\\d+\\+?)',
                    values_to = 'NUMBER_PATIENTS')

#    date       clinic GENDER AGE   NUMBER_PATIENTS
#   <chr>      <chr>  <chr>  <chr>           <int>
# 1 2017-01-01 A      MALE   0                  30
# 2 2017-01-01 A      MALE   1                  25
# 3 2017-01-01 A      MALE   2                  40
# 4 2017-01-01 A      MALE   94                 70
# 5 2017-01-01 A      MALE   95+                90
# 6 2017-01-01 A      FEMALE 0                  28
# 7 2017-01-01 A      FEMALE 1                  22
# 8 2017-01-01 A      FEMALE 95+                40
# 9 2017-01-01 B      MALE   0                  21
#10 2017-01-01 B      MALE   1                  15
# … with 14 more rows

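Here (.*?)_(\\d+\\+?) is a regular expression with two capture groups that extract data from the column names. The first group captures everything before the first underscore; the second captures the number that follows, with an optional + sign.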
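
To see what the two capture groups pick up, the same pattern can be applied to a few column names outside of `pivot_longer`. This sketch uses base R's `regexec()` with `perl = TRUE` purely for illustration; the regex engine tidyr uses internally may differ, and the `cols` vector is a hand-picked sample, not the full set of columns.

```r
# A few representative column names from the question
cols <- c("MALE_0_1", "MALE_94_95", "MALE_95+", "FEMALE_1_2", "FEMALE_95+")

# Same pattern as passed to names_pattern
pattern <- "(.*?)_(\\d+\\+?)"

# regexec() returns the full match plus one entry per capture group
m <- regmatches(cols, regexec(pattern, cols, perl = TRUE))

# Keep only the two capture groups (elements 2 and 3 of each match)
parts <- do.call(rbind, lapply(m, function(x) x[2:3]))
colnames(parts) <- c("GENDER", "AGE")

print(parts)
```

Because the first group is lazy, it stops at the first underscore, so the upper bound of each age band (the `_1` in `MALE_0_1`) falls outside both groups and is discarded.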

Data

df <- structure(list(
  date = c("2017-01-01", "2017-01-01", "2017-02-01"),
  clinic = c("A", "B", "C"),
  MALE_0_1 = c(30L, 21L, 29L),
  MALE_1_2 = c(25L, 15L, 35L),
  MALE_2_3 = c(40L, 30L, 45L),
  MALE_94_95 = c(70L, 45L, 34L),
  `MALE_95+` = c(90L, 27L, 25L),
  FEMALE_0_1 = c(28L, 31L, 33L),
  FEMALE_1_2 = c(22L, 40L, 38L),
  `FEMALE_95+` = c(40L, 55L, 45L)
), class = "data.frame", row.names = c(NA, -3L))
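
For completeness, here is one way the base `reshape()` call from the question might be made to work. `reshape()` takes a single `timevar`, so this sketch passes the original column names as `times` and then splits that one column into GENDER and AGE afterwards. It is not taken from either answer above; the intermediate column `name` and the shortened data are assumptions made for illustration.

```r
# Shortened stand-in for the sample data (assumption: only two age bands kept)
df <- data.frame(
  date = c("2017-01-01", "2017-01-01", "2017-02-01"),
  clinic = c("A", "B", "C"),
  MALE_0_1 = c(30L, 21L, 29L),
  `MALE_95+` = c(90L, 27L, 25L),
  FEMALE_0_1 = c(28L, 31L, 33L),
  `FEMALE_95+` = c(40L, 55L, 45L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(df), c("date", "clinic"))

# One timevar only: store the source column name for every stacked value
long <- reshape(df,
                direction = "long",
                varying = list(value_cols),
                v.names = "NUMBER_PATIENTS",
                idvar = c("date", "clinic"),
                timevar = "name",
                times = value_cols)

# Split the stored column name into its two parts
long$GENDER <- sub("_.*$", "", long$name)                       # text before first "_"
long$AGE <- sub("^[^_]*_([0-9]+\\+?).*$", "\\1", long$name)     # lower bound, optional "+"

# Order by date and clinic, keep only the requested columns
long <- long[order(long$date, long$clinic),
             c("date", "clinic", "GENDER", "AGE", "NUMBER_PATIENTS")]
rownames(long) <- NULL

print(long)
```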
