How to convert a variable number of columns per row into multiple rows in R?

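Here is my problem: I have a database of individuals, where 1 row = 1 person. For each person there is a unique identifier ("INAMI_key"), individual-level variables ("code_qualif" in the example below), and one or more addresses spread across separate columns. The number of addresses is given in the "n_adresses" variable: 1 means one address, 2 means two addresses, and so on. The addresses themselves are stored in the variables "travail_rueX" (street) and "travail_code_postalX" (postal code), where X is the address number. This is what it looks like: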

   INAMI_key code_qualif n_adresses travail_rue1                       travail_code_postal1 travail_rue2                       travail_code_postal2 travail_rue3                       travail_code_postal3 travail_rue4                       travail_code_postal4 travail_rue5                       travail_code_postal5
 1 30000120  001         02         "RUE VAN ARTEVELDE               " " 1000"              "paul pastur                     " " 6180"              ""                                 ""                   ""                                 ""                   ""                                 ""                  
 2 30000417  001         01         "av Margueritr depasse           " " 6060"              ""                                 ""                   ""                                 ""                   ""                                 ""                   ""                                 ""                  
 3 37603435  007         01         "du Grand Veneur                 " " 1170"              ""                                 ""                   ""                                 ""                   ""                                 ""                   ""                                 ""                  
 4 38300152  007         02         "RUE WAUTERS                92-94" " 6040"              "de châtelet                     " " 6120"              ""                                 ""                   ""                                 ""                   ""                                 ""                  
 5 38707849  001         03         "de Campine                      " " 4000"              "de Chestret                     " " 4000"              "de la Gare                      " " 4020"              ""                                 ""                   ""                                 ""                  
 6 38813856  001         03         "Torhoutste steenweg             " " 8400"              "Lage Kaart                      " " 2930"              "De Vrièrestraat                 " " 8301"              ""                                 ""                   ""                                 ""                  
 7 38811084  001         04         "chaussée de Waterloo            " " 1180"              "avenue Napoléon                 " " 1420"              "rue Léon Théodor                " " 1090"              "avenue de la Basilique          " " 1081"              ""                                 ""                  
 8 39105054  001         04         "EMILE  CLAUS                    " " 1050"              "RUE DU FOYER SCHAERBEEKOIS      " " 1030"              "RUE XAVIER DE BUE               " " 1180"              "BV LAmberMONT                   " " 1030"              ""                                 ""                  
 9 39117031  001         05         "KERKSTRAAT                      " " 3850"              "Pater Richard van de Wouwerstraa" " 3271"              "Wilderenlaan                    " " 3803"              "Gyzevennestraat                 " " 3560"              "Molenveldstraat                 " " 3500"             
10 31823918  070         05         "Route de l'Etat                 " " 1380"              "Avenue Paul Hymans              " " 1200"              "Avenue WInston Churchill        " " 1180"              "Avenue Winston Churchill        " " 1180"              "avenue hippocrate               " " 1200"             

Here is the code to import this example into R:

df <- structure(list(
  INAMI_key = c("30000120","30000417","37603435","38300152","38707849","38813856","38811084","39105054","39117031","31823918"),
  code_qualif = c("001","001","007","007","001","001","001","001","001","070"),
  n_adresses = c("02","01","01","02","03","03","04","04","05","05"),
  travail_rue1 = c("RUE VAN ARTEVELDE               ","av Margueritr depasse           ","du Grand Veneur                 ","RUE WAUTERS                92-94","de Campine                      ","Torhoutste steenweg             ","chaussée de Waterloo            ","EMILE  CLAUS                    ","KERKSTRAAT                      ","Route de l'Etat                 "),
  travail_code_postal1 = c(" 1000"," 6060"," 1170"," 6040"," 4000"," 8400"," 1180"," 1050"," 3850"," 1380"),
  travail_rue2 = c("paul pastur                     ","","","de châtelet                     ","de Chestret                     ","Lage Kaart                      ","avenue Napoléon                 ","RUE DU FOYER SCHAERBEEKOIS      ","Pater Richard van de Wouwerstraa","Avenue Paul Hymans              "),
  travail_code_postal2 = c(" 6180","",""," 6120"," 4000"," 2930"," 1420"," 1030"," 3271"," 1200"),
  travail_rue3 = c("","","","","de la Gare                      ","De Vrièrestraat                 ","rue Léon Théodor                ","RUE XAVIER DE BUE               ","Wilderenlaan                    ","Avenue WInston Churchill        "),
  travail_code_postal3 = c("","","",""," 4020"," 8301"," 1090"," 1180"," 3803"," 1180"),
  travail_rue4 = c("","","","","","","avenue de la Basilique          ","BV LAmberMONT                   ","Gyzevennestraat                 ","Avenue Winston Churchill        "),
  travail_code_postal4 = c("","","","","",""," 1081"," 1030"," 3560"," 1180"),
  travail_rue5 = c("","","","","","","","","Molenveldstraat                 ","avenue hippocrate               "),
  travail_code_postal5 = c("","","","","","","",""," 3500"," 1200")
), row.names = c(NA,-10L), class = c("tbl_df","tbl","data.frame"))

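What I want to do is repeat each person's row so that the different addresses appear on different rows but in the same fields. For example, if a person has 3 addresses, create 3 rows for that person, keeping the individual-level variables but reorganizing the addresses into columns with a common name: "travail_rue_total" and "travail_code_postal_total" in the example below. A person with 1 address gets one row, a person with 5 addresses gets 5 rows, and so on: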

   INAMI_key code_qualif n_adresses travail_rue_total                travail_code_postal_total
 1  30000120           1          2 RUE VAN ARTEVELDE                                     1000
 2  30000120           1          2 paul pastur                                           6180
 3  30000417           1          1 av Margueritr depasse                                 6060
 4  37603435           7          1 du Grand Veneur                                       1170
 5  38300152           7          2 RUE WAUTERS                92-94                      6040
 6  38300152           7          2 de châtelet                                           6120
 7  38707849           1          3 de Campine                                            4000
 8  38707849           1          3 de Chestret                                           4000
 9  38707849           1          3 de la Gare                                            4020
10  38813856           1          3 Torhoutste steenweg                                   8400
11  38813856           1          3 Lage Kaart                                            2930
12  38813856           1          3 De Vrièrestraat                                       8301
13  38811084           1          4 chaussée de Waterloo                                  1180
14  38811084           1          4 avenue Napoléon                                       1420
15  38811084           1          4 rue Léon Théodor                                      1090
16  38811084           1          4 avenue de la Basilique                                1081
17  39105054           1          4 EMILE  CLAUS                                          1050
18  39105054           1          4 RUE DU FOYER SCHAERBEEKOIS                            1030
19  39105054           1          4 RUE XAVIER DE BUE                                     1180
20  39105054           1          4 BV LAmberMONT                                         1030
21  39117031           1          5 KERKSTRAAT                                            3850
22  39117031           1          5 Pater Richard van de Wouwerstraa                      3271
23  39117031           1          5 Wilderenlaan                                          3803
24  39117031           1          5 Gyzevennestraat                                       3560
25  39117031           1          5 Molenveldstraat                                       3500
26  31823918          70          5 Route de l'Etat                                       1380
27  31823918          70          5 Avenue Paul Hymans                                    1200
28  31823918          70          5 Avenue WInston Churchill                              1180
29  31823918          70          5 Avenue Winston Churchill                              1180
30  31823918          70          5 avenue hippocrate                                     1200

This is a simplified version of the data. In the full database, I have 40 addresses x 15 variables (street, number, city, postal code, institution, ...).

Thanks!

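Solution

You can use pivot_longer to get the data into long format, then use filter to drop the rows with empty addresses. The special name '.value' in names_to tells pivot_longer to use the part of each column name captured by names_pattern ("rue" or "code_postal") as the name of an output column, while the trailing address number is discarded.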

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('travail'),names_to = '.value',names_pattern = 'travail_(.*?)\\d+') %>%
  filter(rue != '')

#  INAMI_key code_qualif n_adresses rue                                code_postal
#   <chr>     <chr>       <chr>      <chr>                              <chr>      
# 1 30000120  001         02         "RUE VAN ARTEVELDE               " " 1000"    
# 2 30000120  001         02         "paul pastur                     " " 6180"    
# 3 30000417  001         01         "av Margueritr depasse           " " 6060"    
# 4 37603435  007         01         "du Grand Veneur                 " " 1170"    
# 5 38300152  007         02         "RUE WAUTERS                92-94" " 6040"    
# 6 38300152  007         02         "de châtelet                     " " 6120"    
# 7 38707849  001         03         "de Campine                      " " 4000"    
# 8 38707849  001         03         "de Chestret                     " " 4000"    
# 9 38707849  001         03         "de la Gare                      " " 4020"    
#10 38813856  001         03         "Torhoutste steenweg             " " 8400"    
# … with 20 more rows
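
The output above still carries the padded strings and character codes of the input, while the desired table in the question appears to show trimmed streets and numeric values. A minimal sketch of one way to get closer to that layout follows. It runs on a small stand-in data frame (df_small, a hypothetical two-person subset), keeps the address number in an extra column called address_id (a name chosen here for illustration), and assumes that converting code_qualif, n_adresses and the postal code to integers is acceptable, which drops leading zeros.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-person subset of the question's data, two address slots only.
df_small <- tibble(
  INAMI_key            = c("30000120", "30000417"),
  code_qualif          = c("001", "001"),
  n_adresses           = c("02", "01"),
  travail_rue1         = c("RUE VAN ARTEVELDE      ", "av Margueritr depasse  "),
  travail_code_postal1 = c(" 1000", " 6060"),
  travail_rue2         = c("paul pastur            ", ""),
  travail_code_postal2 = c(" 6180", "")
)

result <- df_small %>%
  pivot_longer(
    cols = starts_with("travail"),
    # First capture group becomes the output column name (rue, code_postal),
    # second capture group (the trailing digits) is kept as address_id.
    names_to = c(".value", "address_id"),
    names_pattern = "^travail_(.+?)(\\d+)$"
  ) %>%
  # Strip the fixed-width padding before testing for empty addresses.
  mutate(across(c(rue, code_postal), trimws)) %>%
  filter(rue != "") %>%
  # Assumption: integer columns are wanted, as the desired output suggests.
  mutate(across(c(code_qualif, n_adresses, address_id, code_postal), as.integer)) %>%
  rename(travail_rue_total = rue, travail_code_postal_total = code_postal)

# Sanity check: people whose row count disagrees with n_adresses.
mismatch <- result %>%
  count(INAMI_key, n_adresses) %>%
  filter(n != n_adresses)
```

Because the pattern matches one or more trailing digits, the same call should also cover two-digit address numbers such as travail_rue40, and any other travail_ variable that follows the same name-plus-number scheme would become its own column. If mismatch has rows, some people have a number of non-empty streets that differs from n_adresses.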
