Consecutively filling a column in an existing df

I have this data frame:

> df
   date         val  cday
   <date>     <dbl> <dbl>
  2019-12-01     1     NA
  2019-12-02     0     NA
  2019-12-03     1     NA
  2019-12-04     0     1
  2019-12-05     0     NA
  2019-12-06     0     NA
  2019-12-07     1     1
  2019-12-08     2     NA
  2019-12-09     3     NA
  2019-12-10     3     NA
# … with 246 more rows

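I want to fill df$cday consecutively: starting from each df$cday == 1, count upward (to a maximum of 30) until the next df$cday == 1, where the count restarts at 1. All other NAs should be kept.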

The result should look like this:

> df
   date         val  cday
   <date>     <dbl> <dbl>
  2019-12-01     1     NA
  2019-12-02     0     NA
  2019-12-03     1     NA
  2019-12-04     0     1
  2019-12-05     0     2
  2019-12-06     0     3
  2019-12-07     1     1
  2019-12-08     2     2
  2019-12-09     3     3
  2019-12-10     3     4
# … with 246 more rows

There is probably a simple solution for this, but I could not find anything by searching. I would really appreciate some hints!

Solution

One approach is:

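library(dplyr)

df %>%
  # A new group starts at every non-NA cday; rows before the first marker form group 0
  group_by(idx = cumsum(!is.na(cday))) %>%
  # Number the rows within groups that contain a marker; all-NA groups stay NA
  mutate(cday = case_when(!all(is.na(cday)) ~ row_number())) %>%
  ungroup() %>%
  select(-idx)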

Output (using the visible part of the example):

# A tibble: 10 x 3
   date         val  cday
   <fct>      <int> <int>
 1 2019-12-01     1    NA
 2 2019-12-02     0    NA
 3 2019-12-03     1    NA
 4 2019-12-04     0     1
 5 2019-12-05     0     2
 6 2019-12-06     0     3
 7 2019-12-07     1     1
 8 2019-12-08     2     2
 9 2019-12-09     3     3
10 2019-12-10     3     4

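The code above assumes that all of your current non-missing values are 1. If a sequence can also start with a different integer, it can be adjusted as follows: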

df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  # Count upward from the group's first value instead of from 1
  mutate(cday = case_when(!all(is.na(cday)) ~ cday[1] + (row_number() - 1))) %>%
  ungroup() %>%
  select(-idx)
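
Neither variant above enforces the "maximum of 30" mentioned in the question. A minimal sketch of one way to add it, assuming that rows more than 30 positions after a marker should simply stay NA (that interpretation, the toy data, and the helper column `pos` are assumptions made here for illustration):

```r
library(dplyr)

# Toy data (hypothetical): one leading NA, one start marker, then 34 NAs
df <- tibble(cday = c(NA, 1, rep(NA, 34)))

out <- df %>%
  # Same grouping as above: a new group starts at every non-NA cday
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(
    pos  = row_number(),
    # Keep the running count only in groups that contain a marker,
    # and only for the first 30 rows of such a group
    cday = if_else(!all(is.na(cday)) & pos <= 30L, pos, NA_integer_)
  ) %>%
  ungroup() %>%
  select(-idx, -pos)
```

With this toy input, the count runs from 1 to 30 after the marker and the remaining rows stay NA. If the count should instead stop at 30 and repeat that value, the `if_else()` could be swapped for `pmin(pos, 30L)` inside the same condition.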

We can also use rowid from data.table:

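library(dplyr)
library(tidyr)       # replace_na() comes from tidyr
library(data.table)  # rowid()
df %>% 
  # rowid() counts within each cumsum block; rows before the first non-NA cday are reset to NA
  mutate(cday = replace(rowid(cumsum(replace_na(cday, 0))), seq_len(which.max(!is.na(cday)) - 1), NA))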
#        date val cday
#1  2019-12-01   1   NA
#2  2019-12-02   0   NA
#3  2019-12-03   1   NA
#4  2019-12-04   0    1
#5  2019-12-05   0    2
#6  2019-12-06   0    3
#7  2019-12-07   1    1
#8  2019-12-08   2    2
#9  2019-12-09   3    3
#10 2019-12-10   3    4
