How to fill in the dates between two dates
This is what my current data frame looks like:
df <- data.frame(
  name = c("A","A","A","B","B"),
  start_date = c("2020-01-23","2019-10-15","2019-07-28","2020-03-15","2019-04-23"),
  end_date = c("2020-05-15","2020-01-27","2019-10-17","2020-07-25","2020-02-13"),
  value = c(8.1,3.3,9.1,9.4,15.3)
)
name start_date end_date value
A 2020-01-23 2020-05-15 8.1
A 2019-10-15 2020-01-27 3.3
A 2019-07-28 2019-10-17 9.1
B 2020-03-15 2020-07-25 9.4
B 2019-04-23 2020-02-13 15.3
The dates are stored as POSIXct, are not necessarily consecutive, and can overlap.
I would like my output data frame to look like this:
name date value
A 2020-01-23 8.1
A 2020-01-24 8.1
A ... 8.1
A 2020-05-14 8.1
A 2020-05-15 8.1
A 2019-10-15 3.3
A 2019-10-16 3.3
A ... 3.3
A 2020-01-26 3.3
A 2020-01-27 3.3
A 2019-07-28 9.1
A 2019-07-29 9.1
A ... 9.1
A 2019-10-16 9.1
A 2019-10-17 9.1
B 2020-03-15 9.4
B 2020-03-16 9.4
B ... 9.4
B 2020-07-24 9.4
B 2020-07-25 9.4
B 2019-04-23 15.3
B 2019-04-24 15.3
B ... 15.3
B 2020-02-12 15.3
B 2020-02-13 15.3
This is what I have been trying:
library(data.table)
setDT(df) [,.(date = seq(as.Date(start_date),as.Date(end_date),by = "day")),by = end_date]
But I keep getting the following error:
Error in seq.Date(as.Date(start_date),by = "day") :
'from' must be of length 1
How should I do this? I am open to using packages other than data.table if they work better.
Solution
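Here, we may need to group by the row sequence, so that seq() gets a single start date and a single end date in each group: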
library(data.table)
setDT(df)[,.(date = seq(as.Date(start_date),as.Date(end_date),by = 'day')),.(rn = seq_len(nrow(df)),name,value)][,rn := NULL][]
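The error in the question appears when seq() is handed more than one start date at once, which is what happens if a group contains several rows. The following is a minimal base R sketch of that behaviour, not part of the original answer; the dates and the names starts, ends, vector_call and per_row are made up for illustration.

```r
# Two start dates and two end dates (illustrative values, not from the question)
starts <- as.Date(c("2020-01-01", "2020-02-01"))
ends   <- as.Date(c("2020-01-03", "2020-02-02"))

# Passing whole vectors fails, because seq() needs a single 'from'
vector_call <- tryCatch(
  seq(starts, ends, by = "day"),
  error = function(e) "error"
)
vector_call

# Calling seq() once per pair of dates works; this is what grouping by row achieves
per_row <- lapply(seq_along(starts), function(i) seq(starts[i], ends[i], by = "day"))
lengths(per_row)
# 3 2
```

Grouping by a row counter in data.table has the same effect as the lapply() call here: every call to seq() sees exactly one start date and one end date.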
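Or loop over the corresponding elements of 'start_date' and 'end_date' with Map to build a sequence of dates for each row, store the result as a list column, and then unnest that list column: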
library(tidyr)
library(magrittr)
setDT(df)[,.(name,date = Map(seq,MoreArgs = list(by = '1 day'),as.Date(start_date),as.Date(end_date)),value)] %>%
unnest(date)
# A tibble: 731 x 3
# name date value
# <chr> <date> <dbl>
# 1 A 2020-01-23 8.1
# 2 A 2020-01-24 8.1
# 3 A 2020-01-25 8.1
# 4 A 2020-01-26 8.1
# 5 A 2020-01-27 8.1
# 6 A 2020-01-28 8.1
# 7 A 2020-01-29 8.1
# 8 A 2020-01-30 8.1
# 9 A 2020-01-31 8.1
#10 A 2020-02-01 8.1
# … with 721 more rows
Another approach, using purrr:
df <- data.frame(name = c("A","A","A","B","B"),start_date = c("2020-01-23","2019-10-15","2019-07-28","2020-03-15","2019-04-23"),end_date = c("2020-05-15","2020-01-27","2019-10-17","2020-07-25","2020-02-13"),value = c(8.1,3.3,9.1,9.4,15.3))
library(dplyr)
library(purrr)
# Takes name, start, end and value, and returns a tibble with one row per day from start to end
generate_fill <- function(name,start,end,value) {
  tibble(name = name,date = seq(as.Date(start),as.Date(end),by = "1 day"),value = value)
}
# Map the function to original df and combine the result
bind_rows(
pmap(list(df[["name"]],df[["start_date"]],df[["end_date"]],df[["value"]]),generate_fill))
Output
# A tibble: 731 x 3
name date value
<chr> <date> <dbl>
1 A 2020-01-23 8.1
2 A 2020-01-24 8.1
3 A 2020-01-25 8.1
4 A 2020-01-26 8.1
5 A 2020-01-27 8.1
6 A 2020-01-28 8.1
7 A 2020-01-29 8.1
8 A 2020-01-30 8.1
9 A 2020-01-31 8.1
10 A 2020-02-01 8.1
# … with 721 more rows
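As a quick sanity check on the 731 rows reported above, the expected row count can be computed directly from the start and end dates. This is a sketch added for illustration and assumes the five-row example data from the question; days_per_row is a hypothetical name.

```r
# Start and end dates of the five example rows
df <- data.frame(
  start_date = c("2020-01-23","2019-10-15","2019-07-28","2020-03-15","2019-04-23"),
  end_date = c("2020-05-15","2020-01-27","2019-10-17","2020-07-25","2020-02-13")
)

# Each interval contributes (end - start) days plus one, because both endpoints are included
days_per_row <- as.integer(as.Date(df$end_date) - as.Date(df$start_date)) + 1L
days_per_row
# 114 105 82 133 297

sum(days_per_row)
# 731
```

If the expanded result has a different number of rows, the grouping or the date conversion is the first place to look.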