
Filling gaps in an intraday time series


I have this time series (1-minute timeframe):

structure(list(V1 = c("01/04/2007","01/04/2007","02/04/2007","02/04/2007"),V2 = c("23:01","23:03","23:04","23:05","23:06","23:07","23:08","23:09","23:14","23:15","23:17","23:19","23:20","23:25","23:26","23:28","23:29","23:31","23:32","23:34","23:39","23:43","23:45","23:46","23:55","23:56","00:02","00:03","00:06","00:09","00:13","00:15","00:16","00:17","00:18","00:20","00:22","00:23","00:33","00:41","00:42","00:43","00:47","00:48","00:50","00:51","00:55","00:56","00:59","01:00","01:01","01:02","01:04","01:05","01:07","01:09","01:11","01:12","01:18","01:19","01:20","01:21","01:22","01:26","01:27","01:28","01:30","01:32","01:35","01:40","01:41","01:44","01:46","01:47","01:51","02:07","02:09","02:11","02:13","02:15","02:21","02:22","02:23","02:24","02:28","02:30","02:32","02:39","02:45","03:14","03:17","03:22","03:28","03:32","04:21","04:28","04:34","04:39","04:45","04:47"),V3 = c(1791,1790.5,1790.25,1789.5,1790,1789.75,1789.25,1788.75,1789,1790.75,1791,1791.5,1791.25,1792,1792.5,1792.75,1793,1793.25,1793.75,1793.5,1792.25,1793
),V4 = c(1791,1791.75,1794,1793),V5 = c(1790.75,V6 = c(1790.75,V7 = c(11L,3L,6L,4L,5L,1L,2L,8L,9L,11L,7L,14L,20L,15L,26L,33L,25L,50L,10L,12L,56L,21L,1L)),row.names = c(NA,100L),class = "data.frame")

As you can see, some values are missing. For example, between 01/04/2007 23:26 and 01/04/2007 23:28 the row for 01/04/2007 23:27 is missing.
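One way to list exactly which minutes are missing is to parse V1 and V2 into a single timestamp and compare it against a complete minute-by-minute grid. This is a minimal base-R sketch, not part of the question; it assumes the timestamps can be treated as UTC, and the three toy rows are made up for illustration.

```r
# Hypothetical toy rows in the question's layout (dd/mm/yyyy, HH:MM)
V1 <- c("01/04/2007", "01/04/2007", "01/04/2007")
V2 <- c("23:25", "23:26", "23:28")

# Combine date and time into one POSIXct timestamp (UTC assumed)
tt <- as.POSIXct(paste(V1, V2), format = "%d/%m/%Y %H:%M", tz = "UTC")

# Every minute between the first and last observation
full <- seq(min(tt), max(tt), by = "min")

# Minutes on the grid that never occur in the data (compared as seconds)
gaps <- full[!(as.numeric(full) %in% as.numeric(tt))]
format(gaps, "%d/%m/%Y %H:%M")  # "01/04/2007 23:27"
```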

All I want is to add a row with the time 23:27 in which every other column repeats the value from the previous row.

In other words, each day should have exactly 60 (minutes) × 24 (hours) = 1440 rows, from 00:00 to 23:59.
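The 1440-rows-per-day figure can be checked directly by building one day's minute grid. A small R sketch, assuming the timestamps are treated as UTC (the date is arbitrary):

```r
# One UTC day, minute by minute, from 00:00 to 23:59
day <- seq(as.POSIXct("2007-04-01 00:00:00", tz = "UTC"),
           as.POSIXct("2007-04-01 23:59:00", tz = "UTC"),
           by = "min")
length(day)  # 60 * 24 = 1440
```

Working in UTC matters here: in a time zone with a one-hour daylight saving shift, the day of the clock change has 23 or 25 hours, so a local-time grid would have 1380 or 1500 minutes on that day.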

Solutions

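Combine V1 and V2 into a datetime, use complete to add the missing minutes, and use fill to copy the previous row's values into the new rows. Note that this fills only the minutes between the first and last observation, not the full 00:00 to 23:59 range of each day.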

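library(dplyr)
library(tidyr)

df %>%
  # Paste date (V1) and time (V2) into one column, e.g. "01/04/2007 23:01"
  unite(datetime, V1, V2, sep = " ") %>%
  # Parse as day/month/year hour:minute (requires lubridate to be installed)
  mutate(datetime = lubridate::dmy_hm(datetime)) %>%
  # Insert a row for every minute between the first and last timestamp
  complete(datetime = seq(min(datetime), max(datetime), by = 'min')) %>%
  # Carry each column's previous value down into the inserted rows
  fill(everything()) %>%
  # Split the timestamp back into date and time strings
  mutate(V1 = format(datetime, "%d/%m/%Y"), V2 = format(datetime, '%H:%M')) %>%
  select(-datetime)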

#     V3    V4    V5    V6    V7 V1         V2   
#   <dbl> <dbl> <dbl> <dbl> <int> <chr>      <chr>
# 1 1791  1791  1791. 1791.    11 01/04/2007 23:01
# 2 1791  1791  1791. 1791.    11 01/04/2007 23:02
# 3 1790. 1790. 1790. 1790.     3 01/04/2007 23:03
# 4 1790. 1790. 1790. 1790.     6 01/04/2007 23:04
# 5 1790. 1790. 1790. 1790.     4 01/04/2007 23:05
# 6 1790  1790. 1790  1790.     5 01/04/2007 23:06
# 7 1790. 1790. 1790. 1790.     1 01/04/2007 23:07
# 8 1790. 1790. 1790. 1790.     2 01/04/2007 23:08
# 9 1790  1790  1790  1790      2 01/04/2007 23:09
#10 1790  1790  1790  1790      2 01/04/2007 23:10
# … with 337 more rows
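The fill() step above is a last-observation-carried-forward (LOCF). To show the idea without tidyr, here is a base-R sketch of that single step using a hypothetical helper, locf(), which is not part of any package; NA marks a value in an inserted row.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value
# before it (the same "down" direction tidyr::fill() uses by default).
locf <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L        # missing positions point at "nothing yet"
  idx <- cummax(idx)         # index of the latest non-NA value so far
  idx[idx == 0L] <- NA       # leading NAs have nothing to copy
  x[idx]
}

locf(c(1, NA, NA, 2, NA))    # 1 1 1 2 2
locf(c(NA, 3, NA))           # NA 3 3
```

Leading NAs stay NA because there is no earlier value to copy, which matches what fill() does by default.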

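Assuming the input data frame is tdf, convert it to a zoo object z and create the desired date/time range rng, running from 00:00 on the first day to 23:59 on the last. Create the sequence of minutes mins, merge it with z, and carry the last observation forward with na.locf to get zz. Finally, convert zz back to a data frame, tdf2.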

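library(zoo)

# Build a zoo series indexed by the pasted V1 + V2 timestamp (in UTC)
z <- read.zoo(tdf, index = 1:2, tz = "UTC", format = "%d/%m/%Y %H:%M")
# From 00:00 on the first day to 23:59 on the last day, also in UTC
rng <- as.POSIXct(paste(range(as.Date(time(z))), c("00:00:00", "23:59:00")), tz = "UTC")
# Every minute in that range
mins <- seq(rng[1], rng[2], by = "min")
# Merge with an empty series on that grid, then carry the last value forward;
# minutes before the first observation stay NA
zz <- na.locf(merge(z, zoo(, mins), all = TRUE), na.rm = FALSE)
# Back to a data frame (the timestamp becomes an Index column)
tdf2 <- fortify.zoo(zz)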

Depending on your needs, you may be able to use the zoo object zz directly, in which case the last line can be omitted.
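For readers who prefer not to depend on zoo, the same full-day padding can be sketched in base R. This is an illustrative alternative, not part of the original answer. It assumes the question's layout (date in V1 as dd/mm/yyyy, time in V2 as HH:MM, values in the remaining columns), treats the timestamps as UTC, and uses a hypothetical function name, pad_full_days. As with na.locf(..., na.rm = FALSE), minutes before the first observation stay NA.

```r
# Hypothetical helper: pad to every minute from 00:00 on the first day to
# 23:59 on the last day, carrying the last observed row forward.
pad_full_days <- function(dat) {
  # Parse date + time into one UTC timestamp and sort rows by it
  tt <- as.POSIXct(paste(dat$V1, dat$V2), format = "%d/%m/%Y %H:%M", tz = "UTC")
  o <- order(tt)
  dat <- dat[o, , drop = FALSE]
  tt <- tt[o]

  # Full-day minute grid covering the first through last calendar day
  days <- as.Date(range(tt), tz = "UTC")
  grid <- seq(as.POSIXct(paste(days[1], "00:00:00"), tz = "UTC"),
              as.POSIXct(paste(days[2], "23:59:00"), tz = "UTC"),
              by = "min")

  # Row of dat observed at each grid minute (NA where the minute is missing)
  pos <- match(as.numeric(grid), as.numeric(tt))
  # Most recent observed row so far; 0 means nothing observed yet
  last <- cummax(ifelse(is.na(pos), 0L, pos))
  last[last == 0L] <- NA

  vals <- dat[last, setdiff(names(dat), c("V1", "V2")), drop = FALSE]
  data.frame(V1 = format(grid, "%d/%m/%Y"), V2 = format(grid, "%H:%M"),
             vals, row.names = NULL)
}

# Hypothetical toy input (values made up) in the question's layout
tdf <- data.frame(V1 = c("01/04/2007", "01/04/2007", "02/04/2007"),
                  V2 = c("23:26", "23:28", "00:02"),
                  V3 = c(1, 2, 3))
res <- pad_full_days(tdf)
nrow(res)  # 2 days * 1440 minutes = 2880
```

Carrying forward a row index rather than each column's values keeps all value columns aligned in one step; the trade-off is that a row with an NA in only some columns is copied as-is instead of being filled column by column.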
