R: how to compute a moving average for panel data
Example:
date code_ID name_ID new_value
2021-03-10T17:00:00 13 Alpha 372
2021-03-11T17:00:00 13 Alpha 608
2021-03-12T17:00:00 13 Alpha 515
2021-03-13T17:00:00 13 Alpha 320
2021-03-14T17:00:00 13 Alpha 323
2021-03-15T17:00:00 13 Alpha 329
2021-03-16T17:00:00 13 Alpha 212
2021-03-17T17:00:00 13 Alpha 304
2021-03-18T17:00:00 13 Alpha 462
2021-03-10T17:00:00 17 Beta 115
2021-03-11T17:00:00 17 Beta 151
2021-03-12T17:00:00 17 Beta 141
2021-03-13T17:00:00 17 Beta 137
2021-03-14T17:00:00 17 Beta 106
2021-03-15T17:00:00 17 Beta 67
2021-03-16T17:00:00 17 Beta 166
2021-03-17T17:00:00 17 Beta 126
2021-03-18T17:00:00 17 Beta 179
2021-03-10T17:00:00 8 eta-firm 2155
2021-03-11T17:00:00 8 eta-firm 2845
2021-03-12T17:00:00 8 eta-firm 3477
2021-03-13T17:00:00 8 eta-firm 2950
2021-03-14T17:00:00 8 eta-firm 3023
2021-03-15T17:00:00 8 eta-firm 2822
2021-03-16T17:00:00 8 eta-firm 2184
2021-03-17T17:00:00 8 eta-firm 2026
2021-03-18T17:00:00 8 eta-firm 2531
2021-03-10T17:00:00 6 phi hotel 866
2021-03-11T17:00:00 6 phi hotel 991
2021-03-12T17:00:00 6 phi hotel 971
2021-03-13T17:00:00 6 phi hotel 953
2021-03-14T17:00:00 6 phi hotel 604
2021-03-15T17:00:00 6 phi hotel 398
2021-03-16T17:00:00 6 phi hotel 672
2021-03-17T17:00:00 6 phi hotel 986
2021-03-18T17:00:00 6 phi hotel 1058
How can I write a formula that computes a moving average by code_ID and date, preferably using only R's base packages?
The formula is (in LaTeX format):
{\hat{y}_{t}} = \frac{y_{t-6} + y_{t-5} + y_{t-4} + y_{t-3} + y_{t-2} + y_{t-1} + y_{t}}{7}
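As a concrete check of the formula, take code_ID 13 (Alpha) on 2021-03-16, the first date for which seven observations are available in the example data:
\hat{y}_{t} = \frac{372 + 608 + 515 + 320 + 323 + 329 + 212}{7} = \frac{2679}{7} \approx 382.7143
For the six earlier dates the window is incomplete, so no value can be computed from this formula as written.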
Solution
How about this:
# Rebuild the example data from the question: four firms, each with the same nine daily timestamps.
dat <- tibble::tibble(
  date = rep(paste0("2021-03-", 10:18, " 17:00:00"), times = 4),
  code_ID = rep(c(13, 17, 8, 6), each = 9),
  name_ID = rep(c("Alpha", "Beta", "eta-firm", "phi hotel"), each = 9),
  new_value = c(372, 608, 515, 320, 323, 329, 212, 304, 462,
                115, 151, 141, 137, 106, 67, 166, 126, 179,
                2155, 2845, 3477, 2950, 3023, 2822, 2184, 2026, 2531,
                866, 991, 971, 953, 604, 398, 672, 986, 1058)
)
dat$date <- anytime::anytime(dat$date)
id <- dat$code_ID
s <- split(dat,id)
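# For each firm, average lags 0 to 6 of new_value. lag() must be dplyr::lag here; stats::lag does not shift a plain vector.
l <- lapply(s, function(x) cbind(x, ma = rowMeans(sapply(0:6, function(t) dplyr::lag(x$new_value, t)))))
out <- do.call(dplyr::bind_rows, l)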
out
# date code_ID name_ID new_value ma
# 1 2021-03-10 17:00:00 6 phi hotel 866 NA
# 2 2021-03-11 17:00:00 6 phi hotel 991 NA
# 3 2021-03-12 17:00:00 6 phi hotel 971 NA
# 4 2021-03-13 17:00:00 6 phi hotel 953 NA
# 5 2021-03-14 17:00:00 6 phi hotel 604 NA
# 6 2021-03-15 17:00:00 6 phi hotel 398 NA
# 7 2021-03-16 17:00:00 6 phi hotel 672 779.2857
# 8 2021-03-17 17:00:00 6 phi hotel 986 796.4286
# 9 2021-03-18 17:00:00 6 phi hotel 1058 806.0000
# 10 2021-03-10 17:00:00 8 eta-firm 2155 NA
# 11 2021-03-11 17:00:00 8 eta-firm 2845 NA
# 12 2021-03-12 17:00:00 8 eta-firm 3477 NA
# 13 2021-03-13 17:00:00 8 eta-firm 2950 NA
# 14 2021-03-14 17:00:00 8 eta-firm 3023 NA
# 15 2021-03-15 17:00:00 8 eta-firm 2822 NA
# 16 2021-03-16 17:00:00 8 eta-firm 2184 2779.4286
# 17 2021-03-17 17:00:00 8 eta-firm 2026 2761.0000
# 18 2021-03-18 17:00:00 8 eta-firm 2531 2716.1429
# 19 2021-03-10 17:00:00 13 Alpha 372 NA
# 20 2021-03-11 17:00:00 13 Alpha 608 NA
# 21 2021-03-12 17:00:00 13 Alpha 515 NA
# 22 2021-03-13 17:00:00 13 Alpha 320 NA
# 23 2021-03-14 17:00:00 13 Alpha 323 NA
# 24 2021-03-15 17:00:00 13 Alpha 329 NA
# 25 2021-03-16 17:00:00 13 Alpha 212 382.7143
# 26 2021-03-17 17:00:00 13 Alpha 304 373.0000
# 27 2021-03-18 17:00:00 13 Alpha 462 352.1429
# 28 2021-03-10 17:00:00 17 Beta 115 NA
# 29 2021-03-11 17:00:00 17 Beta 151 NA
# 30 2021-03-12 17:00:00 17 Beta 141 NA
# 31 2021-03-13 17:00:00 17 Beta 137 NA
# 32 2021-03-14 17:00:00 17 Beta 106 NA
# 33 2021-03-15 17:00:00 17 Beta 67 NA
# 34 2021-03-16 17:00:00 17 Beta 166 126.1429
# 35 2021-03-17 17:00:00 17 Beta 126 127.7143
# 36 2021-03-18 17:00:00 17 Beta 179 131.7143
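The lag-and-average step can also be sketched without any packages. The helper shift() below is a hypothetical name introduced only for this illustration (it is not part of base R or of the answer above); it pads the front of a vector with NA so that column k of the matrix holds the value from k rows earlier.

```r
# Hypothetical helper: shift x down by k positions, padding with NA.
shift <- function(x, k) c(rep(NA, k), x)[seq_along(x)]

# new_value for code_ID 6 (phi hotel), taken from the example data.
x <- c(866, 991, 971, 953, 604, 398, 672, 986, 1058)

# One column per lag 0..6; each row then holds y_t, y_{t-1}, ..., y_{t-6}.
lags <- sapply(0:6, function(k) shift(x, k))

# rowMeans() returns NA for any row containing an NA, i.e. the first six rows.
ma <- rowMeans(lags)
print(ma)
```

The last three values should match the ma column shown for phi hotel in the output above.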
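The above is essentially a base R solution (only lag() and bind_rows() come from dplyr). If you are willing to use dplyr and zoo, you can do this: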
library(dplyr)
dat %>%
  group_by(code_ID) %>%
  mutate(ma = zoo::rollmean(new_value, k = 7, fill = NA, align = "right"))
Here are two ways to do it with the function rollmeanr from the package zoo.
The first does not assign the output to a new column; the second does.
library(zoo)
by(df1$new_value,df1$code_ID,function(x)
rollmeanr(x,k = 7,fill = NA)
)
df1$mean6 <- with(df1, ave(new_value, code_ID, FUN = function(x) rollmeanr(x, k = 7, fill = NA)))
head(df1,10)
# date code_ID name_ID new_value mean6
#1 2021-03-10 17:00:00 13 Alpha 372 NA
#2 2021-03-11 17:00:00 13 Alpha 608 NA
#3 2021-03-12 17:00:00 13 Alpha 515 NA
#4 2021-03-13 17:00:00 13 Alpha 320 NA
#5 2021-03-14 17:00:00 13 Alpha 323 NA
#6 2021-03-15 17:00:00 13 Alpha 329 NA
#7 2021-03-16 17:00:00 13 Alpha 212 382.7143
#8 2021-03-17 17:00:00 13 Alpha 304 373.0000
#9 2021-03-18 17:00:00 13 Alpha 462 352.1429
#10 2021-03-10 17:00:00 17 Beta 115 NA
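The difference between the two calls comes down to the shape of the result: a per-group apply returns one object per group, whereas ave() returns a vector aligned with the original rows, which is why it can be assigned straight to a column. A minimal sketch on made-up toy data (the vectors values and group are illustrative only, and split() plus lapply() stands in for the per-group apply):

```r
# Toy data: two interleaved groups.
values <- c(1, 2, 3, 4, 5, 6)
group  <- c("a", "b", "a", "b", "a", "b")

# One result per group, detached from the original row order.
per_group <- lapply(split(values, group), cumsum)

# Same computation, but returned in the original row order.
row_aligned <- ave(values, group, FUN = cumsum)

print(per_group)
print(row_aligned)
```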
Data in dput format.
df1 <-
structure(list(
  date = structure(rep(c(1615395600, 1615482000, 1615568400, 1615654800, 1615741200, 1615827600, 1615914000, 1616000400, 1616086800), times = 4), class = c("POSIXct", "POSIXt"), tzone = ""),
  code_ID = rep(c(13L, 17L, 8L, 6L), each = 9),
  name_ID = rep(c("Alpha", "Beta", "eta-firm", "phi hotel"), each = 9),
  new_value = c(372L, 608L, 515L, 320L, 323L, 329L, 212L, 304L, 462L, 115L, 151L, 141L, 137L, 106L, 67L, 166L, 126L, 179L, 2155L, 2845L, 3477L, 2950L, 3023L, 2822L, 2184L, 2026L, 2531L, 866L, 991L, 971L, 953L, 604L, 398L, 672L, 986L, 1058L),
  mean6 = c(rep(NA, 6), 382.714285714286, 373, 352.142857142857,
            rep(NA, 6), 126.142857142857, 127.714285714286, 131.714285714286,
            rep(NA, 6), 2779.42857142857, 2761, 2716.14285714286,
            rep(NA, 6), 779.285714285714, 796.428571428571, 806)),
  row.names = c(NA, -36L), class = "data.frame")
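This would be slightly easier if you were willing to use packages, but since the question asks for base R, here is a solution that uses only base R. We use ave to apply roll to new_value by code_ID, where roll is implemented row-wise using embed, or alternatively using filter or a combination of cumsum and diff.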
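# embed(x, n) puts each length-n window in one row; pad the first n-1 positions with NA.
roll <- function(x, n = 7) c(rep(NA, n - 1), rowMeans(embed(x, n)))
# Pass code_ID as the grouping variable so the average is computed within each firm.
dat2 <- transform(dat, mean7 = ave(new_value, code_ID, FUN = roll))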
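To see why rowMeans(embed(x, n)) yields a trailing moving average, it helps to look at what embed() returns on a small vector. This is only an illustrative sketch with a toy input and a window of 3 instead of 7:

```r
x <- 1:5

# Each row is one window, most recent value first:
#   3 2 1
#   4 3 2
#   5 4 3
m <- embed(x, 3)
print(m)

# Row means are the window averages; there are length(x) - n + 1 of them,
# so n - 1 leading NAs are needed to line the result up with x.
padded <- c(rep(NA, 2), rowMeans(m))
print(padded)
```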
Or use one of the following alternatives for roll:
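# One-sided linear filter with equal weights; the first n-1 results are NA.
roll2 <- function(x, n = 7) stats::filter(x, rep(1, n) / n, sides = 1)
# Difference of cumulative sums at lag n gives each window's sum.
roll3 <- function(x, n = 7) c(rep(NA, n - 1), diff(cumsum(c(0, x)), n) / n)
# Same as roll, but using apply() instead of rowMeans().
roll4 <- function(x, n = 7) c(rep(NA, n - 1), apply(embed(x, n), 1, mean))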
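As a quick sanity check, the four variants can be compared on a single series. The sketch below writes each function out in full with explicit NA padding, which is an assumption about the intended definitions, and wraps the filter() result in as.numeric() only to drop its time-series attributes before comparing.

```r
# new_value for code_ID 13 (Alpha), taken from the example data.
x <- c(372, 608, 515, 320, 323, 329, 212, 304, 462)

roll  <- function(x, n = 7) c(rep(NA, n - 1), rowMeans(embed(x, n)))
roll2 <- function(x, n = 7) as.numeric(stats::filter(x, rep(1, n) / n, sides = 1))
roll3 <- function(x, n = 7) c(rep(NA, n - 1), diff(cumsum(c(0, x)), lag = n) / n)
roll4 <- function(x, n = 7) c(rep(NA, n - 1), apply(embed(x, n), 1, mean))

results <- list(roll(x), roll2(x), roll3(x), roll4(x))

# Compare every variant against the first one, up to floating-point tolerance.
agree <- vapply(results[-1], function(r) isTRUE(all.equal(r, results[[1]])), logical(1))
print(agree)
```

roll3 avoids building the window matrix altogether, which may matter for long series, though cumulative sums can lose some precision when the values are very large.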