
Joining by overlapping time periods while performing operations on some values

I am trying to join a table of periods like this one:

library(data.table)

id = c(rep(1,3),rep(2,3),rep(3,3))
# the same three periods are repeated for each id
start = rep(as.Date(c("2014-07-01","2015-03-12","2016-08-13")),3)
end = rep(as.Date(c("2015-03-11","2015-08-12","2018-12-31")),3)

DT = data.table(id,start,end)

DT

   id      start        end
1:  1 2014-07-01 2015-03-11
2:  1 2015-03-12 2015-08-12
3:  1 2016-08-13 2018-12-31
4:  2 2014-07-01 2015-03-11
5:  2 2015-03-12 2015-08-12
6:  2 2016-08-13 2018-12-31
7:  3 2014-07-01 2015-03-11
8:  3 2015-03-12 2015-08-12
9:  3 2016-08-13 2018-12-31

with a clinical registry (weight and height) like this one:

id_clin = c(rep(1,2),rep(2,3),rep(3,4))
date = as.Date(c("2014-10-23","2016-09-01","2017-01-01","2014-08-01","2015-02-01","2017-06-01","2018-03-05","2018-09-01","2018-11-30"))
weight = c(60,65,62,75,68,90,102,104,98)
height = c(160,160,170,175,170,200,200,200,200)

DT_clin = data.table(id_clin,date,weight,height)

DT_clin

   id_clin       date weight height
1:       1 2014-10-23     60    160
2:       1 2016-09-01     65    160
3:       2 2017-01-01     62    170
4:       2 2014-08-01     75    175
5:       2 2015-02-01     68    170
6:       3 2017-06-01     90    200
7:       3 2018-03-05    102    200
8:       3 2018-09-01    104    200
9:       3 2018-11-30     98    200
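  • When a clinical measurement (DT_clin) for a given id falls between the start and end of a period (DT) with the same id, the measurement values must be joined to that period.
  • If no DT_clin value falls within a DT period, nothing needs to be joined.
  • If several values fall within a DT period, I want the mean of the overlapping values.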

The desired result looks like this*:

   id      start        end       date       date2       weight       height
1:  1 2014-07-01 2015-03-11 2014-10-23  2014-10-23         60.0        160.0
2:  1 2015-03-12 2015-08-12       <NA>        <NA>           NA           NA
3:  1 2016-08-13 2018-12-31 2016-09-01  2016-09-01         65.0        160.0
4:  2 2014-07-01 2015-03-11 2014-08-01  2015-02-01         71.5        172.5
5:  2 2015-03-12 2015-08-12       <NA>        <NA>           NA           NA
6:  2 2016-08-13 2018-12-31 2017-01-01  2017-01-01         62.0        170.0
7:  3 2014-07-01 2015-03-11       <NA>        <NA>           NA           NA
8:  3 2015-03-12 2015-08-12       <NA>        <NA>           NA           NA
9:  3 2016-08-13 2018-12-31 2017-06-01  2018-11-30         98.5        200.0

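Also, if there is a way to apply several operations to different variables, I would be interested in that too (for example, computing the mean of weight and the maximum of height while doing the join).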

I tried foverlaps and got good results when only one value overlaps, but I could not reach my goal when several values overlap:

DT_clin[,date2 := date]   # copy date to fake an interval (see the note below)
setkey(DT,id,start,end)
setkey(DT_clin,id_clin,date,date2)

foverlaps(DT[id == 1],DT_clin[id_clin == 1],by.x = c("id","start","end"),by.y = c("id_clin","date","date2"),nomatch = NA)

Should I use a non-equi join instead?

Thanks in advance for your help :)

*I copied date to create date2 and fake a time interval.

Solution

Use a non-equi join, then aggregate by id, start and end:

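# non-equi join: keep every DT period and match measurements dated within [start, end]
ans <- DT_clin[DT,on = .(date >= start,date <= end,id_clin == id)]
# in the result, date holds start and date.1 holds end; date2 keeps the measurement date
ans[,.(date = min(date2),date2 = max(date2),weight = mean(weight),height = mean(height)),by = .(id = id_clin,start = date,end = date.1)]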

#    id      start        end       date      date2 weight height
# 1:  1 2014-07-01 2015-03-11 2014-10-23 2014-10-23   60.0  160.0
# 2:  1 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
# 3:  1 2016-08-13 2018-12-31 2016-09-01 2016-09-01   65.0  160.0
# 4:  2 2014-07-01 2015-03-11 2014-08-01 2015-02-01   71.5  172.5
# 5:  2 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
# 6:  2 2016-08-13 2018-12-31 2017-01-01 2017-01-01   62.0  170.0
# 7:  3 2014-07-01 2015-03-11       <NA>       <NA>     NA     NA
# 8:  3 2015-03-12 2015-08-12       <NA>       <NA>     NA     NA
# 9:  3 2016-08-13 2018-12-31 2017-06-01 2018-11-30   98.5  200.0
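
On the side question about applying different operations to different variables: the same join-then-aggregate pattern seems to extend naturally, because each column computed in j can use its own function. The sketch below rebuilds the example data so that it runs on its own, then computes the mean weight and the maximum height per period. The output names n_obs, mean_weight and max_height are illustrative choices, not part of the original answer.

```r
library(data.table)

# Example data rebuilt from the question
DT <- data.table(
  id    = rep(c(1, 2, 3), each = 3),
  start = rep(as.Date(c("2014-07-01", "2015-03-12", "2016-08-13")), 3),
  end   = rep(as.Date(c("2015-03-11", "2015-08-12", "2018-12-31")), 3)
)
DT_clin <- data.table(
  id_clin = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  date    = as.Date(c("2014-10-23", "2016-09-01", "2017-01-01",
                      "2014-08-01", "2015-02-01", "2017-06-01",
                      "2018-03-05", "2018-09-01", "2018-11-30")),
  weight  = c(60, 65, 62, 75, 68, 90, 102, 104, 98),
  height  = c(160, 160, 170, 175, 170, 200, 200, 200, 200)
)

# Non-equi join: one row per (period, matching measurement);
# periods without a match are kept with NA measurements
ans <- DT_clin[DT, on = .(date >= start, date <= end, id_clin == id)]

# A different summary per variable, grouped by period
# (after the join, date holds start and date.1 holds end)
res <- ans[, .(
  n_obs       = sum(!is.na(weight)),  # number of measurements in the period
  mean_weight = mean(weight),         # NA when the period has no measurement
  max_height  = max(height)           # NA when the period has no measurement
), by = .(id = id_clin, start = date, end = date.1)]

print(res)
```

Plain max() is used here on purpose: on a period with no measurement, max(height, na.rm = TRUE) would return -Inf with a warning instead of NA.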

Using foverlaps:

library(data.table)
# foverlaps needs y (DT_clin) keyed on the id and both interval columns
setkey(DT_clin,id_clin,date,date2)

foverlaps(DT,DT_clin,by.x = c("id","start","end"),by.y = c("id_clin","date","date2"),nomatch = NA)[,.(datemin = min(date),datemax = max(date),weight = mean(weight,na.rm = TRUE),height = mean(height,na.rm = TRUE)),by = .(id,start,end)]

   id      start        end    datemin    datemax weight height
1:  1 2014-07-01 2015-03-11 2014-10-23 2014-10-23   60.0  160.0
2:  1 2015-03-12 2015-08-12       <NA>       <NA>    NaN    NaN
3:  1 2016-08-13 2018-12-31 2016-09-01 2016-09-01   65.0  160.0
4:  2 2014-07-01 2015-03-11 2014-08-01 2015-02-01   71.5  172.5
5:  2 2015-03-12 2015-08-12       <NA>       <NA>    NaN    NaN
6:  2 2016-08-13 2018-12-31 2017-01-01 2017-01-01   62.0  170.0
7:  3 2014-07-01 2015-03-11       <NA>       <NA>    NaN    NaN
8:  3 2015-03-12 2015-08-12       <NA>       <NA>    NaN    NaN
9:  3 2016-08-13 2018-12-31 2017-06-01 2018-11-30   98.5  200.0
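
The NaN entries above, where the non-equi join version shows NA, come from na.rm = TRUE: for a period with no measurement, every value is dropped and the mean of an empty vector is NaN. A small sketch of the difference follows, with one possible way to get NA back; the helper name mean_or_na is hypothetical and not part of data.table.

```r
x <- NA_real_   # what an unmatched period contributes after the join

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # NaN: nothing is left to average

# Hypothetical helper: drop NAs, but report NA when nothing remains
mean_or_na <- function(v) {
  if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
}

mean_or_na(x)            # NA
mean_or_na(c(60, NA))    # 60
```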
