Retrieving values based on matching dates

How to solve: retrieving values based on matching dates

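I have two data frames. The first contains events with their corresponding start and end times. The second contains per-minute prices for different IDs. See below: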

Event                       starttime             endtime
Change in Nonfarm Payrolls  2020-03-06 08:15:00   2020-03-06 09:00:00
Change in Nonfarm Payrolls  2020-02-07 08:15:00   2020-02-07 09:00:00
Change in Nonfarm Payrolls  2020-01-10 08:15:00   2020-01-10 09:00:00
Change in Nonfarm Payrolls  2020-01-10 08:15:00   2020-01-10 09:00:00

Price    date_time             ID
24813    2020-03-06 08:14:00   DJ
24763    2020-03-06 08:15:00   DJ
24750    2020-03-06 08:16:00   DJ
24725    2020-03-06 08:17:00   DJ

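I want to get the price and ID from the second dataset (at the start time and at the end time) and add them to the first dataset. I tried using ifelse like this, but it doesn't work.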

df1$startprice <- ifelse(df1$starttime == df2$date_time,df2$Price,"no")

Can anyone help me?

Reproducible data (for the first event, covering the start and end time):

df1 <- structure(list(Event = structure(c(1L,1L,1L,1L,1L),.Label = c("Change in Nonfarm Payrolls"),class = "factor"),starttime = structure(c(1583478900,1581059700,1578640500,1578640500,1581059700),class = c("POSIXct","POSIXt"),tzone = ""),endtime = structure(c(1583481600,1581062400,1578643200,1578643200,1581062400),class = c("POSIXct","POSIXt"),tzone = "")),row.names = c(NA,5L),class = "data.frame")
df2 <- structure(list(Price = c(24813,24763,24750,24725,24746,24735,24755,24744,24762,24773,24778,24832,24856,24845,24842,24902,24934,24854,24888,24914,24922,24875,24896,24853,24834,24886,24872,24844,24846,24860,24812,24791,24767,24765,24756,24745,24800,24789,24787,24887,24876,24911),date_time = structure(c(1583478840,1583478900,1583478960,1583479020,1583479080,1583479140,1583479200,1583479260,1583479320,1583479380,1583479440,1583479500,1583479560,1583479620,1583479680,1583479740,1583479800,1583479860,1583479920,1583479980,1583480040,1583480100,1583480160,1583480220,1583480280,1583480340,1583480400,1583480460,1583480520,1583480580,1583480640,1583480700,1583480760,1583480820,1583480880,1583480940,1583481000,1583481060,1583481120,1583481180,1583481240,1583481300,1583481360,1583481420,1583481480,1583481540,1583481600,1583481660),ID = c("DJ","DJ","DJ")),row.names = 62835:62882,class = "data.frame")

Thanks in advance! Kind regards, Jürgen

Solution

I assume you are trying to get the Price and ID from the second dataset by matching the date_time of the second dataset against the starttime of the first dataset.

In that case, you can use dplyr's left_join:

library(dplyr)
df1 %>% left_join(df2, by = c('starttime' = 'date_time'))

Output:

                       Event           starttime             endtime Price   ID
1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00 24763   DJ
2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00    NA <NA>
3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00    NA <NA>
4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00    NA <NA>
5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00    NA <NA>

The times are shifted relative to the question because the data was created with tzone = "", so they print in the local time zone of the machine running the code.

Update:
You also want to get the Price at endtime.

You can pipe another left_join onto the previous code, this time linking the endtime of df1 instead of starttime:

combinedPrice <- df1 %>%
  left_join(df2, by = c('starttime' = 'date_time')) %>%
  left_join(df2, by = c('endtime' = 'date_time'))
combinedPrice

Output:

                       Event           starttime             endtime Price.x ID.x Price.y ID.y
1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00   24763   DJ   24876   DJ
2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00      NA <NA>      NA <NA>
3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00      NA <NA>      NA <NA>
4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00      NA <NA>      NA <NA>
5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00      NA <NA>      NA <NA>

The start price and end price are named Price.x and Price.y respectively. In addition, the two joins leave us with 2 ID columns (ID.x and ID.y). We can then rename the price columns and drop 1 of the ID columns.
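The renaming step described in this answer could look roughly like the sketch below. It is not the answerer's original code. The two-row data frames are toy stand-ins for df1 and df2 with times fixed to UTC, startprice is borrowed from the question, and endprice is a hypothetical name chosen to match it.

```r
library(dplyr)

# Toy stand-ins for df1 and df2 (assumed values, fixed to UTC so the
# result does not depend on the local time zone).
df1 <- data.frame(
  Event     = "Change in Nonfarm Payrolls",
  starttime = as.POSIXct(c("2020-03-06 08:15:00", "2020-02-07 08:15:00"), tz = "UTC"),
  endtime   = as.POSIXct(c("2020-03-06 09:00:00", "2020-02-07 09:00:00"), tz = "UTC")
)
df2 <- data.frame(
  Price     = c(24763, 24876),
  date_time = as.POSIXct(c("2020-03-06 08:15:00", "2020-03-06 09:00:00"), tz = "UTC"),
  ID        = "DJ"
)

# Join once on starttime and once on endtime; the clashing Price and ID
# columns receive the default .x / .y suffixes.
combinedPrice <- df1 %>%
  left_join(df2, by = c("starttime" = "date_time")) %>%
  left_join(df2, by = c("endtime" = "date_time"))

# Rename the price columns, keep the ID from the start-time join,
# and drop the ID from the end-time join.
tidyPrice <- combinedPrice %>%
  rename(startprice = Price.x, endprice = Price.y, ID = ID.x) %>%
  select(-ID.y)

print(tidyPrice)
```

Keeping ID.x means the ID is NA for any event whose start time has no matching price, even if the end time does match; whether that is acceptable depends on the data.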

Alternatively, you can use merge twice, first on starttime and then on endtime:

merge(df1, transform(df2, start_time_price = Price)[-1],
      by.x = 'starttime', by.y = 'date_time') |>
  merge(transform(df2, end_time_price = Price)[-1],
        by.x = c('ID', 'endtime'), by.y = c('ID', 'date_time'))

If you want to keep all rows of df1 in the final output, use all.x = TRUE in merge. The pipe operator (|>) was introduced in R 4.1; on an older version of R, chain the two merge calls without it.
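For R versions older than 4.1, one way to write the same two merges without the |> operator is to store the intermediate result, as in the sketch below. It is an illustration, not the answerer's original code. The data frames are toy stand-ins for df1 and df2 with times fixed to UTC, and all.x = TRUE is added so that events without a matching price are kept.

```r
# Toy stand-ins for df1 and df2 (assumed values, fixed to UTC).
df1 <- data.frame(
  Event     = "Change in Nonfarm Payrolls",
  starttime = as.POSIXct(c("2020-03-06 08:15:00", "2020-02-07 08:15:00"), tz = "UTC"),
  endtime   = as.POSIXct(c("2020-03-06 09:00:00", "2020-02-07 09:00:00"), tz = "UTC")
)
df2 <- data.frame(
  Price     = c(24763, 24876),
  date_time = as.POSIXct(c("2020-03-06 08:15:00", "2020-03-06 09:00:00"), tz = "UTC"),
  ID        = "DJ"
)

# Copy Price under a descriptive name, then drop the original Price
# column (column 1) so the two lookups do not clash.
start_prices <- transform(df2, start_time_price = Price)[-1]
end_prices   <- transform(df2, end_time_price = Price)[-1]

# First merge: look up the price at each event's start time.
step1 <- merge(df1, start_prices,
               by.x = "starttime", by.y = "date_time", all.x = TRUE)

# Second merge: look up the price at each event's end time for the same ID.
result <- merge(step1, end_prices,
                by.x = c("ID", "endtime"), by.y = c("ID", "date_time"),
                all.x = TRUE)

print(result)
```

Because the second merge also matches on ID, an event that found no start price (and so has ID NA) will not pick up an end price either.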
