R: compare the date in one column with the dates in multiple other columns
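I have a dataset with 9 columns: an ID column and 8 date columns. I need to compare the date in one column, say ReferenceDate, with the dates in the other columns, say DateCol1, DateCol2, DateCol3, DateCol4, DateCol5, DateCol6, DateCol7.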
ID DateCol1 DateCol2 DateCol3 DateCol4 ReferenceDate DateCol5 DateCol6 DateCol7
12 2000-11-03 2007-05-17 2003-07-11 2014-03-19 2000-07-11 1999-10-06 2015-06-29 2014-07-06
17 2015-12-16 2017-07-26 2015-01-13 2020-01-30 2015-03-08 2007-07-30 2020-05-21 2010-10-09
19 2003-03-06 2011-02-23 2001-09-18 2001-04-05 2013-05-17 1999-10-02 2004-08-26 2019-04-15
23 2002-10-06 2019-03-12 1999-04-19 2008-04-03 2006-11-20 2000-11-15 2010-07-22 1999-05-27
22 2019-05-19 2014-11-17 2001-03-24 2003-07-03 2001-04-02 2017-06-03 2016-09-21 2013-07-13
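My goal is to create a Status column with Yes/No values indicating whether the date in the ReferenceDate column is earlier or later than any of the dates in the other 7 date columns (in the expected output below, Yes marks rows where at least one of those dates is earlier than ReferenceDate).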
ID DateCol1 DateCol2 DateCol3 DateCol4 ReferenceDate DateCol5 DateCol6 DateCol7 Status
12 2000-11-03 2007-05-17 2003-07-11 2014-03-19 2000-07-11 1999-10-06 2015-06-29 2014-07-06 Yes (DateCol5 earlier than Reference Date)
17 2015-12-16 2017-07-26 2015-01-13 2020-01-30 2015-03-08 2007-07-30 2020-05-21 2010-10-09 Yes (DateCol5 earlier than Reference Date)
19 2003-03-06 1981-02-23 2001-09-18 2001-04-05 2013-05-17 1999-10-02 2004-08-26 2019-04-15 Yes (DateCol2 earlier than Reference Date)
23 1992-10-06 2019-03-12 1999-04-19 2008-04-03 2006-11-20 2000-11-15 2010-07-22 1999-05-27 Yes (DateCol1 earlier than Reference Date)
22 2019-05-19 2014-11-17 2001-03-24 2003-07-03 2001-04-02 2017-06-03 2016-09-21 2013-07-13 No
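I think I could do this with lots of nested ifelse calls, but that would drive me crazy. I'd appreciate some help doing this more efficiently. Thanks in advance.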
Solution
Maybe this helps:
df$Status <- ifelse(rowSums(sapply(df[-1],`<`,df$ReferenceDate)) > 0,"Yes","No")
which gives the following (not sure why the desired output has "No" for the last row):
> df
ID DateCol1 DateCol2 DateCol3 DateCol4 ReferenceDate DateCol5
1 12 2000-11-03 2007-05-17 2003-07-11 2014-03-19 2000-07-11 1999-10-06
2 17 2015-12-16 2017-07-26 2015-01-13 2020-01-30 2015-03-08 2007-07-30
3 19 2003-03-06 2011-02-23 2001-09-18 2001-04-05 2013-05-17 1999-10-02
4 23 2002-10-06 2019-03-12 1999-04-19 2008-04-03 2006-11-20 2000-11-15
5 22 2003-05-19 2014-11-17 2001-03-24 2003-07-03 2014-04-02 2017-06-03
DateCol6 DateCol7 Status
1 2015-06-29 2014-07-06 Yes
2 2020-05-21 2010-10-09 Yes
3 2004-08-26 2019-04-15 Yes
4 2010-07-22 1999-05-27 Yes
5 2016-09-21 2013-07-13 Yes
Data:
> dput(df)
structure(list(ID = c(12L, 17L, 19L, 23L, 22L), DateCol1 = structure(c(11264, 16785, 12117, 11966, 12191), class = "Date"), DateCol2 = structure(c(13650, 17373, 15028, 17967, 16391), class = "Date"), DateCol3 = structure(c(12244, 16448, 11583, 10700, 11405), class = "Date"), DateCol4 = structure(c(16148, 18291, 11417, 13972, 12236), class = "Date"), ReferenceDate = structure(c(11149, 16502, 15842, 13472, 16162), class = "Date"), DateCol5 = structure(c(10870, 13724, 10866, 11276, 17320), class = "Date"), DateCol6 = structure(c(16615, 18403, 12656, 14812, 17065), class = "Date"), DateCol7 = structure(c(16257, 14891, 18001, 10738, 15899), class = "Date")), row.names = c(NA, -5L), class = "data.frame")
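The question's expected output also names the column that caused the Yes, which the one-liner above does not report. Below is a minimal base R sketch, using three rows copied from the question's sample (IDs 12, 17 and 22), that keeps the logical comparison matrix so it can list every earlier column per row. It also cross-checks the flag with pmin, since "some date is earlier than ReferenceDate" is the same as "the row-wise minimum is earlier than ReferenceDate". The names date_cols, earlier, EarlierCols and row_min are hypothetical, and listing all earlier columns (rather than the single one the question's annotation shows) is an assumption about what is wanted.

```r
# Three rows copied from the question's sample data (IDs 12, 17 and 22)
df <- data.frame(
  ID            = c(12L, 17L, 22L),
  DateCol1      = as.Date(c("2000-11-03", "2015-12-16", "2019-05-19")),
  DateCol2      = as.Date(c("2007-05-17", "2017-07-26", "2014-11-17")),
  DateCol3      = as.Date(c("2003-07-11", "2015-01-13", "2001-03-24")),
  DateCol4      = as.Date(c("2014-03-19", "2020-01-30", "2003-07-03")),
  ReferenceDate = as.Date(c("2000-07-11", "2015-03-08", "2001-04-02")),
  DateCol5      = as.Date(c("1999-10-06", "2007-07-30", "2017-06-03")),
  DateCol6      = as.Date(c("2015-06-29", "2020-05-21", "2016-09-21")),
  DateCol7      = as.Date(c("2014-07-06", "2010-10-09", "2013-07-13"))
)

# Every column except ID and ReferenceDate is a date to compare against
date_cols <- setdiff(names(df), c("ID", "ReferenceDate"))

# Logical matrix, one row per record and one column per date column:
# TRUE where that column's date falls before ReferenceDate
earlier <- vapply(df[date_cols], function(x) x < df$ReferenceDate,
                  logical(nrow(df)))

df$Status <- ifelse(rowSums(earlier) > 0, "Yes", "No")

# Names of all columns earlier than ReferenceDate (NA when there are none)
df$EarlierCols <- apply(earlier, 1, function(r) {
  if (any(r)) paste(names(r)[r], collapse = ", ") else NA_character_
})

# Cross-check: some date is earlier iff the row-wise minimum is earlier
row_min <- do.call(pmin, unname(as.list(df[date_cols])))
stopifnot(all((df$Status == "Yes") == (row_min < df$ReferenceDate)))

df[, c("ID", "Status", "EarlierCols")]
```

For ID 22 this flags DateCol3 (2001-03-24, before the ReferenceDate 2001-04-02), which is why a plain "any earlier date" rule yields Yes for the question's last row rather than the expected No.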
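Another option with rowSums: select the 'Date' columns, compare them with the 'ReferenceDate' column, check whether the rowSums output is not equal to 0, convert the logical result to a numeric index (by adding 1), and use it to pick "No" or "Yes":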
nm1 <- grep('^DateCol',names(df1),value = TRUE)
Or, if the column names don't follow the 'DateCol' pattern, perhaps:
nm1 <- setdiff(names(df1),c("ID","ReferenceDate"))
df1$flag <- c("No","Yes")[(rowSums(df1[nm1] > df1$ReferenceDate) != 0) + 1]
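The c("No", "Yes")[logical + 1] idiom works because R coerces FALSE/TRUE to 0/1 in arithmetic, so adding 1 turns each logical into position 1 ("No") or 2 ("Yes"). Note that this answer compares with >, which flags dates later than ReferenceDate, while the question's expected output marks earlier dates; only the operator differs. The sketch below, using two rows copied from the question (IDs 19 and 23), computes both directions with a hypothetical helper count_hits that compares column by column with vapply; the names count_hits, n_later, n_earlier, flag_later and flag_earlier are illustrative only.

```r
# The core trick: FALSE + 1 = 1 -> "No", TRUE + 1 = 2 -> "Yes"
c("No", "Yes")[c(TRUE, FALSE, TRUE) + 1]

# Two rows copied from the question's sample data (IDs 19 and 23)
df1 <- data.frame(
  ID            = c(19L, 23L),
  DateCol1      = as.Date(c("2003-03-06", "2002-10-06")),
  DateCol2      = as.Date(c("2011-02-23", "2019-03-12")),
  DateCol3      = as.Date(c("2001-09-18", "1999-04-19")),
  DateCol4      = as.Date(c("2001-04-05", "2008-04-03")),
  ReferenceDate = as.Date(c("2013-05-17", "2006-11-20")),
  DateCol5      = as.Date(c("1999-10-02", "2000-11-15")),
  DateCol6      = as.Date(c("2004-08-26", "2010-07-22")),
  DateCol7      = as.Date(c("2019-04-15", "1999-05-27"))
)
nm1 <- setdiff(names(df1), c("ID", "ReferenceDate"))

# Hypothetical helper: per row, count how many of `cols` satisfy cmp(col, ref).
# matrix() keeps the result two-dimensional even when d has a single row.
count_hits <- function(d, cols, ref, cmp) {
  hits <- vapply(d[cols], function(x) cmp(x, d[[ref]]), logical(nrow(d)))
  rowSums(matrix(hits, nrow = nrow(d)))
}

n_later   <- count_hits(df1, nm1, "ReferenceDate", `>`)
n_earlier <- count_hits(df1, nm1, "ReferenceDate", `<`)

df1$flag_later   <- c("No", "Yes")[(n_later != 0) + 1]
df1$flag_earlier <- c("No", "Yes")[(n_earlier != 0) + 1]
df1[, c("ID", "flag_later", "flag_earlier")]
```

For these two rows both flags are Yes, but the counts differ (1 and 3 later dates versus 6 and 4 earlier ones), which shows the two operators answer different questions.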
Another option, using rowwise from dplyr:
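The code for the rowwise approach did not survive; what follows is only a sketch of how such a solution could look, not the original answer. It assumes dplyr 1.0.0 or later (for c_across) and uses two rows copied from the question (IDs 12 and 22); the name res is hypothetical. Row-by-row evaluation is typically slower than the vectorised rowSums versions on large data.

```r
library(dplyr)

# Two rows copied from the question's sample data (IDs 12 and 22)
df <- data.frame(
  ID            = c(12L, 22L),
  DateCol1      = as.Date(c("2000-11-03", "2019-05-19")),
  DateCol2      = as.Date(c("2007-05-17", "2014-11-17")),
  DateCol3      = as.Date(c("2003-07-11", "2001-03-24")),
  DateCol4      = as.Date(c("2014-03-19", "2003-07-03")),
  ReferenceDate = as.Date(c("2000-07-11", "2001-04-02")),
  DateCol5      = as.Date(c("1999-10-06", "2017-06-03")),
  DateCol6      = as.Date(c("2015-06-29", "2016-09-21")),
  DateCol7      = as.Date(c("2014-07-06", "2013-07-13"))
)

res <- df %>%
  rowwise() %>%
  # c_across() collects this row's DateCol* values into a single Date vector;
  # Status is "Yes" if any of them falls before the row's ReferenceDate
  mutate(Status = if (any(c_across(starts_with("DateCol")) < ReferenceDate)) "Yes" else "No") %>%
  ungroup()

res %>% select(ID, ReferenceDate, Status)
```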