How to delete observations based on a pattern in R
I have a data frame of football injuries. Unfortunately, each injury is listed with several candidate teams. Here is part of the data frame:
df_x = data.frame(injury_id=c(250,250,100,328,329,330,15,5106,5106),player_id=c(109,109,39728,2374,26,59016,59016),season=c(2011,2011,2010,2012,2012),inury_from=c("2011-09-13","2011-09-13","2011-03-03","2011-04-21","2010-11-23","2010-10-01","2011-02-24","2012-09-16","2012-09-16"),injury_until=c("2011-09-27","2011-09-27","2011-03-17","2011-08-31","2011-03-14","2010-11-22","2011-02-28","2012-10-28","2012-10-28"),team_id=c(1,2,3,4,5,6,7,8,9),member_since=c("1998-07-01",NA,"2009-07-01","2008-07-01","2002-07-01","2012-07-01","2013-01-01","2011-07-01"))
My goal is exactly one row per injury ID. The result should look like the following data frame:
df_result_x = data.frame(injury_id=c(250,7),"2012-07-01"))
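Rules for choosing among observations that share the same injury_id:
- Delete rows that have NA in member_since.
- Delete all rows where member_since is later than injury_until.
- If duplicate observations still remain, keep the one with the later date in member_since.
Can I do this with a pipe, or do I have to use a loop?
Thanks.
Update 11-10-2020: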
df_x2 = data.frame(injury_id=c(250,9,"2011-07-01","2012-12-31"))
Solution
After grouping by 'injury_id', we can use slice to keep the first row of each group
library(dplyr)
df_x %>%
group_by(injury_id) %>%
slice(1) %>%
ungroup()
Or with distinct
df_x %>%
distinct(injury_id,.keep_all = TRUE)
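Or, if the NA elements are not in the right order, arrange by 'injury_id', then by a logical vector built from the NA elements of 'member_since' (so that NA rows come last), then by 'member_since' converted to Date. After that, use distinct to select the first unique row for each 'injury_id'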
df_x %>%
arrange(injury_id,is.na(member_since),as.Date(member_since)) %>%
distinct(injury_id,.keep_all = TRUE)
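To see why sorting on is.na(member_since) pushes the NA rows to the end, here is a minimal sketch on a small made-up data frame. The name toy and its values are assumptions for illustration only; they are not the asker's df_x.

```r
library(dplyr)

# Hypothetical toy data: injury 1 has two candidate rows, one with NA in member_since
toy <- data.frame(
  injury_id    = c(1, 1, 2),
  team_id      = c(10, 11, 12),
  member_since = c(NA, "2009-07-01", "2008-07-01")
)

toy_first <- toy %>%
  # FALSE sorts before TRUE, so rows with a known member_since come first
  # within each injury_id; ties are then ordered by the date itself
  arrange(injury_id, is.na(member_since), as.Date(member_since)) %>%
  # keep only the first row seen for each injury_id
  distinct(injury_id, .keep_all = TRUE)

print(toy_first)
```

Because the date is sorted in ascending order, this keeps the earliest non-NA member_since per injury. If the latest date is wanted instead, wrapping the date in desc() inside arrange should reverse that.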
Update
Based on the comments
df_x %>%
filter(!is.na(member_since)) %>%
mutate(injury_until = as.Date(injury_until),member_since = as.Date(member_since)) %>%
mutate(ind = injury_until - member_since) %>%
group_by(injury_id) %>%
filter(ind == min(ind[ind > 0])) %>%
select(-ind)
-Output
# A tibble: 7 x 7
# Groups: injury_id [7]
# injury_id player_id season inury_from injury_until team_id member_since
# <dbl> <dbl> <dbl> <chr> <date> <dbl> <date>
#1 250 109 2011 2011-09-13 2011-09-27 1 1998-07-01
#2 100 39728 2010 2011-03-03 2011-03-17 3 2009-07-01
#3 328 2374 2010 2011-04-21 2011-08-31 4 2008-07-01
#4 329 2374 2010 2010-11-23 2011-03-14 4 2008-07-01
#5 330 2374 2010 2010-10-01 2010-11-22 4 2008-07-01
#6 15 26 2010 2011-02-24 2011-02-28 6 2002-07-01
#7 5106 59016 2012 2012-09-16 2012-10-28 7 2012-07-01
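As an alternative sketch, the three rules from the question can also be written out one step at a time. This is not the code from the answer above: it runs on a made-up data frame (toy and its values are assumptions), it uses across() and slice_max(), which need dplyr 1.0.0 or later, and it keeps rows where member_since equals injury_until, whereas the ind > 0 filter above drops them.

```r
library(dplyr)

# Hypothetical toy data: two injuries, each with several candidate teams
toy <- data.frame(
  injury_id    = c(1, 1, 1, 2, 2),
  team_id      = c(10, 11, 12, 20, 21),
  injury_until = c("2011-09-27", "2011-09-27", "2011-09-27",
                   "2012-10-28", "2012-10-28"),
  member_since = c("1998-07-01", NA, "2010-07-01",
                   "2012-07-01", "2013-01-01")
)

toy_result <- toy %>%
  # convert both date columns from character to Date
  mutate(across(c(injury_until, member_since), as.Date)) %>%
  # rule 1: drop rows with NA in member_since
  # rule 2: drop rows where member_since is later than injury_until
  filter(!is.na(member_since), member_since <= injury_until) %>%
  group_by(injury_id) %>%
  # rule 3: among the remaining rows, keep the latest member_since
  slice_max(member_since, n = 1, with_ties = FALSE) %>%
  ungroup()

print(toy_result)
```

Setting with_ties = FALSE guarantees a single row per injury_id even when two teams share the same member_since date.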