How to solve: R - find the first non-zero element of each group in a data.table
I have a data table in R that looks like this:
State City Maturing Soil 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
1: PR CityA Early SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 40 40 40 40
2: PR CityA Early SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 20 20 20 20 20 20
3: PR CityA Early CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 40 30 30 20 20 20 20 20 20
4: PR CityA Medium SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 40 40 40 40 40
5: PR CityA Medium SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 20 20 20 20 20 20 30
6: PR CityA Medium CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 20 20 20 20 20 20 20
7: PR CityA Late SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 30 30 30 40 40 0
8: PR CityA Late SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 20 20 20 20 20 30 30
9: PR CityA Late CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 30 20 20 20 20 20 20 20 20 20
10: PR CityB Early SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 30 30 30 30 30
11: PR CityB Early SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 20 20 20 20 20 20 20
12: PR CityB Early CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 30 20 20 20 20 20 20 20 20 20
13: PR CityB Medium SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 30 20 20 30 30 30
14: PR CityB Medium SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 30 30 20 20 20 20 20 20 20 20
15: PR CityB Medium CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 20 20 20 20 20 20 20 20 20 20
16: PR CityB Late SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 20 20 20 20 20 30 40
17: PR CityB Late SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 20 20 20 20 20 20 20 20 20 20
18: PR CityB Late CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 20 20 20 20 20 20 20 20 20 20
19: RS CityC Early SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 40 40 0
20: RS CityC Early SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 30 30 30 40
21: RS CityC Early CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 30 30 20 30 20 30 30
22: RS CityC Medium SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 40 40 40 0 0
23: RS CityC Medium SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 30 30 30 30 30 30 0
24: RS CityC Medium CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 30 30 20 20 20 30 40
25: RS CityC Late SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 40 30 30 30 40 0 0
26: RS CityC Late SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 30 30 20 30 30 40 0
27: RS CityC Late CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 20 20 20 20 20 20 30 40
28: RS CityD Early SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 30 30 30 30 30 30 30 40
29: RS CityD Early SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 20 20 20 20 20 20 20 20 30
30: RS CityD Early CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 20 20 20 20 20 20 20 20 20
31: RS CityD Medium SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 30 30 30 20 20 30 30 0
32: RS CityD Medium SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 20 20 20 20 20 20 20 20 40
33: RS CityD Medium CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 20 20 20 20 20 20 20 20 20
34: RS CityD Late SANDY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 30 20 20 20 20 20 30 40 0
35: RS CityD Late SILT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30 20 20 20 20 20 20 20 30 0
36: RS CityD Late CLAY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 20 20 20 20 20 20 20 20 30
State City Maturing Soil 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
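The columns named 1 to 36 represent the ten-day periods of the year in which planting is recommended. I want to extract the earliest recommended planting date for each combination of State, Maturing, and Soil. In other words, I want the name of the first ten-day-period column that is not 0 for each of those groups.
For the example above, the expected result is:
State Maturing Soil Earliest
PR Early SANDY 30
PR Early SILT 26
PR Early CLAY 26
PR Medium SANDY 27
PR Medium SILT 26
PR Medium CLAY 26
PR Late SANDY 26
PR Late SILT 26
PR Late CLAY 26
RS Early SANDY 28
RS Early SILT 27
RS Early CLAY 27
RS Medium SANDY 27
RS Medium SILT 27
RS Medium CLAY 27
RS Late SANDY 27
RS Late SILT 27
RS Late CLAY 27
How can I do this? (The answers below assume the table is stored in a data.table named dat.)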
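Solution
You can do it like this:
# Replace zeros with NA so that coalesce() can skip them
is.na(dat) <- dat == 0
# Keep the four key columns and take the first non-NA value across the period columns
dat[, cbind(.SD[, 1:4], Earliest = dplyr::coalesce(!!!.SD[, -(1:4)]))]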
State City Maturing Soil Earliest
1: PR CityA Early SANDY 40
2: PR CityA Early SILT 30
3: PR CityA Early CLAY 40
4: PR CityA Medium SANDY 40
5: PR CityA Medium SILT 40
6: PR CityA Medium CLAY 40
7: PR CityA Late SANDY 40
8: PR CityA Late SILT 40
9: PR CityA Late CLAY 30
10: PR CityB Early SANDY 40
11: PR CityB Early SILT 40
12: PR CityB Early CLAY 30
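Note that the Earliest column above holds the first non-zero value of each row, not the period in which it occurs. A minimal sketch of why, assuming dplyr is installed and using three made-up vectors (p1, p2, p3 are hypothetical stand-ins for period columns): coalesce() goes through its arguments from left to right and keeps, element by element, the first value that is not NA.

```r
library(dplyr)

# Hypothetical stand-ins for three period columns, zeros already replaced by NA
p1 <- c(NA_real_, NA_real_, NA_real_)
p2 <- c(NA, 40, NA)
p3 <- c(40, 30, 20)

# Element-wise: the first non-NA value across p1, p2, p3
first_value <- coalesce(p1, p2, p3)
first_value
```

The result carries planting values (40, 40, 20 here), so a separate step is needed if the column position is what you are after.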
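Edit:
If you need the column number instead, you can do:
# Run this on the original dat, before the zeros are replaced by NA
dat[, cbind(.SD[, 1:4], Earliest = max.col(.SD[, -(1:4)] > 0, ties.method = "first"))]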
State City Maturing Soil Earliest
1: PR CityA Early SANDY 31
2: PR CityA Early SILT 30
3: PR CityA Early CLAY 26
4: PR CityA Medium SANDY 30
5: PR CityA Medium SILT 28
6: PR CityA Medium CLAY 26
7: PR CityA Late SANDY 29
8: PR CityA Late SILT 26
9: PR CityA Late CLAY 26
10: PR CityB Early SANDY 30
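To see what max.col() does with ties.method = "first", here is a small base R sketch on a made-up matrix (the values and the 5-column layout are assumptions for illustration only). Comparing with 0 gives a logical matrix, the row maximum is TRUE, and "first" picks the leftmost TRUE in each row.

```r
# Hypothetical toy matrix: 3 rows, columns named "1" to "5"
m <- matrix(
  c(0,  0, 40, 40,  0,
    0, 30, 20, 20, 20,
    0,  0,  0,  0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, as.character(1:5))
)

# Position of the first non-zero entry in each row
first_idx <- max.col(m > 0, ties.method = "first")
first_idx

# Map the position back to the column name
first_name <- colnames(m)[first_idx]
first_name

# Caveat: in an all-zero row every column ties, so position 1 is returned
all_zero <- max.col(matrix(0, nrow = 1, ncol = 5) > 0, ties.method = "first")
all_zero
```

The sample data in the question has no all-zero rows, but if yours might, it is probably worth checking for them before trusting the index.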
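Update
I have modified my code to get the result you want, but it is not the data.table solution you would prefer. The point we missed is that the combinations of the grouping variables are not always unique, so this is not a purely row-wise operation: the rows have to be grouped and the earliest date taken within each group. The only issue with my output is that the level order of the Maturing and Soil variables differs from the order in your expected output. That can be fixed.
library(dplyr)
library(tidyr)
library(purrr)
dat %>%
  # For each row, take the name of the first non-zero period column
  mutate(Earliest = pmap(dat %>%
                           select(`1`:`36`), ~ names(c(...))[c(...) != 0][1])) %>%
  select(-c(`1`:`36`)) %>%
  unnest(cols = c(Earliest)) %>%
  group_by(State, Maturing, Soil) %>%
  mutate(Earliest = as.numeric(Earliest)) %>%
  # Keep the earliest period within each group
  summarise(across(Earliest, ~ min(.x))) %>%
  ungroup()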
# A tibble: 18 x 4
State Maturing Soil Earliest
<chr> <chr> <chr> <dbl>
1 PR Early CLAY 26
2 PR Early SANDY 30
3 PR Early SILT 26
4 PR Late CLAY 26
5 PR Late SANDY 26
6 PR Late SILT 26
7 PR Medium CLAY 26
8 PR Medium SANDY 27
9 PR Medium SILT 26
10 RS Early CLAY 27
11 RS Early SANDY 28
12 RS Early SILT 27
13 RS Late CLAY 27
14 RS Late SANDY 27
15 RS Late SILT 27
16 RS Medium CLAY 27
17 RS Medium SANDY 27
18 RS Medium SILT 27
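The point about non-unique group combinations can be seen on a small case. The following is only a sketch in data.table, on made-up data (toy, period_cols and first_period are hypothetical names): two cities fall into the same State/Maturing/Soil group but start in different periods, so the row-wise step yields two candidates and the grouped step keeps the smaller one.

```r
library(data.table)

# Hypothetical toy data: CityA and CityB share one
# (State, Maturing, Soil) group but start in different periods
toy <- data.table(
  State    = c("PR", "PR", "RS"),
  City     = c("CityA", "CityB", "CityC"),
  Maturing = c("Early", "Early", "Late"),
  Soil     = c("SANDY", "SANDY", "CLAY"),
  `1` = c(0, 0, 0),
  `2` = c(0, 40, 0),
  `3` = c(40, 30, 20)
)
period_cols <- as.character(1:3)

# Step 1 (row-wise): first non-zero period for every row
toy[, first_period := as.integer(period_cols)[
  max.col(as.matrix(.SD) > 0, ties.method = "first")
], .SDcols = period_cols]

# Step 2 (grouped): earliest of those periods within each group
res <- toy[, .(Earliest = min(first_period)), by = .(State, Maturing, Soil)]
res
```

Here the PR/Early/SANDY group ends up with period 2, because CityB starts one period before CityA.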
Inspired by Anoushiravan's solution (very well done), I tried a solution that uses only dplyr and tidyr and keeps the order the OP asked for.
Here is my solution (with comments):
library(dplyr)
library(tidyr)
# Relevel Soil so the output matches the desired order
order_Soil <- c("SANDY", "SILT", "CLAY")
dat1 <- dat %>%
  select(where(~ any(. != 0)), -City) %>% # drop the all-zero columns and City
  pivot_longer( # reshape to long format
    cols = c(`26`:`36`), names_to = "Names", values_to = "Values"
  ) %>%
  mutate(Soil = factor(Soil, # keep the desired order
                       levels = order_Soil)) %>%
  filter(Values != 0) %>% # drop the zero entries
  group_by(State, Maturing, Soil) %>%
  # Names is character; the remaining names all have two digits,
  # so the string minimum matches the numeric one
  summarise(Earliest = min(Names))
Output:
State Maturing Soil Earliest
<chr> <chr> <fct> <chr>
1 PR Early SANDY 30
2 PR Early SILT 26
3 PR Early CLAY 26
4 PR Late SANDY 26
5 PR Late SILT 26
6 PR Late CLAY 26
7 PR Medium SANDY 27
8 PR Medium SILT 26
9 PR Medium CLAY 26
10 RS Early SANDY 28
11 RS Early SILT 27
12 RS Early CLAY 27
13 RS Late SANDY 27
14 RS Late SILT 27
15 RS Late CLAY 27
16 RS Medium SANDY 27
17 RS Medium SILT 27
18 RS Medium CLAY 27
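Solution:
- select the name of the first non-zero element in each row
- group by State, Maturing, Soil
- set the min of each group as Earliest
# Note: := adds the Earliest column to dat by reference before the grouped step
dat[, Earliest := apply(
  .SD[, -(1:4)], 1, function(x) as.numeric(names(which(x != 0)[1]))
)][, .(Earliest = min(Earliest)), by = .(State, Maturing, Soil)]
Output: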
State Maturing Soil Earliest
1: PR Early SANDY 30
2: PR Early SILT 26
3: PR Early CLAY 26
4: PR Medium SANDY 27
5: PR Medium SILT 26
6: PR Medium CLAY 26
7: PR Late SANDY 26
8: PR Late SILT 26
9: PR Late CLAY 26
10: RS Early SANDY 28
11: RS Early SILT 27
12: RS Early CLAY 27
13: RS Medium SANDY 27
14: RS Medium SILT 27
15: RS Medium CLAY 27
16: RS Late SANDY 27
17: RS Late SILT 27
18: RS Late CLAY 27
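The row-wise step in the list above relies on which() keeping the names of the elements it selects. A short base R sketch on a made-up row (x is a hypothetical stand-in for what apply() passes to the function, a numeric vector named by period column):

```r
# Hypothetical row: values named by period column
x <- c(`1` = 0, `2` = 0, `3` = 40, `4` = 30)

# Positions of the non-zero entries, still carrying the column names
nz <- which(x != 0)
nz

# Name of the first non-zero entry, then the same thing as a number
first_name <- names(nz[1])
first_period <- as.numeric(first_name)
first_period
```

Because the period columns are named with numbers, converting the name with as.numeric() gives a value that min() can then compare within each group.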
Here is a data.table approach:
dat_long = melt(
  data = dat, measure.vars = as.character(1:36), # column names to be melted
  variable.name = 'period', variable.factor = FALSE
)
res = dat_long[
  value > 0, # we're looking for non-zero periods
  .(Earliest = min(as.integer(period))), # extract the minimum (first) period
  by = .(State, Maturing, Soil) # grouping variables
]
res
# State Maturing Soil Earliest
# 1: PR Early CLAY 26
# 2: PR Medium CLAY 26
# 3: PR Late SILT 26
# 4: PR Late CLAY 26
# 5: PR Early SILT 26
# 6: PR Medium SILT 26
# 7: PR Late SANDY 26
# 8: PR Medium SANDY 27
# 9: RS Early SILT 27
# 10: RS Early CLAY 27
# 11: RS Medium SANDY 27
# 12: RS Medium SILT 27
# 13: RS Medium CLAY 27
# 14: RS Late SANDY 27
# 15: RS Late SILT 27
# 16: RS Late CLAY 27
# 17: RS Early SANDY 28
# 18: PR Early SANDY 30
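Bottom line: convert your data to long format and the computation becomes very easy (and is most likely more efficient in long format as well).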
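As a further illustration of how little is needed once the data is long, here is a sketch that returns both the earliest period and the value recorded in it. It runs on made-up data (toy, toy_long and E_val are hypothetical names, the last one borrowed from another answer in this thread), not on the original dat.

```r
library(data.table)

# Hypothetical toy data with three period columns
toy <- data.table(
  State    = c("PR", "PR", "RS"),
  City     = c("CityA", "CityB", "CityC"),
  Maturing = c("Early", "Early", "Late"),
  Soil     = c("SANDY", "SANDY", "CLAY"),
  `1` = c(0, 0, 0),
  `2` = c(0, 40, 0),
  `3` = c(40, 30, 20)
)

# Wide to long: one row per (original row, period)
toy_long <- melt(
  toy,
  id.vars = c("State", "City", "Maturing", "Soil"),
  variable.name = "period", value.name = "value",
  variable.factor = FALSE
)
toy_long[, period := as.integer(period)]

# Keep non-zero entries, sort by period, take the first row of each group
res <- toy_long[value > 0][
  order(period),
  .(Earliest = period[1], E_val = value[1]),
  by = .(State, Maturing, Soil)
]
res
```

If several rows of a group share the earliest period, value[1] simply takes whichever of them comes first after sorting, so a different tie rule would need an extra ordering column.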
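Although many of the approaches above are very good, I found that this can be done easily with max.col. Here is an approach that uses only dplyr:
library(dplyr)
dat %>%
  # Position of the first non-zero period column in each row
  mutate(Earliest = max.col(.[, -c(1:4)] > 0, ties.method = "first")) %>%
  group_by(State, Maturing, Soil) %>%
  summarise(Earliest = min(Earliest), .groups = 'drop')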
# A tibble: 18 x 4
State Maturing Soil Earliest
<chr> <chr> <chr> <int>
1 PR Early CLAY 26
2 PR Early SANDY 30
3 PR Early SILT 26
4 PR Late CLAY 26
5 PR Late SANDY 26
6 PR Late SILT 26
7 PR Medium CLAY 26
8 PR Medium SANDY 27
9 PR Medium SILT 26
10 RS Early CLAY 27
11 RS Early SANDY 28
12 RS Early SILT 27
13 RS Late CLAY 27
14 RS Late SANDY 27
15 RS Late SILT 27
16 RS Medium CLAY 27
17 RS Medium SANDY 27
18 RS Medium SILT 27
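Also, if both the value and the index are needed, that can be done in dplyr alone with the following syntax:
dat %>%
  mutate(Earliest = names(.[, -(1:4)])[max.col(.[, -(1:4)] > 0, ties.method = "first")]) %>%
  rowwise() %>%
  mutate(E_val = get(Earliest)) %>% # value stored in the column named by Earliest
  group_by(State, Maturing, Soil) %>%
  summarise(E_val = first(E_val[Earliest == min(Earliest)]), Earliest = min(Earliest), .groups = 'drop')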
# A tibble: 18 x 5
State Maturing Soil E_val Earliest
<chr> <chr> <chr> <int> <chr>
1 PR Early CLAY 40 26
2 PR Early SANDY 40 30
3 PR Early SILT 40 26
4 PR Late CLAY 30 26
5 PR Late SANDY 40 26
6 PR Late SILT 40 26
7 PR Medium CLAY 40 26
8 PR Medium SANDY 40 27
9 PR Medium SILT 30 26
10 RS Early CLAY 20 27
11 RS Early SANDY 30 28
12 RS Early SILT 30 27
13 RS Late CLAY 20 27
14 RS Late SANDY 40 27
15 RS Late SILT 30 27
16 RS Medium CLAY 20 27
17 RS Medium SANDY 40 27
18 RS Medium SILT 30 27