How to use the `separate` function to split column names
Data
[1] "average_on_return_belief_01" "average_on_return_belief_02"
[3] "average_on_return_belief_03" "average_on_return_belief_04"
[5] "average_on_return_belief_05" "average_on_return_belief_06"
[7] "average_on_return_belief_07" "average_on_return_belief_08"
[9] "average_on_return_belief_09" "average_on_return_belief_10"
[11] "average_on_return_belief_11" "average_on_return_belief_12"
[13] "average_on_send_belief_01" "average_on_send_belief_02"
[15] "average_on_send_belief_03" "average_on_send_belief_04"
[17] "average_on_send_belief_05" "average_on_send_belief_06"
[19] "average_on_send_belief_07" "average_on_send_belief_08"
[21] "average_on_send_belief_09" "average_on_send_belief_10"
[23] "average_on_send_belief_11" "average_on_send_belief_12"
[25] "sender_decision_01" "sender_decision_02"
[27] "sender_decision_03" "sender_decision_04"
[29] "sender_decision_05" "sender_decision_06"
[31] "sender_decision_07" "sender_decision_08"
[33] "sender_decision_09" "sender_decision_10"
[35] "sender_decision_11" "sender_decision_12"
...
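I need to use the `separate` function (from tidyr, not stringr), or the `names_sep` argument of `pivot_longer`, to split these names into two columns: the parameter name and the trailing two-digit suffix.
The possible prefixes are stored in a vector:
to_retrieve <- c("average_on_return_belief_","average_on_send_belief_","sender_decision_","return_decision_","receiver_belief_","sender_belief_" )
but I don't know how to feed them into `separate` so that it splits names such as `sender_belief_01`.
Any ideas or hints are very welcome.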
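Solution
I don't think you need to pass `to_retrieve` explicitly. You can use `extract` with a regular expression to split one column into two new columns. Because the first `(.*)` is greedy, the split happens at the last underscore.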
tidyr::extract(df,cols,c('col1','col2'),'(.*)_(.*)')
# col1 col2
# <chr> <chr>
# 1 average_on_return_belief 01
# 2 average_on_return_belief 02
# 3 average_on_return_belief 11
# 4 average_on_return_belief 12
# 5 average_on_send_belief 01
# 6 average_on_send_belief 02
# 7 average_on_send_belief 11
# 8 average_on_send_belief 12
# 9 sender_decision 01
#10 sender_decision 02
#11 sender_decision 11
#12 sender_decision 12
Data
df <- structure(list(cols = c("average_on_return_belief_01","average_on_return_belief_02","average_on_return_belief_11","average_on_return_belief_12","average_on_send_belief_01","average_on_send_belief_02","average_on_send_belief_11","average_on_send_belief_12","sender_decision_01","sender_decision_02","sender_decision_11","sender_decision_12")),row.names = c(NA,-12L),class = c("tbl_df","tbl","data.frame"))
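If you do want to restrict the match to the known prefixes, one possible sketch is to build the regular expression from `to_retrieve` and pass it to `extract`. The names `example_df`, `prefixes`, `pattern` and `res` are made up for this illustration, and the three sample values are only a subset of the column names in the question.

```r
library(tidyr)

# Prefix vector copied from the question
to_retrieve <- c("average_on_return_belief_", "average_on_send_belief_",
                 "sender_decision_", "return_decision_",
                 "receiver_belief_", "sender_belief_")

# Hypothetical sample of the column names
example_df <- data.frame(
  cols = c("average_on_return_belief_01", "sender_decision_12", "sender_belief_03"),
  stringsAsFactors = FALSE
)

# Drop the trailing underscore from each prefix, then join them with "|"
prefixes <- sub("_$", "", to_retrieve)
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")_(\\d+)$")

# Group 1 is the prefix, group 2 is the numeric suffix
res <- extract(example_df, cols, c("col1", "col2"), pattern)
res
```

Names that do not start with one of the listed prefixes would not match the pattern and would come back as `NA` in both columns, which can be a useful check.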
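Alternatively, `separate` accepts a regular expression as its separator: split on the underscore (`_`) that is immediately followed by one or more digits (`\\d+`) at the end of the string (`$`). The lookahead `(?=...)` keeps the digits out of the match, so only the underscore is consumed.
library(tidyr)
separate(df, cols, into = c("col1", "col2"), sep = "_(?=\\d+$)")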
# A tibble: 12 x 2
# col1 col2
# <chr> <chr>
# 1 average_on_return_belief 01
# 2 average_on_return_belief 02
# 3 average_on_return_belief 11
# 4 average_on_return_belief 12
# 5 average_on_send_belief 01
# 6 average_on_send_belief 02
# 7 average_on_send_belief 11
# 8 average_on_send_belief 12
# 9 sender_decision 01
#10 sender_decision 02
#11 sender_decision 11
#12 sender_decision 12
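Since the question mentions `pivot_longer`, the same lookahead pattern can in principle be given to its `names_sep` argument, which the tidyr documentation describes as taking the same specification as `separate`. The sketch below assumes a small made-up wide table (`wide`, with an `id` column and invented values); the output column names `parameter` and `round` are also assumptions, not names from the question.

```r
library(tidyr)

# Hypothetical wide data using four of the column names from the question
wide <- data.frame(
  id = 1:2,
  sender_decision_01 = c(5, 6),
  sender_decision_02 = c(7, 8),
  average_on_send_belief_01 = c(1.5, 2.5),
  average_on_send_belief_02 = c(3.5, 4.5)
)

# Split each column name at the underscore that precedes the trailing digits
long <- pivot_longer(
  wide,
  cols = -id,
  names_to = c("parameter", "round"),
  names_sep = "_(?=\\d+$)"
)
long
```

This skips the intermediate step of pulling the names into a separate data frame, at the cost of putting the regex inside the reshaping call.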
Here is a data.table option using `tstrsplit`, following the same regex pattern as @akrun:
library(data.table)
setDT(df)[, setNames(tstrsplit(cols, "_(?=\\d+$)", perl = TRUE), c("col1", "col2"))]
which gives
col1 col2
1: average_on_return_belief 01
2: average_on_return_belief 02
3: average_on_return_belief 11
4: average_on_return_belief 12
5: average_on_send_belief 01
6: average_on_send_belief 02
7: average_on_send_belief 11
8: average_on_send_belief 12
9: sender_decision 01
10: sender_decision 02
11: sender_decision 11
12: sender_decision 12
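For completeness, the same pattern also works without extra packages. This is a minimal base R sketch using `strsplit` with `perl = TRUE`; the vector `cols` holds three sample names taken from the question, and `parts` and `result` are names chosen for this illustration.

```r
# Three of the column names from the question
cols <- c("average_on_return_belief_01", "average_on_send_belief_12", "sender_decision_02")

# perl = TRUE is needed for the lookahead; each element splits into two pieces
parts <- strsplit(cols, "_(?=\\d+$)", perl = TRUE)

# Collect the first and second piece of every split into two columns
result <- data.frame(
  col1 = vapply(parts, `[`, character(1), 1),
  col2 = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
result
```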