How to reshape a data frame in R to create pairwise combinations based on another column
name company
A Mazda
B Benz
C Mazda
D Toyota
E Benz
F Mazda
E BMW
G Benz
A Toyota
C Toyota
B BMW
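The sample data above can be reproduced with a short snippet. This is a minimal sketch: the object name `df` is an assumption (chosen to match the name used in the answer), and the columns are stored as character vectors rather than factors.

```r
# Sample data from the question; the object name `df` is an assumption
df <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW"),
  stringsAsFactors = FALSE
)
```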
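I want to use dplyr to list every combination of two names that have worked at the same company. The order of the names does not matter; for example, the combination A and C is the same as the combination C and A. The desired result is: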
name 1 name 2 company
A C Mazda
A C Toyota
A F Mazda
B E Benz
B E BMW
C F Mazda
B G Benz
E G Benz
D C Toyota
A D Toyota
Solution
Here is an approach using purrr:
library(dplyr)
library(purrr)
# Function that takes a data frame containing rows for exactly one company
company_combination <- function(data) {
  stopifnot(length(unique(data$company)) == 1)
  # Unique names recorded for this company
  people <- unique(data[["name"]])
  if (length(people) < 2) {
    # Return an empty tibble when no pair can be formed
    names_comb <- tibble()
  } else {
    # Get the company name
    company <- first(data[["company"]])
    # Build every two-name combination; combn() returns one pair per column, so transpose
    names_comb <- t(combn(x = people, m = 2))
    # Name the columns name_1 and name_2 before converting to a tibble
    colnames(names_comb) <- c("name_1", "name_2")
    # Add the company name to the combinations
    # (plain assignment is used because %<>% would require library(magrittr))
    names_comb <- as_tibble(names_comb) %>% mutate(company = company)
  }
  names_comb
}
# There are three steps here
df %>%
  # 1. Split the original data into a list of data frames, one per company
  split(.$company) %>%
  # 2. Apply company_combination() to each company's data frame;
  #    the result is a new list of data frames
  map(company_combination) %>%
  # 3. Bind them back together into a single data frame
  bind_rows()
The result is as follows:
# A tibble: 10 x 3
name_1 name_2 company
<chr> <chr> <chr>
1 B E Benz
2 B G Benz
3 E G Benz
4 B E BMW
5 A C Mazda
6 A F Mazda
7 C F Mazda
8 A C Toyota
9 A D Toyota
10 C D Toyota
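Since the question asks for dplyr, the same pairs can also be obtained with a self-join on `company`. This is a minimal sketch rather than the answer's method: it assumes `df` has the character columns `name` and `company` shown in the question, and the object names `people` and `pairs` are illustrative. Keeping only rows where `name_1 < name_2` drops self-pairs and keeps one ordering of each pair.

```r
library(dplyr)

# Sample data from the question; the object name `df` is an assumption
df <- tibble(
  name = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW")
)

# One row per (name, company) so repeated records do not create duplicate pairs
people <- distinct(df, name, company)

# Join the table to itself on company, then keep each unordered pair once.
# dplyr >= 1.1.0 warns about the many-to-many match here; adding
# relationship = "many-to-many" to inner_join() silences that warning.
pairs <- inner_join(people, people, by = "company", suffix = c("_1", "_2")) %>%
  filter(name_1 < name_2) %>%
  select(name_1, name_2, company) %>%
  arrange(company, name_1, name_2)

pairs
```

The row order within `pairs` depends on how `arrange()` sorts the company names in the installed dplyr version, so it may differ from the order printed above even though the set of pairs is the same.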