How to create new columns based on a secondary ID column, using data attached to the primary ID column
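This is a little hard to explain. I have a set of speed dating data off of Kaggle, with a Subject_ID column and a Partner_ID column (that is how I renamed the data). Each Subject_ID has columns such as race and gender attached to it, but every subject also appears as a partner elsewhere in the dataset. I want to create Partner_Race and Partner_Gender columns based on the columns I renamed to Subject_Gender and Subject_Race.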
Edit: to clarify, the people in Partner_ID are the same people as in Subject_ID, and they use the same ID numbers. They are just placed in a different column.
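I'm really lost on the logical steps needed to do this. My real data is of course much longer than six observations, or I would just do it by hand. I'd prefer a dplyr or plyr approach, but if that isn't possible, anything else is fine.
My data looks like this: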
Subject_ID Partner_ID Subject_Race Subject_Gender
1 6 Caucasian Female
2 5 Asian Male
3 4 African_American Female
4 3 Other Female
5 2 Latin Male
6 1 NA Male
This is what I want to create:
Subject_ID Partner_ID Subject_Race Subject_Gender Partner_Race Partner_Gender
1 6 Caucasian Female NA Male
2 5 Asian Male Latin Male
3 4 African_American Female Other Female
4 3 Other Female African_American Female
5 2 Latin Male Asian Male
6 1 NA Male Caucasian Female
I'm still getting the hang of the basics of data cleaning and manipulation, and this is over my head.
Solution
You can join the data to itself, matching the Subject_ID column of one copy against the Partner_ID column of the other.
df <- read.table(text = "Subject_ID Partner_ID Subject_Race Subject_Gender
1 6 Caucasian Female
2 5 Asian Male
3 4 African_American Female
4 3 Other Female
5 2 Latin Male
6 1 NA Male", header = TRUE)
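library(dplyr)
# Self-join: match each row's Subject_ID against Partner_ID in a second copy of df.
# The matched row is the partner's own row, which works here because every pair
# appears in both directions (1-6 and 6-1, 2-5 and 5-2, ...).
df %>%
  left_join(df, by = c("Subject_ID" = "Partner_ID"), suffix = c("", "_Partner")) %>%
  # Drop the partner's own ID (it duplicates Partner_ID), then rename the partner's columns.
  select(-Subject_ID_Partner) %>%
  rename(Partner_Race = Subject_Race_Partner, Partner_Gender = Subject_Gender_Partner)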
Output:
Subject_ID Partner_ID Subject_Race Subject_Gender Partner_Race Partner_Gender
1 1 6 Caucasian Female <NA> Male
2 2 5 Asian Male Latin Male
3 3 4 African_American Female Other Female
4 4 3 Other Female African_American Female
5 5 2 Latin Male Asian Male
6 6 1 <NA> Male Caucasian Female
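The self-join above matches the six sample rows because each pair is listed in both directions. If the full Kaggle data has subjects who meet several partners (an assumption about the real data, not something shown here), a variant that looks up each partner's attributes directly by Partner_ID may be safer. The following is a minimal sketch of that idea; the name partner_lookup is hypothetical, and it assumes each Subject_ID always carries the same race and gender.

```r
library(dplyr)

df <- read.table(text = "Subject_ID Partner_ID Subject_Race Subject_Gender
1 6 Caucasian Female
2 5 Asian Male
3 4 African_American Female
4 3 Other Female
5 2 Latin Male
6 1 NA Male", header = TRUE)

# Lookup table: one row per person, with the subject columns renamed
# so that they describe that person in the partner role.
# `partner_lookup` is a hypothetical name used only for this sketch.
partner_lookup <- df %>%
  distinct(Subject_ID, Subject_Race, Subject_Gender) %>%
  rename(
    Partner_ID     = Subject_ID,
    Partner_Race   = Subject_Race,
    Partner_Gender = Subject_Gender
  )

# Attach each partner's attributes by matching on Partner_ID.
result <- df %>%
  left_join(partner_lookup, by = "Partner_ID")

# Base R equivalent for a single column: match() finds the row
# where each partner appears as a subject.
partner_race_base <- df$Subject_Race[match(df$Partner_ID, df$Subject_ID)]
```

Here result has the same six rows as df plus Partner_Race and Partner_Gender. Because the lookup has one row per person, the row count should not grow even when a subject appears in many rows.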