
How to merge three data frames in R, matching on either of two variables depending on a condition

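R users, I would like to combine data from three different data frames (studentsPublic, studentsPrivate, studentState) into a single data frame called Final_Desired_df, matching students on either their email address or their social security number (ssn). The example below shows what I need, followed by a description of Final_Desired_df. Thanks in advance for your help.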

studentsPublic = randomNames::randomNames(10)
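# padded with two NAs (assumption) so that all columns have 10 rows and data.frame() accepts them
emailPublic = c('a@usa.com',NA,'b@usa.com','c@usa.com','d@usa.com','e@usa.com','f@usa.com','g@usa.com',NA,NA)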
exampublic = rnorm(10,mean=15,sd=5)
d1_PublicSchool = data.frame(studentsPublic,emailPublic,exampublic)

studentsPrivate = randomNames::randomNames(10)
emailPivate = c('t@usa.com',NA)
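# padded with two NAs (assumption) so that all columns have 10 rows and data.frame() accepts them
ssnPrivate = c(NA,12,34,45,67,32,23,NA,NA,NA)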
exanPrivate = rnorm(10,sd=5)
d2_PrivateSchool = data.frame(studentsPrivate,emailPivate,ssnPrivate,exanPrivate)

studentsstate = randomNames::randomNames(30)
emailState = c('a@usa.com','g@usa.com')
ssnState = c(NA,NA)
sexState = rep(c('male','female'),15,15)
d3_StateSchools = data.frame(studentsstate,emailState,ssnState,sexState)

Final_Desired_df = should include all students from d1_PublicSchool whose email addresses are in d3_StateSchools, plus all students from d2_PrivateSchool whose emailPivate is in d3_StateSchools or whose ssnPrivate is in d3_StateSchools.

Thanks in advance.

Solution

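How about this? I had to rename the columns so the pieces could be bound into the final data frame, and I added a last step that removes duplicate rows.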
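The renaming matters because rbind() on data frames matches columns by name and stops with an error when the names differ. A minimal sketch with made-up toy data (the frames pub and priv and their values are hypothetical, not taken from the question):

```r
# Toy frames (hypothetical): same kind of information, different column names
pub  <- data.frame(studentsPublic  = "Ann", exampublic  = 15)
priv <- data.frame(studentsPrivate = "Bob", exanPrivate = 3)

# rbind() on data frames matches columns by name, so this call fails;
# tryCatch() turns the error into the marker string "error"
res <- tryCatch(rbind(pub, priv), error = function(e) "error")

# Once both frames carry the same names, the rows stack as expected
colnames(pub)  <- c("name", "exam")
colnames(priv) <- c("name", "exam")
stacked <- rbind(pub, priv)
```

Here res holds the marker string because the first bind is rejected, while stacked has one row per student.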

# all students from d1_PublicSchool whose email addresses are in the d3_StateSchools
students_from_d1_in_d3_email<-d1_PublicSchool[which(d1_PublicSchool$emailPublic %in% d3_StateSchools$emailState),]

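# add missing column of ssn as NAs (data.frame() instead of cbind() keeps exam numeric;
# the exam column in d1_PublicSchool is spelled exampublic)
students_from_d1_in_d3_email<-data.frame(students_from_d1_in_d3_email$studentsPublic,students_from_d1_in_d3_email$emailPublic,"ssn"=NA,students_from_d1_in_d3_email$exampublic)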

# adjust column names to match
colnames(students_from_d1_in_d3_email)<-c("name","email","ssn","exam")

# all students from d2_PrivateSchool whose emailPivate are in the d3_StateSchools
students_from_d2_in_d3_email<-d2_PrivateSchool[which(d2_PrivateSchool$emailPivate %in% d3_StateSchools$emailState),]

# adjust column names to match
colnames(students_from_d2_in_d3_email)<-c("name","email","ssn","exam")

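# all students from d2_PrivateSchool whose ssnPrivate are in the d3_StateSchools
# (missing ssn values are excluded explicitly, because NA %in% c(NA,NA) is TRUE)
students_from_d2_in_d3_SSN<-d2_PrivateSchool[which(!is.na(d2_PrivateSchool$ssnPrivate) & d2_PrivateSchool$ssnPrivate %in% d3_StateSchools$ssnState),]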
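One thing worth checking at this step: %in% is built on match(), which treats NA as equal to NA. Since ssnState in the example consists only of NA, every private student with a missing ssn would count as a match unless missing values are filtered out. A small sketch with hypothetical vectors (ssn_private and ssn_state are illustrative names and values):

```r
# Hypothetical values, chosen only to show the behaviour
ssn_private <- c(NA, 12, 34)
ssn_state   <- c(NA, 34)

# Unguarded: the missing ssn in position 1 is reported as a match
unguarded <- ssn_private %in% ssn_state

# Guarded: only non-missing values are allowed to match
guarded <- !is.na(ssn_private) & ssn_private %in% ssn_state
```

With these values, unguarded flags the first and third entries, while guarded flags only the third.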

# adjust column names to match
colnames(students_from_d2_in_d3_SSN)<-c("name","email","ssn","exam")

# Final dataframe
Final_Desired_df<-rbind(students_from_d1_in_d3_email,students_from_d2_in_d3_email,students_from_d2_in_d3_SSN)


# Remove duplicate students in final dataframe
Final_Desired_df<-unique(Final_Desired_df)
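For comparison, the same selection can be sketched more compactly by combining both private-school conditions with |, so that no student is picked twice and no de-duplication is needed. This is only a sketch under assumptions: it uses base R, the three data frames are small deterministic stand-ins with already harmonised column names (not the question's random data), and in_lookup() is a helper introduced here for illustration.

```r
# Hypothetical, deterministic stand-ins for the three data frames
d1 <- data.frame(name  = c("A1", "A2", "A3"),
                 email = c("a@usa.com", NA, "b@usa.com"),
                 exam  = c(12, 18, 15))
d2 <- data.frame(name  = c("B1", "B2", "B3"),
                 email = c("t@usa.com", "g@usa.com", NA),
                 ssn   = c(12, NA, 34),
                 exam  = c(1, -2, 4))
d3 <- data.frame(email = c("a@usa.com", "g@usa.com", NA),
                 ssn   = c(NA, NA, 34))

# TRUE where x is non-missing and appears in the lookup vector
in_lookup <- function(x, lookup) !is.na(x) & x %in% lookup

# Public students: match on email only; they have no ssn, so add it as NA
pub <- d1[in_lookup(d1$email, d3$email), ]
pub$ssn <- rep(NA_real_, nrow(pub))

# Private students: email OR ssn match, evaluated in a single pass
priv <- d2[in_lookup(d2$email, d3$email) | in_lookup(d2$ssn, d3$ssn), ]

# Same columns in the same order, then stack the two pieces
cols <- c("name", "email", "ssn", "exam")
final_df <- rbind(pub[, cols], priv[, cols])
rownames(final_df) <- NULL
```

With this toy data, final_df keeps A1 (email match), B2 (email match) and B3 (ssn match); students whose only "match" would be a missing value are left out.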
