How to match a string against the same string from another data frame
I have this data frame (DF1)
structure(list(ID = 1:3,Text = c("there was not clostridium","clostridium difficile positive","test was OK")),class = "data.frame",row.names = c(NA,-3L))
ID TEXT
1 "there was not clostridium"
2 "clostridium difficile positive"
3 "test was OK"
and this data frame (DF2)
structure(list(ID = 1:3,Microorganisms = c("ESCHERICHIA COLI","CLOSTRIDIUM DIFFICILE","FUNGI")),class = "data.frame",row.names = c(NA,-3L))
ID Microorganisms
1 ESCHERICHIA COLI
2 CLOSTRIDIUM DIFFICILE
3 FUNGI
I want to use a regular expression to find the matches between DF1 and DF2 and put them in a new column, like this
ID TEXT Microorganism
1 "there was not clostridium" CLOSTRIDIUM DIFFICILE
2 "clostridium difficile positive" CLOSTRIDIUM DIFFICILE
3 "test was OK" no
I tried something like this
DF1 %>% mutate(Mikroorganism = ifelse(grepl(DF2$Microorganisms,TEXT),str_extract(TEXT,DF2$Microorganisms),"no"))
but it does not work.
Solution
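A few things likely stop the attempt in the question from working. The column in DF1 is named Text, not TEXT. grepl() is documented to use only the first element when pattern is a vector of length 2 or more (with a warning). And even with a single pattern, the full organism name does not occur in every text. The short base-R check below illustrates the last point; the variable names are made up for this illustration.

```r
text <- "there was not clostridium"

# Full name, case-sensitive: no match (the text is lower case).
whole_name <- grepl("CLOSTRIDIUM DIFFICILE", text)

# Full name, ignoring case: still no match, because "difficile" is absent.
whole_name_ic <- grepl("CLOSTRIDIUM DIFFICILE", text, ignore.case = TRUE)

# Either word, ignoring case: matches on "clostridium".
either_word <- grepl("CLOSTRIDIUM|DIFFICILE", text, ignore.case = TRUE)

print(c(whole_name, whole_name_ic, either_word))
```

This is why replacing the spaces in each organism name with "|" and ignoring case is enough to produce the desired match for row 1.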
One approach is to use the fuzzyjoin package.
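library(dplyr)
DF1 %>%
  fuzzyjoin::regex_left_join(
    # "CLOSTRIDIUM DIFFICILE" becomes "CLOSTRIDIUM|DIFFICILE", so either word matches
    transmute(DF2, Microorganisms, ptn = gsub("\\s+", "|", Microorganisms)),
    by = c("Text" = "ptn"), ignore_case = TRUE) %>%
  select(-ptn)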
# ID Text Microorganisms
# 1 1 there was not clostridium CLOSTRIDIUM DIFFICILE
# 2 2 clostridium difficile positive CLOSTRIDIUM DIFFICILE
# 3 3 test was OK <NA>
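If installing fuzzyjoin is not an option, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the helper first_match and the column name Microorganism are hypothetical, unmatched rows are filled with "no" as in the desired output of the question, and only the first matching organism is kept for each text (a regex join would instead return one row per match).

```r
# Rebuild the two example data frames from the question.
DF1 <- data.frame(
  ID = 1:3,
  Text = c("there was not clostridium",
           "clostridium difficile positive",
           "test was OK"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  ID = 1:3,
  Microorganisms = c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI"),
  stringsAsFactors = FALSE
)

# One alternation pattern per organism, e.g. "CLOSTRIDIUM|DIFFICILE".
patterns <- gsub("\\s+", "|", DF2$Microorganisms)

# Hypothetical helper: first organism whose pattern matches the text, else "no".
first_match <- function(text) {
  hits <- vapply(patterns, grepl, logical(1), x = text, ignore.case = TRUE)
  if (any(hits)) DF2$Microorganisms[which(hits)[1]] else "no"
}

DF1$Microorganism <- vapply(DF1$Text, first_match, character(1),
                            USE.NAMES = FALSE)
print(DF1)
```

Because each pattern matches on any single word of the organism name, short or common words in a name could produce false positives on larger data, so the patterns may need tightening (for example with word boundaries) before real use.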