
Matching a string against the same string from another data frame

I have this data frame (DF1):

structure(list(ID = 1:3,Text = c("there was not clostridium","clostridium difficile positive","test was OK")),class = "data.frame",row.names = c(NA,-3L)) 

ID TEXT
1  "there was not clostridium"
2  "clostridium difficile positive"
3  "test was OK"

and this data frame (DF2):

structure(list(ID = 1:3,Microorganisms = c("ESCHERICHIA COLI","CLOSTRIDIUM DIFFICILE","FUNGI")),class = "data.frame",row.names = c(NA,-3L))

ID Microorganisms
1  ESCHERICHIA COLI
2  CLOSTRIDIUM DIFFICILE
3  FUNGI

I want to use a regular expression to find the rows of DF1 that match entries in DF2 and put the matches in a new column, like this:

ID TEXT                                Microorganism
1  "there was not clostridium"         CLOSTRIDIUM DIFFICILE
2  "clostridium difficile positive"    CLOSTRIDIUM DIFFICILE
3  "test was OK"                       no

I tried something like this:

DF1 %>% mutate(Mikroorganism = ifelse(grepl(DF2$Microorganisms,TEXT),str_extract(TEXT,DF2$Microorganisms),"no"))

but it does not work.

Solution

One approach is to use the fuzzyjoin package.

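library(dplyr)
DF1 %>%
  # turn "CLOSTRIDIUM DIFFICILE" into the pattern "CLOSTRIDIUM|DIFFICILE", then join on it
  fuzzyjoin::regex_left_join(
    transmute(DF2, Microorganisms, ptn = gsub("\\s+", "|", Microorganisms)),
    by = c("Text" = "ptn"), ignore_case = TRUE
  ) %>%
  select(-ptn)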
#   ID                           Text        Microorganisms
# 1  1      there was not clostridium CLOSTRIDIUM DIFFICILE
# 2  2 clostridium difficile positive CLOSTRIDIUM DIFFICILE
# 3  3                    test was OK                  <NA>
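
If installing fuzzyjoin is not an option, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: `match_first` is a hypothetical helper name, and the sketch assumes that keeping the first matching organism per row is acceptable and that unmatched rows should get "no", as in the desired output of the question.

```r
# Example data copied from the question
DF1 <- data.frame(
  ID = 1:3,
  Text = c("there was not clostridium", "clostridium difficile positive", "test was OK"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  ID = 1:3,
  Microorganisms = c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI"),
  stringsAsFactors = FALSE
)

# One alternation pattern per organism, e.g. "CLOSTRIDIUM|DIFFICILE"
patterns <- gsub("\\s+", "|", DF2$Microorganisms)

# match_first() is a hypothetical helper: it returns the first organism whose
# pattern occurs in the text (case-insensitive), or "no" when none does
match_first <- function(text) {
  hits <- vapply(patterns, grepl, logical(1), x = text, ignore.case = TRUE)
  if (any(hits)) DF2$Microorganisms[which(hits)[1]] else "no"
}

DF1$Microorganism <- vapply(DF1$Text, match_first, character(1), USE.NAMES = FALSE)
DF1
```

With the example data, rows 1 and 2 get CLOSTRIDIUM DIFFICILE and row 3 gets "no". Because each word of an organism name becomes its own alternative, a short or common word in a name could produce false matches on real data, so the patterns may need tightening (for example with word boundaries).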
