
Regex: how to create a new column in an R data frame by matching part of a string in another column

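I have a data frame in R with two columns, GL and GLDESC, and I want to add a third column called KIND based on some of the data in the GLDESC column.

The data frame looks like this: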

      GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables

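For each row of the data frame:

>If GLDESC contains the word "Payroll" anywhere in the string, I want KIND to be "Payroll"
>If GLDESC contains the word "Gas" anywhere in the string, I want KIND to be "Materials"
>In all other cases, I want KIND to be "Other"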

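I looked for similar examples on Stack Overflow but could not find any. I also looked in R for Dummies at switch, grep, apply, and regular expressions, trying to match only part of the GLDESC column and then fill the KIND column with the corresponding type, but I could not get it to work.

Any help would be greatly appreciated.

Thanks :D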

Since you have only two conditions, you can use a nested ifelse:
#random data; it wasn't easy to copy-paste yours
DF <- data.frame(GL = sample(10), GLDESC = paste(sample(letters, 10), c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12", "asdfg", "GAS--2", "fghfgh", "qweee"), sample(letters, 10), sep = " "))

DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials", ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

DF
#   GL         GLDESC      KIND
#1   8        e gas l Materials
#2   1  c payroll12 y   Payroll
#3  10      m GaSer v Materials
#4   6       t asdf n     Other
#5   2      w qweaa t     Other
#6   4 r PayROll-12 q   Payroll
#7   9      n asdfg a     Other
#8   5     d GAS--2 w Materials
#9   7     s fghfgh e     Other
#10  3      g qweee k     Other
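
Applied to the data frame from the question, the same nested ifelse might look like the sketch below. The data is retyped by hand from the question, the name `gl` is only illustrative, and `fixed = TRUE` assumes that a literal, case-sensitive substring match is what is wanted:

```r
# Data retyped from the question; the variable name `gl` is illustrative.
gl <- data.frame(
  GL = c(515100, 515900, 532300, 539991, 551000, 551100, 551300),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats each pattern as a literal, case-sensitive string.
gl$KIND <- ifelse(grepl("Payroll", gl$GLDESC, fixed = TRUE), "Payroll",
           ifelse(grepl("Gas", gl$GLDESC, fixed = TRUE), "Materials", "Other"))

gl$KIND
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

Swapping `fixed = TRUE` for `ignore.case = TRUE` gives the case-insensitive behaviour used in the answer's random-data example.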

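Edit 10/3/2016 (..after receiving more attention than expected)

A possible solution for handling more patterns is to iterate over all the patterns and, whenever there are matches, progressively reduce the number of comparisons: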

ff = function(x,patterns,replacements = patterns,fill = NA,...)
{
    stopifnot(length(patterns) == length(replacements))

    ans = rep_len(as.character(fill),length(x))    
    empty = seq_along(x)

    for(i in seq_along(patterns)) {
        greps = grepl(patterns[[i]],x[empty],...)
        ans[empty[greps]] = replacements[[i]]  
        empty = empty[!greps]
    }

    return(ans)
}

ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
# [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"

ff(c("pat1a pat2","pat1a pat1b","pat3","pat4"),c("pat1a|pat1b","pat2","pat3"),c("1","2","3"),fill = "empty")
#[1] "1"     "1"     "3"     "empty"

ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), c("pat2", "pat1a|pat1b", "pat3"), c("2", "1", "3"), fill = "empty")
#[1] "2"     "1"     "3"     "empty"
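
The last two calls differ only in pattern order: an element takes the replacement of the first pattern that matches it and is then dropped from later comparisons. As a rough sketch, the same function could be applied to the GLDESC values from the question. The vector `gldesc` is retyped by hand and its name is illustrative, and `ff` is repeated here only so the snippet runs on its own:

```r
# ff() as defined in the answer, repeated so this snippet is self-contained.
ff <- function(x, patterns, replacements = patterns, fill = NA, ...) {
  stopifnot(length(patterns) == length(replacements))
  ans <- rep_len(as.character(fill), length(x))
  empty <- seq_along(x)                 # indices still without a match
  for (i in seq_along(patterns)) {
    greps <- grepl(patterns[[i]], x[empty], ...)
    ans[empty[greps]] <- replacements[[i]]
    empty <- empty[!greps]              # matched elements are not checked again
  }
  ans
}

# GLDESC values retyped from the question; `gldesc` is an illustrative name.
gldesc <- c("Payroll-Indir Salary Labor",
            "Payroll-Indir Compensated Absences",
            "Bulk Gas",
            "Area Charge In",
            "Repairs & Maint-Spare Parts",
            "Supplies-Operating",
            "Consumables")

kind <- ff(gldesc, c("Payroll", "Gas"), c("Payroll", "Materials"), fill = "Other")
kind
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

Adding another category then only means appending one entry to each of the pattern and replacement vectors, rather than nesting another ifelse.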

Original source: https://www.jb51.cc/regex/356690.html
