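How to write an R loop/function to find matches from a list
I have a list of salespeople recorded across three columns, and I want to look through it and count the following:
a) where their name appears in any of the three columns
b) where their name appears alongside a trainee salesperson (trainees' names are not in the list)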
ilist <- c("SP1","SP2","SP3","SP4","SP5")
df2 <-
data.frame(sales1 = c("SP5","SP5","SP1","SP3"),sales2 = c("",""),sales3 = c("","SP9","","SP6",""))
    A B
SP1 3 1
SP2 1 0
SP3 3 1
SP4 1 1
SP5 3 1
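I tried writing a loop and a function, but I can't seem to get either to work.
Once it works, the goal is to make it part of a group_by so that I can break the results down by type and year.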
data %>%
group_by(type,year) %>%
your helpful answer here
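Edit:
Below is a selection of the columns I'm working with.
My ilist will look like the following.
(In the three sales columns, columns 2 and 3 are blank when only one salesperson is involved, and that salesperson then appears in column 1; otherwise there is no fixed position for where a salesperson or a trainee appears.)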
ilist <- c("SJ","KW","MOLC","FERB","BACC")
structure(list(iYear = structure(c(1L,4L,3L,5L,6L,9L),.Label = c("2020-07-01","2020-07-02","2020-07-03","2020-07-04","2020-07-06","2020-07-07","2020-07-08","2020-07-09","2020-07-10","2020-07-11","2020-07-12","2020-07-13","2020-07-14","2020-07-15","2020-07-16","2020-07-17","2020-07-18","2020-07-19","2020-07-20","2020-07-21","2020-07-22","2020-07-23","2020-07-24","2020-07-25","2020-07-27","2020-07-28","2020-07-29","2020-07-30","2020-07-31"),class = "factor"),iType = structure(c(4L,4L),.Label = c("","ZB","BS","CFN","CTR","MJ","UK","EFH","ENOC","EY","F","G","CD","HAEM","HN","IC","LB","LY","MNN","MOS","NERO","ZZZ","ZZZQE","GFT","PG","RE","SK","UR"),Sales.1 = structure(c(74L,20L,74L,16L,58L,41L),"ABUE","AHMEM","AJOS","ANNS","AOK","BACC","BH","BLAFM","BLOCA","BRAD","broWNJ","BRT","BUIH","BURDA","BURYA","CANRJ","CAVM","CHAMBA","COOSNP","COUPSI","CPH","CTT","Dara","DILP","EXPAT","FCH","FERMA","GT","HAMJR","HENJ","HENJA","HOWRA","HUSA","ILINC","JONG","KC","KNOT","LAUC","LOOP","LYEJO","LYNN","MAJJ","MCGREA","MENT","MKB","MUDHS","MULLM","NC","NODS","O'BSG","OLIT","OLIVK","PAEI","PARKD","PATEF","PERT","POL","PTRHUS","RAMACN","RAMS","REYMA","ROBCM","ROBINE","SAMJN","SAYC","SHARMM","SHEG","SJ","SJN","SKINT","SLOP","SORT","SOUBIO","SPOE","TELED","THAN","THEL","TURH","TURHJ","UCONS","UPH","UT","VALK","WALJ"
),Sales.2 = structure(c(1L,12L,1L,45L),"GYNT","VALK"),Sales.3 = structure(c(1L,1L),class = "factor")),row.names = c(NA,10L),class = "data.frame")
Solutions
Since you say you expect SP2 | 1 | 0, but SP2 does not appear in row 1, I don't fully understand the expected result.
library(data.table)
sales <- data.table(sale = c("SP1","SP2","SP3","SP4","SP5"))
sales_group <-
data.table(
sales1 = c("SP5","SP5","SP1","SP3"),sales2 = c("",""),sales3 = c("","SP9","","SP6","")
)
all <- sort(sales_group[,unique(c(sales1,sales2,sales3))])
all <- all[all != ""]
trainees <- all[!all %in% c(sales$sale,"")]
sales_group[,pos := seq(.N)]
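# Match each salesperson against each of the three columns in turn
sales1 <- merge(sales,sales_group,by.x = "sale",by.y = "sales1")
sales2 <- merge(sales,sales_group,by.x = "sale",by.y = "sales2")
sales3 <- merge(sales,sales_group,by.x = "sale",by.y = "sales3")
# Give the three results the same column names so they can be stacked
setnames(sales1,c("sale","plusone","plustwo","sales_pos"))
setnames(sales2,c("sale","plusone","plustwo","sales_pos"))
setnames(sales3,c("sale","plusone","plustwo","sales_pos"))
sales_visit_by_sale <- rbind(sales1,sales2,sales3)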
sales_visit_by_sale[,with_trainee := FALSE]
sales_visit_by_sale[(plusone %in% trainees) | (plustwo %in% trainees),with_trainee := TRUE]
sales_visit_by_sale[(order(sale,sales_pos)),.(sale,sales_pos,with_trainee)]
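I'm not sure this is exactly what you're looking for, but I thought it might help. Given your interest in using group_by, it sounds like you may want a tidyverse approach.
Here, a row number is added so that you can group_by each row and see whether a trainee is in the same row as a salesperson.
Then pivot_longer puts the data into long format, and the empty strings are removed.
Once grouped by row number, you can add an indicator of whether the people in that row appear with a trainee salesperson. It checks whether anyone in the row is not included in ilist.
Finally, you can group_by each salesperson, use filter to keep only the salespeople in ilist, and total the number of appearances (assuming each person appears at most once per row in the initial data) as well as the number of appearances with a trainee.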
library(tidyverse)
df2 %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn) %>%
mutate(value = na_if(value,"")) %>%  # column-wise form; na_if() on a whole data frame fails in dplyr >= 1.1.0
na.omit() %>%
group_by(rn) %>%
mutate(with_trainee = ifelse(any(!value %in% ilist),1,0)) %>%
group_by(value) %>%
filter(value %in% ilist) %>%
summarise(A = n(),B = sum(with_trainee))
Output
value A B
<chr> <int> <dbl>
1 SP1 3 1
2 SP2 1 0
3 SP3 3 1
4 SP4 2 1
5 SP5 3 1
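Because the df2 definition shown in the question does not have equal-length columns as posted, the pipeline above cannot be checked against it directly. As a cross-check of the counting logic, here is a minimal base R sketch (a swapped-in technique, not the tidyverse code above) run on made-up data. The data frame df, its rows, and the shortened ilist are all assumptions for illustration only.

```r
# Hypothetical data: these rows are made up for illustration only.
ilist <- c("SP1", "SP2", "SP3")
df <- data.frame(
  sales1 = c("SP1", "SP2", "SP3", "SP1"),
  sales2 = c("SP9", "",    "SP1", ""),
  sales3 = c("",    "",    "SP8", "SP2"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)

# TRUE for each row that holds at least one non-blank name outside ilist
row_has_trainee <- apply(m, 1, function(r) any(r != "" & !(r %in% ilist)))

# For each salesperson: A = rows they appear in,
# B = how many of those rows also include a trainee
result <- do.call(rbind, lapply(ilist, function(p) {
  in_row <- apply(m, 1, function(r) p %in% r)
  data.frame(value = p, A = sum(in_row), B = sum(in_row & row_has_trainee))
}))
print(result)
```

Note that this version counts a row once per salesperson even if the same name were repeated within the row, whereas n() after pivot_longer would count each occurrence.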
Edit 1: With the "real data", and grouping by iYear (reduced to the year) and iType, you can try the following:
library(tidyverse)
df2 %>%
mutate(rn = row_number(),iYear = substr(iYear,1,4)) %>%  # keep only the year (characters 1 to 4)
pivot_longer(cols = -c(rn,iYear,iType)) %>%
mutate(value = na_if(value,"")) %>%  # blank strings become NA
na.omit() %>%
group_by(rn,iYear,iType) %>%
mutate(with_trainee = ifelse(any(!value %in% ilist),1,0)) %>%
group_by(value,iYear,iType) %>%
filter(value %in% ilist) %>%
summarise(A = n(),B = sum(with_trainee))
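Edit 2: Some further explanation:
The row number (rn, from row_number) is useful here because you want to know whether salespeople appear together (meaning "within the same row"). So if two salespeople share the same rn, they appear together.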
iYear is changed to hold just the year. It uses substr() (substring) to take the 1st through 4th characters of iYear, which is the year in the XXXX-XX-XX date format.
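As a small illustration of the year extraction, the sketch below applies substr() to a few made-up dates stored as a factor, as iYear is in the posted data. The three date values are assumptions, not rows from the real data.

```r
# Hypothetical dates in the same XXXX-XX-XX format as iYear, stored as a factor
iYear <- factor(c("2020-07-01", "2020-07-04", "2021-01-15"))

# as.character() makes the factor-to-text conversion explicit;
# characters 1 to 4 are the year
year <- substr(as.character(iYear), 1, 4)
print(year)
```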
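pivot_longer (and its friend pivot_wider) is very powerful for converting data between long and wide formats. It comes from the tidyr package. Here pivot_longer takes all the columns (except rn, iYear and iType) and puts them into two columns (name and value). value now holds the salespeople in a single column rather than in the several columns you started with.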
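To make the reshaping concrete, here is a minimal sketch on a two-row, made-up data frame. The object names wide and long and the cell values are assumptions; only the column names mirror the question.

```r
library(tidyr)

# Hypothetical wide data: one row per sale, one column per sales slot
wide <- data.frame(
  rn      = 1:2,
  Sales.1 = c("SJ", "KW"),
  Sales.2 = c("", "GYNT"),
  stringsAsFactors = FALSE
)

# Every column except rn is stacked into the name/value pair
long <- pivot_longer(wide, cols = -rn)
print(long)
```

Each original row now contributes one row per sales column, so rn repeats and tells you which names came from the same row.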
na_if turns blank strings "" into NA (missing data). The na.omit that follows then removes the rows that contain NA.
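A minimal sketch of this cleaning step on made-up long-format data follows. It uses the column-wise form mutate(value = na_if(value, "")), since applying na_if to a whole data frame is, as far as I know, no longer accepted in dplyr 1.1.0 and later. The data values are assumptions.

```r
library(dplyr)

# Hypothetical long-format data with one blank entry
long <- data.frame(
  rn    = c(1, 1, 2),
  value = c("SJ", "", "GYNT"),
  stringsAsFactors = FALSE
)

cleaned <- long %>%
  mutate(value = na_if(value, "")) %>%  # "" becomes NA
  na.omit()                             # rows containing NA are dropped
print(cleaned)
```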
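The group_by with rn makes sure you are looking together at the salespeople who share the same rn. I added iYear and iType so that they also show up in the final summary. Then with_trainee is a new column that records whether the salesperson is with a trainee (after the group_by, any is used to check whether "any row" in the group sharing the same rn holds a name that is not in ilist). If there is one, it is coded 1; if not, 0.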
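The row-level indicator can be sketched on a few made-up rows as follows. The shortened ilist, the rn values, and the names are assumptions chosen so that only row 1 contains someone outside ilist.

```r
library(dplyr)

ilist <- c("SJ", "KW")  # hypothetical short list of salespeople

# Hypothetical long-format data: rn marks which original row a name came from
long <- data.frame(
  rn    = c(1, 1, 2, 3, 3),
  value = c("SJ", "GYNT", "KW", "KW", "SJ"),
  stringsAsFactors = FALSE
)

flagged <- long %>%
  group_by(rn) %>%
  # 1 for every name in a row where at least one name is not in ilist
  mutate(with_trainee = ifelse(any(!value %in% ilist), 1, 0)) %>%
  ungroup()
print(flagged)
```

Both names in row 1 get a 1 because GYNT is not in ilist; rows 2 and 3 hold only listed salespeople and get 0.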
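The next group_by is by value (that is, by salesperson), and filter is then used because you only want results for the people in ilist. (If you want everyone, including the trainees who are not in ilist, you can leave that line out.)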
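The final summarise works together with the group_by: n() gives the number of rows of data for each value (each salesperson), which corresponds to the number of different rn values in which that salesperson appears overall. sum(with_trainee) is the total number of times with_trainee is 1 for a given value (salesperson).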
Output
value
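The final filter and summarise steps described above can be sketched on made-up, already-flagged data as follows. The data frame flagged, its values, and the shortened ilist are assumptions for illustration.

```r
library(dplyr)

ilist <- c("SJ", "KW")  # hypothetical short list of salespeople

# Hypothetical output of the earlier steps: one row per name per original row
flagged <- data.frame(
  value        = c("SJ", "GYNT", "KW", "KW", "SJ"),
  with_trainee = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

counts <- flagged %>%
  filter(value %in% ilist) %>%   # drop trainees from the result
  group_by(value) %>%
  summarise(A = n(),             # appearances
            B = sum(with_trainee))  # appearances alongside a trainee
print(counts)
```

Filtering before or after group_by(value) gives the same counts here, because the filter condition depends only on the grouping column.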