R loop/function to find matches from a list


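I have a list of salespeople spread across three columns, and I'd like to look through it and count the following for each person in the list:
a) the rows where their name appears in any of the three columns
b) the rows where their name appears together with a trainee salesperson (trainees are the people whose names are not in the list)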

ilist <- c("SP1","SP2","SP3","SP4","SP5")
    
df2 <- 
    data.frame(sales1 = c("SP5","SP5","SP1","SP3"),sales2 = c("",""),sales3 = c("","SP9","","SP6",""))

For output I'm hoping to get the following (although I'll take any output):

      A     B   
SP1   3     1 
SP2   1     0 
SP3   3     1 
SP4   1     1 
SP5   3     1

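I have tried to create a loop and a function, but can't seem to get either to work. Once it works, the aim is to make it part of a group_by so that I can break the counts down by type and year: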

data %>%
group_by(type,year) %>%
your helpful answer here

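Edit: Below is a selection of the columns I'm working with. My ilist would look like the following. (In the "3 columns", columns 2 and 3 are blank when only one salesperson was present, in which case that person appears only in column 1; there is also no set position for where a salesperson or a trainee appears.)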

ilist <- c("SJ","KW","MOLC","FERB","BACC")



structure(list(iYear = structure(c(1L,4L,3L,5L,6L,9L),.Label = c("2020-07-01","2020-07-02","2020-07-03","2020-07-04","2020-07-06","2020-07-07","2020-07-08","2020-07-09","2020-07-10","2020-07-11","2020-07-12","2020-07-13","2020-07-14","2020-07-15","2020-07-16","2020-07-17","2020-07-18","2020-07-19","2020-07-20","2020-07-21","2020-07-22","2020-07-23","2020-07-24","2020-07-25","2020-07-27","2020-07-28","2020-07-29","2020-07-30","2020-07-31"),class = "factor"),iType = structure(c(4L,4L),.Label = c("","ZB","BS","CFN","CTR","MJ","UK","EFH","ENOC","EY","F","G","CD","HAEM","HN","IC","LB","LY","MNN","MOS","NERO","ZZZ","ZZZQE","GFT","PG","RE","SK","UR"),Sales.1 = structure(c(74L,20L,74L,16L,58L,41L),"ABUE","AHMEM","AJOS","ANNS","AOK","BACC","BH","BLAFM","BLOCA","BRAD","broWNJ","BRT","BUIH","BURDA","BURYA","CANRJ","CAVM","CHAMBA","COOSNP","COUPSI","CPH","CTT","Dara","DILP","EXPAT","FCH","FERMA","GT","HAMJR","HENJ","HENJA","HOWRA","HUSA","ILINC","JONG","KC","KNOT","LAUC","LOOP","LYEJO","LYNN","MAJJ","MCGREA","MENT","MKB","MUDHS","MULLM","NC","NODS","O'BSG","OLIT","OLIVK","PAEI","PARKD","PATEF","PERT","POL","PTRHUS","RAMACN","RAMS","REYMA","ROBCM","ROBINE","SAMJN","SAYC","SHARMM","SHEG","SJ","SJN","SKINT","SLOP","SORT","SOUBIO","SPOE","TELED","THAN","THEL","TURH","TURHJ","UCONS","UPH","UT","VALK","WALJ"
    ),Sales.2 = structure(c(1L,12L,1L,45L),"GYNT","VALK"),Sales.3 = structure(c(1L,1L),class = "factor")),row.names = c(NA,10L),class = "data.frame")

Solution

I don't really understand the expected result, since you say you expect SP2 | 1 | 0 but SP2 does not appear in row 1.

library(data.table)

sales <- data.table(sale = c("SP1","SP2","SP3","SP4","SP5"))

sales_group <- 
  data.table(
    sales1 = c("SP5","SP5","SP1","SP3"),sales2 = c("",""),sales3 = c("","SP9","","SP6","")
  )

all <- sort(sales_group[,unique(c(sales1,sales2,sales3))])
all <- all[all != ""]
trainees <- all[!all %in% c(sales$sale,"")]

sales_group[,pos := seq(.N)]

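# Match each salesperson against each of the three columns in turn
sales1 <- merge(sales,sales_group,by.x = "sale",by.y = "sales1")
sales2 <- merge(sales,sales_group,by.x = "sale",by.y = "sales2")
sales3 <- merge(sales,sales_group,by.x = "sale",by.y = "sales3")
# Give the three results the same column names so they can be stacked
setnames(sales1,c("sale","plusone","plustwo","sales_pos"))
setnames(sales2,c("sale","plusone","plustwo","sales_pos"))
setnames(sales3,c("sale","plusone","plustwo","sales_pos"))
sales_visit_by_sale <- rbind(sales1,sales2,sales3)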
sales_visit_by_sale[,with_trainee := FALSE]
sales_visit_by_sale[(plusone %in% trainees) | (plustwo %in% trainees),with_trainee := TRUE]
sales_visit_by_sale[(order(sale,sales_pos)),.(sale,sales_pos,with_trainee)]

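I'm not sure this is what you're looking for, but thought it might help. Given your interest in using group_by, it sounds like you may want a tidyverse approach.

Here, row numbers are added so that you can group_by each row and see whether a trainee is in the same row as a salesperson.

Then pivot_longer puts the data into long format, and the empty strings are removed.

When grouped by row number, you can add an indicator of whether those people appear with a trainee salesperson. It checks whether anyone in that row is not included in ilist.

Finally, you can group_by each salesperson, use filter to keep only the salespeople in ilist, and total up the number of appearances (assuming at most one appearance per row in the initial data) as well as the number of appearances with a trainee.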

library(tidyverse)

df2 %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = -rn) %>%
  na_if("") %>%
  na.omit %>%
  group_by(rn) %>%
  mutate(with_trainee = ifelse(any(!value %in% ilist),1,0)) %>%
  group_by(value) %>%
  filter(value %in% ilist) %>%
  summarise(A = n(),B = sum(with_trainee))

Output

  value     A     B
  <chr> <int> <dbl>
1 SP1       3     1
2 SP2       1     0
3 SP3       3     1
4 SP4       2     1
5 SP5       3     1
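
For comparison, the same two counts can be sketched in base R without any packages. This is only an illustration under stated assumptions: the data frame df below is a small hypothetical example with three equal-length columns, not the data from the question, and the names in_row and has_trainee are made up for this sketch.

```r
ilist <- c("SP1", "SP2", "SP3", "SP4", "SP5")

# Hypothetical data: one row per visit, "" where a slot is empty
df <- data.frame(
  sales1 = c("SP1", "SP3", "SP5", "SP2"),
  sales2 = c("SP2", "SP9", "", ""),
  sales3 = c("", "", "SP6", ""),
  stringsAsFactors = FALSE
)
m <- as.matrix(df)

# rows x people logical matrix: does person p appear anywhere in row i?
in_row <- sapply(ilist, function(p) rowSums(m == p) > 0)

# per row: is there a non-empty name that is not in ilist (a trainee)?
has_trainee <- apply(m, 1, function(r) any((r != "") & !(r %in% ilist)))

# A = rows containing the person, B = those rows that also contain a trainee
result <- data.frame(A = colSums(in_row), B = colSums(in_row & has_trainee))
print(result)
```

Because has_trainee has one entry per row, it is recycled down each column of in_row, so every person's appearances are checked against the trainee flag of the same row.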

Edit 1: With the "live data", grouping by iYear (reduced to the year) and iType, you can try the following:

library(tidyverse)

df2 %>%
  # keep only the first four characters (the year) of iYear
  mutate(rn = row_number(),iYear = substr(iYear,1,4)) %>%
  pivot_longer(cols = -c(rn,iYear,iType)) %>%
  na_if("") %>%
  na.omit %>%
  # iYear and iType are carried along so they survive into the summary
  group_by(rn,iYear,iType) %>%
  mutate(with_trainee = ifelse(any(!value %in% ilist),1,0)) %>%
  group_by(value,iYear,iType) %>%
  filter(value %in% ilist) %>%
  summarise(A = n(),B = sum(with_trainee))

Edit 2: Further explanation:

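The row number (rn, from row_number()) is useful here because you want to know whether salespeople appear together, which means "within the same row". So if two salespeople share the same rn, they appeared together.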

iYear is changed to hold the year only. substr() (substring) takes characters 1 through 4 of iYear, which is the year in the XXXX-XX-XX date format.
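
As a small illustration of that step, here is substr() applied to a few hypothetical dates in the same layout. The values are made up; iYear is stored as a factor in the question's data, so the sketch converts it to character explicitly.

```r
# Hypothetical dates in the XXXX-XX-XX layout, stored as a factor
iYear <- factor(c("2020-07-01", "2020-07-04", "2021-01-15"))

# Characters 1 to 4 of each date are the year
year <- substr(as.character(iYear), 1, 4)
print(year)
```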

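pivot_longer (and its friend pivot_wider) is very powerful for converting data between wide and long formats. From the tidyr package, pivot_longer takes all the columns (except rn, iYear and iType) and puts them into two columns (name and value). value now holds the salespeople in a single column instead of the several columns you started with.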
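
A minimal sketch of the reshaping, assuming tidyr is installed. The data frame wide is hypothetical and much smaller than the real data.

```r
library(tidyr)

# Hypothetical wide data: one row per visit, one column per slot
wide <- data.frame(
  rn     = 1:2,
  sales1 = c("SP1", "SP3"),
  sales2 = c("SP2", ""),
  stringsAsFactors = FALSE
)

# Every column except rn is stacked into the name/value pair
long <- pivot_longer(wide, cols = -rn)
print(long)
```

Each original row becomes one output row per pivoted column, so the two input rows turn into four rows that still carry their rn.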

na_if("") turns blank strings "" into NA (missing data). The na.omit that follows then removes the rows containing NA.
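
A short sketch of the blank-removal step on hypothetical long data. One caveat, stated as far as I am aware rather than checked against every release: in recent dplyr versions (1.1.0 and later) na_if() expects a vector rather than a whole data frame, so the sketch applies it to the value column inside mutate() instead of piping the data frame into it.

```r
library(dplyr)

# Hypothetical long-format data with one blank entry
long <- data.frame(
  rn    = c(1, 1, 2, 2),
  value = c("SP1", "", "SP9", "SP2"),
  stringsAsFactors = FALSE
)

cleaned <- long %>%
  mutate(value = na_if(value, "")) %>%  # "" becomes NA
  na.omit()                             # rows with NA are dropped
print(cleaned)
```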

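The group_by with rn makes sure you are looking at the salespeople who share the same rn together. I added iYear and iType so that they also show up in the final summarised result. Then with_trainee is a new column that records whether the salesperson was with a trainee (after the group_by, any is used to check whether any row in the group, that is, any row sharing the same rn, holds a name that is not in ilist). If there is one, it is coded as 1; if not, 0.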
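
The grouped indicator can be sketched on its own like this, assuming dplyr is installed. The data and the shortened ilist are hypothetical, and as.integer() is used here in place of ifelse(..., 1, 0) purely for brevity.

```r
library(dplyr)

ilist <- c("SP1", "SP2", "SP3")

# Hypothetical long-format data; SP9 is a trainee because it is not in ilist
long <- data.frame(
  rn    = c(1, 1, 2, 2, 3),
  value = c("SP1", "SP9", "SP2", "SP3", "SP1"),
  stringsAsFactors = FALSE
)

flagged <- long %>%
  group_by(rn) %>%
  # any() is evaluated once per rn group and recycled to every row in it
  mutate(with_trainee = as.integer(any(!value %in% ilist))) %>%
  ungroup()
print(flagged)
```

Only the rows with rn 1 are flagged, since that is the only row group containing a name outside ilist.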

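The next group_by is on value (the salesperson), followed by filter, because you only want results for the people in ilist. (If you want everyone, including the trainees who are not in ilist, you can leave that line out.)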

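Finally, summarise is used together with group_by. n() gives the number of rows of data for each value (each salesperson), which should match the number of distinct rn values in which that salesperson appears overall. sum(with_trainee) is the total number of times with_trainee is 1 for a given value (salesperson).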
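
The final filter and summarise can be sketched as follows, assuming dplyr is installed. The data frame flagged is hypothetical and already carries a with_trainee column of the kind described above.

```r
library(dplyr)

ilist <- c("SP1", "SP2")

# Hypothetical flagged data: rn 1 contains the trainee SP9
flagged <- data.frame(
  rn           = c(1, 1, 2, 3, 3),
  value        = c("SP1", "SP9", "SP1", "SP2", "SP1"),
  with_trainee = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

result <- flagged %>%
  filter(value %in% ilist) %>%   # drop the trainees themselves
  group_by(value) %>%
  summarise(A = n(),             # rows the salesperson appears in
            B = sum(with_trainee))  # of those, rows shared with a trainee
print(result)
```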

Output

value
