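How to spread a column and count by ID in R
I have a factor column named Lead_DataSource__c. I want to spread each factor level into its own column, then fill each row with the number of times that level occurs for the row's Id.
Here is the head of my data frame: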
head(df)
Id Lead_DataSource__c numberoflead leadduration lasttouch firsttouch
<chr> <chr> <int> <drtn> <chr> <chr>
1 0010I000026fxp6QAA NA 1 NA days NA NA
2 0010I000026frM6QAI Walk in 1 0.0000 days Walk in Walk in
3 0010I000026frOQQAY Walk in 1 0.0000 days Walk in Walk in
4 0010I000026frsUQAQ Walk in 3 243.9656 days Walk in Facebook
5 0010I000026frsUQAQ Facebook 3 243.9656 days Walk in Facebook
6 0010I000026frsUQAQ Facebook 3 243.9656 days Walk in Facebook
This is what I need:
Id lastcreateddateoflead lasttouch firsttouch Facebook Walk.in <NA>
1 0010I000026frM6QAI 43575 Walk in Walk in 0 1 0
2 0010I000026frOQQAY 43843 Walk in Walk in 0 1 0
3 0010I000026frsUQAQ 43794 Walk in Facebook 2 1 0
4 0010I000026frsUQAQ 43794 Walk in Facebook 2 1 0
5 0010I000026frsUQAQ 43794 Walk in Facebook 2 1 0
6 0010I000026fsBrQAI 43699 Facebook Facebook 1 0 0
So far I have tried the following with dplyr, but it does not produce the output I want above:
df%>%
group_by(Id,Lead_DataSource__c) %>%
mutate(numberofleadsource=n()) %>%
spread(Lead_DataSource__c,numberofleadsource,fill = 0)
This is the output of my code:
Id lastcreateddateoflead lasttouch firsttouch Facebook Walk.in <NA>
1 0010I000026frM6QAI 43575 Walk in Walk in 0 1 0
2 0010I000026frOQQAY 43843 Walk in Walk in 0 1 0
3 0010I000026frsUQAQ 43794 Walk in Facebook 2 0 0
4 0010I000026frsUQAQ 43794 Walk in Facebook 2 0 0
5 0010I000026frsUQAQ 43794 Walk in Facebook 0 1 0
6 0010I000026fsBrQAI 43699 Facebook Facebook 1 0 0
Can someone help me figure out what I am missing here?
Input data:
df <- structure(list(
  Id = c("0010I000026fxp6QAA","0010I000026frM6QAI","0010I000026frOQQAY","0010I000026frsUQAQ","0010I000026frsUQAQ","0010I000026frsUQAQ"),
  Lead_DataSource__c = c(NA,"Walk in","Walk in","Walk in","Facebook","Facebook"),
  numberoflead = c(1L,1L,1L,3L,3L,3L),
  leadduration = structure(c(NA,0,0,243.9656,243.9656,243.9656),class = "difftime",units = "days"),
  lasttouch = c(NA,"Walk in","Walk in","Walk in","Walk in","Walk in"),
  firsttouch = c(NA,"Walk in","Walk in","Facebook","Facebook","Facebook")),
  row.names = c(NA,-6L),class = c("tbl_df","tbl","data.frame"))
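Solution
Here I use add_count() to count how many times each Id / lead source combination occurs, then pivot_wider() to spread those counts into one column per lead source. The last line fills in the missing values in the pivoted table.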
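library(dplyr)
library(tidyr)
df %>%
# add a column n: number of rows for each Id / lead source combination
add_count(Id,Lead_DataSource__c) %>%
# temporary row index so that duplicate rows stay separate when pivoting
mutate(tmp = 1:nrow(.)) %>%
# one column per lead source; a missing source becomes a column named "NA"
pivot_wider(names_from = Lead_DataSource__c,values_from = n) %>%
select(-tmp) %>%
group_by(Id) %>%
# within each Id, copy the first non-missing count to every row, or use 0 if there is none
mutate_at(c("NA","Walk in","Facebook"),~ifelse(any(!is.na(.)),.[!is.na(.)][1],0))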
# A tibble: 6 x 8
# Groups: Id [4]
Id numberoflead leadduration lasttouch firsttouch `NA` `Walk in` Facebook
<chr> <int> <drtn> <chr> <chr> <dbl> <dbl> <dbl>
1 0010I000026fxp6QAA 1 NA days NA NA 1 0 0
2 0010I000026frM6QAI 1 0.0000 days Walk in Walk in 0 1 0
3 0010I000026frOQQAY 1 0.0000 days Walk in Walk in 0 1 0
4 0010I000026frsUQAQ 3 243.9656 days Walk in Facebook 0 1 2
5 0010I000026frsUQAQ 3 243.9656 days Walk in Facebook 0 1 2
6 0010I000026frsUQAQ 3 243.9656 days Walk in Facebook 0 1 2
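A different route, not the one used in the answer above, is to build the count table once per Id and join it back onto the original rows. The sketch below assumes dplyr together with tidyr 1.1.0 or later (for a scalar values_fill), and uses an abbreviated copy of the question's data with the leadduration and numberoflead columns left out. The names counts and result are only illustrative.

```r
library(dplyr)
library(tidyr)

# Abbreviated copy of the question's data (assumption: only the columns needed here)
df <- tibble(
  Id = c("0010I000026fxp6QAA", "0010I000026frM6QAI", "0010I000026frOQQAY",
         "0010I000026frsUQAQ", "0010I000026frsUQAQ", "0010I000026frsUQAQ"),
  Lead_DataSource__c = c(NA, "Walk in", "Walk in", "Walk in", "Facebook", "Facebook"),
  lasttouch  = c(NA, "Walk in", "Walk in", "Walk in", "Walk in", "Walk in"),
  firsttouch = c(NA, "Walk in", "Walk in", "Facebook", "Facebook", "Facebook")
)

# One row per Id, one column per lead source, 0 where a source never occurs
counts <- df %>%
  count(Id, Lead_DataSource__c) %>%
  pivot_wider(names_from = Lead_DataSource__c, values_from = n, values_fill = 0)

# Attach the per-Id counts to every original row
result <- df %>%
  select(-Lead_DataSource__c) %>%
  left_join(counts, by = "Id")

print(result)
```

Because the counts are computed per Id before the join, no temporary row index or per-group fill step is needed. A missing lead source still ends up in a column named `NA`, as in the answer above.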