
Custom aggregation on two columns of named fruits

I want to aggregate two columns of a data frame by name, as follows:

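>Aggregate specifically over the two columns fruits and parts, carrying the parts column into the result
>For Apple, Banana and StrawBerry the parts values do not matter and everything is summed up, whereas for Grape and Kiwi the parts values should become the new fruit names
>The result (at the bottom) should have 8 aggregated rows instead of 20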

This sounds simple, but after hours of trial and error I have not found a working solution. Here is an example:

theDF <- data.frame(dates = as.Date(c(today()+20)),fruits = c("Apple","Apple","Banana","StrawBerry","Grape","Kiwi","Kiwi"),parts = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4","Yellow Banana1","Small Banana","Banana3","Banana4","Red Small StrawBerry","Red StrawBerryY","Big StrawBerry","StrawBerryZ","Green Grape","Blue Grape","Big Kiwi","Small Kiwi","Middle Kiwi"),stock = as.vector(sample(1:20)) )

Current data frame:

[Image: current data frame]

Desired output:

[Image: desired output]

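Solution

We can use data.table. If there is a pattern, such as a trailing uppercase letter or digit in the 'parts' column that should be removed, we can strip it with sub, use the result together with 'dates' as the grouping variables, and take the sum of 'stock'.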

library(data.table)
setDT(theDF)[,.(stock = sum(stock)),.(dates,fruits = sub("([0-9]|[A-Z])$","",parts))]
#        dates      fruits stock
#1: 2016-06-19       Apple    46
#2: 2016-06-19      Banana    35
#3: 2016-06-19  StrawBerry    38
#4: 2016-06-19 Green Grape    12
#5: 2016-06-19  Blue Grape    21
#6: 2016-06-19    Big Kiwi    37
#7: 2016-06-19  Small Kiwi    14 
#8: 2016-06-19 Middle Kiwi     7
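To see why the grouping key collapses some part names but not others, here is a minimal base R sketch of the sub call on its own. The sample strings are hypothetical, chosen to resemble the part names in the answer's data; only the regular expression comes from the code above.

```r
# Hypothetical sample of part names (not the full data set from the question)
parts <- c("Apple1", "Banana2", "StrawBerryX", "Green Grape", "Middle Kiwi")

# Remove a single trailing digit or uppercase letter, if there is one
cleaned <- sub("([0-9]|[A-Z])$", "", parts)

print(cleaned)
# [1] "Apple"       "Banana"      "StrawBerry"  "Green Grape" "Middle Kiwi"
```

Names ending in a lowercase letter are left untouched, which is why the Grape and Kiwi parts survive as separate groups. In locales where the range [A-Z] behaves unexpectedly, [[:upper:]] is a possible alternative.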

Alternatively, the same approach can be implemented with dplyr.

library(dplyr)
theDF %>%
    group_by(dates,fruits = sub('([0-9]|[A-Z])$','',parts)) %>% 
    summarise(stock = sum(stock))

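Update

If there is no pattern and the choice is based purely on manually identifying elements of 'fruits', create a vector of those elements, use %chin% to get the logical index in 'i', assign (:=) the values of 'parts' that correspond to 'i' to 'fruits', then group by 'dates' and 'fruits' and take the sum of 'stock'.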

setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"), fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]
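The same idea can be sketched in base R, without data.table, which may help when checking the logic step by step. This is not the method used in the answer, only an equivalent under stated assumptions: the toy data frame, its values, and the names toy, keep_parts and result are all hypothetical.

```r
# Toy data with made-up values (not the question's data set)
toy <- data.frame(
  dates  = as.Date("2016-06-19"),
  fruits = c("Apple", "Apple", "Grape", "Grape", "Kiwi", "Kiwi"),
  parts  = c("Apple1", "Apple2", "Green Grape", "Blue Grape", "Big Kiwi", "Big Kiwi"),
  stock  = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Fruits whose part name should replace the fruit name
keep_parts <- c("Grape", "Kiwi")

# For the selected fruits, use the part name as the grouping label
toy$fruits <- ifelse(toy$fruits %in% keep_parts, toy$parts, toy$fruits)

# Sum stock per date and (possibly replaced) fruit label
result <- aggregate(stock ~ dates + fruits, data = toy, FUN = sum)
print(result)
```

Here the two Apple rows collapse into one group, while each distinct Grape and Kiwi part becomes its own group.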

Data

theDF <- structure(list(dates = structure(c(16971,16971,16971),class = "Date"),fruits = structure(c(1L,1L,2L,5L,3L,4L,4L),.Label = c("Apple","StrawBerry"),class = "factor"),parts = structure(c(1L,6L,7L,8L,14L,15L,16L,11L,10L,9L,13L,12L),.Label = c("Apple1","Apple3","Apple4","Banana1","Banana2","Middle Kiwi","StrawBerryX","StrawBerryY","StrawBerryZ"
    ),stock = c(8,19,15,4,6,18,1,10,9,16,11,2,12,13,5,3,17,14,20,7)),.Names = c("dates","fruits","parts","stock"),row.names = c(NA,-20L),class = "data.frame")
