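I would like to aggregate two columns of a data frame by name, as follows:
>Aggregate exclusively on the two columns fruits and parts, with the parts column carried into the result
>For Apple, Banana and StrawBerry the parts values are irrelevant and everything is summed up, whereas for Grape and Kiwi the parts values should become the new fruit names
>The result (at the bottom) should have 8 aggregated rows instead of 20
This may sound simple, but after hours of trial and error I have not found a working solution. Here is an example: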
library(lubridate)  # provides today()
theDF <- data.frame(
  dates  = as.Date(c(today()+20)),
  fruits = c("Apple","Apple","Banana","StrawBerry","Grape","Kiwi","Kiwi"),
  parts  = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4","Yellow Banana1","Small Banana","Banana3","Banana4","Red Small StrawBerry","Red StrawBerryY","Big StrawBerry","StrawBerryZ","Green Grape","Blue Grape","Big Kiwi","Small Kiwi","Middle Kiwi"),
  stock  = as.vector(sample(1:20))
)
Current data frame:
Desired output:
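Solution
We can use data.table. If there is a pattern, such as a trailing uppercase letter or digit in the 'parts' column that should be removed, we can strip it with sub, use the result as a grouping variable together with 'dates', and take the sum of 'stock'.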
library(data.table)
setDT(theDF)[, .(stock = sum(stock)), .(dates, fruits = sub("([0-9]|[A-Z])$", "", parts))]
#        dates      fruits stock
#1: 2016-06-19       Apple    46
#2: 2016-06-19      Banana    35
#3: 2016-06-19  StrawBerry    38
#4: 2016-06-19 Green Grape    12
#5: 2016-06-19  Blue Grape    21
#6: 2016-06-19    Big Kiwi    37
#7: 2016-06-19  Small Kiwi    14
#8: 2016-06-19 Middle Kiwi     7
# The same grouping and sum with dplyr
library(dplyr)
theDF %>%
  group_by(dates, fruits = sub('([0-9]|[A-Z])$', '', parts)) %>%
  summarise(stock = sum(stock))
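To see what the grouping key looks like before any aggregation, the sub() call can be tried on a handful of values in isolation. The following is only a minimal sketch in base R: the vector `example_parts` and the data frame `toy` are made up for illustration and are not the asker's data, and aggregate() stands in here for the data.table and dplyr calls.

```r
# Hypothetical sample values, not taken from the question's data
example_parts <- c("Apple1", "Apple2", "StrawBerryX", "Green Grape", "Big Kiwi")

# Remove exactly one trailing digit or uppercase letter, if present
key <- sub("([0-9]|[A-Z])$", "", example_parts)
print(key)
# "Apple" "Apple" "StrawBerry" "Green Grape" "Big Kiwi"

# Use the cleaned key as the grouping variable and sum the stock
toy <- data.frame(fruits = key, stock = c(1, 2, 3, 4, 5))
res <- aggregate(stock ~ fruits, data = toy, FUN = sum)
print(res)
```

Note that the pattern strips a single trailing character only, so a value such as "Banana10" would become "Banana1" rather than "Banana".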
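Update
If there is no pattern and the elements of 'fruits' have to be identified manually, create a vector of those elements, use %chin% to get the logical index in 'i', assign (:=) the values of 'parts' that correspond to 'i' to 'fruits', then group by 'dates' and 'fruits' and take the sum of 'stock'.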
setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"), fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]
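The manual approach can also be sketched in base R, which may help when checking the logic step by step. This is an illustrative alternative, not the answer's data.table code: the data frame `toy` and the variable `keep_parts` are hypothetical names, and the columns are kept as character (stringsAsFactors = FALSE) so that the assignment does not run into factor level issues.

```r
# Hypothetical toy data, not the asker's data frame
toy <- data.frame(
  fruits = c("Apple", "Apple", "Grape", "Grape", "Kiwi"),
  parts  = c("Apple1", "Apple2", "Green Grape", "Blue Grape", "Big Kiwi"),
  stock  = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Rows whose fruit name should be replaced by the more specific part name
keep_parts <- toy$fruits %in% c("Grape", "Kiwi")
toy$fruits[keep_parts] <- toy$parts[keep_parts]

# Sum the stock per (possibly replaced) fruit name
res <- aggregate(stock ~ fruits, data = toy, FUN = sum)
print(res)
```

Here the two Apple rows collapse into one, while each Grape and Kiwi part keeps its own row.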
Data
theDF <- structure(list(
  dates = structure(c(16971,16971,16971),class = "Date"),
  fruits = structure(c(1L,1L,2L,5L,3L,4L,4L),.Label = c("Apple","StrawBerry"),class = "factor"),
  parts = structure(c(1L,6L,7L,8L,14L,15L,16L,11L,10L,9L,13L,12L),.Label = c("Apple1","Apple3","Apple4","Banana1","Banana2","Middle Kiwi","StrawBerryX","StrawBerryY","StrawBerryZ" ),
  stock = c(8,19,15,4,6,18,1,10,9,16,11,2,12,13,5,3,17,14,20,7)),
  .Names = c("dates","fruits","parts","stock"),row.names = c(NA,-20L),class = "data.frame")