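How to aggregate and count identical rows in R
I have a data.frame like this (but with many more columns and rows):
I want to collapse the rows that are identical across all columns and add a final column "Count", so that I get a result like this:
Thanks for your help!
Data: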
structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))
Solutions
A toy example, not the most elegant way:
mtcars2 = mtcars[c(1, 1, 2, 3), ]
do.call(rbind, by(
  mtcars2, mtcars2, function(x) {
    data.frame(unique(x), "Count" = nrow(x))
  })
)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb Count
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1     1
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4     2
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     1
Edit: with the data provided by the OP
df = structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))
do.call(rbind, by(df, df, function(x) {
  data.frame(unique(x), "Count" = nrow(x))
}))
  Gene Cell.1 Cell.2 Cell.3 Count
1    A      2      2      2     2
3    B      3      3      3     1
4    C      4      4      4     1
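There are several ways to do this, but here is a dplyr solution. It relies on rows being counted together only when every column matches: it groups by all columns, adds the size of each group (i.e. n()) as a Count column, and then uses distinct() to keep one row per group.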
library(dplyr)
df1 %>%
group_by(across(everything())) %>%
mutate(Count = n()) %>%
ungroup() %>%
distinct()
# A tibble: 3 x 5
  Gene  Cell_1 Cell_2 Cell_3 Count
  <chr>  <dbl>  <dbl>  <dbl> <int>
1 A          2      2      2     2
2 B          3      3      3     1
3 C          4      4      4     1
Alternatively, a data.table solution using the same logic:
library(data.table)
setDT(df1)
df1[,Count := .N,by = names(df1)]
unique(df1)
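Alternatively, a base R solution uses indexing instead of grouping, flagging duplicates across the whole frame:
# Build one key per row; duplicated(df1) + 1 alone is only correct when no row occurs more than twice
key = do.call(paste, c(df1, sep = "\r"))
df1$Count = ave(seq_along(key), key, FUN = length)
df1[!duplicated(key), ]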
Data:
df1 = data.frame(Gene = c("A","A","B","C"))
df1[paste0("Cell_",1:3)] = c(2,2:4)
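The group-and-count logic described above can also be sketched with base R's aggregate(). This is only an illustration under stated assumptions: the helper name count_identical_rows and its count_col argument are hypothetical (they do not come from any of the answers), and aggregate() silently drops rows that have NA in a grouping column, so the sketch assumes complete data.

```r
# Hypothetical helper (not from the answers above): collapse identical rows
# and count how often each one occurs, using only base R.
count_identical_rows <- function(df, count_col = "Count") {
  df <- as.data.frame(df)
  # The count column must not clash with an existing column name.
  stopifnot(!(count_col %in% names(df)))
  keys <- names(df)
  # One column of 1s, named after the requested count column.
  ones <- data.frame(rep(1L, nrow(df)))
  names(ones) <- count_col
  # Group by every original column and sum the 1s within each group.
  aggregate(ones, by = df[keys], FUN = sum)
}

# Example data shaped like df1 above.
example <- data.frame(
  Gene = c("A", "A", "B", "C"),
  Cell_1 = c(2, 2, 3, 4),
  Cell_2 = c(2, 2, 3, 4),
  Cell_3 = c(2, 2, 3, 4)
)
result <- count_identical_rows(example)
print(result)
```

Unlike the duplicated()-based indexing, this does not depend on row order, at the cost of returning the rows sorted by the grouping columns rather than in their original order.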
In SQL terms, you can count the rows grouped by all columns and then join the result back to the initial data.frame.
I suggest using the data.table package.
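# Illustrative values: column a is truncated in the source, so these are an assumption
df = data.frame(a = c(1, 1, 2, 4, 5), b = c("a", "a", "b", "e", "f"))
library(data.table)
# convert df to data.table
df = as.data.table(df)
# aggregate df, grouping by all columns
clmns = colnames(df)
row_multiplicity = df[, .N, by = clmns]
# join/merge with the initial data.frame (every original row is kept, with its count in N)
new_df = merge(df, row_multiplicity)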
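For readers without data.table, the same count-then-join idea can be sketched in base R. The data frame toy and the column name N below are hypothetical and chosen only for illustration; the sketch assumes there are no NA values in the grouping columns.

```r
# Hypothetical toy data (not the answer's df): rows 1 and 2 are identical.
toy <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))

# "GROUP BY all columns, COUNT(*)": one row per distinct combination.
counts <- aggregate(data.frame(N = rep(1L, nrow(toy))), by = toy, FUN = sum)

# Natural join on the shared columns a and b: every original row gets its N.
joined <- merge(toy, counts)

# Dropping repeated rows afterwards leaves one row per combination.
collapsed <- unique(joined)
```

Here joined still has as many rows as toy, which mirrors what the merge in the answer returns; the final unique() call is only needed when one row per combination is wanted.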
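Using the dplyr package (note that this counts by Gene only, so it assumes rows sharing a Gene are identical in every other column):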
> library(dplyr)
> df %>% add_count(Gene,name = 'Count') %>% group_by(Gene) %>% filter(row_number() == 1)
# A tibble: 3 x 5
# Groups:   Gene [3]
  Gene  `Cell 1` `Cell 2` `Cell 3` Count
  <chr>    <dbl>    <dbl>    <dbl> <int>
1 A            2        2        2     2
2 B            3        3        3     1
3 C            4        4        4     1
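Data used:
structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))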
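Try this code:
df = structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))
df1 = unique(df)            # store the unique rows
df2 = df[duplicated(df), ]  # store the duplicated rows
df3 = df1                   # copy the unique rows into a new data frame
df3['count'] = 1            # every unique row starts with a count of 1
# seq_len() avoids the 1:0 trap when there are no duplicated rows
for (i in seq_len(nrow(df2))) {      # duplicated rows
  for (j in seq_len(nrow(df1))) {    # unique rows
    if (all(df1[j, ] == df2[i, ])) { # all columns match
      df3[j, 'count'] <- df3[j, 'count'] + 1  # increase the count by one
    }
  }
}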