Aggregate and count identical rows in R

How to aggregate and count identical rows in R

I have a data.frame like this (but with many more columns and rows):

I want to sum up the rows in which all columns are identical and create a final column "count", so that I get a result like the following:

Thanks for your help!

Data:

structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))

Solutions

A toy example, not the most elegant way:

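# Toy data: the first row of mtcars appears twice
mtcars2=mtcars[c(1,1,2,3),]

# Split by all columns at once, keep one copy of each group and count its rows
do.call(rbind,by(
    mtcars2,mtcars2,function(x){
      data.frame(unique(x),"Count"=nrow(x))
    })
)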

               mpg cyl disp  hp drat    wt  qsec vs am gear carb Count
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1     1
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4     2
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     1

Edit: with the data provided by the OP:

df=structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))
do.call(rbind,by(df,df,function(x){
       data.frame(unique(x),"Count"=nrow(x))
     }
  )
)

  Gene Cell.1 Cell.2 Cell.3 Count
1    A      2      2      2     2
3    B      3      3      3     1
4    C      4      4      4     1
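For comparison, here is a base R sketch that swaps in aggregate() for by(): the whole data frame is passed as the grouping list, and a column of ones is summed within each group. It is not taken from the answer above, and the syntactic column names (Cell_1 and so on) are an assumption to keep the example simple. Note that aggregate() orders the output by the grouping columns and drops any group that contains NA.

```r
# Question's data, re-typed with syntactic column names (assumed)
df <- data.frame(
  Gene   = c("A", "A", "B", "C"),
  Cell_1 = c(2, 2, 3, 4),
  Cell_2 = c(2, 2, 3, 4),
  Cell_3 = c(2, 2, 3, 4)
)

# Group by every column of df and sum a column of ones, i.e. count rows per group
counts <- aggregate(
  x   = list(Count = rep(1L, nrow(df))),
  by  = df,
  FUN = sum
)
counts
```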

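There are several ways to do this, but here is a dplyr solution. It relies on the rows that should be counted together being identical in every column: it groups by all columns, adds the size of each group (i.e. n()) as a Count column, and then uses ungroup() and distinct() to ungroup and drop the duplicate rows.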

library(dplyr)

df1 %>%
  group_by(across(everything())) %>%
  mutate(Count = n()) %>%
  ungroup() %>%
  distinct()
# A tibble: 3 x 5
  Gene  Cell_1 Cell_2 Cell_3 Count
  <chr>  <dbl>  <dbl>  <dbl> <int>
1 A          2      2      2     2
2 B          3      3      3     1
3 C          4      4      4     1

Alternatively, a data.table solution using the same logic:

library(data.table)

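setDT(df1)                           # convert df1 to a data.table in place
df1[,Count := .N,by = names(df1)]    # .N is the number of rows in each group
unique(df1)                          # keep one row per group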

Alternatively, a base R solution that works by indexing rather than grouping, checking for duplicates across the whole frame:

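key = do.call(paste, c(df1, sep = "\r"))   # one string per row
first = match(key, key)                    # index of each row's first occurrence
df1$Count = tabulate(first)[first]         # occurrences of each row (any number of repeats)
df1[!duplicated(key),]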

Data:

df1 = data.frame(Gene = c("A","A","B","C"))
df1[paste0("Cell_",1:3)] = c(2,2:4)   # each Cell column gets 2, 2, 3, 4
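A quick check of why a count built as duplicated(x) + 1 needs care: duplicated() only records whether a row has been seen before, so the resulting value never exceeds 2, even for a row that occurs three times. The small data frame below is made up for illustration, and the comparison column counts the rows that share the same per-row string key using ave().

```r
# Made-up data: one row repeated three times, plus one distinct row
d <- data.frame(Gene = c("A", "A", "A", "B"), Cell_1 = c(2, 2, 2, 3))

# duplicated()-based count: 1 for first occurrences, 2 for every later one
naive <- duplicated(d) + 1

# Group count: build one string key per row, then count the rows sharing each key
key     <- do.call(paste, c(d, sep = "\r"))
counted <- ave(seq_along(key), key, FUN = length)

cbind(d, naive, counted)
```

The "\r" separator is an assumption, chosen because it is unlikely to occur inside the data; any separator that never appears in the values would work.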

In SQL terms, you can count the rows grouped by all columns and then join the result back onto the original data.frame.

I'd recommend the data.table package.

df=data.frame(a=c(1,4,b=c("a","a","b","e","f"))

library(data.table)

# convert df to data.table
df=as.data.table(df)

# aggregate df grouping by all columns
clmns=colnames(df)
row_multiplicity=df[,.N,by=clmns]

#join/merge with initial data.frame
new_df=merge(df,row_multiplicity)
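The same GROUP BY + JOIN idea can be sketched in base R without data.table, with aggregate() standing in for the grouped count and merge() for the join (merge() matches on all shared columns by default). The example frame is made up. As with the data.table version, the merged result keeps every original row, duplicates included, each annotated with its multiplicity N.

```r
# Made-up example frame
df <- data.frame(a = c(1, 1, 2, 3), b = c("a", "a", "b", "e"))

# GROUP BY all columns: count the rows in each distinct combination
row_multiplicity <- aggregate(
  x   = list(N = rep(1L, nrow(df))),
  by  = df,
  FUN = sum
)

# JOIN back onto the original rows (merge() matches on the shared columns a and b)
new_df <- merge(df, row_multiplicity)
new_df
```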

Using the dplyr package:

library(dplyr)
# Counts by Gene only, assuming rows that share a Gene are identical in every other column
df %>% add_count(Gene,name = 'Count') %>% group_by(Gene) %>% filter(row_number() == 1)
# A tibble: 3 x 5
# Groups:   Gene [3]
  Gene  `Cell 1` `Cell 2` `Cell 3` Count
  <chr>    <dbl>    <dbl>    <dbl> <int>
1 A            2        2        2     2
2 B            3        3        3     1
3 C            4        4        4     1

Data used:

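structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))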
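Because add_count(Gene, ...) groups on Gene alone, its result agrees with a count over all columns only when each Gene value always comes with the same cell values. A minimal base R check of that assumption, using the question's data re-typed with syntactic column names (an assumption made for this sketch):

```r
df <- data.frame(
  Gene   = c("A", "A", "B", "C"),
  Cell_1 = c(2, 2, 3, 4),
  Cell_2 = c(2, 2, 3, 4),
  Cell_3 = c(2, 2, 3, 4)
)

# TRUE when every Gene maps to exactly one distinct row,
# i.e. counting by Gene yields the same groups as counting by all columns
gene_identifies_row <- nrow(unique(df)) == length(unique(df$Gene))
gene_identifies_row
```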

Try this code:

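df=structure(list(Gene = c("A","A","B","C"),`Cell 1` = c(2,2,3,4),`Cell 2` = c(2,2,3,4),`Cell 3` = c(2,2,3,4)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))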

df1=unique(df)    #Store Unique Rows    
df2=df[duplicated(df),]    #Store Duplicated Rows

df3=df1    #Copy unique Dataframe into new dataframe    
df3['count']=1    #Create and assign with default value    


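for(i in seq_len(nrow(df2)))    #Duplicated Rows (seq_len also handles zero duplicates)
{
  for(j in seq_len(nrow(df1)))    #Unique Rows
  {
    if (all(df1[j,] == df2[i,]))    #Check that all columns match
    {
      df3$count[j] <- df3$count[j] + 1   # Increase the count by one
      break                              # A duplicate matches exactly one unique row
    }
  }
}

df3    #Unique rows with their counts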
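The nested loops compare every duplicated row with every unique row, which gets slow as the data grows. A vectorized base R sketch of the same result (not part of the original answer) joins each row into one string key and looks up how often each key occurs with table(); the data is the question's, re-typed with syntactic column names as an assumption.

```r
df <- data.frame(
  Gene   = c("A", "A", "B", "C"),
  Cell_1 = c(2, 2, 3, 4),
  Cell_2 = c(2, 2, 3, 4),
  Cell_3 = c(2, 2, 3, 4)
)

# One string key per row; "\r" is assumed never to appear in the data
key <- do.call(paste, c(df, sep = "\r"))

# Keep the first occurrence of each distinct row
first <- !duplicated(key)
df3 <- df[first, ]

# Look up each kept row's number of occurrences by its key
df3$count <- as.vector(table(key)[key[first]])
df3
```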
