Descriptive statistics for factors in a long-format dataset using dplyr

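I have repeated-measures data, currently in long format.

All I want is some summary statistics, such as the counts and percentages at each time point.

Example data: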

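questiondata <- structure(list(
  id = c(2,2,6,6,9,9,22,22,23,23,25,25,30,30,31,31,33,33,34,34),
  time = structure(c(1L,2L,1L,2L,1L,2L,1L,2L,1L,2L,1L,2L,1L,2L,1L,2L,1L,2L,1L,2L),
                   .Label = c("time1","time2"), class = "factor"),
  age = c(65,69.17,76.75,81.05,58.64,62.71,59.37,63.56,58,61.69,
          55.78,59.95,59.3,63.36,60.45,64.39,56.3,60.08,59.53,63.84),
  sex = structure(c(1L,1L,2L,2L,1L,1L,1L,1L,2L,2L,1L,1L,2L,2L,1L,1L,1L,1L,2L,2L),
                  .Label = c("men","women"), class = "factor"),
  hypert_drug = structure(c(1L,2L,2L,2L,1L,1L,1L,1L,1L,1L,1L,1L,1L,2L,2L,2L,1L,1L,1L,1L),
                          .Label = c("no","yes"), class = "factor")),
  row.names = c(NA,-20L), class = c("tbl_df","tbl","data.frame"))

This corresponds to the following tibble: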

# A tibble: 20 x 5
      id time    age sex   hypert_drug
   <dbl> <fct> <dbl> <fct> <fct>      
 1     2 time1  65   men   no         
 2     2 time2  69.2 men   yes        
 3     6 time1  76.8 women yes        
 4     6 time2  81.0 women yes        
 5     9 time1  58.6 men   no         
 6     9 time2  62.7 men   no         
 7    22 time1  59.4 men   no         
 8    22 time2  63.6 men   no         
 9    23 time1  58   women no         
10    23 time2  61.7 women no         
11    25 time1  55.8 men   no         
12    25 time2  60.0 men   no         
13    30 time1  59.3 women no         
14    30 time2  63.4 women yes        
15    31 time1  60.4 men   yes        
16    31 time2  64.4 men   yes        
17    33 time1  56.3 men   no         
18    33 time2  60.1 men   no         
19    34 time1  59.5 women no         
20    34 time2  63.8 women no

I tried the following to simply count the men and women:

questiondata %>% 
  group_by(time) %>% 
  summarise(n_sex=n_distinct(sex))

But this gives:

# A tibble: 2 x 2
  time  n_sex
* <fct> <int>
1 time1     2
2 time2     2

Then I tried:

questiondata %>% 
  group_by(time) %>% 
  mutate(n_sex=count(sex))

This throws an error:

Error: Problem with `mutate()` input `n_sex`.
x no applicable method for 'count' applied to an object of class "factor"
i Input `n_sex` is `count(sex)`.
i The error occurred in group 1: time = "time1".
Run `rlang::last_error()` to see where the error occurred.

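Any help? Thanks!

Solution

n_distinct(sex) returns the number of distinct values of sex (2 at each time point), not the number of rows per level, and count() expects a data frame rather than a single column. Instead, use time and sex together as grouping variables; the n column then gives the number of observations for each combination of time and sex.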

library(dplyr)

questiondata %>% 
  group_by(time, sex) %>% 
  summarize(n = n())

`summarise()` has grouped output by 'time'. You can override using the `.groups` argument.
# A tibble: 4 x 3
# Groups:   time [2]
  time  sex       n
  <fct> <fct> <int>
1 time1 men       6
2 time1 women     4
3 time2 men       6
4 time2 women     4
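
The question also asks for percentages, which the counts above do not yet provide. A minimal sketch of one way to add them, assuming the percentage wanted is the share of each sex within a time point: count() does the grouping and counting in one step, and a grouped mutate() divides by the total for that time point. The data frame is rebuilt here from the printed tibble (only the columns needed), and the names sex_summary and pct are illustrative.

```r
library(dplyr)

# Reconstructed from the printed tibble in the question (age and hypert_drug omitted)
questiondata <- tibble(
  id   = rep(c(2, 6, 9, 22, 23, 25, 30, 31, 33, 34), each = 2),
  time = factor(rep(c("time1", "time2"), times = 10)),
  sex  = factor(rep(c("men", "women", "men", "men", "women",
                      "men", "women", "men", "men", "women"), each = 2))
)

sex_summary <- questiondata %>%
  count(time, sex) %>%                 # same counts as group_by() + summarise(n = n())
  group_by(time) %>%
  mutate(pct = 100 * n / sum(n)) %>%   # percentage within each time point
  ungroup()

sex_summary
```

For the example data this gives 60% men and 40% women at both time points, matching the counts of 6 and 4 shown above.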

Alternatively, setting .groups explicitly suppresses the message:

questiondata %>% 
  group_by(time, sex) %>% 
  summarise(count = n(), .groups = "keep")
# A tibble: 4 x 3
# Groups:   time,sex [4]
  time  sex   count
  <fct> <fct> <int>
1 time1 men       6
2 time1 women     4
3 time2 men       6
4 time2 women     4
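
If several factors need summarising at once (for example sex and hypert_drug), the same pattern can be applied after stacking the factor columns. The following is only a sketch under assumptions: it relies on tidyr's pivot_longer(), converts the factors to character first so they share one type, and uses the illustrative names variable, level, pct and factor_summary, none of which appear in the original answers. The data are again rebuilt from the printed tibble.

```r
library(dplyr)
library(tidyr)

# Reconstructed from the printed tibble in the question (age omitted)
questiondata <- tibble(
  id   = rep(c(2, 6, 9, 22, 23, 25, 30, 31, 33, 34), each = 2),
  time = factor(rep(c("time1", "time2"), times = 10)),
  sex  = factor(rep(c("men", "women", "men", "men", "women",
                      "men", "women", "men", "men", "women"), each = 2)),
  hypert_drug = factor(c("no", "yes", "yes", "yes", "no", "no", "no", "no",
                         "no", "no", "no", "no", "no", "yes", "yes", "yes",
                         "no", "no", "no", "no"))
)

factor_summary <- questiondata %>%
  mutate(across(c(sex, hypert_drug), as.character)) %>%   # common type before stacking
  pivot_longer(c(sex, hypert_drug),
               names_to = "variable", values_to = "level") %>%
  count(time, variable, level) %>%                        # one row per time/variable/level
  group_by(time, variable) %>%
  mutate(pct = 100 * n / sum(n)) %>%                      # share within each variable and time
  ungroup()

factor_summary
```

The result has one row per time point, variable and level, so adding another factor only means listing it in across() and pivot_longer().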