Using a data frame and columns as arguments in a custom R function that uses dplyr's mutate on grouped data

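I've written a custom function in R that prepares data for plotting. I pass a data frame and two of its columns to the function and then use dplyr inside it. The function needs to group by a categorical variable (age.group in this example) and, while the data is still grouped, create a binned version of a continuous variable (to.be.binned) and get the count for each group. I tried to do both with mutate.

The code inside the function works when run outside a function, but here I am passing both the data frame and the variables in as arguments (using curly-curly braces, since this is dplyr).

I get the following error: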

Error: Column `"age.group"` can't be modified because it's a grouping variable

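I don't think my code modifies this variable. I need the per-group count in order to compute each group's percentages, so I can't ungroup first (which is what was suggested to other people who hit the same error).

Any suggestions would be much appreciated!

Reprex: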

library(tidyverse)

simple.df <- data.frame(
  age.group = c("18-30","Under 18","Over 30","18-30","Under 18"),to.be.binned = c(98.415794,32.35116,73.29943,81.92012,99.61144,29.665798,97.652885,94.94358,77.798035,24.110243,99.110245,98.415794,99.80469,94.24913,79.665794,72.02691,96.332466,97.02691,92.860245,90.082466,99.55236,99.110245)
)



bin_by_group <- function(df,my.grouping,bin.this) {
  
  bw = 25
  
  new.df <- df %>%
    group_by({{my.grouping}}) %>%
    mutate(this.binned = cut(as.numeric({{bin.this}}),breaks = seq(0,100,bw),labels = seq(0 + bw,100,bw)-(bw/2)),n = n()) %>%
    group_by({{my.grouping}},this.binned) %>%
    summarise(p = n()/n[1]) %>%
    ungroup() %>%
    mutate(this.binned = as.numeric(as.character(this.binned)))
  
  return(new.df)
  
}


test.df <- bin_by_group(simple.df,"age.group","to.be.binned")
#> Warning in cut(as.numeric(~"to.be.binned"),labels =
#> seq(0 + : NAs introduced by coercion
#> Error: Column `"age.group"` can't be modified because it's a grouping variable

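Solution

We just need to pass the arguments unquoted, because {{}} expects bare column names: {{}} is equivalent to enquo followed by !!. A quoted string is treated as a constant value rather than as a reference to an existing column.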

bin_by_group(simple.df,age.group,to.be.binned)

-output

# A tibble: 7 x 3
#  age.group this.binned     p
#  <chr>           <dbl> <dbl>
#1 18-30            87.5   1  
#2 Over 30          62.5   0.1
#3 Over 30          87.5   0.9
#4 Under 18         12.5   0.1
#5 Under 18         37.5   0.2
#6 Under 18         62.5   0.1
#7 Under 18         87.5   0.6
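
To see why the quoted call in the question fails, it can help to check which grouping column group_by() actually ends up with. The following is a minimal sketch, assuming a small made-up data frame (toy) and a hypothetical helper (grouping_names); neither comes from the original post.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# Hypothetical helper: report the grouping columns produced by group_by({{ }})
grouping_names <- function(df, col) {
  df %>%
    group_by({{ col }}) %>%
    group_vars()
}

# Bare name: groups by the existing column g
unquoted <- grouping_names(toy, g)

# String: group_by() computes a new constant column named after the
# expression, quote characters included, instead of using column g
quoted <- grouping_names(toy, "g")

print(unquoted)
print(quoted)
```

With the string, the grouping column is no longer g itself, which matches the quoted column name shown in the error message.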

If we want to be able to pass the names either quoted or unquoted, convert them to symbols with ensym and then evaluate them with !!

bin_by_group <- function(df,my.grouping,bin.this) {
  
  bw = 25
  my.grouping <- ensym(my.grouping)
  bin.this <- ensym(bin.this)
  new.df <- df %>%
    group_by(!! my.grouping) %>%
    mutate(this.binned = cut(as.numeric(!!bin.this),breaks = seq(0,100,bw),labels = seq(0 + bw,100,bw)-(bw/2)),n = n()) %>%
    group_by(!! my.grouping,this.binned) %>%
    summarise(p = n()/n[1],.groups = 'drop') %>%
    ungroup() %>%
    mutate(this.binned = as.numeric(as.character(this.binned)))
  
  return(new.df)
  
}
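
The reason this version accepts both forms is that ensym turns a string and a bare name into the same symbol. A small sketch of that behaviour, using a hypothetical helper (capture_col) that is not part of the original answer:

```r
library(rlang)

# Hypothetical helper: capture the argument as a symbol,
# whether it was supplied as a bare name or as a string
capture_col <- function(col) ensym(col)

from_bare   <- capture_col(age.group)
from_string <- capture_col("age.group")

# Both calls yield the same symbol, so !! injects the same column reference
same_symbol <- identical(from_bare, from_string)
print(same_symbol)
```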

-testing

 bin_by_group(simple.df,"age.group","to.be.binned")
# A tibble: 7 x 3
  age.group this.binned     p
  <chr>           <dbl> <dbl>
1 18-30            87.5   1  
2 Over 30          62.5   0.1
3 Over 30          87.5   0.9
4 Under 18         12.5   0.1
5 Under 18         37.5   0.2
6 Under 18         62.5   0.1
7 Under 18         87.5   0.6

bin_by_group(simple.df,age.group,to.be.binned)
# A tibble: 7 x 3
  age.group this.binned     p
  <chr>           <dbl> <dbl>
1 18-30            87.5   1  
2 Over 30          62.5   0.1
3 Over 30          87.5   0.9
4 Under 18         12.5   0.1
5 Under 18         37.5   0.2
6 Under 18         62.5   0.1
7 Under 18         87.5   0.6
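
The equivalence between {{}} and enquo plus !! mentioned in the first solution can also be checked on a small case. This is only an illustrative sketch: the data frame (toy) and the two functions (mean_curly, mean_enquo) are made up for the comparison and are not from the original post.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# Version 1: embrace the arguments with {{ }}
mean_curly <- function(df, grp, val) {
  df %>%
    group_by({{ grp }}) %>%
    summarise(m = mean({{ val }}), .groups = "drop")
}

# Version 2: capture with enquo(), then inject with !!
mean_enquo <- function(df, grp, val) {
  grp <- enquo(grp)
  val <- enquo(val)
  df %>%
    group_by(!!grp) %>%
    summarise(m = mean(!!val), .groups = "drop")
}

res_curly <- mean_curly(toy, g, x)
res_enquo <- mean_enquo(toy, g, x)

# Compare the two results
print(all.equal(as.data.frame(res_curly), as.data.frame(res_enquo)))
```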
