
Stepwise selection for many models in R

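I am trying to run stepwise selection on a large number of Poisson regressions, following Chapter 20, "Many models with purrr and broom", of the book "R for Data Science" by Wickham and Grolemund (https://r4ds.had.co.nz/). With this approach, however, the data argument stored in each fitted model has no meaningful name, which seems to be a problem when applying the step function. A reproducible example follows.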

library(tidyverse)
library(gapminder)

# Create tibble with variables and models for each corresponding country
by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest()

# model-fitting function:
country_model <- function(df) {
  lm(lifeExp ~ year + pop + gdpPercap, data = df)
}

# apply country_model to each element:
by_country <- by_country %>%
  mutate(model = map(data, country_model))

# Stepwise selection
step_select <- mutate(by_country, step_sel = map(model, step))
# This fails

# It works fine if applied to individually calculated models (in this case for Germany), but not if you extract the same model from the tibble
# Individually calculated model
ge <- filter(gapminder, country == "Germany")
ge_mod <- lm(lifeExp ~ year + pop + gdpPercap, data = ge)
step(ge_mod)
# Extract the same model from the tibble
# (look the row up by country name rather than by position)
tb_ge_mod <- by_country$model[[which(by_country$country == "Germany")]]
step(tb_ge_mod)

# The only difference I can spot between these models is the name of data, which is generic in the tibble, but specific in the individually calculated model:
ge_mod[["call"]] 
tb_ge_mod[["call"]]
ge_mod[["call"]][["data"]]
tb_ge_mod[["call"]][["data"]]

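# If you replace these names, it works.
tb_ge_mod[["call"]][["data"]] <- ge
# Or replace the whole call; quote() keeps it an unevaluated call instead of a fitted model
tb_ge_mod[["call"]] <- quote(lm(formula = lifeExp ~ year + pop + gdpPercap, data = ge))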
step(tb_ge_mod)

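However, I have not found a way to adjust the names automatically (and in any case this is only a workaround). So, is there a way to apply stepwise selection here?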
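
One possible direction, offered as a sketch rather than a definitive answer: step() refits models through update(), which re-evaluates the stored call (here with data = df) in the frame that called step(). When step is passed directly to map(), the name df no longer refers to the country's data in that frame (it typically resolves to something else, such as the stats::df function). The code below shows two ways around this under that assumption. The names country_step, step_with_data, by_country_step and by_country_alt are hypothetical, introduced only for illustration.

```r
library(tidyverse)
library(gapminder)

# Option 1: fit the full model and run step() inside one function.
# step() evaluates the refit call in its caller's frame, which is this
# function, so `df` still refers to the country's data.
country_step <- function(df) {
  full <- lm(lifeExp ~ year + pop + gdpPercap, data = df)
  step(full, trace = 0)  # trace = 0 silences the per-step printout
}

by_country_step <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(step_sel = map(data, country_step))

# Option 2: automate the manual workaround by storing each country's
# data frame in the model's call before running step().
step_with_data <- function(mod, df) {
  mod$call$data <- df
  step(mod, trace = 0)
}

by_country_alt <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(
    model    = map(data, ~ lm(lifeExp ~ year + pop + gdpPercap, data = .x)),
    step_sel = map2(model, data, step_with_data)
  )
```

Option 1 keeps the stored call small but ties model fitting and selection together. Option 2 leaves the original model column intact, at the cost of embedding a copy of the data in each call. The same pattern should carry over to glm(..., family = poisson) for the Poisson case, though that is not tested here.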
