How to solve: Tidymodels + Spark
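I'm trying to build a simple logistic regression model using Tidymodels with the Spark engine. My code works fine when I specify set_engine("glm"), but it fails when I set the engine to "spark". Any suggestions would be greatly appreciated!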
library(tidyverse)
library(sparklyr)
library(tidymodels)
train.df <- titanic::titanic_train
train.df <- train.df %>%
  mutate(Survived = factor(ifelse(Survived == 1, 'Y', 'N')),
         Sex = factor(Sex),
         Pclass = factor(Pclass))
skimr::skim(train.df)
# Just working with Spark locally.
sc <- spark_connect(master = 'local', version = '3.1')
train.spark.df <- copy_to(sc, train.df)
logistic.regression.recipe <-
  recipe(Survived ~ PassengerId + Sex + Age + Pclass, data = train.spark.df) %>%
  update_role(PassengerId, new_role = 'ID') %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_impute_linear(all_predictors())
logistic.regression.recipe
summary(logistic.regression.recipe)
logistic.regression.model <-
  logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("spark")
logistic.regression.model
logistic.regression.workflow <-
  workflow() %>%
  add_recipe(logistic.regression.recipe) %>%
  add_model(logistic.regression.model)
logistic.regression.workflow
logistic.regression.final.model <-
  logistic.regression.workflow %>%
  fit(data = train.spark.df)
logistic.regression.final.model
Error: `data` must be a data.frame or a matrix, not a tbl_spark.
Thanks for reading!
Solution
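Support for Spark in tidymodels is uneven across the parts of a modeling analysis. Support for fitting models with parsnip is good, but we don't have full-featured support for feature engineering with recipes, or for combining those building blocks in workflows. For example, you can fit just the logistic regression model: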
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(sparklyr)
#>
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:purrr':
#>
#> invoke
#> The following object is masked from 'package:stats':
#>
#> filter
sc <- spark_connect(master = "local")
train_sp <- copy_to(sc, titanic::titanic_train, overwrite = TRUE)
log_spec <- logistic_reg() %>% set_engine("spark")
log_spec %>%
  fit(Survived ~ Sex + Fare + Pclass, data = train_sp)
#> parsnip model object
#>
#> Fit time: 5.1s
#> Formula: Survived ~ Sex + Fare + Pclass
#>
#> Coefficients:
#> (Intercept) Sex_male Fare Pclass
#> 3.143731639 -2.630648858 0.001450218 -0.917173436
Created on 2021-07-09 by the reprex package (v2.0.0)
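Since parsnip is the piece that does work with Spark, one possible workaround is to do the question's preprocessing directly on the Spark table with dplyr verbs (sparklyr translates them to Spark SQL) and then hand the result to the parsnip spec. This is a minimal sketch under stated assumptions: it swaps step_impute_linear() for plain mean imputation, and it relies on Spark's formula handling to dummy-encode the character columns Sex and Pclass in place of step_dummy(). The object names age_mean and train_prep are illustrative.

```r
library(sparklyr)
library(dplyr)
library(parsnip)

sc <- spark_connect(master = "local")
train_sp <- copy_to(sc, titanic::titanic_train, overwrite = TRUE)

# Mean of Age computed inside Spark (NULLs are skipped), then pulled into R
# as a single number. Mean imputation is a simplification of the
# step_impute_linear() used in the question.
age_mean <- train_sp %>%
  summarise(age_mean = mean(Age, na.rm = TRUE)) %>%
  pull(age_mean)

# Preprocessing written as dplyr verbs; they run as Spark SQL,
# so the data stays in Spark.
train_prep <- train_sp %>%
  mutate(
    Age = coalesce(Age, !!age_mean),   # fill missing ages with the mean
    Pclass = as.character(Pclass)      # string column, dummy-encoded by Spark's formula
  )

# Only the model step goes through tidymodels (parsnip), which supports Spark.
log_fit <- logistic_reg() %>%
  set_engine("spark") %>%
  fit(Survived ~ Sex + Age + Pclass, data = train_prep)

log_fit
```

The trade-off is that the preprocessing is no longer a reusable recipe object: to score new data, the same mutate() steps (with the same age_mean) have to be applied again by hand.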
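You can't use recipes and workflows out of the box, though. You might consider trying something like spark_apply(), but that may be a challenge at the current level of maturity of the tidymodels integration with Spark.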
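If the goal is to keep preprocessing and model together in one fitted object, as a workflow does, an alternative to the spark_apply() route is Spark's own ML Pipelines, which sparklyr exposes through ml_pipeline() and the ft_*() feature transformers. This bypasses tidymodels entirely, so treat it as a hedged sketch of a different technique, not a tidymodels feature. It makes the same simplifying assumptions as a plain-dplyr approach would: mean imputation via ft_imputer() instead of linear imputation, and Pclass cast to a string so that the ft_r_formula() stage dummy-encodes it. The names Age_imp, pipeline, pipeline_fit, and preds are illustrative.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Cast Pclass to a string so ft_r_formula() treats it as categorical,
# roughly what factor() + step_dummy() did in the question.
train_sp <- copy_to(sc, titanic::titanic_train, overwrite = TRUE) %>%
  mutate(Pclass = as.character(Pclass))

# An ML pipeline bundles preprocessing and model, loosely like recipe + workflow:
# every stage is estimated on the training data when the pipeline is fitted.
pipeline <- ml_pipeline(sc) %>%
  # Mean imputation of Age (a stand-in for step_impute_linear())
  ft_imputer(input_cols = "Age", output_cols = "Age_imp", strategy = "mean") %>%
  # Dummy-encodes string columns and builds the "features" and "label" columns
  ft_r_formula(Survived ~ Sex + Age_imp + Pclass) %>%
  # Reads the default features_col = "features" and label_col = "label"
  ml_logistic_regression()

pipeline_fit <- ml_fit(pipeline, train_sp)

# Predictions are computed inside Spark; only a few columns are selected.
preds <- ml_transform(pipeline_fit, train_sp) %>%
  select(PassengerId, Survived, prediction)

head(preds)
```

Because the imputation mean and the dummy encoding are learned as pipeline stages, the fitted pipeline can be applied to new Spark data with ml_transform() without repeating the preprocessing by hand.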