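How to resolve linear regression features not being set
I'm trying to write a linear regression to analyse my data. I'm using Scala, and this is essentially what I'm doing: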
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.{Pipeline,PipelineModel}
val training_data_finalised = training.drop("COUNTRY_REGION","PROVINCE_STATE","DATE")
val featuresArray = Array("Active","Confirmed","Deaths","Recovered","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC","AVG_PARKS_CHANGE_PERC","AVG_RESIDENTIAL_CHANGE_PERC","AVG_RETAIL_AND_RECREATION_CHANGE_PERC","AVG_TRANSIT_STATIONS_CHANGE_PERC","AVG_WORKPLACES_CHANGE_PERC","Active_1_day","Active_2_day","Active_7_day","Active_14_day","Confirmed_1_day","Confirmed_2_day","Confirmed_7_day","Confirmed_14_day","Deaths_1_day","Deaths_2_day","Deaths_7_day","Deaths_14_day","Recovered_1_day","Recovered_2_day","Recovered_7_day","Recovered_14_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_1_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_2_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_7_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_14_day","AVG_PARKS_CHANGE_PERC_1_day","AVG_PARKS_CHANGE_PERC_2_day","AVG_PARKS_CHANGE_PERC_7_day","AVG_PARKS_CHANGE_PERC_14_day","AVG_RESIDENTIAL_CHANGE_PERC_1_day","AVG_RESIDENTIAL_CHANGE_PERC_2_day","AVG_RESIDENTIAL_CHANGE_PERC_7_day","AVG_RESIDENTIAL_CHANGE_PERC_14_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_1_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_2_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_7_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_14_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_1_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_2_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_7_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_14_day","AVG_WORKPLACES_CHANGE_PERC_1_day","AVG_WORKPLACES_CHANGE_PERC_2_day","AVG_WORKPLACES_CHANGE_PERC_7_day","AVG_WORKPLACES_CHANGE_PERC_14_day")
val assembler = new VectorAssembler()
.setInputCols(featuresArray)
.setOutputCol("features")
val lr = new LinearRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
.setFeaturesCol("features") // setting features column
.setLabelCol("Deaths") // setting label column
val pipeline = new Pipeline().setStages(Array(assembler,lr))
//fitting the model
val lrModel = pipeline.fit(training_data_finalised.na.fill(0))
But how do I get the coefficient values from the fitted pipeline?
Any suggestions?
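One possible way to read the coefficients, sketched under assumptions: `pipeline.fit` returns a `PipelineModel`, and its `stages` array holds the fitted stages in order, so the last one can be cast to `LinearRegressionModel`. The local `SparkSession`, the toy columns `x1`, `x2`, `label`, and the names `toy`, `featureCols` and `lrStage` are made up for illustration and are not part of the code above.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

// Assumption: a local session and toy data, used only for this sketch.
val spark = SparkSession.builder().master("local[*]").appName("lr-coefficients").getOrCreate()
import spark.implicits._

val toy = Seq(
  (1.0, 2.0, 5.1),
  (2.0, 1.0, 5.9),
  (3.0, 4.0, 13.2),
  (4.0, 3.0, 13.8)
).toDF("x1", "x2", "label")

val featureCols = Array("x1", "x2")
val assembler = new VectorAssembler()
  .setInputCols(featureCols)
  .setOutputCol("features")
val lr = new LinearRegression()
  .setMaxIter(10)
  .setFeaturesCol("features")
  .setLabelCol("label")

val pipelineModel: PipelineModel =
  new Pipeline().setStages(Array(assembler, lr)).fit(toy)

// The fitted regression is the last stage of the PipelineModel.
val lrStage = pipelineModel.stages.last.asInstanceOf[LinearRegressionModel]

// Pair each coefficient with the input column it belongs to.
featureCols.zip(lrStage.coefficients.toArray).foreach {
  case (name, weight) => println(s"$name: $weight")
}
println(s"Intercept: ${lrStage.intercept}")
```

Applied to the pipeline in the question, the same cast would be `lrModel.stages.last.asInstanceOf[LinearRegressionModel]`, with `featuresArray` in place of `featureCols`.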
To add to this, I also tried following the Spark documentation (https://spark.apache.org/docs/latest/ml-classification-regression.html):
val lr = new LinearRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
But for some reason, this gives me:
IllegalArgumentException: features does not exist. Available: Active,Confirmed,Deaths
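A likely cause, offered as a sketch rather than a confirmed diagnosis: `LinearRegression` reads its inputs from a single vector column, named `features` by default, and its label from a column named `label` by default. Calling `lr.fit(training)` on the raw DataFrame therefore fails, because no `features` column has been assembled yet. The toy rows below are invented; only the column names `Active`, `Confirmed` and `Deaths` come from the error message.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Assumption: a local session and a toy stand-in for `training`.
val spark = SparkSession.builder().master("local[*]").appName("lr-features-col").getOrCreate()
import spark.implicits._

val training = Seq(
  (10.0, 12.0, 1.0),
  (20.0, 25.0, 3.0),
  (35.0, 41.0, 4.0),
  (50.0, 60.0, 7.0)
).toDF("Active", "Confirmed", "Deaths")

// Build the vector column that LinearRegression expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("Active", "Confirmed"))
  .setOutputCol("features")
val assembled = assembler.transform(training.na.fill(0))

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features") // matches the assembler's output column
  .setLabelCol("Deaths")      // the default would be "label"

// Fit on the assembled DataFrame, not on the raw one.
val lrModel = lr.fit(assembled)
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
```

Fitting the estimator directly like this returns a `LinearRegressionModel`, so `coefficients` and `intercept` are available without any cast.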