How do I combine grid search and recursive feature elimination for a Lasso regression model to get the best parameter set?
I'm working on a regression task with a continuous target, and I tried a Lasso regression model with a grid search over alpha.
This is what I got: alpha = 0.000011, highest r^2 score = 0.80.
Code for fitting the model:
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

for ind, i in enumerate(lambdas):
    reg = Lasso(alpha=i)
    reg.fit(normalized_x_train, y_train)
    results = cross_val_score(reg, normalized_x_train, y_train, cv=4, scoring="r2")
    train_r_squared[ind] = reg.score(normalized_x_train, y_train)
    test_r_squared[ind] = reg.score(normalized_x_test, y_test)

df_lam = pd.DataFrame(test_r_squared * 100, columns=['R_squared'])
df_lam['lambda'] = lambdas
# select by column name; positional indexing on the .loc result is fragile
best_lambda_cv = df_lam.loc[df_lam['R_squared'].idxmax(), 'lambda']
# normalize= was removed from Lasso in recent scikit-learn; the other
# keyword arguments in the original call were all defaults, so they are dropped
model_cv = Lasso(alpha=best_lambda_cv, max_iter=50000, tol=0.0001)
model_cv.fit(normalized_x_train, y_train)
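For reference, the same alpha search can be done in one call with scikit-learn's GridSearchCV, which handles the cross-validation loop and best-parameter bookkeeping itself. A minimal sketch, using synthetic data since the original normalized_x_train/y_train are not shown:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for the original normalized_x_train / y_train
X, y = make_regression(n_samples=200, n_features=40, noise=0.1, random_state=0)

# candidate alphas, analogous to the hand-rolled `lambdas` loop
param_grid = {'alpha': np.logspace(-6, -1, 20)}
search = GridSearchCV(Lasso(max_iter=50000), param_grid, cv=4, scoring='r2')
search.fit(X, y)

print(search.best_params_['alpha'])  # best alpha found on the grid
print(search.best_score_)            # mean cross-validated r^2 at that alpha
```

Unlike the manual loop, GridSearchCV picks the alpha by cross-validated score rather than by test-set score, so the held-out test set stays untouched until the final evaluation.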
But the fitted model still uses many features, and I want to find the best feature subset, one that might increase r^2 or only slightly decrease it.
So I tried recursive feature elimination; the code is below:
import numpy as np
from matplotlib import pyplot
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# get a dict of pipelines to evaluate, one per candidate feature count
def get_models():
    models = dict()
    for i in range(2, 41):
        rfe = RFE(Lasso(alpha=0.00001663157894736842, selection='cyclic'),
                  n_features_to_select=i)
        model = Lasso()  # note: this uses the default alpha=1.0, not the tuned value
        # no pre-fitting needed here; cross_val_score fits the whole pipeline
        models[str(i)] = Pipeline(steps=[('s', rfe), ('m', model)])
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    # the original call passed only y; cross_val_score needs both X and y
    scores = cross_val_score(model, X, y, cv=4, scoring='r2', n_jobs=-1)
    return scores

# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    scores = evaluate_model(model, X_train, y_train)
    results.append(scores)
    names.append(name)
    print('>%s %.3f' % (name, np.mean(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()
I'm not sure I'm doing this right. My problem is that it has already been running for more than 10 minutes and counting.
So is my code correct?
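One likely reason for the long runtime is that the loop above cross-validates 39 separate RFE pipelines, and each RFE fit itself refits the estimator once per eliminated feature. scikit-learn's RFECV chooses the number of features by cross-validation in a single pass instead. A minimal sketch under the assumption of synthetic stand-in data (the alpha value here is illustrative, not the tuned one):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Lasso

# synthetic stand-in for the original X_train / y_train
X, y = make_regression(n_samples=200, n_features=40, n_informative=10,
                       noise=0.1, random_state=0)

# RFECV eliminates features step by step and keeps the count that
# maximizes the cross-validated r^2, replacing the 39-pipeline loop
selector = RFECV(Lasso(alpha=0.001, max_iter=50000), step=1, cv=4, scoring='r2')
selector.fit(X, y)

print(selector.n_features_)        # number of features RFECV decided to keep
print(selector.support_)           # boolean mask over the original columns
```

RFE works with Lasso here because Lasso exposes `coef_`, which RFE uses to rank features at each elimination step.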