Why doesn't sklearn pipeline.set_params() work?

I have the following pipeline:

from sklearn.pipeline import Pipeline
import lightgbm as lgb


steps_lgb = [('lgb',lgb.LGBMClassifier())]
 
# Create the pipeline: composed of preprocessing steps and estimators
pipe = Pipeline(steps_lgb)

Now I want to set the classifier's parameters with the following command:

best_params = {'boosting_type': 'dart',
               'colsample_bytree': 0.7332216010898506,
               'feature_fraction': 0.922329814019706,
               'learning_rate': 0.046566283755421566,
               'max_depth': 7,
               'metric': 'auc',
               'min_data_in_leaf': 210,
               'num_leaves': 61,
               'objective': 'binary',
               'reg_lambda': 0.5185517505019249,
               'subsample': 0.5026815575448366}

pipe.set_params(**best_params)

However, this raises an error:

ValueError: Invalid parameter boosting_type for estimator Pipeline(steps=[('estimator',LGBMClassifier())]). Check the list of available parameters with `estimator.get_params().keys()`.

boosting_type is definitely a core parameter of the lightgbm framework, yet if I remove it from best_params, the remaining parameters raise the same ValueError.

So what I want is to set the classifier's parameters after the pipeline has been created.

Solution

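When using a pipeline, you need to prefix each parameter with the name of the pipeline component it refers to (here, lgb) followed by a double underscore (lgb__). The fact that your pipeline contains only a single element does not change this requirement.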

So your parameters should look something like this (only the first 2 elements shown):

best_params = {'lgb__boosting_type': 'dart',
               'lgb__colsample_bytree': 0.7332216010898506,
               }
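Rather than rewriting every key by hand, the prefix can be added programmatically. This is a minimal sketch: add_step_prefix is a hypothetical helper name, and the step name "lgb" is taken from the pipeline above.

```python
def add_step_prefix(params, step_name):
    """Return a new dict whose keys are prefixed with '<step_name>__'.

    Hypothetical helper for illustration; it only renames keys and leaves
    the values untouched.
    """
    return {f"{step_name}__{key}": value for key, value in params.items()}


# A subset of the unprefixed parameters from the question.
best_params = {
    "boosting_type": "dart",
    "colsample_bytree": 0.7332216010898506,
    "learning_rate": 0.046566283755421566,
}

# Every key now carries the 'lgb__' prefix that Pipeline.set_params expects.
prefixed = add_step_prefix(best_params, "lgb")
print(prefixed)
```

With the pipeline from the question, this would be used as pipe.set_params(**add_step_prefix(best_params, "lgb")).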

You would have found this out yourself if you had followed the advice explicitly given in the error message:

Check the list of available parameters with `estimator.get_params().keys()`.

In your case,

pipe.get_params().keys()

gives

dict_keys(['memory','steps','verbose','lgb','lgb__boosting_type','lgb__class_weight','lgb__colsample_bytree','lgb__importance_type','lgb__learning_rate','lgb__max_depth','lgb__min_child_samples','lgb__min_child_weight','lgb__min_split_gain','lgb__n_estimators','lgb__n_jobs','lgb__num_leaves','lgb__objective','lgb__random_state','lgb__reg_alpha','lgb__reg_lambda','lgb__silent','lgb__subsample','lgb__subsample_for_bin','lgb__subsample_freq'])
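The behaviour can be reproduced without lightgbm. The sketch below rests on one assumption: scikit-learn's LogisticRegression stands in for LGBMClassifier (the step keeps the name lgb), and its C and max_iter parameters stand in for the LightGBM ones. It lists the step's prefixed keys, shows that an unprefixed key is rejected with a ValueError, and shows that the prefixed key is routed to the step.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Stand-in estimator (assumption): LogisticRegression replaces LGBMClassifier
# so the sketch runs without lightgbm; the step is still named 'lgb'.
pipe = Pipeline([("lgb", LogisticRegression())])

# Parameters belonging to the 'lgb' step appear with the 'lgb__' prefix.
step_keys = sorted(k for k in pipe.get_params().keys() if k.startswith("lgb__"))
print(step_keys)

# Without the prefix, Pipeline looks for 'C' among its own parameters and rejects it.
try:
    pipe.set_params(C=0.5)
    unprefixed_error = None
except ValueError as exc:
    unprefixed_error = exc
    print("ValueError:", exc)

# With the prefix, the values are forwarded to the 'lgb' step.
pipe.set_params(lgb__C=0.5, lgb__max_iter=500)
print(pipe.named_steps["lgb"].C, pipe.named_steps["lgb"].max_iter)
```

Some keys in the question's best_params (metric, min_data_in_leaf, feature_fraction) do not appear in the get_params() list above. LightGBM's scikit-learn wrapper accepts additional keyword arguments, so with the lgb__ prefix they should still reach the classifier, but it is worth checking this against the installed LightGBM version.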