LightGBM on numerical + categorical + text features >> TypeError: Unknown type of parameter:boosting_type,got:dict

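I am trying to train a LightGBM model on a dataset made up of numerical, categorical, and text data. However, fitting the pipeline below fails during the training phase: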

from lightgbm import LGBMClassifier
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

params = {
    'num_class': 5, 'max_depth': 8, 'num_leaves': 200,
    'learning_rate': 0.05, 'n_estimators': 500
}

clf = LGBMClassifier(params)
data_processor = ColumnTransformer([
    ('numerical_processing', numerical_processor, numerical_features),
    ('categorical_processing', categorical_processor, categorical_features),
    ('text_processing_0', text_processor_1, text_features[0]),
    ('text_processing_1',text_features[1])
])
pipeline = Pipeline([
    ('data_processing', data_processor),
    ('lgbm', clf)
])
pipeline.fit(X_train, y_train)

The error is:

TypeError: Unknown type of parameter:boosting_type,got:dict

I basically have two text features, both some form of names, on which I mainly perform stemming.

Any pointers would be appreciated.

Solution

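The classifier is constructed incorrectly, and that is what raises the error. In the LightGBM version that produces this message, LGBMClassifier(params) passes the whole dict as the first positional argument, which is boosting_type. You can reproduce it without the pipeline: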

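import numpy as np
from lightgbm import LGBMClassifier

params = {
    'num_class': 5, 'max_depth': 8, 'num_leaves': 200,
    'learning_rate': 0.05, 'n_estimators': 500
}

# The dict is passed positionally, so it ends up in boosting_type
clf = LGBMClassifier(params)
clf.fit(np.random.uniform(0, 1, (50, 2)), np.random.randint(0, 5, 50))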

This gives the same error:

TypeError: Unknown type of parameter:boosting_type,got:dict

Instead, unpack the dict into keyword arguments when constructing the classifier:

clf = LGBMClassifier(**params)
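
To make the difference between the two calls concrete, here is a minimal sketch that needs no LightGBM install. FakeClassifier is a hypothetical stand-in written for this illustration, not part of LightGBM; it only assumes that boosting_type comes first in the constructor, followed by a few of the other parameters, and that extra keywords are accepted.

```python
# Hypothetical stand-in, NOT part of LightGBM: it only mimics the order of
# the first constructor parameters (boosting_type comes first).
class FakeClassifier:
    def __init__(self, boosting_type="gbdt", num_leaves=31, max_depth=-1,
                 learning_rate=0.1, n_estimators=100, **kwargs):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.extra = kwargs  # anything not named above, e.g. num_class


params = {"num_class": 5, "max_depth": 8, "num_leaves": 200,
          "learning_rate": 0.05, "n_estimators": 500}

wrong = FakeClassifier(params)    # the whole dict is bound to boosting_type
right = FakeClassifier(**params)  # each key is bound to its own parameter

print(type(wrong.boosting_type).__name__)                  # dict
print(right.boosting_type, right.num_leaves, right.extra)  # gbdt 200 {'num_class': 5}
```

In the positional call every other parameter silently keeps its default, which is why the complaint only surfaces later, when boosting_type is checked and turns out to be a dict.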

With the classifier built that way, a small example shows the pipeline running:

import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

numerical_processor = StandardScaler()
categorical_processor = OneHotEncoder()
numerical_features = ['A']
categorical_features = ['B']

data_processor = ColumnTransformer([
    ('numerical_processing', numerical_processor, numerical_features),
    ('categorical_processing', categorical_processor, categorical_features)
])

# 100 rows: one numeric column and one two-level categorical column
X_train = pd.DataFrame({'A': np.random.uniform(size=100), 'B': np.random.choice(['j', 'k'], 100)})

# 100 labels drawn from the 5 classes declared in num_class
y_train = np.random.randint(0, 5, 100)

pipeline = Pipeline([('data_processing', data_processor), ('lgbm', clf)])

pipeline.fit(X_train, y_train)