Feature importance error when using logistic regression in sklearn


The following code works with a random forest model and gives me a plot of the feature importances:

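import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# X_train and X_test are assumed to be pandas DataFrames
clf = RandomForestClassifier()
clf = clf.fit(X_train, y_train)
clf.feature_importances_
# Keep only the features whose importance is above the default threshold
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)

matplotlib.rc('figure', figsize=[5, 5])
plt.style.use('ggplot')

# Plot the 20 most important features as a horizontal bar chart
feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')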

But I need to do the same thing for a logistic regression model. The following code produces an error:

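import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
import matplotlib

clf = LogisticRegression()
clf = clf.fit(X_train, y_train)
# LogisticRegression exposes coef_, not feature_importances_, so the next line fails
feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')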

I get:

AttributeError: 'LogisticRegression' object has no attribute 'feature_importances_'

Can someone help me figure out where I went wrong?

Solution

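Logistic regression has no attribute that ranks features. If you want a visualization, you can use the model's coefficients to show feature importance. The underlying assumption is that a coefficient with a larger magnitude contributes more to the model, but this only holds when the features are on the same scale. Note also that some coefficients may be negative, so if you sort them the way you did in your plot, the result will look different; you can convert them to absolute values first.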
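To make the two caveats above concrete, here is a minimal sketch that standardizes the features before fitting and ranks the coefficients by absolute value. The dataset (`load_breast_cancer`), the `max_iter=1000` setting, and the variable names are assumptions chosen for illustration; they are not part of the original question.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Example data only (an assumption); substitute your own X_train / y_train.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Standardize so every feature has mean 0 and unit variance;
# only then are coefficient magnitudes comparable across features.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_scaled, y)

# coef_ has shape (1, n_features) for a binary target; take the single row
# and label each coefficient with its feature name.
coefs = pd.Series(clf.coef_[0], index=X.columns)

# Rank by absolute value so strongly negative coefficients are not pushed to the bottom.
top_features = coefs.abs().nlargest(20)
```

Calling `top_features.plot(kind='barh', title='Feature Importance')` should then draw a chart comparable to the random forest one. If the direction of each effect matters, `coefs[top_features.index]` keeps the original signs in the same order.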

After fitting the logistic regression model, you can visualize the coefficients:

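import pandas as pd
from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression()
logistic_model.fit(X, Y)
# coef_ has shape (1, n_features) for a binary target; take the single row
importance = logistic_model.coef_[0]
# importance is a NumPy array, so it can be wrapped in a Series and plotted
feat_importances = pd.Series(importance)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')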

The output is a horizontal bar chart of the largest coefficients.

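Note: you can also run statistical tests or a correlation analysis on your features to understand how they contribute to the model. Which test to use depends on the type of your data (categorical, numerical, etc.).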
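As one possible illustration of such a test, the sketch below scores numeric features against a categorical target with the ANOVA F-test from `sklearn.feature_selection.f_classif`. The choice of test and the example dataset are assumptions; for non-negative categorical or count features, `sklearn.feature_selection.chi2` may be a better fit.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif

# Example data only (an assumption); substitute your own features and target.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# ANOVA F-test: one F statistic and one p-value per feature.
f_scores, p_values = f_classif(X, y)

# Collect the results in a table, strongest association first.
scores = pd.DataFrame(
    {"F": f_scores, "p_value": p_values}, index=X.columns
).sort_values("F", ascending=False)
```

A high F score with a small p-value suggests the feature's mean differs between classes. This is a univariate check, so it does not account for interactions between features the way the fitted model does.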