Random forest feature importances return different features every time I restart the kernel

Hi, I am running the following lines of code in Spyder.

# importing libraries
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Importing the dataset
GRR = pd.read_csv('StackO_Qstn-GRR.csv')
GRR

# Create a list of feature names
feat_labels = ['ipi','company_pos','country','sales','HOLDING CMPNY LEVEL(% of prvlg debt)','Subsidiary LEVEL(% of prvlg debt)','% of debt cushion (Holding Level)','% of debt cushion (Subsidry Level)','totaldebt/EBITDA at closing','Tangible assets','% of equity']
feat_labels

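# assigning x (the 11 feature columns) and y (the target column)
x = GRR.iloc[:,1:12].values
x
y = GRR.iloc[:,12].values   # 1-D target, as fit() expects
y

# Viewing data types (x is a NumPy array, so inspect the DataFrame)
#GRR.dtypes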

# Encoding categorical data - ipi, company_pos, country (columns 0-2 of x)
from sklearn.preprocessing import LabelEncoder
x[:,0] = LabelEncoder().fit_transform(x[:,0])
x[:,1] = LabelEncoder().fit_transform(x[:,1])
x[:,2] = LabelEncoder().fit_transform(x[:,2])

# Creating test and train set
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.3)
#x_train
#x_test
#y_test
#y_train

# scaling columns 4 onward: fit on the training set only, then apply the same scaler to the test set
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train[:,4:] = scaler.fit_transform(x_train[:,4:])
x_test[:,4:] = scaler.transform(x_test[:,4:])

# fitting Random Forest Regression to the training set
regressor = RandomForestRegressor(n_estimators = 1000,random_state=0,n_jobs=-1)
regressor.fit(x_train,y_train)

# print name and importance of each feature
for feature in zip(feat_labels,regressor.feature_importances_):
    print(feature)
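The encoding and scaling steps can be sketched on a tiny hypothetical array. The values below are made up for illustration and are not taken from StackO_Qstn-GRR.csv. The sketch shows two things. First, `LabelEncoder` assigns integer codes in the sorted order of the labels. Second, the usual scaling pattern fits a `StandardScaler` on the training rows only and reuses it with `transform` on the test rows. Fitting a second scaler on the test rows would centre them on their own statistics instead. With a single test row, every scaled value would then become 0. Tree-based models such as random forests are generally insensitive to this kind of per-column scaling, so scaling is unlikely to be what changes the importances.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical categorical column (a stand-in for 'country')
countries = np.array(["US", "UK", "US", "DE"])
encoder = LabelEncoder()
codes = encoder.fit_transform(countries)
# classes_ are sorted, so DE -> 0, UK -> 1, US -> 2
print(encoder.classes_, codes)

# Hypothetical numeric columns, already split into train and test rows
x_train_num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_test_num = np.array([[4.0, 40.0]])

# Fit the scaler on the training rows only ...
scaler = StandardScaler().fit(x_train_num)
x_train_scaled = scaler.transform(x_train_num)
# ... and apply the same learned mean/scale to the test rows
x_test_scaled = scaler.transform(x_test_num)
print(x_train_scaled.mean(axis=0), x_test_scaled)
```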

The last block, under the comment "# print name and importance of each feature", gave the following feature importance scores:

Result:
('ipi', 0.08782535553719388)
('company_pos', 0.03322401324199033)
('country', 0.06447021300853917)
('sales', 0.1379991525843173)
('HOLDING CMPNY LEVEL(% of prvlg debt)', 0.06634810871800015)
('Subsidiary LEVEL(% of prvlg debt)', 0.05734671836669356)
('% of debt cushion (Holding Level)', 0.07796951473939422)
('% of debt cushion (Subsidry Level)', 0.09403285942720432)
('totaldebt/EBITDA at closing', 0.1255130846908077)
('Tangible assets', 0.1431045158254024)
('% of equity', 0.11216646386045684)

When I restart the kernel and run this code again from the start, I get the following feature scores:

Result:
('ipi', 0.053276343860643866)
('company_pos', 0.021602932811886296)
('country', 0.10250002825771375)
('sales', 0.17694086852805885)
('HOLDING CMPNY LEVEL(% of prvlg debt)', 0.057531549153569464)
('Subsidiary LEVEL(% of prvlg debt)', 0.04789046979822429)
('% of debt cushion (Holding Level)', 0.07317383620302384)
('% of debt cushion (Subsidry Level)', 0.0899436819443599)
('totaldebt/EBITDA at closing', 0.1257549078973554)
('Tangible assets', 0.13849283791771863)
('% of equity', 0.11289254362744534)
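You can measure how far apart the two runs are by comparing the reported numbers directly. The plain-Python sketch below copies the scores from the two results above and ranks the features in each run. The helper name `ranking` is just an illustrative choice. In these numbers the top feature switches from 'Tangible assets' to 'sales'. However, the same four features ('Tangible assets', 'sales', 'totaldebt/EBITDA at closing', '% of equity') stay on top in both runs, and 'company_pos' stays last.

```python
# Feature importances copied from the two runs reported above
labels = ['ipi', 'company_pos', 'country', 'sales',
          'HOLDING CMPNY LEVEL(% of prvlg debt)', 'Subsidiary LEVEL(% of prvlg debt)',
          '% of debt cushion (Holding Level)', '% of debt cushion (Subsidry Level)',
          'totaldebt/EBITDA at closing', 'Tangible assets', '% of equity']
run1 = [0.08782535553719388, 0.03322401324199033, 0.06447021300853917,
        0.1379991525843173, 0.06634810871800015, 0.05734671836669356,
        0.07796951473939422, 0.09403285942720432, 0.1255130846908077,
        0.1431045158254024, 0.11216646386045684]
run2 = [0.053276343860643866, 0.021602932811886296, 0.10250002825771375,
        0.17694086852805885, 0.057531549153569464, 0.04789046979822429,
        0.07317383620302384, 0.0899436819443599, 0.1257549078973554,
        0.13849283791771863, 0.11289254362744534]


def ranking(scores):
    """Return feature names sorted from most to least important."""
    pairs = sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in pairs]


rank1, rank2 = ranking(run1), ranking(run2)
# scikit-learn normalises forest importances, so each run sums to about 1
print(round(sum(run1), 6), round(sum(run2), 6))
print("top feature:", rank1[0], "vs", rank2[0])
print("same top-4 set:", set(rank1[:4]) == set(rank2[:4]))
# Per-feature change from run 1 to run 2
for name, a, b in zip(labels, run1, run2):
    print(f"{name:40s} {b - a:+.4f}")
```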

Question: why does restarting the kernel produce different feature importance scores when the parameters used in "regressor = RandomForestRegressor(n_estimators = 1000,n_jobs=-1)" stay the same?
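Judging from the code shown, a likely explanation is the split. `train_test_split(x,y,test_size = 0.3)` is called without `random_state`, so every fresh kernel draws a different 70/30 split, and the forest is fitted on different rows each time. The version quoted in the question also creates `RandomForestRegressor` without `random_state`. In that case its bootstrap samples and per-split feature sub-sampling change from run to run as well.

The sketch below is a minimal illustration under two assumptions. It uses a synthetic dataset from `make_regression` as a stand-in for the CSV, since the real file is not available here. It also uses a hypothetical helper, `importances`. The sketch shows that fixing both seeds reproduces the importances exactly, while changing only the split seed changes them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the GRR data (11 features, like feat_labels)
X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=42)


def importances(split_seed, forest_seed):
    """Split the data, fit a forest, and return its impurity-based importances."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=split_seed)
    forest = RandomForestRegressor(n_estimators=100, random_state=forest_seed)
    forest.fit(x_train, y_train)
    return forest.feature_importances_


# Both seeds fixed: simulating two fresh runs gives identical importances
a = importances(split_seed=0, forest_seed=0)
b = importances(split_seed=0, forest_seed=0)
print(np.array_equal(a, b))

# Same forest seed but a different split: the importances change
c = importances(split_seed=1, forest_seed=0)
print(np.abs(a - c).max())
```

Passing a fixed `random_state` to both `train_test_split` and `RandomForestRegressor` should therefore make the scores repeatable across kernel restarts. The remaining differences between seeds reflect ordinary sampling variability in the importances themselves.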
