RandomizedSearchCV drastically lowers accuracy


When I check with a plain RandomForestRegressor:

from sklearn.ensemble import RandomForestRegressor

# Fit a forest with default hyperparameters and score it on the test set
r = RandomForestRegressor()
r.fit(X_train, y_train)
r.score(X_test, y_test)

I get 0.9746156332220394

But when I use RandomizedSearchCV:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Candidate values for each hyperparameter
n_estimators = [int(x) for x in np.linspace(start=100, stop=1200, num=12)]
max_features = ['auto', 'sqrt']
max_depth = [int(x) for x in np.linspace(5, 30, num=6)]
min_samples_split = [2, 5, 10, 15, 100]
min_samples_leaf = [1, 2, 10]

random_grid = {'n_estimators': n_estimators, 'max_features': max_features, 'max_depth': max_depth, 'min_samples_split': min_samples_split, 'min_samples_leaf': min_samples_leaf}

rf = RandomForestRegressor()

# Randomized search over the grid, scored by negative mean squared error
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, scoring='neg_mean_squared_error', n_iter=10, cv=5, verbose=2, random_state=42, n_jobs=1)
rf_random.fit(X_train, y_train)
rf_random.score(X_test, y_test)

I get -14881793274.345808

So why is the accuracy score so much worse?

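Solution

There are several problems with your approach and your assumptions:

Problem 1: You are not measuring accuracy at all

You have a regression task at hand. Accuracy is the fraction of correctly classified samples, so it does not apply here. In fact, you are not measuring accuracy in either case.

Problem 2: You are comparing different metrics

The score() function of RandomForestRegressor does the following:

Return the coefficient of determination R² of the prediction.

The score() function of RandomizedSearchCV, on the other hand, does this:

This uses the score defined by scoring where provided, and the best_estimator_.score method otherwise.

So in the first case you measure the R² of the fitted RandomForestRegressor. In the second case you get the negative mean squared error (MSE) of the best estimator found, because you specified neg_mean_squared_error as the scoring metric. R² is at most 1 and has no units, while the MSE is expressed in squared units of the target, so the two numbers are not on the same scale.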
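The difference between the two score() methods can be checked directly. The following is only a minimal sketch: the synthetic data from make_regression, the small parameter grid, and all variable names are assumptions for illustration and are not taken from the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data (an assumption; the question's data is not shown)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain regressor: score() returns R^2
plain = RandomForestRegressor(n_estimators=20, random_state=0)
plain.fit(X_train, y_train)
plain_score = plain.score(X_test, y_test)
plain_r2 = r2_score(y_test, plain.predict(X_test))

# Search with an explicit scoring: score() returns that metric instead
search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions={"max_depth": [3, 5, None]},
    scoring="neg_mean_squared_error",
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
search_score = search.score(X_test, y_test)
search_neg_mse = -mean_squared_error(y_test, search.predict(X_test))

print("plain.score  :", plain_score, "| r2_score:", plain_r2)
print("search.score :", search_score, "| -MSE    :", search_neg_mse)
```

In this sketch the first pair of numbers should agree with each other, and so should the second pair, but the two pairs measure different things.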

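Conclusion: Your comparison is invalid

If you want to compare anything, specify scoring='r2' in your RandomizedSearchCV so that both numbers are R² values. Alternatively, compute the MSE of the fitted RandomForestRegressor with mean_squared_error (but keep in mind that RandomizedSearchCV returns the negative MSE).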
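One way to put both models on the same footing might look like the sketch below. It assumes synthetic data and a deliberately tiny parameter grid (both hypothetical, chosen only to keep the example fast), so the printed values say nothing about the data in the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data stands in for the question's X_train / X_test (assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

plain = RandomForestRegressor(n_estimators=20, random_state=1)
plain.fit(X_train, y_train)

# scoring="r2" makes search.score() report R^2, like the plain regressor
search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=1),
    param_distributions={"min_samples_leaf": [1, 2, 10]},
    scoring="r2",
    n_iter=3,
    cv=3,
    random_state=1,
)
search.fit(X_train, y_train)

# Option A: compare R^2 with R^2
plain_r2 = plain.score(X_test, y_test)
search_r2 = search.score(X_test, y_test)

# Option B: compare MSE with MSE, computed explicitly for both models
plain_mse = mean_squared_error(y_test, plain.predict(X_test))
search_mse = mean_squared_error(y_test, search.predict(X_test))

print("R^2  plain vs search:", plain_r2, search_r2)
print("MSE  plain vs search:", plain_mse, search_mse)
```

Computing the metric explicitly with r2_score or mean_squared_error on the predictions avoids depending on what each object's score() happens to return.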

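Important note

You should also be aware that there is no guarantee that the best estimator found by RandomizedSearchCV will actually perform better on your test set. The hyperparameters are selected by cross-validation on the training set only, and the test set is not taken into account.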
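The gap between the cross-validated score and the test score can be inspected through best_score_, which is the mean score of the best candidate over the training folds. Again a sketch on assumed synthetic data, with a hypothetical grid:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data (assumption, for illustration only)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=2),
    param_distributions={"max_depth": [2, 4, None]},
    scoring="r2",
    n_iter=3,
    cv=3,
    random_state=2,
)
search.fit(X_train, y_train)

# Mean cross-validated R^2 of the selected candidate (training data only)
cv_r2 = search.best_score_
# R^2 of the refitted best estimator on data the search never saw
test_r2 = search.score(X_test, y_test)

print("best params:", search.best_params_)
print("CV R^2:", cv_r2, "| test R^2:", test_r2)
```

The two values usually differ, and the candidate that wins in cross-validation is not necessarily the one that would score best on the test set.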
