
Why is the test accuracy negative in a regression problem?

How do I fix the negative test accuracy in my regression problem?

I am building a car price prediction model. As you can see below, I trained a RandomForestRegressor and made predictions, but the test accuracy is negative. I scaled the numeric columns, encoded the categorical variables, and also used feature selection to drop the less important columns.

from sklearn.ensemble import RandomForestRegressor

rf_reg = RandomForestRegressor()
rf_reg.fit(X_train_new, y_train)

y_pred = rf_reg.predict(X_test_new)

print("Accuracy on Training set: ", rf_reg.score(X_train_new, y_train))
print("Accuracy on Testing set: ", rf_reg.score(X_test_new, y_test))

Accuracy on Training set: 0.7982031146290948

Accuracy on Testing set: -125.03932214262775
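The preprocessing described above (scaling the numeric columns, encoding the categoricals, feature selection) is not shown in the question. Purely as an illustration of those steps, a minimal sketch might look like the following; the DataFrame, column names, and the SelectFromModel step are assumptions, not the asker's actual pipeline.

# Hypothetical stand-in data; only the variable names X_train_new / X_test_new
# are taken from the question's snippet.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

df = pd.DataFrame({
    'mileage':   [45000, 12000, 80000, 30000, 66000, 15000],
    'age':       [5, 1, 9, 3, 7, 2],
    'fuel_type': ['petrol', 'diesel', 'petrol', 'diesel', 'petrol', 'petrol'],
    'price':     [8000, 21000, 4500, 15000, 6000, 19000],
})

X = df.drop(columns='price')
y = df['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Scale numeric columns, one-hot encode the categorical column
pre = ColumnTransformer([
    ('num', StandardScaler(), ['mileage', 'age']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['fuel_type']),
])
X_train_t = pre.fit_transform(X_train)
X_test_t = pre.transform(X_test)

# Drop the less important features based on a fitted forest's importances
selector = SelectFromModel(RandomForestRegressor(random_state=0), threshold='median')
X_train_new = selector.fit_transform(X_train_t, y_train)
X_test_new = selector.transform(X_test_t)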

I also tried tuning the parameters with RandomizedSearchCV, and this is what I got:

import numpy as np
from sklearn.model_selection import RandomizedSearchCV

# Number of trees in the random forest
n_estimators = [int(x) for x in np.linspace(start=100, stop=1200, num=12)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in each tree
max_depth = [int(x) for x in np.linspace(5, 30, num=6)]
# max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10, 15, 100]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 10]

random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf}

print(random_grid)

rf = RandomForestRegressor()
rf_random = RandomizedSearchCV(estimator=rf,
                               param_distributions=random_grid,
                               scoring='neg_mean_squared_error',
                               n_iter=10,
                               cv=5,
                               verbose=2,
                               random_state=42,
                               n_jobs=1)
rf_random.fit(X_train, y_train)

cv_rf = rf_random.best_estimator_
cv_rf.fit(X_train, y_train)
predictions = rf_random.predict(X_test)

MAE: 12209.776836896259

MSE: 722104139.6937662

RMSE: 26871.99545425993
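The post does not show how these numbers were obtained; they can be reproduced from the predictions with sklearn.metrics, as in this sketch (it assumes the predictions and y_test variables from the snippet above).

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# predictions and y_test are assumed to come from the RandomizedSearchCV snippet above
mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)

print("MAE: ", mae)
print("MSE: ", mse)
print("RMSE:", rmse)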

Solution

A RandomForestRegressor score can be negative because, as the sklearn documentation puts it, "The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse)." For a regressor, score returns the R² coefficient of determination rather than a classification accuracy, so a negative value means the model is performing very poorly on that data.
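To make that concrete: regressor.score returns the same R² that sklearn.metrics.r2_score computes, and it drops below zero as soon as the model's squared error on the test set exceeds that of simply predicting the mean of y_test. A small self-contained sketch with toy numbers (not from the question):

import numpy as np
from sklearn.metrics import r2_score

y_test = np.array([10000., 20000., 30000.])

# A constant prediction equal to the mean of y_test gives R^2 = 0
print(r2_score(y_test, np.full_like(y_test, y_test.mean())))  # 0.0

# Predictions worse than that constant baseline give a negative R^2
print(r2_score(y_test, np.array([30000., 10000., 5000.])))    # about -4.6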
