
Cross-validation predictions for each fold

How to get cross-validation predictions for each fold

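For a regression problem on the diabetes dataset, I want to use k-fold cross-validation to obtain the predictions on each fold's test set and compute the mean cross-validation score from them. I know I can get the mean cross-validation score from cross_val_score, but since I need the per-fold predictions (to plot them), I would like to know whether the following approach is correct.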

# Imports
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict, KFold, cross_val_score
from sklearn.metrics import mean_squared_error as mse
import numpy as np

# Load dataset
diabetes = datasets.load_diabetes()

# Get features and target
X = diabetes.data[:150]
y = diabetes.target[:150]

# Regressor
lasso = linear_model.Lasso()

# Define CV scheme
n_splits = 3
kfold_cv = KFold(n_splits = n_splits,shuffle = False)

# Cross validation predictions on the test set (grouped)
y_pred = cross_val_predict(lasso,X,y,cv = kfold_cv)

# Split target and predictions into the folds considered by KFold
y_folds = np.array_split(y,n_splits)
y_pred_folds = np.array_split(y_pred,n_splits)

# Compute the MSE of each fold using predictions and target
mse_folds = []
for target, pred in zip(y_folds, y_pred_folds):
    mse_folds.append(mse(target, pred))

# Cross validation MSE from folds
print('CV MSE (manual): ',np.mean(mse_folds))

# Cross validation MSE directly from cross_val_score
print('CV MSE (automatic): ', np.mean(cross_val_score(lasso, X, y, cv=kfold_cv, scoring='neg_mean_squared_error')))

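The two printed scores are the same, 4441.21 (the one computed by cross_val_score is negative, of course, because the scorer is neg_mean_squared_error). This makes me think the approach is correct, but because using cross_val_predict to compute scores is somewhat contentious, I would like some feedback.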
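One caveat with the approach in the question is that np.array_split only reproduces the KFold partition when shuffle = False, because the folds are then contiguous blocks of rows. A possibly more robust variant, sketched here under assumptions, takes the test indices from kfold_cv.split(X) and uses them to slice y and y_pred. The choices shuffle=True and random_state=0 are not from the question; they are used only to show that index-based slicing still lines up with cross_val_score when the folds are not contiguous.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Same data and regressor as in the question
X, y = datasets.load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]
lasso = linear_model.Lasso()

# Assumption for illustration: shuffled folds with a fixed seed, so that
# every call to split() yields the same partition
kfold_cv = KFold(n_splits=3, shuffle=True, random_state=0)

# Out-of-fold predictions, returned in the original row order of X
y_pred = cross_val_predict(lasso, X, y, cv=kfold_cv)

# Recover each fold from the splitter's own test indices instead of array_split
mse_folds = []
for _, test_idx in kfold_cv.split(X):
    mse_folds.append(mean_squared_error(y[test_idx], y_pred[test_idx]))

# Per-fold scores from cross_val_score (negated MSE, hence the sign flip below)
scores = cross_val_score(lasso, X, y, cv=kfold_cv, scoring="neg_mean_squared_error")

print("Per-fold MSE (manual):   ", np.round(mse_folds, 2))
print("Per-fold MSE (automatic):", np.round(-scores, 2))
print("CV MSE (manual):", np.mean(mse_folds))
```

The y[test_idx] and y_pred[test_idx] pairs are also what one would pass to a plotting routine to draw each fold separately. Note that the mean of the per-fold MSEs equals the MSE over all pooled predictions only when every fold has the same size, as is the case here with 150 samples and 3 folds.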
