How can I fix cross-validation reporting the same accuracy for 8 different classifiers on the same file?
I have the script below, which is supposed to use cross-validation to train different models and then compute the mean accuracy of each, so that I can pick the best model for a classification task. However, I get exactly the same result for every classifier.
The results look like this:
---Filename in processed................ corpusAmazon_train
etiquette : [0 1]
Embeddings bert model used.................... : sm
Model name: Model_LSVC_ovr
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
Model name: Model_G_NB
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
Model name: Model_LR
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
The code that runs the cross-validation:
models_list = {'Model_LSVC_ovr': model1, 'Model_G_NB': model2, 'Model_LR': model3,
               'Model_RF': model4, 'Model_KN': model5, 'Model_MLP': model6,
               'Model_LDA': model7, 'Model_XGB': model8}

# cross_validation
def cross_validation(features, ylabels, models_list, n, lge_model):
    cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)
    features, s = get_flaubert_layer(features, lge_model)
    for model_name, model in models_list.items():
        print("Model name: {}".format(model_name))
        print("------------cross val predict used----------------", "\n")
        y_pred = cross_val_predict(model, features, ylabels, cv=cv_splitter, verbose=1)
        accuracy_score_predict = accuracy_score(ylabels, y_pred)
        print("accuracy with cross_val_predict :", accuracy_score_predict)
        print("---------------cross val score used -----------------------", "\n")
        scores = cross_val_score(model, features, ylabels, scoring='accuracy', cv=cv_splitter)
        print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()), "\n")
The accuracy is identical for every model even when using cross_val_score. Any ideas? Could it be the random_state I set in the cross_validation function?
The model definitions:
def classifiers_b():
    model1 = LinearSVC()
    model2 = GaussianNB()  # MultinomialNB() fails because X contains negative values
    model3 = LogisticRegression()
    model4 = RandomForestClassifier()
    model5 = KNeighborsClassifier()
    model6 = MLPClassifier(hidden_layer_sizes=(50, 100, 50), max_iter=500,
                           activation='relu', solver='adam', random_state=1)
    model8 = XGBClassifier(eval_metric="logloss")
    model7 = LinearDiscriminantAnalysis()
    # models_list = {'Model_LSVC_ovr': model1, 'Model_XGB': model8}
Solution
I would suggest using a pipeline for each model. It looks like you are running CV on the same model in every iteration. You can check the documentation here for more details on how to use pipelines, and then run CV on each model's pipeline.
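A minimal sketch of that idea, assuming scikit-learn: wrap each classifier in its own fresh Pipeline before cross-validating, so every model is fit independently inside each fold. The synthetic `X`, `y` below stand in for the BERT embeddings and labels from the question, and the scaler step is an illustrative choice.

```python
# Sketch: one Pipeline per model, each cross-validated independently.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data for the embeddings/labels in the question.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)

models_list = {
    "Model_LR": LogisticRegression(max_iter=1000),
    "Model_G_NB": GaussianNB(),
}

results = {}
for model_name, model in models_list.items():
    # A fresh pipeline per model: the scaler is fit inside each CV fold,
    # so no information leaks from the held-out fold.
    pipe = Pipeline([("scaler", StandardScaler()), ("clf", model)])
    scores = cross_val_score(pipe, X, y, scoring="accuracy", cv=cv_splitter)
    results[model_name] = scores.mean()
    print("%s: %0.2f accuracy (+/- %0.2f)" % (model_name, scores.mean(), scores.std()))
```

The fixed `random_state=42` on the KFold splitter is fine here: it only makes the folds identical across models (a fair comparison), it does not force the accuracies to be identical.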