How to fix undersampling failing to improve precision in binary classification
My code snippet:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from imblearn.pipeline import make_pipeline as imbalanced_make_pipeline
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=20000, n_features=8, n_informative=6, n_classes=2,
                           weights=[150/151, 1/151], n_redundant=2,
                           n_clusters_per_class=3, class_sep=1.5, random_state=1729)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

LogisticRegression().fit(X_train, y_train)
training_score = cross_val_score(LogisticRegression(), X_train, y_train, cv=5)  # y_train was missing

# the 'l1' penalty needs a solver that supports it, e.g. liblinear
log_reg_params = {"penalty": ['l1', 'l2'], 'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
grid_log_reg = GridSearchCV(LogisticRegression(solver='liblinear'), log_reg_params)
grid_log_reg.fit(X_train, y_train)
log_reg = grid_log_reg.best_estimator_
log_reg_score = cross_val_score(log_reg, X_train, y_train, cv=5)  # X and y were missing

skf = StratifiedKFold(n_splits=5, random_state=None, shuffle=False)
for train_index, test_index in skf.split(X_train, y_train):
    print("Train:", train_index, "Test:", test_index)
    # undersampling happens inside the pipeline, during cross-validation, not before
    pipeline = imbalanced_make_pipeline(
        RandomUnderSampler(sampling_strategy='majority', random_state=42), log_reg)
    model = pipeline.fit(X_train[train_index], y_train[train_index])
    prediction = model.predict(X_train[test_index])
I have used scikit-learn's make_classification method to create datasets with various levels of imbalance, and I then apply resampling techniques to see how effective they are. From the research I have done, applying undersampling should improve precision at the cost of recall, but that is not happening in my case: undersampling performs very similarly to not resampling at all. I would like to know whether I have made a mistake somewhere in my code, or what else could explain how undersampling is behaving.

Thanks for your help!