
Tuning the TomekLinks sampling strategy in an imblearn pipeline


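I want to tune the resampling of my imbalanced dataset by using TomekLinks with different target sizes for the majority class. The code I am using is shown below, but imbalanced-learn reports that a dict-valued sampling strategy is not supported: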

ValueError: 'sampling_strategy' as a dict for cleaning methods is not supported. Please give a list of the classes to be targeted by the sampling.

Can someone help me find the right way to proceed? Here is my code:

import math
import numpy as np
import pandas as pd
import collections
from collections import Counter
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import TomekLinks
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler, RobustScaler

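def get_sampling_strategy_ratios(y):
    # Build 10 candidate target sizes for the majority class (label 0),
    # evenly spaced from 1000 up to the majority/minority count difference.
    counter = Counter(y)
    max_ratio = counter[0] - counter[1]
    sampling_strategy = list(np.linspace(1000,max_ratio,10))
    ratios = []
    for new_majority_class in sampling_strategy:
        # Each candidate is a dict such as {0: 1000}; this dict form is
        # what TomekLinks rejects with the ValueError quoted above.
        ratios.append({0: round(new_majority_class)})
    return ratios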

model = KNeighborsClassifier()
imbalancer = TomekLinks()

grid = dict()
grid['normalizer'] = [MinMaxScaler(),MaxAbsScaler(),StandardScaler(),RobustScaler()]
sqrt_records = round(math.sqrt(X_train.shape[0]))
grid['model__n_neighbors'] = list(range(3,sqrt_records*2,2))
grid['model__p'] = [1,2]
grid['model__weights'] = ['uniform','distance']
grid['imbalancer__sampling_strategy'] = get_sampling_strategy_ratios(y_train)

steps = [('normalizer',None),('imbalancer',imbalancer),('model',model)]
pipeline = Pipeline(steps=steps)

cv = StratifiedKFold(n_splits=3)
search = RandomizedSearchCV(pipeline,grid,scoring='f1_weighted',n_jobs=-1,n_iter=10,cv=cv,verbose=2,refit='f1_weighted',random_state=42)

# Training
results = search.fit(X_train,y_train)

params = results.best_params_
print('Best Config: %s ' % params)

# Classification
y_pred = search.predict(X_test)
