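How to deal with an imbalanced dataset in classification
I have a large DataFrame of accounting-fraud data, and I want to deal with the class imbalance.
First, I split the DataFrame into two parts: X (the features) and y (the target, i.e. fraud or not fraud).
I tried: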
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
# Features (X) and target (y: fraud / not fraud)
X = df[['fyear','gvkey','sich','insbnk','understatement','option','p_aaer','new_p_aaer','act','ap','at','ceq','che','cogs','csho','dlc','dltis','dltt','dp','ib','invt','ivao','ivst','lct','lt','ni','ppegt','pstk','re','rect','sale','sstk','txp','txt','xint','prcc_f','dch_wc','ch_rsst','dch_rec','dch_inv','soft_assets','ch_cs','ch_cm','ch_roa','issue','bm','dpi','reoa','EBIT','ch_fcf']]
y = df['target']  # a Series, so Counter(y_res) counts class labels rather than column names
# Oversample the minority class with SMOTE
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape {}'.format(Counter(y_res)))
And also this:
# define sampling strategy
sample = SMOTEENN(sampling_strategy=0.5)
# fit and apply the transform
X_over, y_over = sample.fit_resample(X, y)
# summarize class distribution
print(Counter(y_over))
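In the snippet above, `sampling_strategy=0.5` is a float. For the over-sampling step, imbalanced-learn documents a float (allowed only for binary targets) as the desired ratio of minority-class samples to majority-class samples after resampling, so SMOTE adds synthetic fraud rows until the fraud class is half the size of the non-fraud class. SMOTEENN then runs an Edited Nearest Neighbours cleaning step that removes some samples, so the counts printed by `Counter` will usually not match the ratio exactly. A minimal sketch of the arithmetic for the SMOTE step; the helper name `smote_samples_to_generate` and the class counts are hypothetical, not taken from the question's data:

```python
def smote_samples_to_generate(n_majority: int, n_minority: int, ratio: float) -> int:
    """Hypothetical helper: how many synthetic minority samples are needed so that
    n_minority / n_majority reaches `ratio` (the meaning of a float sampling_strategy)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    n_new = int(n_majority * ratio - n_minority)
    if n_new < 0:
        # Over-sampling cannot remove rows, so a ratio below the current one is invalid
        raise ValueError("minority class already exceeds the requested ratio")
    return n_new


# Hypothetical counts: 10,000 non-fraud (0) vs. 200 fraud (1) firm-years
n_non_fraud, n_fraud = 10_000, 200
n_new = smote_samples_to_generate(n_non_fraud, n_fraud, 0.5)
print(n_new)            # 4800 synthetic fraud rows
print(n_fraud + n_new)  # 5000 fraud rows, i.e. half of 10,000
```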
But in both cases I get the same error:
ValueError: could not convert string to float: '2.461.242'
Could anyone help me, please?
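The value in the error, `'2.461.242'`, suggests that some columns of `X` were read as text (pandas `object` dtype) with a dot used as the thousands separator, as in a European number format. SMOTE and SMOTEENN only work on numeric features, so the conversion has to happen before `fit_resample`. If the data comes from a CSV file, `pd.read_csv(..., thousands='.', decimal=',')` can parse such values at load time. Otherwise, here is a minimal sketch that converts the text columns afterwards; the function name `to_numeric_european` is hypothetical, and it assumes that in every non-numeric column a dot is always a thousands separator and a comma a decimal separator (check this against the data, because a value such as `'0.5'` would be misread as 5):

```python
import pandas as pd


def to_numeric_european(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert text columns such as '2.461.242' or '1.234,5' to floats.

    Assumes '.' is a thousands separator and ',' a decimal separator in every
    non-numeric column; values that still cannot be parsed become NaN.
    """
    out = df.copy()
    text_cols = out.select_dtypes(exclude="number").columns
    for col in text_cols:
        cleaned = (
            out[col].astype(str)
            .str.replace(".", "", regex=False)   # drop thousands separators
            .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
        )
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Small made-up frame mimicking the problem
demo = pd.DataFrame({"at": ["2.461.242", "1.234,5", "17"], "fyear": [1999, 2000, 2001]})
clean = to_numeric_european(demo)
print(clean.dtypes)
print(clean["at"].tolist())  # [2461242.0, 1234.5, 17.0]
```

After `X = to_numeric_european(X)`, any rows that ended up with NaN must be dropped or imputed (keeping `y` aligned with `X`) before calling `fit_resample`, since SMOTE does not accept missing values.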