如何解决在 imblearn 管道中使用 SMOTENC 实现 FAMD 时出现 AttributeError
我正在尝试使用 FAMD、SMOTENC 和其他预处理步骤来实现管道。但是每次都会出错。如果我从管道中删除 FAMD,它就可以正常工作。
我的代码:
#Seperate the dataset in two parts
num_df= X_train_new.select_dtypes(include=[np.number]).columns
cat_df= X_train_new.select_dtypes(exclude=[np.number]).columns
#Create a mask for categorical features
categorical_feature_mask = X_train_new.dtypes == object
print(categorical_feature_mask)
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector as selector
#Create a pipeline to automate the preprocessing steps and SMOTENC together
num_pipe = make_pipeline(SimpleImputer(strategy='median'))
cat_pipe = make_pipeline(SimpleImputer(strategy='most_frequent'),OneHotEncoder(handle_unkNown='ignore'))
transformer= make_column_transformer((num_pipe,selector(dtype_include='number')),(cat_pipe,selector(dtype_include='object')),n_jobs=2)
#Undersampling with SMOTENC
from imblearn.over_sampling import SMOTENC
smote= SMOTENC(categorical_features=categorical_feature_mask,random_state=99)
!pip install prince
from prince import FAMD
famd=FAMD(n_components=4,random_state=99)
from imblearn.pipeline import make_pipeline as imb_pipeline
#Fit the random forest learner
rf=RandomForestClassifier(n_estimators=300random_state=99)
pipe=imb_pipeline(transformer,smote,famd,rf)
pipe.fit(X_train_new,y_train_new)
print('Training Accuracy:%s'%pipe.score(X_train_new,y_train_new))
错误:
AttributeError Traceback (most recent call last)
<ipython-input-24-2b7ea084a318> in <module>()
3 rf=RandomForestClassifier(n_estimators=300,max_features=3,criterion='entropy',random_state=99)
4 pipe=imb_pipeline(transformer,rf)
----> 5 pipe.fit(X_train_new,y_train_new)
6 print('Training Accuracy:%s'%pipe.score(X_train_new,y_train_new))
6 frames
/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py in fit(self,X,y,**fit_params)
235
236 """
--> 237 Xt,yt,fit_params = self._fit(X,**fit_params)
238 if self._final_estimator is not None:
239 self._final_estimator.fit(Xt,**fit_params)
/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py in _fit(self,**fit_params)
195 Xt,fitted_transformer = fit_transform_one_cached(
196 cloned_transformer,None,Xt,--> 197 **fit_params_steps[name])
198 elif hasattr(cloned_transformer,"fit_resample"):
199 Xt,fitted_transformer = fit_resample_one_cached(
/usr/local/lib/python3.7/dist-packages/joblib/memory.py in __call__(self,*args,**kwargs)
350
351 def __call__(self,**kwargs):
--> 352 return self.func(*args,**kwargs)
353
354 def call_and_shelve(self,**kwargs):
/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py in _fit_transform_one(transformer,weight,**fit_params)
564 def _fit_transform_one(transformer,**fit_params):
565 if hasattr(transformer,'fit_transform'):
--> 566 res = transformer.fit_transform(X,**fit_params)
567 else:
568 res = transformer.fit(X,**fit_params).transform(X)
/usr/local/lib/python3.7/dist-packages/sklearn/base.py in fit_transform(self,**fit_params)
572 else:
573 # fit method of arity 2 (supervised transformation)
--> 574 return self.fit(X,**fit_params).transform(X)
575
576
/usr/local/lib/python3.7/dist-packages/prince/famd.py in fit(self,y)
27
28 # Separate numerical columns from categorical columns
---> 29 num_cols = X.select_dtypes(np.number).columns.tolist()
30 cat_cols = list(set(X.columns) - set(num_cols))
31
/usr/local/lib/python3.7/dist-packages/scipy/sparse/base.py in __getattr__(self,attr)
689 return self.getnnz()
690 else:
--> 691 raise AttributeError(attr + " not found")
692
693 def transpose(self,axes=None,copy=False):
AttributeError: select_dtypes not found
解决方法
tl;dr:尝试将 sparse=False
添加到您的 OneHotEncoder
。考虑使用 prince
提出问题,以处理稀疏输入。
您可以从回溯中看到问题在于 FAMD.fit
尝试 X.select_dtypes
将分类数据和数字数据分开。 select_dtypes
是一个 Pandas 函数,所以通常我会假设 prince
被写入来操作数据帧而不是 sklearn 内部使用的 numpy 数组(如果有必要,在从帧转换之后)。但是,查看源代码,他们确实将其从 numpy 数组转换为数据帧上方的几行。 但是,最后一条跟踪消息来自 scipy。这暗示您的 X
实际上可能是一个稀疏数组。实际上,OneHotEncoder
(在您的管道中的早期)更喜欢输出稀疏数组,而 ColumnTransformer
根据其组成部分和参数 sparse_threshold
确定是转换为稀疏数组还是密集数组。
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。