如何解决为什么在使用 LightGBM 和 Sklearn 完成多分类任务时出现“Python int too large to convert to C long”?
我想用LightGBM和Sklearn完成一个多分类任务,代码如下
#Input
X = copy.deepcopy(GTEx_feature)
y = copy.deepcopy(out)
#TrainingSet : TestSet = 4 : 1
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2,random_state = SEED)
########################################
########## Model Construction ##########
print("\n...... Training Model ......\n")
# LightGBM params
param_dict = {
"objective":'multiclass',"num_class":[5],"learning_rate":[0.1,0.05,0.02,0.015,0.01],#
"num_leaves": range(10,36,5),# Maximum tree leaves for base learners.
"max_depth" : [-1,2,3,4,5,10,20,40,50],#Maximum tree depth for base learners,<=0 means no limit.
"min_data_in_leaf": range(1,45,2),#
"feature_fraction" : [i / 10 for i in range(2,11)],#
"metric" : "multi_error",#
"early_stopping_rounds" : [None],#
"n_jobs" : [-1],#
"silent" : [True],#
"verbose" : [-1],#
"n_estimators" : range(50,1000,50),#
"bagging_fraction" : [i / 10 for i in range(2,#
"bagging_freq" : [0,1,2],#
"lambda_l1" : [0,0.001,0.005,0.01,0.1],#
"lambda_l2" : [0,#
"random_state":[2020]#
}
#Initiate model
model = lgb.LGBMClassifier()
#Adjust hyper-parameters with 5-fold cross validation
rscv = RandomizedSearchCV(model,#
param_dict,#
n_iter=100,# Number of parameter settings that are sampled. n_iter Tradesoff runtime vs quality of the solution.
cv = 5,# Determines the cross-validation splitting strategy.
verbose = 0,# Controls the verbosity: the higher,the more messages.
scoring = "roc_auc",#
n_jobs =-1#
)
gbm=rscv.fit(X_train,y_train)
最后一行出现错误:
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-110-97f7932a1488> in <module>
42 n_jobs =-1#
43 )
---> 44 gbm=rscv.fit(X_train,y_train)
45 ########## Model Evaluation ##########
46 print("\n...... Evaluating Model ......\n")
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args,**kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k,arg in zip(sig.parameters,args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self,X,groups,**fit_params)
734 return results
735
--> 736 self._run_search(evaluate_candidates)
737
738 # For multi-metric evaluation,store the best_index_,best_params_ and
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self,evaluate_candidates)
1527 def _run_search(self,evaluate_candidates):
1528 """Search n_iter candidates from param_distributions"""
-> 1529 evaluate_candidates(ParameterSampler(
1530 self.param_distributions,self.n_iter,1531 random_state=self.random_state))
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
698
699 def evaluate_candidates(candidate_params):
--> 700 candidate_params = list(candidate_params)
701 n_candidates = len(candidate_params)
702
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in __iter__(self)
282 % (grid_size,grid_size),UserWarning)
283 n_iter = grid_size
--> 284 for i in sample_without_replacement(grid_size,n_iter,285 random_state=rng):
286 yield param_grid[i]
sklearn\utils\_random.pyx in sklearn.utils._random.sample_without_replacement()
OverflowError: Python int too large to convert to C long
所以我想知道为什么会发生这个错误以及我可以尝试什么来处理它
这是一行X
g 1.057671
c 2.644094
fi -0.302407
ph -0.772771
da -0.449314
phy -0.447774
gen -1.042650
gs 0.053665
ts -0.197370
sig -0.137325
endna -0.255032
eireig -0.372373
ee3 -0.159200
cong 0.000000
fhft 0.000000
sple 0.000000
p 0.000000
ts 0.000000
kb 0.000000
non 1.000000
sn 1.000000
nss -0.014290
Name: 1,dtype: float64
在使用相同的 X 完成二进制分类时,我没有遇到这样的错误, y如下
0 2
1 0
2 0
3 4
4 4
..
4995 0
4996 1
4997 1
4998 0
4999 0
Name: mtle,Length: 5000,dtype: int64
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。