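How to fix lightgbm || ValueError: Series.dtypes must be int, float or bool
The NA values in the DataFrame have already been filled.
The dataset's schema contains no object dtype, as specified in the documentation.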
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 429 entries, 351 to 559
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Gender             429 non-null    category
 1   Married            429 non-null    category
 2   Dependents         429 non-null    category
 3   Education          429 non-null    category
 4   Self_Employed      429 non-null    category
 5   ApplicantIncome    429 non-null    int64
 6   CoapplicantIncome  429 non-null    float64
 7   LoanAmount         429 non-null    float64
 8   Loan_Amount_Term   429 non-null    float64
 9   Credit_History     429 non-null    float64
 10  Property_Area      429 non-null    category
dtypes: category(6), float64(4), int64(1)
memory usage: 23.3 KB
I have the following code:
- I am trying to classify a dataset with lightgbm
import lightgbm as lgb
train_data=lgb.Dataset(x_train,label=y_train,categorical_feature=cat_cols)
#define parameters
params = {'learning_rate':0.001}
model= lgb.train(params,train_data,100,categorical_feature=cat_cols)
The following error occurs:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-178-aaa91a2d8719> in <module>
6
7
----> 8 model= lgb.train(params,train_data,100,categorical_feature=cat_cols)
~\Anaconda3\lib\site-packages\lightgbm\engine.py in train(params,train_set,num_boost_round,valid_sets,valid_names,fobj,feval,init_model,feature_name,categorical_feature,early_stopping_rounds,evals_result,verbose_eval,learning_rates,keep_training_booster,callbacks)
229 # construct booster
230 try:
--> 231 booster = Booster(params=params,train_set=train_set)
232 if is_valid_contain_train:
233 booster.set_train_data_name(train_data_name)
~\Anaconda3\lib\site-packages\lightgbm\basic.py in __init__(self,params,model_file,model_str,silent)
1981 break
1982 # construct booster object
-> 1983 train_set.construct()
1984 # copy the parameters from train_set
1985 params.update(train_set.get_params())
~\Anaconda3\lib\site-packages\lightgbm\basic.py in construct(self)
1319 else:
1320 # create train
-> 1321 self._lazy_init(self.data,label=self.label,
1322 weight=self.weight,group=self.group,
1323 init_score=self.init_score,predictor=self._predictor,
~\Anaconda3\lib\site-packages\lightgbm\basic.py in _lazy_init(self,data,label,reference,weight,group,init_score,predictor,silent,params)
1133 raise TypeError('Cannot initialize Dataset from {}'.format(type(data).__name__))
1134 if label is not None:
-> 1135 self.set_label(label)
1136 if self.get_label() is None:
1137 raise ValueError("Label should not be None")
~\Anaconda3\lib\site-packages\lightgbm\basic.py in set_label(self,label)
1648 self.label = label
1649 if self.handle is not None:
-> 1650 label = list_to_1d_numpy(_label_from_pandas(label),name='label')
1651 self.set_field('label',label)
1652 self.label = self.get_field('label') # original values can be modified at cpp side
~\Anaconda3\lib\site-packages\lightgbm\basic.py in list_to_1d_numpy(data,dtype,name)
88 elif isinstance(data,Series):
89 if _get_bad_pandas_dtypes([data.dtypes]):
---> 90 raise ValueError('Series.dtypes must be int,float or bool')
91 return np.array(data,dtype=dtype,copy=False) # SparseArray should be supported as well
92 else:
ValueError: Series.dtypes must be int,float or bool
Solution
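Has anyone helped you yet? If not: the answer lies in converting your variables.
See this link: GitHub Discussion lightGBM
The creators of LightGBM were once asked this same question. In the link above, they (STRIKER) tell you to convert your variables with astype("category") (pandas/scikit) and to label-encode them as well, because the values in the feature columns need to be integers, specifically int32.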
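The traceback in the question passes through set_label before it raises, which suggests that the Series being rejected is the label rather than a feature column. A minimal sketch of the conversion for that case, assuming the label is a string or category column: the helper name encode_label, the column name Loan_Status and the toy values are all hypothetical and do not come from the question.

```python
import pandas as pd


def encode_label(y: pd.Series) -> pd.Series:
    """Return an integer-coded copy of a label Series (hypothetical helper)."""
    # Numeric and boolean labels already satisfy the dtype check in the error message.
    if pd.api.types.is_numeric_dtype(y) or pd.api.types.is_bool_dtype(y):
        return y
    # astype("category") builds the category table; .cat.codes is the integer code per row.
    return y.astype("category").cat.codes.astype("int32")


# Toy stand-in for the label column (name and values are made up for illustration).
y_train = pd.Series(["Y", "N", "Y", "Y"], name="Loan_Status", dtype="category")
y_encoded = encode_label(y_train)
print(y_encoded.tolist())  # [1, 0, 1, 1]
print(y_encoded.dtype)     # int32
```

The encoded Series can then be passed as label= to lgb.Dataset in place of the original one. Missing values would get the code -1, so this assumes they have been filled first, as in the question.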
However, label encoding and astype('category') should generally do the same thing: encoding.
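A small comparison of the two approaches, as a sketch on made-up values (the Series below only borrows category names that look like a Property_Area column). For plain string values both sort the distinct values and number them in that order, so the integer codes agree. The difference is the result type: astype('category') keeps a pandas category column, while LabelEncoder returns a NumPy integer array.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample values for illustration only.
area = pd.Series(["Urban", "Rural", "Semiurban", "Rural"])

# pandas: categories are the sorted unique values, codes index into them.
codes_pandas = area.astype("category").cat.codes.to_numpy()

# scikit-learn: classes_ are the sorted unique values, output indexes into them.
codes_sklearn = LabelEncoder().fit_transform(area)

print(codes_pandas.tolist())   # [2, 0, 1, 0]
print(codes_sklearn.tolist())  # [2, 0, 1, 0]
```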
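Another useful link is the advanced documentation on categorical features: Categorical feature light gbm homepage, where they explain that they cannot handle the object (string) dtype, such as the one in your data.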
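One way to find the offending columns before building the Dataset is to list every column whose dtype is not numeric, boolean or category. This is only a rough sketch: the helper unsupported_columns and the sample values are hypothetical, and the check approximates LightGBM's own validation rather than reproducing it.

```python
import pandas as pd


def unsupported_columns(df: pd.DataFrame) -> list:
    """List columns whose dtype is not numeric, bool or category (hypothetical helper)."""
    bad = []
    for col, dtype in df.dtypes.items():
        ok = (
            pd.api.types.is_numeric_dtype(dtype)
            or pd.api.types.is_bool_dtype(dtype)
            or isinstance(dtype, pd.CategoricalDtype)
        )
        if not ok:
            bad.append(col)
    return bad


# Made-up rows; "Gender" is deliberately left as plain strings.
df = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "ApplicantIncome": [5849, 4583],
    "Property_Area": pd.Categorical(["Urban", "Rural"]),
})
print(unsupported_columns(df))  # ['Gender']
```

The same check can be run on the label by wrapping it first, for example unsupported_columns(y_train.to_frame()). Unlike the feature columns, a label of dtype category would still need to be converted to integers.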
If this explanation still leaves you unsure, here is a code snippet of mine from the Kaggle space_race_set. If you still run into problems, just ask.
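from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import lightgbm as lgb

# train_df is assumed to be the already-loaded Kaggle DataFrame
cat_feats = ['Company Name', 'Night_and_Day', 'Rocket Type', 'Rocket Mission Type', 'State', 'Country']

# Replace the values of each categorical column with integer codes
labelencoder = LabelEncoder()
for col in cat_feats:
    train_df[col] = labelencoder.fit_transform(train_df[col])
for col in cat_feats:
    train_df[col] = train_df[col].astype('int')

# Separate the target column from the features, then split
y = train_df[["Status Mission"]]
X = train_df.drop(["Status Mission"], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=['Company Name', 'Country'], free_raw_data=False)
test_data = lgb.Dataset(X_test, label=y_test, free_raw_data=False)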