SoftmaxMultiClassObj: label size and pred size does not match - XGBoost

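I am trying to do multi-class text classification with XGBoost on about 9,000 documents and 5 labels. I am also trying to use a 20/80 split for training and testing, but I can't figure out how to do that. Here is my code after loading the data and libraries: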


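    # Assumes `re` is imported and that `wnl` (a lemmatizer object) and
    # `stop_words` were defined when the libraries were loaded.
    def preprocess(text_column):
        new_review = []
        for sentence in text_column:
            # replace mentions, URLs and non-alphanumeric characters with a space
            text = re.sub(r"@\S+|https?:\S+|[^A-Za-z0-9]+", ' ', str(sentence).lower()).strip()
            text = [wnl.lemmatize(i) for i in text.split() if i not in stop_words]
            new_review.append(' '.join(text))
        return new_review

    train['sentence'] = preprocess(train['sentence'])
    test['sentence'] = preprocess(test['sentence'])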
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score, f1_score
    # vectorizing the sentences
    cv = CountVectorizer(binary = True) # records only whether each word is present or not
    cv.fit(train['sentence']) # find all the unique words from the training set
    train_x = cv.transform(train)
    test_x = cv.transform(test)
    # importing the relevant modules
    import xgboost as xgb
    xgb_train_labels = []
    accepted_strings_half1 = {'location','service','price'}
    accepted_strings_half2 = {'food','time'}
    for topic in train['topic']:
        if topic in accepted_strings_half1:
            xgb_train_labels.append(1)
        elif topic in accepted_strings_half2:
            xgb_train_labels.append(0)
        else:
            xgb_train_labels.append(None)

    xgb_test_labels = []
    for topic in test['topic']:
        if topic in accepted_strings_half1:
            xgb_test_labels.append(1)
        elif topic in accepted_strings_half2:
            xgb_test_labels.append(0)
        else:
            xgb_test_labels.append(None)

    # creating a variable for the new train and test sets
    xgb_train = xgb.DMatrix(train_x,xgb_train_labels)
    xgb_test = xgb.DMatrix(test_x,xgb_test_labels)
    # Setting the Parameters of the Model
    param = {'objective':'multi:softmax','num_class': 5,'eta': 0.75,'max_depth': 50,}
    # Training the Model
    xgb_model = xgb.train(param,xgb_train,num_boost_round = 30)
    # Predicting using the Model
    y_pred = xgb_model.predict(xgb_test)
    y_pred = np.where(np.array(y_pred) > 0.5,1,0) # converting them to 1/0s
    # Evaluation of Model
    accuracy_score(xgb_test_labels,y_pred)
    f1_score(xgb_test_labels,y_pred)

I get this error when I try to run the last cell above:

    XGBoostError                              Traceback (most recent call last)
    in ()
          3 'max_depth': 50,}
          4 # Training the Model
    ----> 5 xgb_model = xgb.train(param,xgb_train,num_boost_round = 30)
          6 # Predicting using the Model
          7 y_pred = xgb_model.predict(xgb_test)

    3 frames
    /usr/local/lib/python3.7/dist-packages/xgboost/core.py in _check_call(ret)
        174
        175     if ret != 0:
    --> 176         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
        177
        178

    XGBoostError: [23:04:45] /workspace/src/objective/multiclass_obj.cu:60: Check failed: preds.Size() == (static_cast<size_t>(param_.num_class) * info.labels_.Size()): SoftmaxMultiClassObj: label size and pred size does not match
    Stack trace:
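For reference, the failed check compares the number of predictions with `num_class` times the number of labels, so it fires whenever the feature matrix and the label list have different row counts. One likely cause in the code above is `cv.transform(train)`: iterating over a pandas DataFrame yields its column names, so the vectorizer sees one "document" per column rather than one per row, whereas `cv.transform(train['sentence'])` would pass the sentences. Separately, the label lists contain only 0, 1 and `None`, while `multi:softmax` with `num_class: 5` expects integer labels from 0 to 4. The plain-Python sketch below illustrates both points; `encode_topics` and `sizes_match` are hypothetical helpers, not part of XGBoost, and the topic-to-integer order is an arbitrary assumption.

```python
# Hypothetical helpers for illustration; not part of XGBoost or scikit-learn.
TOPICS = ["location", "service", "price", "food", "time"]   # topic strings from the question
TOPIC_TO_ID = {topic: i for i, topic in enumerate(TOPICS)}  # arbitrary order, ids 0..4


def encode_topics(topics):
    """Map each topic string to an integer class id in [0, num_class)."""
    return [TOPIC_TO_ID[t] for t in topics]  # raises KeyError on an unknown topic


def sizes_match(n_feature_rows, labels, num_class):
    """Mirror the failed check: num_class scores per feature row vs. num_class per label."""
    n_preds = n_feature_rows * num_class
    return n_preds == num_class * len(labels)


labels = encode_topics(["food", "price", "time", "location"])
print(labels)                     # [3, 2, 4, 0]
print(sizes_match(4, labels, 5))  # True: 4 feature rows, 4 labels
print(sizes_match(2, labels, 5))  # False: e.g. 2 rows built from 2 column names
```

Comparing `train_x.shape[0]` with `len(xgb_train_labels)` before building the `DMatrix` is a quick way to confirm whether this is what happened.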

Please let me know if there is anything I can do to fix this. Thanks!
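On the split mentioned in the question: scikit-learn's `train_test_split` (with `test_size=0.2` and `stratify` set to the labels) is the usual tool for this. The standard-library sketch below shows the same idea so that it stays self-contained. `stratified_split` is a hypothetical helper, and the sketch assumes that 80% of the documents go to training and 20% to testing, taken per class.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, test_fraction=0.2, seed=0):
    """Shuffle within each class, then hold out test_fraction of every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))  # at least one test item per class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    def pick(seq, idxs):
        return [seq[i] for i in idxs]

    return (pick(items, train_idx), pick(items, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


docs = [f"doc {i}" for i in range(50)]   # placeholder documents
topics = [i % 5 for i in range(50)]      # 5 classes, 10 documents each
X_train, X_test, y_train, y_test = stratified_split(docs, topics)
print(len(X_train), len(X_test))         # 40 10
```

Splitting the documents and their labels together, by the same indices, keeps the two the same length on each side of the split.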
