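How do I fix the sklearn_crfsuite error "expected bytes, int found"?
I'm trying to build a conditional random field (CRF) model by following this tutorial: https://www.kaggle.com/shoumikgoswami/ner-using-random-forest-and-crf. I followed all the steps, but for some reason, when I run this line: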
pred = cross_val_predict(estimator=crf,X=X,y=y,cv=5)
I get the following error:
TypeError: expected bytes,int found
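Here is the full code that builds the CRF:
import sklearn_crfsuite
from sklearn.model_selection import cross_val_predict

# sent2features and sent2labels are the helper functions defined in the tutorial
X = [sent2features(s) for s in sentences]
y = [sent2labels(s) for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm='lbfgs',c1=0.1,c2=0.1,max_iterations=100,all_possible_transitions=False)
pred = cross_val_predict(estimator=crf,X=X,y=y,cv=5)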
Environment: Conda, Python 3.7
Data:
y(labels) : [[0,1],[0,1,0],0]]
X (words:sequence): [[{'bias': 1.0,'word.lower()': 'we','word[-3:]': 'we','word[-2:]': 'we','word.isupper()': False,'word.istitle()': False,'word.isdigit()': False,'postag': 'PRP','postag[:2]': 'PR','BOS': True,'+1:word.lower()': 'have','+1:word.istitle()': False,'+1:word.isupper()': False,'+1:postag': 'VBP','+1:postag[:2]': 'VB'},{'bias': 1.0,'word.lower()': 'have','word[-3:]': 'ave','word[-2:]': 've','postag': 'VBP','postag[:2]': 'VB','-1:word.lower()': 'we','-1:word.istitle()': False,'-1:word.isupper()': False,'-1:postag': 'PRP','-1:postag[:2]': 'PR','+1:word.lower()': 'potatos','+1:postag': 'VBN','word.lower()': 'potatos','word[-3:]': 'tos','word[-2:]': 'os','postag': 'VBN','-1:word.lower()': 'have','-1:postag': 'VBP','-1:postag[:2]': 'VB','EOS': True}],[{'bias': 1.0,'word.lower()': 'an','word[-3:]': 'an','word[-2:]': 'an','postag': 'DT','postag[:2]': 'DT','+1:word.lower()': 'island','+1:postag': 'NN','+1:postag[:2]': 'NN'},'word.lower()': 'island','word[-3:]': 'and','word[-2:]': 'nd','postag': 'NN','postag[:2]': 'NN','-1:word.lower()': 'an','-1:postag': 'DT','-1:postag[:2]': 'DT','+1:word.lower()': 'forest','+1:postag': 'JJS','+1:postag[:2]': 'JJ'},'word.lower()': 'forest','word[-3:]': 'est','word[-2:]': 'st','postag': 'JJS','postag[:2]': 'JJ','-1:word.lower()': 'island','-1:postag': 'NN','-1:postag[:2]': 'NN','word.lower()': 'up','word[-3:]': 'up','word[-2:]': 'up','postag': 'IN','postag[:2]': 'IN','+1:word.lower()': 'the','+1:postag': 'DT','+1:postag[:2]': 'DT'},'word.lower()': 'the','word[-3:]': 'the','word[-2:]': 'he','-1:word.lower()': 'up','-1:postag': 'IN','-1:postag[:2]': 'IN','+1:word.lower()': 'mile','word.lower()': 'mile','word[-3:]': 'ile','word[-2:]': 'le','-1:word.lower()': 'the','+1:word.lower()': 'smell','word.lower()': 'smell','word[-3:]': 'ell','word[-2:]': 'll','+1:word.lower()': 'of','+1:postag': 'IN','+1:postag[:2]': 'IN'},'word.lower()': 'of','word[-3:]': 'of','word[-2:]': 'of','-1:word.lower()': 'smell','+1:word.lower()': 
'tulips','+1:postag': 'NNS','word.lower()': 'tulips','word[-3:]': 'ips','word[-2:]': 'ps','postag': 'NNS','-1:word.lower()': 'of','+1:word.lower()': 'and','+1:postag': 'CC','+1:postag[:2]': 'CC'},'word.lower()': 'and','postag': 'CC','postag[:2]': 'CC','-1:word.lower()': 'tulips','-1:postag': 'NNS','+1:word.lower()': 'roses','word.lower()': 'roses','word[-3:]': 'ses','word[-2:]': 'es','-1:word.lower()': 'and','-1:postag': 'CC','-1:postag[:2]': 'CC','word.lower()': 'i','word[-3:]': 'i','word[-2:]': 'i','+1:word.lower()': 'would','+1:postag': 'MD','+1:postag[:2]': 'MD'},'word.lower()': 'would','word[-3:]': 'uld','word[-2:]': 'ld','postag': 'MD','postag[:2]': 'MD','-1:word.lower()': 'i','+1:word.lower()': 'love','+1:postag': 'VB','word.lower()': 'love','word[-3:]': 'ove','postag': 'VB','-1:word.lower()': 'would','-1:postag': 'MD','-1:postag[:2]': 'MD','+1:word.lower()': 'it','+1:postag': 'PRP','+1:postag[:2]': 'PR'},'word.lower()': 'it','word[-3:]': 'it','word[-2:]': 'it','-1:word.lower()': 'love','-1:postag': 'VB','EOS': True}]]
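Solution
I solved the problem. As the data sample shows, the y labels are lists of the integers 0 and 1. python-crfsuite, the library that sklearn_crfsuite wraps, expects each label to be a string, which is most likely why integer labels raise this TypeError. Once I changed the labels in y from the integers 0 and 1 to the strings '0' and '1', it worked.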
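The conversion described above can be sketched as follows. This is only a minimal illustration: y_int is made-up sample data shaped like the labels in the question, and labels_to_str is a hypothetical helper name, not part of sklearn_crfsuite.

```python
# Hypothetical integer labels, shaped like the y in the question:
# one inner list per sentence, one label per token.
y_int = [[0, 1], [0, 1, 0]]


def labels_to_str(sequences):
    """Return a copy of the label sequences with every label cast to str."""
    return [[str(label) for label in seq] for seq in sequences]


y = labels_to_str(y_int)
print(y)
```

The resulting y can then be passed to cross_val_predict(estimator=crf, X=X, y=y, cv=5) in place of the integer labels. The predictions come back as strings too, so cast them back with int() if later code needs numeric labels.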