How to fix: removing stop words from tokenized text using NLTK raises TypeError
The following code:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize,word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.stem import WordNetLemmatizer
import re
import time
txt = input()
snt_tkn = sent_tokenize(txt)
wrd_tkn = [word_tokenize(s) for s in snt_tkn]
stp_wrd = set(stopwords.words("english"))
flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
print(flt_snt)
returns the following:
Traceback (most recent call last):
  File "compiler.py", line 19, in <module>
    flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
  File "compiler.py", line 19, in <listcomp>
    flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
TypeError: unhashable type: 'list'
If possible, I would like to know how to return the tokenized text with the stop words removed.
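Solution
The error tells you that a list is not hashable. Lists are mutable, so Python cannot hash them, and a set membership test such as w in stp_wrd has to hash w. Here wrd_tkn is a list of sentences, so each w is a whole list of tokens rather than a single word. If you need a hashable version of a list, convert it to an immutable type. A set will not do, because sets are themselves mutable and unhashable, but a tuple works and can be built with its constructor:
immutable_list = tuple(some_list)
Note that this only avoids the error. To actually remove stop words, each word has to be tested against the set individually.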
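A minimal sketch of why the membership test fails may help here. The names stop_words and sentence are illustrative stand-ins, not taken from the question, and the small hard-coded set only imitates set(stopwords.words("english")) so that the snippet runs without NLTK data. A tuple is used as the hashable stand-in for a list.

```python
# Hypothetical stand-in for set(stopwords.words("english")).
stop_words = {"the", "a", "is"}
# One tokenized sentence, shaped like the output of word_tokenize.
sentence = ["the", "cat", "is", "here"]

# A single word (str) is hashable, so the membership test works.
word_is_stop = "the" in stop_words

# A whole sentence (list) is not hashable, so the same test raises TypeError.
try:
    sentence in stop_words
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A tuple of strings is hashable, so the test runs, but it asks whether the
# entire sentence is a stop word, which is not the question we care about.
tuple_in_set = tuple(sentence) in stop_words

print(word_is_stop)
print(error_message)
print(tuple_in_set)
```

The exact wording of the TypeError message can differ between Python versions, but it should mention that the list type is unhashable.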
For future reference, the fix is as follows.
Change
flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
to
flt_snt = [[w for w in s if w not in stp_wrd] for s in wrd_tkn]
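The nested comprehension can be tried without downloading any NLTK data. The sketch below keeps the question's variable names but fills stp_wrd and wrd_tkn with small hard-coded values, which are assumptions for illustration only. It also lowercases each word before the lookup. That step is an addition not present in the original fix, included because NLTK's English stop word list is all lowercase, so a capitalized "This" would otherwise be kept.

```python
# Hypothetical stand-ins: in the question these come from
# set(stopwords.words("english")) and word_tokenize on each sentence.
stp_wrd = {"this", "is", "a", "it", "the"}
wrd_tkn = [
    ["This", "is", "a", "test", "."],
    ["It", "removes", "the", "stop", "words", "."],
]

# Outer loop walks the sentences, inner loop walks the words of one sentence,
# so every membership test receives a str rather than a list.
flt_snt = [[w for w in s if w.lower() not in stp_wrd] for s in wrd_tkn]

print(flt_snt)
# [['test', '.'], ['removes', 'stop', 'words', '.']]
```

The result keeps one inner list per sentence. If a single flat list of words is preferred, the same filter can be written as [w for s in wrd_tkn for w in s if w.lower() not in stp_wrd].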