
We get the error 'list' object has no attribute 'lower' in cross-validation

We are trying to classify a dataset to see which sentences are propaganda and which are not. We preprocessed the data with NLTK and then ran cross-validation.

I take train_sentence (our training dataset) as input.

# Sentence tokenization

from nltk.tokenize import sent_tokenize
sent_token = [sent_tokenize(doc) for doc in train_sentence]
print(sent_token[1])

# Removing punctuation
import re
import string
regex = re.compile('[%s]' % re.escape(string.punctuation))  # see documentation here: http://docs.python.org/2/library/string.html

tokenized_docs_no_punctuation = []

for review in sent_token:
    new_review = []
    for token in review:
        new_token = regex.sub(u'',token)
        if not new_token == u'':
            new_review.append(new_token)
    
    tokenized_docs_no_punctuation.append(new_review)
    
print(tokenized_docs_no_punctuation)

# Cleaning text of stopwords
from nltk.corpus import stopwords

tokenized_docs_no_stopwords = []
english_stopwords = set(stopwords.words('english'))  # build the stopword set once instead of on every lookup

for doc in tokenized_docs_no_punctuation:
    new_term_vector = []
    for word in doc:
        if word not in english_stopwords:
            new_term_vector.append(word)
    tokenized_docs_no_stopwords.append(new_term_vector)

print(tokenized_docs_no_stopwords)

from nltk.stem.porter import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer

porter = PorterStemmer()
wordnet = WordNetLemmatizer()

preprocessed_docs = []

for doc in tokenized_docs_no_stopwords:
    final_doc = []
    for word in doc:
        #final_doc.append(porter.stem(word))
        final_doc.append(wordnet.lemmatize(word))
    preprocessed_docs.append(final_doc)

# train_sentence = preprocessed_docs_toarray
train_sentence = tokenized_docs_no_stopwords
print(tokenized_docs_no_stopwords)


# StratifiedKFold suits imbalanced data: it keeps the class proportions roughly equal in every fold, so the smallest class is represented in each training split
from sklearn.model_selection import StratifiedKFold 
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB


# KFold cross validation - on our dataset

def get_score(model,X_train,X_test,y_train,y_test):
    model.fit(X_train,y_train)
    return model.score(X_test,y_test)

folds = StratifiedKFold(n_splits=2)

scores_logistic = []
scores_svm = []
scores_rf = []
scores_nb = []

import numpy as np

Sample_Array_sentence = np.concatenate((train_sentence,development_sentence))
Sample_Array_propaganda = np.concatenate((train_propaganda,development_propaganda))


# Bag of words
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
Sample_Array_sentence_vectors = vectorizer.fit_transform(Sample_Array_sentence)


for train_index, test_index in folds.split(Sample_Array_sentence_vectors, Sample_Array_propaganda):
    # Split both the feature matrix and the labels with the same fold indices
    X_train, X_test, y_train, y_test = Sample_Array_sentence_vectors[train_index], Sample_Array_sentence_vectors[test_index], \
                                       Sample_Array_propaganda[train_index], Sample_Array_propaganda[test_index]

    scores_logistic.append(get_score(LogisticRegression(solver='liblinear', multi_class='ovr'), X_train, X_test, y_train, y_test))
    scores_svm.append(get_score(SVC(gamma='auto'), X_train, X_test, y_train, y_test))
    scores_rf.append(get_score(RandomForestClassifier(n_estimators=40), X_train, X_test, y_train, y_test))
    # GaussianNB does not accept sparse matrices, so convert the features to dense arrays
    scores_nb.append(get_score(GaussianNB(), X_train.toarray(), X_test.toarray(), y_train, y_test))


print("score of Logistic Regression")
print(scores_logistic)
print("score of SVM")
print(scores_svm)
print("score of RandomForest")
print(scores_rf)
print("score of Naive Bayes")
print(scores_nb)
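
For comparison, the manual fold loop above can probably be written more compactly with scikit-learn's `cross_val_score` and a pipeline. This is only a sketch under assumptions: the `sentences` and `labels` lists are made-up toy data standing in for the real arrays, each sentence is assumed to be a plain string, and only one of the four classifiers is shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one plain string per sentence, 1 = propaganda, 0 = not.
sentences = [
    "they will destroy everything we love",
    "the enemy is evil and must be crushed",
    "only traitors disagree with our glorious leader",
    "the meeting starts at nine tomorrow",
    "rainfall was above average this month",
    "the committee published its annual report",
]
labels = [1, 1, 1, 0, 0, 0]

# The pipeline refits the vectorizer inside every fold, on that fold's training part only.
model = make_pipeline(CountVectorizer(), LogisticRegression(solver="liblinear"))
folds = StratifiedKFold(n_splits=2)

# One accuracy value per fold.
scores = cross_val_score(model, sentences, labels, cv=folds)
print(scores)
```

One difference from the loop above is that the vocabulary is learned per training fold rather than from the whole dataset, which may or may not matter for this task.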

We get this error from the cross-validation: 'list' object has no attribute 'lower'
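
A likely cause, judging from the code as posted, is that `CountVectorizer` expects each document to be a string, while the preprocessing leaves each document as a list of strings. With its default settings the vectorizer lowercases every document, and a list has no such method. The snippet below is a minimal sketch that reproduces the message and shows one possible workaround; `docs_as_lists` is made-up data, and joining with a space is an assumption about what the documents should look like.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the preprocessed data: each document is a list of strings.
docs_as_lists = [
    ["They will destroy us", "We must act now"],
    ["The report was published on Monday"],
]

# Passing lists instead of strings reproduces the error.
try:
    CountVectorizer().fit_transform(docs_as_lists)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# One possible fix: join each document back into a single string first.
docs_as_strings = [" ".join(doc) for doc in docs_as_lists]
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(docs_as_strings)
print(vectors.shape)  # (number of documents, vocabulary size)
```

If the documents really should stay tokenized, another option may be to pass a callable `analyzer` to `CountVectorizer` so that it skips its own preprocessing and tokenization.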

We would be very grateful for any help.

Thanks
