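How to generate unique scores using vectorizer.transform(x_train)
I am trying to use a TF-IDF vectorizer to generate a unique score for each word, but I can't get it to work. Here is the code: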
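from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

def prepare_data(preprocessed, labels):
    # Sample documents (these overwrite the argument, as in the original post)
    preprocessed = [
        "look like good person",
        "boy dats cold tyga dwn bad cuffin",
        "hate another person got much going",
    ]
    x_train, x_test, y_train, y_test = train_test_split(preprocessed, labels, test_size=0.30)
    tfidf_vect = TfidfVectorizer()
    # The vocabulary and IDF weights are fitted on all documents, not only x_train
    tfidf_vect.fit(preprocessed)
    x_train_tfidf = tfidf_vect.transform(x_train)
    x_test_tfidf = tfidf_vect.transform(x_test)
    print("tfidf--> ", x_train_tfidf)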
Output:
(0,15) 0.37796447300922725
(0,6) 0.37796447300922725
(0,5) 0.37796447300922725
(0,4) 0.37796447300922725
(0,3) 0.37796447300922725
(0,2) 0.37796447300922725
(0,1) 0.37796447300922725
(1,14) 0.3220024178194947
(1,13) 0.4233944834119594
(1,10) 0.4233944834119594
(1,9) 0.4233944834119594
(1,7) 0.4233944834119594
(1,0) 0.4233944834119594
I don't think this is correct, because each word should be assigned a unique score. How can I get the correct output?
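For what it's worth, the printed values look consistent with how TF-IDF is normally computed rather than with a bug. In each `(row, col)` pair, `row` is the document index within `x_train` and `col` is the word's column in the alphabetically sorted vocabulary. Words that occur equally often in a document and appear in the same number of documents get the same weight, so ties are expected. The sketch below recomputes the scores by hand in plain Python. It assumes whitespace tokenization (enough for these three sentences) and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which is my understanding of the `TfidfVectorizer` defaults; the helper name `tfidf_row` is made up for illustration.

```python
import math

# The three documents from the question.
docs = [
    "look like good person",
    "boy dats cold tyga dwn bad cuffin",
    "hate another person got much going",
]

# Assumption: plain whitespace splitting stands in for the vectorizer's tokenizer.
tokenized = [d.split() for d in docs]

# Alphabetically sorted vocabulary; a word's position is its column index.
vocab = sorted({w for doc in tokenized for w in doc})
n_docs = len(tokenized)

# Document frequency: how many documents contain each word.
df = {w: sum(w in doc for doc in tokenized) for w in vocab}

# Smoothed IDF (assumed to mirror the smooth_idf=True default).
idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}


def tfidf_row(doc):
    """Return {word: score} for one tokenized document, L2-normalized."""
    raw = {w: doc.count(w) * idf[w] for w in set(doc)}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: v / norm for w, v in raw.items()}


rows = [tfidf_row(doc) for doc in tokenized]

# Print each document with (column index) word: score, highest column first.
for doc, row in zip(docs, rows):
    print(doc)
    for w in sorted(row, key=vocab.index, reverse=True):
        print(f"  ({vocab.index(w)}) {w}: {row[w]:.4f}")
```

In the second sentence all seven words occur once and appear in no other document, so each ends up with 1/sqrt(7), about 0.378. In the third sentence only "person" also appears in another document, which is why it alone gets the lower score. If you need the word behind each column in scikit-learn itself, recent versions expose `tfidf_vect.get_feature_names_out()`, and `tfidf_vect.idf_` holds the per-word IDF weights.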