How to fix loading LSA sklearn vectors
I trained an LSA model with sklearn and saved it with pickle.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
import numpy as np
import os.path
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import pickle
def load_data(path,file_name):
    """
    Input  : path and file_name
    Purpose: load a text file, one document per line
    Output : list of paragraphs/documents and list of titles
             (the first 100 characters of each document serve as its title)
    """
    documents_list = []
    titles=[]
    with open( os.path.join(path,file_name),"r") as fin:
        for line in fin.readlines():
            text = line.strip()
            documents_list.append(text)
            # one title per document
            titles.append( text[0:min(len(text),100)] )
    print("Total Number of Documents:",len(documents_list))
    return documents_list,titles
document_list,titles=load_data("","a-choose")
#clean_text=preprocess_data(document_list)
# raw documents to tf-idf matrix:
vectorizer = TfidfVectorizer(stop_words='english',use_idf=True,smooth_idf=True)
# SVD to reduce dimensionality:
svd_model = TruncatedSVD(n_components=4,algorithm='randomized',n_iter=10)
# pipeline of tf-idf + SVD,fit to and applied to documents:
svd_transformer = Pipeline([('tfidf',vectorizer),('svd',svd_model)])
svd_matrix = svd_transformer.fit_transform(document_list)
# svd_matrix can later be used to compare documents,compare words,or compare queries with documents
sentence=["football"]
sentence2=["match"]
query=svd_transformer.transform(sentence2)
query_vector = svd_transformer.transform(sentence)
#print(query_vector)
#print(query)
with open("lsa_model.bin","wb") as f:
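    pickle.dump(svd_matrix,f)
As a second step, I use another program that loads this model and compares word vectors. The problem is that I cannot load the vectors. My code is below: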
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
import numpy as np
from gensim.models import KeyedVectors
import codecs
import pickle
model = pickle.load(open('lsa_model.bin','rb'))
query="best"
query_vector = model.transform(query)
print(query_vector)
    query_vector = model.transform(query)
AttributeError: 'numpy.ndarray' object has no attribute 'transform'
Solution
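I think you need to use just fit here instead of fit_transform:
svd_matrix = svd_transformer.fit(document_list)
fit_transform returns the transformed document matrix (a NumPy array), so that array is what was pickled, and an array has no transform method. fit returns the fitted pipeline itself, which is the object the second program needs to load.
I don't know why it only works in the second part.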
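To make the difference concrete, here is a minimal sketch of the save-and-load round trip. It is not the asker's exact setup. The toy corpus, n_components=2 and random_state=0 are assumptions chosen for illustration, and pickle.dumps/pickle.loads are used in place of a file so the example leaves nothing on disk (pickle.dump to "lsa_model.bin" behaves the same way). Note that transform expects a list of strings, not a bare string.

```python
import pickle

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the documents read from the file.
documents = [
    "the football match was the best game of the season",
    "the team won the match after a late goal",
    "stock markets fell as investors sold shares",
    "the bank raised lending rates again",
]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
])

# fit() returns the fitted pipeline; fit_transform() would return an ndarray.
pipeline.fit(documents)

# Pickle the pipeline object itself, not the transformed matrix.
payload = pickle.dumps(pipeline)

# In the second program: load the pipeline and project a query.
loaded = pickle.loads(payload)
query_vector = loaded.transform(["football"])  # a list, not a bare string
print(query_vector.shape)  # (1, 2)
```

If the transformed document matrix is also needed later, it can be recomputed with loaded.transform(documents) or pickled alongside the pipeline as a second object.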