
Document indices for a similarity matrix and a Louvain community graph


I am running this script to calculate and plot the similarity between some documents.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import os
import codecs
import string,re
import nltk
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
from pathlib import Path
from matplotlib import cm as cm
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

path = "C:\\Users\\user\\Desktop\\texts\\dataset"
text_files = os.listdir(path)
#print (text_files)

tfidf_vectorizer = TfidfVectorizer()
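# os.listdir returns bare file names, so join each one with the directory before opening
documents = [open(os.path.join(path,f),encoding="utf-8").read() for f in text_files if f.endswith('.txt')]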
sparse_matrix = tfidf_vectorizer.fit_transform(documents)

#with open('C:\\Users\\user\\Desktop\\texts\\results\\pairwise_similarity2.csv','w') as f:
#    for item in pairwise_similarity:
#        f.write("%s\n" % item)
#        f.write('\n')

labels = []
for f in text_files:
    if f.endswith('.txt'):
        labels.append(f)
#print(labels)

pairwise_similarity = sparse_matrix * sparse_matrix.T
pairwise_similarity_array = pairwise_similarity.toarray()
 
fig,ax = plt.subplots(figsize=(20,20))
cax = ax.matshow(pairwise_similarity_array,interpolation='spline16')
ax.grid(True)
plt.title('News articles similarity matrix')
# one tick per document instead of a hard-coded count
plt.xticks(range(len(labels)),labels,rotation=90)
plt.yticks(range(len(labels)),labels)
fig.colorbar(cax,ticks=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1])
plt.show()

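Even though I have created the labels list, I would like to know how to access the document indices so that a specific document can be associated with its scores. This would also help me keep track of the documents in other tasks. For example, I also used the Louvain community library to explore further hypotheses about the dataset, but when I try to use the labels list as the node labels, I get the error AttributeError: 'list' object has no attribute 'items'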
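Because `documents` and `labels` are both built by filtering `text_files` in the same order, row and column `i` of the similarity matrix should correspond to `labels[i]`. A minimal sketch of pairing each score with its two file names, under that assumption; `rank_document_pairs` and the toy data are hypothetical names and made-up values, not part of the original script:

```python
from itertools import combinations


def rank_document_pairs(similarity, labels):
    """Return (label_a, label_b, score) tuples, most similar pair first.

    Assumes row/column i of `similarity` belongs to labels[i].
    """
    if len(similarity) != len(labels):
        raise ValueError("similarity matrix and labels differ in length")
    # Only the upper triangle is needed: the matrix is symmetric and the
    # diagonal compares each document with itself.
    pairs = [
        (labels[i], labels[j], float(similarity[i][j]))
        for i, j in combinations(range(len(labels)), 2)
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy 3x3 matrix standing in for pairwise_similarity_array (made-up values).
toy_labels = ["a.txt", "b.txt", "c.txt"]
toy_similarity = [
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.1],
    [0.7, 0.1, 1.0],
]

ranked = rank_document_pairs(toy_similarity, toy_labels)
for doc_a, doc_b, score in ranked:
    print(f"{doc_a} <-> {doc_b}: {score:.2f}")
```

The same function should accept `pairwise_similarity_array` directly, since a NumPy array supports `similarity[i][j]` indexing. For a labeled table rather than a ranked list, `pd.DataFrame(pairwise_similarity_array, index=labels, columns=labels)` attaches the file names to both axes.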

This is the code and output for the Louvain communities:

import networkx as nx
import community as community_louvain  # provided by the python-louvain package

# build a weighted graph from the similarity matrix; node i is document i
# (from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it)
G = nx.from_numpy_array(pairwise_similarity_array)

# compute the best partition
partition = community_louvain.best_partition(G)
#print(partition)
modularity = community_louvain.modularity(partition,G)
print(modularity)

# draw the graph
pos = nx.spring_layout(G)
# color the nodes according to their partition
cmap = plt.get_cmap('coolwarm',max(partition.values()) + 1)
nx.draw_networkx_nodes(G,pos,partition.keys(),node_size=100,cmap=cmap,node_color=list(partition.values()))
nx.draw_networkx_edges(G,pos,alpha=0.5)
# doc_labels is presumably the labels list; passing a list here raises the AttributeError
nx.draw_networkx_labels(G,pos,labels=doc_labels,font_size=12,font_family='sans-serif')
plt.show()

dendro = community_louvain.generate_dendrogram(G)
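The `AttributeError` is consistent with `nx.draw_networkx_labels` expecting `labels` to be a dictionary keyed by node, since it iterates over `labels.items()`. A graph built from a NumPy array has integer nodes `0..n-1`, so one possible fix is to map each index to its file name. The sketch below assumes `labels[i]` belongs to node `i`; the helper names and the toy partition are hypothetical and only illustrate the shape of the data:

```python
from collections import defaultdict


def index_to_label(labels):
    """Map node index i -> labels[i], the dict shape networkx expects."""
    return dict(enumerate(labels))


def documents_by_community(partition, labels):
    """Group file names by the community id assigned to their node index."""
    groups = defaultdict(list)
    for node, community_id in partition.items():
        groups[community_id].append(labels[node])
    return dict(groups)


# Made-up stand-ins: best_partition returns a {node: community_id} dict.
toy_labels = ["a.txt", "b.txt", "c.txt"]
toy_partition = {0: 0, 1: 1, 2: 0}

doc_labels = index_to_label(toy_labels)
communities = documents_by_community(toy_partition, toy_labels)
print(doc_labels)
print(communities)

# With the real graph, the dict replaces the list in the drawing call:
# nx.draw_networkx_labels(G, pos, labels=doc_labels, font_size=12)
```

Keeping the index-to-name mapping in one dictionary means the same lookup can be reused wherever a result is keyed by node index, for example when reading the levels of the dendrogram.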
