How to apply WMD within a pandas column
I am trying to use WMD (Word Mover's Distance) to find similar sentences.
DATE TEXT
2019-01-12 The sky is blue and beautiful.
2019-01-12 love this blue and beautiful sky!
2019-01-12 The quick brown fox jumps over the lazy dog.
2019-01-12 A king’s breakfast has sausages,ham,bacon,eggs,toast and beans
2019-01-12 I love green eggs,sausages and bacon!
2020-01-13 The brown fox is quick and the blue dog is lazy!
2020-01-13 The sky is very blue and the sky is very beautiful today
2019-01-21 The dog is lazy but the brown fox is quick!
2020-01-12 President greets the press in Chicago
2020-01-12 Obama speaks in Illinois
To find the similarity between any two of these sentences (I need to apply this to all of the sentences), I tried using WMD as follows (for two strings):
import numpy as np
import pandas as pd

# `model` is assumed to be a trained gensim Word2Vec model, and `col` the
# name of the text column ('TEXT' above); neither is defined in this snippet.

# calculate the distance between 2 tokenized sentences using WMD
def find_similar_sentences(sentence_1, sentence_2):
    distance = model.wv.wmdistance(sentence_1, sentence_2)
    return distance

# create the distance matrix
tokenized_sentences = [s.split() for s in df[col]]
l = len(tokenized_sentences)
distances = np.zeros((l, l))
for i in range(l):
    for j in range(l):
        distances[i, j] = find_similar_sentences(tokenized_sentences[i], tokenized_sentences[j])

# make a pandas DataFrame of pairwise distances
# (a separate name, so the original df with DATE and TEXT is not overwritten)
labels = ['sentence' + str(i + 1) for i in range(l)]
distance_df = pd.DataFrame(data=distances, index=labels, columns=labels)
print(distance_df)
I expect something like this:
DATE TEXT Similar Sentence
2019-01-12 The sky is blue and beautiful. [love this blue and beautiful sky!,The sky is very blue and the sky is very beautiful today]
2019-01-12 love this blue and beautiful sky! [The sky is blue and beautiful.,The sky is very blue and the sky is very beautiful today]
2019-01-12 The quick brown fox jumps over the lazy dog. [The brown fox is quick and the blue dog is lazy!,The dog is lazy but the brown fox is quick!]
2019-01-12 A king’s breakfast has sausages,ham,bacon,eggs,toast and beans [I love green eggs,sausages and bacon!]
2019-01-12 I love green eggs,sausages and bacon! [A king’s breakfast has sausages,ham,bacon,eggs,toast and beans]
2020-01-13 The brown fox is quick and the blue dog is lazy! [The quick brown fox jumps over the lazy dog.,The dog is lazy but the brown fox is quick!]
2020-01-13 The sky is very blue and the sky is very beautiful today [The sky is blue and beautiful.,love this blue and beautiful sky!]
2019-01-21 The dog is lazy but the brown fox is quick! [The quick brown fox jumps over the lazy dog.,The brown fox is quick and the blue dog is lazy!]
2020-01-12 President greets the press in Chicago [Obama speaks in Illinois]
2020-01-12 Obama speaks in Illinois [President greets the press in Chicago]
where the Similar Sentence column is populated with the sentences whose similarity is above a chosen threshold.
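Could you please tell me how to extend the code above so that it works over rows rather than on individual strings, so that I get a column like the one shown in the expected output (whatever the threshold value is)?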
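One possible way to go from the pairwise loop to a per-row column is sketched below. It is only a sketch under some assumptions: the helper names (tokenize, jaccard_distance, add_similar_sentences) and the max_distance parameter are hypothetical, and the demo uses a simple Jaccard token distance as a stand-in so that it runs without a trained model. Since WMD is a distance, "similar" here means a distance at or below the cutoff rather than above it, and the distance is assumed to be symmetric.

```python
import re

import numpy as np
import pandas as pd


def tokenize(text):
    """Lowercase the text and keep runs of letters, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard_distance(tokens_a, tokens_b):
    """Stand-in distance for the demo only; this is NOT WMD."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def add_similar_sentences(df, col, distance_fn, max_distance):
    """Return a copy of df with a 'Similar Sentence' column of lists."""
    tokens = [tokenize(s) for s in df[col]]
    n = len(tokens)
    distances = np.zeros((n, n))
    # Assumes a symmetric distance, so only the upper triangle is computed.
    for i in range(n):
        for j in range(i + 1, n):
            d = distance_fn(tokens[i], tokens[j])
            distances[i, j] = distances[j, i] = d
    sentences = df[col].tolist()
    # For each row, collect every other sentence within the cutoff.
    similar = [
        [sentences[j] for j in range(n)
         if j != i and distances[i, j] <= max_distance]
        for i in range(n)
    ]
    out = df.copy()
    out["Similar Sentence"] = pd.Series(similar, index=df.index)
    return out


df = pd.DataFrame({
    "DATE": ["2019-01-12", "2019-01-12", "2020-01-12"],
    "TEXT": [
        "The sky is blue and beautiful.",
        "love this blue and beautiful sky!",
        "Obama speaks in Illinois",
    ],
})

# With a trained gensim model (assumed to exist as `model`), the stand-in
# could be swapped for: distance_fn=model.wv.wmdistance
result = add_similar_sentences(df, "TEXT", jaccard_distance, max_distance=0.6)
print(result)
```

The cutoff of 0.6 only makes sense for the stand-in distance; WMD values depend on the embeddings, so a suitable cutoff would have to be chosen by inspecting the distance matrix.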