
WMD within a pandas column

I am trying to find similar sentences using WMD (Word Mover's Distance).

DATE            TEXT
2019-01-12     The sky is blue and beautiful.
2019-01-12     love this blue and beautiful sky!
2019-01-12     The quick brown fox jumps over the lazy dog.
2019-01-12     A king’s breakfast has sausages,ham,bacon,eggs,toast and beans
2019-01-12     I love green eggs,sausages and bacon!
2020-01-13     The brown fox is quick and the blue dog is lazy!
2020-01-13     The sky is very blue and the sky is very beautiful today
2019-01-21     The dog is lazy but the brown fox is quick!
2020-01-12     President greets the press in Chicago
2020-01-12     Obama speaks in Illinois

To find the similarity between any two of these sentences (which I then need to apply to all of them), I tried using WMD as follows (for two strings):

import numpy as np
import pandas as pd

# Assumes `model` is a trained gensim Word2Vec model and `df` is the DataFrame shown above.

# calculate the Word Mover's Distance between two tokenized sentences
def find_similar_sentences(sentence_1, sentence_2):
    return model.wv.wmdistance(sentence_1, sentence_2)

# create distance matrix
tokenized_sentences = [s.split() for s in df['TEXT']]
n = len(tokenized_sentences)
distances = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        distances[i, j] = find_similar_sentences(tokenized_sentences[i], tokenized_sentences[j])

# make a pandas DataFrame of pairwise distances (separate name so `df` is not overwritten)
labels = ['sentence' + str(i + 1) for i in range(n)]
distance_df = pd.DataFrame(data=distances, index=labels, columns=labels)
print(distance_df)

I expect something like this:

DATE            TEXT                                                                 Similar Sentence                    
2019-01-12     The sky is blue and beautiful.                      [love this blue and beautiful sky!,The sky is very blue and the sky is very beautiful today]
2019-01-12     love this blue and beautiful sky!                   [The sky is blue and beautiful.,The sky is very blue and the sky is very beautiful today]
2019-01-12     The quick brown fox jumps over the lazy dog.        [The brown fox is quick and the blue dog is lazy!,The dog is lazy but the brown fox is quick!]
2019-01-12     A king’s breakfast has sausages,ham,bacon,eggs,toast and beans  [I love green eggs,sausages and bacon!]
2019-01-12     I love green eggs,sausages and bacon!            [A king’s breakfast has sausages,ham,bacon,eggs,toast and beans]
2020-01-13     The brown fox is quick and the blue dog is lazy!       [The quick brown fox jumps over the lazy dog.,The dog is lazy but the brown fox is quick!]
2020-01-13     The sky is very blue and the sky is very beautiful today  [The sky is blue and beautiful.,love this blue and beautiful sky! ]
2019-01-21     The dog is lazy but the brown fox is quick!            [The quick brown fox jumps over the lazy dog.,The brown fox is quick and the blue dog is lazy!]
2020-01-12     President greets the press in Chicago                  [Obama speaks in Illinois]
2020-01-12     Obama speaks in Illinois                               [President greets the press in Chicago]

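The Similar Sentence column is populated with the sentences whose similarity passes a chosen threshold (since WMD is a distance, that means a distance below the threshold).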

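Could you tell me how to extend the code above to work on rows rather than on a pair of strings, so that I get a column like the one in the expected output (whatever the threshold is)?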
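To make the request concrete, here is a minimal sketch of the row-wise step: for every sentence, collect the other sentences whose distance falls within a threshold. The helper names `similar_sentences` and `jaccard_distance` and the threshold value are assumptions made for illustration. `jaccard_distance` is only a toy stand-in so the sketch runs without a trained model; it is not WMD.

```python
from itertools import combinations
import math


def similar_sentences(sentences, distance_fn, threshold):
    """For each sentence, list the other sentences within `threshold` distance."""
    tokens = [s.split() for s in sentences]
    similar = [[] for _ in sentences]
    # The distance is assumed symmetric, so each unordered pair is scored once.
    for i, j in combinations(range(len(sentences)), 2):
        d = distance_fn(tokens[i], tokens[j])
        # Smaller distance means more similar; non-finite distances are skipped.
        if math.isfinite(d) and d <= threshold:
            similar[i].append(sentences[j])
            similar[j].append(sentences[i])
    return similar


def jaccard_distance(a, b):
    """Toy stand-in for WMD (hypothetical helper): 1 - token-set overlap."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


texts = [
    "The sky is blue and beautiful.",
    "love this blue and beautiful sky!",
    "Obama speaks in Illinois",
]
result = similar_sentences(texts, jaccard_distance, threshold=0.9)
for text, matches in zip(texts, result):
    print(text, "->", matches)
```

With the setup from the question, `model.wv.wmdistance` could be passed as `distance_fn`, and the returned list assigned directly as a new column, for example `df['Similar Sentence'] = similar_sentences(df['TEXT'].tolist(), model.wv.wmdistance, threshold)`. A suitable threshold depends on the embedding model and has to be chosen by inspecting the distances.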
