
python - Semantic similarity measure for 2 sentences

This question already has an answer here: How to compute the similarity between two text documents? (8 answers)
I need to measure the similarity between two sentences. For example:

s1 = "she is good a dog "
s2 = "she is nice a heel"

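I need to show that "good" is similar to "nice". For nouns and verbs, a path-based similarity measure works along the lines of this sketch:

from nltk.corpus import wordnet as wn

def get_max(word1, word2):
    # Loop over every pair of senses and keep the best score (None counts as 0).
    return max((a.path_similarity(b) or 0 for a in wn.synsets(word1) for b in wn.synsets(word2)), default=0)

wn.synset('dog.n.01').path_similarity(wn.synset('animal.n.01'))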

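Result: .33, which is a high value, so these words are related and I can say they are similar. But for the adjectives ("nice" and "good") the value is only .09, which is very low!

Any ideas?

Solution:

You can compute path_similarity for every synset of "good" against a synset of "nice" and then take the maximum: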

>>> from nltk.corpus import wordnet as wn
>>> n=wn.synsets('nice')
>>> g=wn.synsets('good')
>>> [i.path_similarity(n[0]) for i in g]
[0.0625, 0.06666666666666667, 0.07142857142857142, 0.09090909090909091, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

>>> max(s for s in (i.path_similarity(n[0]) for i in g) if s is not None)  # skip None, which Python 3 cannot compare
0.09090909090909091

Note that a word's synsets cover several parts of speech (verb, noun, adjective, and so on), so you need to pick the appropriate one!
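One way to act on that advice is to restrict both words to a single part of speech and ignore the pairs for which the metric returns None. The following is only a sketch: `best_similarity` and `word_similarity` are names made up for this example, and the second function assumes NLTK and its WordNet corpus are installed. As far as I know, adjectives are not arranged in a hypernym hierarchy the way nouns and verbs are, so path-based scores between adjective senses will often come back as None.

```python
from itertools import product


def best_similarity(synsets_a, synsets_b, metric):
    """Return the highest non-None metric(a, b) over all pairs, or None if there is none."""
    scores = (metric(a, b) for a, b in product(synsets_a, synsets_b))
    return max((s for s in scores if s is not None), default=None)


def word_similarity(word_a, word_b, pos=None, method="path_similarity"):
    """Compare two words sense by sense using NLTK's WordNet interface.

    pos may be None (all senses) or a WordNet POS tag such as 'n', 'v', 'a', 'r'.
    method is the name of a Synset method, e.g. 'path_similarity' or 'wup_similarity'.
    """
    # Imported lazily so best_similarity stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn

    def metric(a, b):
        return getattr(a, method)(b)

    return best_similarity(wn.synsets(word_a, pos=pos),
                           wn.synsets(word_b, pos=pos),
                           metric)
```

For instance, `word_similarity('dog', 'animal', pos='n')` compares noun senses only, and `word_similarity('good', 'nice', pos='a', method='wup_similarity')` asks the same question for adjective senses; a result of None means no pair of senses could be scored.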

Alternatively, you can use wup_similarity:

>>> round(max(s for s in (i.wup_similarity(n[0]) for i in g) if s is not None), 1)
0.4

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
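To make that definition concrete, here is a minimal sketch of the usual Wu-Palmer formula, 2 * depth(LCS) / (depth(a) + depth(b)), on a tiny hand-made tree. The taxonomy below is invented for illustration and is not WordNet's actual hierarchy; NLTK's own implementation also has to deal with senses that have several hypernym paths, which this sketch ignores.

```python
# Hypothetical toy taxonomy (child -> parent); not WordNet's real hierarchy.
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "canine": "animal",
    "feline": "animal",
    "dog": "canine",
    "cat": "feline",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Number of nodes on the path from the root down to node (the root has depth 1)."""
    return len(ancestors(node))


def wup(a, b):
    """Wu-Palmer score: 2 * depth(LCS) / (depth(a) + depth(b))."""
    seen = set(ancestors(a))
    # Walking upward from b, the first node also above a is the deepest shared ancestor.
    lcs = next(n for n in ancestors(b) if n in seen)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup("dog", "cat"))       # shared ancestor is "animal"
print(wup("dog", "artifact"))  # shared ancestor is only the root "entity"
```

In this toy tree "dog" and "cat" meet at "animal", two levels deep, so they score higher than "dog" and "artifact", which only share the root.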

Read more about synsets at http://www.nltk.org/howto/wordnet.html
