
WordNet alternatives for finding semantic relationships between words in Python


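I have a project that needs the semantic relationship between two words. I want word-to-word relations such as hypernyms, hyponyms, and synonyms. I tried WordNet through NLTK, but most of the relations come back empty. Here is the sample code: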

from nltk.corpus import wordnet as wn
from wordhoard import synonyms

Word1 = 'red'
Word2 = 'color'

# Collect the part meronyms (the parts) of every synset of Word1.
LSTWord1 = []
for syn in wn.synsets(Word1):
    for meronym in syn.part_meronyms():
        LSTWord1.append(meronym)

# If Word2 appears among the parts of Word1, Word2 is a meronym of Word1.
for s in LSTWord1:
    if Word2 in s.name():
        print(Word2 + ' is a meronym of ' + Word1)
        break

# Repeat the check in the other direction.
LSTWord2 = []
for syn in wn.synsets(Word2):
    for meronym in syn.part_meronyms():
        LSTWord2.append(meronym)

for s in LSTWord2:
    if Word1 in s.name():
        print(Word1 + ' is a meronym of ' + Word2)
        break


Here are some example word pairs:

scheduled,geometry
games,river
campaign,sea
adventure,place
session,road
long,town
campaign,road
session,railway
difficulty of session,place of interest
campaign,town
leader,historic place
have,town
player,town
skills,church
campaign,cultural interest
character name,monument
player,province
games,beach
expertise level,gas station
character,municipality
world,electrict line
social interaction,electric line
percentage,municipality
character,hospital
inhabitants,mine
active character,municipality
campaign,altitude
died,municipality
many time,mountain
adventurer,altitude
campaign,peak
gain,place of interest
new capabilities,cultural interest
player,cultural interest
achievement,national park
campaign,good
first action,railway station
player,province

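WordNet may be limited, or it may simply not contain relations between these words. My question: is there an alternative to WordNet for handling semantic relationships between words, or a better way to obtain them? Thanks.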

Solution

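It looks like you are searching for arbitrary semantic relationships between a given pair of words across a large vocabulary. Simple cosine similarity over word embeddings may help here. You could start with GloVe.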
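
As a rough illustration of the cosine-similarity idea, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for the example; real GloVe vectors have 50 to 300 dimensions. The helper names (`cosine_similarity`, `rank_by_similarity`, `load_glove`) are hypothetical, and the loader assumes the usual GloVe text layout of one word per line followed by space-separated floats. Note that cosine similarity only gives a relatedness score, not a named relation such as hypernym or meronym.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def rank_by_similarity(word, vectors):
    """Rank every other word in `vectors` by similarity to `word`, best first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def load_glove(path):
    """Load vectors from a GloVe-style text file (assumed format: word f1 f2 ...)."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up toy vectors, for illustration only.
toy_vectors = {
    "river": [0.9, 0.1, 0.0],
    "sea": [0.8, 0.2, 0.1],
    "campaign": [0.0, 0.3, 0.9],
}

for other, score in rank_by_similarity("river", toy_vectors):
    print(f"river ~ {other}: {score:.3f}")
```

With real embeddings you would replace `toy_vectors` with the result of `load_glove(...)` on a downloaded vector file, then score each pair from the word list.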


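As I mentioned earlier, I am the author of wordhoard, the Python package you used in your question. Based on your question, I decided to add some extra modules to the package. These modules focus on:

  • homophones
  • hypernyms
  • hyponyms

I could not find an easy way to add meronyms, but I am still looking for the best approach.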

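The homophones module queries a hand-built list of known homophones for more than 60,000 of the most frequently used English words. I plan to expand this list in the future.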

from wordhoard import homophones

words = ['scheduled','geometry','games','river','campaign','sea','adventure','place','session','road','long','town','railway']
for item in words:
    # Call the function through the imported module.
    results = homophones.query_homophones(item)
    print(results)

# output
# no homophones for scheduled
# no homophones for geometry
# no homophones for games
# no homophones for river
# no homophones for campaign
# ['sea is a homophone of see','sea is a homophone of cee']
# no homophones for adventure
# ['place is a homophone of plaice']
# ['session is a homophone of cession']
# ['road is a homophone of rowed','road is a homophone of rode']
# truncated...

The hypernyms module queries various online repositories.

from wordhoard import hypernyms

words = ['scheduled','geometry','games','river','campaign','sea','adventure','place','session','road','long','town','railway']
for item in words:
    # Call the function through the imported module.
    results = hypernyms.find_hypernyms(item)
    print(results)

# output
# ['no hypernyms found']
# ['arrangement','branch of knowledge','branch of math','branch of mathematics','branch of maths','configuration','figure','form','math','mathematics','maths','pure mathematics','science','shape','study','study of numbers','study of quantities','study of shapes','system','type of geometry']
# ['lake','recreation']
# ['branch','dance','fresh water','geological feature','landform','natural ecosystem','natural environment','nature','physical feature','recreation','spring','stream','transportation','watercourse']
# ['action','actively seek election','activity','advertise','advertisement','battle','canvass','crusade','discuss','expedition','military operation','operation','political conflict','politics','promote','push','race','run','seek votes','wage war']
# truncated...

The hyponyms module queries a repository.

If you run into any problems using these new modules, please let me know.
