
Searching for synonyms of keywords


I have this code:

import pandas as pd
import re
from nltk.tokenize.treebank import TreebankWordDetokenizer
from langdetect import detect



df1=pd.read_csv('TFG1.csv',encoding = 'utf8')

def find_all_words(words, sentence):
    # Split the sentence into word tokens, then keep the dictionary words that occur in it.
    all_words = re.findall(r'\w+', sentence)
    words_found = []
    for word in words:
        if word in all_words:
            words_found.append(word)
    return "Words found:", len(words_found), " The words are:", words_found


english_dic=['sage','selection']
spanish_dic=['grupo','bien']


TreebankWordDetokenizer().detokenize(df1["Reescribe aquí / Rewrite here"])

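# Number the rows from 1 and detect the language once per row,
# then search with the keyword list for that language.
for i, rows in enumerate((x.lower() for x in df1["Reescribe aquí / Rewrite here"]), start=1):
    language = detect(rows)

    if language == 'en':
        print(i, "-", rows, find_all_words(english_dic, rows), "Language of text:", language)

    elif language == 'es':
        print(i, "-", rows, find_all_words(spanish_dic, rows), "Language of text:", language)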

It prints:

1 - el grupo sage dijo que todo esta bien ('Words found:', 2, ' The words are:', ['grupo', 'bien']) Language of text: es
2 - sage group clarifies that the selection of vaccines is optimal ('Words found:', 2, ' The words are:', ['sage', 'selection']) Language of text: en

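What I want is code that, starting from the words in the predefined dictionaries I created, can detect synonyms of those words and return them as valid matches.

For example, it should return "choice" as a valid value, not only "selection".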
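To make the desired behavior concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes a hand-written synonym table; the names `SYNONYMS`, `expand_keywords` and `find_all_words_with_synonyms` are hypothetical and do not appear in the code above, and the synonym entries are illustrative only. Each keyword is expanded into the set of words that should count as a match, and every hit is reported together with the dictionary keyword it stands for. The table could presumably be filled from a thesaurus such as NLTK's WordNet (`wordnet.synsets(word)` and each synset's `lemma_names()`), assuming that corpus is installed.

```python
import re

# Hypothetical synonym table: each dictionary keyword maps to the extra
# words that should also count as a match. The entries are illustrative.
SYNONYMS = {
    "selection": {"choice", "pick"},
    "grupo": {"equipo"},
}


def expand_keywords(keywords, synonyms):
    """Map every accepted word (keyword or synonym) back to its keyword."""
    lookup = {}
    for keyword in keywords:
        lookup[keyword] = keyword
        for synonym in synonyms.get(keyword, ()):
            lookup[synonym] = keyword
    return lookup


def find_all_words_with_synonyms(keywords, sentence, synonyms=SYNONYMS):
    """Return (word found in the text, keyword it stands for) pairs."""
    lookup = expand_keywords(keywords, synonyms)
    tokens = re.findall(r"\w+", sentence.lower())
    found = []
    for token in tokens:
        pair = (token, lookup.get(token))
        # Keep only tokens that are keywords or synonyms, without duplicates.
        if token in lookup and pair not in found:
            found.append(pair)
    return found


print(find_all_words_with_synonyms(
    ["sage", "selection"],
    "Sage group clarifies that the choice of vaccines is optimal",
))
```

Returning the pair rather than only the keyword keeps both pieces of information: which word actually appeared in the text and which dictionary entry it was counted under.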
