How to handle synonyms of search keywords
import re

import pandas as pd
from langdetect import detect
from nltk.tokenize.treebank import TreebankWordDetokenizer

df1 = pd.read_csv('TFG1.csv', encoding='utf8')

def find_all_words(words, sentence):
    # Return the dictionary words that occur as whole words in the sentence.
    all_words = re.findall(r'\w+', sentence)
    words_found = []
    for word in words:
        if word in all_words:
            words_found.append(word)
    return "Words found:", len(words_found), " The words are:", words_found

english_dic = ['sage', 'selection']
spanish_dic = ['grupo', 'bien']

# Note: the return value of this call is discarded, so it does not affect the loop below.
TreebankWordDetokenizer().detokenize(df1["Reescribe aquí / Rewrite here"])

i = 1
for rows in [x.lower() for x in df1["Reescribe aquí / Rewrite here"]]:
    lang = detect(rows)  # detect the language once per row
    if lang == 'en':
        print(i, "-", rows, find_all_words(english_dic, rows), "Language of text:", lang)
    elif lang == 'es':
        print(i, "-", rows, find_all_words(spanish_dic, rows), "Language of text:", lang)
    i += 1
This prints:
1 - el grupo sage dijo que todo esta bien ('Words found:',2,' The words are:',['grupo','bien']) Language of text: es
2 - sage group clarifies that the selection of vaccines is optimal ('Words found:',2,' The words are:',['sage','selection']) Language of text: en
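What I want is code that, starting from the words in the predefined dictionaries I created, can detect synonyms of those words and return them as valid matches.
For example, instead of "selection", it would return "choice" as a valid value.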
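To make the desired behaviour concrete, here is a minimal sketch of one way it could work. It assumes a hand-written synonym table; the name SYNONYMS, its entries, and the helper find_all_words_with_synonyms are all hypothetical and not part of the code above. In practice the table might be filled from a thesaurus such as WordNet (available through nltk.corpus.wordnet), which is not shown here.

```python
import re

# Hypothetical synonym table (made up for illustration): each dictionary
# word maps to the extra words that should also count as a match.
SYNONYMS = {
    "selection": {"choice", "pick"},
    "sage": set(),
}

def find_all_words_with_synonyms(words, sentence, synonyms=SYNONYMS):
    """Return the tokens of `sentence` that are a dictionary word or one of its synonyms."""
    tokens = re.findall(r"\w+", sentence.lower())
    # Build the set of accepted forms: every dictionary word plus its synonyms.
    accepted = set()
    for word in words:
        accepted.add(word)
        accepted |= synonyms.get(word, set())
    # Keep matches in sentence order, without duplicates.
    found = []
    for tok in tokens:
        if tok in accepted and tok not in found:
            found.append(tok)
    return found

english_dic = ["sage", "selection"]
sentence = "sage group clarifies that the choice of vaccines is optimal"
print(find_all_words_with_synonyms(english_dic, sentence))  # ['sage', 'choice']
```

With this approach "choice" is reported even though only "selection" is in english_dic. The quality of the result depends entirely on the synonym table, so a separate table per language would probably be needed for the Spanish dictionary.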