Need help building OR and AND patterns with spaCy in Python


Suppose I have the following text. Input sentence: Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, are written in a programming language which the computer can understand and use to perform a task or solve a problem. Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it.

I need to find out whether any of the given phrases match the text.

or_phrases = ["efficient", "design"]
Expected output: yes, because the above input sentence has the word "efficient".

and_phrases = ["love", "live"]
Expected output: None, because the above input sentence doesn't contain both "love" and "live" (in fact, neither appears anywhere in it). The order of the words doesn't matter. Expressed as a regular expression, with one lookahead per required word:
re.match(r'(?=.*love)(?=.*live)', text)
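For reference, the two regex checks can be sketched in plain Python as follows. The helper names `contains_any` and `contains_all` are hypothetical, and the word-boundary and case-insensitive matching are assumptions that go beyond what the question states.

```python
import re

def contains_any(text, words):
    """Return True if at least one word occurs as a whole word (OR)."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def contains_all(text, words):
    """Return True if every word occurs somewhere, in any order (AND)."""
    # One lookahead per word; each must succeed from the start of the string.
    pattern = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in words)
    return re.match(pattern, text, flags=re.IGNORECASE | re.DOTALL) is not None

sentence = "Computer programming is the process of writing instructions."
print(contains_any(sentence, ["process", "random"]))   # True
print(contains_all(sentence, ["process", "writing"]))  # True
print(contains_all(sentence, ["love", "live"]))        # False
```

Chaining lookaheads gives AND semantics, while joining alternatives with `|` gives OR semantics.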

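I'm looking to put this rule into spaCy's PhraseMatcher or token Matcher.

Is there a way to express this kind of pattern in a spaCy pattern matcher?

or_phrases should give me the sentences that contain either one of the words.

and_phrases should give me the sentences that contain both words.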

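Solution

With spaCy, an OR condition is easy to express with the token pattern {"TEXT": {"IN": ["word1", "word2"]}}, which matches any token whose text is one of the listed words. For your example, it would look like this: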

import spacy
from spacy.matcher import Matcher

# Assumes the small English model is installed; its parser provides doc.sents
nlp = spacy.load("en_core_web_sm")

text = """Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, are written in a programming language which the computer can understand and use to perform a task or solve a problem. Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it."""

doc = nlp(text)
matcher = Matcher(nlp.vocab)
or_pattern = [[
    {"TEXT": {"IN": ["process", "random"]}}  # the list of "OR" words (case-sensitive)
]]
matcher.add("or_phrases", or_pattern)

# Print every sentence that contains at least one of the OR words
for sent in doc.sents:
    matches = matcher(sent)
    if matches:
        print(sent)
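
Note that "TEXT" compares the exact token text, so "Process" at the start of a sentence would not match "process". A case-insensitive variant can be sketched with the "LOWER" attribute. This assumes spaCy v3; the helper name `sentences_with_any` is hypothetical, and a blank English pipeline with the rule-based sentencizer stands in for a trained model so that nothing has to be downloaded.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline plus the sentencizer is enough for sentence splitting.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def sentences_with_any(text, words):
    """Return the sentences containing at least one of the words (hypothetical helper)."""
    matcher = Matcher(nlp.vocab)
    # LOWER compares against the lowercased token text, so case is ignored.
    matcher.add("or_phrases", [[{"LOWER": {"IN": [w.lower() for w in words]}}]])
    doc = nlp(text)
    # An empty match list is falsy, so only sentences with a hit are kept.
    return [sent.text for sent in doc.sents if matcher(sent)]

example = "Programming is a Process. Code is written in a language."
print(sentences_with_any(example, ["process", "random"]))
```

Each pattern here describes a single token, so multi-word phrases would need one dict per token or a PhraseMatcher instead.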

I haven't yet come up with an equally simple solution for AND, but I will keep trying.
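
One possible approach to the AND case, offered as a sketch rather than a definitive answer: register each required word under its own match key, then keep a sentence only if every key produced at least one match in it. This assumes spaCy v3 and single-token words; the helper name `sentences_with_all` is hypothetical, and a blank pipeline with the sentencizer is used in place of a trained model.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline plus the sentencizer is enough for sentence splitting.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def sentences_with_all(text, words):
    """Return the sentences containing every word, in any order (hypothetical helper)."""
    required = {w.lower() for w in words}
    matcher = Matcher(nlp.vocab)
    # One single-token pattern per word, each registered under its own key.
    for word in required:
        matcher.add(word, [[{"LOWER": word}]])
    doc = nlp(text)
    hits = []
    for sent in doc.sents:
        # Map each match ID back to its string key and collect the keys seen.
        found = {nlp.vocab.strings[match_id] for match_id, start, end in matcher(sent)}
        if required <= found:
            hits.append(sent.text)
    return hits

example = "I love to live here. I love cats."
print(sentences_with_all(example, ["love", "live"]))
```

Because the check is a set comparison per sentence, word order does not matter, which mirrors the chained-lookahead regex in the question.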
