How do I get the infinitive form of a verb using Stanza?


How can I use Stanza to find the infinitive verbs in a sentence?

Example:

import stanza

doc = "I need you to find the verbes in this sentence"
en_nlp = stanza.Pipeline('en',processors='tokenize,lemma,mwt,pos,depparse',verbose=False,use_gpu=False)
processed = en_nlp(doc)

# print one line per word, resolving the head id to the head's text
for sent in processed.sentences:
    for word in sent.words:
        head = sent.words[word.head-1].text if word.head > 0 else 'root'
        print(f"id: {word.id}\t word: {word.text}\t POS: {word.pos}\t head id: {word.head}\t head: {head} \t deprel: {word.deprel}")

Output:

id: 1    word: I     POS: PRON   head id: 2  head: need      deprel: nsubj
id: 2    word: need  POS: VERB   head id: 0  head: root      deprel: root
id: 3    word: you   POS: PRON   head id: 2  head: need      deprel: obj
id: 4    word: to    POS: PART   head id: 5  head: find      deprel: mark
id: 5    word: find  POS: VERB   head id: 2  head: need      deprel: xcomp
id: 6    word: the   POS: DET    head id: 7  head: verbes    deprel: det
id: 7    word: verbes    POS: NOUN   head id: 5  head: find      deprel: obj
id: 8    word: in    POS: ADP    head id: 10     head: sentence      deprel: case
id: 9    word: this  POS: DET    head id: 10     head: sentence      deprel: det
id: 10   word: sentence  POS: NOUN   head id: 5  head: find      deprel: obl

However, on this line:

id: 5    word: find  POS: VERB   head id: 2  head: need      deprel: xcomp

I need it to tell me that this is an infinitive verb.

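Solution

I had the same problem, did not want to hack into the tokenizer, and ended up tweaking Stanza's sentence.words.

word.feats indicates the infinitive verb form, as for id 7 here (VerbForm=Inf); I have not tested how reliable it is.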

test_resp = "He was a little scared to knock on the door"
# nlp is assumed here to be an English stanza.Pipeline, e.g. en_nlp above
res = nlp(test_resp)
res.sentences[0].words[4:8]

which gives:

[{
   "id": 5,"text": "scared","lemma": "scared","upos": "ADJ","xpos": "JJ","feats": "Degree=Pos","head": 0,"deprel": "root","misc": "start_char=16|end_char=22"
 },{
   "id": 6,"text": "to","lemma": "to","upos": "PART","xpos": "TO","head": 7,"deprel": "mark","misc": "start_char=23|end_char=25"
 },{
   "id": 7,"text": "knock","lemma": "knock","upos": "VERB","xpos": "VB","feats": "VerbForm=Inf","head": 5,"deprel": "advcl","misc": "start_char=26|end_char=31"
 },{
   "id": 8,"text": "on","lemma": "on","upos": "ADP","xpos": "IN","head": 10,"deprel": "case","misc": "start_char=32|end_char=34"
 }]
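
The feats check can be wrapped in a small helper. This is only a sketch: parse_feats and is_infinitive are names I made up, and the SimpleNamespace objects are stand-ins that copy the upos and feats values from the output above so the snippet runs without downloading a model. Accepting AUX as well as VERB is my own choice, meant to cover "to be" and "to have". As far as I understand the UD English annotation, VerbForm=Inf marks base forms in general, so it may also fire on bare infinitives after modals; combine it with a check for a preceding "to" if you only want to-infinitives.

```python
from types import SimpleNamespace

def parse_feats(feats):
    """Turn a feats string such as 'VerbForm=Inf|Mood=Ind' into a dict."""
    if not feats:
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def is_infinitive(word):
    """True when the word is tagged VERB or AUX and carries VerbForm=Inf."""
    feats = parse_feats(getattr(word, "feats", None))
    return word.upos in ("VERB", "AUX") and feats.get("VerbForm") == "Inf"

# Stand-ins mirroring the fields printed above; with Stanza you would pass
# the real Word objects from res.sentences[0].words instead.
words = [
    SimpleNamespace(text="scared", upos="ADJ", feats="Degree=Pos"),
    SimpleNamespace(text="to", upos="PART", feats=None),
    SimpleNamespace(text="knock", upos="VERB", feats="VerbForm=Inf"),
    SimpleNamespace(text="on", upos="ADP", feats=None),
]

infinitives = [w.text for w in words if is_infinitive(w)]
print(infinitives)  # ['knock']
```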

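For my purposes it was more useful to treat the string "to verb" as a single lexical item: update word.text to "to_verb" and make the verb's character span match. This leaves the verb's word.lemma unchanged and its word.upos as VERB, but the head and word-position indexes of the verb and of all following words have to be decremented to account for the removed "to".

The deepcopy protects the original parse so that it can be shown for comparison; it is best avoided if possible.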

import re
import sys
from copy import deepcopy

import stanza

def patch_inf_verb(processed):
    """hack the parse to treat 'to VERB' as one word"""
 
    # modified sentence
    results = deepcopy(processed)
    
    # regex that captures the text and numerals in word.misc,
    # e.g., 'start_char=11|end_char=13'
    misc_vals_re = re.compile(r"(start_char=)(\d+)(\|end_char=)(?P<end>\d+)")

    for result in results.sentences:
        for wdx,word in enumerate(result.words):
            
            # peek back for "to"
            if wdx > 0 and word.pos == "VERB":
                one_back =  result.words[wdx - 1]
                if one_back.text.lower() == "to" and one_back.head == word.id:
                    
                    word.text = "to_" + word.text
                    # word.upos = "VERB_INF"  # update upos tag or leave as is

                    # parse verb's character span string
                    match = misc_vals_re.match(word.misc)
                    assert match is not None
                    vals = match.groups()

                    # nudge word.misc start_char back to span one-back "to"
                    # (the -3 assumes "to" plus exactly one space before the verb)
                    word.misc = f"{vals[0]}{int(vals[1])-3}{vals[2]}{int(vals[3])}"
                    assert misc_vals_re.match(word.misc) is not None

                    # decrement the indexes for verb position and beyond;
                    # the character spans don't change
                    for tdx in range(len(result.words)):
                        if result.words[tdx].id > wdx:
                            result.words[tdx].id -= 1
                        if result.words[tdx].head > wdx:
                            result.words[tdx].head -= 1
                    
                    # clobber the "to" after
                    del result.words[wdx - 1]
    return results

def format_results(results):
    """results in table format"""
    results_str = '\n'.join(
        [
            "\t".join(
                    [
                        f"{key:5s}: {val}" 
                        for key,val in word.to_dict().items() 
                        if key not in ["lemma","feats"]
                    ]
                )
                for sent in results.sentences 
                for word in sent.words
            ]
        )
    return results_str
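
The hard-coded offset of 3 in patch_inf_verb only holds when "to" is followed by exactly one space. A possible alternative, sketched under the assumption that word.misc has the 'start_char=N|end_char=M' shape shown in the output here, is to take the start from the "to" token and the end from the verb. The names char_span and merged_misc are hypothetical helpers, not part of Stanza.

```python
import re

# Matches the character span inside a misc string such as
# 'start_char=11|end_char=13'.
MISC_RE = re.compile(r"start_char=(\d+)\|end_char=(\d+)")

def char_span(misc):
    """Return (start, end) parsed from a misc string."""
    match = MISC_RE.search(misc or "")
    if match is None:
        raise ValueError(f"no character span in misc: {misc!r}")
    return int(match.group(1)), int(match.group(2))

def merged_misc(to_misc, verb_misc):
    """Build a misc string covering both the 'to' token and the verb."""
    to_start, _ = char_span(to_misc)
    _, verb_end = char_span(verb_misc)
    return f"start_char={to_start}|end_char={verb_end}"

merged = merged_misc("start_char=11|end_char=13", "start_char=14|end_char=18")
print(merged)  # start_char=11|end_char=18
```

Inside patch_inf_verb this would be used as word.misc = merged_misc(one_back.misc, word.misc) in place of the subtraction.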

The OP's example:

print("python",sys.version)
print("stanza version:",stanza.__version__)

doc = "I need you to find the verbes in this sentence"
en_nlp = stanza.Pipeline('en',processors='tokenize,lemma,mwt,pos,depparse',verbose=False,use_gpu=False)
processed = en_nlp(doc)

print('OP stanza before\n',format_results(processed))

patched_to_verb = patch_inf_verb(processed)
print("after patch_inf_verb\n",format_results(patched_to_verb))

python 3.7.7 (default,Mar 26 2020,15:48:22) 
[GCC 7.3.0]
stanza version: 1.1.1
OP stanza before
 id   : 1   text : I    upos : PRON xpos : PRP  head : 2    deprel: nsubj   misc : start_char=0|end_char=1
id   : 2    text : need upos : VERB xpos : VBP  head : 0    deprel: root    misc : start_char=2|end_char=6
id   : 3    text : you  upos : PRON xpos : PRP  head : 2    deprel: obj misc : start_char=7|end_char=10
id   : 4    text : to   upos : PART xpos : TO   head : 5    deprel: mark    misc : start_char=11|end_char=13
id   : 5    text : find upos : VERB xpos : VB   head : 2    deprel: xcomp   misc : start_char=14|end_char=18
id   : 6    text : the  upos : DET  xpos : DT   head : 7    deprel: det misc : start_char=19|end_char=22
id   : 7    text : verbes   upos : NOUN xpos : NNS  head : 5    deprel: obj misc : start_char=23|end_char=29
id   : 8    text : in   upos : ADP  xpos : IN   head : 10   deprel: case    misc : start_char=30|end_char=32
id   : 9    text : this upos : DET  xpos : DT   head : 10   deprel: det misc : start_char=33|end_char=37
id   : 10   text : sentence upos : NOUN xpos : NN   head : 5    deprel: obl misc : start_char=38|end_char=46
after patch_inf_verb
 id   : 1   text : I    upos : PRON xpos : PRP  head : 2    deprel: nsubj   misc : start_char=0|end_char=1
id   : 2    text : need upos : VERB xpos : VBP  head : 0    deprel: root    misc : start_char=2|end_char=6
id   : 3    text : you  upos : PRON xpos : PRP  head : 2    deprel: obj misc : start_char=7|end_char=10
id   : 4    text : to_find  upos : VERB xpos : VB   head : 2    deprel: xcomp   misc : start_char=11|end_char=18
id   : 5    text : the  upos : DET  xpos : DT   head : 6    deprel: det misc : start_char=19|end_char=22
id   : 6    text : verbes   upos : NOUN xpos : NNS  head : 4    deprel: obj misc : start_char=23|end_char=29
id   : 7    text : in   upos : ADP  xpos : IN   head : 9    deprel: case    misc : start_char=30|end_char=32
id   : 8    text : this upos : DET  xpos : DT   head : 9    deprel: det misc : start_char=33|end_char=37
id   : 9    text : sentence upos : NOUN xpos : NN   head : 4    deprel: obl misc : start_char=38|end_char=46
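
Since the patch renumbers ids and heads by hand, a quick consistency check on the result may be worth having. The sketch below is my own addition (check_parse is a hypothetical name) and works on plain dicts with "id" and "head" keys, such as those returned by word.to_dict(); the sample data is copied from the "after patch_inf_verb" table above.

```python
def check_parse(words):
    """Return a list of problems found in word dicts with 'id' and 'head'."""
    problems = []

    # ids should run 1..n without gaps after the "to" token is removed
    ids = [w["id"] for w in words]
    if ids != list(range(1, len(words) + 1)):
        problems.append(f"ids are not consecutive from 1: {ids}")

    # a dependency tree has exactly one root (head == 0)
    roots = [w["id"] for w in words if w["head"] == 0]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {roots}")

    for w in words:
        if not 0 <= w["head"] <= len(words):
            problems.append(f"word {w['id']} has out-of-range head {w['head']}")
        if w["head"] == w["id"]:
            problems.append(f"word {w['id']} is its own head")
    return problems

# (id, head) pairs from the patched output above
pairs = [(1, 2), (2, 0), (3, 2), (4, 2), (5, 6), (6, 4), (7, 9), (8, 9), (9, 4)]
patched = [{"id": i, "head": h} for i, h in pairs]
print(check_parse(patched))  # []
```

With a real parse, something like check_parse([w.to_dict() for w in sent.words]) for each sentence in patched_to_verb.sentences should work the same way.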
