How to get the infinitive form of a verb using Stanza?
How can I use Stanza to find the infinitive verbs in a sentence?
Example:
import stanza
doc = "I need you to find the verbes in this sentence"
en_nlp = stanza.Pipeline('en', processors='tokenize,lemma,mwt,pos,depparse', verbose=False, use_gpu=False)
processed = en_nlp(doc)
print(*[f"id: {word.id}\t word: {word.text}\t POS: {word.pos}\t head id: {word.head}\t head: {sent.words[word.head-1].text if word.head > 0 else 'root'} \t deprel: {word.deprel}" for sent in processed.sentences for word in sent.words], sep='\n')
Output:
id: 1 word: I POS: PRON head id: 2 head: need deprel: nsubj
id: 2 word: need POS: VERB head id: 0 head: root deprel: root
id: 3 word: you POS: PRON head id: 2 head: need deprel: obj
id: 4 word: to POS: PART head id: 5 head: find deprel: mark
id: 5 word: find POS: VERB head id: 2 head: need deprel: xcomp
id: 6 word: the POS: DET head id: 7 head: verbes deprel: det
id: 7 word: verbes POS: NOUN head id: 5 head: find deprel: obj
id: 8 word: in POS: ADP head id: 10 head: sentence deprel: case
id: 9 word: this POS: DET head id: 10 head: sentence deprel: det
id: 10 word: sentence POS: NOUN head id: 5 head: find deprel: obl
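However, for this line:
id: 5 word: find POS: VERB head id: 2 head: need deprel: xcomp
I need to be able to tell that this is an infinitive verb.
Solution
I had the same problem and, rather than hack the tokenizer, ended up adjusting Stanza's sentence.words.
word.feats marks the infinitive verb form with VerbForm=Inf, as for id 7 below. I have not tested how reliable it is.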
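# nlp is a Stanza English pipeline, built the same way as en_nlp in the question
nlp = stanza.Pipeline('en', processors='tokenize,lemma,mwt,pos,depparse', verbose=False, use_gpu=False)
test_resp = "He was a little scared to knock on the door"
res = nlp(test_resp)
res.sentences[0].words[4:8]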
which gives:
[{
"id": 5,"text": "scared","lemma": "scared","upos": "ADJ","xpos": "JJ","feats": "Degree=Pos","head": 0,"deprel": "root","misc": "start_char=16|end_char=22"
},{
"id": 6,"text": "to","lemma": "to","upos": "PART","xpos": "TO","head": 7,"deprel": "mark","misc": "start_char=23|end_char=25"
},{
"id": 7,"text": "knock","lemma": "knock","upos": "VERB","xpos": "VB","feats": "VerbForm=Inf","head": 5,"deprel": "advcl","misc": "start_char=26|end_char=31"
},{
"id": 8,"text": "on","lemma": "on","upos": "ADP","xpos": "IN","head": 10,"deprel": "case","misc": "start_char=32|end_char=34"
}]
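The feats check described above can be sketched as a small helper. This is a minimal sketch, not part of Stanza: parse_feats and infinitives are hypothetical names, and the SimpleNamespace objects are stand-ins that copy a few fields from the output above so the example runs without loading a model. Note that VerbForm=Inf may also appear on bare infinitives that have no "to", so a check for a preceding "to" with deprel mark may still be needed.

```python
from types import SimpleNamespace


def parse_feats(feats):
    """Turn a feature string such as 'Mood=Ind|VerbForm=Fin' into a dict."""
    if not feats:
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def infinitives(words):
    """Return the words whose feats contain VerbForm=Inf (hypothetical helper)."""
    return [
        w for w in words
        if parse_feats(getattr(w, "feats", None)).get("VerbForm") == "Inf"
    ]


# Stand-ins for Stanza words, with values copied from the output above
sample = [
    SimpleNamespace(id=5, text="scared", upos="ADJ", feats="Degree=Pos"),
    SimpleNamespace(id=6, text="to", upos="PART", feats=None),
    SimpleNamespace(id=7, text="knock", upos="VERB", feats="VerbForm=Inf"),
    SimpleNamespace(id=8, text="on", upos="ADP", feats=None),
]
print([w.text for w in infinitives(sample)])  # ['knock']
```

With a real parse, passing res.sentences[0].words instead of sample should work the same way, assuming the words expose feats as a string or None.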
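For my purposes, it was more useful to treat the string "to verb" as a single lexical item: update word.text to "to_verb" and widen the verb's character span to match. This leaves the verb's word.lemma and word.upos (VERB) unchanged, but the head and word position indexes of the verb and all subsequent words have to be decremented to account for removing the "to".
The deepcopy protects the original parse for illustration here; it is probably better to avoid it if possible.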
import re
import sys
from copy import deepcopy

import stanza


def patch_inf_verb(processed):
    """Hack the parse to treat 'to VERB' as one word."""
    # work on a copy so the original parse is left untouched
    results = deepcopy(processed)
    # regex to capture the text and numerals in word.misc,
    # e.g., 'start_char=11|end_char=13'
    misc_vals_re = re.compile(r"(start_char=)(\d+)(\|end_char=)(?P<end>\d+)")
    for result in results.sentences:
        for wdx, word in enumerate(result.words):
            # peek back for "to"
            if wdx > 0 and word.pos == "VERB":
                one_back = result.words[wdx - 1]
                if one_back.text.lower() == "to" and one_back.head == word.id:
                    word.text = "to_" + word.text
                    # word.upos = "VERB_INF"  # update upos tag or leave as is
                    # parse verb's character span string
                    match = misc_vals_re.match(word.misc)
                    assert match is not None
                    vals = match.groups()
                    # nudge word.misc start_char back to span the one-back "to"
                    # (the -3 assumes "to" followed by a single space)
                    word.misc = f"{vals[0]}{int(vals[1]) - 3}{vals[2]}{int(vals[3])}"
                    assert misc_vals_re.match(word.misc) is not None
                    # decrement the indexes for verb position and beyond;
                    # the character spans don't change
                    for tdx in range(len(result.words)):
                        if result.words[tdx].id > wdx:
                            result.words[tdx].id -= 1
                        if result.words[tdx].head > wdx:
                            result.words[tdx].head -= 1
                    # clobber the "to" after
                    del result.words[wdx - 1]
    return results


def format_results(results):
    """Results in table format."""
    results_str = '\n'.join(
        [
            "\t".join(
                [
                    f"{key:5s}: {val}"
                    for key, val in word.to_dict().items()
                    if key not in ["lemma", "feats"]
                ]
            )
            for sent in results.sentences
            for word in sent.words
        ]
    )
    return results_str
On the OP's example:
print("python",sys.version)
print("stanza version:",stanza.__version__)
doc = "I need you to find the verbes in this sentence"
en_nlp = stanza.Pipeline('en',processors='tokenize,lemma,mwt,pos,depparse',verbose=False,use_gpu=False)
processed = en_nlp(doc)
print('OP stanza before\n',format_results(processed))
patched_to_verb = patch_inf_verb(processed)
print("after patch_inf_verb\n",format_results(patched_to_verb))
python 3.7.7 (default,Mar 26 2020,15:48:22)
[GCC 7.3.0]
stanza version: 1.1.1
OP stanza before
id : 1 text : I upos : PRON xpos : PRP head : 2 deprel: nsubj misc : start_char=0|end_char=1
id : 2 text : need upos : VERB xpos : VBP head : 0 deprel: root misc : start_char=2|end_char=6
id : 3 text : you upos : PRON xpos : PRP head : 2 deprel: obj misc : start_char=7|end_char=10
id : 4 text : to upos : PART xpos : TO head : 5 deprel: mark misc : start_char=11|end_char=13
id : 5 text : find upos : VERB xpos : VB head : 2 deprel: xcomp misc : start_char=14|end_char=18
id : 6 text : the upos : DET xpos : DT head : 7 deprel: det misc : start_char=19|end_char=22
id : 7 text : verbes upos : NOUN xpos : NNS head : 5 deprel: obj misc : start_char=23|end_char=29
id : 8 text : in upos : ADP xpos : IN head : 10 deprel: case misc : start_char=30|end_char=32
id : 9 text : this upos : DET xpos : DT head : 10 deprel: det misc : start_char=33|end_char=37
id : 10 text : sentence upos : NOUN xpos : NN head : 5 deprel: obl misc : start_char=38|end_char=46
after patch_inf_verb
id : 1 text : I upos : PRON xpos : PRP head : 2 deprel: nsubj misc : start_char=0|end_char=1
id : 2 text : need upos : VERB xpos : VBP head : 0 deprel: root misc : start_char=2|end_char=6
id : 3 text : you upos : PRON xpos : PRP head : 2 deprel: obj misc : start_char=7|end_char=10
id : 4 text : to_find upos : VERB xpos : VB head : 2 deprel: xcomp misc : start_char=11|end_char=18
id : 5 text : the upos : DET xpos : DT head : 6 deprel: det misc : start_char=19|end_char=22
id : 6 text : verbes upos : NOUN xpos : NNS head : 4 deprel: obj misc : start_char=23|end_char=29
id : 7 text : in upos : ADP xpos : IN head : 9 deprel: case misc : start_char=30|end_char=32
id : 8 text : this upos : DET xpos : DT head : 9 deprel: det misc : start_char=33|end_char=37
id : 9 text : sentence upos : NOUN xpos : NN head : 4 deprel: obl misc : start_char=38|end_char=46
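Because the patch rewrites ids and heads by hand, a quick consistency check on the result may be worthwhile. The following is a minimal sketch, with check_ids_and_heads as a hypothetical helper rather than anything provided by Stanza; it works on plain (id, head) pairs, here copied from the "after patch_inf_verb" table above, so it runs without a model.

```python
def check_ids_and_heads(rows):
    """Return a list of problems found in the (id, head) pairs of one sentence."""
    problems = []
    n = len(rows)
    for position, (word_id, head) in enumerate(rows, start=1):
        # ids should run 1..n with no gaps after the "to" is removed
        if word_id != position:
            problems.append(f"word at position {position} has id {word_id}")
        # heads must point at an existing word, or 0 for the root
        if not 0 <= head <= n:
            problems.append(f"word {word_id} has out-of-range head {head}")
        if head == word_id:
            problems.append(f"word {word_id} is its own head")
    if sum(1 for _, head in rows if head == 0) != 1:
        problems.append("expected exactly one root (head == 0)")
    return problems


# (id, head) pairs copied from the patched table above
patched = [(1, 2), (2, 0), (3, 2), (4, 2), (5, 6), (6, 4), (7, 9), (8, 9), (9, 4)]
print(check_ids_and_heads(patched))  # []
```

On a patched Stanza sentence, the pairs could be built with [(w.id, w.head) for w in sent.words].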