How to fix POS tagging using a regular expression tagger
I want to write POS rules with a regexp tagger to fix the tagging shown below. My code:
import nltk
from nltk import word_tokenize, UnigramTagger
from nltk.corpus import treebank

# download missing packages
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('treebank')

sent1 = "My aunt's can opener can open a drum"
tokens1 = word_tokenize(sent1)   # split the sentence into tokens
tag1 = nltk.pos_tag(tokens1)     # tag with NLTK's default tagger
print(tokens1)
print(tag1)
>> output : ['My', 'aunt', "'s", 'can', 'opener', 'can', 'open', 'a', 'drum']
[('My', 'PRP$'), ('aunt', 'NN'), ("'s", 'POS'), ('can', 'MD'), ('opener', 'VB'), ('can', 'MD'), ('open', 'VB'), ('a', 'DT'), ('drum', 'NN')]
# fallback rules; RegexpTagger uses the first pattern that matches a token
patterns = [(r'\w*er\b', 'NN'), (r'.*', 'NN'), (r'(?=<\'s).*', 'NN')]
default_tagger = nltk.RegexpTagger(patterns)
train_sentences = treebank.tagged_sents()
# n-gram taggers, each backing off to the next simpler one
tagger1 = UnigramTagger(train_sentences, backoff=default_tagger)
tagger1_2 = nltk.BigramTagger(train_sentences, backoff=tagger1)
tagger1_3 = nltk.TrigramTagger(train_sentences, backoff=tagger1_2)
tagged1_true = tagger1_3.tag(tokens1)
print(tagged1_true)
>> output : [('My','NN')]
I need to fix the first "can" so that it is tagged "NN".
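One possible direction, offered as a sketch rather than a definitive answer. `RegexpTagger` tests each pattern against a single token string, so a rule cannot look at the previous token. The third pattern as written, `(?=<\'s).*`, is a lookahead for the literal text `<'s`. Even the lookbehind it was probably meant to be, `(?<=\'s).*`, would not match, because the token "can" on its own is not preceded by "'s". A workaround is to correct the tags after tagging with a small context rule. In the code below, `retag_after_possessive` is a hypothetical helper (not an NLTK function), and the `tagged` list is typed in by hand to resemble `nltk.pos_tag` output for the sentence (an assumption), so the example runs without NLTK.

```python
import re

# The third pattern as written, and the lookbehind it was probably meant to be.
AS_WRITTEN = r"(?=<\'s).*"   # lookahead for the literal text "<'s"
LOOKBEHIND = r"(?<=\'s).*"   # "preceded by 's"

# A regexp tagger sees one token at a time, so neither pattern
# can tell that "can" follows "'s".
print(re.match(AS_WRITTEN, "can"))   # None
print(re.match(LOOKBEHIND, "can"))   # None


def retag_after_possessive(tagged, wrong_tags=("MD", "VB"), new_tag="NN"):
    """Return a copy of `tagged` where a token that directly follows a
    possessive marker (tag 'POS') and carries one of `wrong_tags` is
    retagged as `new_tag`. Hypothetical helper, not part of NLTK."""
    fixed = []
    prev_tag = None
    for word, tag in tagged:
        if prev_tag == "POS" and tag in wrong_tags:
            fixed.append((word, new_tag))
        else:
            fixed.append((word, tag))
        prev_tag = tag  # remember the original tag of this token
    return fixed


# Hand-written input resembling pos_tag output (an assumption).
tagged = [("My", "PRP$"), ("aunt", "NN"), ("'s", "POS"), ("can", "MD"),
          ("opener", "VB"), ("can", "MD"), ("open", "VB"),
          ("a", "DT"), ("drum", "NN")]

fixed = retag_after_possessive(tagged)
print(fixed[3])   # ('can', 'NN')  first "can", right after "'s"
print(fixed[5])   # ('can', 'MD')  second "can" is left alone
```

With NLTK installed, the same function could be applied to `nltk.pos_tag(tokens1)` or `tagger1_3.tag(tokens1)`, since both return a list of `(word, tag)` tuples. Note that "opener" keeps whatever tag it had; it would need its own rule.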