
How do I fix this code and build my own POS tagger? (Python)


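My program needs to read a file of sentences and produce output like this:

Input: Ixé Maria. Output: Ixé\PRON Maria\N-PR.

So far I have written the following, but the output file comes out empty. (Any advice is welcome.)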

infile = open('corpus_test.txt','r',encoding='utf-8').read()
outfile = open('tag_test.txt','w',encoding='utf-8')

dicionario = {'mimbira': 'N','anama-itá': 'N-PL','Maria': 'N-PR','sumuara-kunhã': 'N-FEM','sumuara-kunhã-itá': 'N-FEM-PL','sapukaia-apigaua': 'N-MASC','sapukaia-apigaua-itá': 'N-MASC-PL','nhaã': 'DEM','nhaã-itá': 'DEM-PL','ne': 'POS','mukuĩ': 'NUM','muíri': 'QUANT','iepé': 'INDF','pirasua': 'A1','pusé': 'A2','ixé': 'PRON1','se': 'PRON2','. ;': 'PUNCT'
             }

np_words = dicionario.keys()
np_tags = dicionario.values()

for line in infile.splitlines():
   list_of_words = line.split()
   if np_words in list_of_words:
       tag_word = list_of_words.index(np_words)+1
       word_tagged = list_of_words.insert(tag_word,f'\{np_tags}') 
       word_tagged = " ".join(word_tagged)
       print(word_tagged,file=outfile)

outfile.close()

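Solution

Starting with something simple in NLP makes it easier to understand, and later appreciate, more advanced systems.

The following produces the output you are looking for: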

# Use 'with' so that the file is closed automatically when the block ends.
with open('corpus_test.txt', 'r', encoding='utf-8') as f:
    # readlines() returns a list in which each item is one line of the file,
    # e.g. infile[0] is line 1. (read().splitlines() also works; it just drops
    # the trailing newline characters.)
    infile = f.readlines()

dicionario = {
    'Maria': 'N-PR',
    'ixé': 'PRON1',
}

# Make a list to hold the new lines
outlines = []

for line in infile:
    list_of_words = line.split()
    
    new_line = ''
    # 'if np_words in list_of_words' tests whether the whole keys view is a
    # single element of the list, which is never true, so check word by word.
    for word in list_of_words:
        # TODO: dictionary lookups are case-sensitive, so 'ixé' does not match 'Ixé'.
        if word in dicionario:
            new_line += word + '\\' + dicionario[word] + ' '
        else:
            new_line += word + ' '

    # Append the completed line to the list and add a newline.
    outlines.append(new_line.strip() + '\n')

with open('tag_test.txt', 'w', encoding='utf-8') as f:
    f.writelines(outlines)
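
The TODO above notes that the lookup is case-sensitive. It also misses any word with punctuation attached, so 'Maria.' would not match 'Maria'. One possible way to handle both is sketched below, as an illustration rather than a definitive fix. The names LEXICON, TOKEN_RE and tag_line are hypothetical, the lexicon is a trimmed-down copy with lowercase keys, and the set of punctuation marks is an assumption.

```python
import re

# Hypothetical, trimmed-down lexicon; keys are stored in lowercase so that
# lookups can ignore the capitalisation used in the corpus.
LEXICON = {'maria': 'N-PR', 'ixé': 'PRON1'}

# Split a token into a word part and any trailing punctuation marks.
# The punctuation set here is an assumption; extend it as the corpus requires.
TOKEN_RE = re.compile(r'^(.*?)([.,;:!?]*)$')

def tag_line(line, lexicon=LEXICON):
    """Return the line with a \\TAG suffix after every word found in the lexicon."""
    tagged = []
    for token in line.split():
        word, punct = TOKEN_RE.match(token).groups()
        tag = lexicon.get(word.lower())
        # Keep the original spelling of the word; unknown tokens pass through unchanged.
        tagged.append(f'{word}\\{tag}{punct}' if tag else token)
    return ' '.join(tagged)

print(tag_line('Ixé Maria.'))  # prints: Ixé\PRON1 Maria\N-PR.
```

To use it in the loop above, replace the inner for loop with outlines.append(tag_line(line) + '\n'). Lowercasing only the lookup key, not the word itself, keeps the original capitalisation in the output file.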
