How to save the words that appear no more than 3 times in a text? Reading and writing files
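I'm currently working with a text file called "dracula.txt", and I have to do the following in Python:
Save the words that occur no more than 3 times, in descending order, to a file named less_common_words.txt. Each word and its count should be saved on a separate line.
I'd appreciate any help! I've been working on this for far too long.
I have tokenized my file and counted the words. Here is my code so far: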
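To make the requirement concrete, here is a minimal sketch on a small in-memory list of words instead of the whole book. It assumes "descending order" means descending by count, and it breaks ties alphabetically; both the function name `less_common_lines` and the tie-breaking rule are hypothetical choices for illustration.

```python
from collections import Counter


def less_common_lines(words, max_count=3):
    """Return one 'word count' line per word occurring at most max_count times.

    Lines are sorted by count in descending order; words with the same
    count are listed alphabetically (an assumed tie-breaking rule).
    """
    counts = Counter(words)
    rare = [(word, n) for word, n in counts.items() if n <= max_count]
    rare.sort(key=lambda pair: (-pair[1], pair[0]))
    return [f"{word} {n}" for word, n in rare]


# Made-up sample: "the" occurs 5 times, so it is left out
words = ["the"] * 5 + ["count"] * 3 + ["castle", "castle", "blood"]
for line in less_common_lines(words):
    print(line)
# prints: count 3, castle 2, blood 1 (one per line)
```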
```python
import string

# Read the whole book; 'utf-8-sig' also strips a leading byte-order mark
with open("C:/Users/17733/Downloads/dracula.txt", 'r', encoding='utf-8-sig') as file:
    data = file.read()

# Split the text into lines, then each non-empty line into space-separated tokens
data_list = data.split('\n')
new_list = []
for i in data_list:
    if i != '':
        ans_here = i.split(' ')
        new_list.extend(ans_here)

# string.punctuation already includes '"', '[', '.', '-' and '_',
# so they do not need to be appended separately
puncs = list(string.punctuation)

new_2 = []
for i in new_list:
    # Replace every punctuation character in the token, then keep the token exactly once
    for p in puncs:
        i = i.replace(p, ' ')
    new_2.append(i)

# Collapse double spaces, trim, and lower-case each token
new_2 = [i.replace('  ', ' ').strip().lower() for i in new_2]
```
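The punctuation step can also be done in a single pass with `str.translate`. The sketch below maps every ASCII punctuation character to a space and then splits on whitespace; the helper name `clean_tokens` is hypothetical. Two caveats, assuming the book text is similar to the sample: `string.punctuation` covers ASCII only, so curly quotes such as “ ” ’ would pass through untouched, and replacing the apostrophe with a space splits "don't" into "don" and "t".

```python
import string

# string.punctuation is the ASCII set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(all(ch in string.punctuation for ch in '"[.-_'))  # True

# Map every punctuation character to a space (both strings must be the same length)
TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def clean_tokens(text):
    """Lower-case text, turn ASCII punctuation into spaces, and split on whitespace."""
    return text.lower().translate(TO_SPACE).split()


print(clean_tokens('"Welcome to my house!" said the Count.'))
# ['welcome', 'to', 'my', 'house', 'said', 'the', 'count']
```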
Solution
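This should be exactly what you need. I fixed my earlier mistake by splitting the whole text into a 2-D list (one list of words per line) and then flattening it: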
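```python
# Read the book and split every line into words: a 2-D list with one sublist per line
with open('frankenstein.txt', 'r', encoding='utf-8') as book:
    beauty_book = [line.split() for line in book]

# Flatten the 2-D list into a single list of words
flatten = []
for sublist in beauty_book:
    for val in sublist:
        flatten.append(val)

# Count each distinct word once; list.count rescans the whole list,
# so this is slow (quadratic) on a full-length book
rare = []
for word in set(flatten):
    count = flatten.count(word)
    if count <= 3:
        rare.append((word, count))

# Write one "word count" pair per line, highest count first, ties alphabetical;
# mode 'w' overwrites any output left over from a previous run
with open('less_common_words.txt', 'w', encoding='utf-8') as file:
    for word, count in sorted(rare, key=lambda pair: (-pair[1], pair[0])):
        file.write(f'{word} {count}\n')
```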
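The nested loop that flattens `beauty_book` can also be written with `itertools.chain.from_iterable`, which yields the items of each sublist in order and skips empty sublists (blank lines) naturally. A small check on made-up sample data:

```python
from itertools import chain

# A 2-D list: one sublist of words per line of text (made-up sample data)
beauty_book = [["It", "was", "on"], [], ["a", "dreary", "night"]]

# Nested-loop version, as in the answer
flatten = []
for sublist in beauty_book:
    for val in sublist:
        flatten.append(val)

# The same result in one expression
flatten_chain = list(chain.from_iterable(beauty_book))

print(flatten_chain == flatten)  # True
```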
Alternatively, with `collections.Counter`:
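```python
from pathlib import Path
from collections import Counter
import string

filepath = Path('test.txt')            # input text, e.g. dracula.txt
output_filepath = Path('outfile.txt')  # e.g. less_common_words.txt

# Lower-case each line, delete punctuation, and split on runs of whitespace
table = str.maketrans('', '', string.punctuation)
with open(filepath, encoding='utf-8') as f:
    word_list = [word for line in f for word in line.lower().translate(table).split()]

# (word, count) pairs for words occurring no more than 3 times,
# highest count first, ties in alphabetical order
less_common_words = sorted(
    ((word, n) for word, n in Counter(word_list).items() if n <= 3),
    key=lambda pair: (-pair[1], pair[0]),
)

# One "word count" pair per line
with open(output_filepath, mode='wt', encoding='utf-8') as myfile:
    myfile.write('\n'.join(f'{word} {n}' for word, n in less_common_words))
```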
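One detail that affects the counts in any of these approaches is how each line is split. `str.split(' ')` splits on every single space, so two consecutive spaces (for example, where a run of punctuation was deleted) produce an empty string that `Counter` would count as a "word"; `str.split()` with no argument splits on runs of whitespace and drops those. Note also that deleting punctuation, rather than replacing it with a space, turns "don't" into "dont". A quick check on a made-up line:

```python
import string

# Delete punctuation, as in the Counter-based answer
DELETE_PUNCT = str.maketrans('', '', string.punctuation)

line = 'Listen to them -- the children of the night.'
cleaned = line.lower().translate(DELETE_PUNCT)

print(cleaned.split(' '))  # contains '' where "--" used to be
print(cleaned.split())     # words only
```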