How do I save the words that occur no more than 3 times in a text? Reading and writing files

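I am working with a text file named "dracula.txt", and I have to do the following in Python:

Save the words that occur no more than 3 times, in descending order, in a file named less_common_words.txt. Each word and its count should be saved on a separate line.

Any help would be greatly appreciated! I have been working on this for far too long.

I have tokenized my file and counted the words. Here is my code so far: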

import string

# Read the whole book; utf-8-sig drops a leading byte-order mark if present.
with open("C:/Users/17733/Downloads/dracula.txt", 'r', encoding='utf-8-sig') as file:
    data = file.read()

# Split into lines, then split every non-empty line into words.
data_list = data.split('\n')
new_list = []
for i in data_list:
    if i != '':
        ans_here = i.split(' ')
        new_list.extend(ans_here)

# string.punctuation already contains ", [, ., - and _.
puncs = list(string.punctuation)

# Replace every punctuation character with a space, keeping all words
# (not only the ones that contain punctuation).
new_2 = []
for i in new_list:
    for p in puncs:
        i = i.replace(p, ' ')
    new_2.append(i)

# Collapse double spaces, trim, and lowercase.
new_2 = [i.replace('  ', ' ').strip().lower() for i in new_2]

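Solution

This should be exactly what you need. I fixed my earlier mistake by flattening the whole text from a two-dimensional list (one list of words per line) into a single flat list: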

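# Read the book and split every line into words.
with open('frankenstein.txt', 'r', encoding='utf-8') as book:
    beauty_book = [line.split() for line in book]

# Flatten the list of per-line word lists into one list of words.
flatten = []
for sublist in beauty_book:
    for val in sublist:
        flatten.append(val)

# Count each word in a single pass instead of calling flatten.count() per word.
counts = {}
for word in flatten:
    counts[word] = counts.get(word, 0) + 1

# Keep words seen at most 3 times, highest count first.
rare = [(word, count) for word, count in counts.items() if count <= 3]
rare.sort(key=lambda pair: pair[1], reverse=True)

# Write one "(word, count)" entry per line; mode 'w' creates the file if needed.
with open('less_common_words.txt', 'w', encoding='utf-8') as out:
    for word, count in rare:
        out.write(str((word, count)) + '\n')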
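
As a side note, the flattening and counting steps can also be sketched with itertools.chain.from_iterable and collections.Counter from the standard library. The helper name flatten_words and the sample lines below are illustrative assumptions, not part of the original answer:

```python
from collections import Counter
from itertools import chain


def flatten_words(lines):
    """Split each line on whitespace and flatten the result into one list."""
    return list(chain.from_iterable(line.split() for line in lines))


# Hypothetical sample input standing in for the lines of the book.
sample_lines = ["the count the castle", "the night the", "castle walls"]

words = flatten_words(sample_lines)
counts = Counter(words)  # one pass over the words instead of list.count() per word

# Keep only the words that occur at most 3 times.
rare = {word: n for word, n in counts.items() if n <= 3}
```

Here "the" occurs four times in the sample, so it is left out of rare, while the remaining words are kept with their counts.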

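# Alternative using collections.Counter
from pathlib import Path
from collections import Counter
import string

filepath = Path('test.txt')
output_filepath = Path('outfile.txt')

# Translation table that deletes all ASCII punctuation.
strip_punct = str.maketrans('', '', string.punctuation)

with open(filepath, encoding='utf-8') as f:
    content = f.readlines()

# Lowercase, strip punctuation, and split on any whitespace
# (split() without arguments never yields empty strings).
word_list = [
    word
    for s in content
    for word in s.lower().translate(strip_punct).split()
]

# Words seen at most 3 times, sorted by count (descending), then alphabetically.
less_common_words = sorted(
    ((key, value) for key, value in Counter(word_list).items() if value <= 3),
    key=lambda item: (-item[1], item[0]),
)

# One "word count" pair per line.
with open(output_filepath, mode='wt', encoding='utf-8') as myfile:
    myfile.write('\n'.join(f'{key} {value}' for key, value in less_common_words))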
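
For reuse, the whole task can be wrapped in one function. The following is only a minimal sketch: the function name save_less_common_words, the max_count parameter, and the regular expression that defines a "word" (a run of letters with optional inner apostrophes) are assumptions, and "descending order" is read as descending by count.

```python
import re
from collections import Counter
from pathlib import Path


def save_less_common_words(source, destination, max_count=3):
    """Write every word seen at most max_count times, highest count first."""
    # utf-8-sig also reads plain UTF-8 and drops a byte-order mark if present.
    text = Path(source).read_text(encoding="utf-8-sig").lower()
    # Assumption: a word is a run of letters, optionally with inner apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    rare = [(w, n) for w, n in Counter(words).items() if n <= max_count]
    # Sort by count descending, then alphabetically for a deterministic order.
    rare.sort(key=lambda pair: (-pair[1], pair[0]))
    # One "word count" pair per line.
    body = "".join(f"{word} {count}\n" for word, count in rare)
    Path(destination).write_text(body, encoding="utf-8")
    return rare
```

A call such as save_less_common_words("dracula.txt", "less_common_words.txt") would then produce the file described in the question, assuming dracula.txt is in the working directory.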
