How to stop a loop that iterates over an enumerated list?
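I have started using the `enumerate()` function in Python and would like to improve the keyword-in-context script I first discussed in this Stackoverflow post.
Since the initial script only retrieves the first instance of each keyword and the words that follow it, I tried to write a script that iterates over the whole file and compares every word against the keyword list.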
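However, what I get instead is a seemingly endless list of results that my Jupyter Notebook cannot handle. I even tried to force a stop with `break` as soon as the enumeration index `i` exceeds the number of words in the analysed text file. Unfortunately, that did not work either.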
I suspect I have not fully grasped the logic behind the `enumerate()` function and would appreciate your advice.
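As a minimal illustration (the word list below is made-up sample data), `enumerate()` yields `(index, value)` pairs and stops on its own after the last item, so a loop over it needs no `break`. Its largest index is `len(source) - 1`, which is why the `if i > wordcount: break` check in the script can never be true inside that loop:

```python
# Toy word list standing in for the split text file (hypothetical data)
words = ["the", "king", "issued", "a", "proclamation"]

pairs = list(enumerate(words))
print(pairs)
# [(0, 'the'), (1, 'king'), (2, 'issued'), (3, 'a'), (4, 'proclamation')]

# enumerate() ends by itself after the last element; no break is required
last_index = None
for i, val in enumerate(words):
    last_index = i

print(last_index)               # 4, i.e. len(words) - 1
print(last_index > len(words))  # False: an "i > wordcount" break never fires
```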
Here is my current script:
```python
# Find keywords and "n" subsequent words in txt file
# credits to @jasonharper and @xander for previous updates
# cf. forum discussion on https://stackoverflow.com/questions/66972612/how-to-match-value-in-enumeration-to-a-keyword
import string

# function to find keywords in context
def wordsafter(keyword,source):
    wordcount=len(source) # sample text has 5953 words in total
    print(wordcount)
    res_strings=[]
    for i in range(0,wordcount):
        if i < wordcount:
            print(i) # prints correct range from 0 to 5952
            for i,val in enumerate(source):
                if val == keyword:
                    res_str=(' '.join(source[i:i + 10])) # show searchterm and subsequent n words
                    res_strings.append(res_str)
                if i > wordcount:
                    break # how can I force function to check each word only once?
    return(res_strings) # returns endless (?) list of results?

# open input txt file from local path
with open('C:\\somefile.txt','r',encoding='utf-8',errors='ignore') as f: # open file
    data1 = f.read() # read content of file as string
    data2 = data1.translate(str.maketrans('','',string.punctuation)).lower() # remove punctuation
    data3 = " ".join(data2.split()) # remove additional whitespace from text
    indata = list(data3.split()) # convert string to list

# define searchterms and call function
searchterms = ["proclamation"]
for keyword in searchterms:
    result = wordsafter(keyword,indata)
    if result:
        print(result[600000]) # prints a valid string although whole file only has 5953 items
        with open('C:\\Users\\anotherfile.txt','w',encoding="utf-8-sig") as file:
            file.write(str(result)) # output file is so large it crashes when opened
```
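The oversized output most likely comes from the outer `for i in range(0, wordcount)` loop rather than from `enumerate()` itself: every pass of that loop rescans the whole list, so each match is appended `wordcount` times. (The inner `for i, val in enumerate(source)` rebinds `i`, but the `range` iterator keeps producing its own values regardless.) This is consistent with an index like `result[600000]` being valid for a 5953-word file. The sketch below reproduces the effect on made-up data; `nested_scan` and `single_scan` are hypothetical names, and the window is shortened to 3 words for readability:

```python
def nested_scan(keyword, source):
    # Mirrors the question's structure: an outer range() loop
    # wrapped around a full enumerate() scan of the same list
    wordcount = len(source)
    res_strings = []
    for i in range(wordcount):
        for i, val in enumerate(source):  # rebinds i, but range() is unaffected
            if val == keyword:
                res_strings.append(" ".join(source[i:i + 3]))
    return res_strings

def single_scan(keyword, source):
    # One pass only: enumerate() already visits every word exactly once
    res_strings = []
    for i, val in enumerate(source):
        if val == keyword:
            res_strings.append(" ".join(source[i:i + 3]))
    return res_strings

words = "the king read the proclamation and the proclamation spread".split()
print(len(words))                               # 9
print(len(single_scan("proclamation", words)))  # 2
print(len(nested_scan("proclamation", words)))  # 18 = 2 matches * 9 words
```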
Solution
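I am not sure what happened, but running the original script on another machine produced exactly the right output, without any need for a forced break: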
```python
# Find keywords and "n" subsequent words (here n = 9: the slice takes 10 words)
# Updated script with credits to @jasonharper and @xander
# cf. forum discussion on https://stackoverflow.com/questions/66972612/how-to-match-value-in-enumeration-to-a-keyword
import string

# function to find keywords in context
def wordsafter(keyword,source):
    wordcount=len(source) # shows number of words in sample text
    print(wordcount)
    res_strings=[]
    for i,val in enumerate(source): # single pass: each word is checked exactly once
        if val == keyword:
            res_str=(' '.join(source[i:i + 10])) # shows searchterm and subsequent "n" words
            res_strings.append(res_str)
    return(res_strings) # returns list of results

# open input txt file from local path
with open('C:\\Users\\input.txt','r',encoding='utf-8',errors='ignore') as f: # open file
    data1 = f.read() # read content of file as string
    data2 = data1.translate(str.maketrans('','',string.punctuation)).lower() # remove punctuation
    data3 = " ".join(data2.split()) # remove additional whitespace from text
    indata = list(data3.split()) # convert string to list

# define searchterms and call function
searchterms = ["proclamation","king"]
all_results = [] # collect the results of every keyword
for keyword in searchterms:
    result = wordsafter(keyword,indata)
    if result:
        print(result)
        all_results.extend(result)

# write once after the loop; opening the file with 'w' inside the loop
# would overwrite the results of the earlier keywords
with open('C:\\Users\\output.txt','w',encoding="utf-8-sig") as file:
    file.write(str(all_results)) # write output to file
```
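A possible refinement, sketched under the assumption that the text has already been split into a lowercase word list such as `indata`: instead of calling `wordsafter` once per keyword (one full pass over the text each time), a single pass can test every word against a set of keywords and group the contexts by keyword. The names `keywords_in_context` and `window` are hypothetical and not part of the original scripts; `window=10` reproduces the `source[i:i + 10]` slice.

```python
from collections import defaultdict

def keywords_in_context(keywords, source, window=10):
    """Return {keyword: [context, ...]} from one pass over source.

    Each context is the keyword plus the following window - 1 words
    (fewer near the end of the text), like source[i:i + 10] above.
    """
    targets = set(keywords)            # fast membership tests
    contexts = defaultdict(list)
    for i, val in enumerate(source):   # visits each word exactly once
        if val in targets:
            contexts[val].append(" ".join(source[i:i + window]))
    return dict(contexts)

words = "the king read the proclamation and the king agreed".split()
print(keywords_in_context(["proclamation", "king"], words, window=3))
# {'king': ['king read the', 'king agreed'], 'proclamation': ['proclamation and the']}
```

Keywords that never occur are simply absent from the returned dictionary, which plays the same role as the `if result:` check in the scripts.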