Reading a file in Python while skipping the header section of a text file

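I downloaded a book in plain-text format from gutenberg.org. I want to read the text but skip the header at the beginning of the file, and then parse the rest with a processing function I wrote. How can I do that? Here is the beginning of the text file: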
> The Project Gutenberg EBook of The Kama Sutra of Vatsyayana, by Vatsyayana

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: The Kama Sutra of Vatsyayana
       Translated From The Sanscrit In Seven Parts With Preface, Introduction and Concluding Remarks

Author: Vatsyayana

Translator: Richard Burton
            Bhagavanlal Indrajit
            Shivaram Parashuram Bhide

Release Date: January 18, 2009 [EBook #27827]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK THE KAMA SUTRA OF VATSYAYANA ***




Produced by Bruce Albrecht, Carla Foust, Jon Noring and
the Online Distributed Proofreading Team at
http://www.pgdp.net
And here is my current code, which processes the whole file:
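import string

def process_file(filename):
    """Open a file and return a dict mapping each word to its frequency."""
    h = dict()
    with open(filename) as fin:  # the file is closed automatically
        for line in fin:
            process_line(line, h)
    return h

def process_line(line, h):
    # Treat hyphens as word separators
    line = line.replace('-', ' ')

    for word in line.split():
        # Strip surrounding punctuation/whitespace and normalize case
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        if word:  # skip tokens that consisted only of punctuation
            h[word] = h.get(word, 0) + 1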

Solution

Add this:
for line in fin:
    if "START OF THIS PROJECT GUTENBERG EBOOK" in line:
        break
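right before your own "for line in fin:" loop. Because both loops iterate over the same file object, your loop then continues from the line after the marker.

Well, you can simply read the input until your criterion is met, which skips the beginning: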
def process_file(filename):
    """Open a file and return a dict mapping each word to its frequency."""
    h = dict()
    with open(filename) as fin:
        # Skip everything up to and including the START marker line
        for line in fin:
            if line.rstrip() == "*** START OF THIS PROJECT GUTENBERG EBOOK THE KAMA SUTRA OF VATSYAYANA ***":
                break

        # The same iterator resumes at the line after the marker
        for line in fin:
            process_line(line, h)

    return h
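Note that in this example I used the exact "*** START OF THIS PROJECT GUTENBERG EBOOK THE KAMA SUTRA OF VATSYAYANA ***" line as the criterion, but you can define the criterion however you like.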
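A slightly more general sketch, under stated assumptions: the start criterion is passed in as a function, and the license footer is skipped too. The prefixes "*** START OF" and "*** END OF" are assumptions about how Project Gutenberg files mark the body (older files say "THIS PROJECT GUTENBERG EBOOK", newer ones "THE PROJECT GUTENBERG EBOOK"), and the names count_body_words, is_start_marker and is_end_marker are hypothetical. process_line is the question's function, with empty tokens skipped.

```python
import io
import string


def process_line(line, h):
    """Count the words of one line into dict h (the question's logic)."""
    line = line.replace('-', ' ')
    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace).lower()
        if word:  # ignore tokens that were only punctuation
            h[word] = h.get(word, 0) + 1


def is_start_marker(line):
    # Assumption: the header ends at a line beginning with "*** START OF"
    return line.startswith('*** START OF')


def is_end_marker(line):
    # Assumption: the footer begins at a line beginning with "*** END OF"
    return line.startswith('*** END OF')


def count_body_words(lines, is_start=is_start_marker, is_end=is_end_marker):
    """Return word counts for the lines between the start and end markers."""
    h = dict()
    it = iter(lines)
    for line in it:          # skip the header, including the marker line
        if is_start(line):
            break
    for line in it:          # same iterator, so this resumes after the marker
        if is_end(line):
            break            # stop before the footer
        process_line(line, h)
    return h


sample = io.StringIO(
    "Title: Example\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Hello, world. Hello again!\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text\n"
)
print(count_body_words(sample))  # → {'hello': 2, 'world': 1, 'again': 1}
```

With a real file you would call count_body_words(fin) inside "with open(filename) as fin:". If the start marker never appears, the first loop consumes the whole input and the result is an empty dict, so check for that if some of your files may lack the marker.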
