How to fix Python "write() argument must be str"
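I'm using this Python script to convert a WordPress .XML export file into .txt files. It works fine for blog posts, but not so well for the other post types.
I've changed some of the code, but it still doesn't work for the other post types the way it does for blog posts. Here is the code I have at the moment: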
#!/usr/bin/env python
"""This script converts WXR file to a number of plain text files.

WXR stands for "WordPress eXtended RSS", which basically is just a
regular XML file. This script extracts entries from the WXR file into
plain text files. Output format: article name prefixed by date for
posts, article name for pages.

Usage: wxr2txt.py filename [-o output_dir]
"""
import os
import re
import sys
from xml.etree import ElementTree

NAMESPACES = {
    'content': 'http://purl.org/rss/1.0/modules/content/',
    'wp': 'http://wordpress.org/export/1.2/',
}
USAGE_STRING = "Usage: wxr2txt.py filename [-o output_dir]"


def main(argv):
    filename, output_dir = _parse_and_validate_output(argv)
    try:
        data = ElementTree.parse(filename).getroot()
    except ElementTree.ParseError:
        _error("Invalid input file format. Can not parse the input.")

    page_counter, post_counter = 0, 0
    for post in data.find('channel').findall('item'):
        post_type = post.find('wp:post_type', namespaces=NAMESPACES).text
        content = post.find('content:encoded', namespaces=NAMESPACES).text
        date = post.find('wp:post_date', namespaces=NAMESPACES).text
        title = post.find('title').text

        date = date.split(' ')[0].replace('-', '')
        # Characters other than a-z, 0-9 and '+' become '_'; runs of '_' collapse.
        title = re.sub(r'[_]+', '_', re.sub(r'[^a-z0-9+]', '_', title.lower()))
        if post_type == 'post':
            post_filename = date + '_' + title + '.txt'
            post_counter += 1
        else:
            post_filename = title + '.txt'
            page_counter += 1

        with open(os.path.join(output_dir, post_filename), 'w') as post_file:
            post_file.write(content.encode('utf8'))  # raises the TypeError in the traceback

    print("Saved {} posts and {} pages in directory '{}'.".format(
        post_counter, page_counter, output_dir))


def _parse_and_validate_output(argv):
    if len(argv) not in (2, 4):
        _error("Wrong number of arguments.")
    filename = argv[1]
    if not os.path.isfile(filename):
        _error("Input file does not exist (or not enough permissions).")
    output_dir = argv[3] if len(argv) == 4 and argv[2] == '-o' else os.getcwd()
    if not os.path.isdir(output_dir):
        _error("Output directory does not exist (or not enough permissions).")
    return filename, output_dir


def _error(text):
    print(text)
    print(USAGE_STRING)
    sys.exit(1)


if __name__ == "__main__":
    main(sys.argv)
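For reference, the filename logic in the loop can be pulled out and checked on its own. The following is a minimal sketch. make_filename is a hypothetical helper name that is not part of the script. It mirrors the script's rules: posts get a YYYYMMDD_ prefix, and every other post type gets only the slug. It assumes wp:post_date has the form "YYYY-MM-DD HH:MM:SS".

```python
import re


def make_filename(post_type, date, title):
    """Hypothetical helper reproducing the script's output-filename rules."""
    # "2020-05-17 10:22:31" -> "20200517"
    day = date.split(' ')[0].replace('-', '')
    # Characters other than a-z, 0-9 and '+' become '_', then runs of '_' collapse.
    slug = re.sub(r'[_]+', '_', re.sub(r'[^a-z0-9+]', '_', title.lower()))
    if post_type == 'post':
        return day + '_' + slug + '.txt'
    # Pages and any other post type: slug only, no date prefix.
    return slug + '.txt'


print(make_filename('post', '2020-05-17 10:22:31', 'Hello, World!'))  # 20200517_hello_world_.txt
print(make_filename('page', '2020-05-17 10:22:31', 'About Me'))       # about_me.txt
```

Because non-post items get no date prefix, two pages whose titles slugify to the same string end up with the same filename, and the later one overwrites the earlier one.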
When I run the script, the following error appears in the command prompt:
Traceback (most recent call last):
  File "C:\Users\suppo\Desktop\python\script.py", line 71, in <module>
    main(sys.argv)
  File "C:\Users\suppo\Desktop\python\script.py", line 46, in main
    post_file.write(content.encode('utf8'))
TypeError: write() argument must be str, not bytes
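So I know I have to encode/decode the content variable, but I really can't figure out how to do that in this script. Can someone point me in the right direction? :)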
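The relationship between str, bytes and file modes behind this error can be reproduced in isolation. In this minimal sketch, the file name demo.txt and the sample text are arbitrary. str.encode() produces bytes. A file opened in text mode ('w') accepts only str, and a file opened in binary mode ('wb') accepts bytes.

```python
import os
import tempfile

text = "Café au lait"            # str: Unicode text
data = text.encode('utf-8')     # bytes: the UTF-8 encoding of that text

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Text mode accepts only str, so writing bytes raises the same TypeError.
with open(path, 'w', encoding='utf-8') as f:
    try:
        f.write(data)
    except TypeError as exc:
        error_message = str(exc)
        print(error_message)  # write() argument must be str, not bytes

# Option 1: stay in text mode and write the str itself.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Option 2: switch to binary mode and write the already-encoded bytes.
with open(path, 'wb') as f:
    f.write(data)

# Either way, the file contains the UTF-8 encoding of the original text.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()
print(round_trip == text)  # True
```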
Solution
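Try removing the .encode('utf8') call and writing the string directly:
post_file.write(content)
In Python 3, a file opened with 'w' is in text mode, and its write() accepts only str. content is already a str; calling .encode() on it turns it into bytes, which is what raises the TypeError.
You can also check type(content) to make sure it really is a string.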
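Here is a slightly more defensive version of the write step, offered as a sketch. save_entry is a hypothetical helper, not part of the original script. It passes encoding='utf-8' to open() so the output does not depend on the platform's default encoding; on Windows that default is often cp1252, which cannot represent every character. It also writes an empty string when content is None. ElementTree returns None as the .text of an empty element, so items with no body would otherwise fail with "write() argument must be str, not NoneType". Whether that is why the other post types misbehave is an assumption; the traceback above does not show it.

```python
import os
import tempfile
from xml.etree import ElementTree


def save_entry(output_dir, filename, content):
    """Hypothetical helper: write one exported entry as UTF-8 text."""
    path = os.path.join(output_dir, filename)
    # Explicit encoding instead of the platform default (e.g. cp1252 on Windows).
    with open(path, 'w', encoding='utf-8') as post_file:
        # content is a str, or None for an empty element; never bytes.
        post_file.write(content if content is not None else '')
    return path


# The .text of an empty element is None, as for an item without a body.
empty = ElementTree.fromstring('<item><encoded/></item>').find('encoded').text
print(empty)  # None

out_dir = tempfile.mkdtemp()
full_path = save_entry(out_dir, 'about_me.txt', 'Café au lait')
empty_path = save_entry(out_dir, 'empty_page.txt', empty)
```

In main(), the with block would then become save_entry(output_dir, post_filename, content).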