How do I decode a str read from a file when it contains literal hex byte escapes (such as \xe6)?
I have a file, example.log, which contains:
<POOR_IN200901UV xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ITSVersion="XML_1.0"
xsi:schemaLocation="urn:hl7-org:v3
../../Schemas/POOR_IN200901UV20.xsd">\n\t<!-- \xe6\xb6\x88\xe6\x81\xafID -
->\n\t<id extension="BS002"/>
I want to read the file, convert the str to UTF-8, and write the result to a new file. My current code is:
    import re

    with open("example_decoded.log",'w') as f:
        for line in open("example.log",'r',encoding='utf-8'):
            m = re.search("<POOR_IN200901UV",line)
            if m:
                # keep everything from the opening tag, minus the last two characters
                line = line[m.start():-2]
                line_bytes = bytes(line,encoding='raw_unicode_escape')
                line_decoded = line_bytes.decode('utf-8')
                print(line_decoded)
                f.write(line_decoded)
            else:
                pass
But example_decoded.log then contains:
<POOR_IN200901UV xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ITSVersion="XML_1.0"
xsi:schemaLocation="urn:hl7-org:v3
../../Schemas/POOR_IN200901UV20.xsd">\n\t<!-- \xe6\xb6\x88\xe6\x81\xafID -
->\n\t<id extension="BS002"
The \xe6\xb6\x88\xe6\x81\xaf part is not decoded, so I would like to know how to decode a str that mixes ordinary text with escaped bytes like this.
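To see why the attempt above leaves the escapes untouched, here is a minimal sketch. It assumes the file holds a literal backslash followed by "xe6" and so on, rather than real bytes; the variable names are only for illustration. The raw_unicode_escape codec writes the backslash out as an ordinary byte, so the round trip returns the same text.

```python
# A raw string: each \xNN here is four separate characters, not one byte.
raw = r"\xe6\xb6\x88\xe6\x81\xaf"

# raw_unicode_escape encodes every character below U+0100 as a single byte,
# so the backslashes stay literal backslashes.
as_bytes = bytes(raw, encoding="raw_unicode_escape")

# Decoding those bytes as UTF-8 simply yields the original text again.
round_trip = as_bytes.decode("utf-8")

print(len(raw), len(as_bytes))
print(round_trip == raw)
```

Nothing in this path ever interprets the escape sequences, which is why the output file looks the same as the input.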
Answers
    import struct
    decodedVal = struct.unpack(">f",bytes.fromhex(encoded_val))[0]
See the following link to choose your own byte order and type in place of ">f":
https://docs.python.org/3/library/struct.html
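For reference, a small sketch of what the struct approach does. The input value "3f800000" is a made-up example, not something from the log in the question. Note that this interprets a plain hex string as a binary number; it does not handle \xNN escapes inside text.

```python
import struct

# Hypothetical input: eight hex digits, i.e. four bytes.
encoded_val = "3f800000"

# bytes.fromhex turns the hex digits into raw bytes.
raw_bytes = bytes.fromhex(encoded_val)

# ">f" means big-endian, 32-bit IEEE 754 float.
decoded_val = struct.unpack(">f", raw_bytes)[0]

print(raw_bytes, decoded_val)
```

So this is useful for packed numeric data, but it is probably not what the question needs.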
    import codecs
    # hex_codec turns a string of hex digits into the bytes they represent
    decode_hex = codecs.getdecoder("hex_codec")
    string = decode_hex(string)[0]
https://docs.python.org/3/library/codecs.html
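A short sketch of what the hex codec expects, assuming the input is a run of bare hex digits (the value below is the six bytes from the question with the \x prefixes removed by hand):

```python
import codecs

decode_hex = codecs.getdecoder("hex_codec")

# Bare hex digits, passed as bytes; no backslashes or "x" characters.
hex_digits = b"e6b688e681af"

# The decoder returns a (bytes, length_consumed) tuple.
decoded_bytes = decode_hex(hex_digits)[0]

# The resulting bytes are valid UTF-8 and can be decoded to text.
text = decoded_bytes.decode("utf-8")
print(text)
```

This only works on pure hex digits, so applying it to a whole log line that mixes XML text with \xNN escapes would fail.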
Reference: Read hex characters and convert them to utf-8 using python 3
The solution is:
    import re

    with open("example_decoded.log",'w') as f:
        for line in open("example.log",'r',encoding='utf-8'):
            m = re.search("<POOR_IN200901UV",line)
            if m:
                line = line[m.start():-2]
                # interpret the \xNN escapes, then turn the result back into bytes and decode as UTF-8
                line_decoded = bytes(line,'utf-8').decode('unicode_escape').encode('latin-1').decode('utf8')
                print(line_decoded)
                f.write(line_decoded)
            else:
                pass
However, I don't understand why encode('latin-1') is needed before the final decode. Can anyone explain?
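On the latin-1 question, here is a step-by-step sketch of the chain, using a shortened sample line (the sample and the step names are just for illustration). The unicode_escape codec produces a str in which each \xNN has become the code point U+00NN. Latin-1 maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so encoding with it recovers the original byte values, which can then be decoded as UTF-8.

```python
# Sample text with literal backslash escapes, as read from the file.
raw = r"<!-- \xe6\xb6\x88\xe6\x81\xafID -->"

# Step 1: str -> bytes; the backslashes are still literal.
step1 = raw.encode("utf-8")

# Step 2: interpret the escapes. The result is a str, and each \xNN
# becomes the single character U+00NN (not yet the intended text).
step2 = step1.decode("unicode_escape")

# Step 3: latin-1 maps U+00NN straight to byte 0xNN, giving real bytes.
step3 = step2.encode("latin-1")

# Step 4: those bytes are UTF-8, so decode them to get the final text.
step4 = step3.decode("utf-8")

print(repr(step2))
print(step3)
print(step4)
```

Be aware that unicode_escape also converts the literal \n and \t sequences in the line into real newline and tab characters.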