Parsing binary data into a ctypes Structure object via readinto()

This line of the definition actually declares a bitfield:

...
("more_funky_numbers_7bytes", c_uint, 56),
...

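That is wrong here. The size of a bitfield must be less than or equal to the size of its type, so for c_uint it can be at most 32 bits; one bit more raises the exception: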

ValueError: number of bits invalid for bit field
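The limit can be checked directly. The following is a minimal sketch, assuming c_uint is 32 bits wide (true on mainstream platforms); the class names Fits and TooWide are made up for illustration:

```python
from ctypes import Structure, c_uint, sizeof

# 32 bits is the full width of c_uint, so this definition is accepted.
class Fits(Structure):
    _fields_ = [("value", c_uint, 32)]

# 33 bits no longer fits; ctypes rejects it while the class is being created.
try:
    class TooWide(Structure):
        _fields_ = [("value", c_uint, 33)]
except ValueError as exc:
    error = exc
else:
    error = None

print(sizeof(Fits))          # size of the accepted structure in bytes
print(type(error).__name__)  # exception type raised for the 33-bit field
```

The exact wording of the error message may differ between Python versions, so the sketch only looks at the exception type.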

An example of using bitfields:

from ctypes import *

class MyStructure(Structure):
    _fields_ = [
        # c_uint8 is 8 bits wide, so `a`, `b` and `c` share one byte
        ('a', c_uint8, 4), # first 4 bits of the first byte
        ('b', c_uint8, 2), # next 2 bits of the first byte
        ('c', c_uint8, 2), # last 2 bits of the first byte
        ('d', c_uint8, 2), # the first byte is full, so a new byte is
                           # created and `d` takes its first two bits
    ]

mystruct = MyStructure()

mystruct.a = 0b0000
mystruct.b = 0b11
mystruct.c = 0b00
mystruct.d = 0b11

v = c_uint16()

# copy `mystruct` into `v`; ctypes.memmove is portable, unlike
# cdll.msvcrt.memcpy, which only exists on Windows
memmove(byref(v), byref(mystruct), sizeof(v))

print(sizeof(mystruct)) # 2 bytes, so 6 bits are left floating, you may
                        # want to memset with zeros
print(bin(v.value))     # 0b1100110000 on a little-endian machine

What you need is 7 bytes, so what you ended up doing is correct:

...
("more_funky_numbers_7bytes", c_byte * 7),
...
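A field declared as c_byte * 7 comes back as an array, so it still has to be turned into a number by hand. A minimal sketch, assuming the seven bytes hold one unsigned big-endian integer (the question does not say how they are encoded); decode_7bytes is a hypothetical helper name:

```python
from ctypes import c_byte

def decode_7bytes(field):
    """Interpret a (c_byte * 7) array as one unsigned big-endian integer."""
    # bytes() copies the raw memory of the ctypes array, sign bits included.
    return int.from_bytes(bytes(field), byteorder="big", signed=False)

raw = (c_byte * 7)(*([0x44] * 7))
print(hex(decode_7bytes(raw)))  # 0x44444444444444
```

If the real format packs several values into those 56 bits, the same integer can be split further with shifts and masks.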

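As for the size of the structure, it is going to be 52, not 51: one extra byte is padded so that payload_length_2bytes, a 2-byte c_ushort, starts on an even offset, following the native C alignment rules that ctypes applies by default. Here: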

from ctypes import *

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

mystruct = BinaryHeader(
    0x11111111,
    b'\x22' * 32,                 # c_char arrays take bytes in Python 3
    0x33333333,
    (c_byte * 7)(*([0x44] * 7)),
    0x55,
    0x66,
    0x7777
)

print(sizeof(mystruct))  # 52

# ctypes structures expose the buffer protocol, so they can be written as-is
with open('data.txt', 'wb') as f:
    f.write(mystruct)

In the file, the extra byte is padded between other_flags_1byte and payload_length_2bytes:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 00 77 77 f.ww
            ^
         extra byte
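The position of the padding byte can also be read from the field descriptors instead of a hex dump. A small sketch using the same field list as above; the offsets dictionary is only there for illustration:

```python
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# Each field is a descriptor on the class carrying its byte offset and size.
offsets = {}
for name, _ctype in BinaryHeader._fields_:
    field = getattr(BinaryHeader, name)
    offsets[name] = (field.offset, field.size)
    print(f"{name:28s} offset={field.offset:2d} size={field.size}")

print("total:", sizeof(BinaryHeader))
```

other_flags_1byte ends at offset 48 and payload_length_2bytes only starts at offset 50, which leaves byte 49 as the padding shown in the dump.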

This is an issue when it comes to file formats and network protocols. To change it, pack the structure by 1:

 ...
class BinaryHeader(BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
...

The file will then be:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 77 77    fww
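To tie this back to readinto(), here is a minimal sketch of reading those 51 bytes into the packed structure. It assumes Python 3 and uses io.BytesIO as a stand-in for a file opened with open(path, "rb"); PackedHeader is an illustrative name:

```python
import io
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class PackedHeader(BigEndianStructure):
    _pack_ = 1  # must be set before _fields_ so no padding is inserted
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# The same 51 bytes as in the dump above.
raw = (b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 + b"\x44" * 7 +
       b"\x55\x66\x77\x77")

header = PackedHeader()
count = io.BytesIO(raw).readinto(header)  # fills the structure in place

print(count, sizeof(header))
print(hex(header.sequence_number_4bytes), hex(header.payload_length_2bytes))
```

With packing in place, the value returned by readinto() and sizeof() agree on 51.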

As for struct, it won't make your case any easier. Sadly, it doesn't support nested tuples in the format. For example:

>>> from struct import Struct
>>>
>>> data = (b'\x11' * 4 + b'\x22' * 32 + b'\x33' * 4 + b'\x44' * 7 +
...         b'\x55\x66\x77\x77')
>>>
>>> BinaryHeader = Struct('>I32cI7BBBH')
>>>
>>> BinaryHeader.unpack(data)
(286331153, b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"',
 b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"',
 b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"',
 b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"',
 858993459, 68, 68, 68, 68, 68, 68, 68, 85, 102, 30583)
>>>

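This result can't be used with namedtuple; you would still have to parse it by index. It would work if you could write something like '>I(32c)(I)(7B)(B)(B)H'. This feature (extend struct.unpack to produce nested tuples) has been requested since 2003, but nothing has been done since.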
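One way around the missing nesting, sketched here as a suggestion rather than as part of the original answer, is the `s` format code, which returns a whole run of bytes as a single item. With '>I32sI7sBBH' the result has exactly seven items, so it maps onto a namedtuple. The shortened field names are made up for this example:

```python
from collections import namedtuple
from struct import Struct

# '>' selects big-endian with no padding; 32s and 7s each yield one bytes item.
HEADER = Struct(">I32sI7sBBH")

Header = namedtuple(
    "Header",
    "sequence_number ascii_text timestamp more_funky_numbers "
    "some_flags other_flags payload_length",
)

data = (b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 + b"\x44" * 7 +
        b"\x55\x66\x77\x77")

header = Header._make(HEADER.unpack(data))

print(HEADER.size)            # 51
print(header.ascii_text)      # the 32 text bytes as one bytes object
print(header.payload_length)  # 30583
```

The 7-byte field arrives as a bytes object and still has to be decoded separately.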

Original question

I'm trying to handle a binary format, following the example here:

http://dabeaz.blogspot.jp/2009/08/python-binary-io-handling.html

>>> from ctypes import *
>>> class Point(Structure):
...     _fields_ = [ ('x',c_double),('y',c_double),('z',c_double) ]
...
>>> g = open("foo","rb") # point structure data
>>> q = Point()
>>> g.readinto(q)
24
>>> q.x
2.0

I have defined the structure of my header and I'm trying to read data into it, but I'm having some difficulty. My structure looks like this:

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),
                ]

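The ctypes documentation says:

For integer type fields like c_int, a third optional item can be given. It must be a small positive integer defining the bit width of the field.

So in ("more_funky_numbers_7bytes", c_uint, 56), I tried to define the field as a 7-byte field, but I get the error:

ValueError: number of bits invalid for bit field

So my first problem is: how can I define a 7-byte int field?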

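Then, if I skip that problem and comment out the "more_funky_numbers_7bytes" field, the data gets loaded in. As expected, though, only 1 character is loaded into "ascii_text_32bytes". And for some reason readinto() returns 16, which I assume is the number of bytes it read into the structure... but if I've commented out my "funky number" field and "ascii_text_32bytes" holds only one char (1 byte), shouldn't that be 13, not 16?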

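Then I tried breaking the char field out into a separate structure and referencing that from within my header structure. But that's not working either...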

class StupidStaticCharField(BigEndianStructure):
    _fields_ = [
                ("ascii_text_1", ...), ("ascii_text_2", ...), ("ascii_text_3", ...),
                ("ascii_text_4", ...), ("ascii_text_5", ...), ("ascii_text_6", ...),
                ("ascii_text_7", ...), ("ascii_text_8", ...), ("ascii_text_9", ...),
                ("ascii_text_10", ...), ("ascii_text_11", ...),
                .
                .
                .
                ]

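class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", StupidStaticCharField),
                ("timestamp_4bytes", c_uint),
                #("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),
                ]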

So, any ideas how to:

Define a 7-byte field (which I will need to decode with a defined function)
Define a static char field of 32 bytes

Update

I've found a structure that seems to work...

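class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char * 32),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_byte * 7),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),
                ]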

My remaining question now is why, when using .readinto():

f = open(binaryfile,"rb")

mystruct = BinaryHeader()
f.readinto(mystruct)

it returns 52 and not the expected 51. Where does the extra byte come from?

Update 2

For those interested, here is an example of the alternative struct approach mentioned by eryksun, which reads the values into a namedtuple:

>>> from struct import unpack
>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)