Question

I'm very new to Python and very rusty with C, so I apologize in advance for how dumb and/or lost I sound.

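I have a function in C that creates a .dat file containing data. I open the file with Python to read it. One of the things I need to read is a struct that is created in the C function and written out in binary. In my Python code I read the struct at the corresponding line of the file. I have tried unpacking the struct item by item and as a whole, without success. Most of the items in the struct are declared as 'real' in the C code. I am working on this code with someone else; the main source code is his, and he declared the variables as 'real'. I need to put this in a loop because I want to read all the files in the directory that end in '.dat'. To start the loop I have: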

for files in os.listdir(path):
  if files.endswith(".dat"):
    part = open(path + files, "rb")
    for line in part:

Then I read all the lines before the line that contains the struct. When I get to that line, I do:

      part_struct = part.readline()
      r = struct.unpack('<d8', part_struct[0])

I'm trying to read the first thing stored in the struct. I saw an example like this on here. When I try it, the error I get is:

struct.error: repeat count given without format specifier

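I'll take any and all tips anyone can give me. I've been stuck on this for a few days and have tried many different things. Honestly, I don't think I understand the struct module, but I've read as much about it as I can.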

Thanks!

Answer 1

You could use ctypes.Structure or struct.Struct to specify the format of the file. To read the structs from the file produced by the C code in @perreal's answer:

"""
struct { double v; int t; char c;};
"""
from ctypes import *

class YourStruct(Structure):
    _fields_ = [('v', c_double),
                ('t', c_int),
                ('c', c_char)]

with open('c_structs.bin', 'rb') as file:
    result = []
    x = YourStruct()
    while file.readinto(x) == sizeof(x):
        result.append((x.v, x.t, x.c))

print(result)
# -> [(12.100000381469727, 17, b's'), (12.100000381469727, 17, b's'), ...]

See io.BufferedIOBase.readinto(). It is supported on Python 3, but it is undocumented for the default file object in Python 2.7.
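If you don't have the C program at hand, here is a minimal sketch that generates an equivalent test file from Python and reads it back with readinto(). It assumes the writer used native alignment (the 'dicxxx' format, i.e. 3 trailing pad bytes) and mimics the single-precision rounding of the 12.1f literal; write_sample and read_all are illustrative helper names, not part of any library.

```python
import ctypes
import os
import struct
import tempfile

class YourStruct(ctypes.Structure):
    # Mirrors: struct { double v; int t; char c; };
    _fields_ = [('v', ctypes.c_double),
                ('t', ctypes.c_int),
                ('c', ctypes.c_char)]

def write_sample(path, count=3):
    # 12.1f is a float literal in the C code, so round 12.1 through
    # single precision before storing it as a double.
    v = struct.unpack('f', struct.pack('f', 12.1))[0]
    record = struct.pack('dicxxx', v, 17, b's')  # native layout + 3 pad bytes
    with open(path, 'wb') as f:
        f.write(record * count)

def read_all(path):
    result = []
    x = YourStruct()
    with open(path, 'rb') as f:
        # readinto fills x's memory in place; a short or empty read ends the loop
        while f.readinto(x) == ctypes.sizeof(x):
            result.append((x.v, x.t, x.c))
    return result

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'c_structs.bin')
    write_sample(path)
    print(read_all(path))
```

Reusing a single YourStruct instance keeps the loop allocation-free; the tuple copies the values out before the next readinto() overwrites them.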

struct.Struct requires the padding bytes (x) to be specified explicitly:

"""
struct { double v; int t; char c;};
"""
from struct import Struct

x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
    result = []
    while True:
        buf = file.read(x.size)
        if len(buf) != x.size:
            break
        result.append(x.unpack_from(buf))

print(result)

It produces the same output.
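Instead of counting the trailing pad bytes by hand, the struct module can align the end of a native-mode format if you finish it with a zero repeat count of the type whose alignment you want (documented in the struct module's notes on padding). A small sketch comparing the sizes; since native sizes depend on the platform, it prints them rather than assuming them:

```python
import ctypes
import struct

class YourStruct(ctypes.Structure):
    _fields_ = [('v', ctypes.c_double),
                ('t', ctypes.c_int),
                ('c', ctypes.c_char)]

# Native mode ('@', the default) aligns each field but never adds
# trailing padding by itself, so 'dic' is shorter than the C struct.
print(struct.calcsize('dic'))
# A zero repeat count on the last code pads the end to that type's
# alignment, reproducing the compiler's trailing padding.
print(struct.calcsize('dic0d'))
print(ctypes.sizeof(YourStruct))
# Standard sizes ('<', '>', '=', '!') never insert any padding.
print(struct.calcsize('<dic'))  # 8 + 4 + 1
```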

To avoid unnecessary copying, Array.from_buffer(mmap_file) can be used to get an array of structs from the file:

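import mmap  # Unix, Windows
from contextlib import closing
from ctypes import sizeof

# YourStruct is the ctypes.Structure subclass defined above
with open('c_structs.bin', 'rb') as file:
    with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
        n = len(mm) // sizeof(YourStruct)          # number of whole records
        result = (YourStruct * n).from_buffer(mm)  # without copying
        print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))
        del result  # release the view so mm can be closed without BufferError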

Answer 2

Some C code:

#include <stdio.h>
typedef struct { double v; int t; char c;} save_type;
int main() {
    save_type s = { 12.1f, 17, 's'};
    FILE *f = fopen("output", "wb");  /* "b": binary mode, no newline translation */
    if (f == NULL) return 1;
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fclose(f);
    return 0;
}

Some Python code:

import struct
with open('output', 'rb') as f:
    chunk = f.read(16)
    while len(chunk) == 16:  # stops at EOF or at a trailing partial record
        print(struct.unpack('dicccc', chunk))
        chunk = f.read(16)

Output:

(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')

But there is also the padding issue. The padded save_type is 16 bytes, so we read 3 extra chars and ignore them.
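Since the record size is fixed, you can also read the whole file at once and let Struct.iter_unpack (Python 3.4+) walk over the records. A sketch assuming the same 16-byte 'dicxxx' layout as above; parse_records and read_records are hypothetical helper names:

```python
import struct

# double, int, char, then 3 explicit pad bytes (native alignment)
RECORD = struct.Struct('dicxxx')

def parse_records(data):
    # iter_unpack needs a length that is an exact multiple of RECORD.size,
    # so any trailing partial record is dropped first.
    usable = len(data) - len(data) % RECORD.size
    return list(RECORD.iter_unpack(data[:usable]))

def read_records(path):
    with open(path, 'rb') as f:
        return parse_records(f.read())
```

The 'x' pad codes are consumed but produce no values, so each tuple has exactly three fields.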

Answer 3

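The number in a format specifier is a repeat count, but it has to come before the letter, as in '<8d'. However, you said you only want to read one element of the struct, so I'd guess you just want '<d'. I suspect you were trying to specify 8 as the number of bytes to read, but you don't need to: d already implies 8 bytes.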
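A quick way to see the difference between a repeat count and a byte count is to try both forms; a sketch using eight made-up doubles:

```python
import struct

buf = struct.pack('<8d', *range(8))  # eight little-endian doubles, 64 bytes

# The count goes *before* the code: '<8d' means eight doubles.
print(struct.calcsize('<8d'))         # 64
print(struct.unpack('<8d', buf)[:3])  # (0.0, 1.0, 2.0)

# A single double needs no byte count; 'd' already implies 8 bytes.
print(struct.unpack('<d', buf[:8]))   # (0.0,)

# A trailing number has nothing to repeat, so the format is rejected.
try:
    struct.unpack('<d8', buf[:8])
except struct.error as exc:
    print('struct.error:', exc)
```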

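I also notice that you're using readline. That seems wrong for reading binary data: it reads up to the next newline byte, and that byte can occur anywhere in binary data. What you want is read(size), like this: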

part_struct = part.read(8)
r = struct.unpack('<d', part_struct)
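To see why readline breaks on binary data, here is a small sketch with a hypothetical record made of a double followed by the int 10, whose first byte happens to be 0x0a (b'\n'):

```python
import io
import struct

# Hypothetical record: a double followed by an int, little-endian.
payload = struct.pack('<di', 12.1, 10)  # the int 10 starts with byte 0x0a == b'\n'
f = io.BytesIO(payload)

line = f.readline()  # stops at the first b'\n' byte, wherever it falls
print(len(line), len(payload))  # 9 12

f.seek(0)
chunk = f.read(8)    # read exactly the size of one double instead
print(struct.unpack('<d', chunk))  # (12.1,)
```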

Actually, you should be careful, because read can return less data than you asked for. If it does, you need to repeat the read.

part_struct = b''
while len(part_struct) < 8:
    data = part.read(8 - len(part_struct))
    if not data:
        raise EOFError("unexpected end of file")
    part_struct += data
r = struct.unpack('<d', part_struct)
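Combined with the loop from the question, a sketch could look like the following. Several things here are assumptions rather than facts from the question: that 'real' is a typedef for double (if it is float, use 'f' instead of 'd'), that the struct is REAL_COUNT consecutive reals with no padding, and that it starts HEADER_SIZE bytes into each file (if the header is text of known length, that length in bytes is the offset). HEADER_SIZE, REAL_COUNT, read_exact and read_structs are hypothetical names.

```python
import os
import struct

# Assumptions, not from the original code: adjust all three to match the C source.
HEADER_SIZE = 0   # hypothetical byte offset of the struct in each file
REAL_COUNT = 3    # hypothetical number of 'real' fields
RECORD = struct.Struct('<%dd' % REAL_COUNT)  # 'real' assumed to be double

def read_exact(f, n):
    # Loop because read() may return fewer bytes than requested.
    buf = b''
    while len(buf) < n:
        data = f.read(n - len(buf))
        if not data:
            raise EOFError('unexpected end of file')
        buf += data
    return buf

def read_structs(path):
    results = {}
    for name in sorted(os.listdir(path)):
        if not name.endswith('.dat'):
            continue
        with open(os.path.join(path, name), 'rb') as f:
            f.seek(HEADER_SIZE)  # skip the header by bytes, not by lines
            results[name] = RECORD.unpack(read_exact(f, RECORD.size))
    return results
```

os.path.join also avoids the missing-separator problem of path + files when path has no trailing slash.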

Answer 4

I recently ran into the same problem, so I wrote a module for this task, stored here: http://pastebin.com/XJyZMyHX

Example output:

String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)

Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'
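The module itself is only available at the link above, so its API is not reproduced here. The underlying idea (unpacking a struct into a named tuple) can be sketched with the standard library alone; the field subset and the names CStruct, LAYOUT and unpack_named are hypothetical:

```python
import struct
from collections import namedtuple

# Hypothetical subset of the fields shown above; NOT the pastebin module's API.
CStruct = namedtuple('CStruct', 'u8 u16 i32 dbl')
LAYOUT = struct.Struct('<BHid')  # uint8, uint16, int32, double; '<' means no padding

def unpack_named(buf):
    return CStruct._make(LAYOUT.unpack(buf))

raw = LAYOUT.pack(1, 256, -65536, 3.01)
nt = unpack_named(raw)
print(nt)      # CStruct(u8=1, u16=256, i32=-65536, dbl=3.01)
print(nt.dbl)  # 3.01
```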


Answer 5

Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that describes the memory layout of your c-struct.

For example, here is some C++ code that defines a struct (the same approach should work for C structs, although I'm not a C expert):

#include <cstdint>
#include <fstream>
#include <string>

struct MyStruct {
    uint16_t FieldA;
    uint16_t pad16[3];
    uint32_t FieldB;
    uint32_t pad32[2];
    char     FieldC[4];
    uint64_t FieldD;
    uint64_t FieldE;
};

void write_struct(const std::string& fname, MyStruct h) {
    // This function serializes a MyStruct instance and
    // writes the binary data to disk.
    std::ofstream ofp(fname, std::ios::out | std::ios::binary);
    ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));

}

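Following advice I found at stackoverflow.com/a/5397638, I included some padding in the struct (the pad16 and pad32 fields) so that serialization happens in a more predictable way. I think this is a C++ thing; it may not be necessary with plain ol' C structs.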

Now, in Python, we create a numpy.dtype object that describes the memory layout of MyStruct:

import numpy as np

my_struct_dtype =  np.dtype([
    ("FieldA"            , np.uint16  ,       ),
    ("pad16"             , np.uint16  , (3,)  ),
    ("FieldB"            , np.uint32          ),
    ("pad32"             , np.uint32  , (2,)  ),
    ("FieldC"            , np.byte    , (4,)  ),
    ("FieldD"            , np.uint64          ),
    ("FieldE"            , np.uint64          ),
])
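If you would rather not add explicit pad fields, np.dtype accepts align=True, which inserts padding the way a C compiler would. A sketch for the struct { double v; int t; char c; } used in the other answers (save_dtype and packed are illustrative names):

```python
import numpy as np

# struct { double v; int t; char c; };
# align=True pads fields (and the end) like a C compiler would.
save_dtype = np.dtype([('v', np.float64),
                       ('t', np.int32),
                       ('c', 'S1')], align=True)

print(save_dtype.itemsize)                                  # 16
print([save_dtype.fields[n][1] for n in save_dtype.names])  # [0, 8, 12]

# Without align=True the fields are packed back to back.
packed = np.dtype([('v', np.float64), ('t', np.int32), ('c', 'S1')])
print(packed.itemsize)                                      # 13
```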

Then use numpy's fromfile to read the binary file in which you saved the c-struct:

# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]

FieldA         = struct_data["FieldA"]
FieldB         = struct_data["FieldB"]
FieldC         = struct_data["FieldC"]
FieldD         = struct_data["FieldD"]
FieldE         = struct_data["FieldE"]

if FieldA != expected_value_A:
    raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
    raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
    raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...

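The count=1 argument in the np.fromfile(..., count=1) call above makes the returned array contain only one element, which means "read the first struct instance from the file". Note that I index with [0] to take that element out of the array.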

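If you append the data from many c-structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, the default for both np.fromfile and np.frombuffer, means "read all the data", producing a 1-D array of shape (number_of_struct_instances,).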

You can also use the offset keyword argument of np.fromfile to control where in the file reading starts.
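A sketch of count and offset together: it writes a hypothetical file with an 8-byte header followed by five records, then reads it back (the offset keyword of np.fromfile requires NumPy 1.17 or later; rec_dtype and the file layout are illustrative):

```python
import os
import tempfile
import numpy as np

rec_dtype = np.dtype([('v', np.float64), ('t', np.int32), ('c', 'S1')], align=True)

# Hypothetical file layout: an 8-byte header, then 5 records.
records = np.zeros(5, dtype=rec_dtype)
records['v'] = np.arange(5) * 1.5
records['t'] = np.arange(5)
records['c'] = b's'
header = b'HDR\x00\x00\x00\x00\x00'  # 8 bytes

path = os.path.join(tempfile.mkdtemp(), 'records.bin')
with open(path, 'wb') as f:
    f.write(header)
    f.write(records.tobytes())

# offset skips the header; count=-1 (the default) reads every record after it
everything = np.fromfile(path, dtype=rec_dtype, offset=len(header))
# count=2 stops after two records
first_two = np.fromfile(path, dtype=rec_dtype, count=2, offset=len(header))

print(everything.shape, first_two.shape)  # (5,) (2,)
print(everything['t'])                    # [0 1 2 3 4]
```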

To sum up, here are some numpy functions that are useful once your custom dtype is defined:

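Reading binary data as a numpy array:
- np.frombuffer(bytes_data, dtype=...): interprets the given binary data (e.g. a python bytes instance) as a numpy array of the given dtype. You can define a custom dtype that describes the memory layout of your c struct.
- np.fromfile(filename, dtype=...): reads binary data from filename. It should give the same result as np.frombuffer(open(filename, "rb").read(), dtype=...).
Writing numpy arrays as binary data:
- ndarray.tobytes(): constructs a python bytes instance containing the raw data of the given numpy array. If the array's dtype corresponds to a c-struct, the bytes from ndarray.tobytes can be deserialized by c/c++ and interpreted as (an array of) instances of that c struct.
- ndarray.tofile(filename): writes the binary data of the array to filename. This data can then be deserialized by c/c++. For an array a, it is equivalent to open(filename, "wb").write(a.tobytes()).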

Reading a struct in Python from a struct created in C
