Question

Language: Python 3.6 / OS: macOS High Sierra / Environment: Xcode

I have a binary file that contains several different data types. I open it with:

fn=open(filePathname,mode='rb')

I define a dtype like this:

dt=np.dtype([('a','uint'),('b','uint'),('c','uint'),('d','uint'),('e','uint'),('f','uint'),('g',float),('h',np.float32)])

I then convert the binary file with np.fromfile():

numpy_data = np.fromfile(fn, dtype = dt)

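I expected an array holding the "actual" values, but what I get looks like raw bytes that have simply been given the corresponding types in the numpy_data array.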

Output

print(numpy_data['h'])

Result

[ 5.8315540e-39  6.0152250e-39  6.0582729e-39 ... -4.2051079e-07
  8.4560821e+17  3.0060693e-10]

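Question: I know these numbers are not correct; I think it is showing byte data. If so, what is the correct way to convert them to the "actual" values? I am using numpy for speed, and I would rather not use the struct.unpack() approach.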

Answer 1

Here is an example that uses your dtype, working from an in-memory bytestring instead of a file:

In [279]: dt=np.dtype([('a','uint'),('b','uint'),('c','uint'),('d','uint'),('e','uint'),('f','uint'),
     ...: ('g',float),('h',np.float32)])
In [280]: x = np.ones((3,),dt)
In [281]: x
Out[281]: 
array([(1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.),
       (1, 1, 1, 1, 1, 1, 1., 1.)],
      dtype=[('a', '<u8'), ('b', '<u8'), ('c', '<u8'), ('d', '<u8'), ('e', '<u8'), ('f', '<u8'), ('g', '<f8'), ('h', '<f4')])
In [282]: x.tostring()
Out[282]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x80?\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x80?\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x80?'
In [283]: np.frombuffer(x.tostring(), dtype=dt)
Out[283]: 
array([(1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.),
       (1, 1, 1, 1, 1, 1, 1., 1.)],
      dtype=[('a', '<u8'), ('b', '<u8'), ('c', '<u8'), ('d', '<u8'), ('e', '<u8'), ('f', '<u8'), ('g', '<f8'), ('h', '<f4')])

What is uint actually supposed to be? Could it be uint8, a 1-byte unsigned integer, rather than the full 8-byte version?

In [300]: dt1=np.dtype([('a','uint8'),('b','uint8'),('c','uint8'),('d','uint8'),('e','uint8'),
     ...: ('f','uint8'),('g',float),('h',np.float32)])
In [301]: y = np.ones((10,),dt1)

The itemsize of the two arrays is quite different:

In [302]: x.itemsize
Out[302]: 60
In [303]: y.itemsize
Out[303]: 18
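To see where those two sizes come from, it can help to spell the field widths out explicitly and look at the byte offsets. This is a small sketch; the names `dt_wide`, `dt_narrow` and `offsets` are mine, and the explicit '<u8' / 'u1' codes stand in for whatever 'uint' resolves to on a given platform.

```python
import numpy as np

fields = 'abcdef'

# Explicit widths: six 8-byte unsigned ints versus six 1-byte unsigned ints.
dt_wide = np.dtype([(name, '<u8') for name in fields] + [('g', '<f8'), ('h', '<f4')])
dt_narrow = np.dtype([(name, 'u1') for name in fields] + [('g', '<f8'), ('h', '<f4')])

print(dt_wide.itemsize)    # 6*8 + 8 + 4 bytes per record
print(dt_narrow.itemsize)  # 6*1 + 8 + 4 bytes per record

# dtype.fields maps each name to (field dtype, byte offset within the record).
offsets = {name: dt_wide.fields[name][1] for name in dt_wide.names}
print(offsets)
```

If the offsets do not line up with how the file was actually written, every field after the first wrong one will be decoded from the wrong bytes.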

But if the total size works out, a buffer written with one dtype can be read with the other (10 * 18 == 3 * 60). First, the matching read:

In [304]: np.frombuffer(y.tostring(), dtype=dt1)
Out[304]: 
array([(1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.),
       (1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.),
       (1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.),
       (1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.),
       (1, 1, 1, 1, 1, 1, 1., 1.), (1, 1, 1, 1, 1, 1, 1., 1.)],
      dtype=[('a', 'u1'), ('b', 'u1'), ('c', 'u1'), ('d', 'u1'), ('e', 'u1'), ('f', 'u1'), ('g', '<f8'), ('h', '<f4')])

And here is the same buffer read with the mismatched dtype:

In [305]: np.frombuffer(y.tostring(), dtype=dt)
Out[305]: 
array([(      1103823438081,    70300024700928,   72340172838092672, 4607182418800017408, 72340173886586880,                 257, 7.8598509e-304, 2.3694278e-38),
       (4607182418800017408, 72340173886586880,                 257,   72408888003018736,          16843009, 4575657222481117184, 5.4536124e-312, 0.0000000e+00),
       (  72408888003018736,          16843009, 4575657222481117184,       1103823438081,    70300024700928,   72340172838092672, 1.0000000e+000, 1.0000000e+00)],
      dtype=[('a', '<u8'), ('b', '<u8'), ('c', '<u8'), ('d', '<u8'), ('e', '<u8'), ('f', '<u8'), ('g', '<f8'), ('h', '<f4')])

With mismatched dtypes we would most likely get an error instead:

ValueError: buffer size must be a multiple of element size

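Still, an accidental size match despite a dtype mismatch could explain a fromfile read that runs but produces wrong values, especially ones that look wildly off.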
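One way to catch this early is to compare the byte count against the dtype's itemsize before reading. The sketch below is only illustrative: `records_in` is a hypothetical helper, and the two dtypes are the 8-byte and 1-byte variants from above written with explicit widths.

```python
import numpy as np

def records_in(n_bytes, dtype):
    """Return the number of whole records of `dtype` in n_bytes,
    or None when the byte count is not a multiple of the record size."""
    itemsize = np.dtype(dtype).itemsize
    if n_bytes % itemsize:
        return None
    return n_bytes // itemsize

fields = 'abcdef'
dt_wide = np.dtype([(name, '<u8') for name in fields] + [('g', '<f8'), ('h', '<f4')])
dt_narrow = np.dtype([(name, 'u1') for name in fields] + [('g', '<f8'), ('h', '<f4')])

# 180 bytes divides evenly under both layouts, so size alone cannot
# tell which dtype the data was actually written with.
print(records_in(180, dt_wide), records_in(180, dt_narrow))

# 36 bytes is two narrow records but not a whole number of wide ones.
print(records_in(36, dt_wide), records_in(36, dt_narrow))
```

For a file on disk, the byte count could come from os.path.getsize(filePathname). A passing check only says the sizes are compatible, not that the field layout is right.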

Based on your comment, I wrote:

In [347]: dt1=np.dtype([('a','u4'),('b','u4'),('c','u4'),('d','u4'),('e','u4'),('f','u4'),('g','f4')])

I am skipping h for now.

Now create an array with several records:

In [351]: x=np.ones((3,),dt1); x['g'][0]=10
In [352]: x
Out[352]: 
array([(1, 1, 1, 1, 1, 1, 10.), (1, 1, 1, 1, 1, 1,  1.),
       (1, 1, 1, 1, 1, 1,  1.)],
      dtype=[('a', '<u4'), ('b', '<u4'), ('c', '<u4'), ('d', '<u4'), ('e', '<u4'), ('f', '<u4'), ('g', '<f4')])

Serialize it, then load just one record with count=1:

In [353]: np.frombuffer(x.tostring(), count=1,dtype=dt1)
Out[353]: 
array([(1, 1, 1, 1, 1, 1, 10.)],
      dtype=[('a', '<u4'), ('b', '<u4'), ('c', '<u4'), ('d', '<u4'), ('e', '<u4'), ('f', '<u4'), ('g', '<f4')])

I suggest using this dt1 to load just one record, then checking whether the values look reasonable.
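Applied to a file rather than a bytestring, that single-record check might look like the following. This is a sketch under assumptions: the temporary file and the `sample` array are stand-ins I made up so the example runs on its own; with your data you would open your own file instead.

```python
import os
import tempfile
import numpy as np

dt1 = np.dtype([(name, '<u4') for name in 'abcdef'] + [('g', '<f4')])

# Stand-in for the real file: three records written to a temporary path.
sample = np.ones(3, dtype=dt1)
sample['g'][0] = 10
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
sample.tofile(path)

# Read only the first record and inspect it before trusting the dtype.
with open(path, 'rb') as fn:
    first = np.fromfile(fn, dtype=dt1, count=1)

print(first)
print(first['g'])
```

If the first record already shows implausible values, the dtype (field widths, order, or byte order) is the first thing to revisit.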

I skipped h because you described it as 'h' is a float array such that it is 4*g bytes long. If 'g' is a float, that length is not well defined; it should be some int type.

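If fn is still open at that point, then np.fromfile(fn, '<f4', count=n) might load that 'h' array. I would start with a small n just to see whether this looks promising, and only then try a larger value or the open-ended -1.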

In other words, your description sounds like the file contains a fixed-size header followed by a variable-size block of floats.
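Under that reading, a two-step load could be sketched as below. Everything about the layout here is an assumption for illustration: that the header is seven 4-byte unsigned ints, that 'g' is an integer count of the floats that follow, and that the floats are little-endian float32. The temporary file is a made-up stand-in so the example runs by itself.

```python
import os
import tempfile
import numpy as np

# Assumed header layout: seven uint32 fields, with 'g' holding the float count.
header_dt = np.dtype([(name, '<u4') for name in 'abcdefg'])

# Build a stand-in file: one header record followed by a block of float32 values.
payload = np.arange(5, dtype='<f4')
header = np.ones(1, dtype=header_dt)
header['g'] = payload.size
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
with open(path, 'wb') as out:
    header.tofile(out)
    payload.tofile(out)

with open(path, 'rb') as fn:
    # Step 1: read the fixed-size header.
    head = np.fromfile(fn, dtype=header_dt, count=1)
    n = int(head['g'][0])
    # Step 2: continue from the current file position and read n floats.
    h = np.fromfile(fn, dtype='<f4', count=n)

print(n, h)
```

The second fromfile call picks up where the first one stopped, which is why the file has to stay open between the two reads.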

Float array data from a binary file using np.fromfile()
