Question

I have an ASCII text file with the following contents (the format cannot be changed):

previous data...
#
# some comment
2 a -0.9989532219119496
1 b 1.8002219998623799
1 c 0.2681232137509927
# 
some other things...

I would like to read this file into an array with a custom dtype (a "structured array"). Everything works when the file is binary (remove the sep="\n" below), but it fails when the file is ASCII:

import numpy as np
import string

# Create some fake data
N = 3
dtype = np.dtype([("a", "i4"), ("b", "S8"), ("c", "f8")])
a = np.zeros(N, dtype)
a["a"] = np.random.randint(0, 3, N)
a["b"] = np.array([x for x in string.ascii_lowercase[:N]])
a["c"] = np.random.normal(size=(N,))

print(a)

a.tofile("test.dat", sep="\n")
b = np.fromfile("test.dat", dtype=dtype, sep="\n")

print(b)

ValueError: Unable to read character files of that array type

Any hints?

(The file also contains other data, so in real life I am using a file handle rather than a filename string, but I guess that does not matter much here.)

Answer 1

In [286]: txt = """previous data... 
     ...: # 
     ...: # some comment 
     ...: 2 a -0.9989532219119496 
     ...: 1 b 1.8002219998623799 
     ...: 1 c 0.2681232137509927 
     ...: #  
     ...: some other things...""".splitlines()

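Using the parameters mentioned in my comment: skip_header=1 skips the leading non-data line, lines starting with # are ignored by default, and max_rows=3 stops after the three data rows: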

In [289]: np.genfromtxt(txt, skip_header=1, max_rows=3, dtype=None, encoding=None)
Out[289]: 
array([(2, 'a', -0.99895322), (1, 'b',  1.800222  ),
       (1, 'c',  0.26812321)],
      dtype=[('f0', '<i8'), ('f1', '<U1'), ('f2', '<f8')])
In [290]: _.shape
Out[290]: (3,)
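
The same call should also work with an explicit structured dtype and a file-like object instead of a list of lines. The following is only a sketch under a few assumptions: io.StringIO stands in for the real file handle, the field names are taken from the question, and "U8" is used in place of the question's "S8" so that the letters come back as str rather than bytes.

```python
import io

import numpy as np

# Stand-in for the real file; in practice this would be an open file handle.
raw = io.StringIO(
    "previous data...\n"
    "#\n"
    "# some comment\n"
    "2 a -0.9989532219119496\n"
    "1 b 1.8002219998623799\n"
    "1 c 0.2681232137509927\n"
    "#\n"
    "some other things...\n"
)

# Field names follow the question. "U8" instead of "S8" is an assumption
# made here so that column "b" is read as str rather than bytes.
dtype = np.dtype([("a", "i4"), ("b", "U8"), ("c", "f8")])

data = np.genfromtxt(
    raw,
    dtype=dtype,
    comments="#",    # the default; lines starting with "#" are ignored
    skip_header=1,   # skip the "previous data..." line
    max_rows=3,      # stop after the three data rows
    encoding=None,
)

# Fields can be accessed by name, as with any structured array.
print(data["a"], data["b"], data["c"])
```

With dtype=None, as in the session above, genfromtxt guesses the column types and names the fields f0, f1, f2; passing the dtype explicitly keeps the field names and widths under your control.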

Reading a structured array from an ASCII file
