I have an ASCII text file (whose format I cannot change) with the following contents:
previous data...
#
# some comment
2 a -0.9989532219119496
1 b 1.8002219998623799
1 c 0.2681232137509927
#
some other things...
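I want to read it into an array with a custom dtype (a "structured array"). Everything works when the file is binary (i.e. with sep="\n" removed in the code below), but it fails when the file is ASCII: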
import numpy as np
import string
# Create some fake data
N = 3
dtype = np.dtype([("a", "i4"), ("b", "S8"), ("c", "f8")])
a = np.zeros(N, dtype)
a["a"] = np.random.randint(0, 3, N)
a["b"] = list(string.ascii_lowercase[:N])  # "a", "b", "c"
a["c"] = np.random.normal(size=(N,))
print(a)
# Text-mode round trip (non-empty sep); without sep, i.e. in binary mode, it works
a.tofile("test.dat", sep="\n")
b = np.fromfile("test.dat", dtype=dtype, sep="\n")  # raises the ValueError below
print(b)
ValueError: Unable to read character files of that array type
Any hints?
(The file also contains other data, so in real life I pass a file handle rather than a filename string, but I assume that doesn't matter much here.)
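The error is raised by np.fromfile's text mode (non-empty sep), which appears to handle only simple scalar dtypes. A minimal sketch, assuming a writable temporary directory and a hypothetical file name numbers.txt, that reads one numeric column fine in text mode but gets the same ValueError as soon as the structured dtype is used:

```python
import os
import tempfile

import numpy as np

structured = np.dtype([("a", "i4"), ("b", "S8"), ("c", "f8")])
error = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")  # hypothetical file name

    # Only the float column: text mode parses this with a plain "f8" dtype.
    with open(path, "w") as f:
        f.write("-0.9989532219119496\n1.8002219998623799\n0.2681232137509927")
    floats = np.fromfile(path, dtype="f8", sep="\n")
    print(floats)

    # The full rows with the structured dtype: rejected in text mode.
    with open(path, "w") as f:
        f.write("2 a -0.9989532219119496\n1 b 1.8002219998623799\n1 c 0.2681232137509927")
    try:
        np.fromfile(path, dtype=structured, sep="\n")
    except ValueError as exc:
        error = str(exc)
        print("ValueError:", error)
```

This is why text parsers such as np.genfromtxt or np.loadtxt, rather than np.fromfile, are the usual route for mixed-type ASCII rows.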
Answer (score: 1)
In [286]: txt = """previous data...
...: #
...: # some comment
...: 2 a -0.9989532219119496
...: 1 b 1.8002219998623799
...: 1 c 0.2681232137509927
...: #
...: some other things...""".splitlines()
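Using the parameters mentioned in my comment (skip_header=1 skips the "previous data..." line, the "#" lines are dropped because genfromtxt treats "#" as a comment marker by default, and max_rows=3 stops reading after the three data rows):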
In [289]: np.genfromtxt(txt, skip_header=1, max_rows=3, dtype=None, encoding=None)
Out[289]:
array([(2, 'a', -0.99895322), (1, 'b', 1.800222 ),
(1, 'c', 0.26812321)],
dtype=[('f0', '<i8'), ('f1', '<U1'), ('f2', '<f8')])
In [290]: _.shape
Out[290]: (3,)
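Since the question uses its own structured dtype and an open file handle, here is a sketch of the same call with both, assuming io.StringIO is an acceptable stand-in for the real text-mode file handle. The dtype is passed directly instead of dtype=None, so the fields keep the names a, b, c rather than f0, f1, f2:

```python
import io

import numpy as np

# Stand-in for the already opened text-mode file handle from the question.
fh = io.StringIO(
    "previous data...\n"
    "#\n"
    "# some comment\n"
    "2 a -0.9989532219119496\n"
    "1 b 1.8002219998623799\n"
    "1 c 0.2681232137509927\n"
    "#\n"
    "some other things...\n"
)

dtype = np.dtype([("a", "i4"), ("b", "S8"), ("c", "f8")])

# Same parameters as above, but with the custom structured dtype.
b = np.genfromtxt(fh, dtype=dtype, skip_header=1, max_rows=3, encoding=None)
print(b)
print(b["a"], b["b"], b["c"])
```

With the "S8" field the strings come back as bytes (b'a', b'b', b'c'); using "U8" instead would give str values.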