Reading a structured array from an ASCII file

Time: 2019-10-21 13:00:14

Tags: python arrays numpy io

I have an ASCII text file (whose format cannot be changed) with the following contents:

previous data...
#
# some comment
2 a -0.9989532219119496
1 b 1.8002219998623799
1 c 0.2681232137509927
# 
some other things...

I want to read this file into an array with a custom dtype (a "structured array"). Everything works when the file is binary (remove the sep="\n" below), but it fails when the file is ASCII:

import numpy as np
import string

# Create some fake data
N = 3
dtype = np.dtype([("a", "i4"), ("b", "S8"), ("c", "f8")])
a = np.zeros(N, dtype)
a["a"] = np.random.randint(0, 3, N)
a["b"] = np.array([x for x in string.ascii_lowercase[:N]])
a["c"] = np.random.normal(size=(N,))

print(a)

a.tofile("test.dat", sep="\n")
b = np.fromfile("test.dat", dtype=dtype, sep="\n")

print(b)

The np.fromfile call fails with:

ValueError: Unable to read character files of that array type

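Any hints?

(The file also contains other data, so in real life I use a file handle rather than a filename string, but I guess that does not matter much here.)

1 Answer:

Answer 0: (score: 1)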

In [286]: txt = """previous data... 
     ...: # 
     ...: # some comment 
     ...: 2 a -0.9989532219119496 
     ...: 1 b 1.8002219998623799 
     ...: 1 c 0.2681232137509927 
     ...: #  
     ...: some other things...""".splitlines()  

Using the parameters described in my comment:

In [289]: np.genfromtxt(txt, skip_header=1, max_rows=3, dtype=None, encoding=None)                                                                    
Out[289]: 
array([(2, 'a', -0.99895322), (1, 'b',  1.800222  ),
       (1, 'c',  0.26812321)],
      dtype=[('f0', '<i8'), ('f1', '<U1'), ('f2', '<f8')])
In [290]: _.shape                                                               
Out[290]: (3,)
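
The session above lets genfromtxt infer the dtype, so the fields come back as f0, f1, f2. If the field names from the question are wanted and the data is read through a file handle, something along these lines may work. This is only a sketch under a few assumptions: the temporary file path is made up for the example, "U8" is used in place of the question's "S8" so that the string column is returned as str rather than bytes, and the header and row counts (skip_header=1, max_rows=3) are specific to the sample shown in the question.

```python
import os
import tempfile

import numpy as np

# Sample contents copied from the question.
text = """previous data...
#
# some comment
2 a -0.9989532219119496
1 b 1.8002219998623799
1 c 0.2681232137509927
#
some other things...
"""

# Field names follow the question's dtype; "U8" replaces "S8" here
# (an assumption made so the column holds str instead of bytes).
dtype = np.dtype([("a", "i4"), ("b", "U8"), ("c", "f8")])

# Hypothetical temporary file, written only so that a real file handle
# can be passed to genfromtxt.
path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)

with open(path, "r", encoding="utf-8") as fh:
    # skip_header=1 drops "previous data...". Text after "#" is treated
    # as a comment, so the "#" lines become empty and are skipped.
    # max_rows=3 stops reading before the trailing "some other things...".
    result = np.genfromtxt(fh, dtype=dtype, skip_header=1, max_rows=3,
                           encoding="utf-8")

print(result)
print(result.dtype)
```

The fields can then be accessed by name, for example result["a"] or result["c"]. If the number of data rows is not known in advance, max_rows would have to be determined some other way, which this sketch does not attempt.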