I have a text file that contains, among other data, data in the following format:
215
1 0.0 0.0 0.0
[...]
9 -0.4330127018930699 0.2499999999985268 1.0
10 -0.1366025403783193 -0.03660254037890862 1.0
11 -0.2499999999985268 -0.4330127018930699 1.0
12 0.03660254037890862 -0.1366025403783193 1.0
13 0.4330127018930699 -0.2499999999985268 1.0
14 0.1366025403783193 0.03660254037890862 1.0
15 0.2499999999985268 0.4330127018930699 1.0
[...]
215 1.0 1.0 1.0
[...] # some more data, other format
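That is, the first line gives the number of entries, and each following line holds an integer index and three floats.
I want to convert this data into a numpy array. Since the most convenient way to access the file is line by line through a generator, numpy.fromiter() would come in handy. However, I cannot get the dtype specification right. This attempt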
from itertools import islice
import numpy

with open(filename) as f:
    line = next(islice(f, 1))
    num_nodes = int(line)
    points = numpy.fromiter(
        islice(f, num_nodes),
        dtype=[('idx', int, 1), ('vals', float, 3)],
        count=num_nodes
        )
does not work. Any hints?
Answer 0 (score: 0)
This script:
import numpy as np
txt = b"""7
9 -0.4330127018930699 0.2499999999985268 1.0
10 -0.1366025403783193 -0.03660254037890862 1.0
11 -0.2499999999985268 -0.4330127018930699 1.0
12 0.03660254037890862 -0.1366025403783193 1.0
13 0.4330127018930699 -0.2499999999985268 1.0
14 0.1366025403783193 0.03660254037890862 1.0
15 0.2499999999985268 0.4330127018930699 1.0
[...] # some more data, other format
"""
dt = np.dtype([('idx', int, 1), ('vals', float, 3)])
#dt = np.dtype('i,f,f,f')
print(dt)
def gentxt(txt, dt):
    f = txt.splitlines()
    line = f[0]
    num_nodes = int(line)
    aslice = slice(1,num_nodes+1)
    # print(f[aslice])
    points = np.genfromtxt(
        f[aslice],
        dtype=dt)
    return points
M = gentxt(txt,dt)
print(repr(M))
produces
1304:~/mypy$ python3 stack33406545.py
[('idx', '<i4'), ('vals', '<f8', (3,))]
array([(9, [-0.4330127018930699, 0.2499999999985268, 1.0]),
       (10, [-0.1366025403783193, -0.03660254037890862, 1.0]),
       (11, [-0.2499999999985268, -0.4330127018930699, 1.0]),
       (12, [0.03660254037890862, -0.1366025403783193, 1.0]),
       (13, [0.4330127018930699, -0.2499999999985268, 1.0]),
       (14, [0.1366025403783193, 0.03660254037890862, 1.0]),
       (15, [0.2499999999985268, 0.4330127018930699, 1.0])],
      dtype=[('idx', '<i4'), ('vals', '<f8', (3,))])
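I used a simple list slice of the text lines. I tried to use islice as you did, but decided it was not worth my time to get it right. The central point is to use an iterable that yields the desired lines of text. It does not matter whether that is a list, a sequence of file lines, or the output of a generator.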
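For what it is worth, here is a minimal sketch of the islice variant, passing the sliced file iterator straight to genfromtxt. The io.StringIO object and its contents are stand-ins I made up for the real file, and I wrote the dtype as ('idx', int) rather than ('idx', int, 1). I have not tried it against the original file.

```python
import io
from itertools import islice

import numpy as np

# Stand-in for the real file; the values are made up for illustration.
text = """3
1 0.0 0.0 0.0
2 0.5 0.25 1.0
3 1.0 1.0 1.0
[...] # some more data, other format
"""

dt = np.dtype([('idx', int), ('vals', float, 3)])

with io.StringIO(text) as f:
    # The first line holds the number of data rows.
    num_nodes = int(next(f))
    # islice yields only the next num_nodes lines, so the trailing
    # data in the other format is never handed to genfromtxt.
    points = np.genfromtxt(islice(f, num_nodes), dtype=dt)

print(repr(points))
```

With a real file, open(filename) would take the place of io.StringIO(text); the rest should stay the same.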
fromiter is picky about what it accepts. It has to produce a 1d array;
a list or iterable that yields single strings (each convertible to a simple dtype) works:
In [233]: np.fromiter(['1', '2', '3', '4'],dtype=int)
Out[233]: array([1, 2, 3, 4])
But a list of lists (2d) does not:
In [234]: np.fromiter([['1', '2'],['3', '4']],dtype=int)
....
ValueError: setting an array element with a sequence.
With a compound dtype I have to give it tuples:
In [236]: np.fromiter([('1', '2'),('3', '4')],dtype=np.dtype('i,i'))
Out[236]:
array([(1, 2), (3, 4)], dtype=[('f0', '<i4'), ('f1', '<i4')])
Strings or tuples that contain several numbers do not work, e.g. ['1 2','3 4'] or [('1 2',),('3 4',)]. When handling text with rows and columns (csv-like), genfromtxt is much better.
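If you still want fromiter, a possible workaround is to split and convert each line yourself, so that the iterable yields one nested tuple per record instead of a raw string. This is only a sketch: parse_line is a hypothetical helper, the sample text is made up, and the dtype uses ('idx', int) in place of ('idx', int, 1).

```python
import io
from itertools import islice

import numpy as np

dt = np.dtype([('idx', int), ('vals', float, 3)])

def parse_line(line):
    """Hypothetical helper: turn 'i x y z' into (i, (x, y, z)) to match dt."""
    parts = line.split()
    return int(parts[0]), tuple(float(p) for p in parts[1:4])

# Stand-in for the real file; the values are made up for illustration.
text = "2\n9 -0.5 0.25 1.0\n10 0.125 -0.75 1.0\n[...] # some more data\n"

with io.StringIO(text) as f:
    num_nodes = int(next(f))
    # Each item is a tuple with one entry per field of dt; the 'vals'
    # entry is itself a 3-tuple that fills the subarray field.
    rows = (parse_line(line) for line in islice(f, num_nodes))
    points = np.fromiter(rows, dtype=dt, count=num_nodes)

print(repr(points))
```

This keeps the line-by-line access from the question, but the parsing is now your own responsibility, which is the work genfromtxt otherwise does for you.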