This is similar to "How to convert an array of strings to an array of floats in numpy".
I have a list of strings:
dat = [
' 1 2 1.040000e+005 0.030000\n',
' 2 7 0.000000e+000 0.030000\n',
' 3 15 0.000000e+000 0.030000\n',
]
Here are my failed attempts at creating a numpy record array:
import numpy as np
dat_dtype = [
('I', 'i'),
('J', 'i'),
('val1', 'd'),
('val2', 'd'),
]
# Attempt 1
np.array(dat, dat_dtype)
# looks like garbage
# Attempt 2
np.array([x.split() for x in dat], dtype=dat_dtype)
# looks like different garbage
# Attempt 3
string_ndarray = np.array([x.split() for x in dat], dtype='|S15')
# looks good so far
string_ndarray.astype(dat_dtype)
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.040000e+005'
I gave up. This is the only way I could get the expected output:
dat_ndarray = np.zeros(len(dat), dat_dtype)
for i, line in enumerate(dat):
dat_ndarray[i] = tuple(line.split())
print(dat_ndarray) # [(1, 2, 104000.0, 0.03) (2, 7, 0.0, 0.03) (3, 15, 0.0, 0.03)]
Is there a more direct way to get the expected record array?
Answer 0 (score: 1)
Your input consists of lines of text, so you can use a text reader to convert it to an array (structured or plain). Here is one way to do that with numpy.genfromtxt:
np.genfromtxt(dat, dtype=dat_dtype)
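For example (in this session the value fields were declared as 'f', i.e. float32, which is why 0.03 is displayed as 0.029999999329447746):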
In [204]: dat
Out[204]:
[' 1 2 1.040000e+005 0.030000\n',
' 2 7 0.000000e+000 0.030000\n',
' 3 15 0.000000e+000 0.030000\n']
In [205]: dat_dtype
Out[205]: [('I', 'i'), ('J', 'i'), ('val1', 'f'), ('val2', 'f')]
In [206]: np.genfromtxt(dat, dtype=dat_dtype)
Out[206]:
array([(1, 2, 104000.0, 0.029999999329447746), (2, 7, 0.0, 0.029999999329447746), (3, 15, 0.0, 0.029999999329447746)],
dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f4'), ('val2', '<f4')])
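For reference, here is a self-contained sketch of the same call that can be pasted into a fresh session. It assumes the question's original dtype with 'd' (float64) value fields rather than the 'f' fields shown in the transcript above, and it assumes a NumPy version in which genfromtxt accepts a list of str lines. The variable name arr is just for illustration.

```python
import numpy as np

# Sample lines copied from the question.
dat = [
    ' 1 2 1.040000e+005 0.030000\n',
    ' 2 7 0.000000e+000 0.030000\n',
    ' 3 15 0.000000e+000 0.030000\n',
]

# The question's dtype: two integer fields and two float64 fields.
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# genfromtxt splits each line on whitespace by default and converts
# every column according to the matching field of the dtype.
arr = np.genfromtxt(dat, dtype=dat_dtype)

print(arr.dtype.names)  # field names taken from dat_dtype
print(arr['J'])         # one column, accessed by field name
print(arr['val1'])
```

With float64 fields, the val2 column compares equal to the Python float 0.03, so the long decimal expansion seen with 'f' does not appear.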
Answer 1 (score: 1)
With dat and dat_dtype as defined in the question, this works:
In [667]: np.array([tuple(x.strip().split()) for x in dat],dtype=dat_dtype)
Out[667]:
array([(1, 2, 104000.0, 0.03), (2, 7, 0.0, 0.03), (3, 15, 0.0, 0.03)],
dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f8'), ('val2', '<f8')])
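Structured arrays are best created from a list of tuples. I stripped the \n, split each line on whitespace, and then formed a tuple from the pieces: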
In [668]: [tuple(x.strip().split()) for x in dat]
Out[668]:
[('1', '2', '1.040000e+005', '0.030000'),
('2', '7', '0.000000e+000', '0.030000'),
('3', '15', '0.000000e+000', '0.030000')]
I let dat_dtype take care of converting the strings to numbers.
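If you would rather not rely on the dtype to parse the strings, a variant is to convert each field explicitly before building the array. This is only a sketch of that idea: parse_line is a hypothetical helper name, not part of NumPy, and it assumes every line has exactly four whitespace-separated fields.

```python
import numpy as np

# Sample lines and dtype copied from the question.
dat = [
    ' 1 2 1.040000e+005 0.030000\n',
    ' 2 7 0.000000e+000 0.030000\n',
    ' 3 15 0.000000e+000 0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]


def parse_line(line):
    """Hypothetical helper: turn one text line into an (int, int, float, float) tuple."""
    # str.split() with no arguments also discards the leading space
    # and the trailing newline, so no separate strip() is needed.
    i, j, val1, val2 = line.split()
    return int(i), int(j), float(val1), float(val2)


# Each row must be a tuple (not a list) so that NumPy treats it as one record.
arr = np.array([parse_line(line) for line in dat], dtype=dat_dtype)

print(arr)
print(arr['val1'])
```

The explicit version is longer, but a malformed line then raises an error inside parse_line, which makes it easier to see which field failed to convert.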