How to convert an array of strings to a record array?

Date: 2015-08-26 03:40:06

Tags: python arrays numpy

This is similar to "How to convert an array of strings to an array of floats in numpy".

I have a list of strings:

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]

Here are my failed attempts at creating a numpy record array:

import numpy as np
dat_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('val1', 'd'),
    ('val2', 'd'),
]

# Attempt 1
np.array(dat, dat_dtype)
# looks like garbage

# Attempt 2
np.array([x.split() for x in dat], dtype=dat_dtype)
# looks like different garbage

# Attempt 3
string_ndarray = np.array([x.split() for x in dat], dtype='|S15')
# looks good so far
string_ndarray.astype(dat_dtype)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.040000e+005'
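Attempt 3 most likely fails because of how NumPy casts a plain (unstructured) array to a structured dtype: each element is assigned to every field of the record. string_ndarray has shape (3, 4), so a string such as '1.040000e+005' is also pushed into the integer fields I and J, which raises the ValueError. A minimal sketch of a column-wise variant, assuming the same dat and dat_dtype as above, casts each string column only to its own field type:

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# 3x4 array of strings: one row per line, one column per field.
fields = np.array([line.split() for line in dat])

# Allocate the record array, then fill it one field at a time so that
# each string column is cast only to that field's own type.
out = np.empty(len(dat), dtype=dat_dtype)
for col, name in enumerate(out.dtype.names):
    out[name] = fields[:, col].astype(out.dtype[name])

print(out)
```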

I gave up. This is the only way I could get the expected output:

dat_ndarray = np.zeros(len(dat), dat_dtype)
for i, line in enumerate(dat):
    dat_ndarray[i] = tuple(line.split())

print(dat_ndarray)  # [(1, 2, 104000.0, 0.03) (2, 7, 0.0, 0.03) (3, 15, 0.0, 0.03)]

Is there a more direct way to get the expected record array?

2 answers:

Answer 0 (score: 1)

Your input is lines of text, so you can use a text reader to convert it into an array (structured or plain). Here is one way to do it with numpy.genfromtxt:

np.genfromtxt(dat, dtype=dat_dtype)

For example:

In [204]: dat
Out[204]: 
['  1  2  1.040000e+005  0.030000\n',
 '  2  7  0.000000e+000  0.030000\n',
 '  3  15  0.000000e+000  0.030000\n']

In [205]: dat_dtype
Out[205]: [('I', 'i'), ('J', 'i'), ('val1', 'f'), ('val2', 'f')]

In [206]: np.genfromtxt(dat, dtype=dat_dtype)
Out[206]: 
array([(1, 2, 104000.0, 0.029999999329447746), (2, 7, 0.0, 0.029999999329447746), (3, 15, 0.0, 0.029999999329447746)], 
      dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f4'), ('val2', '<f4')])
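Note that the session above defines val1 and val2 as 'f' (float32), which is why 0.03 shows up as 0.029999999329447746. Below is a self-contained sketch using the question's 'd' (float64) fields. It also shows np.loadtxt, which, like genfromtxt, accepts a list of strings in place of a file. Passing encoding=None is my own addition, not part of the original answer; it is meant to avoid the deprecation warning some NumPy 1.x releases emit for str input.

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# genfromtxt splits each line on whitespace (the default delimiter) and
# converts each column according to the matching field of dat_dtype.
a = np.genfromtxt(dat, dtype=dat_dtype, encoding=None)

# loadtxt gives the same result for clean, complete data like this;
# genfromtxt is the more forgiving choice when values can be missing.
b = np.loadtxt(dat, dtype=dat_dtype, encoding=None)

print(a)
print(b)
```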

Answer 1 (score: 1)

With dat and dat_dtype as defined in the question, this works:

In [667]: np.array([tuple(x.strip().split()) for x in dat],dtype=dat_dtype)
Out[667]: 
array([(1, 2, 104000.0, 0.03), (2, 7, 0.0, 0.03), (3, 15, 0.0, 0.03)], 
  dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f8'), ('val2', '<f8')])

Structured arrays are best created from a list of tuples. I stripped off the \n, split each line on whitespace, and then formed the tuples:

In [668]: [tuple(x.strip().split()) for x in dat]
Out[668]: 
[('1', '2', '1.040000e+005', '0.030000'),
 ('2', '7', '0.000000e+000', '0.030000'),
 ('3', '15', '0.000000e+000', '0.030000')]

I let dat_dtype take care of converting the strings to numbers.
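If you would rather not rely on NumPy to parse the strings, a variant of the same list-of-tuples approach converts each token in Python first. The converters tuple below is a hypothetical helper of my own, not part of the answer, and it has to list the converters in the same order as the fields of dat_dtype.

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# Hypothetical helper: one Python converter per field, in dat_dtype order.
converters = (int, int, float, float)

# split() with no argument already drops the surrounding spaces and the
# trailing newline, so an explicit strip() is not needed here.
rows = [tuple(conv(tok) for conv, tok in zip(converters, line.split()))
        for line in dat]

arr = np.array(rows, dtype=dat_dtype)
print(arr)
```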