I want to create a numpy array from a list of lists. The dtype should be float, float, string. Why doesn't this work? (Note: I have already read this question.)
import numpy
print numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='f,f,str')
Output:
[[(4.2245014868923476e-39, 7.006492321624085e-44, '')
(4.2245014868923476e-39, 7.146622168056567e-44, '')
(9.275530846997402e-39, 9.918384925297198e-39, '')]
[(4.2245014868923476e-39, 7.286752014489049e-44, '')
(4.2245014868923476e-39, 7.42688186092153e-44, '')
(9.642872831629367e-39, 0.0, '')]]
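Answer 0 (score: 2)
As my earlier answer and comments stressed, the normal input for a compound dtype is a list of tuples. Put plainly, that is how np.array works. With a list of lists, the call fails (here on Python 3):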
In [308]: numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='f,f,str')
TypeError: a bytes-like object is required, not 'str'
Using a list of tuples and an improved dtype:
In [311]: numpy.array([(u'1.2', u'1.3', u'hello'), (u'1.4', u'1.5', u'hi')], dtype='f8,f8,U10')
Out[311]:
array([( 1.2, 1.3, 'hello'), ( 1.4, 1.5, 'hi')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
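To illustrate what the structured array gives you, here is a minimal sketch that builds the same data with explicit field names and reads individual fields back. The names x, y and label are my own choice for illustration; the session above uses the default names f0, f1, f2.

```python
import numpy as np

# Field names x, y, label are illustrative assumptions, not from the question.
dt = np.dtype([('x', 'f8'), ('y', 'f8'), ('label', 'U10')])

# One tuple per record; numpy converts the numeric strings to float64.
rows = [('1.2', '1.3', 'hello'), ('1.4', '1.5', 'hi')]
arr = np.array(rows, dtype=dt)

xs = arr['x']          # 1-D float64 array holding the first field
labels = arr['label']  # 1-D array of strings
print(arr.shape, xs.dtype, labels.tolist())
```

Indexing by field name returns a 1-D array per field, so the result has shape (2,) rather than (2, 3).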
A possible way around the usual list of tuples (I can't test it right now):
Make a zeros array of the right shape and dtype
Make an object array from the list of lists (or a 2d array of strings)
Assign columns of the 2d array to fields of the structured (a loop)
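Looping over a few fields is usually faster than looping over many records.
However, converting a list of lists to a list of tuples shouldn't be that expensive.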
In [314]: alist = [[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']]
In [316]: dt = np.dtype('f8,f8,U10')
Setting up with a list of tuples:
In [317]: np.array([tuple(a) for a in alist], dtype=dt)
Out[317]:
array([( 1.2, 1.3, 'hello'), ( 1.4, 1.5, 'hi')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
Setting by field:
In [319]: res = np.zeros(len(alist), dtype=dt)
In [320]: temp = np.array(alist)
In [321]: temp # default string dtype
Out[321]:
array([['1.2', '1.3', 'hello'],
['1.4', '1.5', 'hi']],
dtype='<U5')
In [322]: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...:
In [323]: res
Out[323]:
array([( 1.2, 1.3, 'hello'), ( 1.4, 1.5, 'hi')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
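For use outside an interactive session, the field-by-field assignment can be wrapped in a function. This is only a sketch: fill_by_field is a hypothetical helper name, and it assumes a non-empty list in which every row has one entry per field.

```python
import numpy as np

def fill_by_field(rows, dt):
    """Hypothetical helper: build a structured array one field at a time."""
    res = np.zeros(len(rows), dtype=dt)
    temp = np.array(rows)            # 2-D array of strings
    for i, name in enumerate(dt.names):
        res[name] = temp[:, i]       # numpy casts the strings to the field dtype
    return res

alist = [['1.2', '1.3', 'hello'], ['1.4', '1.5', 'hi']]
dt = np.dtype('f8,f8,U10')

by_field = fill_by_field(alist, dt)
by_tuple = np.array([tuple(a) for a in alist], dtype=dt)

# Both routes should produce the same records.
same = bool((by_field == by_tuple).all())
print(same)
```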
For this small case the list-of-tuples approach is faster. With a longer input the field approach might win, but that has to be tested:
In [325]: timeit np.array([tuple(a) for a in alist], dtype=dt)
6.26 µs ± 6.28 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [326]: %%timeit
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: temp = np.array(alist)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...:
18.2 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
But even with many rows, the tuple conversion is faster:
In [334]: arr = np.random.randint(0,100,(100000,3)).astype('U10')
In [335]: alist = arr.tolist()
In [336]: timeit np.array([tuple(a) for a in alist], dtype=dt)
93.5 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [337]: %%timeit
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: temp = np.array(alist)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...:
124 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Pulling the tuple comprehension out of the timed loop saves some time:
In [341]: %%timeit temp = [tuple(a) for a in alist]
     ...: np.array(temp, dtype=dt)
     ...:
65.4 ms ± 98.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Pulling the string array creation out of the timing:
In [342]: %%timeit temp = np.array(alist)
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...:
71 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Simply creating the string array from the list is more expensive than the tuple conversion.
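The same comparison can be run as a plain script with the standard timeit module instead of the IPython magics. This is a rough sketch: the function names via_tuples and via_fields are my own, the row count is reduced to keep the run short, and the timings will differ by machine and numpy version.

```python
import timeit
import numpy as np

dt = np.dtype('f8,f8,U10')
# Random integer strings, as in the session above, but with fewer rows.
alist = np.random.randint(0, 100, (10000, 3)).astype('U10').tolist()

def via_tuples():
    # Convert each inner list to a tuple, then build the structured array.
    return np.array([tuple(a) for a in alist], dtype=dt)

def via_fields():
    # Build a 2-D string array, then copy each column into its field.
    res = np.zeros(len(alist), dtype=dt)
    temp = np.array(alist)
    for i, n in enumerate(dt.names):
        res[n] = temp[:, i]
    return res

t_tuples = timeit.timeit(via_tuples, number=5)
t_fields = timeit.timeit(via_fields, number=5)
print(f"tuples: {t_tuples:.4f} s, fields: {t_fields:.4f} s")
```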
Answer 1 (score: 0)
As I explained in this post, it works with dtype='object':
print(numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='object'))
(Works with Python 3.7.1.)
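Note that an object array is not the same thing as the float, float, string layout asked for. The sketch below, using variable names of my own, shows that the result is an ordinary 2-D array whose elements are still Python strings, so the numeric columns have to be converted in a separate step.

```python
import numpy as np

rows = [['1.2', '1.3', 'hello'], ['1.4', '1.5', 'hi']]
obj = np.array(rows, dtype=object)

print(obj.shape)          # a plain 2-D array, no named fields
print(type(obj[0, 0]))    # elements are still Python str objects

# Converting the numeric columns is a separate, explicit step.
nums = obj[:, :2].astype(float)
print(nums.dtype)
```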