I have an array like this:
array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
('6601', 2.2452745388799898e-27, 0.99999999995270605),
('21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
('45164194', 1.0413482803123399e-24, 0.99999999997453404),
('45164198', 1.09470356446595e-24, 0.99999999997635303),
('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
I would like to convert it to the following (adding the prefix '2R:' to each value in the first column):
array([('2R:6506', 4.6725971801473496e-25, 0.99999999995088695),
('2R:6601', 2.2452745388799898e-27, 0.99999999995270605),
('2R:21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
('2R:45164194', 1.0413482803123399e-24, 0.99999999997453404),
('2R:45164198', 1.09470356446595e-24, 0.99999999997635303),
('2R:45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
I looked into nditer, but I want to support earlier versions of numpy. I have also read that iteration should be avoided.
Answer 0 (score: 5)
Use numpy.core.defchararray.add:
>>> from numpy import array
>>> from numpy.core.defchararray import add
>>>
>>> xs = array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
... ('6601', 2.2452745388799898e-27, 0.99999999995270605),
... ('21801', 1.9849650921836601e-31, 0.99999999997999001),
... ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
... ('45164198', 1.09470356446595e-24, 0.99999999997635303),
... ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
... dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
>>> xs['pos'] = add('2R:', xs['pos'])
>>> xs
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
('2R:6601', 2.24527453887999e-27, 0.999999999952706),
('2R:21801', 1.98496509218366e-31, 0.99999999997999),
('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
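As a hedged aside: the same function is also exposed under the public name np.char.add, and on newer NumPy releases the numpy.core import path may emit a deprecation warning. The sketch below assumes Python 3, where an 'S' field holds bytes, so the prefix is written as a bytes literal. The shortened two-row array is only for illustration.

```python
import numpy as np

# Two rows from the question; the 'pos' values are given as bytes
# because the field dtype is 'S100' (byte strings).
xs = np.array([(b'6506', 4.6725971801473496e-25, 0.99999999995088695),
               (b'6601', 2.2452745388799898e-27, 0.99999999995270605)],
              dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# np.char.add concatenates element-wise; the scalar prefix is broadcast
# against every entry of the 'pos' column.
xs['pos'] = np.char.add(b'2R:', xs['pos'])
print(xs['pos'])
```

The result of the concatenation is up to 103 bytes wide; writing it back into the 'S100' field is safe here only because the actual values are far shorter than 100 bytes.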
Answer 1 (score: 2)
A simple (though possibly not optimal) solution is:
import numpy as np
a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
('6601', 2.2452745388799898e-27, 0.99999999995270605),
('21801', 1.9849650921836601e-31, 0.99999999997999001),
('45164194', 1.0413482803123399e-24, 0.99999999997453404),
('45164198', 1.09470356446595e-24, 0.99999999997635303),
('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
In [11]: a
Out[11]:
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
('2R:6601', 2.24527453887999e-27, 0.999999999952706),
('2R:21801', 1.98496509218366e-31, 0.99999999997999),
('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
While I like @fattru's answer, which uses core numpy routines, the list comprehension surprisingly seems to be somewhat faster:
In [19]: a = np.empty(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
In [20]: %timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
100 loops, best of 3: 11.1 ms per loop
In [21]: %timeit a['pos'] = add('2R:', a['pos'])
100 loops, best of 3: 15.7 ms per loop
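Definitely benchmark on your own use case and hardware to see which makes more sense for your actual application. One thing I have learned is that, in some cases, basic Python constructs can outperform numpy built-ins, depending on the task at hand.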
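For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. The helper names (make_array, with_join, with_plus, with_char_add) are made up for this example, and it assumes Python 3 with bytes values in the 'S100' field. Unlike the %timeit runs above, each call works on a copy, so the prefix does not pile up across repetitions.

```python
import timeit
import numpy as np

DTYPE = [('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')]

def make_array(n=20000):
    # Hypothetical helper: fills 'pos' with the decimal strings 0..n-1.
    a = np.zeros(n, dtype=DTYPE)
    a['pos'] = [str(i).encode('ascii') for i in range(n)]
    return a

def with_join(a):
    out = a.copy()
    out['pos'] = [b''.join((b'2R:', x)) for x in a['pos']]
    return out

def with_plus(a):
    out = a.copy()
    out['pos'] = [b'2R:' + x for x in a['pos']]
    return out

def with_char_add(a):
    out = a.copy()
    out['pos'] = np.char.add(b'2R:', a['pos'])
    return out

base = make_array()
for fn in (with_join, with_plus, with_char_add):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(base), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

The numbers will differ between machines, Python versions and NumPy versions, so treat any ranking as specific to your own setup.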
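Answer 2 (score: 0)
Another, even faster solution is to use a list comprehension with the + operator. I don't understand why it is faster, though. But it is definitely very elegant and basic.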
a['pos'] = ["2R:" + x for x in a['pos']]
Timings:
%timeit a['pos'] = ["2R:" + x for x in a['pos']]
8.07 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
9.53 ms ± 391 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a['pos'] = add('2R:', a['pos'])
14.2 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
PS: I created the array a with a slightly different definition:
a = np.empty(20000, dtype=[('pos', 'U5'), ('par1', '<f8'), ('par2', '<f8')])
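because when I use the type Sxxx for pos, the concatenation raises a type error for me (presumably because on Python 3 the elements of an S field are bytes, which cannot be concatenated with the str prefix "2R:").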
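A small sketch of what seems to be going on, assuming Python 3: with an 'S' field the prefix has to be bytes, and with any fixed-width field (such as 'U5') values that no longer fit are silently truncated on assignment. The variable names here are made up for the example.

```python
import numpy as np

a = np.array([(b'6506', 1.0, 2.0), (b'6601', 3.0, 4.0)],
             dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# A str prefix cannot be concatenated with the bytes stored in an 'S' field.
try:
    ["2R:" + x for x in a['pos']]
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True

# A bytes prefix works with the original 'S100' dtype.
a['pos'] = [b"2R:" + x for x in a['pos']]

# Caveat for narrow fields: 'U5' keeps only the first 5 characters.
narrow = np.array(['6506'], dtype='U5')
narrow[:] = ["2R:" + x for x in narrow]

print(mixed_types_failed, a['pos'], narrow)
```

So if you switch to a unicode field to avoid the type error, pick a width that leaves room for the prefix plus the longest position string.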