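Is it possible to use a numpy ufunc.at (specifically add.at) to concatenate arrays of strings? Neither add.at nor char.add.at works with string/char arrays.
The method needs to work with n-dimensional arrays, so splitting by index and then merging the pieces is not an ideal approach.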
import numpy as np
a = np.array(['a', 'b'])
ixs = np.array([0, 1, 1])
vals = np.array(['e', 'f', 'g'])
# Neither of these options works
np.add.at(a, ixs, vals)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-29-fb8e3bd48930> in <module>()
2 ixs = np.array([0, 1])
3 vals = np.array(['e', 'e'])
----> 4 np.add.at(a, ixs, vals)
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
np.char.add.at(a, ixs, vals)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-e1bb1f7868dd> in <module>()
2 ixs = np.array([0, 1])
3 vals = np.array(['e', 'e'])
----> 4 np.char.add.at(a, ixs, vals)
AttributeError: 'function' object has no attribute 'at'
Desired output: ['ae', 'bfg']
Thanks a lot!
Answer 0: (score: 1)
In [279]: a = np.array(['a', 'b'])
...: ixs = np.array([0, 1, 1])
...: vals = np.array(['e', 'f', 'g'])
...:
Your errors:
In [280]: np.char.add.at(a, ixs, vals)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-280-f06ad4d86cfb> in <module>
----> 1 np.char.add.at(a, ixs, vals)
AttributeError: 'function' object has no attribute 'at'
In [281]: np.add.at(a, ixs, vals)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-281-683423808141> in <module>
----> 1 np.add.at(a, ixs, vals)
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
But let's try add.at with object dtype arrays:
In [282]: ao=a.astype(object)
In [283]: ao
Out[283]: array(['a', 'b'], dtype=object)
In [284]: vo=vals.astype(object)
In [285]: vo
Out[285]: array(['e', 'f', 'g'], dtype=object)
In [286]: np.add.at(ao, ixs, vo)
In [287]: ao
Out[287]: array(['ae', 'bfg'], dtype=object)
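numpy functions often (always?) operate on object dtype arrays by delegating the operation to the corresponding method of each element. add is defined for Python strings (it is string concatenation), so add.at works as desired.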
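The steps above can be wrapped in a small helper. This is only a sketch: the name `str_add_at` is hypothetical (it is not a NumPy function), and it assumes the inputs hold ordinary Python-compatible strings. Since `np.add.at` also accepts a tuple of index arrays, the same helper covers the n-dimensional case mentioned in the question.

```python
import numpy as np

def str_add_at(a, ixs, vals):
    """Unbuffered string concatenation via np.add.at on object arrays.

    Hypothetical helper for illustration; not part of NumPy.
    """
    ao = np.asarray(a).astype(object)    # copy whose elements are Python str
    vo = np.asarray(vals).astype(object)
    np.add.at(ao, ixs, vo)               # applies + element by element, in index order
    return ao.astype(str)                # back to a fixed-width unicode array

a = np.array(['a', 'b'])
ixs = np.array([0, 1, 1])
vals = np.array(['e', 'f', 'g'])
result = str_add_at(a, ixs, vals)

# 2-D case: index with a tuple of (row, column) index arrays
a2 = np.array([['a', 'b'], ['c', 'd']])
rows = np.array([0, 1, 1])
cols = np.array([0, 1, 1])
result2 = str_add_at(a2, (rows, cols), vals)
```

The final cast back to a string dtype is optional; skipping it leaves an object array, as in the session above.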
Answer 1: (score: 0)
Well, you can always count on numbers! So here is one way that uses array operations and should suit large datasets:
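import numpy as np

def char_add_at(a, ixs, vals):
    # Assumes ASCII strings and that ixs is sorted in ascending order.
    # Work on byte strings (dtype 'S') so each character is exactly one byte.
    a = np.asarray(a).astype('S')
    vals = np.asarray(vals).astype('S')
    # View the strings as 2D arrays of byte codes, one row per string
    an = a.view('i1').reshape(len(a), -1)
    vn = vals.view('i1').reshape(len(vals), -1)
    # Number of characters each row of a receives from vals
    s = (vn != 0).sum(1)
    vnc = np.bincount(ixs, s, minlength=len(a)).astype(int)
    # Current and final length of each string in a
    anc = (an != 0).sum(1)
    tnc = anc + vnc
    r = len(anc)
    c = tnc.max() + 1
    # Zero-filled output buffer, seeded with the original strings
    out_ar = np.zeros((r, c), dtype=np.uint8)
    out_ar[:, :anc.max()] = an
    # Mask of the still-empty slots that the appended characters will occupy
    fill_mask = tnc[:, None] > np.arange(c)
    fill_mask &= out_ar == 0
    out_ar[fill_mask] = vn[vn != 0]
    # Reinterpret each row of bytes as one fixed-width byte string
    out = out_ar.view('S' + str(c)).ravel()
    return out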
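The function above relies on a fixed-width byte-string array being a zero-padded block of bytes in memory. A minimal sketch of that representation, assuming ASCII data stored with an 'S' dtype (the variable names here are illustrative only):

```python
import numpy as np

a = np.array([b'a', b'bz', b'cer'])        # dtype 'S3': 3 bytes per element
codes = a.view('i1').reshape(len(a), -1)   # one row of byte codes per string
lengths = (codes != 0).sum(1)              # count of non-zero bytes = string length

# Round trip: reinterpret each row of 3 bytes as a string again
back = codes.view('S3').ravel()
```

Shorter strings are padded with zero bytes, which is why counting non-zero entries recovers each string's length.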
Sample run -
In [671]: a = np.array(['a', 'bz', 'cer'])
...: ixs = np.array([0, 1, 1, 2])
...: vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])
In [672]: char_add_at(a, ixs, vals)
Out[672]: array(['aez', 'bzfieabcdefgwop', 'cerH'], dtype='|S16')
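Note that char_add_at fills the output through a boolean mask, which proceeds row by row, so it appears to assume that ixs is in ascending order, as it is in the sample run. If the indices may arrive unsorted, one possible preprocessing step (an assumption on my part, not something from the answer) is a stable argsort. The sketch below uses the object-dtype add.at approach only as a reference to check that reordering does not change the result.

```python
import numpy as np

a = np.array(['a', 'bz', 'cer'])
ixs = np.array([2, 1, 0, 1])                      # deliberately unsorted
vals = np.array(['H', 'fieabcdef', 'ez', 'gwop'])

# A stable sort keeps the relative order of values that share an index
order = np.argsort(ixs, kind='stable')
ixs_sorted, vals_sorted = ixs[order], vals[order]

# Reference check with the object-dtype approach: both orderings agree
ref = a.astype(object)
np.add.at(ref, ixs, vals.astype(object))
chk = a.astype(object)
np.add.at(chk, ixs_sorted, vals_sorted.astype(object))
```

The sorted pair `ixs_sorted`, `vals_sorted` can then be passed on in place of the original arrays.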
Timings -
Case #1: sample dataset scaled up by 100x
In [675]: # Sample setup
...: a = np.array(['a', 'bz', 'cer'])
...: ixs = np.array([0, 1, 1, 2])
...: vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])
...:
...: # Scale up sample dataset
...: N = 100 # scale up factor
...: a = np.hstack(([a]*N))
...: ixs = (ixs + (ixs.max()+1)*np.arange(N)[:,None]).ravel()
...: vals = np.hstack(([vals]*N))
# @hpaulj's soln
In [676]: %%timeit
...: ao=a.astype(object)
...: vo=vals.astype(object)
...: np.add.at(ao, ixs, vo)
10000 loops, best of 3: 56.3 µs per loop
In [677]: %timeit char_add_at(a, ixs, vals)
10000 loops, best of 3: 72.6 µs per loop
Case #2: sample dataset scaled up by 1000x
# @hpaulj's soln
In [679]: %%timeit
...: ao=a.astype(object)
...: vo=vals.astype(object)
...: np.add.at(ao, ixs, vo)
1000 loops, best of 3: 483 µs per loop
In [680]: %timeit char_add_at(a, ixs, vals)
1000 loops, best of 3: 364 µs per loop
Case #3: sample dataset scaled up by 10000x
# @hpaulj's soln
In [682]: %%timeit
...: ao=a.astype(object)
...: vo=vals.astype(object)
...: np.add.at(ao, ixs, vo)
100 loops, best of 3: 5.28 ms per loop
In [683]: %timeit char_add_at(a, ixs, vals)
100 loops, best of 3: 3.34 ms per loop
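For anyone rerunning these timings outside IPython, here is a rough script-style sketch using the standard timeit module. The helper names `scale_up` and `object_add_at` are hypothetical, and only the object-dtype approach is timed; absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

a0 = np.array(['a', 'bz', 'cer'])
ixs0 = np.array([0, 1, 1, 2])
vals0 = np.array(['ez', 'fieabcdef', 'gwop', 'H'])

def scale_up(N):
    # Tile the data N times, offsetting each copy's indices so the copies stay distinct
    a = np.hstack([a0] * N)
    ixs = (ixs0 + (ixs0.max() + 1) * np.arange(N)[:, None]).ravel()
    vals = np.hstack([vals0] * N)
    return a, ixs, vals

def object_add_at(a, ixs, vals):
    # The object-dtype approach from the first answer
    ao = a.astype(object)
    vo = vals.astype(object)
    np.add.at(ao, ixs, vo)
    return ao

a, ixs, vals = scale_up(100)
# Best of 3 repeats of 100 calls each, reported per call
best = min(timeit.repeat(lambda: object_add_at(a, ixs, vals), number=100, repeat=3)) / 100
print("N=100: %.1f us per call" % (best * 1e6))
```

Changing the argument to `scale_up` reproduces the 1000x and 10000x cases.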