NumPy ufunc.at with character arrays

Date: 2019-05-21 15:37:00

Tags: python string numpy

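Is it possible to use a numpy ufunc.at (specifically add.at) to concatenate arrays of strings? Neither add.at nor char.add.at works on string/character arrays.

The method needs to work on n-dimensional arrays, so splitting by index and then merging the pieces is not an ideal approach.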

import numpy as np

a = np.array(['a', 'b'])
ixs = np.array([0, 1, 1])
vals = np.array(['e', 'f', 'g'])

# Neither of these options works

np.add.at(a, ixs, vals)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-fb8e3bd48930> in <module>()
      2 ixs =  np.array([0, 1])
      3 vals = np.array(['e', 'e'])
----> 4 np.add.at(a, ixs, vals)

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')



np.char.add.at(a, ixs, vals)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-e1bb1f7868dd> in <module>()
      2 ixs =  np.array([0, 1])
      3 vals = np.array(['e', 'e'])
----> 4 np.char.add.at(a, ixs, vals)

AttributeError: 'function' object has no attribute 'at'


Desired output: ['ae', 'bfg']

Thanks a lot!

2 answers:

Answer 0 (score: 1)

In [279]: a = np.array(['a', 'b']) 
     ...: ixs =  np.array([0, 1, 1]) 
     ...: vals = np.array(['e', 'f', 'g']) 
     ...:                      

Your errors:

In [280]: np.char.add.at(a, ixs, vals)                                       
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-280-f06ad4d86cfb> in <module>
----> 1 np.char.add.at(a, ixs, vals)

AttributeError: 'function' object has no attribute 'at'

In [281]: np.add.at(a, ixs, vals)                                            
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-281-683423808141> in <module>
----> 1 np.add.at(a, ixs, vals)

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

But let's try add.at with object dtype arrays:

In [282]: ao=a.astype(object)                                                
In [283]: ao                                                                 
Out[283]: array(['a', 'b'], dtype=object)
In [284]: vo=vals.astype(object)                                             
In [285]: vo                                                                 
Out[285]: array(['e', 'f', 'g'], dtype=object)
In [286]: np.add.at(ao, ixs, vo)                                                                                                                
In [287]: ao                                                                 
Out[287]: array(['ae', 'bfg'], dtype=object)

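numpy functions often (always?) operate on object dtype arrays by delegating the operation to the corresponding method of each object. Addition is defined for Python strings, so add.at works as desired.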
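As a minimal sketch of the approach above, the object dtype round trip can be wrapped in a small helper. The name `str_add_at` is hypothetical (it is not a NumPy function), and the 2-D call assumes the indices are passed as a tuple of index arrays, which is how ufunc.at addresses n-dimensional arrays.

```python
import numpy as np

def str_add_at(a, ixs, vals):
    """Hypothetical helper: append vals to the elements of a selected by ixs."""
    out = np.asarray(a).astype(object)       # copy holding Python str objects
    # Unbuffered in-place add: repeated indices accumulate in order
    np.add.at(out, ixs, np.asarray(vals).astype(object))
    return out.astype(str)                    # back to a fixed-width unicode array

# 1-D case from the question
res_1d = str_add_at(np.array(['a', 'b']),
                    np.array([0, 1, 1]),
                    np.array(['e', 'f', 'g']))

# 2-D case: index with a tuple of (row indices, column indices)
grid = np.array([['a', 'b'], ['c', 'd']])
rows = np.array([0, 1, 1])
cols = np.array([0, 1, 1])
res_2d = str_add_at(grid, (rows, cols), np.array(['e', 'f', 'g']))

print(res_1d)
print(res_2d)
```

The helper copies its input instead of modifying it in place, since a fixed-width string array such as `<U1` could not hold the longer results anyway.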

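Answer 1 (score: 0)

Well, you can always count on numbers! So here is an approach that uses only array operations and should scale well to large datasets: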

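import numpy as np

def char_add_at(a, ixs, vals):
    # Work on byte strings ('S' dtype) so that every character is one byte.
    # Assumes ASCII-only data; the result is returned as an 'S' array.
    a = np.asarray(a).astype('S')
    vals = np.asarray(vals).astype('S')

    # Group vals by target index; a stable sort keeps their original order
    order = np.argsort(ixs, kind='stable')
    ixs, vals = np.asarray(ixs)[order], vals[order]

    # View each string as a row of byte codes, zero-padded on the right
    an = a.view('u1').reshape(len(a), -1)
    vn = vals.view('u1').reshape(len(vals), -1)

    s = (vn != 0).sum(1)                                     # length of each value
    vnc = np.bincount(ixs, s, minlength=len(a)).astype(int)  # bytes appended per row
    anc = (an != 0).sum(1)                                   # length of each original string
    tnc = anc + vnc                                          # length of each result

    r = len(anc)
    c = tnc.max() + 1
    out_ar = np.zeros((r, c), dtype=np.uint8)

    # Copy the original strings into the left part of the output
    w = anc.max()
    out_ar[:, :w] = an[:, :w]

    # The remaining slots of each row, in row-major order, take the value bytes
    fill_mask = tnc[:, None] > np.arange(c)
    fill_mask &= out_ar == 0
    out_ar[fill_mask] = vn[vn != 0]

    out = out_ar.view('S' + str(c)).ravel()
    return out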
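The core trick in the function above, treating fixed-width strings as rows of byte codes, can be sketched in isolation. This is only an illustration that assumes ASCII data stored with a byte-string ('S') dtype; the variable names are made up for the example.

```python
import numpy as np

# Two byte strings stored in a fixed-width 'S2' array
words = np.array(['a', 'bz'], dtype='S')

# Reinterpret the same memory as unsigned bytes, one row per string.
# Shorter strings are padded with zero bytes on the right.
codes = words.view('u1').reshape(len(words), -1)
print(codes)

# Nonzero entries per row give the string lengths
lengths = (codes != 0).sum(1)
print(lengths)

# Viewing the byte matrix as 'S<width>' turns each row back into a string
width = codes.shape[1]
back = codes.view('S' + str(width)).ravel()
print(back)
```

Because the padding is made of zero bytes, a string containing an embedded NUL character would be miscounted, which is one reason to treat this as a sketch for plain ASCII text.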

Sample run:

In [671]: a = np.array(['a', 'bz', 'cer'])
     ...: ixs =  np.array([0, 1, 1, 2])
     ...: vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])

In [672]: char_add_at(a, ixs, vals)
Out[672]: array(['aez', 'bzfieabcdefgwop', 'cerH'], dtype='|S16')

Timings:

Case 1: sample dataset scaled up by 100x

In [675]: # Sample setup
     ...: a = np.array(['a', 'bz', 'cer'])
     ...: ixs =  np.array([0, 1, 1, 2])
     ...: vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])
     ...: 
     ...: # Scale up sample dataset
     ...: N = 100 # scale up factor
     ...: a = np.hstack(([a]*N))
     ...: ixs = (ixs + (ixs.max()+1)*np.arange(N)[:,None]).ravel()
     ...: vals = np.hstack(([vals]*N))

# @hpaulj's soln
In [676]: %%timeit
     ...: ao=a.astype(object)                                                
     ...: vo=vals.astype(object)                                             
     ...: np.add.at(ao, ixs, vo)
10000 loops, best of 3: 56.3 µs per loop

In [677]: %timeit char_add_at(a, ixs, vals)
10000 loops, best of 3: 72.6 µs per loop

Case 2: sample dataset scaled up by 1000x

# @hpaulj's soln
In [679]: %%timeit
     ...: ao=a.astype(object)                                                
     ...: vo=vals.astype(object)                                             
     ...: np.add.at(ao, ixs, vo)
1000 loops, best of 3: 483 µs per loop

In [680]: %timeit char_add_at(a, ixs, vals)
1000 loops, best of 3: 364 µs per loop

Case 3: sample dataset scaled up by 10000x

# @hpaulj's soln
In [682]: %%timeit
     ...: ao=a.astype(object)                                                
     ...: vo=vals.astype(object)                                             
     ...: np.add.at(ao, ixs, vo)
100 loops, best of 3: 5.28 ms per loop

In [683]: %timeit char_add_at(a, ixs, vals)
100 loops, best of 3: 3.34 ms per loop
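The numbers above depend on the machine and on the NumPy version, so they are best re-measured locally. Below is a rough sketch of how the scaled-up setup could be reproduced outside IPython with the standard timeit module. The helpers `scale_up` and `object_add_at` are hypothetical names introduced for this example, and only the object dtype approach is timed here.

```python
import timeit
import numpy as np

def scale_up(a, ixs, vals, n):
    """Tile the sample n times, shifting the indices for each copy."""
    a_big = np.hstack([a] * n)
    ixs_big = (ixs + (ixs.max() + 1) * np.arange(n)[:, None]).ravel()
    vals_big = np.hstack([vals] * n)
    return a_big, ixs_big, vals_big

def object_add_at(a, ixs, vals):
    """Object dtype approach from the first answer."""
    ao = a.astype(object)
    np.add.at(ao, ixs, vals.astype(object))
    return ao

a = np.array(['a', 'bz', 'cer'])
ixs = np.array([0, 1, 1, 2])
vals = np.array(['ez', 'fieabcdef', 'gwop', 'H'])

for n in (100, 1000):
    a_big, ixs_big, vals_big = scale_up(a, ixs, vals, n)
    # Best of 3 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(lambda: object_add_at(a_big, ixs_big, vals_big),
                             number=10, repeat=3)) / 10
    print(f"N={n}: {best * 1e6:.1f} µs per call")
```

The index shift in `scale_up` relies on `ixs.max() + 1` being equal to `len(a)`, which holds for this sample but would need adjusting for other data.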