Set trailing zeros to X, FAST

Time: 2018-03-14 20:06:15

Tags: performance numpy optimization indexing scipy

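I have a 2D array of integers. Each row may contain zeros in the middle, and each row ends with some number of trailing zeros.

How can I set all the trailing zeros to some integer X?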

import numpy as np

def generateTestData(N, K, INDEX_SIZE):
    # Start with a flat array to easily place zeros inside
    data1 = np.random.randint(0, INDEX_SIZE, N*K)

    # Add zeros at random locations
    idx = np.random.randint(0, N*K, int(N*K/3))
    data1[idx] = 0

    # Make data1 a (N,K) array
    data1 = np.reshape(data1, (N,K))

    # Add trailing zeros
    for i in range(N):
        data1[i,np.random.randint(0,K):] = 0

    return data1

if __name__=='__main__':
    N = 10000; K = 150; INDEX_SIZE = 500; X = -1
    # Test data
    data1 = generateTestData(N, K, INDEX_SIZE)
    # Save a copy for the test
    data2 = np.copy(data1)

    for i in range(N):
        for j in reversed(range(K)):
            if data1[i,j] == 0:
                data1[i,j] = X
            else:
                break

    # Faster code here on 'data2'
    # ...

    def diff(a,b):
        return np.mean(np.abs(a-b))

    # Verification:
    print('Diff(data1,data2) = '+str(diff(data2,data1)))

1 Answer:

Answer 0: (score: 1)

Here's a vectorized solution that leverages broadcasting -

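def replace_trailing_num(a, compare_val=0, assign_val=-1):
    # Column where each row's trailing run of compare_val starts
    idx = a.shape[1] - (a[:,::-1]!=compare_val).argmax(axis=1)
    # Rows made up entirely of compare_val: the run starts at column 0
    idx[(a==compare_val).all(1)] = 0
    # Broadcast column indices against the per-row start to build a 2D mask
    mask = np.arange(a.shape[1]) >= idx[:,None]
    a[mask] = assign_val  # modifies a in place
    return a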

Sample run -

In [60]: a
Out[60]: 
array([[2, 3, 0, 4, 6, 0, 0],
       [0, 5, 8, 0, 0, 0, 0]])

In [61]: replace_trailing_num(a, compare_val=0, assign_val=-1)
Out[61]: 
array([[ 2,  3,  0,  4,  6, -1, -1],
       [ 0,  5,  8, -1, -1, -1, -1]])
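
To see why this works, it can help to look at the intermediate arrays for the sample above. The following is a minimal sketch, not part of the original answer; the variable names `nonzero_reversed`, `trailing`, `start`, and `mask` are illustrative only.

```python
import numpy as np

a = np.array([[2, 3, 0, 4, 6, 0, 0],
              [0, 5, 8, 0, 0, 0, 0]])

# Reverse each row and mark the entries that differ from 0
nonzero_reversed = a[:, ::-1] != 0

# argmax gives the position of the first True in each reversed row.
# For a row with at least one nonzero entry, that is its number of trailing zeros.
trailing = nonzero_reversed.argmax(axis=1)      # [2, 4]

# Column at which the trailing run starts in the original orientation
start = a.shape[1] - trailing                   # [5, 3]

# Broadcasting: column indices of shape (7,) against starts of shape (2, 1)
# yield a (2, 7) boolean mask that is True on the trailing run of each row
mask = np.arange(a.shape[1]) >= start[:, None]

result = a.copy()
result[mask] = -1
print(result)
```

A row consisting only of zeros has no `True` entry, so `argmax` returns 0 for it as well; that is the case the `idx[(a==compare_val).all(1)] = 0` line in the function handles.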

Alternatively, we could use np.minimum.accumulate to get the mask -

mask = np.minimum.accumulate(a[:,::-1]==compare_val,axis=1)[:,::-1]
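
The one-liner above can be wrapped into a complete function. This is a sketch that assumes returning a modified copy is acceptable; the function name `replace_trailing_num_accumulate` is hypothetical. The running minimum over the reversed boolean array stays `True` only until the first non-matching element, so rows made up entirely of `compare_val` should need no special handling.

```python
import numpy as np

def replace_trailing_num_accumulate(a, compare_val=0, assign_val=-1):
    # Work on a copy so the caller's array is left untouched (an assumption of this sketch)
    out = a.copy()
    # True where the reversed row equals compare_val
    is_match_reversed = out[:, ::-1] == compare_val
    # Running minimum (logical AND for booleans): turns False at the first mismatch and stays False
    trailing_reversed = np.minimum.accumulate(is_match_reversed, axis=1)
    # Flip back to the original column order
    mask = trailing_reversed[:, ::-1]
    out[mask] = assign_val
    return out

a = np.array([[2, 3, 0, 4, 6, 0, 0],
              [0, 5, 8, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],   # all zeros: the whole row is trailing
              [1, 2, 3, 4, 5, 6, 7]])  # no trailing zeros: row is unchanged
result = replace_trailing_num_accumulate(a)
print(result)
```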

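If there are only a few rows to loop over, or the number of columns is much larger than the number of rows, a slicing-based loop might be better. One such version is listed below -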

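def replace_trailing_num_loopy(a, compare_val=0, assign_val=-1):
    ncols = a.shape[1]
    neq = a[:,::-1]!=compare_val
    # Number of trailing compare_val entries per row
    counts = neq.argmax(axis=1)
    # Rows made up entirely of compare_val: the whole row is trailing
    counts[~neq.any(axis=1)] = ncols
    for i,c in enumerate(counts):
        # Explicit start column; a[i,-c:] with c == 0 would select the whole row
        a[i,ncols-c:] = assign_val
    return a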
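
To check which variant is faster for a given shape, one can time both on data of that shape and confirm that they agree with a plain double loop like the one in the question. The sketch below is an illustration rather than part of the original answer: the function names `reference`, `trailing_start`, `masked`, and `sliced`, the array sizes, and the repeat count are all assumptions, and every function works on a copy so each run starts from the same input.

```python
import timeit
import numpy as np

def reference(a, compare_val=0, assign_val=-1):
    # Double loop in the spirit of the question's baseline, applied to a copy
    out = a.copy()
    for i in range(out.shape[0]):
        for j in reversed(range(out.shape[1])):
            if out[i, j] != compare_val:
                break
            out[i, j] = assign_val
    return out

def trailing_start(a, compare_val):
    # Column where each row's trailing run of compare_val begins
    neq = a[:, ::-1] != compare_val
    start = a.shape[1] - neq.argmax(axis=1)
    start[~neq.any(axis=1)] = 0  # rows made up entirely of compare_val
    return start

def masked(a, compare_val=0, assign_val=-1):
    # Broadcasting-based 2D mask
    out = a.copy()
    start = trailing_start(out, compare_val)
    out[np.arange(out.shape[1]) >= start[:, None]] = assign_val
    return out

def sliced(a, compare_val=0, assign_val=-1):
    # One slice assignment per row
    out = a.copy()
    for i, s in enumerate(trailing_start(out, compare_val)):
        out[i, s:] = assign_val
    return out

# Edge cases: no trailing zeros, all zeros, a single trailing zero, several trailing zeros
edge = np.array([[1, 2, 3], [0, 0, 0], [0, 1, 0], [4, 0, 0]])
assert np.array_equal(masked(edge), reference(edge))
assert np.array_equal(sliced(edge), reference(edge))

# Few rows, many columns (sizes chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
wide = rng.integers(0, 5, size=(20, 20000))
wide[:, 15000:] = 0
assert np.array_equal(masked(wide), sliced(wide))

for fn in (masked, sliced):
    seconds = timeit.timeit(lambda: fn(wide), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

The timings depend on the machine, the NumPy version, and the array shape, so it is worth rerunning with the real `N` and `K` before choosing one variant.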