Question

I'm trying to find a vectorized/fast/numpy-friendly way to convert the following values in column A into column B:

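Column 'B' is defined by filling every gap between a 1 and the following -1 with the value 1, skipping the first row of each pair. That is, for ID4-ID7, column B is filled with 1 (given the initial 1 in column A @ ID3). Next, ID10-ID14 are filled with 1 (because column A @ ID9 = 1).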
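To make the rule concrete, the sketch below puts column A next to the expected column B. It assumes, as the description implies, that IDs are 1-based (ID3 is `x[2]`), and it takes B from the expected output given in the answers; the final line just lists the IDs where B is 1.

```python
import pandas as pd

# Column A as given in the question; IDs assumed to be 1-based (ID3 holds the first 1).
A = [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]
# Expected column B: 1 from the row after each opening 1 through the closing -1.
B = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]

df = pd.DataFrame({"A": A, "B": B}, index=pd.RangeIndex(1, len(A) + 1, name="ID"))
print(df)

# IDs where B is 1: ID4-ID7 and ID10-ID14
filled = df.loc[df["B"] == 1].index.tolist()
```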

While this is easy to do with a for loop, I'm wondering whether a non-looping solution exists. An O(n) loop-based solution follows:

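import numpy as np
import pandas as pd

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])


def make_y(x, showminus=False):
    y = np.zeros_like(x)  # output column B, initially all zeros
    state = 0  # current state: 1 inside a block, 0 (or -1 if showminus) outside
    for i, n in enumerate(x):
        if n == 1 and n != state:
            # a 1 opens a new block; filling starts on the next row
            state = n
            if i < len(y) - 1:
                y[i + 1] = state
        elif n == -1 and n != state:
            # a -1 closes the block; this row still gets the old state
            y[i] = state
            if showminus:
                state = -1
            else:
                state = 0
        else:
            y[i] = state
    return y


y = make_y(x)
print(pd.DataFrame([x, y]).T)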

The function above gives the following performance on my machine:

%timeit y = make_y(x)
10000 loops, best of 3: 28 µs per loop

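I'm guessing there must be some way to make the whole thing faster, since I will eventually need to process arrays that are 10 million+ elements long...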
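One loop-free idea, different from the answers below, is to forward-fill the most recent nonzero value. In `make_y` with `showminus=False`, the state after row i is 1 exactly when the last nonzero value at or before row i is a 1, and y[i] equals the state after row i - 1. The sketch below assumes `x` holds only 0, 1 and -1, and `make_y_vectorized` is a hypothetical name; it does not reproduce the `showminus=True` case, where the row holding a 1 that follows a -1 stays 0 in the loop.

```python
import numpy as np


def make_y_vectorized(x):
    """Forward-fill sketch of make_y(x, showminus=False); assumes x holds only 0, 1, -1."""
    x = np.asarray(x)
    n = len(x)
    # Index of each nonzero entry, 0 everywhere else.
    idx = np.where(x != 0, np.arange(n), 0)
    # Running maximum carries the index of the most recent nonzero entry forward.
    idx = np.maximum.accumulate(idx)
    # State after row i: 1 if the most recent nonzero value is a 1, else 0.
    # Rows before any nonzero value look up x[0]: if it is 0 the state is 0,
    # and if it is nonzero it really is the most recent nonzero value.
    state = (x[idx] == 1).astype(x.dtype)
    # Shift by one row: y[i] is the state after row i - 1, and y[0] is 0.
    y = np.zeros_like(x)
    y[1:] = state[:-1]
    return y


x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
y = make_y_vectorized(x)
print(np.column_stack((x, y)))
```

Every step is a single O(n) NumPy pass with no searching, so it should scale to long arrays, although it has not been timed against the loop here.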

Answer 1

A possible vectorized solution is the following:

idx_1s, = np.where(x == -1)  # find the positions of the -1's
idx1s, = np.where(x == 1)  # find the positions of the 1's

To find which 1s should become 0 and mark the start of each block of 1s:

idx0s = np.concatenate(([0], np.searchsorted(idx1s, idx_1s[:-1])))
idx0s = idx1s[idx0s]
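For the example `x`, the intermediate arrays of this step can be inspected directly. In the sketch below, `pos` is a hypothetical name for the result of the first line before it is used to index `idx1s`.

```python
import numpy as np

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
idx_1s, = np.where(x == -1)  # positions of the -1's: 6 and 13
idx1s, = np.where(x == 1)    # positions of the 1's: 2, 3, 8 and 11

# For every -1 except the last, searchsorted returns the position in idx1s of the
# first 1 after it, i.e. the 1 that opens the next block. The leading 0 adds the
# very first 1, which opens the first block.
pos = np.concatenate(([0], np.searchsorted(idx1s, idx_1s[:-1])))
idx0s = idx1s[pos]  # opening 1 of each block: positions 2 and 8
print(pos.tolist(), idx0s.tolist())
```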

We now have two arrays of equal length, idx0s and idx_1s, marking the positions of the first and last item of each block, so we can now do:

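y = x.copy()
y[idx0s] = 0      # the opening 1 of each block becomes 0
idx0s += 1        # first row to fill in each block
idx_1s += 1       # first row after each block
mask = np.zeros_like(y, dtype=bool)  # plain bool: np.bool no longer exists in NumPy >= 1.24
mask[idx0s] = True
mask[idx_1s] = True
mask = np.logical_xor.accumulate(mask)  # True from each start up to, not including, each stop
y[mask] = 1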

This produces the desired result:

>>> y
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])
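The key step is that `np.logical_xor.accumulate` behaves like an on/off switch. As an isolated illustration, the start and stop positions for the example `x` are hard-coded below, assuming they equal `idx0s + 1` and `idx_1s + 1` from the code above.

```python
import numpy as np

n = 15
starts = np.array([3, 9])   # idx0s + 1: first row to fill in each block
stops = np.array([7, 14])   # idx_1s + 1: first row after each block

mask = np.zeros(n, dtype=bool)
mask[starts] = True
mask[stops] = True
# A running XOR turns the mask on at every start and off again at every stop,
# so exactly the rows in [start, stop) end up True.
mask = np.logical_xor.accumulate(mask)
print(mask.astype(int))
```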

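It may be somewhat fragile with malformed input, and I don't think it handles a trailing -1 gracefully. The only non-O(n) operation is the call to searchsorted, but searchsorted has optimizations for searching sorted keys faster, so it probably won't be noticeable.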

If I time it on your x, it doesn't beat the loop version, but for larger arrays it might.
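The trailing -1 problem comes from `mask[idx_1s] = True` after `idx_1s += 1`: a -1 in the last row points one past the end of `mask` and raises an IndexError. One possible workaround, not part of the original answer, is to give the mask a spare slot and drop it after the accumulate. The sketch wraps the answer's steps in a hypothetical `fill_blocks` function and still assumes well-formed input (the first nonzero is a 1, and every block is closed by a -1).

```python
import numpy as np


def fill_blocks(x):
    """Answer 1's steps with a spare mask slot, so a -1 in the last row is allowed.

    Assumes well-formed input: the first nonzero value is a 1 and each block
    of 1s is closed by a later -1.
    """
    x = np.asarray(x)
    idx_1s, = np.where(x == -1)  # closing -1 of each block
    idx1s, = np.where(x == 1)    # all 1s
    # opening 1 of each block (same searchsorted trick as above)
    idx0s = idx1s[np.concatenate(([0], np.searchsorted(idx1s, idx_1s[:-1])))]
    y = x.copy()
    y[idx0s] = 0
    mask = np.zeros(len(x) + 1, dtype=bool)  # spare slot at index len(x)
    mask[idx0s + 1] = True
    mask[idx_1s + 1] = True
    mask = np.logical_xor.accumulate(mask)[:-1]  # drop the spare slot
    y[mask] = 1
    return y


print(fill_blocks([0, 1, 0, -1]))
```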

Answer 2

This works well:

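A = [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]
B = [0] * len(A)   # initialize column B with the same number of zeros
work = list(A)     # working copy, so A itself is left untouched
print(A)
for _ in range(len(A)):  # there can be at most len(A) pairs
    # retrieve the indices of the next (1 ... -1) pair
    try:
        one_index = work.index(1)
        neg_one_index = work.index(-1)
    except ValueError:
        break  # no 1 or no -1 left, so there are no more pairs
    one_index = one_index + 1
    # replace the zeros in column B by 1 at the correct locations, and zero
    # out the consumed entries so the next search finds the next pair
    while one_index <= neg_one_index:
        B[one_index] = 1
        work[one_index - 1] = 0
        work[one_index] = 0
        one_index = one_index + 1
print(B)
# output -> [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0] (i.e. correct)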
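Each pass of the loop above calls `.index` from the start of the list, so long inputs are rescanned once per pair. A possible variant, sketched below with the hypothetical name `fill_pairs`, resumes each search where the previous pair ended by using the optional `start` argument of `list.index`. It assumes every block of 1s is closed by a later -1; unlike the loop above, a stray -1 before a 1 is simply skipped.

```python
def fill_pairs(A):
    """Resume each search after the previous pair instead of rescanning from index 0."""
    B = [0] * len(A)
    start = 0
    while True:
        try:
            one_index = A.index(1, start)              # next opening 1
            neg_one_index = A.index(-1, one_index)     # the -1 that closes it
        except ValueError:
            break  # no further complete (1 ... -1) pair
        # 1 from the row after the opening 1 through the closing -1
        for k in range(one_index + 1, neg_one_index + 1):
            B[k] = 1
        start = neg_one_index + 1  # continue after this pair
    return B


print(fill_pairs([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]))
```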

Python / Numpy - Fill the gaps between non-consecutive points?
