Question

I am using collections.deque in Python to implement a circular buffer so that I can use it for some computations. This is my original code:

import numpy as np
import moviepy.editor
from collections import deque

clip = moviepy.editor.VideoFileClip('file.mp4')
clip_size = clip.size[::-1]
Depth = 30
dc = 5
TempKern = ...  # some array of size Depth
RingBuffer = deque(np.zeros(clip_size, dtype=float), maxlen=Depth)
modified_clip = clip.fl_image(new_filtered_output)
modified_clip.write_videofile('output.mp4')

def new_filtered_output(image):
   global RingBuffer
   inter_frame=somefunction(image)# inter_frame and image shape is same as clip_size
   RingBuffer.append(inter_frame)
   # Apply kernel
   Output =  dc + np.sum([np.asarray(RingBuffer)[j]*TempKern[j] for j in range(Depth)],axis=0)
   return Output

Is this the fastest way to do it? I have heard that numpy roll is an option, but I don't know how to make it behave like the code above.

Answer 1

I noticed that you changed the code above, but the original code was:

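from collections import deque
import numpy as np

def one():
    TempKern = np.array([1, 2, 3, 4, 5])
    depth = len(TempKern)
    buf = deque(np.zeros((2, 3)), maxlen=5)
    for i in range(10):
        buf.append([[i, i+1, i+2], [i+3, i+4, i+5]])
    # Weighted sum of the buffered frames; the deque is converted to an array for every j
    total = np.sum([np.asarray(buf)[j]*TempKern[j] for j in range(depth)], axis=0)
    print('total')
    print(total)
    return total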

If you flatten the arrays before doing the computation, you can simplify things considerably and make it run faster.

def two():
    buf = np.zeros((5,6), dtype=np.int32)
    for idx, i in enumerate(range(5, 10)):
        buf[idx] = np.array([[i,i+1,i+2,i+3,i+4,i+5]], dtype=np.int32)
    return (buf.T * np.array([1, 2, 3, 4, 5])).sum(axis=1).reshape((2,3))

The second implementation returns the same values and runs about 4 times faster on my machine:

one()

>> [[115 130 145]
    [160 175 190]]   ~ 100µs / loop

two()

>> array([[115, 130, 145],
          [160, 175, 190]])    ~~ 26µs / loop
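
Timings like the ones above can be reproduced with the standard timeit module. The following is only a sketch under assumptions: deque_version and flat_version are hypothetical stand-ins for one() and two() (with the prints removed so that I/O is not timed), and the absolute numbers will vary from machine to machine.

```python
import timeit
from collections import deque

import numpy as np

WEIGHTS = np.array([1, 2, 3, 4, 5])


def deque_version():
    # Stand-in for one(): a deque of 2x3 frames, converted to an array for every j.
    buf = deque(np.zeros((2, 3)), maxlen=5)
    for i in range(10):
        buf.append([[i, i + 1, i + 2], [i + 3, i + 4, i + 5]])
    return np.sum([np.asarray(buf)[j] * WEIGHTS[j] for j in range(5)], axis=0)


def flat_version():
    # Stand-in for two(): a preallocated (5, 6) buffer of flattened frames.
    buf = np.zeros((5, 6), dtype=np.int32)
    for idx, i in enumerate(range(5, 10)):
        buf[idx] = np.arange(i, i + 6)
    return (buf.T * WEIGHTS).sum(axis=1).reshape((2, 3))


# The two versions must agree before their timings are worth comparing.
assert np.array_equal(deque_version(), flat_version())

for fn in (deque_version, flat_version):
    per_call = timeit.timeit(fn, number=2000) / 2000
    print(f"{fn.__name__}: {per_call * 1e6:.1f} us per call")
```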

You can simplify and parameterize this further:

def three(n, array_shape):
    size = array_shape[0] * array_shape[1]   # elements per flattened frame
    buf = np.zeros((n, size), dtype=np.int32)
    addit = np.arange(1, n+1, dtype=np.int32)
    for idx, i in enumerate(range(n, 2*n)):
        # each row needs exactly `size` values; i+n+1 only works when n+1 == size
        buf[idx] = np.arange(i, i+size)
    return (buf.T * addit).sum(axis=1).reshape(array_shape)

three(5, (2,3))

    >> array([[115, 130, 145],
             [160, 175, 190]])   ~ 17µs / loop

Note that the second and third versions return a numpy array. If needed, you can convert it to a list with .tolist().
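
As a small illustration of that conversion (the array here is just the result shown above, typed in by hand), .tolist() yields nested Python lists whose elements are plain Python ints rather than numpy scalars:

```python
import numpy as np

# Hand-written copy of the result returned by the functions above.
result = np.array([[115, 130, 145], [160, 175, 190]], dtype=np.int32)

as_list = result.tolist()   # nested list of built-in int objects
print(type(as_list), type(as_list[0][0]))
print(as_list)
```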

Edited based on your feedback:

def four(array_shape):
    n = array_shape[0] * array_shape[1] - 1
    buf = []
    addit = np.arange(1, n+1, dtype=np.int32)
    for idx, i in enumerate(range(n, 2*n)):
        buf.append(np.arange(i, i+n+1))
    buf = np.asarray(buf)
    summed = (buf.T * addit).sum(axis=1)
    return summed.reshape(array_shape)
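
One way to gain confidence that the flattened approach carries over to frames of arbitrary shape is to compare it against a deque-based reference on random data. This is a sketch under assumptions: weighted_sum_deque and weighted_sum_flat are hypothetical helper names, and weights @ flat is used as an equivalent of the (buf.T * addit).sum(axis=1) expression above.

```python
from collections import deque

import numpy as np


def weighted_sum_deque(frames, weights):
    # Reference: keep only the last len(weights) frames in a deque.
    buf = deque(maxlen=len(weights))
    for frame in frames:
        buf.append(frame)
    stacked = np.asarray(buf)  # converted once, not once per j
    return np.sum([stacked[j] * weights[j] for j in range(len(weights))], axis=0)


def weighted_sum_flat(frames, weights):
    # Flattened variant: one row per frame, then a single vector-matrix product.
    frame_shape = frames[0].shape
    flat = np.stack([f.ravel() for f in frames[-len(weights):]])
    return (weights @ flat).reshape(frame_shape)


rng = np.random.default_rng(0)
frames = [rng.random((4, 6)) for _ in range(12)]
weights = np.arange(1, 6, dtype=float)

expected = weighted_sum_deque(frames, weights)
actual = weighted_sum_flat(frames, weights)
assert np.allclose(expected, actual)
```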

Answer 2

You can implement the ring buffer as a numpy array by doubling its size and slicing:

clipsize = tuple(clip.size[::-1])  # tuple, so it can be concatenated with (2*depth,) below
depth = 30
ringbuffer = np.zeros((2*depth,) + clipsize)

framecounter = 0

def new_filtered_output(image):
   global ringbuffer, framecounter
   inter_frame = somefunction(image)

   idx = framecounter % depth
   ringbuffer[idx] = ringbuffer[idx + depth] = inter_frame
   buffer = ringbuffer[idx + 1 : idx + 1 + depth]
   framecounter += 1

   # Apply kernel
   output =  dc + np.sum([buffer[j]*kernel[j] for j in range(depth)], axis=0)
   return output

Now you are no longer converting the deque to a numpy array for every frame (and on every loop iteration...).
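
To see why writing each frame twice works, here is a toy version of the same idea with a small depth and tiny frames. The push helper and the frame values are made up for illustration; the indexing is the same as in new_filtered_output above. Note that the slice is a view into ringbuffer, so no frame data is copied.

```python
import numpy as np

depth = 4
frame_shape = (2, 2)  # tiny frames, for illustration only
ringbuffer = np.zeros((2 * depth,) + frame_shape)


def push(frame, framecounter):
    """Store `frame` in both halves and return the last `depth` frames."""
    idx = framecounter % depth
    ringbuffer[idx] = ringbuffer[idx + depth] = frame
    # Basic slicing returns a view: oldest frame first, newest frame last.
    return ringbuffer[idx + 1: idx + 1 + depth]


for t in range(10):
    # Frame number t is filled with the value t so the order is easy to read.
    window = push(np.full(frame_shape, float(t)), t)

# After frames 0..9 the window holds frames 6, 7, 8, 9 in that order.
print(window[:, 0, 0])
```

Until depth frames have been pushed, the window still contains some of the initial zero frames, which matches the behaviour of a buffer that starts out zero-filled.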

As mentioned in the comments, you can apply the kernel more efficiently:

output = dc + np.einsum('ijk,i->jk', buffer, kernel)

Or:

output = dc + np.tensordot(kernel, buffer, axes=1)
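
A quick check that the three ways of applying the kernel agree, using random stand-in data (the shapes and values below are assumptions chosen for illustration; buffer has shape (depth, height, width) and kernel has shape (depth,)):

```python
import numpy as np

rng = np.random.default_rng(0)
depth, height, width = 30, 8, 10
buffer = rng.random((depth, height, width))  # stand-in for the ring buffer window
kernel = rng.random(depth)                   # stand-in for the temporal kernel
dc = 5

# Explicit Python loop over the depth axis.
loop = dc + np.sum([buffer[j] * kernel[j] for j in range(depth)], axis=0)
# Contract the depth axis (index i) in a single call.
via_einsum = dc + np.einsum('ijk,i->jk', buffer, kernel)
# Same contraction: last axis of kernel against first axis of buffer.
via_tensordot = dc + np.tensordot(kernel, buffer, axes=1)

assert np.allclose(loop, via_einsum)
assert np.allclose(loop, via_tensordot)
print(via_tensordot.shape)  # (8, 10)
```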

Is there a faster circular buffer in Python than one built with deque?