How to vectorize code with Python numpy.bincount using apply_along_axis

Asked: 2017-04-29 18:32:55

Tags: python numpy vectorization

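I am trying to vectorize my code with numpy so that I can run it with multiprocessing, but I can't understand how numpy.apply_along_axis works. Here is an example of the code, vectorized using map:
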
import numpy
from scipy import sparse
import multiprocessing
from matplotlib import pyplot

#first i build a matrix of some x positions vs time datas in a sparse format
matrix = numpy.random.randint(2, size = 100).astype(float).reshape(10,10)
x = numpy.nonzero(matrix)[0]
times = numpy.nonzero(matrix)[1]
weights = numpy.random.rand(x.size)

#then i define an array of y positions
nStepsY = 5
y = numpy.arange(1,nStepsY+1)

#now i build an image using x-y-times coordinates and x-times weights
def mapIt(ithStep):
    ncolumns = 80
    image = numpy.zeros(ncolumns)

    yTimed = y[ithStep]*times
    positions = (numpy.round(x-yTimed)+50).astype(int)

    values = numpy.bincount(positions,weights)
    values = values[numpy.nonzero(values)]
    positions = numpy.unique(positions)
    image[positions] = values
    return image


image = list(map(mapIt, range(nStepsY)))
image = numpy.array(image)

a = pyplot.imshow(image, aspect = 10)

Here is the output plot.

I tried to use numpy.apply_along_axis, but this function only lets me iterate along the rows of image, while I also need to iterate over the ithStep index. E.g.:

#now i build an image using x-y-times coordinates and x-times weights
nrows = nStepsY
ncolumns = 80
matrix = numpy.zeros(nrows*ncolumns).reshape(nrows,ncolumns)

def applyIt(image):

    image = numpy.zeros(ncolumns)

    yTimed = y[ithStep]*times
    positions = (numpy.round(x-yTimed)+50).astype(int)

    values = numpy.bincount(positions,weights)
    values = values[numpy.nonzero(values)]
    positions = numpy.unique(positions)
    image[positions] = values

    return image


imageApplied = numpy.apply_along_axis(applyIt,1,matrix)
a = pyplot.imshow(imageApplied, aspect = 10)

It obviously returns just the first row nrows times, because ithStep is never iterated. And here is the wrong plot.

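Is there a way to iterate over an index, or to use an index while numpy.apply_along_axis iterates?

The code below contains only matrix operations: it is much faster than map or apply_along_axis, but it uses a lot of memory.

(In this function I use a scipy.sparse trick: when you try to sum several numbers onto the same element, it behaves more intuitively than numpy arrays do.)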

def fullmatrix(nRows, nColumns):
    y = numpy.arange(1,nStepsY+1)

    yTimed = numpy.outer(y,times)
    x3d = numpy.outer(numpy.ones(nStepsY),x)
    weights3d = numpy.outer(numpy.ones(nStepsY),weights)
    #integer row index (0..nStepsY-1) of every entry, instead of the y value itself
    rows3d = numpy.outer(numpy.arange(nStepsY),numpy.ones(x.size)).astype(int)
    positions = (numpy.round(x3d-yTimed)+50).astype(int)

    #coo_matrix sums entries that share the same (row, column); shape fixes the output size
    matrix = sparse.coo_matrix((numpy.ravel(weights3d), (numpy.ravel(rows3d), numpy.ravel(positions))), shape = (nRows, nColumns)).todense()
    return matrix

image = fullmatrix(nStepsY, 80)
a = pyplot.imshow(image, aspect = 10)
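
To make the scipy.sparse trick mentioned above concrete, here is a small sketch of how repeated indices behave. The toy arrays cols and w are made up for illustration and are not part of the original code. It compares a plain fancy-indexed +=, numpy.add.at, coo_matrix, and numpy.bincount on an index that occurs twice.

```python
import numpy
from scipy import sparse

# Hypothetical toy data: column 2 is hit twice.
cols = numpy.array([2, 2, 5])
w = numpy.array([1.0, 10.0, 100.0])

# Fancy-indexed += is buffered: a repeated index is only updated once,
# so the two contributions to column 2 are NOT summed.
buffered = numpy.zeros(8)
buffered[cols] += w

# numpy.add.at accumulates every occurrence of a repeated index.
accumulated = numpy.zeros(8)
numpy.add.at(accumulated, cols, w)

# coo_matrix sums duplicate (row, column) entries when it is converted.
rows = numpy.zeros(cols.size, dtype=int)
from_sparse = sparse.coo_matrix((w, (rows, cols)), shape=(1, 8)).toarray().ravel()

# bincount with weights produces the same per-column sums.
from_bincount = numpy.bincount(cols, weights=w, minlength=8)
```

So the sparse matrix is one of several ways to get the summing behaviour; numpy.add.at and weighted bincount give the same result without leaving numpy.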

This way is simpler and faster! Thank you very much.

nStepsY = 5
nRows = nStepsY
nColumns = 80
y = numpy.arange(1,nStepsY+1)
image = numpy.zeros((nRows, nColumns))
#positions is not defined yet at this point; it has one entry per element of x
fakeRow = numpy.zeros(x.size, dtype = int)

def itermatrix(ithStep):
    yTimed = y[ithStep]*times
    positions = (numpy.round(x-yTimed)+50).astype(int)

    matrix = sparse.coo_matrix((weights, (fakeRow, positions))).todense()
    matrix = numpy.ravel(matrix)
    missColumns = (nColumns-matrix.size)
    zeros = numpy.zeros(missColumns)
    matrix = numpy.concatenate((matrix, zeros))
    return matrix

for i in numpy.arange(nStepsY):
    image[i] = itermatrix(i)

#or, without initialization of image:
imageMapped = list(map(itermatrix, range(nStepsY)))
imageMapped = numpy.array(imageMapped)

1 answer:

Answer 0 (score: 0)

Trying to use map or apply_along_axis obscures the essential iteration of your problem.

I rewrote your code as an explicit loop over y:

nStepsY = 5
y = numpy.arange(1,nStepsY+1)
image = numpy.zeros((nStepsY, 80))
for i, yi in enumerate(y):
    yTimed = yi*times
    positions = (numpy.round(x-yTimed)+50).astype(int)
    values = numpy.bincount(positions,weights)
    values = values[numpy.nonzero(values)]
    positions = numpy.unique(positions)
    image[i, positions] = values
a = pyplot.imshow(image, aspect = 10)
pyplot.show()

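Looking at the code, I think I can compute the positions array for all y values at once, giving an array of shape (y.shape[0],times.shape[0]). But the rest, the bincount and unique steps, still has to work row by row.

With a 2d array and axis=1, apply_along_axis essentially performs: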

res = numpy.zeros_like(arr)
for i in range(arr.shape[0]):
    res[i,:] = func1d(arr[i,:])

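If the input array has more dimensions, it constructs a more elaborate indexing object, [i,j,k,:]. It can also handle the case where func1d returns an array of a different size than its input. But either way it is just a general-purpose iteration tool.

Moving the initial positions creation out of the loop: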
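
As a quick illustration of that behaviour, here is a minimal sketch with a made-up array arr and a made-up helper first_and_last (neither is from the question). It shows apply_along_axis handling a func1d whose output is shorter than its input, next to the roughly equivalent explicit loop, and also how extra positional arguments are forwarded to func1d.

```python
import numpy

arr = numpy.arange(12).reshape(3, 4)

def first_and_last(row):
    # Hypothetical func1d: returns 2 values for a row of length 4.
    return numpy.array([row[0], row[-1]])

# apply_along_axis calls first_and_last once per row (axis=1).
via_apply = numpy.apply_along_axis(first_and_last, 1, arr)

# Roughly the same thing written as an explicit loop.
via_loop = numpy.zeros((arr.shape[0], 2), dtype=arr.dtype)
for i in range(arr.shape[0]):
    via_loop[i, :] = first_and_last(arr[i, :])

def scale(row, k):
    # Extra arguments after the array are passed through to func1d.
    return row * k

scaled = numpy.apply_along_axis(scale, 1, arr, 2)
```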

yTimed = y[:,None]*times
positions = (numpy.round(x-yTimed)+50).astype(int)
image = numpy.zeros((positions.shape[0], 80))
for i, pos in enumerate(positions):
    values = numpy.bincount(pos,weights)
    values = values[numpy.nonzero(values)]
    pos = numpy.unique(pos)
    image[i, pos] = values
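
As a side note, the nonzero/unique bookkeeping inside the loop can probably be dropped by using the minlength argument of numpy.bincount. The sketch below rebuilds data similar to the question's (the random generator and its seed are my own assumptions) and checks the shorter form against the loop above. It assumes every position is non-negative and smaller than the number of columns, which holds for these sizes.

```python
import numpy

# Stand-in data shaped like the question's; the seed is an arbitrary choice.
rng = numpy.random.default_rng(0)
matrix = rng.integers(0, 2, size=(10, 10))
x, times = numpy.nonzero(matrix)
weights = rng.random(x.size)
y = numpy.arange(1, 6)
ncolumns = 80

# One row of positions per y value, via broadcasting.
positions = (numpy.round(x - y[:, None] * times) + 50).astype(int)

# Reference: the loop with nonzero/unique.
reference = numpy.zeros((positions.shape[0], ncolumns))
for i, pos in enumerate(positions):
    values = numpy.bincount(pos, weights)
    values = values[numpy.nonzero(values)]
    reference[i, numpy.unique(pos)] = values

# Shorter form: bincount pads each row to ncolumns bins by itself.
image = numpy.array([numpy.bincount(pos, weights, minlength=ncolumns)
                     for pos in positions])
```

Besides being shorter, this form does not depend on every summed weight being nonzero, which the nonzero/unique pairing silently assumes.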

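Now I can cast this as an apply_along_axis problem, with applyIt taking a positions vector (which carries all the yTimed information) rather than a blank image vector.

def applyIt(pos, size, weights):
    acolumn = numpy.zeros(size)
    values = numpy.bincount(pos,weights)
    values = values[numpy.nonzero(values)]
    pos = numpy.unique(pos)
    acolumn[pos] = values
    return acolumn

image = numpy.apply_along_axis(applyIt, 1, positions, 80, weights)

Time-wise I expect it to be a bit slower than my explicit iteration. It has to do more setup work, including a test call applyIt(positions[0,:],...) to determine the size of its return array (i.e. image has a different shape than positions).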
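
If the row-by-row iteration itself is the bottleneck, one possible alternative is a single numpy.bincount call over flattened indices: shift each row's positions into its own block of ncolumns bins, count once, and reshape. This is a sketch under my own assumptions (regenerated stand-in data with an arbitrary seed, all positions within 0..ncolumns-1) and not something taken from the original post.

```python
import numpy

# Stand-in data shaped like the question's; the seed is an arbitrary choice.
rng = numpy.random.default_rng(0)
matrix = rng.integers(0, 2, size=(10, 10))
x, times = numpy.nonzero(matrix)
weights = rng.random(x.size)
y = numpy.arange(1, 6)
nrows, ncolumns = y.size, 80

positions = (numpy.round(x - y[:, None] * times) + 50).astype(int)

# Row i gets bins i*ncolumns .. (i+1)*ncolumns - 1 in one long bin vector.
flat = positions + ncolumns * numpy.arange(nrows)[:, None]

# The same weights apply to every row.
flat_weights = numpy.broadcast_to(weights, positions.shape)

image = numpy.bincount(flat.ravel(),
                       weights=flat_weights.ravel(),
                       minlength=nrows * ncolumns).reshape(nrows, ncolumns)
```

Like the outer-product version in the question, this trades memory for speed, since it builds the full (nrows, x.size) index array up front.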