Question

Say I have a list (or numpy.array) of (row, col) coordinates, for example:

[(0, 0), (1, 1), (0, 0)]

I would like to build a 2x2 array from it like this:

2 0
0 1

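where each listed coordinate is counted and the count is placed at the matching position in the array. That is, (0, 0) appears twice, so a[0, 0] == 2.

I know I can build this by iterating and poking the array for each element, but I wanted to check whether numpy has any support for building an array like this, mainly for performance reasons. If it does, could you point me in the right direction?

Also, is there reduce-like functionality along the same lines? That is, doing new = f(acc, el) rather than new = acc + el.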
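To make the request concrete, here is a minimal sketch of the loop-based version with a pluggable combining function. The names `accumulate_loop`, `f` and `init` are made up for illustration and are not part of NumPy; each coordinate is assumed to contribute the value 1.

```python
import operator

import numpy as np


def accumulate_loop(coords, f=operator.add, init=0):
    """Fold each coordinate into a dense array, one element at a time.

    `accumulate_loop`, `f` and `init` are illustrative names, not NumPy API.
    Every coordinate contributes the value 1, so f=operator.add yields counts.
    """
    coords = np.asarray(coords)
    shape = tuple(coords.max(axis=0) + 1)    # smallest array that holds every coordinate
    acc = np.full(shape, init)
    for row, col in coords:
        acc[row, col] = f(acc[row, col], 1)  # new = f(acc, el)
    return acc


counts = accumulate_loop([(0, 0), (1, 1), (0, 0)])
```

With the default `f` this gives the 2x2 count array shown above; passing `f=max` instead would mark each visited cell with 1 regardless of how often it occurs.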

Answer 1

Move to flat indices and use np.bincount.

>>> import numpy as np
>>>
>>> coords = [(0, 0), (1, 1), (0, 0)]
>>>
>>> shp = np.max(coords, axis=0) + 1
>>> flt = np.ravel_multi_index(np.moveaxis(coords, -1, 0), shp)
>>> result = np.bincount(flt, minlength=shp.prod()).reshape(shp)
>>>
>>> result
array([[2, 0],
       [0, 1]])

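Edit: As @MikeMiller pointed out, moveaxis is overkill here; np.transpose(coords), or coords.T if coords happens to be an array, is better. moveaxis would be more general if coords were for some reason more than 2D, but that seems unlikely.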
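For reference, a minimal sketch of the transpose-based variant wrapped in a function. The name `count_coords` and the optional `shape` parameter are assumptions added for illustration, and the sketch assumes a non-empty list of non-negative integer pairs.

```python
import numpy as np


def count_coords(coords, shape=None):
    """Count occurrences of each (row, col) pair in a dense array.

    `count_coords` and its `shape` parameter are illustrative names.
    """
    coords = np.asarray(coords)
    if shape is None:
        shape = tuple(int(n) for n in coords.max(axis=0) + 1)
    # coords.T has one row per axis, which is the layout ravel_multi_index expects.
    flat = np.ravel_multi_index(tuple(coords.T), shape)
    # In a C-ordered 2D array the flat index is row * ncols + col.
    return np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)


result = count_coords([(0, 0), (1, 1), (0, 0)])
padded = count_coords([(0, 0), (1, 1), (0, 0)], shape=(2, 3))
```

Passing `shape` explicitly is useful when the output should be larger than the largest coordinate seen, for example when some trailing rows or columns happen to be empty.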

Answer 2

Use np.unique() to count the unique coordinates (~~but I don't know whether this is the fastest way~~ it isn't, see the timings below):

import numpy as np

a = [(0,0), (1,1), (1,0), (0,0)]

b = np.array(a)
u, c = np.unique(b, axis=0, return_counts=True)
m = np.max(b)+1
ans = np.zeros((m, m))
ans[u[:,0], u[:,1]] = c

# ans:
# array([[ 2.,  0.],
#        [ 1.,  1.]])
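Note that the snippet above sizes the result from a single global maximum, so it is always square, and np.zeros defaults to float64. If that is not wanted, a possible variation is sketched below; `count_unique` is an illustrative name, and the per-axis shape and integer dtype are my assumptions about what would usually be preferred.

```python
import numpy as np


def count_unique(coords):
    """Count (row, col) pairs via np.unique; `count_unique` is an illustrative name."""
    b = np.asarray(coords)
    u, c = np.unique(b, axis=0, return_counts=True)
    shape = tuple(b.max(axis=0) + 1)        # per-axis extent, so the result need not be square
    ans = np.zeros(shape, dtype=c.dtype)    # integer counts instead of the float64 default
    ans[u[:, 0], u[:, 1]] = c
    return ans


square = count_unique([(0, 0), (1, 1), (1, 0), (0, 0)])
wide = count_unique([(0, 0), (0, 2), (0, 2)])
```

For the example input this yields the same counts as above, only as integers; the second call produces a 1x3 array rather than a 3x3 one.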

I did some timings:

# data preparation
max_coord = 10000
max_size = 100000

# draw all coordinates in one call, then convert to a list of (row, col) tuples
coords = [tuple(rc) for rc in np.random.randint(max_coord, size=(max_size, 2)).tolist()]

# timings using %timeit

# my solution
139 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Paul Panzer's solution
142 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# with max_size = 1000000
# my solution
827 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Paul's solution
748 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
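The timed statements themselves are not shown above, so here is a rough sketch of how such a comparison could be reproduced outside IPython with the standard timeit module. The function names `with_unique` and `with_bincount` are illustrative, the sizes are deliberately smaller than the ones quoted, and the input is already an array, so the numbers will not match the figures above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
max_coord, max_size = 1_000, 100_000           # smaller than the sizes quoted above
coords = rng.integers(max_coord, size=(max_size, 2))


def with_unique(b):
    # np.unique-based counting, square result sized from the global maximum
    u, c = np.unique(b, axis=0, return_counts=True)
    m = int(b.max()) + 1
    ans = np.zeros((m, m))
    ans[u[:, 0], u[:, 1]] = c
    return ans


def with_bincount(b):
    # flat-index counting with np.bincount
    shape = tuple(int(n) for n in b.max(axis=0) + 1)
    flat = np.ravel_multi_index(tuple(b.T), shape)
    return np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)


for fn in (with_unique, with_bincount):
    # best of 3 repeats, 3 calls each, reported per call
    seconds = min(timeit.repeat(lambda: fn(coords), number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per call")
```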

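The timings are nearly identical (although I don't know about their memory footprint; with max_size=1000000 and max_coord=100000, both solutions raise a MemoryError on my machine). Still, I would go with @Paul's solution, which is more concise (and faster when the data is large).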
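The MemoryError is plausible from the size of the dense result alone. A back-of-the-envelope sketch, assuming 8-byte elements (int64 counts from np.bincount on a 64-bit platform, or the float64 default of np.zeros):

```python
import numpy as np

max_coord = 100_000
n_cells = max_coord * max_coord                 # dense result has max_coord x max_coord entries
bytes_per_cell = np.dtype(np.int64).itemsize    # 8; np.zeros' default float64 is also 8 bytes
dense_bytes = n_cells * bytes_per_cell
print(f"{dense_bytes / 1e9:.0f} GB")            # 80 GB
```

So either approach needs on the order of 80 GB just for the output array at that coordinate range, independent of how many coordinates are in the list.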

Fastest way to build a numpy array from a sum of coordinates