Question

I want to intersect an np.array with a set without first converting the np.array to a list (which slows the program down to an unworkable level).

Here is my current code. (Note that I get this data from a b, g, r rawCapture, and selection_data is just a set prepared beforehand.)

def GreenCalculations(data):
    data.reshape(1,-1,3)
    data={tuple(item) for item in data[0]}
    ColourCount=selection_data & set(data)
    return ColourCount

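I think my current problem is that, because of data[0], I am only comparing the first (top) row of the picture. Is it possible to iterate over all the rows?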

Note: tolist() takes a lot of time.

Answer 1

First, some sample data; I'm guessing it is an n x n x 3 array with dtype uint8:

In [791]: data=np.random.randint(0,256,(8,8,3),dtype=np.uint8)

The reshape method returns a new array with the new shape; it does not change the array in place:

In [793]: data.reshape(1,-1,3)

data.shape=(1,-1,3) would do that. But why the initial 1?
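To make the difference concrete, here is a small sketch on a made-up 8x8x3 array (the zeros array is only a stand-in for the camera frame). It shows that calling reshape leaves the original untouched unless the result is kept, while assigning to .shape modifies the array itself:

```python
import numpy as np

# Stand-in for the camera frame; the real data would come from rawCapture.
data = np.zeros((8, 8, 3), dtype=np.uint8)

reshaped = data.reshape(1, -1, 3)  # returns a new array object; data is unchanged
print(data.shape)      # (8, 8, 3)
print(reshaped.shape)  # (1, 64, 3)

data.shape = (-1, 3)   # assigning to .shape changes data itself
print(data.shape)      # (64, 3)
```

In everyday code, rebinding the name (data = data.reshape(-1, 3)) is probably the more common way to get the same effect.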

Instead:

In [795]: aset={tuple(item) for item in data.reshape(-1,3)}
In [796]: aset
Out[796]: 
{(3, 92, 60),
 (5, 211, 227),
 (6, 185, 183),
 (9, 37, 0),
 ....

In [797]: len(aset)
Out[797]: 64

In my case that is a set of 64 unique items, which is not surprising given how I generated the values.

Your no-op data.reshape line, together with {tuple(item) for item in data[0]}, explains why it appears to process only the first row of the picture.
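A quick way to see the effect is to build both sets from the same array. This sketch uses random sample data (the seed and the 8x8 size are arbitrary choices for illustration) and compares the set built from data[0] with the set built from every pixel:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
data = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

# data[0] is just the top row of the image: at most 8 distinct colours here.
first_row_only = {tuple(item) for item in data[0]}

# reshape(-1, 3) lines up every pixel: up to 64 distinct colours here.
all_pixels = {tuple(item) for item in data.reshape(-1, 3)}

print(len(first_row_only), len(all_pixels))
print(first_row_only <= all_pixels)  # the first row is a subset of the whole image
```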

I'm guessing selection_data is a similar set of 3-item tuples, for example:

In [801]: selection_data = {tuple(data[1,3,:]), (1,2,3), tuple(data[5,5,:])}
In [802]: selection_data
Out[802]: {(1, 2, 3), (49, 132, 26), (76, 131, 16)}
In [803]: selection_data&aset
Out[803]: {(49, 132, 26), (76, 131, 16)}

You didn't say where you tried to use tolist, but I'm guessing it was in generating the set of tuples.

Curiously, though, tolist speeds up the conversion:

In [808]: timeit {tuple(item) for item in data.reshape(-1,3).tolist()}
10000 loops, best of 3: 57.7 µs per loop
In [809]: timeit {tuple(item) for item in data.reshape(-1,3)}
1000 loops, best of 3: 239 µs per loop
In [815]: timeit data.reshape(-1,3).tolist()
100000 loops, best of 3: 19.8 µs per loop
In [817]: timeit {tuple(item.tolist()) for item in data.reshape(-1,3)}
10000 loops, best of 3: 100 µs per loop

So for this kind of list and set operation, we might as well jump to the list format right away.
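Putting those pieces together, the function from the question could be sketched roughly as follows. The name green_calculations and the explicit selection_data parameter are my own choices for illustration (the original reads selection_data from the enclosing scope), and the tiny image is made up:

```python
import numpy as np

def green_calculations(data, selection_data):
    """Return the colours in selection_data that occur anywhere in data."""
    # Line up every pixel as a row of an (n_pixels, 3) array, then go through
    # tolist() so the tuples hold plain Python ints rather than NumPy scalars.
    pixels = {tuple(item) for item in data.reshape(-1, 3).tolist()}
    return selection_data & pixels

# Made-up 2x2 image and selection set, just to exercise the function.
image = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [1, 2, 3]]], dtype=np.uint8)
wanted = {(7, 8, 9), (0, 0, 0)}
print(green_calculations(image, wanted))  # {(7, 8, 9)}
```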

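numpy has some set functions, for example np.in1d. That only operates on 1d arrays, but as has been demonstrated in some unique-row questions, we can get around that by viewing the 2d array as a structured array. I had to fiddle around to get this far: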

In [880]: dt=np.dtype('uint8,uint8,uint8')
In [881]: data1=data.reshape(-1,3).view(dt).ravel()
In [882]: data1
Out[882]: 
array([(41, 145, 254), (138, 144, 7), (192, 241, 203), (42, 177, 215),
       (78, 132, 87), (221, 176, 87), (107, 171, 147), (231, 13, 53),
       ... 
      dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])

Construct the selection with the same structured-array nature:

In [883]: selection=[data[1,3,:],[1,2,3],data[5,5,:]]
In [885]: selection=np.array(selection,np.uint8).view(dt)
In [886]: selection
Out[886]: 
array([[(49, 132, 26)],
       [(1, 2, 3)],
       [(76, 131, 16)]], 
      dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])

So the items in selection that are also in data1 are:

In [888]: np.in1d(selection,data1)
Out[888]: array([ True, False,  True], dtype=bool)

And the items in data1 that are in selection are:

In [890]: np.where(np.in1d(data1,selection))
Out[890]: (array([11, 45], dtype=int32),)

Or, in the unraveled (8,8) shape:

In [891]: np.where(np.in1d(data1,selection).reshape(8,8))
Out[891]: (array([1, 5], dtype=int32), array([3, 5], dtype=int32))

These are the same (1,3) and (5,5) items that I used to generate selection.

The in1d timing is competitive:

In [892]: %%timeit
     ...: data1=data.reshape(-1,3).view(dt).ravel()
     ...: np.in1d(data1,selection)
     ...: 
10000 loops, best of 3: 65.7 µs per loop

In [894]: timeit selection_data&{tuple(item) for item in data.reshape(-1,3).tolist()}
10000 loops, best of 3: 91.5 µs per loop
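As a variation on the same idea (not the structured-array view used above), each b, g, r triple can be packed into a single integer code so that the membership test runs on plain 1d integer data. This is only a sketch: pack_colours is a hypothetical helper, the sample arrays are made up, and np.isin is used as the newer spelling of np.in1d.

```python
import numpy as np

def pack_colours(arr):
    """Pack (..., 3) uint8 colours into one uint32 code per pixel (hypothetical helper)."""
    arr = np.asarray(arr, dtype=np.uint32)
    return (arr[..., 0] << 16) | (arr[..., 1] << 8) | arr[..., 2]

# Made-up 2x2 image and selection, for illustration only.
image = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [1, 2, 3]]], dtype=np.uint8)
selection = np.array([[7, 8, 9], [0, 0, 0], [1, 2, 3]], dtype=np.uint8)

codes = pack_colours(image)          # shape (2, 2), one code per pixel
sel_codes = pack_colours(selection)  # shape (3,), one code per selection colour

present = np.isin(sel_codes, codes)  # which selection colours occur in the image
mask = np.isin(codes, sel_codes)     # which pixels match some selection colour

print(present)            # [ True False  True]
print(np.argwhere(mask))  # (row, column) positions of the matching pixels
```

Whether this beats the structured view or the set approach would need timing on the real frames; I have not measured it.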

Answer 2

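If I understand your question correctly (and I'm not 100% sure that I do, but I am using the same assumptions as hpaulj), then your problem can be solved with the numpy_indexed package: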

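import numpy as np
import numpy_indexed as npi
# np.asarray on a set gives a 0-d object array, so convert the set to a list first
ColourCount = npi.intersection(data.reshape(-1, 3), np.asarray(list(selection_data), dtype=data.dtype))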

That is, it treats the reshaped array and the set as sequences of length-3 ndarrays, and finds their intersection in a vectorized manner.
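For readers without the package installed, the idea of a vectorized row intersection can be approximated in plain NumPy with broadcasting. This is a rough sketch of the concept only, not how numpy_indexed works internally, and the sample pixels and selection set are made up:

```python
import numpy as np

# Made-up data: an already reshaped (n_pixels, 3) image and a small selection set.
pixels = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 2, 3]], dtype=np.uint8)
selection_data = {(7, 8, 9), (0, 0, 0)}

# A set has no order, so sort it to get a reproducible (m, 3) array.
selection = np.array(sorted(selection_data), dtype=np.uint8)

# Compare every selection colour with every pixel: shape (m, n_pixels, 3),
# then require all three channels to match.
matches = (selection[:, None, :] == pixels[None, :, :]).all(axis=-1)

# Keep the selection colours that match at least one pixel.
found = selection[matches.any(axis=1)]
print(found)  # [[7 8 9]]
```

The intermediate comparison array grows as m * n_pixels * 3, so this is only reasonable when the selection set is small.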

Intersection of an np.array and a set
