Question

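I want to compute an indexed sum of weights over a large (1,000,000 x 3,000) boolean numpy array. The large boolean array changes infrequently, but the weights arrive at query time. I need answers very fast, without copying the whole large array and without expanding the small weights array to the size of the large one.

The result should be an array of 1,000,000 entries. Each entry holds the sum of the weights that correspond to that row's True values.

I looked into using masked arrays, but they seem to require building a weights array the size of my large boolean array.

The code below gives the correct results, but I can't afford the copy made in the multiply step. The multiply isn't even necessary, since the values array is boolean, but at least it handles the broadcasting properly.

I'm new to numpy and loving it, but I'm about to give up on it for this particular problem. I've learned enough to stay away from anything that loops in Python.

My next step will be to write this routine in C. That has the added benefit of letting me save memory by using bits instead of bytes.

Unless one of you numpy gurus can save me from cython?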

import numpy
from numpy import array, multiply, sum

# Construct an example values array, alternating True and False.
# This represents four records of three attributes each:
#    array([[False,  True, False],
#           [ True, False,  True],
#           [False,  True, False],
#           [ True, False,  True]], dtype=bool)
values = array([(x % 2) for x in range(12)], dtype=bool).reshape((4,3))

# Construct example weights, one for each attribute:
#    array([1, 2, 3])
weights = array(range(1, 4))

# Create expensive NEW array with the weights for the True attributes.
# Broadcast the weights array into the values array.
#    array([[0, 2, 0],
#           [1, 0, 3],
#           [0, 2, 0],
#           [1, 0, 3]])
weighted = multiply(values, weights)

# Add up the weights:
#    array([2, 4, 2, 4])
answers = sum(weighted, axis=1)

print(answers)

# Rejected masked_array solution is too expensive (and oddly inverts
# the results):
masked = numpy.ma.array([[1,2,3]] * 4, mask=values)
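The sketch below puts numbers on the copy made by the multiply step. It computes, without allocating anything, how many bytes the boolean array and the broadcast product would occupy at the full 1,000,000 x 3,000 size. It assumes the weights are int64. NumPy promotes bool times int64 to int64, so the temporary is eight times the size of the boolean array.

```python
import numpy as np

# Full problem size from the question.
n_rows, n_cols = 1_000_000, 3_000

# Assumption: the weights are stored as int64.
weights_dtype = np.dtype(np.int64)

# multiply(values, weights) produces an array of the promoted dtype.
product_dtype = np.result_type(np.bool_, weights_dtype)

bool_bytes = n_rows * n_cols * np.dtype(np.bool_).itemsize
product_bytes = n_rows * n_cols * product_dtype.itemsize

print(product_dtype)              # int64
print(bool_bytes / 1e9, "GB")     # 3.0 GB
print(product_bytes / 1e9, "GB")  # 24.0 GB
```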

Answer 1

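The dot product (or inner product) is what you want. It takes an m×n matrix and a length-n vector and multiplies them, giving a length-m vector. Each entry is the weighted sum of one row of the matrix, using the vector's entries as the weights.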
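Written out, with V the m×n boolean matrix (True counted as 1, False as 0) and w the length-n weight vector, entry i of the result is the sum of the weights at row i's True positions. For the example below, row 2 is [True, False, True], so its entry is 1 + 3 = 4.

```latex
a_i = (V w)_i = \sum_{j=1}^{n} V_{ij}\, w_j = \sum_{j \,:\, V_{ij} = \mathrm{True}} w_j, \qquad i = 1, \dots, m
```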

Numpy implements this as array1.dot(array2) (or numpy.dot(array1, array2) in older versions). For example:

from numpy import array

values = array([(x % 2) for x in range(12)], dtype=bool).reshape((4,3))

weights = array(range(1, 4))

answers = values.dot(weights)
print(answers)
# output: [2 4 2 4]

(You should benchmark this with the timeit module.)
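Here is a minimal benchmarking sketch that compares the broadcast-multiply approach from the question with dot. It uses a smaller, randomly generated array so it runs quickly; the 10,000 x 300 size and the random data are assumptions, not the real workload.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test data, much smaller than the real 1,000,000 x 3,000 problem.
values = rng.random((10_000, 300)) < 0.5   # boolean array
weights = rng.integers(1, 10, size=300)    # integer weights

def multiply_then_sum(values, weights):
    # Builds a full-size temporary, as in the question.
    return np.multiply(values, weights).sum(axis=1)

def dot_product(values, weights):
    return values.dot(weights)

# Make sure both approaches agree before timing them.
assert np.array_equal(multiply_then_sum(values, weights), dot_product(values, weights))

for fn in (multiply_then_sum, dot_product):
    seconds = timeit.timeit(lambda: fn(values, weights), number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1000:.1f} ms per call")
```

Timings depend on the machine. At the full size, it is worth checking peak memory as well as time.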

Answer 2

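dbaupp's answer seems to be the right one. Just for variety, though, here's another memory-saving solution. It also works for operations that have no built-in numpy equivalent.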

>>> values = numpy.array([(x % 2) for x in range(12)], dtype=bool).reshape((4,3))
>>> weights = numpy.array(range(1, 4))
>>> weights_stretched = numpy.lib.stride_tricks.as_strided(weights, (4, 3), (0, weights.strides[0]))

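numpy.lib.stride_tricks.as_strided is a wonderful little function! You give it shape and strides values, and a small array can then mimic a much larger one. Observe: there aren't really four rows here; it just looks that way: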

>>> weights_stretched[0][0] = 4
>>> weights_stretched 
array([[4, 2, 3],
       [4, 2, 3],
       [4, 2, 3],
       [4, 2, 3]])
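You can also write the zero-stride trick without hard-coding the element size by taking the column stride from the array itself. NumPy additionally offers numpy.broadcast_to, which builds the same kind of view but marks it read-only, so an accidental write like the one above raises an error instead. A sketch:

```python
import numpy as np

weights = np.array([1, 2, 3])

# Row stride 0 reuses the same memory for every row; column stride is the item size.
stretched = np.lib.stride_tricks.as_strided(
    weights, shape=(4, 3), strides=(0, weights.strides[0]))

# broadcast_to produces an equivalent view, but read-only.
broadcast = np.broadcast_to(weights, (4, 3))

print(stretched.strides)                     # (0, weights.itemsize)
print(np.shares_memory(stretched, weights))  # True: no data was copied
print(np.shares_memory(broadcast, weights))  # True
print(broadcast.flags.writeable)             # False
```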

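So instead of passing a huge array to MaskedArray, you can pass the small one. (As you've already noticed, though, numpy masking works the opposite way from what you might expect: True masks rather than reveals, so you'll have to store your values inverted.) Note that MaskedArray doesn't copy any data; it just reflects whatever is in weights_stretched: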

>>> masked = numpy.ma.MaskedArray(weights_stretched, numpy.logical_not(values))
>>> weights_stretched[0][0] = 1
>>> masked
masked_array(data =
 [[-- 2 --]
 [1 -- 3]
 [-- 2 --]
 [1 -- 3]],
      mask =
 [[ True False  True]
 [False  True False]
 [ True False  True]
 [False  True False]],
      fill_value=999999)

Now we can sum it along each row:

>>> masked.sum(axis=1)
masked_array(data = [2 4 2 4],
      mask = [False False False False],
      fill_value=999999)
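The answer doesn't show how dot1 and dot2 are defined. The sketch below is one guess at how the masked-array steps above might be wrapped in a function; the name masked_weighted_sum is hypothetical. It is checked against dot on the same small example.

```python
import numpy as np

def masked_weighted_sum(values, weights):
    """Hypothetical helper: per-row sum of weights where values is True."""
    n_rows, n_cols = values.shape
    # Zero-stride view: every row aliases the same weights memory.
    stretched = np.lib.stride_tricks.as_strided(
        weights, shape=(n_rows, n_cols), strides=(0, weights.strides[0]))
    # MaskedArray hides entries where the mask is True, so pass the inverted values.
    # Note: ~values allocates a boolean array with the same shape as values.
    masked = np.ma.MaskedArray(stretched, mask=~values)
    return masked.sum(axis=1)

values = (np.arange(12) % 2).astype(bool).reshape(4, 3)
weights = np.array([1, 2, 3])

result = masked_weighted_sum(values, weights)
print(result.tolist())              # [2, 4, 2, 4]
print(values.dot(weights).tolist()) # [2, 4, 2, 4]
```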

I benchmarked numpy.dot against the approach above on a 1,000,000 x 30 array. Here are the results on a relatively modern MacBook Pro (numpy.dot is dot1; mine is dot2):

>>> %timeit dot1(values, weights)
1 loops, best of 3: 194 ms per loop
>>> %timeit dot2(values, weights)
1 loops, best of 3: 459 ms per loop

As you can see, the built-in numpy solution is faster. But stride_tricks is worth knowing about anyway, so I'm leaving this answer here.

Answer 3

Does this work for you?

a = np.array([sum(row * weights) for row in values])

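This uses sum() to total each row * weights product immediately. You only ever hold one row's worth of intermediate values instead of the whole weighted array. The list comprehension then collects the row totals.

You said you want to avoid any "loops in Python". This at least uses Python's C guts to do the looping, rather than an explicit Python loop. It still can't be as fast as a NumPy solution, because NumPy uses compiled C or Fortran.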
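Here is a sketch of the same row-by-row idea that writes the results straight into a NumPy array instead of an intermediate Python list. It passes a generator to numpy.fromiter with a known count. The int64 output dtype is an assumption about the weights; as with the list comprehension, only one row's temporary exists at a time.

```python
import numpy as np

values = (np.arange(12) % 2).astype(bool).reshape(4, 3)
weights = np.array([1, 2, 3])

# One row at a time: only a len(weights)-sized temporary exists at any moment.
row_sums = np.fromiter(
    ((row * weights).sum() for row in values),
    dtype=np.int64,
    count=values.shape[0],  # lets fromiter allocate the output array once
)
print(row_sums)  # [2 4 2 4]
```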

Answer 4

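I don't think you need anything like that here. 1,000,000 by 3,000 is a huge array: about 3 GB even as a bool array, at one byte per element. It most likely won't fit in your RAM.

I would do it like this.

Assuming your data is originally in a text file: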

False,True,False
True,False,True
False,True,False
True,False,True

My code:

weight = range(1,4)    
dicto = {'True':1, 'False':0}

with open ('my_data.txt') as fin:

    a = sum(sum(dicto[ele]*w for ele,w in zip(line.strip().split(','),weight)) for line in fin)

Result:

>>> a
12

Edit:

I think I misread the question a little the first time and summed everything together. The following solution gives exactly what the OP was after:

weight = range(1,4)
dicto = {'True':1, 'False':0}

with open ('my_data.txt') as fin:

    a = [sum(dicto[ele]*w for ele,w in zip(line.strip().split(','),weight)) for line in fin]

Result:

>>> a
[2, 4, 2, 4]
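To try the per-row version without creating my_data.txt, the sketch below replaces the file with an in-memory io.StringIO that holds the same four lines. The StringIO substitution is only for illustration; the computation is the same as above.

```python
import io

# In-memory stand-in for my_data.txt (same contents as the example file).
data = io.StringIO("False,True,False\nTrue,False,True\nFalse,True,False\nTrue,False,True\n")

weight = range(1, 4)
dicto = {'True': 1, 'False': 0}

# One weighted sum per line; the input is read one line at a time.
row_sums = [
    sum(dicto[ele] * w for ele, w in zip(line.strip().split(','), weight))
    for line in data
]
print(row_sums)  # [2, 4, 2, 4]
```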

Efficiently sum a small numpy array, broadcast across a huge numpy array?
