How do I get the average (row, col) position, weighted by the array's cell values?

Asked: 2013-12-17 08:09:43

Tags: python arrays numpy

I'm trying to get the average row/column position, using the array values as weights. This seems to work, but it doesn't feel right:

import numpy

a = numpy.random.ranf(size=(5,5))
normalized_a = a/numpy.nansum(a)

row_values = []
col_values = []
for row, col in numpy.ndindex(normalized_a.shape):
    weight = int(normalized_a[row, col] * 100)
    row_values.extend([row] * weight)
    col_values.extend([col] * weight)

print "average row:", sum(row_values)/float(len(row_values))
print "average col:", sum(col_values)/float(len(col_values))

Is there a more efficient way to do this in numpy?
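(As a point of reference, not from the original post: since only the marginal sums of the weights matter, `np.average` with its `weights` argument can compute the same quantities directly. A minimal sketch on a hypothetical 2x2 array:)

```python
import numpy as np

# Hypothetical 2x2 array, chosen so the expected averages are easy to check.
a = np.array([[0.0, 1.0],
              [3.0, 0.0]])

# Weighted mean of the row indices, using each row's total mass as its weight.
avg_row = np.average(np.arange(a.shape[0]), weights=a.sum(axis=1))
# Weighted mean of the column indices, using each column's total mass.
avg_col = np.average(np.arange(a.shape[1]), weights=a.sum(axis=0))
print(avg_row, avg_col)  # 0.75 0.25
```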

2 answers:

Answer 0 (score: 2)

One basic insight for speeding up the calculation: when computing the row (column) average, every entry in the same row (column) gets multiplied by the same index value, so it is faster to add those entries together first and then multiply the result by the row (column) index. If your array is m x n, this reduces the number of multiplications from 2*m*n to m + n. And since what remains is a multiply followed by a sum, you can use np.dot to squeeze out a last bit of performance. Building on @mgilson's tests:

import numpy as np

def new3(a, normalized_a):
    weights  = np.floor(normalized_a * 100)
    total_wt = np.sum(weights)
    rows, cols = weights.shape
    # Sum the weights along one axis first, then dot with the index vector.
    row_values = np.dot(weights.sum(axis=1), np.arange(rows)) / total_wt
    col_values = np.dot(weights.sum(axis=0), np.arange(cols)) / total_wt
    return row_values, col_values
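A quick sanity check of the dot-product trick on a small hypothetical weight array (not from the original answer): collapsing the weights along one axis and dotting with the index vector gives the same result as the full 2-D weighted sum.

```python
import numpy as np

# Hypothetical 2x2 weight array.
weights = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
total_wt = weights.sum()
rows, cols = weights.shape

# Dot-product form: m + n multiplications.
row_avg = np.dot(weights.sum(axis=1), np.arange(rows)) / total_wt
col_avg = np.dot(weights.sum(axis=0), np.arange(cols)) / total_wt

# Brute-force form: 2 * m * n multiplications, for comparison.
r_idx, c_idx = np.mgrid[:rows, :cols]
row_avg_bf = (r_idx * weights).sum() / total_wt
col_avg_bf = (c_idx * weights).sum() / total_wt
```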

These are my results and timings:

(1.8352941176470587, 2.388235294117647)
(1.8352941176470587, 2.388235294117647)
(1.8352941176470587, 2.388235294117647)
(1.8352941176470587, 2.388235294117647)
timing!!!
2.59478258085
1.33357909978
1.0771122333
0.487124971828 #new3

Answer 1 (score: 1)

These look a bit better:

import numpy

a = numpy.random.ranf(size=(5,6))
normalized_a = a/numpy.nansum(a)

def original(a, normalized_a):
  row_values = []
  col_values = []
  for row, col in numpy.ndindex(normalized_a.shape):
    weight = int(normalized_a[row, col] * 100)
    row_values.extend([row] * weight)
    col_values.extend([col] * weight)

  return sum(row_values)/float(len(row_values)), sum(col_values)/float(len(col_values))


def new(a, normalized_a):
  weights = numpy.floor(normalized_a * 100)
  nx, ny = a.shape
  # mgrid materializes two full (nx, ny) arrays of row and column indices.
  rows, columns = numpy.mgrid[:nx, :ny]
  row_values = numpy.sum(rows * weights)/numpy.sum(weights)
  col_values = numpy.sum(columns * weights)/numpy.sum(weights)
  return row_values, col_values


def new2(a, normalized_a):
  weights = numpy.floor(normalized_a * 100)
  nx, ny = a.shape
  # ogrid builds broadcastable (nx, 1) and (1, ny) index arrays instead.
  rows, columns = numpy.ogrid[:nx, :ny]
  row_values = numpy.sum(rows * weights)/numpy.sum(weights)
  col_values = numpy.sum(columns * weights)/numpy.sum(weights)
  return row_values, col_values


print original(a, normalized_a)
print new(a, normalized_a)
print new2(a, normalized_a)


print "timing!!!"

import timeit
print timeit.timeit('original(a, normalized_a)', 'from __main__ import original, a, normalized_a', number=10000)
print timeit.timeit('new(a, normalized_a)', 'from __main__ import new, a, normalized_a', number=10000)
print timeit.timeit('new2(a, normalized_a)', 'from __main__ import new2, a, normalized_a', number=10000)

Results on my machine:

(1.8928571428571428, 2.630952380952381)
(1.8928571428571428, 2.6309523809523809)
(1.8928571428571428, 2.6309523809523809)
timing!!!
1.05751299858
0.64871096611
0.497050046921

I used some of numpy's indexing tricks to vectorize the calculation. I'm actually a little surprised we didn't do better: np.ogrid is only about twice as fast as the original on the test matrix, with np.mgrid somewhere in between.
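(A small illustration, not from the original answer, of why ogrid tends to win: mgrid builds two full index matrices up front, while ogrid returns "open" arrays that broadcast against the weights without being expanded, yet both yield identical weighted sums.)

```python
import numpy as np

rows_m, cols_m = np.mgrid[:3, :4]   # two dense (3, 4) index arrays
rows_o, cols_o = np.ogrid[:3, :4]   # broadcastable (3, 1) and (1, 4) arrays

print(rows_m.shape, cols_m.shape)  # (3, 4) (3, 4)
print(rows_o.shape, cols_o.shape)  # (3, 1) (1, 4)

# Thanks to broadcasting, both forms give the same weighted sum.
w = np.arange(12.0).reshape(3, 4)
same = np.sum(rows_m * w) == np.sum(rows_o * w)
print(same)  # True
```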