Question

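I am doing data analysis that involves minimizing the least-squares error between a set of points and a corresponding set of orthogonal functions. In other words, I take a set of y-values and a set of functions, and try to zero in on the x-value that brings all of the functions closest to their corresponding y-values. Everything is done in a "data_set" class. The functions I am comparing against are all stored in one list, and I use a class method to compute the total lsq-error over all of them: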

self.fits = [np.poly1d(np.polyfit(self.x_data, self.y_data[n],10)) for n in range(self.num_points)]

def error(self, x, y_set):
    arr = [(y_set[n] - self.fits[n](x))**2 for n in range(self.num_points)]
    return np.sum(arr)

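This was fine when I had far more time than data, but now I am taking thousands of x-values, each with a thousand y-values, and that for loop is unacceptably slow. I have been trying to use np.vectorize: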

#global scope
def func(f,x):
    return f(x)
vfunc = np.vectorize(func, excluded=['x'])
…
…
#within data_set class
    def error(self, x, y_set):
        arr = (y_set - vfunc(self.fits, x))**2
        return np.sum(arr)

func(self.fits[n], x) works fine as long as n is valid, and as far as I can tell from the docs, vfunc(self.fits, x) should be equivalent to

[self.fits[n](x) for n in range(self.num_points)]

but instead it throws:

ValueError: cannot copy sequence with size 10 to array axis with dimension 11

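10 is the degree of the polynomial fit, and 11 is (by definition) the number of terms in it, but I have no idea why they are showing up here. If I change the order of the fit, the error message reflects the change. It seems that np.vectorize is treating each element of self.fits as a list rather than as an np.poly1d function.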

Anyway, if someone could either help me understand np.vectorize better or suggest another way to eliminate that loop, that would be great.

Answer 1

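Since the functions in question all have a very similar structure, we can vectorize "manually" once we have extracted the polynomial coefficients. In fact, the function is then a quite simple one-liner, eval_many below: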

Sample run:

import numpy as np

def poly_vec(list_of_polys):
    O = max(p.order for p in list_of_polys)+1
    C = np.zeros((len(list_of_polys), O))
    for p, c in zip(list_of_polys, C):
        c[len(c)-p.order-1:] = p.coeffs
    return C

def eval_many(x,C):
    # one Vandermonde column per coefficient column, so the degree is not hardcoded
    return C@np.vander(x,C.shape[1]).T

# make example
list_of_polys = [np.poly1d(v) for v in np.random.random((1000,11))]
x = np.random.random((2000,))

# put all coeffs in one master matrix
C = poly_vec(list_of_polys)

# test
assert np.allclose(eval_many(x,C), [p(x) for p in list_of_polys])

from timeit import timeit

print('vectorized', timeit(lambda: eval_many(x,C), number=100)*10)
print('loopy     ', timeit(lambda: [p(x) for p in list_of_polys], number=10)*100)
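
To connect this back to the error method in the question, here is a minimal sketch of how the summed squared error could be computed from the coefficient matrix. The names build_coeff_matrix, total_sq_error and total_sq_error_many are hypothetical and do not come from the original post. The sketch assumes y_set is a 1-D array holding one value per polynomial, and that x is either a scalar or a 1-D array of candidate values.

```python
import numpy as np

def build_coeff_matrix(polys):
    """Stack poly1d coefficients (highest power first) into one 2-D array,
    left-padding shorter polynomials with zeros. Hypothetical helper."""
    width = max(p.order for p in polys) + 1
    C = np.zeros((len(polys), width))
    for row, p in zip(C, polys):
        row[width - p.order - 1:] = p.coeffs  # row is a view into C
    return C

def total_sq_error(x, y_set, C):
    """Sum over n of (y_set[n] - fit_n(x))**2 for a single scalar x."""
    powers = np.vander(np.atleast_1d(x), C.shape[1])[0]  # x**deg, ..., x, 1
    return np.sum((y_set - C @ powers) ** 2)

def total_sq_error_many(xs, y_set, C):
    """Same error, evaluated at every candidate in xs; shape (len(xs),)."""
    preds = C @ np.vander(xs, C.shape[1]).T  # shape (num_polys, len(xs))
    return np.sum((y_set[:, None] - preds) ** 2, axis=0)
```

Inside the class, C would be built once after the fits are computed, so that each call to error reduces to one matrix product instead of a Python loop over the fits. Evaluating many candidate x values at once may also make it possible to scan for the minimum without calling the method repeatedly.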
