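I'm working on an algorithm (written in Python/Cython) that uses a variable window size to estimate the gradient at each point of a noisy data set. It works well, but the regression step appears to be the bottleneck. This is what I'm using: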
    cdef double regression(np.ndarray[DTYPE_t, ndim=1] data, np.ndarray[DTYPE_t, ndim=1] time, unsigned int leftlim2, unsigned int rightlim2):
        cdef unsigned int length, j
        cdef double x, y, sumx, sumy, xy, xx, result, a, b, invlen
        length = 0
        sumx = 0
        sumy = 0
        xy = 0
        xx = 0
        for j from leftlim2 <= j < rightlim2:
            x = time[j]
            y = data[j]
            sumx += x
            sumy += y
            xy += x*y
            xx += x*x
        length = rightlim2 - leftlim2
        invlen = 1.0/length
        a = xy-(sumx*sumy)*invlen
        b = xx-(sumx*sumx)*invlen
        result = a/b
        return result
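Input:
Output: the slope of the straight line that approximates the data set (y = a * x + b).
I'm only interested in the slope, not the intercept, which is why I use a loop rather than doing the regression with a matrix-vector multiplication. Does anyone know a way to make the regression more efficient without sacrificing accuracy? Maybe there is a way to exploit the equal spacing of the time array?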
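Written out, the value the routine above returns is the usual closed-form least-squares slope. Here x_j stands for time[j], y_j for data[j], and n for rightlim2 - leftlim2; these symbols are shorthand introduced only for this formula, and every sum runs over leftlim2 <= j < rightlim2:

```latex
\[
\text{slope}
  = \frac{\sum_j x_j y_j - \frac{1}{n}\Bigl(\sum_j x_j\Bigr)\Bigl(\sum_j y_j\Bigr)}
         {\sum_j x_j^2 - \frac{1}{n}\Bigl(\sum_j x_j\Bigr)^2}
\]
```

The numerator and denominator correspond to the local variables a and b in the code, which are not the a and b of y = a * x + b.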
Answer 0 (score: 0)
I'm not familiar with Cython syntax, but something like this should speed things up:
    def my_regression(data, time, leftlim, rightlim):
        timeslice = time[leftlim:rightlim]
        dataslice = data[leftlim:rightlim]
        sumx = sum(timeslice)
        a = sum(timeslice*dataslice)-sum(dataslice)*sumx/(rightlim-leftlim)
        b = sum(timeslice**2)-sumx**2/(rightlim-leftlim)
        return a/b
Timing results:
    n = 1000
    data = np.random.random(n)
    time = np.arange(n,dtype=float)/n
    leftlim = 10
    rightlim = 900
    %timeit my_regression(data,time,leftlim,rightlim)
    >> 10000 loops, best of 3: 74.3 µs per loop
    %timeit your_regression(data,time,leftlim,rightlim)
    >> 100 loops, best of 3: 2.88 ms per loop
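On the equal-spacing idea from the question: if time[j] = t0 + j*step holds exactly, the denominator of the slope depends only on the window length n, namely step**2 * n * (n**2 - 1) / 12, and the numerator only needs sums of data[j] and j*data[j]. Both sums can be read off prefix sums computed once, so each window then costs a few arithmetic operations regardless of its size. The following is only a sketch under that assumption, not something I have timed; the names make_prefix_sums and slope_equal_spacing are made up for illustration.

```python
import numpy as np

def make_prefix_sums(data):
    """Prefix sums of y and of index*y, each with a leading zero.

    s0[k] = sum of data[j] for j < k, s1[k] = sum of j*data[j] for j < k.
    """
    idx = np.arange(len(data), dtype=float)
    s0 = np.concatenate(([0.0], np.cumsum(data)))
    s1 = np.concatenate(([0.0], np.cumsum(idx * data)))
    return s0, s1

def slope_equal_spacing(s0, s1, step, leftlim, rightlim):
    """Least-squares slope over data[leftlim:rightlim].

    Assumes time[j] = t0 + j*step (equal spacing) and at least 2 points.
    """
    n = rightlim - leftlim
    # Mean index of the window; x_j - mean(x) = step * (j - centre).
    centre = 0.5 * (leftlim + rightlim - 1)
    # sum((j - centre) * y_j), built from the two prefix sums.
    num = (s1[rightlim] - s1[leftlim]) - centre * (s0[rightlim] - s0[leftlim])
    # sum((j - centre)**2) = n*(n**2 - 1)/12; one factor of step cancels.
    den = step * n * (n * n - 1) / 12.0
    return num / den

# Same setup as the timing example, checked against np.polyfit.
rng = np.random.default_rng(0)
n_points = 1000
data = rng.random(n_points)
time = np.arange(n_points, dtype=float) / n_points
leftlim, rightlim = 10, 900

s0, s1 = make_prefix_sums(data)
slope = slope_equal_spacing(s0, s1, time[1] - time[0], leftlim, rightlim)
reference = np.polyfit(time[leftlim:rightlim], data[leftlim:rightlim], 1)[0]
print(slope, reference)
```

One caveat on accuracy: the prefix sums accumulate rounding error over the whole series, so for very long or badly scaled data it may be worth comparing against the direct computation on a few windows before relying on this.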