我的数据基本上是随机抽样的。我想用numpy(或其他python包)来计算加权移动平均线。我有一个移动平均线的粗略实现,但我很难找到一个好的方法来做加权移动平均线,因此朝向边框中心的值加权的值大于边缘的值。
在这里,我生成一些样本数据,然后采用移动平均线。我怎样才能最轻松地实现加权移动平均线?谢谢!
import numpy as np
import matplotlib.pyplot as plt
#first generate some datapoint for a randomly sampled noisy sinewave
x = np.random.random(1000)*10
noise = np.random.normal(scale=0.3,size=len(x))
y = np.sin(x) + noise
#plot the data
plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data')
plt.xlabel('Time')
plt.ylabel('Intensity')
#define a moving average function
def moving_average(x,y,step_size=.1,bin_size=1):
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
items_in_bin = y[(x>(bin_center-bin_size*0.5) ) & (x<(bin_center+bin_size*0.5))]
bin_avg[index] = np.mean(items_in_bin)
return bin_centers,bin_avg
#plot the moving average
bins, average = moving_average(x,y)
plt.plot(bins, average,label='moving average')
plt.show()
输出:
使用crs17的建议在np.average函数中使用“weights =”,我得出加权平均函数,它使用高斯函数来加权数据:
def weighted_moving_average(x,y,step_size=0.05,width=1):
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
#We're going to weight with a Gaussian function
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=width)
bin_avg[index] = np.average(y,weights=weights)
return (bin_centers,bin_avg)
结果看起来不错:
答案 0 :(得分:7)
您可以使用numpy.average来指定权重:
>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights)
因此,要计算权重,您可以找到bin中每个数据点的x坐标,并计算它们到bin中心的距离。
答案 1 :(得分:4)
这不会给出确切的解决方案,但它会让您的生活更轻松,并且可能足够好......首先,将您的样品放在小容器中。重新采样数据以进行等间隔后,您可以使用步幅技巧和np.average
进行加权平均:
from numpy.lib.stride_tricks import as_strided
def moving_weighted_average(x, y, step_size=.1, steps_per_bin=10,
weights=None):
# This ensures that all samples are within a bin
number_of_bins = int(np.ceil(np.ptp(x) / step_size))
bins = np.linspace(np.min(x), np.min(x) + step_size*number_of_bins,
num=number_of_bins+1)
bins -= (bins[-1] - np.max(x)) / 2
bin_centers = bins[:-steps_per_bin] + step_size*steps_per_bin/2
counts, _ = np.histogram(x, bins=bins)
vals, _ = np.histogram(x, bins=bins, weights=y)
bin_avgs = vals / counts
n = len(bin_avgs)
windowed_bin_avgs = as_strided(bin_avgs,
(n-steps_per_bin+1, steps_per_bin),
bin_avgs.strides*2)
weighted_average = np.average(windowed_bin_avgs, axis=1, weights=weights)
return bin_centers, weighted_average
您现在可以执行以下操作:
#plot the moving average with triangular weights
weights = np.concatenate((np.arange(0, 5), np.arange(0, 5)[::-1]))
bins, average = moving_weighted_average(x, y, steps_per_bin=len(weights),
weights=weights)
plt.plot(bins, average,label='moving average')
plt.show()