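I want to calculate the aggregated average of a signal over time for a given period. I don't know what this is called scientifically.
Example: I have electricity consumption data at 15-minute resolution for a whole year. I want to know my average consumption over the course of a day (24 values). But it is more complicated: there are additional measurements between the 15-minute steps, and I cannot predict where they fall. They should still be taken into account, with the correct "weight".
I wrote a function that works, but it is extremely slow. Here is a test setup: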
import numpy as np
signal = np.arange(6)
time = np.array([0, 2, 3.5, 4, 6, 8])
period = 4
interval = 2
def aggregate(signal, time, period, interval):
    pass
aggregated = aggregate(signal, time, period, interval)
# This should be the result: aggregated = array([ 2. , 3.125])
aggregated
There should be period/interval values. This is the manual calculation:
aggregated[0] = (np.trapz(y=np.array([0, 1]), x=np.array([0, 2]))/interval + \
np.trapz(y=np.array([3, 4]), x=np.array([4, 6]))/interval) / (period/interval)
aggregated[1] = (np.trapz(y=np.array([1, 2, 3]), x=np.array([2, 3.5, 4]))/interval + \
np.trapz(y=np.array([4, 5]), x=np.array([6, 8]))/interval) / (period/interval)
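As a sanity check, the manual calculation above can be written out as a small self-contained script. This is only a sketch: `trapezoid` is a hypothetical helper that spells out the trapezoidal rule by hand (so it does not depend on which NumPy version provides `np.trapz`), and the normalisation mirrors the formula above exactly.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y taken at positions x (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

period, interval = 4, 2

# Slot 0 covers [0, 2] in the first period and [4, 6] in the second.
slot0 = (trapezoid([0, 1], [0, 2]) / interval
         + trapezoid([3, 4], [4, 6]) / interval) / (period / interval)

# Slot 1 covers [2, 4] and [6, 8]; the extra sample at t = 3.5 is included.
slot1 = (trapezoid([1, 2, 3], [2, 3.5, 4]) / interval
         + trapezoid([4, 5], [6, 8]) / interval) / (period / interval)

manual = np.array([slot0, slot1])
print(manual)  # the two values are 2.0 and 3.125
```

The sample at t = 3.5 gets less weight than its neighbours because it only spans short time steps, which is the "weight" requirement described above.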
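One last detail: it has to be efficient, which is why my own solution is of no use. Maybe I am overlooking a numpy or scipy method? Or is this something pandas can do? Thanks a lot for your help.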
Answer 0 (score: 4)
I strongly recommend using Pandas. Here I am using version 0.8 (soon to be released). I think this is close to what you want.
import pandas as p
import numpy as np
import matplotlib.pyplot as plt
# Make up some data:
time = p.date_range(start='2011-05-23', end='2012-05-23', freq='min')
watts = np.linspace(0, 3.14 * 365, time.size)
watts = 38 * (1.5 + np.sin(watts)) + 8 * np.sin(5 * watts)
# Create a time series
ts = p.Series(watts, index=time, name='watts')
# Resample down to 15 minute pieces, using mean values
# pandas 0.8 spelled this ts.resample('15min', how='mean'); newer versions use a method call
ts15 = ts.resample('15min').mean()
ts15.plot()
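Pandas makes it easy to do many other things with your data (such as determining the average weekly energy profile). Have a look at p.read_csv() to read in your data.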
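As a rough illustration of such a profile, here is a minimal sketch of the average daily curve asked for in the question. It assumes a recent pandas with the method-style `resample(...).mean()`, and the data is made up purely for illustration. Grouping by `index.time` is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd

# Made-up minute data for one week (hypothetical values, for illustration only).
time = pd.date_range(start='2011-05-23', periods=7 * 24 * 60, freq='min')
watts = 50 + 10 * np.sin(2 * np.pi * np.arange(time.size) / (24 * 60))
ts = pd.Series(watts, index=time, name='watts')

# Step 1: mean per 15-minute bin.
ts15 = ts.resample('15min').mean()

# Step 2: average all bins that share the same time of day.
daily_profile = ts15.groupby(ts15.index.time).mean()

print(len(daily_profile))  # one value per 15-minute slot of the day
```

Note that this weights every sample inside a bin equally, so it only matches a time-weighted average when the samples are evenly spaced.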
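Answer 1 (score: 2)
I think this is very close to what you need. I am not sure I interpreted interval and period correctly, but I think I have it right to within some constant factor.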
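import numpy as np
def aggregate(signal, time, period, interval):
    assert (period % interval) == 0
    # Intervals per period; must be an int to be usable as a shape
    ipp = int(period // interval)
    # Each sample is treated as constant between the midpoints to its neighbours
    midpoint = np.r_[time[0], (time[1:] + time[:-1])/2., time[-1]]
    # Cumulative integral of that piecewise-constant signal
    cumsig = np.r_[0, (np.diff(midpoint) * signal).cumsum()]
    # Interval boundaries; the number of points must be an int as well
    grid = np.linspace(0, time[-1], int(np.floor(time[-1]/period))*ipp + 1)
    cumsig = np.interp(grid, midpoint, cumsig)
    return np.diff(cumsig).reshape(-1, ipp).sum(0) / period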
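For comparison, here is a sketch of the same cumulative-integral idea using trapezoidal integration, as in the question's manual calculation, instead of the midpoint weighting above. The function name `aggregate_trapz` and its normalisation (mean per slot, then mean over whole periods) are my own assumptions, not part of the answer. It assumes `time` is sorted and starts at 0, and that `period` is a multiple of `interval`.

```python
import numpy as np

def aggregate_trapz(signal, time, period, interval):
    """Time-weighted mean per interval slot, averaged over all whole periods.

    Hypothetical variant; assumes time is sorted, starts at 0, and
    period is a multiple of interval.
    """
    signal = np.asarray(signal, dtype=float)
    time = np.asarray(time, dtype=float)
    slots = int(round(period / interval))
    n_periods = int(np.floor(time[-1] / period))
    # Slot boundaries covering the whole periods only.
    grid = np.arange(n_periods * slots + 1) * float(interval)

    # Merge the slot boundaries into the sample times so that no trapezoid
    # straddles a boundary; signal values there are linearly interpolated.
    t_all = np.union1d(time, grid)
    t_all = t_all[t_all <= grid[-1]]
    s_all = np.interp(t_all, time, signal)

    # Cumulative trapezoidal integral, read off at the slot boundaries.
    cum = np.r_[0.0, np.cumsum(np.diff(t_all) * (s_all[1:] + s_all[:-1]) / 2.0)]
    per_slot = np.diff(cum[np.searchsorted(t_all, grid)])

    # Integral per slot -> mean over periods -> divide by slot length.
    return per_slot.reshape(n_periods, slots).mean(axis=0) / interval

result = aggregate_trapz(np.arange(6), np.array([0, 2, 3.5, 4, 6, 8]), 4, 2)
print(result)  # the two values are 2.0 and 3.125 for the question's test setup
```

Everything is vectorised, so the cost is dominated by one sort (`np.union1d`) and one interpolation, which should scale to a year of irregular samples.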
Answer 2 (score: 1)
Based on the previous answers and pandas, I worked out a function that does exactly what I want.
import numpy as np
import pandas

def aggregate_by_time(signal, time, period=86400, interval=900, label='left'):
    """
    Function to calculate the aggregated average of a timeseries by
    period (typically a day) in bins of interval seconds (default = 900s).
    label = 'left' or 'right'. 'left' means that label i contains data from
    i till i+1, 'right' means that label i contains data from i-1 till i.
    Returns a DataFrame with period/interval rows, one for each interval
    of the period.
    Note: the period has to be a multiple of the interval
    """
    def make_datetimeindex(array_in_seconds, year):
        """
        Create a pandas DatetimeIndex from a time vector in seconds and the year.
        """
        # pandas.datetime and pandas.datetools no longer exist in current pandas,
        # so build the index from a Timestamp plus offsets in seconds instead.
        start = pandas.Timestamp(year, 1, 1)
        offsets = pandas.to_timedelta(np.asarray(array_in_seconds, dtype=float), unit='s')
        return pandas.DatetimeIndex(start + offsets)
    interval_string = str(interval) + 's'
    dr = make_datetimeindex(time, 2012)
    df = pandas.DataFrame(data=signal, index=dr, columns=['signal'])
    # Older pandas averaged by default; newer versions need an explicit .mean()
    df15min = df.resample(interval_string, closed=label, label=label).mean()
    # now create bins for the groupby() method:
    # seconds since the first bin, modulo the period
    time_s = np.asarray((df15min.index - df15min.index[0]).total_seconds())
    df15min['bins'] = np.mod(time_s, period)
    df_aggr = df15min.groupby(['bins']).mean()
    # if you only need the numpy array: take df_aggr.values
    return df_aggr