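In R, you can compute a rolling mean over a window of a specified width and have that window advance by a specified number of rows each time.
Maybe I just haven't found it, but there doesn't seem to be a way to do this in pandas or any other Python library?
Does anyone know a way around this? Here is an example of what I mean:
I have half-monthly data (two rows per month) and I am computing a two-month moving average that shifts by one month at a time, i.e. a window of 4 rows that advances by 2 rows.
In R, I would do the following: two_month__movavg=rollapply(mydata,4,mean,by = 2,na.pad = FALSE)
Is there no equivalent in Python?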
EDIT1:
                      DATE    A DEMAND  ...   AA DEMAND  A Price
    0  2006/01/01 00:30:00  8013.27833  ...  5657.67500    20.03
    1  2006/01/01 01:00:00  7726.89167  ...  5460.39500    18.66
    2  2006/01/01 01:30:00  7372.85833  ...  5766.02500    20.38
    3  2006/01/01 02:00:00  7071.83333  ...  5503.25167    18.59
    4  2006/01/01 02:30:00  6865.44000  ...  5214.01500    17.53
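Answer 0 (score: 2)
You can still use rolling; it just takes a little work on the index when assigning the result.
Here by = 2:

    import numpy as np

    by = 2
    # Compute the rolling mean on every row, but assign it only to the rows
    # whose position modulo `by` is 1 (positions 1, 3, 5, ...).
    df.loc[df.index[np.arange(len(df)) % by == 1], 'New'] = df.Price.rolling(window=4).mean()
    df
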
        Price    New
     0     63    NaN
     1     92    NaN
     2     92    NaN
     3      5  63.00
     4     90    NaN
     5      3  47.50
     6     81    NaN
     7     98  68.00
     8    100    NaN
     9     58  84.25
    10     38    NaN
    11     15  52.75
    12     75    NaN
    13     19  36.75
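
The offset 1 in the mask above happens to line up with the first complete window because window - 1 = 3 and 3 % 2 == 1. As a rough sketch of how the same idea might be generalised to other window and step sizes, the offset could be derived from (window - 1) % by. The helper name `rolling_mean_by` and the sample frame are made up for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd


def rolling_mean_by(series, window, by):
    """Rolling mean kept only on every `by`-th row, starting at the first full window."""
    full = series.rolling(window=window).mean()
    positions = np.arange(len(series))
    # The first complete window ends at position window - 1, so keep the rows
    # whose position is congruent to (window - 1) modulo `by`.
    keep = (positions >= window - 1) & (positions % by == (window - 1) % by)
    # Rows that are not kept become NaN.
    return full.where(keep)


df = pd.DataFrame({"Price": [63, 92, 92, 5, 90, 3, 81, 98]})
df["New"] = rolling_mean_by(df["Price"], window=4, by=2)
```

With window=4 and by=2 this fills rows 3, 5 and 7, the same rows as in the output shown above.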
Answer 1 (score: 2)
If the data is not too large, here is a simple way:

    by = 2
    win = 4
    start = 3  # index of your first valid value, i.e. win - 1
    df.rolling(win).mean()[start::by]  # calculate everything, then keep what you need
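
To check that slicing the full rolling result really gives one value per stepped window, here is a small self-contained sketch that compares it with a brute-force loop over windows starting at rows 0, by, 2*by, and so on. The sample prices are illustrative, and `.iloc` is used so the slice is explicitly positional.

```python
import numpy as np
import pandas as pd

by, win = 2, 4
prices = pd.Series([63, 92, 92, 5, 90, 3, 81, 98, 100, 58], dtype=float)

# Rolling mean on every row, then keep every `by`-th row from the first full window.
start = win - 1
stepped = prices.rolling(win).mean().iloc[start::by]

# Brute-force reference: average the windows that start at 0, by, 2*by, ...
reference = [
    prices.iloc[i:i + win].mean()
    for i in range(0, len(prices) - win + 1, by)
]

# Both approaches should agree up to floating-point rounding.
matches = np.allclose(stepped.to_numpy(), reference)
```

The cost is that every window is computed and most are thrown away, which is why this suits small data.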
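Answer 2 (score: 1)
Now, this is a bit of overkill for a 1D array of data, but you can simplify it and pull out what you need. Since pandas can rely on numpy, you might want to check how pandas' own rolling/strided function is implemented, if it has one.
Results for 20 sequential numbers, with a 7-day window striding/sliding by 2:

    import numpy as np

    z = np.arange(20)
    z  # array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
    s = stride(z, (7,), (2,))  # stride() is defined below
    np.mean(s, axis=1)  # array([ 3., 5., 7., 9., 11., 13., 15.])
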
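Here is the code I use, with most of the documentation stripped out. It is derived from the many implementations of strided functions in numpy that can be found on this site. There are variants and incarnations; this is just another.
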
    import numpy as np
    from numpy.lib.stride_tricks import as_strided


    def stride(a, win=(3, 3), stepby=(1, 1)):
        """Provide a 2D sliding/moving view of an array.
        There is no edge correction for outputs. Use the `pad_` function first."""
        err = """Array shape, window and/or step size error.
        Use win=(3,) with stepby=(1,) for 1D array
        or win=(3,3) with stepby=(1,1) for 2D array
        or win=(1,3,3) with stepby=(1,1,1) for 3D
        ----    a.ndim != len(win) != len(stepby) ----
        """
        a_ndim = a.ndim
        if isinstance(win, int):
            win = (win,) * a_ndim
        if isinstance(stepby, int):
            stepby = (stepby,) * a_ndim
        assert (a_ndim == len(win)) and (len(win) == len(stepby)), err
        shp = np.array(a.shape)    # array shape (r, c) or (d, r, c)
        win_shp = np.array(win)    # window      (3, 3) or (1, 3, 3)
        ss = np.array(stepby)      # step by     (1, 1) or (1, 1, 1)
        # One leading axis per dimension counts the windows; the trailing axes hold each window.
        newshape = tuple(((shp - win_shp) // ss) + 1) + tuple(win_shp)
        newstrides = tuple(np.array(a.strides) * ss) + a.strides
        a_s = as_strided(a, shape=newshape, strides=newstrides, subok=True).squeeze()
        return a_s

I failed to point out that you can build the output array and append it as a column to a pandas DataFrame. Going back to the original definitions used above:

    nans = np.full_like(z, np.nan, dtype='float')  # z is the 20 number sequence
    means = np.mean(s, axis=1)  # results from the strided mean
    # assign the means to the output array, skipping the first and last 3 and striding by 2
    nans[3:-3:2] = means
    nans
    # array([nan, nan, nan, 3., nan, 5., nan, 7., nan, 9.,
    #        nan, 11., nan, 13., nan, 15., nan, nan, nan, nan])
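
For the 1D case, NumPy 1.20 and later ship `numpy.lib.stride_tricks.sliding_window_view`, which might stand in for a hand-written strided function. The following is only a sketch under that version assumption, and it assumes an odd window so that each mean can be placed at the centre of its window, as in the slicing above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

z = np.arange(20)
win, step = 7, 2

# All length-`win` windows as a read-only view, then keep every `step`-th one.
windows = sliding_window_view(z, win)[::step]
means = windows.mean(axis=1)

# Place each mean at the centre of its window and leave NaN elsewhere.
# Assumes `win` is odd, so the centre offset is win // 2.
half = win // 2
out = np.full(z.shape, np.nan)
out[half:len(z) - half:step] = means
```

Because the result is a view, no window data is copied until `mean` is evaluated.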
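Answer 3 (score: 1)
I know this question is old, but I ran into the same problem, and when dealing with long time series you really want to avoid unnecessary calculations for values you are not interested in. Since the pandas rolling method did not implement a step argument (one was only added in pandas 1.5), I wrote a workaround using numpy.
It is basically a combination of the solution in this link and the indexing proposed by BENY.
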
    import numpy as np


    def apply_rolling_data(data, col, function, window, step=1, labels=None):
        """Perform a rolling window analysis on the column `col` of `data`.

        Given a dataframe `data` holding time series, call `function` on
        sections of length `window` of the column `col`. Append the
        results to `data` in new columns named by `labels`.

        Parameters
        ----------
        data : DataFrame
            Data to be analyzed. The dataframe must store time series
            columnwise, i.e., each column represents a time series and
            each row a time index.
        col : str
            Name of the column of `data` to be analyzed.
        function : callable
            Function that calculates the rolling window analysis. It
            receives a 1D numpy array holding one window and must
            return either a number or a 1D array-like of fixed length.
        window : int
            Length of the window used for the analysis.
        step : int
            Step to take between two consecutive windows.
        labels : str or list of str
            Name(s) of the output column(s), one per value returned by
            `function`. If None, defaults to 'metric_0', 'metric_1', ...

        Returns
        -------
        data : DataFrame
            Input dataframe with added columns holding the result of
            the analysis.
        """
        x = _strided_app(data[col].to_numpy(), window, step)
        rolled = np.apply_along_axis(function, 1, x)
        if rolled.ndim == 1:
            # One number per window: treat it as a single output column.
            rolled = rolled.reshape(-1, 1)

        if labels is None:
            labels = [f"metric_{i}" for i in range(rolled.shape[1])]
        elif isinstance(labels, str):
            labels = [labels]

        for label in labels:
            data[label] = np.nan

        # Each result goes on the last row of its window:
        # rows window - 1, window - 1 + step, ...
        mask = np.zeros(len(data), dtype=bool)
        mask[window - 1::step] = True
        data.loc[data.index[mask], labels] = rolled

        return data


    def _strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
        """returns an array that is strided
        """
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(
            a, shape=(nrows, L), strides=(S * n, n))
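
As a usage illustration of the same technique, here is a compact, self-contained sketch where the per-window function returns two numbers, so two output columns are filled. The names `strided_windows`, `min_max`, `win_min` and `win_max`, and the sample prices, are hypothetical; this is not the answer's own code, only the same strided-view idea in miniature.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def strided_windows(a, window, step):
    """Return a read-only (n_windows, window) view of the 1D array `a`."""
    n_windows = (a.size - window) // step + 1
    stride = a.strides[0]
    return as_strided(a, shape=(n_windows, window),
                      strides=(step * stride, stride), writeable=False)


def min_max(values):
    # Hypothetical per-window function returning two numbers.
    return np.array([values.min(), values.max()])


prices = pd.DataFrame({"Price": [63.0, 92.0, 92.0, 5.0, 90.0, 3.0, 81.0, 98.0]})
window, step = 4, 2

windows = strided_windows(prices["Price"].to_numpy(), window, step)
stats = np.apply_along_axis(min_max, 1, windows)  # one row per window

# Write each result on the last row of its window; other rows stay NaN.
ends = np.arange(window - 1, len(prices), step)
for name in ("win_min", "win_max"):
    prices[name] = np.nan
prices.loc[prices.index[ends], ["win_min", "win_max"]] = stats
```

Only the stepped windows are ever evaluated, which is the point of this approach on long series.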