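I'm trying to use `pandas.DataFrame.rolling` to achieve the following:
At index `i`, I want to compute a rolling `sum`, `mean`, `median`, ... over a window of the last `size_win` values. Only past values (i.e. indices `<=i`) should be considered, and no values from the future (this is a "what crucial information did we have at time `i`?" scenario). The second constraint is: I want a single-sided `parzen` window, i.e. the value at index `i` should get the maximum weight, `i-1` a smaller weight, `i-2` an even smaller weight, ..., and `i-size_win` the smallest weight.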
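Using the standard `df.rolling(window=size_win, win_type='parzen').sum()` does not work for me, because it gives the minimum weight to index `i` and the maximum weight to `i-(size_win/2)`. Providing the `center` parameter would give the maximum weight to index `i`, but would also use future values (`>i`) in the calculation.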
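To make the weighting problem concrete, here is a minimal sketch that compares where the largest weight sits in a symmetric Parzen window and in a single-sided one. It assumes SciPy is available; the construction of `single_sided` (taking the rising half of a longer symmetric window) is just one possible way to build such a window, not something the question prescribes.

```python
import numpy as np
from scipy.signal import get_window

size_win = 5

# Symmetric Parzen window: the largest weight sits in the middle,
# so the most recent sample (last position) gets a small weight.
symmetric = get_window('parzen', size_win, fftbins=False)

# Assumed single-sided variant: build a symmetric window of length
# 2*size_win - 1 and keep only its rising half, so the largest weight
# lands on the last (most recent) position.
single_sided = get_window('parzen', 2 * size_win - 1, fftbins=False)[:size_win]

print(np.argmax(symmetric))     # position of the peak in the symmetric window
print(np.argmax(single_sided))  # position of the peak in the single-sided window
```

With the single-sided weights, position `size_win - 1` of each window (the row at index `i`) carries the largest weight and older rows carry progressively smaller ones.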
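I found a solution using `pandas.DataFrame.rolling(...).apply`, but it is of course very slow.
In my example, the built-in rolling takes 1.3 seconds (without producing the result I want), while my own solution takes 54 seconds.
How can I solve this more efficiently?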
Answer 0 (score: 2)
I found my own mistake while reasoning it through:
df_rolled = df.rolling(window=size_win).apply(lambda x: custom_rolling_sum(x, window_single_sided_parzen(size_win)))
I naively assumed that the expensive function `window_single_sided_parzen(size_win)` would be called only once. In fact, it is called for every row. Switching to
win = window_single_sided_parzen(size_win)
df_rolled = df.rolling(window=size_win).apply(lambda x: custom_rolling_sum(x, win))
is much faster. Not as fast as the built-in function, but fast enough.
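For reference, a self-contained sketch of this fix might look as follows. The bodies of `window_single_sided_parzen` and `custom_rolling_sum` are not shown in the thread, so the versions here are hypothetical stand-ins. Passing `raw=True` is an additional assumption on my part: it hands each window to the function as a NumPy array, which usually lowers the per-window overhead.

```python
import numpy as np
import pandas as pd
from scipy.signal import get_window

def window_single_sided_parzen(size_win):
    # Hypothetical stand-in for the asker's helper (its real body is not shown):
    # rising half of a symmetric Parzen window, heaviest weight last.
    return get_window('parzen', 2 * size_win - 1, fftbins=False)[:size_win]

def custom_rolling_sum(values, weights):
    # Hypothetical stand-in: weighted sum of a single window.
    return np.dot(values, weights)

size_win = 5
df = pd.DataFrame({'value': np.arange(10, dtype=float)})

# Compute the weights once, outside the lambda, so they are reused for every window.
win = window_single_sided_parzen(size_win)
df_rolled = df.rolling(window=size_win).apply(
    lambda x: custom_rolling_sum(x, win), raw=True)
```

The first `size_win - 1` rows of `df_rolled` are NaN because `rolling` only calls the function on full windows by default.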
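Answer 1 (score: 0)
This may well be a bad way to do it... but I had a similar need for a single-sided, historical rolling average. I wanted to be able to use the built-in functions in the normal way... and I think I managed to do so like this: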
# %% Import Base Packages
import pandas as pd
import re
import numpy as np
import matplotlib.pyplot as plt
# end%%
# %% Import other packages to overwrite
from pandas.core import window as rwindow
from pandas.core.dtypes.generic import (ABCSeries,ABCDataFrame)
from pandas.core.dtypes.common import is_integer
# end%%
# %% Overwrite Functions and methods
# NOTE: this overrides private pandas internals, so it depends on the
# pandas version it was written against.
class Window_single_sided(rwindow.Window):
    def _prep_window(self, **kwargs):
        """
        provide validation for our window type, return the window
        we have already been validated
        """
        window = self._get_window()
        if isinstance(window, (list, tuple, np.ndarray)):
            # np.asarray replaces the pandas-private _asarray_tuplesafe helper,
            # which was never imported in this snippet
            return np.asarray(window).astype(float)
        elif is_integer(window):
            import scipy.signal as sig

            # the below may pop from kwargs
            def _validate_win_type(win_type, kwargs):
                arg_map = {'kaiser': ['beta'],
                           'gaussian': ['std'],
                           'general_gaussian': ['power', 'width'],
                           'slepian': ['width']}
                if win_type in arg_map:
                    return tuple([win_type] + _pop_args(win_type,
                                                        arg_map[win_type],
                                                        kwargs))
                return win_type

            def _pop_args(win_type, arg_names, kwargs):
                msg = '%s window requires %%s' % win_type
                all_args = []
                for n in arg_names:
                    if n not in kwargs:
                        raise ValueError(msg % n)
                    all_args.append(kwargs.pop(n))
                return all_args

            win_type = _validate_win_type(self.win_type, kwargs)
            # GH #15662. `False` makes symmetric window, rather than periodic.
            #----Only Line I changed to get a single sided window----
            # Build a symmetric window of length 2*window - 1 and keep its
            # rising half, so the newest value gets the largest weight.
            return sig.get_window(win_type, (window-1)*2+1, False).astype(float)[0:window]


def rolling_new(obj, win_type=None, **kwds):
    if not isinstance(obj, (ABCSeries, ABCDataFrame)):
        raise TypeError('invalid type: %s' % type(obj))
    if win_type is not None:
        # ---Updated to use the new single_sided class when appropriate
        if win_type.endswith('_single_sided'):
            # raw string avoids the invalid '\_' escape sequence
            return Window_single_sided(obj, win_type=re.sub(r'_single_sided$', '', win_type), **kwds)
        #----Had to rwindow prefaces here...
        return rwindow.Window(obj, win_type=win_type, **kwds)
    return rwindow.Rolling(obj, **kwds)
# Here we set this new method instead of the existing one.
rwindow.rolling = rolling_new
# end%%
# %% Here we test it out
df = pd.DataFrame([0,1,2,3,4,5,6,7,8])
df['triang'] = df[0].rolling(5,win_type='triang').sum()
df['triang_single_sided'] = df[0].rolling(5,win_type='triang_single_sided').sum()
df['boxcar'] = df[0].rolling(5,win_type='boxcar').sum()
ax = df.plot(x=0,y=['triang','triang_single_sided','boxcar'])
ax.set_ylabel('Sum with different Methods')
# end%%
# %% Here we test it out
from scipy.stats import norm
t = np.linspace(0,2*np.pi*2,5000)
y = np.sin(t)*10 + norm.rvs(size=5000)
df = pd.DataFrame({'t':t,'y':y})
df
df['triang'] = df['y'].rolling(50,win_type='triang').mean()
df['triang_single_sided'] = df['y'].rolling(50,win_type='triang_single_sided').mean()
df['boxcar'] = df['y'].rolling(50,win_type='boxcar').mean()
ax = df.plot(x='t',y=['y','triang','triang_single_sided','boxcar'])  # x must be the column label 't', not the array t
ax.set_ylabel('Mean with different Methods')
plt.show()
# end%%
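Answer 2 (score: 0)
Try `numpy.convolve`. It's fast. You can build any window function you like as a kernel and apply it to your series. To keep "future" values from affecting the rolling function, pad the kernel so that half of it contains zeros.
Here is an example that computes a weighted moving average: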
import pandas as pd
import numpy as np

def wma(arr, period):
    # Linearly decreasing weights: the newest value gets the largest weight
    kernel = np.arange(period, 0, -1)
    # Leading zeros cancel the "future" half of the centered kernel
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

df = pd.DataFrame({'value':np.arange(11)})
df['wma'] = wma(df['value'], 4)
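The same zero-padding idea can be applied to the single-sided Parzen window from the question. The sketch below is one possible way to do it, not a definitive implementation: the function name `single_sided_parzen_sum` is hypothetical, the window is built as the rising half of a longer symmetric Parzen window, and the input is assumed to be at least `2 * size_win - 1` samples long so that `'same'` mode returns one output per input sample.

```python
import numpy as np
import pandas as pd
from scipy.signal import get_window

def single_sided_parzen_sum(arr, size_win):
    # Rising half of a symmetric Parzen window: oldest -> newest, newest heaviest.
    weights = get_window('parzen', 2 * size_win - 1, fftbins=False)[:size_win]
    # Convolution flips the kernel, so store the weights newest-first and put
    # the zeros in front so that no future sample contributes.
    kernel = np.concatenate([np.zeros(size_win - 1), weights[::-1]])
    out = np.convolve(arr, kernel, 'same')
    # Mask the warm-up region, which would otherwise rely on implicit zero padding.
    out[:size_win - 1] = np.nan
    return out

values = pd.Series(np.random.default_rng(0).normal(size=200))
rolled = single_sided_parzen_sum(values.to_numpy(), size_win=20)
```

For full windows this should match a weighted `rolling(...).apply` with the same weights, while running a single vectorized convolution instead of one Python call per window.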
If you reverse the kernel, you can also use `numpy.correlate`.
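As a minimal sketch of that variant: for real-valued inputs, correlating with a kernel gives the same result as convolving with the reversed kernel, so the weights are stored oldest-first and the zeros move to the end. The function name `wma_correlate` is hypothetical.

```python
import numpy as np

def wma_correlate(arr, period):
    # Weights ordered oldest -> newest; the newest value gets the largest weight.
    weights = np.arange(1, period + 1, dtype=float)
    weights /= weights.sum()
    # Trailing zeros cover the "future" positions of the centered kernel.
    kernel = np.concatenate([weights, np.zeros(period - 1)])
    return np.correlate(arr, kernel, 'same')

result = wma_correlate(np.arange(11, dtype=float), 4)
```

As with the `numpy.convolve` version, the first `period - 1` outputs are computed against implicit zero padding rather than returned as NaN.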