Pandas rolling with a single-sided window

Time: 2017-09-24 17:16:51

Tags: python pandas dataframe

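I am trying to use pandas.DataFrame.rolling to achieve the following:

At index i, I want a rolling sum, mean, median, ... computed with a parzen window over the last size_win values. Only past values (i.e. index <i) should be considered, never any future values (this is crucial, since this is a "what information do we have at time i?" scenario). The second constraint: I want a single-sided parzen window, i.e. the value at index i should get the largest weight, i-1 a smaller weight, i-2 an even smaller weight, ..., and i-size_win the smallest weight.

Using the standard

df.rolling(window=size_win, win_type='parzen').sum()

does not work for me, because it gives the smallest weight to index i and the largest weight to i-(size_win/2). Passing the center parameter would give the largest weight to index i, but would also use future values >i in the calculation.

I found a solution using pandas.DataFrame.rolling(...).apply, but it is of course very slow.

See the following example:

(example code missing)

In my case, the built-in rolling takes 1.3 seconds (and does not produce the result I want), while my own solution takes 54 seconds.

How can I solve this more efficiently?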

3 answers:

Answer 0: (score: 2)

I found the mistake in my own reasoning:

df_rolled = df.rolling(window=size_win).apply(lambda x: custom_rolling_sum(x, window_single_sided_parzen(size_win)))

I naively assumed this would call the expensive function window_single_sided_parzen(size_win) only once. In fact, it is called for every row. Switching to

win = window_single_sided_parzen(size_win)
df_rolled = df.rolling(window=size_win).apply(lambda x: custom_rolling_sum(x, win))

is much faster. Not as fast as the built-in function, but fast enough.
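The effect can be reproduced with a small sketch. The two helpers below are hypothetical stand-ins, since the asker's own window_single_sided_parzen and custom_rolling_sum are not shown; the call counter and raw=True are additions for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.signal import get_window

calls = {"n": 0}  # counts how often the window is rebuilt


def window_single_sided_parzen(size_win):
    # Hypothetical stand-in for the asker's helper: the rising half of a
    # symmetric Parzen window, so the newest sample gets the largest weight.
    calls["n"] += 1
    return get_window("parzen", 2 * size_win - 1, fftbins=False)[:size_win]


def custom_rolling_sum(values, weights):
    # Hypothetical stand-in: weighted sum over one window.
    return float(np.dot(values, weights))


size_win = 5
df = pd.DataFrame({"value": np.arange(10, dtype=float)})

# Window rebuilt inside the lambda: the helper runs for every evaluated window.
slow = df.rolling(window=size_win).apply(
    lambda x: custom_rolling_sum(x, window_single_sided_parzen(size_win)),
    raw=True)
calls_inside = calls["n"]

# Window built once and captured by the lambda.
calls["n"] = 0
win = window_single_sided_parzen(size_win)
fast = df.rolling(window=size_win).apply(
    lambda x: custom_rolling_sum(x, win), raw=True)
calls_hoisted = calls["n"]

print(calls_inside, calls_hoisted)
```

Both variants return the same values; only the number of window constructions differs. Passing raw=True hands each window to the function as a NumPy array instead of a Series, which usually reduces the per-window overhead further.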

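Answer 1: (score: 0)

I think this may be a terrible approach... but I had a similar need for a single-sided, history-only rolling average. I wanted to keep using the built-in functions in the normal way... and I think I managed it like this: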

# %% Import Base Packages
import pandas as pd
import re
import numpy as np
import matplotlib.pyplot as plt
# end%%

# %% Import other packages to overwrite
from pandas.core import window as rwindow
from pandas.core.dtypes.generic import (ABCSeries,ABCDataFrame)
from pandas.core.dtypes.common import is_integer
# end%%

# %% Overwrite Functions and methods
class Window_single_sided(rwindow.Window):
    def _prep_window(self, **kwargs):
        """
        provide validation for our window type, return the window
        we have already been validated
        """

        window = self._get_window()
        if isinstance(window, (list, tuple, np.ndarray)):
            # _asarray_tuplesafe is never imported here; np.asarray covers list/tuple/ndarray
            return np.asarray(window, dtype=float)
        elif is_integer(window):
            import scipy.signal as sig

            # the below may pop from kwargs
            def _validate_win_type(win_type, kwargs):
                arg_map = {'kaiser': ['beta'],
                           'gaussian': ['std'],
                           'general_gaussian': ['power', 'width'],
                           'slepian': ['width']}
                if win_type in arg_map:
                    return tuple([win_type] + _pop_args(win_type,
                                                        arg_map[win_type],
                                                        kwargs))
                return win_type

            def _pop_args(win_type, arg_names, kwargs):
                msg = '%s window requires %%s' % win_type
                all_args = []
                for n in arg_names:
                    if n not in kwargs:
                        raise ValueError(msg % n)
                    all_args.append(kwargs.pop(n))
                return all_args

            win_type = _validate_win_type(self.win_type, kwargs)
            # GH #15662. `False` makes symmetric window, rather than periodic.
            #----Only Line I changed to get a single sided window----
            return sig.get_window(win_type, (window-1)*2+1, False).astype(float)[0:window]

def rolling_new(obj, win_type=None, **kwds):
    if not isinstance(obj, (ABCSeries, ABCDataFrame)):
        raise TypeError('invalid type: %s' % type(obj))


    if win_type is not None:

        # ---Updated to use the new single_sided class when appropriate
        if win_type.endswith('_single_sided'):
            return Window_single_sided(obj, win_type=re.sub(r'_single_sided$', '', win_type), **kwds)
        #----Had to rwindow prefaces here...
        return rwindow.Window(obj, win_type=win_type, **kwds)

    return rwindow.Rolling(obj, **kwds)

# Here we set this new method instead of the existing one.
rwindow.rolling = rolling_new
# end%%

# %% Here we test it out
df = pd.DataFrame([0,1,2,3,4,5,6,7,8])

df['triang'] = df[0].rolling(5,win_type='triang').sum()
df['triang_single_sided'] = df[0].rolling(5,win_type='triang_single_sided').sum()
df['boxcar'] = df[0].rolling(5,win_type='boxcar').sum()
ax = df.plot(x=0,y=['triang','triang_single_sided','boxcar'])
ax.set_ylabel('Sum with different Methods')
# end%%

# %% Here we test it out
from scipy.stats import norm
t = np.linspace(0,2*np.pi*2,5000)
y = np.sin(t)*10 + norm.rvs(size=5000)

df = pd.DataFrame({'t':t,'y':y})
df
df['triang'] = df['y'].rolling(50,win_type='triang').mean()
df['triang_single_sided'] = df['y'].rolling(50,win_type='triang_single_sided').mean()
df['boxcar'] = df['y'].rolling(50,win_type='boxcar').mean()
ax = df.plot(x='t',y=['y','triang','triang_single_sided','boxcar'])
ax.set_ylabel('Mean with different Methods')
plt.show()
# end%%
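The one changed line can be checked on its own, without touching pandas internals. A minimal sketch, assuming scipy is installed; single_sided_window is a hypothetical helper name that only mirrors the slicing used above:

```python
import numpy as np
from scipy.signal import get_window


def single_sided_window(win_type, size):
    # Hypothetical helper mirroring the modified line: build a symmetric
    # window of odd length 2*(size-1)+1 and keep the first `size` samples,
    # i.e. the rising flank up to and including the central peak.
    full = get_window(win_type, (size - 1) * 2 + 1, False)
    return full.astype(float)[:size]


triang = single_sided_window("triang", 5)
parzen = single_sided_window("parzen", 5)

print(triang)                        # weights 0.2, 0.4, 0.6, 0.8, 1.0
print(np.all(np.diff(parzen) > 0))   # weights rise towards the newest sample
```

The last weight belongs to the most recent sample, so it is the largest, and no sample after index i ever enters the window.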

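Answer 2: (score: 0)

Try numpy.convolve. It is fast. You can build any window function you like as the kernel and apply it to your series. To keep "future" values from influencing the rolling function, pad the kernel so that half of it consists of zeros.

Here is an example that computes a weighted moving average: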

import pandas as pd
import numpy as np

def wma(arr, period):
    kernel = np.arange(period, 0, -1)
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

df = pd.DataFrame({'value':np.arange(11)})
df['wma'] = wma(df['value'], 4)

If you reverse the kernel, you can also use numpy.correlate.
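A minimal sketch of that numpy.correlate variant, assuming the same linear weights as wma above and an input at least as long as the padded kernel; wma_correlate is a hypothetical name. Because 'same' mode implicitly pads the input with zeros, the first period - 1 outputs come from incomplete windows, so this sketch masks them with NaN, similar to what pandas rolling does by default:

```python
import numpy as np


def wma_correlate(arr, period):
    # Weights in time order: oldest sample gets the smallest weight,
    # newest sample gets the largest.
    weights = np.arange(1, period + 1, dtype=float)
    weights /= weights.sum()
    # Zeros sit on the "future" side, so no later sample contributes.
    kernel = np.concatenate([weights, np.zeros(period - 1)])
    out = np.correlate(np.asarray(arr, dtype=float), kernel, "same")
    # The first period - 1 values are built from zero-padded, incomplete
    # windows; mark them as missing.
    out[:period - 1] = np.nan
    return out


result = wma_correlate(np.arange(11), 4)
print(result)
```

For the input 0..10 with period 4, every complete window yields the current value minus 1, for example 0.4*3 + 0.3*2 + 0.2*1 + 0.1*0 = 2.0 at index 3.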