Python pandas rolling winsorize

Date: 2018-01-17 11:17:48

Tags: python pandas apply rolling-computation

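I have a time-series pandas DataFrame, for which I have computed a new column:

df['std_series'] = (df['series1'] - df['series1'].rolling(252).mean()) / df['series1'].rolling(252).std()

However, before standardizing and rolling, I would like to winsorize at the 5% level. That is, for each data point, look back 252 days; if the value falls outside the 5% quantile, clip it to the 5% quantile, and then standardize. I can't figure out how to get this working with rolling.apply.

For example (rolling 10 elements):

df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})

and suppose I clip at the 0.15 and 0.85 quantiles. The clip levels are then (min=3.2, max=64), and the winsorized window, before standardization, would have every value limited to that range.

All the examples I have found winsorize either a whole DataFrame or an entire column.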

1 answer:

Answer 0: (score: 1)

A solution using df.iterrows:

First, set the parameters:

import pandas as pd
import numpy as np

#Sample:
df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})

#Parameters:
win_size = 9 #size of the rolling window
p = (15,85) #percentiles (lower,upper) on a 0-100 scale; 15 and 85 give the limits 3.2 and 64 shown in the result

Then iterate:

window = [] #the rolling window
output = [] #the output

# Iterate over your df
for index, row in df.iterrows():
    #Update your output
    output = np.append(output,row.series1)

    #Manage the window
    window = np.append(window,row.series1) #append the element
    if len(window) > win_size: #drop the oldest element once the window exceeds win_size
        window = np.delete(window,0)

    #Winsorize
    if len(window) == win_size:
        ll = np.round(np.percentile(window,p[0]),2) #Find the lower limit
        ul = np.round(np.percentile(window,p[1]),2) #Find the upper limit

        window = np.clip(window, ll , ul) #Clip the window

    output[-win_size:] = window #Update your output with the winsorized data

df['winsorized'] = output #Append to your dataframe
print(df)

Result:

   series1  winsorized
0       78        64.0
1        1         3.2
2        3         3.2
3        4         4.0
4        5         5.0
5        6         6.0
6        7         7.0
7        8         8.0
8       99        64.0
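
The clip limits visible in this table (3.2 and 64.0) can be checked independently of the loop. The snippet below is only a sanity check, not part of the original answer: it assumes NumPy's default linear interpolation and takes the 15th and 85th percentiles of the nine sample values, which should agree with the limits above up to rounding.

```python
import numpy as np

# The nine sample values from the question.
values = np.array([78, 1, 3, 4, 5, 6, 7, 8, 99])

# 15th and 85th percentiles, using NumPy's default (linear) interpolation.
lower, upper = np.percentile(values, [15, 85])

# Clip the whole window to those limits, as the loop does once the window is full.
clipped = np.clip(values, lower, upper)

print(np.round(lower, 2), np.round(upper, 2))
print(np.round(clipped, 2))
```

Because the sample has exactly nine rows and the window size is nine, there is only one full window, so the loop clips the entire column against this single pair of limits.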

If you also want to winsorize the first data points, before the window is full, remove the check if len(window) == win_size: and dedent the lines under it.
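
For long series, a loop-free variant may be worth considering. The sketch below is not the method of the answer above but a different technique: it clips each value only against the quantiles of its own trailing window (closer to the "look back 252 days" wording of the question), instead of rewriting the whole window in place. The helper names rolling_winsorize and rolling_zscore are hypothetical, and the 0.15/0.85 quantiles with a window of 9 are taken from the sample only.

```python
import pandas as pd

def rolling_winsorize(series, window, lower_q, upper_q, min_periods=None):
    """Clip each value to the quantiles of the trailing window ending at that value."""
    s = series.astype(float)
    roll = s.rolling(window, min_periods=min_periods)
    lower = roll.quantile(lower_q)  # rolling lower limit (NaN until enough data)
    upper = roll.quantile(upper_q)  # rolling upper limit (NaN until enough data)
    # Comparisons against NaN are False, so rows without limits stay unchanged.
    return s.mask(s < lower, lower).mask(s > upper, upper)

def rolling_zscore(series, window):
    """Standardize with the rolling mean and standard deviation."""
    roll = series.rolling(window)
    return (series - roll.mean()) / roll.std()

df = pd.DataFrame({'series1': [78, 1, 3, 4, 5, 6, 7, 8, 99]})
df['winsorized'] = rolling_winsorize(df['series1'], window=9, lower_q=0.15, upper_q=0.85)
df['std_series'] = rolling_zscore(df['winsorized'], window=9)
print(df)
```

On this sample only the last row has a full window, so only the 99 is clipped; the earlier values are left as they are, which differs from the table above. For the setup in the question one would presumably call it with window=252 and quantiles such as 0.05 and 0.95 (the upper quantile is an assumption, since the question only mentions the 5% level), and passing min_periods=1 would clip the early rows against partial windows.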