Question

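I have a time-series pandas DataFrame for which I have calculated a new column:

df['std_series']= ( df['series1']-df['series1'].rolling(252).mean() )/ df['series1'].rolling(252).std()

However, I would like to winsorize at the 5% level, on a rolling basis, before I standardize. So for any data point, look back 252 days; if the point lies beyond the 5% quantile, clip it to the 5% quantile, and then standardize. I can't figure out how to make this work with rolling.apply.

For example (rolling 10 elements):

df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})

and assume I clip at (0.15 and 0.85). Then the clip levels are (min=3.2, max=64), and the winsorized window before standardization should have every value limited to that range.

All the examples I have found winsorize the whole DataFrame or an entire column.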

Answer 1

A solution using df.iterrows:

First, set the parameters:

import pandas as pd
import numpy as np

#Sample:
df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})

#Parameters:
win_size = 9 #size of the rolling window
p = (15,85) #percentile (min,max) between (0,100)

Then iterate:

window = [] #the rolling window
output = [] #the output

# Iterate over your df
for index, row in df.iterrows():
    #Update your output
    output = np.append(output,row.series1)

    #Manage the window
    window = np.append(window,row.series1) #append the element
    if len(window) > win_size: #drop the oldest element once the window is over-full
        window = np.delete(window,0)

    #Winsorize
    if len(window) == win_size:
        ll = np.round(np.percentile(window,p[0]),2) #Find the lower limit
        ul = np.round(np.percentile(window,p[1]),2) #Find the upper limit

        window = np.clip(window, ll , ul) #Clip the window

    output[-win_size:] = window #Update your output with the winsorized data

df['winsorized'] = output #Append to your dataframe
print(df)

Result:

   series1  winsorized
0       78        64.0
1        1         3.2
2        3         3.2
3        4         4.0
4        5         5.0
5        6         6.0
6        7         7.0
7        8         8.0
8       99        64.0
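
As a quick check of the clip levels behind this table, the sketch below computes the 15th and 85th percentiles of the full sample window and clips to them. It assumes NumPy's default linear interpolation between order statistics; the variable names are illustrative only.

```python
import numpy as np

values = [78, 1, 3, 4, 5, 6, 7, 8, 99]  # the sample series from the question

# np.percentile interpolates linearly between order statistics by default
lower, upper = np.percentile(values, [15, 85])
print(round(lower, 2), round(upper, 2))  # approximately 3.2 and 64.0

# Values below `lower` are raised to it; values above `upper` are lowered to it
clipped = np.clip(values, lower, upper)
print(clipped)
```

Only 78, 1, 3 and 99 fall outside the range, which is why the remaining rows are unchanged.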

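If you want to winsorize the first data points even while the window is not yet full, remove the if len(window) == win_size: check (and unindent the lines under it).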
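
Since the question asks about rolling.apply, here is a minimal sketch of that route, which is not the method used above. It assumes each point should be standardized against its own trailing window after that window has been winsorized. The helper name winsorized_zscore and its default percentiles are hypothetical. Unlike the iterrows loop, it never rewrites earlier outputs, so the two approaches can give different values on longer series.

```python
import numpy as np
import pandas as pd

def winsorized_zscore(window, lower_pct=15, upper_pct=85):
    """Winsorize one window, then return the z-score of its latest value.

    Hypothetical helper: the name and default percentiles are assumptions.
    """
    lo, hi = np.percentile(window, [lower_pct, upper_pct])
    clipped = np.clip(window, lo, hi)
    std = clipped.std(ddof=1)  # sample std, the same default as rolling().std()
    if std == 0:
        return np.nan  # a constant window has no defined z-score
    return (clipped[-1] - clipped.mean()) / std

df = pd.DataFrame({'series1': [78, 1, 3, 4, 5, 6, 7, 8, 99]})
win_size = 9  # would be 252 for the daily series in the question

# raw=True hands each full window to the function as a NumPy array;
# rows without a full window come back as NaN
df['std_series'] = df['series1'].rolling(win_size).apply(winsorized_zscore, raw=True)
print(df)
```

With a window of 252, calling a Python function per row can be slow on long series, so this is best read as a starting point rather than a tuned implementation.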

Python pandas rolling winsorize
