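I have a time-series pandas DataFrame on which I have computed a new column:
df['std_series'] = (df['series1'] - df['series1'].rolling(252).mean()) / df['series1'].rolling(252).std()
However, before I standardize and roll, I want to winsorize the data at the 5% level. So for any data point, if it falls outside the 5% quantile of the previous 252 days, clip it to that quantile, and then standardize.
I can't figure out how to make this work with rolling.apply.
For example (rolling over 10 elements):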
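df = pd.DataFrame({'series1':[78, 1, 3, 4, 5, 6, 7, 8, 99]})
and assume I clip at 0.15 and 0.85. Then the clip levels are (min=3.2, max=64),
so the expected winsorized window before standardization would be
[64, 3.2, 3.2, 4, 5, 6, 7, 8, 64]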
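The clip levels in this example can be reproduced with NumPy's default percentile, which interpolates linearly between sorted values (for the 15th percentile of 9 points: position 0.15 × 8 = 1.2, i.e. 3 + 0.2 × (4 − 3) = 3.2). A small check, assuming the same linear interpolation is what the question intends:

```python
import numpy as np

window = np.array([78, 1, 3, 4, 5, 6, 7, 8, 99], dtype=float)
# NumPy's default percentile method interpolates linearly between order statistics
lo, hi = np.percentile(window, [15, 85])
clipped = np.clip(window, lo, hi)  # winsorize the whole window to [lo, hi]
print(round(lo, 2), round(hi, 2))  # 3.2 64.0
print(clipped)
```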
All the examples I have found winsorize the whole DataFrame or an entire column.
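One way to express "look back N days and clip each point" with rolling.apply is to have the applied function clip only the newest value of each window against that window's own quantiles, then run the question's standardization on the result. This is a minimal sketch, assuming a symmetric 5%/95% band for "the 5% level"; the helper names _clip_newest, rolling_winsorize and rolling_zscore are hypothetical. Unlike the example above, which clips a whole window at once, each point here is judged only against its own look-back.

```python
import numpy as np
import pandas as pd


def _clip_newest(window: np.ndarray, lower_q: float, upper_q: float) -> float:
    """Hypothetical helper: clip the newest (last) value of a window to that window's quantile band."""
    lo, hi = np.quantile(window, [lower_q, upper_q])  # linear interpolation by default
    return float(np.clip(window[-1], lo, hi))


def rolling_winsorize(s: pd.Series, window: int,
                      lower_q: float = 0.05, upper_q: float = 0.95) -> pd.Series:
    """Winsorize each point against its trailing window; NaN until the window is full."""
    # raw=True hands _clip_newest a plain ndarray; args supplies the quantile levels
    return s.rolling(window).apply(_clip_newest, raw=True, args=(lower_q, upper_q))


def rolling_zscore(s: pd.Series, window: int) -> pd.Series:
    """The question's rolling standardization, applied to any series."""
    r = s.rolling(window)
    return (s - r.mean()) / r.std()


# Usage with the question's setup (assumed 5%/95% band, 252-day look-back):
# df['std_series'] = rolling_zscore(rolling_winsorize(df['series1'], 252), 252)
```

Because both steps need a full window, the first 2 × 252 − 2 rows come out as NaN.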
Answer (score: 1)
A solution using df.iterrows:
First, set the parameters:
import pandas as pd
import numpy as np
# Sample data:
df = pd.DataFrame({'series1': [78, 1, 3, 4, 5, 6, 7, 8, 99]})
# Parameters:
win_size = 9     # size of the rolling window
p = (15, 85)     # (lower, upper) percentiles, each between 0 and 100
Then iterate:
window = []  # the rolling window
output = []  # the output
# Iterate over the DataFrame
for index, row in df.iterrows():
    # Append the raw value to the output
    output = np.append(output, row.series1)
    # Manage the window
    window = np.append(window, row.series1)  # append the new element
    if len(window) > win_size:  # once the window is over-full, drop its oldest element
        window = np.delete(window, 0)
    # Winsorize
    if len(window) == win_size:
        ll = np.round(np.percentile(window, p[0]), 2)  # lower limit
        ul = np.round(np.percentile(window, p[1]), 2)  # upper limit
        window = np.clip(window, ll, ul)  # clip the window
        output[-win_size:] = window  # overwrite the last win_size outputs with the winsorized window
df['winsorized'] = output  # add the result to the DataFrame
print(df)
Result:
series1 winsorized
0 78 64.0
1 1 3.2
2 3 3.2
3 4 4.0
4 5 5.0
5 6 6.0
6 7 7.0
7 8 8.0
8 99 64.0
If you also want to winsorize the first data points before the window is full, remove the if len(window) == win_size: check (and dedent the lines under it).
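To finish the original task, the question's standardization formula can then be applied to the winsorized column instead of series1. A minimal sketch, with the winsorized values copied from the result table above and win_size = 9 standing in for the question's 252; with only nine rows, just the last row gets a z-score.

```python
import pandas as pd

# Winsorized values copied from the result table above (win_size = 9)
df = pd.DataFrame({
    "series1": [78, 1, 3, 4, 5, 6, 7, 8, 99],
    "winsorized": [64.0, 3.2, 3.2, 4.0, 5.0, 6.0, 7.0, 8.0, 64.0],
})
win_size = 9

# Rolling z-score of the winsorized data (pandas' rolling std uses ddof=1)
roll = df["winsorized"].rolling(win_size)
df["std_series"] = (df["winsorized"] - roll.mean()) / roll.std()
print(df)
```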