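I have computed a rolling sum on a grouped DataFrame, but it adds up in the wrong direction: I get the sum of future values when I need the sum of past values.
What am I doing wrong here?
I import the data and sort by Dimension and Date (I have also tried removing the date sort):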
df = pd.read_csv('Input.csv', parse_dates=True)
df.sort_values(['Dimension','Date'])
print(df)
Then I create a new column, which is a multi-index series of rolling-window sums computed per group:
new_column = df.groupby('Dimension').Value1.apply(lambda x:
x.rolling(window=3).sum())
Then I reset the index so that it matches the original:
df['Sum_Value1'] = new_column.reset_index(level=0, drop=True)
print(df)
I have also tried reversing the index before the calculation, but that failed as well.
Input:
Dimension,Date,Value1,Value2
1,4/30/2002,10,20
1,1/31/2002,10,20
1,10/31/2001,10,20
1,7/31/2001,10,20
1,4/30/2001,10,20
1,1/31/2001,10,20
1,10/31/2000,10,20
2,4/30/2002,10,20
2,1/31/2002,10,20
2,10/31/2001,10,20
2,7/31/2001,10,20
2,4/30/2001,10,20
2,1/31/2001,10,20
2,10/31/2000,10,20
3,4/30/2002,10,20
3,1/31/2002,10,20
3,10/31/2001,10,20
3,7/31/2001,10,20
3,1/31/2001,10,20
3,10/31/2000,10,20
Output:
Dimension Date Value1 Value2 Sum_Value1
0 1 4/30/2002 10 20 NaN
1 1 1/31/2002 10 20 NaN
2 1 10/31/2001 10 20 30.0
3 1 7/31/2001 10 20 30.0
4 1 4/30/2001 10 20 30.0
5 1 1/31/2001 10 20 30.0
6 1 10/31/2000 10 20 30.0
7 2 4/30/2002 10 20 NaN
8 2 1/31/2002 10 20 NaN
9 2 10/31/2001 10 20 30.0
10 2 7/31/2001 10 20 30.0
11 2 4/30/2001 10 20 30.0
12 2 1/31/2001 10 20 30.0
13 2 10/31/2000 10 20 30.0
Desired output:
Dimension Date Value1 Value2 Sum_Value1
0 1 4/30/2002 10 20 30.0
1 1 1/31/2002 10 20 30.0
2 1 10/31/2001 10 20 30.0
3 1 7/31/2001 10 20 30.0
4 1 4/30/2001 10 20 30.0
5 1 1/31/2001 10 20 NaN
6 1 10/31/2000 10 20 NaN
7 2 4/30/2002 10 20 30.0
8 2 1/31/2002 10 20 30.0
9 2 10/31/2001 10 20 30.0
10 2 7/31/2001 10 20 30.0
11 2 4/30/2001 10 20 30.0
12 2 1/31/2001 10 20 NaN
13 2 10/31/2000 10 20 NaN
Answer 0 (score: 4)
You want a backward sum, so reverse your series before summing it:
lambda x: x[::-1].rolling(window=3).sum()
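As a minimal sketch of how the reversed rolling sum fits into the grouped computation from the question: the small inline frame below is a stand-in for Input.csv, and using transform instead of apply is an assumption made here so that the result is indexed like df. The result is reversed back to the original row order before it is returned, so it lines up with the group either by label or by position.

```python
import pandas as pd

# Stand-in for Input.csv: two groups, rows assumed to be in descending date order.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2],
    "Value1": [1, 2, 3, 4, 10, 20, 30],
})

# Reverse each group, take the usual trailing rolling sum, then reverse back
# so every value sits on the row where its window starts.
df["Sum_Value1"] = df.groupby("Dimension")["Value1"].transform(
    lambda x: x[::-1].rolling(window=3).sum()[::-1]
)
print(df)
```

Each row now holds the sum of itself and the two rows below it within the same group, and the last two rows of every group are NaN.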
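Answer 1 (score: 2)
Rolling backward is the same as rolling forward and then shifting the result up by window - 1 rows (2 rows for a window of 3):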
x.rolling(window=3).sum().shift(-2)
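A small check of that equivalence on a single series may help; the series s and the variable names are made up for illustration. Note that the shift has to happen inside each group (as it does when this expression is used inside the groupby lambda), otherwise values would move across group boundaries.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
window = 3

# Trailing rolling sum, shifted up by window - 1 rows.
shifted = s.rolling(window=window).sum().shift(-(window - 1))

# Reverse, take the trailing rolling sum, reverse back.
reversed_roll = s[::-1].rolling(window=window).sum()[::-1]

print(shifted.tolist())        # [6.0, 9.0, 12.0, nan, nan]
print(reversed_roll.tolist())  # [6.0, 9.0, 12.0, nan, nan]
```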
Answer 2 (score: 1)
You can shift the result by window - 1 to get a left-aligned result:
df["sum_value1"] = (df.groupby('Dimension').Value1
.apply(lambda x: x.rolling(window=3).sum().shift(-2)))
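Depending on the pandas version, groupby(...).apply can return a result that carries the group key as an extra index level, which then does not line up with df on assignment. A hedged alternative is transform, which returns a result indexed like the input. The sketch below uses a made-up inline frame rather than the question's CSV:

```python
import pandas as pd

# Made-up stand-in data: two groups of four rows each.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2, 2],
    "Value1": [10, 10, 10, 10, 5, 5, 5, 5],
})
window = 3

# Trailing rolling sum per group, shifted up by window - 1 rows within the group.
df["sum_value1"] = df.groupby("Dimension")["Value1"].transform(
    lambda x: x.rolling(window=window).sum().shift(-(window - 1))
)
print(df)
```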
Answer 3 (score: 0)
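def reverse_rolling(series, window, func):
    # Reverse the row order so the trailing window looks "forward"
    # relative to the original order.
    reversed_series = series.iloc[::-1]
    # min_periods=1: rows with fewer than `window` values still get a result.
    rolled = reversed_series.rolling(window, min_periods=1).apply(func)
    # Restore the original row order; the index labels are unchanged.
    return rolled.iloc[::-1]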
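One thing to note about this helper is that it passes 1 as min_periods, so the last rows get partial sums instead of the NaN values shown in the desired output. The sketch below restates the helper in compact form so that it runs on its own; the min_periods parameter and the sample series are additions for illustration, not part of the original answer.

```python
import pandas as pd

def reverse_rolling(series, window, func, min_periods=1):
    # Reverse, apply a trailing rolling window, then restore the original order.
    rolled = series.iloc[::-1].rolling(window, min_periods=min_periods).apply(func)
    return rolled.iloc[::-1]

s = pd.Series([1, 2, 3, 4], name="Value1")

# Default behaviour: partial windows at the tail still produce a value.
partial = reverse_rolling(s, 3, lambda w: w.sum())
# Requiring a full window reproduces the NaN tail from the desired output.
strict = reverse_rolling(s, 3, lambda w: w.sum(), min_periods=3)

print(partial.tolist())  # [6.0, 9.0, 7.0, 4.0]
print(strict.tolist())   # [6.0, 9.0, nan, nan]
```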
Answer 4 (score: 0)
You can use FixedForwardWindowIndexer:
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer
df = pd.read_csv(r'C:\Users\xxxx\python\data.txt')
indexer = FixedForwardWindowIndexer(window_size=3)
df1 = df.join(df.groupby('Dimension')['Value1'].rolling(indexer, min_periods=3).sum().to_frame().reset_index(), rsuffix='_sum')
del df1['Dimension_sum']
del df1['level_1']
df1
Input:
Dimension Date Value1 Value2
0 1 4/30/2002 10 20
1 1 1/31/2002 10 20
2 1 10/31/2001 10 20
3 1 7/31/2001 10 20
4 1 4/30/2001 10 20
5 1 1/31/2001 10 20
6 1 10/31/2000 10 20
7 2 4/30/2002 10 20
8 2 1/31/2002 10 20
9 2 10/31/2001 10 20
10 2 7/31/2001 10 20
11 2 4/30/2001 10 20
12 2 1/31/2001 10 20
13 2 10/31/2000 10 20
14 3 4/30/2002 10 20
15 3 1/31/2002 10 20
16 3 10/31/2001 10 20
17 3 7/31/2001 10 20
18 3 1/31/2001 10 20
19 3 10/31/2000 10 20
Output:
Dimension Date Value1 Value2 Value1_sum
0 1 4/30/2002 10 20 30.0
1 1 1/31/2002 10 20 30.0
2 1 10/31/2001 10 20 30.0
3 1 7/31/2001 10 20 30.0
4 1 4/30/2001 10 20 30.0
5 1 1/31/2001 10 20 NaN
6 1 10/31/2000 10 20 NaN
7 2 4/30/2002 10 20 30.0
8 2 1/31/2002 10 20 30.0
9 2 10/31/2001 10 20 30.0
10 2 7/31/2001 10 20 30.0
11 2 4/30/2001 10 20 30.0
12 2 1/31/2001 10 20 NaN
13 2 10/31/2000 10 20 NaN
14 3 4/30/2002 10 20 30.0
15 3 1/31/2002 10 20 30.0
16 3 10/31/2001 10 20 30.0
17 3 7/31/2001 10 20 30.0
18 3 1/31/2001 10 20 NaN
19 3 10/31/2000 10 20 NaN
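A shorter variant of the same idea, offered as a sketch rather than as the answer's own method: apply the forward-looking indexer inside transform, which writes the result straight into a new column without the join and column clean-up. The inline frame is a made-up stand-in for the data file.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Made-up stand-in data: a group of four rows and a group of three rows.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2],
    "Value1": [10, 10, 10, 10, 10, 10, 10],
})

# Each window covers the current row and the next two rows.
indexer = FixedForwardWindowIndexer(window_size=3)

# min_periods=3 leaves NaN where fewer than three rows remain in the group.
df["Value1_sum"] = df.groupby("Dimension")["Value1"].transform(
    lambda x: x.rolling(window=indexer, min_periods=3).sum()
)
print(df)
```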