I have the following data:
import pandas as pd
import numpy as np
data = pd.DataFrame({
    'proj': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01', '2019-01-01', '2018-06-01', '2018-07-01', '2018-08-01', '2018-09-01'],
    'value': [10, 3, 15, 16, -20, 2, 1, 3, 3, 0]
})
The result I want is:
expected = pd.DataFrame({
    'proj': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01', '2019-01-01', '2018-06-01', '2018-07-01', '2018-08-01', '2018-09-01'],
    'value': [10, 3, 15, 16, -20, 2, 1, 3, 3, 0],
    'prev_month_value': [np.nan, 10, 3, 15, 16, -20, np.nan, 1, 3, 3],
    'prev_prev_month_value': [np.nan, np.nan, 10, 3, 15, 16, np.nan, np.nan, 1, 3]
})
How can I do this in pandas?
Answer (score: 1)
You can call GroupBy.shift inside a dict comprehension, then concat the result with the original DataFrame:
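N = 2  # number of lagged columns to create
g = data.groupby('proj')  # shift restarts at each proj boundary
# i=1 -> 'prev_month_value', i=2 -> 'prev_prev_month_value', ...
u = pd.DataFrame({
    ('prev_'*i) + 'month_value': g['value'].shift(i) for i in range(1, N + 1)})
pd.concat([data, u], axis=1)  # align on the index and append the new columns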
proj date value prev_month_value prev_prev_month_value
0 A 2018-08-01 10 NaN NaN
1 A 2018-09-01 3 10.0 NaN
2 A 2018-10-01 15 3.0 10.0
3 A 2018-11-01 16 15.0 3.0
4 A 2018-12-01 -20 16.0 15.0
5 A 2019-01-01 2 -20.0 16.0
6 B 2018-06-01 1 NaN NaN
7 B 2018-07-01 3 1.0 NaN
8 B 2018-08-01 3 3.0 1.0
9 B 2018-09-01 0 3.0 3.0
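GroupBy.shift works on row positions, not dates, so the output above matches the "previous month" meaning only when each proj has one row per consecutive month, sorted by date. If a proj can skip a month, a different technique may fit better: a calendar-aware lag, sketched below under that assumption. Each row is keyed by its month via to_period('M'), and a copy whose month is moved forward by k is left-merged back on (proj, month). The function add_calendar_lags and the helper column _month are hypothetical names, not part of pandas or the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'proj': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01',
             '2019-01-01', '2018-06-01', '2018-07-01', '2018-08-01', '2018-09-01'],
    'value': [10, 3, 15, 16, -20, 2, 1, 3, 3, 0],
})


def add_calendar_lags(df, n_lags=2):
    """Hypothetical helper: add prev_month_value, prev_prev_month_value, ...
    by matching calendar months, so a missing month yields NaN instead of
    the value from an older row. Assumes (proj, month) is unique."""
    out = df.copy()
    for k in range(1, n_lags + 1):
        month = pd.to_datetime(out['date']).dt.to_period('M')
        # The value observed in month m is the "k months back" value for
        # month m + k, so key each value by (proj, month + k).
        lagged = pd.DataFrame({
            'proj': out['proj'],
            '_month': month + k,
            'prev_' * k + 'month_value': out['value'],
        })
        # A left merge keeps every original row in its original order;
        # rows with no match k months back get NaN.
        out = (out.assign(_month=month)
                  .merge(lagged, on=['proj', '_month'], how='left')
                  .drop(columns='_month'))
    return out


print(add_calendar_lags(data))

# Illustrative gap: drop A's 2018-10-01 row. A's 2018-11-01 row then gets
# NaN for prev_month_value, where a positional shift would give 3.
gappy = data.drop(index=2)
print(add_calendar_lags(gappy))
```

On the question's data, which has no gaps, this should give the same columns as the shift-based answer. The trade-off is one merge per lag instead of one vectorized shift, which is the price of matching on actual months rather than on row order.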