Shift the values of the previous 1 ... N months into separate columns

Time: 2019-01-10 15:15:56

Tags: python pandas datetime group-by pandas-groupby

I have the following data:

import pandas as pd
import numpy as np

data = pd.DataFrame({
    'proj': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01', '2019-01-01', '2018-06-01', '2018-07-01', '2018-08-01', '2018-09-01'],
    'value': [10, 3, 15, 16, -20, 2, 1, 3, 3, 0]
})

data

In the end I would like to have:

expected = pd.DataFrame({
    'proj': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01', '2019-01-01', '2018-06-01', '2018-07-01', '2018-08-01', '2018-09-01'],
    'value': [10, 3, 15, 16, -20, 2, 1, 3, 3, 0],
    'prev_month_value': [np.nan, 10, 3, 15, 16, -20, np.nan, 1, 3, 3],
    'prev_prev_month_value': [np.nan, np.nan, 10, 3, 15, 16, np.nan, np.nan, 1, 3]
})

expected

How can I do this in pandas?

1 answer:

Answer 0 (score: 1)

You can call GroupBy.shift inside a dict comprehension and then concat the result with the original frame:

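N = 2  # number of previous months to add as columns

# shift is applied within each project, so values never leak across groups
g = data.groupby('proj')
u = pd.DataFrame({
    ('prev_'*i) + 'month_value': g['value'].shift(i) for i in range(1, N + 1)})
# u shares data's index, so concat lines the new columns up row by row
pd.concat([data, u], axis=1)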

  proj        date  value  prev_month_value  prev_prev_month_value
0    A  2018-08-01     10               NaN                    NaN
1    A  2018-09-01      3              10.0                    NaN
2    A  2018-10-01     15               3.0                   10.0
3    A  2018-11-01     16              15.0                    3.0
4    A  2018-12-01    -20              16.0                   15.0
5    A  2019-01-01      2             -20.0                   16.0
6    B  2018-06-01      1               NaN                    NaN
7    B  2018-07-01      3               1.0                    NaN
8    B  2018-08-01      3               3.0                    1.0
9    B  2018-09-01      0               3.0                    3.0
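
Note that shift is positional: it returns the value from the previous row within each group, not from the previous calendar month. The answer above therefore relies on the rows already being in date order. A minimal sketch of a more defensive variant follows, assuming one row per month per project with no missing months. The helper name add_prev_month_columns and its parameters are hypothetical and not part of pandas; the sample rows are deliberately shuffled to show the effect of the sort.

```python
import pandas as pd


def add_prev_month_columns(df, n, group_col="proj", date_col="date", value_col="value"):
    """Hypothetical helper: append n lagged copies of value_col per group."""
    # Sort first so that a positional shift really means "previous month".
    # Assumption: one row per month per project, with no gaps.
    # ISO-formatted date strings sort chronologically, so no parsing is needed here.
    ordered = df.sort_values([group_col, date_col])
    grouped = ordered.groupby(group_col)[value_col]
    lags = pd.DataFrame({
        "prev_" * i + "month_value": grouped.shift(i) for i in range(1, n + 1)
    })
    # lags has the same index as ordered, so concat matches rows one to one.
    return pd.concat([ordered, lags], axis=1)


# Deliberately unsorted sample rows (a subset of the question's data).
data = pd.DataFrame({
    "proj": ["A", "A", "A", "B", "B"],
    "date": ["2018-10-01", "2018-08-01", "2018-09-01", "2018-07-01", "2018-06-01"],
    "value": [15, 10, 3, 3, 1],
})

result = add_prev_month_columns(data, n=2)
print(result)
```

If some months can be missing, a positional shift would silently pull in the value from an earlier month, so the data would need to be reindexed to a full monthly range first.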