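I am new to pandas and I am trying to build a cohort analysis. I need a column containing, for each cohort, the cumulative value over the current and all previous periods. For example, for this DataFrame: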
                                       CanceledCustomers
CohortGroup NewCustomers CancelPeriod
2016-05     75           2016-07                       2
                         2016-08                       5
                         2016-09                       6
                         2016-10                       7
                         2016-11                       6
                         2016-12                       2
                         2017-01                       5
                         2017-02                       6
                         2017-03                       1
                         2017-04                       5
                         2017-05                       6
                         2017-06                       1
2016-06     81           2016-07                       1
                         2016-08                       3
                         2016-09                       4
                         2016-10                       1
                         2016-11                       6
                         2016-12                       2
                         2017-01                       5
                         2017-02                       3
                         2017-03                       3
                         2017-04                       4
                         2017-05                       4
                         2017-06                       4
2016-07     139          2016-07                       1
                         2016-08                       6
                         2016-09                       4
                         2016-10                       8
                         2016-11                      13
                         2016-12                       5
I would like to see output like this:
                                       CanceledCustomers  TotalCancCust
CohortGroup NewCustomers CancelPeriod
2016-05     75           2016-07                       2              2
                         2016-08                       5              7
                         2016-09                       6             13
                         2016-10                       7             20
                         2016-11                       6             26
                         2016-12                       2             28
                         2017-01                       5             33
                         2017-02                       6             39
                         2017-03                       1             40
                         2017-04                       5             45
                         2017-05                       6             51
                         2017-06                       1             52
2016-06     81           2016-07                       1              1
                         2016-08                       3              4
                         2016-09                       4              8
                         2016-10                       1              9
                         2016-11                       6             15
                         2016-12                       2             17
                         2017-01                       5             22
                         2017-02                       3             25
                         2017-03                       3             28
                         2017-04                       4             32
                         2017-05                       4             36
                         2017-06                       4             40
2016-07     139          2016-07                       1              1
                         2016-08                       6              7
                         2016-09                       4             11
                         2016-10                       8             19
                         2016-11                      13             32
                         2016-12                       5             37
How can I do this?
Answer 0 (score: 0)
# the three assignments below are equivalent alternatives
# group by the first index level
df['TotalCancCust'] = df.groupby(level=0)['CanceledCustomers'].cumsum()
# group by the index level named CohortGroup
df['TotalCancCust'] = df.groupby(level='CohortGroup')['CanceledCustomers'].cumsum()
# in pandas 0.20.0+, the level keyword can be omitted and the index level name passed directly
df['TotalCancCust'] = df.groupby('CohortGroup')['CanceledCustomers'].cumsum()
print(df)
                                       CanceledCustomers  TotalCancCust
CohortGroup NewCustomers CancelPeriod
2016-05     75           2016-07                       2              2
                         2016-08                       5              7
                         2016-09                       6             13
                         2016-10                       7             20
                         2016-11                       6             26
                         2016-12                       2             28
                         2017-01                       5             33
                         2017-02                       6             39
                         2017-03                       1             40
                         2017-04                       5             45
                         2017-05                       6             51
                         2017-06                       1             52
2016-06     81           2016-07                       1              1
                         2016-08                       3              4
                         2016-09                       4              8
                         2016-10                       1              9
                         2016-11                       6             15
                         2016-12                       2             17
                         2017-01                       5             22
                         2017-02                       3             25
                         2017-03                       3             28
                         2017-04                       4             32
                         2017-05                       4             36
                         2017-06                       4             40
2016-07     139          2016-07                       1              1
                         2016-08                       6              7
                         2016-09                       4             11
                         2016-10                       8             19
                         2016-11                      13             32
                         2016-12                       5             37
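To try this end to end, here is a minimal sketch that rebuilds the question's data and applies the grouped cumulative sum. It assumes the three left-hand columns form a MultiIndex, as the printed layout suggests; the helper names `cancel_periods`, `cohorts` and `rows` are only illustrative.

```python
import pandas as pd

# Cancellation periods and per-cohort counts copied from the question.
cancel_periods = ["2016-07", "2016-08", "2016-09", "2016-10", "2016-11", "2016-12",
                  "2017-01", "2017-02", "2017-03", "2017-04", "2017-05", "2017-06"]
cohorts = {
    ("2016-05", 75): [2, 5, 6, 7, 6, 2, 5, 6, 1, 5, 6, 1],
    ("2016-06", 81): [1, 3, 4, 1, 6, 2, 5, 3, 3, 4, 4, 4],
    ("2016-07", 139): [1, 6, 4, 8, 13, 5],
}

# One row per (cohort, cancel period); zip stops at the shorter list,
# so the last cohort only gets its six observed periods.
rows = [
    (group, new_customers, period, canceled)
    for (group, new_customers), counts in cohorts.items()
    for period, canceled in zip(cancel_periods, counts)
]
df = pd.DataFrame(
    rows,
    columns=["CohortGroup", "NewCustomers", "CancelPeriod", "CanceledCustomers"],
)
# Assumption: the original frame is indexed by these three columns.
df = df.set_index(["CohortGroup", "NewCustomers", "CancelPeriod"])

# The running total restarts at the first row of every cohort.
df["TotalCancCust"] = df.groupby(level="CohortGroup")["CanceledCustomers"].cumsum()
print(df)
```

Because `cumsum` on a groupby returns a Series aligned to the original index, it can be assigned straight back as a new column without any merge.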
Answer 1 (score: 0)
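First forward-fill your DataFrame, then do the groupby. This assumes CohortGroup is an ordinary column whose repeated values are blank (NaN), not an index level.
# df.ffill() replaces the older fillna(method='ffill'), which is deprecated in pandas 2.1+
df = df.ffill()
df['TotalCancCust'] = df.groupby(['CohortGroup'])['CanceledCustomers'].cumsum()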
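The forward-fill approach can be sketched as follows, assuming a flat frame in which the cohort label appears only on the first row of each group (for example, after reading a spreadsheet). The sample uses only the first three periods of two cohorts from the question, and `group_cols` is an illustrative name. Filling just the grouping columns, rather than the whole frame, is a design choice here so that genuine gaps in the counts would not be overwritten.

```python
import numpy as np
import pandas as pd

# Assumed layout: cohort labels are blank (NaN) on continuation rows.
df = pd.DataFrame({
    "CohortGroup": ["2016-05", np.nan, np.nan, "2016-06", np.nan, np.nan],
    "NewCustomers": [75, np.nan, np.nan, 81, np.nan, np.nan],
    "CancelPeriod": ["2016-07", "2016-08", "2016-09", "2016-07", "2016-08", "2016-09"],
    "CanceledCustomers": [2, 5, 6, 1, 3, 4],
})

# Forward-fill only the grouping columns so every row carries its cohort label.
group_cols = ["CohortGroup", "NewCustomers"]
df[group_cols] = df[group_cols].ffill()

# Running total of cancellations within each cohort.
df["TotalCancCust"] = df.groupby("CohortGroup")["CanceledCustomers"].cumsum()
print(df)
```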