我有一个交易数据,如下所示。这是3个月的数据。
Card_Number Card_type Category Amount Date
0 1 PLATINUM GROCERY 100 10-Jan-18
1 1 PLATINUM HOTEL 2000 14-Jan-18
2 1 PLATINUM GROCERY 500 17-Jan-18
3 1 PLATINUM GROCERY 300 20-Jan-18
4 1 PLATINUM RESTRAUNT 400 22-Jan-18
5 1 PLATINUM HOTEL 500 5-Feb-19
6 1 PLATINUM GROCERY 400 11-Feb-19
7 1 PLATINUM RESTRAUNT 600 21-Feb-19
8 1 PLATINUM GROCERY 800 17-Mar-17
9 1 PLATINUM GROCERY 200 21-Mar-17
10 2 GOLD GROCERY 1000 12-Jan-18
11 2 GOLD HOTEL 3000 14-Jan-18
12 2 GOLD RESTRAUNT 500 19-Jan-18
13 2 GOLD GROCERY 300 20-Jan-18
14 2 GOLD GROCERY 400 25-Jan-18
15 2 GOLD HOTEL 1500 5-Feb-19
16 2 GOLD GROCERY 400 11-Feb-19
17 2 GOLD RESTRAUNT 600 21-Mar-17
18 2 GOLD GROCERY 200 21-Mar-17
19 2 GOLD HOTEL 700 25-Mar-17
20 3 SILVER RESTRAUNT 1000 13-Jan-18
21 3 SILVER HOTEL 1000 16-Jan-18
22 3 SILVER GROCERY 500 18-Jan-18
23 3 SILVER GROCERY 300 23-Jan-18
24 3 SILVER GROCERY 400 28-Jan-18
25 3 SILVER HOTEL 500 5-Feb-19
26 3 SILVER GROCERY 400 11-Feb-19
27 3 SILVER HOTEL 600 25-Mar-17
28 3 SILVER GROCERY 200 29-Mar-17
29 3 SILVER RESTRAUNT 700 30-Mar-17
我正在努力低于数据框。
Card_No Card_Type D 2018_Sp 2018_N 2019_Sp 2019_N 2018_Sp
1 PLATINUM 70 3300 5 1500 3 1000
2 GOLD 72 5200 5 1900 2 1500
3 SILVER . 76 2900 5 900 2 1500
D =从第一次交易到最后一次交易的天数。
2018_Sp = 2018年的总支出。
2019_Sp = 2019年的总支出。
2017_Sp = 2017年的总支出。
2018_N = 2018年的交易数量。
2019_N = 2019年的交易数量。
答案 0 :(得分:1)
使用:
#convert to datetimes
df['Date'] = pd.to_datetime(df['Date'])
#sorting if necessary
df = df.sort_values(['Card_Number','Card_type', 'Date'])
#aggregate count and sum
df1 = (df.groupby(['Card_Number','Card_type', df['Date'].dt.year])['Amount']
.agg([('Sp','size'),('N','sum')])
.unstack()
.sort_index(axis=1, level=1))
#MultiIndex to columns
df1.columns = [f'{b}_{a}' for a, b in df1.columns]
#difference (different output, because different years)
s = df.groupby('Card_type').Date.apply(lambda x: (x.max()-x.min()).days).rename('D')
#join together
df1 = df1.join(s).reset_index()
print (df1)
Card_Number Card_type 2017_N 2017_Sp 2018_N 2018_Sp 2019_N 2019_Sp \
0 1 PLATINUM 1000 2 3300 5 1500 3
1 2 GOLD 1500 3 5200 5 1900 2
2 3 SILVER 1500 3 3200 5 900 2
D
0 706
1 692
2 688