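I have the following dataset. I am trying to group it by district to find the total amount for each item. I would like to extend the calculation to also find the cumulative sales and the total sales within each district.
Dataset: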
district item salesAmount
Arba pen 10
Arba pen 20
Arba pencil 30
Arba laptop 10000
Arba coil 100
Arba coil 200
Cebu pen 100
Cebu pen 20
Cebu laptop 20000
Cebu laptop 20000
Cebu fruit 800
Cebu oil 300
I can group by district and item to find the total amount for each item as follows:
df.groupby(['district', 'item']).agg({'salesAmount': 'sum'})
The result is as follows:
district item salesAmount
Arba laptop 10000
Arba coil 300
Arba pencil 30
Arba pen 30
Cebu laptop 40000
Cebu fruit 800
Cebu oil 300
Cebu pen 120
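First, I want to order the rows within each district from the highest amount to the lowest.
Then I want to add cumulative and total sales columns (per district), as shown below: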
district item salesAmount cumsalesAmount totaldistrictAmount
Arba laptop 10000 10000 10360
Arba coil 300 10300 10360
Arba pencil 30 10330 10360
Arba pen 30 10360 10360
Cebu laptop 40000 40000 41220
Cebu fruit 800 40800 41220
Cebu oil 300 41100 41220
Cebu pen 120 41220 41220
Thanks.
Answer 0 (score: 3)
First aggregate with sum by both columns:
print (df.dtypes)
district object
item object
salesAmount int64
dtype: object
df1 = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()
Or:
df1 = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})
print (df1)
district item salesAmount
0 Arba coil 300
1 Arba laptop 10000
2 Arba pen 30
3 Arba pencil 30
4 Cebu fruit 800
5 Cebu laptop 40000
6 Cebu oil 300
7 Cebu pen 120
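Then sort by both columns with DataFrame.sort_values, compute the running total within each district with GroupBy.cumsum, and finally get the district total with GroupBy.transform and sum: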
df1 = df1.sort_values(['district','salesAmount'], ascending=[True, False])
df1['cumsalesAmount'] = df1.groupby('district')['salesAmount'].cumsum()
df1['totaldistrictAmount'] = df1.groupby('district')['salesAmount'].transform('sum')
# alternative: the last cumulative value in each district equals the district total
#df1['totaldistrictAmount'] = df1.groupby('district')['cumsalesAmount'].transform('last')
print (df1)
district item salesAmount cumsalesAmount totaldistrictAmount
1 Arba laptop 10000 10000 10360
0 Arba coil 300 10300 10360
2 Arba pen 30 10330 10360
3 Arba pencil 30 10360 10360
5 Cebu laptop 40000 40000 41220
4 Cebu fruit 800 40800 41220
6 Cebu oil 300 41100 41220
7 Cebu pen 120 41220 41220
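For convenience, the steps above can be put together in one self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and the helper name district_summary is hypothetical, introduced here only for illustration. The final reset_index call is an optional addition to get a clean 0..n-1 index and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "district": ["Arba"] * 6 + ["Cebu"] * 6,
    "item": ["pen", "pen", "pencil", "laptop", "coil", "coil",
             "pen", "pen", "laptop", "laptop", "fruit", "oil"],
    "salesAmount": [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})


def district_summary(frame):
    """Hypothetical helper: item totals plus running and overall district sums."""
    # Step 1: total sales per (district, item) pair.
    totals = frame.groupby(["district", "item"], as_index=False)["salesAmount"].sum()
    # Step 2: within each district, order items from highest to lowest total.
    totals = totals.sort_values(["district", "salesAmount"],
                                ascending=[True, False])
    # Step 3: running total per district, following the sorted order.
    totals["cumsalesAmount"] = totals.groupby("district")["salesAmount"].cumsum()
    # Step 4: overall district total, repeated on every row of that district.
    totals["totaldistrictAmount"] = (
        totals.groupby("district")["salesAmount"].transform("sum")
    )
    # Optional: drop the shuffled index left over from sorting.
    return totals.reset_index(drop=True)


result = district_summary(df)
print(result)
```

Note that pen and pencil in Arba both total 30, so their relative order depends on how ties are broken during sorting. If a fixed order matters, one option is to add 'item' as a third sort key.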