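Pandas column transformation to get cumulative dollar amounts

Time: 2020-01-31 08:14:02

Tags: python pandas dataframe math

I have a dataset like the one below. I am trying to group it by district to find the total amount for each product. I would like to extend the calculation to also find the cumulative sales and the total sales for each district.

Dataset: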

district      item       salesAmount
Arba          pen        10
Arba          pen        20
Arba          pencil     30
Arba          laptop     10000
Arba          coil       100
Arba          coil       200
Cebu          pen        100
Cebu          pen        20
Cebu          laptop     20000
Cebu          laptop     20000
Cebu          fruit      800
Cebu          oil        300

I can group by district and find the total amount for each product as follows:

df.groupby(['district', 'item']).agg({'salesAmount': 'sum'}) gives the following result:

district      item       salesAmount
Arba          laptop     10000
Arba          coil       300
Arba          pencil     30
Arba          pen        30
Cebu          laptop     40000
Cebu          fruit      800
Cebu          oil        300
Cebu          pen        120

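First, I want to order the rows within each district from the highest amount to the lowest.

Then I want to add cumulative and total sales columns, computed per district, as shown below: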

district    item    salesAmount cumsalesAmount  totaldistrictAmount
Arba        laptop  10000       10000           10360
Arba        coil    300         10300           10360
Arba        pencil  30          10330           10360
Arba        pen     30          10360           10360
Cebu        laptop  40000       40000           41220
Cebu        fruit   800         40800           41220
Cebu        oil     300         41100           41220
Cebu        pen     120         41220           41220

Thanks.

1 answer:

Answer 0: (score: 3)

First, aggregate sum by both columns:

print (df.dtypes)
district       object
item           object
salesAmount     int64
dtype: object

df1 = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()

Or:

df1 = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})
print (df1)
  district    item  salesAmount
0     Arba    coil          300
1     Arba  laptop        10000
2     Arba     pen           30
3     Arba  pencil           30
4     Cebu   fruit          800
5     Cebu  laptop        40000
6     Cebu     oil          300
7     Cebu     pen          120
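
To reproduce the aggregation above without the original file, the question's rows can be rebuilt by hand. This is only a minimal sketch: the DataFrame construction and the assert_frame_equal comparison are additions for illustration, and the names by_column and by_agg are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})

# Column selection followed by sum().
by_column = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()

# The same aggregation spelled with agg() and a dict.
by_agg = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})

# Raises an AssertionError if the two spellings ever disagree.
pd.testing.assert_frame_equal(by_column, by_agg)
print(by_column)
```

With as_index=False the group keys stay as ordinary columns, which keeps the later sort_values and groupby calls simple.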

Then sort by both columns with DataFrame.sort_values, compute the running total with GroupBy.cumsum, and finally get the district total with GroupBy.transform and sum:

df1 = df1.sort_values(['district','salesAmount'], ascending=[True, False])
df1['cumsalesAmount'] = df1.groupby('district')['salesAmount'].cumsum()
df1['totaldistrictAmount'] = df1.groupby('district')['salesAmount'].transform('sum')
# alternative: the last cumulative value in each district is its total
# df1['totaldistrictAmount'] = df1.groupby('district')['cumsalesAmount'].transform('last')
print (df1)
  district    item  salesAmount  cumsalesAmount  totaldistrictAmount
1     Arba  laptop        10000           10000                10360
0     Arba    coil          300           10300                10360
2     Arba     pen           30           10330                10360
3     Arba  pencil           30           10360                10360
5     Cebu  laptop        40000           40000                41220
4     Cebu   fruit          800           40800                41220
6     Cebu     oil          300           41100                41220
7     Cebu     pen          120           41220                41220
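
The steps above can also be collected into one function. The following is a sketch under stated assumptions rather than part of the original answer: the helper name add_district_totals is hypothetical, the sample data is rebuilt from the question, and 'item' is added as a third sort key purely so that rows with equal amounts (pen and pencil in Arba) come out in a deterministic order.

```python
import pandas as pd

def add_district_totals(df):
    """Hypothetical helper: item totals per district plus running and overall sums."""
    out = (
        df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()
          # Highest amount first within each district; 'item' only breaks ties.
          .sort_values(['district', 'salesAmount', 'item'],
                       ascending=[True, False, True])
          .reset_index(drop=True)
    )
    # Running total within each district, in the sorted order.
    out['cumsalesAmount'] = out.groupby('district')['salesAmount'].cumsum()
    # District total broadcast back to every row of that district.
    out['totaldistrictAmount'] = out.groupby('district')['salesAmount'].transform('sum')
    return out

# Sample data copied from the question.
df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})

result = add_district_totals(df)
print(result)
```

Unlike the output shown above, this version resets the index after sorting, so the rows are numbered 0 to 7 in display order. Drop the reset_index call if the original row labels should be kept.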