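Pandas: group and sum by one column, sort the result by another column

Time: 2019-03-22 16:46:53

Tags: python pandas dataframe

I want to take my dataframe (see Dataframe 1), group it by item, sum the sales amounts, and sort the result by each item's earliest sale date (see Dataframe 2).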

Dataframe 1

Dataframe 2

My code so far:

cusips_df = cusips_df.sort_values(by='settle_date', ascending=True)

cusips_df = cusips_df.groupby(['cusip'], as_index=False).agg({"principal":sum})

However, this produces a dataframe that appears to be sorted alphabetically by item rather than by earliest date.

2 Answers:

Answer 0 (score: 0)

Try this:

cusips_df['settle_date'] = pd.to_datetime(cusips_df['settle_date'], format='%d/%m/%Y')
# Sum principal per cusip, keep each group's earliest settle_date, then sort by that date
cusips_df = cusips_df.groupby(['cusip'], as_index=False).agg({'principal': 'sum', 'settle_date': 'min'})
cusips_df = cusips_df.sort_values('settle_date', ascending=True)[['cusip', 'principal']]
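
To see the effect of this answer end to end, here is a minimal self-contained sketch. The sample rows are invented for illustration (the question's real data was only posted as images), and the day/month/year string format is assumed from the `format='%d/%m/%Y'` argument above. It also shows why the `pd.to_datetime` step matters: as plain strings, dates in this format would compare lexicographically rather than chronologically.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question's code.
cusips_df = pd.DataFrame({
    "cusip": ["BBB", "AAA", "BBB", "AAA", "CCC"],
    "principal": [100, 200, 300, 400, 500],
    "settle_date": ["15/03/2019", "01/04/2019", "20/01/2019",
                    "10/02/2019", "05/01/2019"],
})

# Parse day/month/year strings so that min() and sorting are chronological.
cusips_df["settle_date"] = pd.to_datetime(cusips_df["settle_date"], format="%d/%m/%Y")

result = (
    cusips_df.groupby("cusip", as_index=False)
    .agg({"principal": "sum", "settle_date": "min"})  # total and earliest date per cusip
    .sort_values("settle_date")[["cusip", "principal"]]
    .reset_index(drop=True)
)
print(result)
```

With this sample, the groups come out in the order CCC, BBB, AAA, which follows each cusip's earliest settle date rather than the alphabetical order of the keys.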

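Answer 1 (score: 0)

You can also aggregate the minimum date while grouping, then sort the grouped result by that minimum (and drop the date column from the result if you don't need it):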

import pandas as pd

d = { "Item" : ["Apple", "Apple", "Pear", "Pear", "Orange", "Orange"],
      "Amount": [1000, 2000, 30, 40, 400, 50],
      "DateSold": ["2018-02-01", "2018-06-01", "2018-01-01", "2018-02-20", "2018-04-15", "2018-04-30"]}
df = pd.DataFrame(data=d)
# Sum Amount per Item and keep each Item's earliest DateSold
# (ISO-format date strings compare in chronological order)
grouped_df = df.groupby(['Item'], as_index=False).agg({"Amount": "sum", "DateSold": "min"})
# Sort by the earliest date, then keep only the Item and Amount columns
grouped_and_sorted_df = grouped_df.sort_values('DateSold', ascending=True)[["Item","Amount"]]

In this example, df is:

     Item  Amount    DateSold
0   Apple    1000  2018-02-01
1   Apple    2000  2018-06-01
2    Pear      30  2018-01-01
3    Pear      40  2018-02-20
4  Orange     400  2018-04-15
5  Orange      50  2018-04-30

grouped_and_sorted_df will be:

     Item  Amount
2    Pear      70
0   Apple    3000
1  Orange     450
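
As background on why the code in the question lost its date ordering: `groupby` sorts the group keys by default (`sort=True`), so a preceding `sort_values` does not carry through to the grouped result. One possible alternative, sketched here on the same sample data and not taken from either answer, is to sort by date first and pass `sort=False`, so that the groups appear in the order each item first occurs. The variable name `by_first_sale` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Apple", "Apple", "Pear", "Pear", "Orange", "Orange"],
    "Amount": [1000, 2000, 30, 40, 400, 50],
    "DateSold": ["2018-02-01", "2018-06-01", "2018-01-01",
                 "2018-02-20", "2018-04-15", "2018-04-30"],
})

# Sort rows by date, then group with sort=False so that groups keep the
# order of first appearance, i.e. the item with the earliest sale comes first.
by_first_sale = (
    df.sort_values("DateSold")
    .groupby("Item", as_index=False, sort=False)
    .agg({"Amount": "sum"})
)
print(by_first_sale)
```

This avoids carrying the date column through the aggregation, at the cost of relying on row order. The explicit `min` aggregation shown in the answers makes the sort key visible in the intermediate result, which can be easier to check.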