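I would like to take my dataframe (see dataframe 1), group it by item, sum the sales amounts, and sort the result by each item's earliest sale date (i.e. see dataframe 2).
My code so far is: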
cusips_df = cusips_df.sort_values(by='settle_date', ascending=True)
cusips_df = cusips_df.groupby(['cusip'], as_index=False).agg({"principal":sum})
But this produces the dataframe below, which appears to be sorted alphabetically by item rather than by oldest date.
Answer 0 (score: 0)
Try this:
cusips_df['settle_date'] = pd.to_datetime(cusips_df['settle_date'], format='%d/%m/%Y')
cusips_df = cusips_df.groupby(['cusip'], as_index=False).agg({'principal': 'sum', 'settle_date': 'min'}).sort_values('settle_date', ascending=True)[['cusip', 'principal']]
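The `pd.to_datetime` step matters because day-first date strings do not sort chronologically when compared as text. Here is a minimal sketch of that effect. The question does not show the real data, so the cusip values, amounts, and dates below are invented, and the dates are assumed to be stored as dd/mm/YYYY strings, as the format string above implies:

```python
import pandas as pd

# Hypothetical sample data; identifiers and amounts are invented for illustration.
cusips_df = pd.DataFrame({
    "cusip": ["BBB", "AAA", "BBB", "AAA"],
    "principal": [100, 200, 300, 400],
    "settle_date": ["05/01/2018", "01/02/2018", "20/03/2018", "15/04/2018"],
})

# As plain strings, "01/02/2018" (1 Feb) compares lower than "05/01/2018" (5 Jan).
earliest_as_text = cusips_df["settle_date"].min()

# Parse to real datetimes so min() and sort_values() are chronological.
cusips_df["settle_date"] = pd.to_datetime(cusips_df["settle_date"], format="%d/%m/%Y")

result = (
    cusips_df.groupby("cusip", as_index=False)
    .agg({"principal": "sum", "settle_date": "min"})
    .sort_values("settle_date")[["cusip", "principal"]]
)
print(result)
```

With this made-up data, BBB comes first because its earliest settle date (5 Jan) precedes AAA's (1 Feb), even though AAA sorts first alphabetically.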
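Answer 1 (score: 0)
You can also aggregate the minimum date while grouping, then sort the groups by that minimum (and drop the date column from the result if you want):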
import pandas as pd

# Sample data: two sales per item, with ISO-formatted date strings
d = {
    "Item": ["Apple", "Apple", "Pear", "Pear", "Orange", "Orange"],
    "Amount": [1000, 2000, 30, 40, 400, 50],
    "DateSold": ["2018-02-01", "2018-06-01", "2018-01-01", "2018-02-20", "2018-04-15", "2018-04-30"],
}
df = pd.DataFrame(data=d)
# Sum the amounts and keep the earliest sale date for each item
grouped_df = df.groupby(['Item'], as_index=False).agg({"Amount": "sum", "DateSold": "min"})
# Order the items by that earliest date, then drop the date column
grouped_and_sorted_df = grouped_df.sort_values('DateSold', ascending=True)[["Item", "Amount"]]
In this example, df is:
     Item  Amount    DateSold
0   Apple    1000  2018-02-01
1   Apple    2000  2018-06-01
2    Pear      30  2018-01-01
3    Pear      40  2018-02-20
4  Orange     400  2018-04-15
5  Orange      50  2018-04-30
and grouped_and_sorted_df will be:
     Item  Amount
2    Pear      70
0   Apple    3000
1  Orange     450
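
An alternative sketch, not taken from either answer: `groupby` sorts the group keys by default (`sort=True`), which is why sorting by date before grouping has no visible effect in the question's code. Passing `sort=False` keeps the groups in order of first appearance, so a date sort done beforehand carries through. This reuses the sample data above and assumes the dates are ISO-formatted strings (or real datetimes), so that sorting them is chronological:

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Apple", "Apple", "Pear", "Pear", "Orange", "Orange"],
    "Amount": [1000, 2000, 30, 40, 400, 50],
    "DateSold": ["2018-02-01", "2018-06-01", "2018-01-01",
                 "2018-02-20", "2018-04-15", "2018-04-30"],
})

# Sort rows by date first; ISO-formatted strings sort chronologically.
by_date = df.sort_values("DateSold")

# sort=False keeps groups in order of first appearance instead of sorting the keys.
totals = by_date.groupby("Item", as_index=False, sort=False).agg({"Amount": "sum"})
print(totals)
```

This avoids carrying the date column through the aggregation, at the cost of relying on row order, which some readers may find less explicit than aggregating the minimum date.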