Python - Aggregating multiple DataFrame columns by quarter in a groupby

Time: 2020-06-23 20:15:16

Tags: python pandas time-series pandas-groupby

I am trying to aggregate data on a quarterly, half-yearly and yearly basis. I have a dataframe like the one shown below.

Input

+----+----------+-----------+----------------+--------------+---------------------+
| ID | Name     | Date      | Submission_Amt | Approved_Amt | Observation         |
+----+----------+-----------+----------------+--------------+---------------------+
| 1  | John Doe | 1/1/2019  | 100            | 90           | Exceeding Limit     |
+----+----------+-----------+----------------+--------------+---------------------+
| 1  | John Doe | 2/1/2019  | 50             | 50           | Not Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1  | John Doe | 3/15/2019 | 120            | 90           | Exceeding Limit     |
+----+----------+-----------+----------------+--------------+---------------------+
| 1  | John Doe | 4/2/2019  | 150            | 90           | Exceeding Limit     |
+----+----------+-----------+----------------+--------------+---------------------+
| 1  | John Doe | 5/7/2019  | 80             | 80           | Not Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1  | John Doe | 6/7/2019  | 50             | 40           | Not Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+

Expected result

+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
| ID | Name     | Period            | Total Submission Amt | Total Approved Amount | Count of Submissions exceeding Limit | Total Submissions |
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
| 1  | John Doe | First Half - 2019 | 560                  | 440                   | 3                                    | 5                 |
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
| 1  | John Doe | Q1-2019           | 420                  | 320                   | 3                                    | 4                 |
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+

Code

This is the progress I have made so far.

df=df.groupby(['ID','Name','Date','Observation']).agg({'Submission_Amt':'sum','Approved_Amt':'sum'}).reset_index()

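I am able to perform the sum() aggregation, but I cannot do the following.

  • Aggregate by quarter. I tried groupby(.....).resample('Q'), but it did not work.
  • Get the total number of submissions and the number of submissions that exceeded the limit.
  • Aggregate half-yearly and yearly. I think that if resample() works, I can change the frequency to 'Y' for the yearly case.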

1 answer:

Answer 0 (score: 1)

resample does not combine well with a groupby like the one above. Instead, resample directly on the Date column:

df['Date'] = pd.to_datetime(df['Date'])  # resample(on=...) needs a datetime column
df.resample('Q', on='Date').agg({'Submission_Amt':'sum','Approved_Amt':'sum'}).reset_index()
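The resample call above buckets every row by quarter but drops ID and Name. One possible way to keep them, sketched here as an alternative and not as the answerer's method, is to derive a quarter column with dt.to_period and group on it together with the identifying columns. The sample frame is copied from the question's input table. The column name Quarter and the variable quarterly are illustrative assumptions.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "ID": [1] * 6,
    "Name": ["John Doe"] * 6,
    "Date": ["1/1/2019", "2/1/2019", "3/15/2019",
             "4/2/2019", "5/7/2019", "6/7/2019"],
    "Submission_Amt": [100, 50, 120, 150, 80, 50],
    "Approved_Amt": [90, 50, 90, 90, 80, 40],
})

# The dates are month/day/year strings; convert them to real datetimes.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# "Quarter" is a helper column introduced for this sketch.
df["Quarter"] = df["Date"].dt.to_period("Q")

# Group on the identifying columns plus the quarter, then sum the amounts.
quarterly = (
    df.groupby(["ID", "Name", "Quarter"])[["Submission_Amt", "Approved_Amt"]]
      .sum()
      .reset_index()
)
print(quarterly)
```

Grouping on a Period column avoids needing a DatetimeIndex, and swapping "Q" for "Y" in to_period should give the yearly version.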

To count the submissions that exceed the limit, you can pass a function in the agg dictionary:

df.resample('Q', on='Date').agg({
    'Submission_Amt':'sum',
    'Approved_Amt':'sum',
    # compare and sum, so quarters with no 'Exceeding Limit' rows give 0 instead of a KeyError
    'Observation': lambda x: (x == 'Exceeding Limit').sum()
}).reset_index()
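The question also asks for half-yearly totals, a total submission count, and Period labels such as "Q1-2019". The following is a minimal sketch of one way to get all of these per ID and Name. It assumes that a half-year means January to June or July to December, and that "exceeding" means Observation equals 'Exceeding Limit'. The helper summarize and the output column names are hypothetical and come from neither the question nor the answer.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "ID": [1] * 6,
    "Name": ["John Doe"] * 6,
    "Date": ["1/1/2019", "2/1/2019", "3/15/2019",
             "4/2/2019", "5/7/2019", "6/7/2019"],
    "Submission_Amt": [100, 50, 120, 150, 80, 50],
    "Approved_Amt": [90, 50, 90, 90, 80, 40],
    "Observation": ["Exceeding Limit", "Not Exceeding Limit", "Exceeding Limit",
                    "Exceeding Limit", "Not Exceeding Limit", "Not Exceeding Limit"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")


def summarize(frame, period_labels):
    """Hypothetical helper: aggregate per ID, Name and a period label."""
    return (
        frame.assign(
            Period=period_labels,
            # 1 where the row exceeded the limit, 0 otherwise
            Exceeding=frame["Observation"].eq("Exceeding Limit").astype(int),
        )
        .groupby(["ID", "Name", "Period"], sort=False)
        .agg(
            Total_Submission_Amt=("Submission_Amt", "sum"),
            Total_Approved_Amt=("Approved_Amt", "sum"),
            Exceeding_Count=("Exceeding", "sum"),
            Total_Submissions=("Submission_Amt", "count"),
        )
        .reset_index()
    )


# Build the period labels as plain strings.
year = df["Date"].dt.year.astype(str)
quarter_labels = "Q" + df["Date"].dt.quarter.astype(str) + "-" + year
half_names = df["Date"].dt.month.le(6).map({True: "First Half", False: "Second Half"})
half_labels = half_names + " - " + year

quarterly = summarize(df, quarter_labels)
half_yearly = summarize(df, half_labels)
yearly = summarize(df, year)

# Stack the views into one table, similar in shape to the expected result.
result = pd.concat([half_yearly, quarterly], ignore_index=True)
print(result)
```

With the six sample rows, the "First Half - 2019" row comes out as 550 submitted, 440 approved, 3 exceeding and 6 submissions, and "Q1-2019" as 270, 230, 2 and 3. These figures differ from the expected table in the question, which does not appear to match its own input rows.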