I am trying to aggregate data by quarter, half year and year. I have a dataframe like the one below.
Input
+----+----------+-----------+----------------+--------------+---------------------+
| ID | Name | Date | Submission_Amt | Approved_Amt | Observation |
+----+----------+-----------+----------------+--------------+---------------------+
| 1 | John Doe | 1/1/2019 | 100 | 90 | Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1 | John Doe | 2/1/2019 | 50 | 50 | Not Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1 | John Doe | 3/15/2019 | 120 | 90 | Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1 | John Doe | 4/2/2019 | 150 | 90 | Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1 | John Doe | 5/7/2019 | 80 | 80 | Not Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
| 1 | John Doe | 6/7/2019 | 50 | 40 | Not Exceeding Limit |
+----+----------+-----------+----------------+--------------+---------------------+
Expected result
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
| ID | Name | Period | Total Submission Amt | Total Approved Amount | Count of Submissions exceeding Limit | Total Submissions |
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
| 1 | John Doe | First Half - 2019 | 560 | 440 | 3 | 5 |
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
| 1 | John Doe | Q1-2019 | 420 | 320 | 3 | 4 |
+----+----------+-------------------+----------------------+-----------------------+--------------------------------------+-------------------+
Code
Here is what I have so far:
```
df = df.groupby(['ID', 'Name', 'Date', 'Observation']).agg({'Submission_Amt': 'sum', 'Approved_Amt': 'sum'}).reset_index()
```
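Because Date is one of the groupby keys, every distinct date forms its own group, so nothing is rolled up to quarters (Observation in the keys likewise splits each period into exceeding and non-exceeding rows). A minimal sketch, assuming the sample table is rebuilt as a DataFrame and Date is parsed with pd.to_datetime, that groups on a quarter key derived with Series.dt.to_period('Q') instead:

```python
import pandas as pd

# Sample data rebuilt from the Input table above
df = pd.DataFrame({
    "ID": [1] * 6,
    "Name": ["John Doe"] * 6,
    "Date": ["1/1/2019", "2/1/2019", "3/15/2019", "4/2/2019", "5/7/2019", "6/7/2019"],
    "Submission_Amt": [100, 50, 120, 150, 80, 50],
    "Approved_Amt": [90, 50, 90, 90, 80, 40],
    "Observation": ["Exceeding Limit", "Not Exceeding Limit", "Exceeding Limit",
                    "Exceeding Limit", "Not Exceeding Limit", "Not Exceeding Limit"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Grouping on the raw Date keeps every row in its own group: 6 rows in, 6 groups out
per_date = df.groupby(["ID", "Name", "Date"])["Submission_Amt"].sum()
print(len(per_date))  # 6

# Grouping on a quarter key derived from Date rolls the rows up per quarter
quarter_key = df["Date"].dt.to_period("Q").rename("Quarter")
by_quarter = (df.groupby(["ID", "Name", quarter_key])[["Submission_Amt", "Approved_Amt"]]
                .sum()
                .reset_index())
print(by_quarter)
```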
I am able to do the sum() aggregation, but I could not get it to aggregate by quarter. I tried

```
groupby(.....).resample('Q')
```

but it didn't work.

Answer
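Instead of chaining resample onto the groupby, resample on the Date column directly. The column must hold datetimes, so convert it first:

```
df['Date'] = pd.to_datetime(df['Date'])
df.resample('Q', on='Date').agg({'Submission_Amt': 'sum', 'Approved_Amt': 'sum'}).reset_index()
```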
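The resample call drops ID and Name from the result. If they should stay in the output, as in the expected result, one alternative (a different technique from resample) is to keep the groupby and bin Date with pd.Grouper(key='Date', freq=...). This sketch assumes the same rebuilt sample data; it uses the 'QS' (quarter start) alias rather than 'Q', because recent pandas releases deprecate 'Q' for time bins in favour of 'QE'.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1] * 6,
    "Name": ["John Doe"] * 6,
    "Date": ["1/1/2019", "2/1/2019", "3/15/2019", "4/2/2019", "5/7/2019", "6/7/2019"],
    "Submission_Amt": [100, 50, 120, 150, 80, 50],
    "Approved_Amt": [90, 50, 90, 90, 80, 40],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# pd.Grouper bins Date by quarter (like resample) while ID and Name stay in the keys;
# "QS" labels each bin with the first day of its quarter
out = (df.groupby(["ID", "Name", pd.Grouper(key="Date", freq="QS")])
         [["Submission_Amt", "Approved_Amt"]]
         .sum()
         .reset_index())
print(out)
```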
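To count the submissions that exceed the limit, you can pass a function for the Observation column in the agg dictionary:

```
df.resample('Q', on='Date').agg({
    'Submission_Amt': 'sum',
    'Approved_Amt': 'sum',
    # count rows whose Observation is exactly 'Exceeding Limit'; unlike
    # x.value_counts()['Exceeding Limit'] this gives 0 instead of a KeyError
    # for a quarter with no exceeding submissions
    'Observation': lambda x: (x == 'Exceeding Limit').sum()
}).reset_index()
```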
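Putting the pieces together in the shape of the expected result: a minimal sketch that builds a Period label for each granularity (quarter, half year, year) and runs the same named aggregation for each, then stacks the results. The label formats 'Q1-2019' and 'First Half - 2019' follow the expected table; the 'Second Half' label, the plain '2019' label for the annual row and the helper name summarise are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1] * 6,
    "Name": ["John Doe"] * 6,
    "Date": ["1/1/2019", "2/1/2019", "3/15/2019", "4/2/2019", "5/7/2019", "6/7/2019"],
    "Submission_Amt": [100, 50, 120, 150, 80, 50],
    "Approved_Amt": [90, 50, 90, 90, 80, 40],
    "Observation": ["Exceeding Limit", "Not Exceeding Limit", "Exceeding Limit",
                    "Exceeding Limit", "Not Exceeding Limit", "Not Exceeding Limit"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

year = df["Date"].dt.year.astype(str)
# One label Series per granularity, aligned with df's index
quarter = "Q" + df["Date"].dt.quarter.astype(str) + "-" + year
half = df["Date"].dt.month.le(6).map({True: "First Half", False: "Second Half"}) + " - " + year


def summarise(frame, period):
    """Hypothetical helper: aggregate frame per ID, Name and the given period label."""
    return (frame.assign(Period=period,
                         Exceeding=frame["Observation"].eq("Exceeding Limit").astype(int))
                 .groupby(["ID", "Name", "Period"], sort=False)
                 .agg(**{"Total Submission Amt": ("Submission_Amt", "sum"),
                         "Total Approved Amount": ("Approved_Amt", "sum"),
                         "Count of Submissions exceeding Limit": ("Exceeding", "sum"),
                         "Total Submissions": ("Submission_Amt", "count")})
                 .reset_index())


result = pd.concat([summarise(df, half), summarise(df, quarter), summarise(df, year)],
                   ignore_index=True)
print(result)
```

On the six sample rows this gives 550 / 440 / 3 / 6 for First Half - 2019 and 270 / 230 / 2 / 3 for Q1-2019. Because ID and Name are part of the keys, several people in the same frame each get their own rows.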