Aggregating daily data by month and another column

Time: 2019-05-24 04:36:27

Tags: python pandas group-by pandas-groupby

I have a DataFrame that stores daily data, as shown below:

   Date        Product Number  Description        Revenue
2010-01-04       4219-057       Product A        39.299999    
2010-01-04       4219-056       Product A        39.520000
2010-01-04       4219-100       Product B        39.520000
2010-01-04       4219-056       Product A        39.520000
2010-01-05       4219-059       Product A        39.520000
2010-01-05       4219-056       Product A        39.520000
2010-01-05       4219-056       Product B        39.520000
2010-02-08       4219-123       Product A        39.520000
2010-02-08       4219-345       Product A        39.520000
2010-02-08       4219-456       Product B        39.520000
2010-02-08       4219-567       Product C        39.520000
2010-02-08       4219-789       Product D        39.520000

(The product numbers are for reference only.) I want to consolidate this into monthly data, like this:

Date        Description        Revenue
2010-01-01    Product A        157.85000 (Sum of all Product A in Month 01)    
              Product B        79.040000
              Product C        00.000000
              Product D        00.000000
2010-02-01    Product A        39.299999 (Sum of all Product A in Month 02)   
              Product B        39.520000
              Product C        39.520000
              Product D        39.520000  

The problem is that I have more than 500 products each month.

I am new to Python and don't know how to implement this. Currently, I am using

import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline

data.groupby(['DATE','REVENUE']).sum().unstack()

but this does not group the data by product.

How can I achieve this?

2 answers:

Answer 0: (score: 1)

Convert "Date" to datetime, then use groupby and sum:

# Do this first, if necessary.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

(df.groupby([pd.Grouper(key='Date', freq='MS'), 'Description'])['Revenue']
   .sum()
   .reset_index())

        Date Description     Revenue
0 2010-01-01           A  197.379999
1 2010-01-01           B   79.040000
2 2010-02-01           A   79.040000
3 2010-02-01           B   39.520000
4 2010-02-01           C   39.520000
5 2010-02-01           D   39.520000

The frequency "MS" (month start) groups the dates by calendar month and labels each group with the first day of that month.
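The grouping above only returns month/product pairs that actually occur, while the desired output in the question also lists products with zero revenue in a month. A minimal sketch of one way to add those rows, assuming the sample rows from the question and reindexing against every month/product combination (the names `monthly`, `full_index`, and `full` are illustrative, not from the original answer):

```python
import pandas as pd

# Sample rows copied from the question's table (Product Number omitted).
df = pd.DataFrame({
    "Date": ["2010-01-04"] * 4 + ["2010-01-05"] * 3 + ["2010-02-08"] * 5,
    "Description": [
        "Product A", "Product A", "Product B", "Product A",
        "Product A", "Product A", "Product B",
        "Product A", "Product A", "Product B", "Product C", "Product D",
    ],
    "Revenue": [39.299999] + [39.52] * 11,
})
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Monthly sum per product; only observed (month, product) pairs appear.
monthly = (
    df.groupby([pd.Grouper(key="Date", freq="MS"), "Description"])["Revenue"]
      .sum()
)

# Build every (month, product) combination and fill the missing ones with 0.
full_index = pd.MultiIndex.from_product(
    [
        monthly.index.get_level_values("Date").unique(),
        monthly.index.get_level_values("Description").unique(),
    ],
    names=["Date", "Description"],
)
full = monthly.reindex(full_index, fill_value=0).reset_index()
print(full)
```

With the sample data, `monthly` has six rows and `full` has eight, the two extra rows being Product C and Product D in January with a revenue of 0.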

Answer 1: (score: 0)

Use the following code:

data.groupby(['Date','Description'])['Revenue'].sum()
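Note that grouping on the raw "Date" column sums per day rather than per month. A small sketch of the difference, using made-up revenue values and a month key derived with `dt.to_period("M")` (the `Month` name and the numbers are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical values chosen so the sums are easy to check by hand.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-02-08"]),
    "Description": ["Product A", "Product A", "Product A"],
    "Revenue": [10.0, 20.0, 5.0],
})

# Grouping on the daily date keeps one row per day and product.
daily = data.groupby(["Date", "Description"])["Revenue"].sum()

# Deriving a monthly period first collapses the days within each month.
month = data["Date"].dt.to_period("M").rename("Month")
monthly = data.groupby([month, "Description"])["Revenue"].sum()

print(daily)
print(monthly)
```

Here `daily` keeps three rows, while `monthly` has two: 30.0 for 2010-01 and 5.0 for 2010-02.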