Question

I have a DataFrame that stores daily data, as shown below:

   Date        Product Number  Description        Revenue
2010-01-04       4219-057       Product A        39.299999    
2010-01-04       4219-056       Product A        39.520000
2010-01-04       4219-100       Product B        39.520000
2010-01-04       4219-056       Product A        39.520000
2010-01-05       4219-059       Product A        39.520000
2010-01-05       4219-056       Product A        39.520000
2010-01-05       4219-056       Product B        39.520000
2010-02-08       4219-123       Product A        39.520000
2010-02-08       4219-345       Product A        39.520000
2010-02-08       4219-456       Product B        39.520000
2010-02-08       4219-567       Product C        39.520000
2010-02-08       4219-789       Product D        39.520000

(The product numbers are for reference only.) I want to aggregate this into monthly data, like this:

Date        Description        Revenue
2010-01-01    Product A        157.85000 (Sum of all Product A in Month 01)    
              Product B        79.040000
              Product C        00.000000
              Product D        00.000000
2010-02-01    Product A        39.299999 (Sum of all Product A in Month 02)   
              Product B        39.520000
              Product C        39.520000
              Product D        39.520000

The problem is that I have more than 500 products per month.

I am new to Python and don't know how to implement this. Currently, I am using

import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline

data.groupby(['DATE','REVENUE']).sum().unstack()

but this does not group the result by product.

How can I achieve this?

Answer 1

Convert the "Date" column to datetime, then use groupby and sum:

# Do this first, if necessary.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

(df.groupby([pd.Grouper(key='Date', freq='MS'), 'Description'])['Revenue']
   .sum()
   .reset_index())

        Date Description     Revenue
0 2010-01-01           A  197.379999
1 2010-01-01           B   79.040000
2 2010-02-01           A   79.040000
3 2010-02-01           B   39.520000
4 2010-02-01           C   39.520000
5 2010-02-01           D   39.520000

The "MS" (month start) frequency groups the dates by month and labels each group with the first day of that month.
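For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the same `pd.Grouper` grouping. The names `monthly`, `full_index` and `monthly_filled` are only illustrative. The last step is an assumption about the desired output: the question shows 0 for products with no sales in a month, so the result is reindexed over every month/product combination and the gaps are filled with 0.

```python
import pandas as pd

# Sample rows copied from the question (Product Number is left out,
# since it is not needed for the monthly sum).
df = pd.DataFrame({
    "Date": ["2010-01-04"] * 4 + ["2010-01-05"] * 3 + ["2010-02-08"] * 5,
    "Description": ["Product A", "Product A", "Product B", "Product A",
                    "Product A", "Product A", "Product B",
                    "Product A", "Product A", "Product B",
                    "Product C", "Product D"],
    "Revenue": [39.299999] + [39.52] * 11,
})
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Sum revenue per calendar month (labelled by month start) and product.
monthly = (df.groupby([pd.Grouper(key="Date", freq="MS"), "Description"])["Revenue"]
             .sum())

# Assumption: products with no sales in a month should appear with 0.
# Build every (month, product) pair and fill the missing ones.
full_index = pd.MultiIndex.from_product(monthly.index.levels,
                                        names=monthly.index.names)
monthly_filled = monthly.reindex(full_index, fill_value=0).reset_index()

print(monthly_filled)
```

If the zero rows are not wanted, `monthly.reset_index()` alone gives the same shape as the output shown above.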

Answer 2

Use the following code:

data.groupby(['Date', 'Description'])['Revenue'].sum()
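Note that grouping on the raw `Date` column sums per day, not per month. One possible way to adapt this line, assuming `Date` already has a datetime dtype, is to group on the calendar month of each date instead. The tiny `data` frame and the `Month` label below are made up for illustration only.

```python
import pandas as pd

# Hypothetical three-row sample, just to show the grouping.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-02-08"]),
    "Description": ["Product A", "Product A", "Product A"],
    "Revenue": [39.299999, 39.52, 39.52],
})

# Map each date to its monthly period, then group on that instead of the exact day.
month = data["Date"].dt.to_period("M").rename("Month")
monthly = data.groupby([month, "Description"])["Revenue"].sum()

print(monthly)
```

The result is indexed by monthly periods (such as `2010-01`) rather than month-start timestamps, which is the main visible difference from the `pd.Grouper(freq='MS')` approach.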

Aggregate daily data by month and other columns
