I have a DataFrame that stores daily data, as shown below:
Date        Product Number  Description  Revenue
2010-01-04  4219-057        Product A    39.299999
2010-01-04  4219-056        Product A    39.520000
2010-01-04  4219-100        Product B    39.520000
2010-01-04  4219-056        Product A    39.520000
2010-01-05  4219-059        Product A    39.520000
2010-01-05  4219-056        Product A    39.520000
2010-01-05  4219-056        Product B    39.520000
2010-02-08  4219-123        Product A    39.520000
2010-02-08  4219-345        Product A    39.520000
2010-02-08  4219-456        Product B    39.520000
2010-02-08  4219-567        Product C    39.520000
2010-02-08  4219-789        Product D    39.520000
(The product numbers are for reference only.) I want to aggregate this into monthly data, like this:
Date        Description  Revenue
2010-01-01  Product A    157.85000  (Sum of all Product A in Month 01)
            Product B    79.040000
            Product C    00.000000
            Product D    00.000000
2010-02-01  Product A    39.299999  (Sum of all Product A in Month 02)
            Product B    39.520000
            Product C    39.520000
            Product D    39.520000
The problem is that I have more than 500 products each month.
I am new to Python and don't know how to do this. Currently I am using
import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline
data.groupby(['DATE','REVENUE']).sum().unstack()
but this does not group the rows by product.
How can I achieve this?
Answer 0 (score: 1)
Convert "Date" to datetime, then use groupby and sum:
# Do this first, if necessary.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
(df.groupby([pd.Grouper(key='Date', freq='MS'), 'Description'])['Revenue']
.sum()
.reset_index())
Date Description Revenue
0 2010-01-01 A 197.379999
1 2010-01-01 B 79.040000
2 2010-02-01 A 79.040000
3 2010-02-01 B 39.520000
4 2010-02-01 C 39.520000
5 2010-02-01 D 39.520000
The "MS" frequency (month start) groups the dates by calendar month and labels each group with the first day of that month.
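For reference, here is a self-contained sketch that rebuilds the sample rows from the question and applies the same grouping. The desired output in the question also lists products with no sales in a month as 0, which the groupby alone does not produce. The reindex step below is one possible way to add those rows, not something taken from this answer; the names `monthly`, `full_index` and `monthly_filled` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": ["2010-01-04"] * 4 + ["2010-01-05"] * 3 + ["2010-02-08"] * 5,
    "Product Number": [
        "4219-057", "4219-056", "4219-100", "4219-056",
        "4219-059", "4219-056", "4219-056",
        "4219-123", "4219-345", "4219-456", "4219-567", "4219-789",
    ],
    "Description": [
        "Product A", "Product A", "Product B", "Product A",
        "Product A", "Product A", "Product B",
        "Product A", "Product A", "Product B", "Product C", "Product D",
    ],
    "Revenue": [39.299999] + [39.52] * 11,
})
df["Date"] = pd.to_datetime(df["Date"])

# Monthly revenue per product, labelled with the first day of the month.
monthly = df.groupby(
    [pd.Grouper(key="Date", freq="MS"), "Description"]
)["Revenue"].sum()

# Assumption: a product with no rows in a month should show up as 0.0.
# Build every (month, product) pair and reindex against it.
full_index = pd.MultiIndex.from_product(
    monthly.index.levels, names=monthly.index.names
)
monthly_filled = monthly.reindex(full_index, fill_value=0.0).reset_index()

print(monthly_filled)
```

With 500+ products the same code applies unchanged, since the set of products is read from the data rather than listed by hand.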
Answer 1 (score: 0)
Use the following code:
data.groupby(['Date','Description'])['Revenue'].sum()
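Note that grouping on the raw Date column gives one total per day and description, not per month. A minimal sketch of a monthly variant follows, assuming Date has already been converted to datetime. The three-row `data` frame is made up for illustration, and mapping each date to its month start via `dt.to_period("M")` is one option among several.

```python
import pandas as pd

# Hypothetical miniature frame; the real data has many more rows.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-02-08"]),
    "Description": ["Product A", "Product A", "Product A"],
    "Revenue": [39.299999, 39.52, 39.52],
})

# Grouping on the raw dates keeps one row per day.
daily = data.groupby(["Date", "Description"])["Revenue"].sum()

# Map every date to the first day of its month, then group on that.
month_start = data["Date"].dt.to_period("M").dt.to_timestamp()
monthly = data.groupby([month_start, "Description"])["Revenue"].sum()

print(daily)
print(monthly)
```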