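Resampling a DataFrame monthly, starting from the current date

Time: 2020-02-04 19:16:08

Tags: python pandas

I am trying to reorganize my dataset by month so that I can run a forecast on it later. The problem is that I have grouped it by calendar month (January, February, etc.), but I want to group it into 30-day periods counted back from the current date. In the end, I want my code to keep the 5 most recent 30-day periods.

My dataset looks like this: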

data1 = pd.DataFrame({'Id' : ['001','001','001','001','001','001','001','001','001',
                              '002','002','002','002','002','002','002','002','002',],
                     'Date': ['2020-01-12', '2019-12-30', '2019-12-01','2019-11-01', '2019-08-04', '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
                               '2020-01-11', '2019-12-12', '2019-12-01','2019-12-01', '2019-09-10', '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
                      'Quantity' :[18,5,6,8,12,14,16,19,20,           21,7,6,5,4,3,2,1,0]
                      })

My code is as follows:

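data1['Date'] = pd.to_datetime(data1['Date'])
# Sum Quantity per Id and calendar month ('M' = month-end frequency)
data1 = data1.groupby('Id').apply(lambda x: x.set_index('Date').resample('M').sum())
# Keep the last 5 months for each Id
data1 = data1.groupby(level='Id').tail(5)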

The expected output would look something like this (grouped by Id):

    Id        Date  Quantity
0  001  2020-02-04        18
1  001  2020-01-05         5
2  001  2019-12-06         6
3  001  2019-11-07         8
4  001  2019-11-08        12
5  002  2020-02-04        21
6  002  2020-01-05         7
7  002  2019-12-06        11
8  002  2019-11-07         0
9  002  2019-11-08         3

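As it stands, this does not really make sense: if I want to forecast demand for next month (say, March), it is effectively two months from today, even though March is only one month away.

I hope my question is clear. I have spent a lot of time trying to figure this out and could use some help. If anyone has a hint, I would really appreciate it!

2 Answers:

Answer 0: (score: 1)

You can use pd.cut to group the data into 30-day bins counted back from today.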

import pandas as pd

today = pd.to_datetime('today').normalize()
freq = '30D'  # Size of the bins

Nbin = (today - data1['Date'].min())//pd.Timedelta(freq) + 1  # Number of bins
bins = [today - n*pd.Timedelta(freq) for n in range(Nbin, -1, -1)]

# Sum Quantity per Id and 30-day bin (assumes data1['Date'] is already datetime)
data1.groupby(['Id', pd.cut(data1['Date'], bins=bins)])['Quantity'].sum()

Id  Date                              
001 (2019-06-09, 2019-07-09]       NaN
    (2019-07-09, 2019-08-08]      61.0
    (2019-08-08, 2019-09-07]       NaN
    (2019-09-07, 2019-10-07]       NaN
    (2019-10-07, 2019-11-06]       8.0
    (2019-11-06, 2019-12-06]       6.0
    (2019-12-06, 2020-01-05]       5.0
    (2020-01-05, 2020-02-04]      18.0
002 (2019-06-09, 2019-07-09]       1.0
    (2019-07-09, 2019-08-08]       2.0
    (2019-08-08, 2019-09-07]       3.0
    (2019-09-07, 2019-10-07]       4.0
    (2019-10-07, 2019-11-06]       NaN
    (2019-11-06, 2019-12-06]      11.0
    (2019-12-06, 2020-01-05]       7.0
    (2020-01-05, 2020-02-04]      21.0
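
To go from these bins to the five most recent 30-day periods the question asks for, one possible approach is sketched below. It is not part of the code above: it numbers the periods by integer division rather than pd.cut, which yields the same right-closed windows ending at today. Here `today` is pinned to 2020-02-04 (the date of the question) so the result is reproducible, and the names `window`, `n_periods`, `last5`, `Period` and `Period end` are illustrative assumptions.

```python
import pandas as pd

data1 = pd.DataFrame({
    'Id': ['001'] * 9 + ['002'] * 9,
    'Date': ['2020-01-12', '2019-12-30', '2019-12-01', '2019-11-01', '2019-08-04',
             '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
             '2020-01-11', '2019-12-12', '2019-12-01', '2019-12-01', '2019-09-10',
             '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
    'Quantity': [18, 5, 6, 8, 12, 14, 16, 19, 20, 21, 7, 6, 5, 4, 3, 2, 1, 0],
})
data1['Date'] = pd.to_datetime(data1['Date'])

today = pd.Timestamp('2020-02-04')  # assumption: pinned instead of using the real current date
window = 30                         # days per period
n_periods = 5                       # number of recent periods to keep

# 0 = the most recent 30 days, 1 = the 30 days before that, and so on
period = (today - data1['Date']).dt.days // window

# Keep only rows that fall inside the five most recent periods
recent = data1[period.between(0, n_periods - 1)].assign(Period=period)

# Every (Id, period) pair, so that empty periods show up as 0 instead of disappearing
ids = sorted(data1['Id'].unique())
full_index = pd.MultiIndex.from_product([ids, range(n_periods)], names=['Id', 'Period'])

last5 = (recent.groupby(['Id', 'Period'])['Quantity']
               .sum()
               .reindex(full_index, fill_value=0)
               .reset_index())

# Label each period by the date on which it ends
last5['Period end'] = today - pd.to_timedelta(last5['Period'] * window, unit='D')

print(last5[['Id', 'Period end', 'Quantity']])
```

Whether empty periods should be filled with 0 or left out depends on the forecasting model; dropping the `reindex` step leaves them out.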

Answer 1: (score: 1)

You can use pandas.Series.dt.days to convert the dates into a number of days relative to today:

import pandas as pd

# Fixed reference date standing in for "today" in this example
today = pd.to_datetime('2019-05-13')

data1 = pd.DataFrame({'Id' : ['001','001','001','001','001','001','001','001','001',
                              '002','002','002','002','002','002','002','002','002',],
                     'Date': ['2020-01-12', '2019-12-30', '2019-12-01','2019-11-01', '2019-08-04', '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
                               '2020-01-11', '2019-12-12', '2019-12-01','2019-12-01', '2019-09-10', '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
                      'Quantity' :[18,5,6,8,12,14,16,19,20,           21,7,6,5,4,3,2,1,0]
                      })

# Number of whole 30-day periods between the reference date and each row's date
data1['Period from Today'] = (pd.to_datetime(data1['Date']) - today).dt.days // 30
grouped = data1.groupby(['Id', 'Period from Today'])

for key, group in grouped:
    print(group)

    Id        Date  Quantity  Period from Today
8  001  2019-06-04        20                  0
    Id        Date  Quantity  Period from Today
4  001  2019-08-04        12                  2
5  001  2019-08-04        14                  2
6  001  2019-08-01        16                  2
7  001  2019-07-20        19                  2
    Id        Date  Quantity  Period from Today
3  001  2019-11-01         8                  5
    Id        Date  Quantity  Period from Today
2  001  2019-12-01         6                  6
    Id        Date  Quantity  Period from Today
1  001  2019-12-30         5                  7
    Id        Date  Quantity  Period from Today
0  001  2020-01-12        18                  8
     Id        Date  Quantity  Period from Today
17  002  2019-06-01         0                  0
     Id        Date  Quantity  Period from Today
16  002  2019-06-20         1                  1
     Id        Date  Quantity  Period from Today
14  002  2019-08-10         3                  2
15  002  2019-08-01         2                  2
     Id        Date  Quantity  Period from Today
13  002  2019-09-10         4                  4
     Id        Date  Quantity  Period from Today
11  002  2019-12-01         6                  6
12  002  2019-12-01         5                  6
     Id        Date  Quantity  Period from Today
10  002  2019-12-12         7                  7
    Id        Date  Quantity  Period from Today
9  002  2020-01-11        21                  8
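
If one total per Id and period is wanted rather than the raw groups, the same period column can be summed directly. The following is a minimal sketch under the same assumptions as above (reference date 2019-05-13, 30-day periods counted forward from it); the names `df` and `totals` are illustrative.

```python
import pandas as pd

today = pd.to_datetime('2019-05-13')  # same fixed reference date as above

df = pd.DataFrame({
    'Id': ['001'] * 9 + ['002'] * 9,
    'Date': ['2020-01-12', '2019-12-30', '2019-12-01', '2019-11-01', '2019-08-04',
             '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
             '2020-01-11', '2019-12-12', '2019-12-01', '2019-12-01', '2019-09-10',
             '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
    'Quantity': [18, 5, 6, 8, 12, 14, 16, 19, 20, 21, 7, 6, 5, 4, 3, 2, 1, 0],
})

# Whole 30-day periods elapsed since the reference date
df['Period from Today'] = (pd.to_datetime(df['Date']) - today).dt.days // 30

# One row per (Id, period) with the summed Quantity; periods without data are absent
totals = df.groupby(['Id', 'Period from Today'], as_index=False)['Quantity'].sum()
print(totals)
```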

It is not clear to me exactly how you want the data organized, but I hope this helps.