Python - Calendar view of dates in a Pandas DataFrame

Time: 2017-08-23 13:48:59

Tags: python pandas date numpy matrix

I need to build a 7-day calendar view from a pandas DataFrame that contains a list of events. Here is a sample of the dates.

DatetimeIndex(['2017-05-15', '2017-05-12', '2017-05-07', '2017-05-15',
               '2017-05-17', '2017-05-17', '2017-05-07', '2017-05-01',
               '2017-05-07', '2017-05-04', '2017-05-02', '2017-05-01',
               '2017-05-06', '2017-05-15', '2017-05-13', '2017-05-06',
               '2017-05-03', '2017-04-21', '2017-04-10', '2017-04-10',
               '2017-04-18', '2017-03-13', '2017-04-13', '2017-05-04',
               '2017-03-16', '2017-05-01', '2017-04-15', '2017-04-01',
               '2017-04-01', '2017-04-01'],
              dtype='datetime64[ns]', name=u'Date', freq=None)

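I need to arrange the data above as an n x 7 matrix, where n is the number of weeks and the columns are the days of the week (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday).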

Because some dates are missing, I generated every possible date in the range.

min_date = df['Date'].min()
max_date = df['Date'].max()
idx = pd.date_range(min_date, max_date)

DatetimeIndex(['2017-04-01', '2017-04-02', '2017-04-03', '2017-04-04',
               '2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08',
               '2017-04-09', '2017-04-10', '2017-04-11', '2017-04-12',
               '2017-04-13', '2017-04-14', '2017-04-15', '2017-04-16',
               '2017-04-17', '2017-04-18', '2017-04-19', '2017-04-20',
               '2017-04-21', '2017-04-22', '2017-04-23', '2017-04-24',
               '2017-04-25', '2017-04-26', '2017-04-27', '2017-04-28',
               '2017-04-29', '2017-04-30', '2017-05-01', '2017-05-02',
               '2017-05-03', '2017-05-04', '2017-05-05', '2017-05-06',
               '2017-05-07'],
              dtype='datetime64[ns]', freq='D')

Then, with the following line, I already know which column of the final matrix each date belongs in:

week = idx.dayofweek
>> array([5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6,
       0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6])

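Is there a Pythonic way to convert idx into an n x 7 matrix? I could then check whether a date in the original DataFrame equals the date at position (i, j) and fill in the matrix accordingly.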

1 answer:

Answer 0: (score: 2)

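First, change the initial creation of idx so that it starts on a Monday and ends on a Sunday. That is, replace

idx = pd.date_range(min_date, max_date)

with

# dt is the standard library datetime module: import datetime as dt
idx = pd.date_range(min_date-dt.timedelta(days=min_date.weekday()),
                    max_date+dt.timedelta(days=6-max_date.weekday()))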
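
As a quick sanity check of the padding arithmetic, here is a minimal sketch. The helper name pad_to_full_weeks is hypothetical (it is not part of pandas), and the two endpoints are simply the first and last dates of the range printed in the question. Since weekday() returns 0 for Monday and 6 for Sunday, subtracting it from the start and adding 6 minus it to the end should snap the range to whole weeks.

```python
import datetime as dt

import pandas as pd


def pad_to_full_weeks(min_date, max_date):
    """Daily range from the Monday on/before min_date to the Sunday on/after max_date.

    Hypothetical helper wrapping the expression from the answer.
    """
    start = min_date - dt.timedelta(days=min_date.weekday())
    end = max_date + dt.timedelta(days=6 - max_date.weekday())
    return pd.date_range(start, end)


# 2017-04-01 is a Saturday and 2017-05-07 is a Sunday (see the dayofweek output above).
padded = pad_to_full_weeks(pd.Timestamp('2017-04-01'), pd.Timestamp('2017-05-07'))
print(padded[0].date(), padded[-1].date(), len(padded))
# 2017-03-27 2017-05-07 42
```

The length is a multiple of 7, which is what makes the reshape in the next step possible.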

You can then use np.reshape to rearrange it into seven columns:

idx.values.reshape(len(idx)//7, 7)

If needed, you can convert it back into a DataFrame.
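
To see why the reshape lines each weekday up with its own column, the following sketch reshapes the dayofweek values the same way and compares every row with 0..6. It assumes a range that already begins on a Monday and ends on a Sunday. The endpoints 2017-03-27 and 2017-05-07 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Assumed to be already padded: Monday 2017-03-27 through Sunday 2017-05-07.
idx = pd.date_range('2017-03-27', '2017-05-07')

# Row-major reshape: each consecutive run of 7 days becomes one row (one week).
grid = idx.values.reshape(len(idx) // 7, 7)

# Reshape the weekday numbers identically; every row should read 0, 1, ..., 6.
weekday_grid = np.asarray(idx.dayofweek).reshape(-1, 7)

print(grid.shape)                               # (6, 7)
print((weekday_grid == np.arange(7)).all())     # True
```

If the range did not start on a Monday, or its length were not a multiple of 7, the reshape would either fail or put weekdays in the wrong columns, which is why the padding step comes first.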

Using your example,

import datetime as dt
import pandas as pd

date = pd.DatetimeIndex(['2017-05-15', '2017-05-12', '2017-05-07', '2017-05-15',
                         '2017-05-17', '2017-05-17', '2017-05-07', '2017-05-01',
                         '2017-05-07', '2017-05-04', '2017-05-02', '2017-05-01',
                         '2017-05-06', '2017-05-15', '2017-05-13', '2017-05-06',
                         '2017-05-03', '2017-04-21', '2017-04-10', '2017-04-10',
                         '2017-04-18', '2017-03-13', '2017-04-13', '2017-05-04',
                         '2017-03-16', '2017-05-01', '2017-04-15', '2017-04-01',
                         '2017-04-01', '2017-04-01'],
                        dtype='datetime64[ns]', name=u'Date', freq=None)

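# Pad the range back to the preceding Monday and forward to the following Sunday
min_date = min(date)
max_date = max(date)
idx = pd.date_range(min_date-dt.timedelta(days=min_date.weekday()),
                    max_date+dt.timedelta(days=6-max_date.weekday()))
# One row per week; the column names come from the first seven days (Monday to Sunday)
pd.DataFrame(idx.values.reshape(len(idx)//7, 7), columns=idx[:7].strftime('%A'))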
Out[222]: 
      Monday    Tuesday  Wednesday   Thursday     Friday   Saturday     Sunday
0 2017-03-13 2017-03-14 2017-03-15 2017-03-16 2017-03-17 2017-03-18 2017-03-19
1 2017-03-20 2017-03-21 2017-03-22 2017-03-23 2017-03-24 2017-03-25 2017-03-26
2 2017-03-27 2017-03-28 2017-03-29 2017-03-30 2017-03-31 2017-04-01 2017-04-02
3 2017-04-03 2017-04-04 2017-04-05 2017-04-06 2017-04-07 2017-04-08 2017-04-09
4 2017-04-10 2017-04-11 2017-04-12 2017-04-13 2017-04-14 2017-04-15 2017-04-16
5 2017-04-17 2017-04-18 2017-04-19 2017-04-20 2017-04-21 2017-04-22 2017-04-23
6 2017-04-24 2017-04-25 2017-04-26 2017-04-27 2017-04-28 2017-04-29 2017-04-30
7 2017-05-01 2017-05-02 2017-05-03 2017-05-04 2017-05-05 2017-05-06 2017-05-07
8 2017-05-08 2017-05-09 2017-05-10 2017-05-11 2017-05-12 2017-05-13 2017-05-14
9 2017-05-15 2017-05-16 2017-05-17 2017-05-18 2017-05-19 2017-05-20 2017-05-21
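
The question also asks about filling the matrix from the original events. One possible way, sketched below, is to count events per day, reindex those counts onto the padded range, and reshape the counts instead of the dates. This is an assumption about what "filling" should mean (event counts per day), and the short events list is a made-up subset of dates for illustration, not the full sample.

```python
import datetime as dt

import pandas as pd

# Hypothetical small event list (illustration only).
events = pd.DatetimeIndex(['2017-05-01', '2017-05-01', '2017-05-04',
                           '2017-05-07', '2017-05-12'])

# Pad to whole weeks, exactly as in the answer.
min_date, max_date = events.min(), events.max()
idx = pd.date_range(min_date - dt.timedelta(days=min_date.weekday()),
                    max_date + dt.timedelta(days=6 - max_date.weekday()))

# Events per day, with 0 for days that have no events.
counts = pd.Series(events).value_counts().reindex(idx, fill_value=0)

# One row per week, labelled by that week's Monday.
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
calendar = pd.DataFrame(counts.values.reshape(len(idx) // 7, 7),
                        index=idx[::7], columns=day_names)
print(calendar)
```

The day names are hard-coded here rather than taken from strftime('%A'), since that format depends on the active locale. Replacing the counts with a boolean (counts > 0) would give a simple "has event" calendar instead.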